diff --git a/.gitignore b/.gitignore
index accfc66..84ca392 100644
--- a/.gitignore
+++ b/.gitignore
@@ -5,3 +5,5 @@ tfl_journey_client
 **/node_modules
 **/__pycache__
 **/dist
+server-rs/target
+.task
diff --git a/CLAUDE.md b/CLAUDE.md
index 9a7302a..f78b5c9 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -4,66 +4,74 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 ## Project Overview
 
-Property Map is a full-stack geospatial web application that visualizes UK property price data aggregated by H3 hexagonal spatial indices. It combines Land Registry price data with postcode geolocation to create an interactive map for exploring property markets.
+Property Map is a full-stack geospatial application for visualizing UK property data on an interactive map. It combines Land Registry price-paid data, EPC energy certificates, postcode geolocation, TFL journey times, Index of Deprivation scores, and OpenStreetMap POIs into a single wide parquet file, then serves aggregated H3 hexagon statistics and POI data via a Rust backend.
 
 ## Commands
 
-All commands use [Task](https://taskfile.dev) runner. Install with: `curl -1sLf 'https://dl.cloudsmith.io/public/task/task/setup.deb.sh' | sudo -E bash`
+All commands use [Task](https://taskfile.dev) runner.
 
 ```bash
-# Initial setup (downloads ~GB of data, runs pipeline)
-task prepare
+task prepare # Full setup: install deps, download data (~GB), run pipeline
+task server # Rust backend on :8001 (cargo run --release)
+task frontend # Webpack dev server on :3030 (proxies /api to :8001)
 
-# Development (run in separate terminals)
-task server # FastAPI backend on :8001
-task frontend # Webpack dev server on :3030 (proxies /api to :8001)
-
-# Code quality
-task lint # Lint Python (ruff) + TypeScript (ESLint + Prettier)
-task format # Auto-fix formatting
-task typecheck # TypeScript type checking
-task check # All checks (lint + typecheck + build)
-
-# Production
-task build # Build frontend
-task prod # Serve built frontend via FastAPI
+task lint # Lint Python (ruff) + TypeScript (ESLint + Prettier)
+task format # Auto-fix formatting (ruff + ESLint + Prettier)
+task typecheck # TypeScript type checking
+task check # All checks (lint + typecheck + build)
+task test # Run Python tests (fuzzy join)
+task build # Build frontend for production
 ```
 
+Python commands use `uv run`. Frontend commands use `npm run` from `frontend/`.
+
 ## Architecture
 
-```
-frontend/ React + TypeScript SPA (deck.gl/MapLibre for visualization)
-  src/App.tsx Main component with filters and map state
-  src/components/ Map.tsx (deck.gl H3HexagonLayer), Filters UI
+### Data Pipeline (`pipeline/`)
 
-server/ FastAPI backend
-  main.py App setup, CORS, static file mounting
-  routes/hexagons.py GET /api/hexagons - returns aggregated price data
+Python + Polars. Orchestrated by `pipeline/run.py` which builds `data_sources/processed/wide.parquet`:
 
-pipeline/ Data processing (Polars + H3)
-  config.py Central config (H3 resolutions 6-11, year/price ranges)
-  sources/ Postcode loading, property price joins
-  processors/ H3 aggregation (count, avg/median/min/max by cell+year)
+1. **Download** (`pipeline/download/`) — Fetches raw data into `data_sources/`:
+   - `arcgis.py` — Postcode → lat/lon/LSOA mappings
+   - `price_paid.py` — Land Registry price-paid records
+   - `pois/` — OpenStreetMap POIs via osmium (PBF parsing)
+   - `deprivation_data.py` — English Indices of Deprivation 2025
+2. **Join** (`pipeline/epc_pp.py`) — Fuzzy-joins EPC certificates with price-paid by address within postcode buckets → `epc_pp.parquet`
+3. **Widen** (`pipeline/run.py`) — Joins epc_pp with GPS coords, journey times, IoD scores, POI proximity counts, derives `price_per_sqm` and numeric `construction_age_band`
+4. **Transform POIs** (`pipeline/download/pois/transform.py`) — Drops unwanted categories, remaps to friendly names + emoji → `filtered_uk_pois.parquet`
 
-tfl_journey_client/ Generated TFL API client (local package)
-```
+Shared utilities live in `pipeline/utils/` (haversine distance for both numpy and Polars expressions, fuzzy address matching).
 
-## Data Flow
+### Backend (`server-rs/`)
 
-1. **Download**: Land Registry prices + ArcGIS postcode→lat/lon mappings → `data_sources/`
-2. **Pipeline**: Join data, compute H3 indices, aggregate stats → `data_sources/processed/aggregates/*.parquet`
-3. **Serve**: Load parquet files into memory, filter by bounds/year/price, return as GeoJSON-like response
-4. **Visualize**: Frontend fetches on viewport change, renders hexagons colored by average price
+Rust + Axum. Loads `wide.parquet` and `filtered_uk_pois.parquet` into memory at startup with precomputed H3 indices (resolutions 7–11) and grid-based spatial indices (0.01° cells).
 
-## Tech Stack
+**API endpoints:**
+- `GET /api/features` — Numeric column metadata with histograms and percentiles
+- `GET /api/hexagons` — H3 aggregates filtered by bounds, resolution, and feature min/max
+- `GET /api/pois` — POIs by bounds with optional category filter (max 5000)
+- `GET /api/poi-categories` — Available POI categories
 
-- **Frontend**: React 18, TypeScript, Webpack, TailwindCSS, deck.gl, MapLibre GL
-- **Backend**: Python 3.12, FastAPI, Polars, H3
-- **Package managers**: `uv` (Python), `npm` (frontend)
+Also serves `frontend/dist/` as static fallback.
+
+### Frontend (`frontend/`)
+
+React 18 + TypeScript SPA. deck.gl `H3HexagonLayer` over MapLibre GL basemap. Debounces API calls (150ms) on viewport changes. TailwindCSS for styling.
 
 ## Key Implementation Details
 
-- Backend caches dataframes in memory and uses LRU cache on queries
-- Bounds rounded to 0.01° precision to improve cache hits
-- Results capped at 50,000 hexagons per request (truncated flag in response)
-- Frontend debounces API calls on map movement
+- Bounds quantized to 0.01° to improve cache hits on both backend and frontend
+- H3 hexagon results capped at 50,000 per request (truncated flag in response)
+- POI proximity counting uses a spatial grid (0.05° cells, ~5km) to avoid O(n×m) distance checks
+- Fuzzy address matching uses `thefuzz.token_sort_ratio` with numeric token compatibility checks, parallelized across postcode buckets
+- The Rust server writes JSON via direct string buffer (avoids serde_json::Value allocations)
+- POI transform validates exhaustive category coverage — pipeline fails if any OSM category is unmapped
+
+## Data Sources
+
+- **Land Registry** — Price Paid bulk download
+- **EPC** — Energy Performance Certificates (domestic)
+- **ArcGIS** — Postcode → GPS/LSOA lookup
+- **OpenStreetMap** — POIs from Geofabrik Great Britain PBF
+- **IoD 2025** — English Indices of Deprivation (LSOA-level scores)
+- **TFL API** — Journey time calculations to configurable destinations
diff --git a/main.py b/main.py
deleted file mode 100644
index b4bdfc2..0000000
--- a/main.py
+++ /dev/null
@@ -1,6 +0,0 @@
-def main():
-    print("Hello from property-map!")
-
-
-if __name__ == "__main__":
-    main()
diff --git a/pyproject.toml b/pyproject.toml
index a0e6a9f..b214f71 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -26,6 +26,7 @@ dependencies = [
     "matplotlib>=3.10.8",
     "thefuzz>=0.22.1",
     "python-levenshtein>=0.27.3",
+    "scipy>=1.17.0",
 ]
 
 [tool.uv]
diff --git a/uv.lock b/uv.lock
index c661144..7f06e32 100644
--- a/uv.lock
+++ b/uv.lock
@@ -1588,6 +1588,7 @@ dependencies = [
     { name = "pyarrow", marker = "python_full_version < '3.14' and sys_platform == 'linux'" },
     { name = "python-dateutil", marker = "python_full_version < '3.14' and sys_platform == 'linux'" },
     { name = "python-levenshtein", marker = "python_full_version < '3.14' and sys_platform == 'linux'" },
+    { name = "scipy", marker = "python_full_version < '3.14' and sys_platform == 'linux'" },
     { name = "thefuzz", marker = "python_full_version < '3.14' and sys_platform == 'linux'" },
     { name = "tqdm", marker = "python_full_version < '3.14' and sys_platform == 'linux'" },
     { name = "uvicorn", marker = "python_full_version < '3.14' and sys_platform == 'linux'" },
@@ -1619,6 +1620,7 @@ requires-dist = [
     { name = "pyarrow", specifier = ">=15.0.0" },
     { name = "python-dateutil", specifier = ">=2.8.0" },
     { name = "python-levenshtein", specifier = ">=0.27.3" },
+    { name = "scipy", specifier = ">=1.17.0" },
     { name = "thefuzz", specifier = ">=0.22.1" },
     { name = "tqdm", specifier = ">=4.67.1" },
     { name = "uvicorn", specifier = ">=0.34.0" },
@@ -2192,6 +2194,37 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/6a/5b/aaf1dfbcc53a2811f6cc0a1759de24e4b03e02ba8762daabd9b6bd8c59e3/ruff-0.14.14-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:16bc890fb4cc9781bb05beb5ab4cd51be9e7cb376bf1dd3580512b24eb3fda2b", size = 11315626, upload-time = "2026-01-22T22:30:36.848Z" },
 ]
 
+[[package]]
+name = "scipy"
+version = "1.17.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "numpy", marker = "python_full_version < '3.14' and sys_platform == 'linux'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/56/3e/9cca699f3486ce6bc12ff46dc2031f1ec8eb9ccc9a320fdaf925f1417426/scipy-1.17.0.tar.gz", hash = "sha256:2591060c8e648d8b96439e111ac41fd8342fdeff1876be2e19dea3fe8930454e", size = 30396830, upload-time = "2026-01-10T21:34:23.009Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/4a/69/7c347e857224fcaf32a34a05183b9d8a7aca25f8f2d10b8a698b8388561a/scipy-1.17.0-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5194c445d0a1c7a6c1a4a4681b6b7c71baad98ff66d96b949097e7513c9d6742", size = 32724197, upload-time = "2026-01-10T21:25:44.084Z" },
+    { url = "https://files.pythonhosted.org/packages/d1/fe/66d73b76d378ba8cc2fe605920c0c75092e3a65ae746e1e767d9d020a75a/scipy-1.17.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:9eeb9b5f5997f75507814ed9d298ab23f62cf79f5a3ef90031b1ee2506abdb5b", size = 35009148, upload-time = "2026-01-10T21:25:50.591Z" },
+    { url = "https://files.pythonhosted.org/packages/af/07/07dec27d9dc41c18d8c43c69e9e413431d20c53a0339c388bcf72f353c4b/scipy-1.17.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:40052543f7bbe921df4408f46003d6f01c6af109b9e2c8a66dd1cf6cf57f7d5d", size = 34798766, upload-time = "2026-01-10T21:25:59.41Z" },
+    { url = "https://files.pythonhosted.org/packages/81/61/0470810c8a093cdacd4ba7504b8a218fd49ca070d79eca23a615f5d9a0b0/scipy-1.17.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:0cf46c8013fec9d3694dc572f0b54100c28405d55d3e2cb15e2895b25057996e", size = 37405953, upload-time = "2026-01-10T21:26:07.75Z" },
+    { url = "https://files.pythonhosted.org/packages/c9/10/be13397a0e434f98e0c79552b2b584ae5bb1c8b2be95db421533bbca5369/scipy-1.17.0-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:fe508b5690e9eaaa9467fc047f833af58f1152ae51a0d0aed67aa5801f4dd7d6", size = 32696338, upload-time = "2026-01-10T21:26:55.521Z" },
+    { url = "https://files.pythonhosted.org/packages/63/1e/12fbf2a3bb240161651c94bb5cdd0eae5d4e8cc6eaeceb74ab07b12a753d/scipy-1.17.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:6680f2dfd4f6182e7d6db161344537da644d1cf85cf293f015c60a17ecf08752", size = 34977201, upload-time = "2026-01-10T21:27:03.501Z" },
+    { url = "https://files.pythonhosted.org/packages/19/5b/1a63923e23ccd20bd32156d7dd708af5bbde410daa993aa2500c847ab2d2/scipy-1.17.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:eec3842ec9ac9de5917899b277428886042a93db0b227ebbe3a333b64ec7643d", size = 34777384, upload-time = "2026-01-10T21:27:11.423Z" },
+    { url = "https://files.pythonhosted.org/packages/39/22/b5da95d74edcf81e540e467202a988c50fef41bd2011f46e05f72ba07df6/scipy-1.17.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:d7425fcafbc09a03731e1bc05581f5fad988e48c6a861f441b7ab729a49a55ea", size = 37379586, upload-time = "2026-01-10T21:27:20.171Z" },
+    { url = "https://files.pythonhosted.org/packages/9b/9d/025cccdd738a72140efc582b1641d0dd4caf2e86c3fb127568dc80444e6e/scipy-1.17.0-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:130d12926ae34399d157de777472bf82e9061c60cc081372b3118edacafe1d00", size = 32815098, upload-time = "2026-01-10T21:27:54.389Z" },
+    { url = "https://files.pythonhosted.org/packages/48/5f/09b879619f8bca15ce392bfc1894bd9c54377e01d1b3f2f3b595a1b4d945/scipy-1.17.0-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:6e886000eb4919eae3a44f035e63f0fd8b651234117e8f6f29bad1cd26e7bc45", size = 35031342, upload-time = "2026-01-10T21:28:03.012Z" },
+    { url = "https://files.pythonhosted.org/packages/f2/9a/f0f0a9f0aa079d2f106555b984ff0fbb11a837df280f04f71f056ea9c6e4/scipy-1.17.0-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:13c4096ac6bc31d706018f06a49abe0485f96499deb82066b94d19b02f664209", size = 34893199, upload-time = "2026-01-10T21:28:10.832Z" },
+    { url = "https://files.pythonhosted.org/packages/90/b8/4f0f5cf0c5ea4d7548424e6533e6b17d164f34a6e2fb2e43ffebb6697b06/scipy-1.17.0-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:cacbaddd91fcffde703934897c5cd2c7cb0371fac195d383f4e1f1c5d3f3bd04", size = 37438061, upload-time = "2026-01-10T21:28:19.684Z" },
+    { url = "https://files.pythonhosted.org/packages/80/5c/ea5d239cda2dd3d31399424967a24d556cf409fbea7b5b21412b0fd0a44f/scipy-1.17.0-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:a38c3337e00be6fd8a95b4ed66b5d988bac4ec888fd922c2ea9fe5fb1603dd67", size = 32757834, upload-time = "2026-01-10T21:29:23.406Z" },
+    { url = "https://files.pythonhosted.org/packages/b8/7e/8c917cc573310e5dc91cbeead76f1b600d3fb17cf0969db02c9cf92e3cfa/scipy-1.17.0-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:00fb5f8ec8398ad90215008d8b6009c9db9fa924fd4c7d6be307c6f945f9cd73", size = 34995775, upload-time = "2026-01-10T21:29:31.915Z" },
+    { url = "https://files.pythonhosted.org/packages/c5/43/176c0c3c07b3f7df324e7cdd933d3e2c4898ca202b090bd5ba122f9fe270/scipy-1.17.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:f2a4942b0f5f7c23c7cd641a0ca1955e2ae83dedcff537e3a0259096635e186b", size = 34841240, upload-time = "2026-01-10T21:29:39.995Z" },
+    { url = "https://files.pythonhosted.org/packages/44/8c/d1f5f4b491160592e7f084d997de53a8e896a3ac01cd07e59f43ca222744/scipy-1.17.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:dbf133ced83889583156566d2bdf7a07ff89228fe0c0cb727f777de92092ec6b", size = 37394463, upload-time = "2026-01-10T21:29:48.723Z" },
+    { url = "https://files.pythonhosted.org/packages/7c/74/3498563a2c619e8a3ebb4d75457486c249b19b5b04a30600dfd9af06bea5/scipy-1.17.0-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5fb10d17e649e1446410895639f3385fd2bf4c3c7dfc9bea937bddcbc3d7b9ba", size = 32829770, upload-time = "2026-01-10T21:30:16.359Z" },
+    { url = "https://files.pythonhosted.org/packages/48/d1/7b50cedd8c6c9d6f706b4b36fa8544d829c712a75e370f763b318e9638c1/scipy-1.17.0-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:8547e7c57f932e7354a2319fab613981cde910631979f74c9b542bb167a8b9db", size = 35051093, upload-time = "2026-01-10T21:30:22.987Z" },
+    { url = "https://files.pythonhosted.org/packages/e2/82/a2d684dfddb87ba1b3ea325df7c3293496ee9accb3a19abe9429bce94755/scipy-1.17.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:33af70d040e8af9d5e7a38b5ed3b772adddd281e3062ff23fec49e49681c38cf", size = 34909905, upload-time = "2026-01-10T21:30:28.704Z" },
+    { url = "https://files.pythonhosted.org/packages/ef/5e/e565bd73991d42023eb82bb99e51c5b3d9e2c588ca9d4b3e2cc1d3ca62a6/scipy-1.17.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:f9eb55bb97d00f8b7ab95cb64f873eb0bf54d9446264d9f3609130381233483f", size = 37457743, upload-time = "2026-01-10T21:30:34.819Z" },
+]
+
 [[package]]
 name = "send2trash"
 version = "2.1.0"