Andras Schmelczer 2026-01-31 10:19:48 +00:00
parent bf2d5de156
commit 0153e46478
5 changed files with 87 additions and 49 deletions

.gitignore

@@ -5,3 +5,5 @@ tfl_journey_client
**/node_modules
**/__pycache__
**/dist
server-rs/target
.task


@@ -4,66 +4,74 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
## Project Overview
Property Map is a full-stack geospatial web application that visualizes UK property price data aggregated by H3 hexagonal spatial indices. It combines Land Registry price data with postcode geolocation to create an interactive map for exploring property markets.
Property Map is a full-stack geospatial application for visualizing UK property data on an interactive map. It combines Land Registry price-paid data, EPC energy certificates, postcode geolocation, TFL journey times, Index of Deprivation scores, and OpenStreetMap POIs into a single wide parquet file, then serves aggregated H3 hexagon statistics and POI data via a Rust backend.
## Commands
All commands use [Task](https://taskfile.dev) runner. Install with: `curl -1sLf 'https://dl.cloudsmith.io/public/task/task/setup.deb.sh' | sudo -E bash`
All commands use [Task](https://taskfile.dev) runner.
```bash
# Initial setup (downloads ~GB of data, runs pipeline)
task prepare
task prepare # Full setup: install deps, download data (~GB), run pipeline
task server # Rust backend on :8001 (cargo run --release)
task frontend # Webpack dev server on :3030 (proxies /api to :8001)
# Development (run in separate terminals)
task server # FastAPI backend on :8001
task frontend # Webpack dev server on :3030 (proxies /api to :8001)
# Code quality
task lint # Lint Python (ruff) + TypeScript (ESLint + Prettier)
task format # Auto-fix formatting
task typecheck # TypeScript type checking
task check # All checks (lint + typecheck + build)
# Production
task build # Build frontend
task prod # Serve built frontend via FastAPI
task lint # Lint Python (ruff) + TypeScript (ESLint + Prettier)
task format # Auto-fix formatting (ruff + ESLint + Prettier)
task typecheck # TypeScript type checking
task check # All checks (lint + typecheck + build)
task test # Run Python tests (fuzzy join)
task build # Build frontend for production
```
Python commands use `uv run`. Frontend commands use `npm run` from `frontend/`.
## Architecture
```
frontend/ React + TypeScript SPA (deck.gl/MapLibre for visualization)
src/App.tsx Main component with filters and map state
src/components/ Map.tsx (deck.gl H3HexagonLayer), Filters UI
### Data Pipeline (`pipeline/`)
server/ FastAPI backend
main.py App setup, CORS, static file mounting
routes/hexagons.py GET /api/hexagons - returns aggregated price data
Python + Polars. Orchestrated by `pipeline/run.py` which builds `data_sources/processed/wide.parquet`:
pipeline/ Data processing (Polars + H3)
config.py Central config (H3 resolutions 6-11, year/price ranges)
sources/ Postcode loading, property price joins
processors/ H3 aggregation (count, avg/median/min/max by cell+year)
1. **Download** (`pipeline/download/`) — Fetches raw data into `data_sources/`:
- `arcgis.py` — Postcode → lat/lon/LSOA mappings
- `price_paid.py` — Land Registry price-paid records
- `pois/` — OpenStreetMap POIs via osmium (PBF parsing)
- `deprivation_data.py` — English Indices of Deprivation 2025
2. **Join** (`pipeline/epc_pp.py`) — Fuzzy-joins EPC certificates with price-paid by address within postcode buckets → `epc_pp.parquet`
3. **Widen** (`pipeline/run.py`) — Joins epc_pp with GPS coords, journey times, IoD scores, POI proximity counts, derives `price_per_sqm` and numeric `construction_age_band`
4. **Transform POIs** (`pipeline/download/pois/transform.py`) — Drops unwanted categories, remaps to friendly names + emoji → `filtered_uk_pois.parquet`
tfl_journey_client/ Generated TFL API client (local package)
```
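Step 2 matches EPC certificates to price-paid records by address, comparing only records that share a postcode. The sketch below illustrates that idea with the standard library. It swaps `thefuzz.token_sort_ratio` for an approximation built on `difflib.SequenceMatcher` (sorted lowercase tokens, scored 0-100). The record fields (`postcode`, `address`), the threshold of 85, and the exact numeric-token rule are hypothetical, not the pipeline's actual values.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

NUMBER_RE = re.compile(r"\d+[a-z]?")  # house/flat numbers such as "12" or "12a"


def token_sort_ratio(a: str, b: str) -> int:
    """Stdlib stand-in for thefuzz.token_sort_ratio: compare sorted, lowercased tokens."""
    sa = " ".join(sorted(re.findall(r"\w+", a.lower())))
    sb = " ".join(sorted(re.findall(r"\w+", b.lower())))
    return round(100 * SequenceMatcher(None, sa, sb).ratio())


def numbers_compatible(a: str, b: str) -> bool:
    """Hypothetical rule: if both addresses contain numbers, the number sets must match."""
    na, nb = set(NUMBER_RE.findall(a.lower())), set(NUMBER_RE.findall(b.lower()))
    return not na or not nb or na == nb


def fuzzy_join(epc, price_paid, threshold=85):
    """Match each EPC record to the best-scoring price-paid record in its postcode bucket."""
    buckets = defaultdict(list)
    for pp in price_paid:
        buckets[pp["postcode"]].append(pp)
    matches = []
    for cert in epc:
        best, best_score = None, threshold - 1
        for pp in buckets.get(cert["postcode"], []):
            # Reject "12 High St" vs "14 High St" before any string scoring.
            if not numbers_compatible(cert["address"], pp["address"]):
                continue
            score = token_sort_ratio(cert["address"], pp["address"])
            if score > best_score:
                best, best_score = pp, score
        if best is not None:
            matches.append((cert, best, best_score))
    return matches
```

Bucketing by postcode bounds the number of comparisons by the bucket sizes rather than the full table sizes. Because buckets are independent, the work also parallelizes naturally.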
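Step 3 derives `price_per_sqm` and a numeric `construction_age_band`. The sketch below shows one way to make both derivations. It assumes floor area is given in square metres and that EPC age bands are labels like `England and Wales: 1930-1949`. A range maps to its midpoint, and an open-ended band such as `before 1900` maps to the single year it states. The real column names and mapping in `pipeline/run.py` may differ.

```python
from __future__ import annotations

import re

BAND_RE = re.compile(r"(\d{4})\s*-\s*(\d{4})")  # e.g. "1930-1949"
YEAR_RE = re.compile(r"(\d{4})")                 # e.g. "before 1900", "2012 onwards"


def price_per_sqm(price: float, floor_area_sqm: float | None) -> float | None:
    """Price divided by floor area; None when the area is missing or non-positive."""
    if not floor_area_sqm or floor_area_sqm <= 0:
        return None
    return price / floor_area_sqm


def construction_age_numeric(band: str | None) -> float | None:
    """Map an age-band label to a number: range midpoint, else the single year found."""
    if not band:
        return None
    if m := BAND_RE.search(band):
        return (int(m.group(1)) + int(m.group(2))) / 2
    if m := YEAR_RE.search(band):
        return float(m.group(1))
    return None  # placeholder labels without a year
```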
Shared utilities live in `pipeline/utils/` (haversine distance for both numpy and Polars expressions, fuzzy address matching).
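For reference, here is the haversine formula behind those distance helpers, written as a scalar stdlib sketch. The repository's versions operate on numpy arrays and Polars expressions instead. The 6371 km mean Earth radius is an assumed constant.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

One degree of latitude comes out at roughly 111.2 km, which is a quick sanity check for any vectorized variant.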
## Data Flow
### Backend (`server-rs/`)
1. **Download**: Land Registry prices + ArcGIS postcode→lat/lon mappings → `data_sources/`
2. **Pipeline**: Join data, compute H3 indices, aggregate stats → `data_sources/processed/aggregates/*.parquet`
3. **Serve**: Load parquet files into memory, filter by bounds/year/price, return as GeoJSON-like response
4. **Visualize**: Frontend fetches on viewport change, renders hexagons colored by average price
Rust + Axum. Loads `wide.parquet` and `filtered_uk_pois.parquet` into memory at startup with precomputed H3 indices (resolutions 7-11) and grid-based spatial indices (0.01° cells).
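You can picture the grid index as a dictionary from a quantized (lat, lon) cell to the row ids inside it, so a bounds query only visits cells that overlap the viewport. The server does this in Rust. The Python sketch below only illustrates the idea. The 0.01° cell size comes from the text. Applying the 5000-result cap as an early stop (`limit`) is an assumption.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # grid cell size in degrees


def cell_of(lat: float, lon: float) -> tuple[int, int]:
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


class GridIndex:
    """Bucket point ids by grid cell for fast bounding-box lookups."""

    def __init__(self, points: list[tuple[float, float]]):
        self.points = points
        self.cells: dict[tuple[int, int], list[int]] = defaultdict(list)
        for i, (lat, lon) in enumerate(points):
            self.cells[cell_of(lat, lon)].append(i)

    def query(self, south, west, north, east, limit=5000) -> list[int]:
        """Return ids of points inside the box, visiting only overlapping cells."""
        (r0, c0), (r1, c1) = cell_of(south, west), cell_of(north, east)
        out = []
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                for i in self.cells.get((r, c), ()):
                    lat, lon = self.points[i]
                    # Edge cells can hold points outside the box, so filter exactly.
                    if south <= lat <= north and west <= lon <= east:
                        out.append(i)
                        if len(out) >= limit:
                            return out
        return out
```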
## Tech Stack
**API endpoints:**
- `GET /api/features` — Numeric column metadata with histograms and percentiles
- `GET /api/hexagons` — H3 aggregates filtered by bounds, resolution, and feature min/max
- `GET /api/pois` — POIs by bounds with optional category filter (max 5000)
- `GET /api/poi-categories` — Available POI categories
- **Frontend**: React 18, TypeScript, Webpack, TailwindCSS, deck.gl, MapLibre GL
- **Backend**: Python 3.12, FastAPI, Polars, H3
- **Package managers**: `uv` (Python), `npm` (frontend)
Also serves `frontend/dist/` as static fallback.
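`GET /api/features` returns histograms and percentiles for each numeric column. The sketch below computes that kind of metadata with the standard library. The bin count, the chosen percentiles, and the field names are hypothetical and do not reflect the server's actual response schema.

```python
import statistics


def feature_metadata(name: str, values: list[float], bins: int = 10) -> dict:
    """Summarize one numeric column: range, equal-width histogram, selected percentiles."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    cuts = statistics.quantiles(values, n=100)  # 99 cut points: P1..P99
    return {
        "name": name,
        "min": lo,
        "max": hi,
        "histogram": {"bin_width": width, "counts": counts},
        "percentiles": {"p5": cuts[4], "p50": cuts[49], "p95": cuts[94]},
    }
```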
### Frontend (`frontend/`)
React 18 + TypeScript SPA. deck.gl `H3HexagonLayer` over MapLibre GL basemap. Debounces API calls (150ms) on viewport changes. TailwindCSS for styling.
## Key Implementation Details
- Backend caches dataframes in memory and uses LRU cache on queries
- Bounds rounded to 0.01° precision to improve cache hits
- Results capped at 50,000 hexagons per request (truncated flag in response)
- Frontend debounces API calls on map movement
- Bounds quantized to 0.01° to improve cache hits on both backend and frontend
- H3 hexagon results capped at 50,000 per request (truncated flag in response)
- POI proximity counting uses a spatial grid (0.05° cells, ~5km) to avoid O(n×m) distance checks
- Fuzzy address matching uses `thefuzz.token_sort_ratio` with numeric token compatibility checks, parallelized across postcode buckets
- The Rust server writes JSON via direct string buffer (avoids serde_json::Value allocations)
- POI transform validates exhaustive category coverage — pipeline fails if any OSM category is unmapped
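To make the bounds-quantization and result-cap bullets concrete, here is an illustrative sketch. Snapping each edge to 0.01° makes nearby viewports share a cache key, so `functools.lru_cache` can reuse results. The response is capped with a `truncated` flag. Python is for illustration only, since the current server is Rust. `load_hexagons`, the field names, and the outward rounding direction are all hypothetical.

```python
import math
from functools import lru_cache

STEP = 0.01  # quantization step in degrees
MAX_HEXAGONS = 50_000


def quantize_bounds(south, west, north, east) -> tuple[int, int, int, int]:
    """Snap bounds outwards to the 0.01° grid, as integer multiples of STEP."""
    return (
        math.floor(south / STEP),
        math.floor(west / STEP),
        math.ceil(north / STEP),
        math.ceil(east / STEP),
    )


def load_hexagons(south, west, north, east, resolution):
    """Hypothetical stand-in for the real aggregate lookup (pretends 60,000 cells match)."""
    return [f"hex-{resolution}-{i}" for i in range(60_000)]


@lru_cache(maxsize=1024)
def hexagons_for_key(key: tuple[int, int, int, int], resolution: int) -> dict:
    south, west, north, east = (k * STEP for k in key)
    rows = load_hexagons(south, west, north, east, resolution)
    return {"hexagons": rows[:MAX_HEXAGONS], "truncated": len(rows) > MAX_HEXAGONS}


def get_hexagons(south, west, north, east, resolution):
    # Nearby viewports quantize to the same key and therefore hit the cache.
    return hexagons_for_key(quantize_bounds(south, west, north, east), resolution)
```

Integer keys avoid float-equality surprises in the cache. Cached dicts are shared objects, so callers should treat them as read-only.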
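The POI proximity bullet works by bucketing POIs into 0.05° cells, so each property checks only its own cell and the eight neighbouring cells instead of every POI. Below is a plain-Python sketch of that approach. The 1 km radius is hypothetical, and the pipeline itself uses Polars/numpy. A 3x3 search is only exact while the radius stays below one cell width, which is about 2.8 km in longitude at 60°N.

```python
import math
from collections import defaultdict

GRID_DEG = 0.05  # ~5.5 km in latitude; narrower in longitude at UK latitudes
RADIUS_KM = 1.0  # hypothetical search radius; must stay below one cell width


def haversine_km(lat1, lon1, lat2, lon2):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def grid_cell(lat, lon):
    return (math.floor(lat / GRID_DEG), math.floor(lon / GRID_DEG))


def count_nearby(properties, pois, radius_km=RADIUS_KM):
    """For each (lat, lon) property, count POIs within radius_km via a 3x3 cell search."""
    grid = defaultdict(list)
    for lat, lon in pois:
        grid[grid_cell(lat, lon)].append((lat, lon))
    counts = []
    for lat, lon in properties:
        r, c = grid_cell(lat, lon)
        n = 0
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                for plat, plon in grid.get((r + dr, c + dc), ()):
                    if haversine_km(lat, lon, plat, plon) <= radius_km:
                        n += 1
        counts.append(n)
    return counts
```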
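The exhaustive-coverage check means every OSM category must be either dropped or remapped, and any other category is a hard error. The sketch below uses hypothetical category names, mapping, and exception type. The real mapping lives in `pipeline/download/pois/transform.py`.

```python
DROPPED = {"vending_machine", "bench"}  # hypothetical unwanted categories
REMAP = {                               # hypothetical friendly name + emoji
    "cafe": ("Café", "☕"),
    "pub": ("Pub", "🍺"),
    "supermarket": ("Supermarket", "🛒"),
}


class UnmappedCategoryError(ValueError):
    pass


def validate_coverage(categories):
    """Raise if any category is neither dropped nor remapped."""
    unmapped = sorted(set(categories) - DROPPED - set(REMAP))
    if unmapped:
        raise UnmappedCategoryError(f"unmapped OSM categories: {unmapped}")


def transform(pois):
    """Validate first, then drop unwanted categories and attach friendly name + emoji."""
    validate_coverage(p["category"] for p in pois)
    out = []
    for p in pois:
        if p["category"] in DROPPED:
            continue
        name, emoji = REMAP[p["category"]]
        out.append({**p, "category": name, "emoji": emoji})
    return out
```

Failing before any rows are written means a new upstream OSM category cannot silently disappear from the map.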
## Data Sources
- **Land Registry** — Price Paid bulk download
- **EPC** — Energy Performance Certificates (domestic)
- **ArcGIS** — Postcode → GPS/LSOA lookup
- **OpenStreetMap** — POIs from Geofabrik Great Britain PBF
- **IoD 2025** — English Indices of Deprivation (LSOA-level scores)
- **TFL API** — Journey time calculations to configurable destinations


@@ -1,6 +0,0 @@
def main():
    print("Hello from property-map!")


if __name__ == "__main__":
    main()


@@ -26,6 +26,7 @@ dependencies = [
"matplotlib>=3.10.8",
"thefuzz>=0.22.1",
"python-levenshtein>=0.27.3",
"scipy>=1.17.0",
]
[tool.uv]

uv.lock

@@ -1588,6 +1588,7 @@ dependencies = [
{ name = "pyarrow", marker = "python_full_version < '3.14' and sys_platform == 'linux'" },
{ name = "python-dateutil", marker = "python_full_version < '3.14' and sys_platform == 'linux'" },
{ name = "python-levenshtein", marker = "python_full_version < '3.14' and sys_platform == 'linux'" },
{ name = "scipy", marker = "python_full_version < '3.14' and sys_platform == 'linux'" },
{ name = "thefuzz", marker = "python_full_version < '3.14' and sys_platform == 'linux'" },
{ name = "tqdm", marker = "python_full_version < '3.14' and sys_platform == 'linux'" },
{ name = "uvicorn", marker = "python_full_version < '3.14' and sys_platform == 'linux'" },
@@ -1619,6 +1620,7 @@ requires-dist = [
{ name = "pyarrow", specifier = ">=15.0.0" },
{ name = "python-dateutil", specifier = ">=2.8.0" },
{ name = "python-levenshtein", specifier = ">=0.27.3" },
{ name = "scipy", specifier = ">=1.17.0" },
{ name = "thefuzz", specifier = ">=0.22.1" },
{ name = "tqdm", specifier = ">=4.67.1" },
{ name = "uvicorn", specifier = ">=0.34.0" },
@@ -2192,6 +2194,37 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/6a/5b/aaf1dfbcc53a2811f6cc0a1759de24e4b03e02ba8762daabd9b6bd8c59e3/ruff-0.14.14-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:16bc890fb4cc9781bb05beb5ab4cd51be9e7cb376bf1dd3580512b24eb3fda2b", size = 11315626, upload-time = "2026-01-22T22:30:36.848Z" },
]
[[package]]
name = "scipy"
version = "1.17.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "numpy", marker = "python_full_version < '3.14' and sys_platform == 'linux'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/56/3e/9cca699f3486ce6bc12ff46dc2031f1ec8eb9ccc9a320fdaf925f1417426/scipy-1.17.0.tar.gz", hash = "sha256:2591060c8e648d8b96439e111ac41fd8342fdeff1876be2e19dea3fe8930454e", size = 30396830, upload-time = "2026-01-10T21:34:23.009Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/4a/69/7c347e857224fcaf32a34a05183b9d8a7aca25f8f2d10b8a698b8388561a/scipy-1.17.0-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5194c445d0a1c7a6c1a4a4681b6b7c71baad98ff66d96b949097e7513c9d6742", size = 32724197, upload-time = "2026-01-10T21:25:44.084Z" },
{ url = "https://files.pythonhosted.org/packages/d1/fe/66d73b76d378ba8cc2fe605920c0c75092e3a65ae746e1e767d9d020a75a/scipy-1.17.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:9eeb9b5f5997f75507814ed9d298ab23f62cf79f5a3ef90031b1ee2506abdb5b", size = 35009148, upload-time = "2026-01-10T21:25:50.591Z" },
{ url = "https://files.pythonhosted.org/packages/af/07/07dec27d9dc41c18d8c43c69e9e413431d20c53a0339c388bcf72f353c4b/scipy-1.17.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:40052543f7bbe921df4408f46003d6f01c6af109b9e2c8a66dd1cf6cf57f7d5d", size = 34798766, upload-time = "2026-01-10T21:25:59.41Z" },
{ url = "https://files.pythonhosted.org/packages/81/61/0470810c8a093cdacd4ba7504b8a218fd49ca070d79eca23a615f5d9a0b0/scipy-1.17.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:0cf46c8013fec9d3694dc572f0b54100c28405d55d3e2cb15e2895b25057996e", size = 37405953, upload-time = "2026-01-10T21:26:07.75Z" },
{ url = "https://files.pythonhosted.org/packages/c9/10/be13397a0e434f98e0c79552b2b584ae5bb1c8b2be95db421533bbca5369/scipy-1.17.0-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:fe508b5690e9eaaa9467fc047f833af58f1152ae51a0d0aed67aa5801f4dd7d6", size = 32696338, upload-time = "2026-01-10T21:26:55.521Z" },
{ url = "https://files.pythonhosted.org/packages/63/1e/12fbf2a3bb240161651c94bb5cdd0eae5d4e8cc6eaeceb74ab07b12a753d/scipy-1.17.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:6680f2dfd4f6182e7d6db161344537da644d1cf85cf293f015c60a17ecf08752", size = 34977201, upload-time = "2026-01-10T21:27:03.501Z" },
{ url = "https://files.pythonhosted.org/packages/19/5b/1a63923e23ccd20bd32156d7dd708af5bbde410daa993aa2500c847ab2d2/scipy-1.17.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:eec3842ec9ac9de5917899b277428886042a93db0b227ebbe3a333b64ec7643d", size = 34777384, upload-time = "2026-01-10T21:27:11.423Z" },
{ url = "https://files.pythonhosted.org/packages/39/22/b5da95d74edcf81e540e467202a988c50fef41bd2011f46e05f72ba07df6/scipy-1.17.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:d7425fcafbc09a03731e1bc05581f5fad988e48c6a861f441b7ab729a49a55ea", size = 37379586, upload-time = "2026-01-10T21:27:20.171Z" },
{ url = "https://files.pythonhosted.org/packages/9b/9d/025cccdd738a72140efc582b1641d0dd4caf2e86c3fb127568dc80444e6e/scipy-1.17.0-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:130d12926ae34399d157de777472bf82e9061c60cc081372b3118edacafe1d00", size = 32815098, upload-time = "2026-01-10T21:27:54.389Z" },
{ url = "https://files.pythonhosted.org/packages/48/5f/09b879619f8bca15ce392bfc1894bd9c54377e01d1b3f2f3b595a1b4d945/scipy-1.17.0-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:6e886000eb4919eae3a44f035e63f0fd8b651234117e8f6f29bad1cd26e7bc45", size = 35031342, upload-time = "2026-01-10T21:28:03.012Z" },
{ url = "https://files.pythonhosted.org/packages/f2/9a/f0f0a9f0aa079d2f106555b984ff0fbb11a837df280f04f71f056ea9c6e4/scipy-1.17.0-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:13c4096ac6bc31d706018f06a49abe0485f96499deb82066b94d19b02f664209", size = 34893199, upload-time = "2026-01-10T21:28:10.832Z" },
{ url = "https://files.pythonhosted.org/packages/90/b8/4f0f5cf0c5ea4d7548424e6533e6b17d164f34a6e2fb2e43ffebb6697b06/scipy-1.17.0-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:cacbaddd91fcffde703934897c5cd2c7cb0371fac195d383f4e1f1c5d3f3bd04", size = 37438061, upload-time = "2026-01-10T21:28:19.684Z" },
{ url = "https://files.pythonhosted.org/packages/80/5c/ea5d239cda2dd3d31399424967a24d556cf409fbea7b5b21412b0fd0a44f/scipy-1.17.0-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:a38c3337e00be6fd8a95b4ed66b5d988bac4ec888fd922c2ea9fe5fb1603dd67", size = 32757834, upload-time = "2026-01-10T21:29:23.406Z" },
{ url = "https://files.pythonhosted.org/packages/b8/7e/8c917cc573310e5dc91cbeead76f1b600d3fb17cf0969db02c9cf92e3cfa/scipy-1.17.0-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:00fb5f8ec8398ad90215008d8b6009c9db9fa924fd4c7d6be307c6f945f9cd73", size = 34995775, upload-time = "2026-01-10T21:29:31.915Z" },
{ url = "https://files.pythonhosted.org/packages/c5/43/176c0c3c07b3f7df324e7cdd933d3e2c4898ca202b090bd5ba122f9fe270/scipy-1.17.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:f2a4942b0f5f7c23c7cd641a0ca1955e2ae83dedcff537e3a0259096635e186b", size = 34841240, upload-time = "2026-01-10T21:29:39.995Z" },
{ url = "https://files.pythonhosted.org/packages/44/8c/d1f5f4b491160592e7f084d997de53a8e896a3ac01cd07e59f43ca222744/scipy-1.17.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:dbf133ced83889583156566d2bdf7a07ff89228fe0c0cb727f777de92092ec6b", size = 37394463, upload-time = "2026-01-10T21:29:48.723Z" },
{ url = "https://files.pythonhosted.org/packages/7c/74/3498563a2c619e8a3ebb4d75457486c249b19b5b04a30600dfd9af06bea5/scipy-1.17.0-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5fb10d17e649e1446410895639f3385fd2bf4c3c7dfc9bea937bddcbc3d7b9ba", size = 32829770, upload-time = "2026-01-10T21:30:16.359Z" },
{ url = "https://files.pythonhosted.org/packages/48/d1/7b50cedd8c6c9d6f706b4b36fa8544d829c712a75e370f763b318e9638c1/scipy-1.17.0-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:8547e7c57f932e7354a2319fab613981cde910631979f74c9b542bb167a8b9db", size = 35051093, upload-time = "2026-01-10T21:30:22.987Z" },
{ url = "https://files.pythonhosted.org/packages/e2/82/a2d684dfddb87ba1b3ea325df7c3293496ee9accb3a19abe9429bce94755/scipy-1.17.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:33af70d040e8af9d5e7a38b5ed3b772adddd281e3062ff23fec49e49681c38cf", size = 34909905, upload-time = "2026-01-10T21:30:28.704Z" },
{ url = "https://files.pythonhosted.org/packages/ef/5e/e565bd73991d42023eb82bb99e51c5b3d9e2c588ca9d4b3e2cc1d3ca62a6/scipy-1.17.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:f9eb55bb97d00f8b7ab95cb64f873eb0bf54d9446264d9f3609130381233483f", size = 37457743, upload-time = "2026-01-10T21:30:34.819Z" },
]
[[package]]
name = "send2trash"
version = "2.1.0"