# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Property Map is a full-stack geospatial application for visualizing UK property data on an interactive map. A data pipeline combines Land Registry price-paid data, EPC energy certificates, postcode geolocation, TFL journey times, Indices of Deprivation scores, and OpenStreetMap POIs into a single wide parquet file. A Rust backend then serves aggregated H3 hexagon statistics and POI data from it.

## Commands

All commands use the [Task](https://taskfile.dev) runner.

```bash
task prepare     # Full setup: install deps, download data (~GB), run pipeline
task server      # Rust backend on :8001 (cargo run --release)
task frontend    # Webpack dev server on :3030 (proxies /api to :8001)

task lint        # Lint Python (ruff) + TypeScript (ESLint + Prettier)
task format      # Auto-fix formatting (ruff + ESLint + Prettier)
task typecheck   # TypeScript type checking
task check       # All checks (lint + typecheck + build)
task test        # Run Python tests (fuzzy join)
task build       # Build frontend for production
```

Python commands use `uv run`. Frontend commands use `npm run` from `frontend/`.

## Architecture

### Data Pipeline (`pipeline/`)

Python + Polars. Orchestrated by `pipeline/run.py`, which builds `data_sources/processed/wide.parquet`:

1. **Download** (`pipeline/download/`): fetches raw data into `data_sources/`
   - `arcgis.py`: postcode → lat/lon/LSOA mappings
   - `price_paid.py`: Land Registry price-paid records
   - `pois/`: OpenStreetMap POIs via osmium (PBF parsing)
   - `deprivation_data.py`: English Indices of Deprivation 2025
2. **Join** (`pipeline/epc_pp.py`): fuzzy-joins EPC certificates with price-paid records by address within postcode buckets → `epc_pp.parquet`
3. **Widen** (`pipeline/run.py`): joins `epc_pp` with GPS coords, journey times, IoD scores, and POI proximity counts; derives `price_per_sqm` and a numeric `construction_age_band`
4. **Transform POIs** (`pipeline/download/pois/transform.py`): drops unwanted categories, remaps to friendly names + emoji → `filtered_uk_pois.parquet`

Shared utilities live in `pipeline/utils/` (haversine distance for both numpy and Polars expressions, fuzzy address matching).

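As a rough illustration of the join step, the sketch below matches addresses inside a single postcode bucket. It is not the code in `pipeline/epc_pp.py`: it uses the standard library's `difflib` as a stand-in for `thefuzz.token_sort_ratio`, and the function names, the 85-point threshold, and the rule that numeric tokens must be identical are all assumptions made for the example.

```python
import re
from difflib import SequenceMatcher


def tokens(address: str) -> list[str]:
    """Lower-case an address and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", address.lower())


def numeric_tokens(address: str) -> set[str]:
    """Tokens containing a digit, e.g. house or flat numbers ("12", "4a")."""
    return {tok for tok in tokens(address) if any(ch.isdigit() for ch in tok)}


def token_sort_score(a: str, b: str) -> float:
    """Stand-in for token_sort_ratio: compare alphabetically sorted tokens (0-100)."""
    norm_a = " ".join(sorted(tokens(a)))
    norm_b = " ".join(sorted(tokens(b)))
    return 100.0 * SequenceMatcher(None, norm_a, norm_b).ratio()


def match_bucket(epc_addresses, pp_addresses, threshold=85.0):
    """Match addresses that share one postcode; return (epc, price_paid) pairs.

    Assumption: two addresses are only compared when their numeric tokens
    are identical, so "12 High Street" never matches "14 High Street".
    """
    pairs = []
    for epc in epc_addresses:
        best, best_score = None, threshold
        for pp in pp_addresses:
            if numeric_tokens(epc) != numeric_tokens(pp):
                continue  # incompatible house/flat numbers
            score = token_sort_score(epc, pp)
            if score >= best_score:
                best, best_score = pp, score
        if best is not None:
            pairs.append((epc, best))
    return pairs
```

Because each bucket is matched independently, a function shaped like `match_bucket` could be mapped over postcode buckets in parallel.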
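For reference, the haversine distance mentioned above can be sketched for a single pair of points using only the standard library. This scalar version is illustrative; the repository's utilities work on numpy arrays and Polars expressions, and the function name and radius constant here are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant for this sketch)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to 1.0 to guard against rounding error for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

One degree of latitude comes out at roughly 111.2 km, which is a quick sanity check for any vectorized version.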
### Backend (`server-rs/`)

Rust + Axum. Loads `wide.parquet` and `filtered_uk_pois.parquet` into memory at startup with precomputed H3 indices (resolutions 7–11) and grid-based spatial indices (0.01° cells).

**API endpoints:**

- `GET /api/features`: numeric column metadata with histograms and percentiles
- `GET /api/hexagons`: H3 aggregates filtered by bounds, resolution, and feature min/max
- `GET /api/pois`: POIs by bounds with optional category filter (max 5000)
- `GET /api/poi-categories`: available POI categories

The server also serves `frontend/dist/` as a static fallback.

### Frontend (`frontend/`)

React 18 + TypeScript SPA. deck.gl `H3HexagonLayer` over a MapLibre GL basemap. Debounces API calls (150 ms) on viewport changes. TailwindCSS for styling.

## Key Implementation Details

- Bounds are quantized to 0.01° to improve cache hits on both backend and frontend
- H3 hexagon results are capped at 50,000 per request (the response includes a truncated flag)
- POI proximity counting uses a spatial grid (0.05° cells, ~5 km) to avoid O(n×m) distance checks
- Fuzzy address matching uses `thefuzz.token_sort_ratio` with numeric token compatibility checks, parallelized across postcode buckets
- The Rust server writes JSON directly into a string buffer (avoids `serde_json::Value` allocations)
- POI transform validates exhaustive category coverage: the pipeline fails if any OSM category is unmapped

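To show why quantizing bounds helps caching, here is a minimal Python sketch. It assumes the bounding box is snapped outward to the 0.01° grid so that the quantized box always covers the requested viewport; the function name and the (west, south, east, north) argument order are hypothetical, and the actual Rust and TypeScript code may round differently.

```python
import math

STEPS_PER_DEGREE = 100  # 0.01° grid


def quantize_bounds(west: float, south: float, east: float, north: float):
    """Snap a bounding box outward to the 0.01° grid.

    Minimums are floored and maximums are ceiled, so nearby viewports
    collapse to the same box and can share one cache entry.
    """
    return (
        math.floor(west * STEPS_PER_DEGREE) / STEPS_PER_DEGREE,
        math.floor(south * STEPS_PER_DEGREE) / STEPS_PER_DEGREE,
        math.ceil(east * STEPS_PER_DEGREE) / STEPS_PER_DEGREE,
        math.ceil(north * STEPS_PER_DEGREE) / STEPS_PER_DEGREE,
    )


# Two slightly different viewports over central London map to the same key.
first = quantize_bounds(-0.1278, 51.5074, -0.1182, 51.5121)
second = quantize_bounds(-0.1251, 51.5033, -0.1149, 51.5188)
```

Here both `first` and `second` evaluate to `(-0.13, 51.5, -0.11, 51.52)`, so a small pan produces the same request and the same cached response.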
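The grid-based proximity count can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline's code: POIs are bucketed into 0.05° cells, and each property only inspects the cells that could contain a POI within the radius. All names (`build_grid`, `count_within`, and so on) are hypothetical.

```python
import math
from collections import defaultdict

CELL_DEG = 0.05                      # grid cell size in degrees
EARTH_RADIUS_KM = 6371.0088          # mean Earth radius (assumed)
KM_PER_DEG = EARTH_RADIUS_KM * math.pi / 180.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def cell_of(lat, lon):
    """Integer (row, col) index of the grid cell containing a point."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def build_grid(pois):
    """Bucket (lat, lon) POIs by grid cell."""
    grid = defaultdict(list)
    for lat, lon in pois:
        grid[cell_of(lat, lon)].append((lat, lon))
    return grid


def count_within(grid, lat, lon, radius_km):
    """Count POIs within radius_km of a point, scanning only nearby cells."""
    lat_deg = radius_km / KM_PER_DEG
    # Longitude degrees shrink with latitude; use the poleward edge to stay conservative.
    lon_deg = lat_deg / math.cos(math.radians(min(abs(lat) + lat_deg, 89.0)))
    lat_reach = math.ceil(lat_deg / CELL_DEG)
    lon_reach = math.ceil(lon_deg / CELL_DEG)
    row, col = cell_of(lat, lon)
    count = 0
    for r in range(row - lat_reach, row + lat_reach + 1):
        for c in range(col - lon_reach, col + lon_reach + 1):
            for plat, plon in grid.get((r, c), ()):
                if haversine_km(lat, lon, plat, plon) <= radius_km:
                    count += 1
    return count


pois = [(51.5005, -0.1005), (51.5080, -0.1000), (51.6000, -0.1000)]
grid = build_grid(pois)
nearby = count_within(grid, 51.5, -0.1, radius_km=1.0)
```

With these three sample POIs, `nearby` is 2: the third POI is about 11 km away and its cell is never visited for a 1 km radius. The exact distance check still runs on every candidate, so the grid only prunes work and does not change the result.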
## Data Sources

- **Land Registry**: Price Paid bulk download
- **EPC**: Energy Performance Certificates (domestic)
- **ArcGIS**: postcode → GPS/LSOA lookup
- **OpenStreetMap**: POIs from Geofabrik Great Britain PBF
- **IoD 2025**: English Indices of Deprivation (LSOA-level scores)
- **TFL API**: journey time calculations to configurable destinations