CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
Property Map is a full-stack geospatial application for visualizing UK property data on an interactive map. It combines Land Registry price-paid data, EPC energy certificates, postcode geolocation, TfL journey times, Index of Deprivation scores, crime statistics, ethnicity data, broadband speeds, school ratings, road noise, and OpenStreetMap POIs into a single wide parquet file, then serves aggregated H3 hexagon statistics and POI data via a Rust backend.
Commands
All commands use the Task runner (`task`). Python commands run through `uv run`; frontend commands use `npm run` from `frontend/`.
```bash
# Development servers
task dev:server       # Rust backend on :8001 (cargo run --release)
task dev:frontend     # Webpack dev server on :3030 (proxies /api to :8001)

# Data pipeline
task prepare          # Build wide.parquet from all pre-downloaded sources

# Quality
task lint             # Lint all: Python (ruff) + TypeScript (ESLint + Prettier) + Rust (clippy + fmt)
task format           # Auto-fix formatting for all languages
task test             # Python tests (fuzzy join, haversine, POI counts)
task check            # Full validation: lint + build + test

# Building
task build:frontend   # TypeScript typecheck + webpack production build
task build:server     # cargo build --release (NOTE: the Taskfile's dir is wrong; run it from server-rs/ instead)

# Granular lint/format
task lint:python      # uv run ruff check .
task lint:frontend    # eslint + prettier --check
task lint:rust        # cargo clippy -- -D warnings && cargo fmt --check
task format:python    # ruff check --fix && ruff format
task format:frontend  # eslint --fix + prettier --write
task format:rust      # cargo fmt --all
```
Running individual tests:

```bash
uv run pytest pipeline/utils/test_haversine.py                  # Single test file
uv run pytest pipeline/utils/test_haversine.py -k "test_name"   # Single test
```
Architecture
Data Flow
```text
Raw sources → [Download scripts] → data/*.parquet
  → [Fuzzy join EPC ↔ Price-Paid] → epc_pp.parquet
  → [Merge all datasets] → wide.parquet
  → [Rust server loads into memory + precomputes H3 + spatial grid]
  → [Frontend renders deck.gl H3HexagonLayer over MapLibre GL]
```
Data Pipeline (pipeline/)
Python + Polars. Two phases:
- Download (`pipeline/download/`): each script fetches one raw dataset into `data/`.
- Transform (`pipeline/transform/`): joins and derives features:
  - `join_epc_pp.py`: fuzzy-joins EPC ↔ price-paid by address within postcode buckets
  - `merge.py`: main pipeline; joins all datasets into `wide.parquet` with human-readable column names
  - `transform_poi.py`: filters POIs and maps them to friendly names + emoji (exhaustive category validation)
  - `poi_proximity.py`: counts POIs within 2 km of each postcode using a 0.05° spatial grid
  - `crime.py`: aggregates crime CSVs into yearly averages by LSOA
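The fuzzy join in `join_epc_pp.py` can be sketched roughly as follows for a single postcode bucket. This is an illustration, not the pipeline's code: it approximates `thefuzz`'s `token_sort_ratio` with the standard library's `difflib.SequenceMatcher` (so scores can differ slightly), and the helper names, the exact-match rule for numeric tokens, and the score threshold of 80 are all hypothetical.

```python
import re
from difflib import SequenceMatcher
from itertools import product


def token_sort_ratio(a: str, b: str) -> int:
    """Approximation of thefuzz's token_sort_ratio: normalise, sort tokens, compare (0-100)."""
    def norm(s: str) -> str:
        return " ".join(sorted(re.findall(r"[a-z0-9]+", s.lower())))
    return round(100 * SequenceMatcher(None, norm(a), norm(b)).ratio())


def numbers_compatible(a: str, b: str) -> bool:
    """Hypothetical rule: house/flat numbers must match exactly (order-insensitive)."""
    return sorted(re.findall(r"\d+", a)) == sorted(re.findall(r"\d+", b))


def greedy_match(epc: list[str], pp: list[str], threshold: int = 80) -> list[tuple[int, int, int]]:
    """Match addresses within ONE postcode bucket; returns (epc_idx, pp_idx, score)."""
    # Score every numerically compatible pair in the bucket.
    candidates = [
        (token_sort_ratio(e, p), i, j)
        for (i, e), (j, p) in product(enumerate(epc), enumerate(pp))
        if numbers_compatible(e, p)
    ]
    candidates.sort(reverse=True)  # highest score first
    used_epc, used_pp, matches = set(), set(), []
    for score, i, j in candidates:
        if score < threshold:
            break  # everything after this is weaker
        if i in used_epc or j in used_pp:
            continue  # each record is matched at most once
        used_epc.add(i)
        used_pp.add(j)
        matches.append((i, j, score))
    return matches
```

Consuming candidates greedily in descending score order means the strongest pairs are claimed first and each EPC record and sale is used at most once.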
Critical: column renaming in `merge.py`. The pipeline renames columns from snake_case to human-readable names before writing `wide.parquet`, and the Rust server auto-discovers features from whatever column names exist in the parquet. Key renames:

- `pp_address` → `Address per Property Register`
- `postcode` → `Postcode`
- `latest_price` → `Last known price`
- `duration` → `Leashold/Freehold`
- `total_floor_area` → `Total floor area (sqm)`
- `current_energy_rating` → `Current energy rating`
The server and frontend must handle these human-readable names. See the full rename map in merge.py.
Backend (server-rs/)
Rust + Axum. Loads parquet into memory at startup.
Structure:
- `data/property.rs`: loads `wide.parquet`, auto-discovers numeric + enum features, computes histograms, sorts rows by spatial locality, and precomputes H3 cells (resolutions 4–12)
- `data/poi.rs`: loads `filtered_uk_pois.parquet`
- `index.rs`: `GridIndex`, a 0.01° spatial grid for O(1) cell lookup
- `filter.rs`: parses filter strings and checks rows. Format: `name:min:max` (numeric), `name:val1|val2` (enum)
- `routes/`: one file per endpoint
- `consts.rs`: key constants (histogram bins, H3 range, max enum cardinality, excluded columns)
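A minimal sketch of the filter grammar handled by `filter.rs`, written in Python for illustration. It assumes filters are comma-separated in the `filters=` parameter, that feature names and enum values contain no `:`, `,` or `|`, that a filter with three `:`-separated parts is numeric with inclusive bounds, and that a null value fails any filter on its feature. The class and function names are hypothetical, and rows are shown as dicts rather than the server's flat arrays.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class NumericFilter:
    name: str
    lo: float
    hi: float


@dataclass
class EnumFilter:
    name: str
    values: set[str]


def parse_filters(raw: str) -> list[NumericFilter | EnumFilter]:
    """Parse 'name:min:max' (numeric) and 'name:v1|v2' (enum), comma-separated."""
    filters: list[NumericFilter | EnumFilter] = []
    for part in raw.split(","):
        if not part:
            continue
        pieces = part.split(":")
        if len(pieces) == 3:
            name, lo, hi = pieces
            filters.append(NumericFilter(name, float(lo), float(hi)))
        elif len(pieces) == 2:
            name, vals = pieces
            filters.append(EnumFilter(name, set(vals.split("|"))))
        else:
            raise ValueError(f"malformed filter: {part!r}")
    return filters


def row_matches(row: dict, filters: list[NumericFilter | EnumFilter]) -> bool:
    """True if the row passes every filter; nulls (None/NaN) fail (assumption)."""
    for f in filters:
        v = row.get(f.name)
        if isinstance(f, NumericFilter):
            if v is None or math.isnan(v) or not (f.lo <= v <= f.hi):
                return False
        elif v not in f.values:
            return False
    return True
```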
API endpoints:
- `GET /api/features`: feature metadata with histograms and 2nd/98th percentiles
- `GET /api/hexagons?resolution=&bounds=&filters=`: H3 aggregates (min/max per feature per hex)
- `GET /api/hexagon-properties?h3=&resolution=&filters=&limit=&offset=`: paginated properties within a hexagon
- `GET /api/pois?bounds=&categories=`: POIs by bounds (max 5000)
- `GET /api/poi-categories`: available POI category names
Serves frontend/dist/ as static fallback in production.
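The 2nd/98th percentiles returned by `/api/features` are computed from a histogram rather than by sorting. The sketch below shows one plausible reading of the two-pass approach: pass 1 finds the range and non-null count, pass 2 fills fixed-width bins, and each percentile is then located by walking cumulative bin counts and interpolating linearly inside the target bin. The bin count of 1000 and the function name are assumptions (the real bin count lives in `consts.rs`), and `values` must be a list because it is iterated twice.

```python
import math


def histogram_percentiles(values: list[float], pcts=(0.02, 0.98), bins: int = 1000) -> list[float]:
    """Estimate percentiles in O(n) without sorting; NaN (null) values are skipped."""
    # Pass 1: range and count of non-null values.
    lo, hi, n = math.inf, -math.inf, 0
    for v in values:
        if not math.isnan(v):
            lo, hi, n = min(lo, v), max(hi, v), n + 1
    if n == 0:
        return [math.nan for _ in pcts]
    if lo == hi:
        return [lo for _ in pcts]
    width = (hi - lo) / bins
    # Pass 2: fixed-width bin counts (the max value is clamped into the last bin).
    counts = [0] * bins
    for v in values:
        if not math.isnan(v):
            counts[min(int((v - lo) / width), bins - 1)] += 1
    # Walk cumulative counts (O(bins) per percentile) and interpolate within the bin.
    out = []
    for p in pcts:
        target = p * n
        cum = 0
        for i, c in enumerate(counts):
            if c and cum + c >= target:
                out.append(lo + (i + (target - cum) / c) * width)
                break
            cum += c
        else:
            out.append(hi)  # safety net; not reached for 0 <= p <= 1
    return out
```

The estimate is accurate to roughly one bin width, which is ample for slider ranges while avoiding an O(n log n) sort per feature.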
Data representation:
- Numeric features: row-major flat `Vec<f64>`, NaN = null
- Enum features: `Vec<u8>` indices into the value list, 255 = null
- String fields (address, postcode): `Vec<String>`, empty = null
- The server accepts the parquet path as a CLI argument (defaults to `data_sources/processed/wide.parquet`)
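The representation above can be illustrated with a small Python sketch: numeric values go into one flat, contiguous `array('d')` indexed as `row * num_features + feat_idx`, with NaN standing in for null, and each enum column becomes a list of small integer indices with 255 reserved for null. The class and helper names, and sorting the enum value list alphabetically, are hypothetical choices for illustration only.

```python
from __future__ import annotations

import math
from array import array

ENUM_NULL = 255  # reserved u8 index meaning "no value"


class FeatureTable:
    """Row-major numeric storage: feature_data[row * num_features + feat_idx]."""

    def __init__(self, num_features: int):
        self.num_features = num_features
        self.feature_data = array("d")  # contiguous float64 buffer

    def push_row(self, values: list[float | None]) -> None:
        assert len(values) == self.num_features
        self.feature_data.extend(math.nan if v is None else v for v in values)

    def get(self, row: int, feat_idx: int) -> float | None:
        v = self.feature_data[row * self.num_features + feat_idx]
        return None if math.isnan(v) else v

    def row(self, row: int) -> list[float]:
        # All features of one property sit next to each other in memory.
        start = row * self.num_features
        return self.feature_data[start : start + self.num_features].tolist()


def encode_enum(column: list[str | None]) -> tuple[list[str], list[int]]:
    """Map strings to indices into a value list; None becomes ENUM_NULL."""
    values = sorted({v for v in column if v is not None})
    if len(values) > ENUM_NULL:  # indices 0..254 usable, 255 reserved for null
        raise ValueError("too many distinct values for u8 indices")
    index = {v: i for i, v in enumerate(values)}
    return values, [ENUM_NULL if v is None else index[v] for v in column]
```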
Frontend (frontend/)
React 18 + TypeScript. deck.gl H3HexagonLayer over MapLibre GL. TailwindCSS. No state management library — pure React hooks.
Key patterns:
- `App.tsx` manages all state, API fetching (150 ms debounce), and URL state sync (300 ms debounce)
- The URL encodes view, filters, POI categories, and active tab as query params for shareable links
- AbortControllers cancel in-flight requests when a new query starts
- Zoom → H3 resolution: <7 → 7, <9.5 → 8, <11 → 9, <13 → 10, ≥13 → 11
- Bounds are quantized to 0.01° to match backend caching
- The properties pane uses feature names from the API response (human-readable), not hardcoded field names
- Proxy: the dev server on :3030 proxies `/api` to :8001; it also handles VS Code `/proxy/PORT` patterns
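The zoom thresholds and bounds quantization above can be sketched as follows (in Python, although the real code is TypeScript in `App.tsx`). The thresholds are copied from the list; snapping outward (floor for south/west, ceil for north/east) and the two-decimal string format are assumptions about how the 0.01° quantization is done. The output uses the backend's `south,west,north,east` order.

```python
import math


def zoom_to_resolution(zoom: float) -> int:
    """Zoom -> H3 resolution, using the thresholds listed above."""
    if zoom < 7:
        return 7
    if zoom < 9.5:
        return 8
    if zoom < 11:
        return 9
    if zoom < 13:
        return 10
    return 11


def _snap(x: float, step: float, up: bool) -> float:
    q = round(x / step, 9)  # absorb float noise such as 5150.000000000001
    return (math.ceil(q) if up else math.floor(q)) * step


def quantize_bounds(south: float, west: float, north: float, east: float, step: float = 0.01) -> str:
    """Snap outward to the grid and format as 'south,west,north,east' (assumed format)."""
    snapped = (
        _snap(south, step, up=False),
        _snap(west, step, up=False),
        _snap(north, step, up=True),
        _snap(east, step, up=True),
    )
    return ",".join(f"{x:.2f}" for x in snapped)
```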
Key Implementation Details
- Spatial sort: rows are sorted by 0.01° grid cell at load time for cache-friendly sequential access
- Row-major layout: `feature_data[row * num_features + feat_idx]`, so all features for one property are contiguous
- H3 precomputation: resolutions 4–12 are computed in parallel (rayon) at startup
- Histogram percentiles without sorting: O(n) two-pass algorithm; build a histogram, then interpolate percentiles from it
- Direct JSON writing: the hexagon endpoint writes JSON into a string buffer, avoiding `serde_json::Value` allocations
- POI transform validation: fails if any OSM category is unmapped, which guarantees exhaustive coverage
- Fuzzy join: groups by postcode and scores addresses with `thefuzz`'s `fuzz.token_sort_ratio` plus a numeric-token compatibility check, then assigns matches greedily from the highest score down
- Filter bounds format: `south,west,north,east` (not the west,south,east,north order of a standard GeoJSON bbox)
- POI proximity: uses a 0.05° grid (cells roughly 5.6 km north–south by 3.4 km east–west at 52°N) to reduce candidates before the haversine distance check
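The POI proximity step can be sketched with the standard library: POIs are bucketed into 0.05° cells, and each postcode runs the haversine check only against POIs in its own cell and the eight adjacent cells. A 3×3 neighbourhood suffices for a 2 km radius because a 0.05° cell is wider than 2 km in both directions across the UK (about 2.7 km east–west even at 61°N). The function names, input shapes, and the 6371 km mean Earth radius are assumptions, not taken from `poi_proximity.py`.

```python
import math
from collections import defaultdict

CELL_DEG = 0.05          # grid cell size in degrees
RADIUS_KM = 2.0          # proximity radius
EARTH_RADIUS_KM = 6371.0 # mean Earth radius (assumed)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cell_of(lat: float, lon: float) -> tuple[int, int]:
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def count_pois_nearby(postcodes: dict, pois: list) -> dict:
    """postcodes: {code: (lat, lon)}; pois: [(lat, lon)] -> {code: POIs within 2 km}."""
    grid = defaultdict(list)
    for lat, lon in pois:
        grid[cell_of(lat, lon)].append((lat, lon))
    counts = {}
    for code, (lat, lon) in postcodes.items():
        ci, cj = cell_of(lat, lon)
        n = 0
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                # .get() avoids creating empty cells in the defaultdict
                for plat, plon in grid.get((ci + di, cj + dj), ()):
                    if haversine_km(lat, lon, plat, plon) <= RADIUS_KM:
                        n += 1
        counts[code] = n
    return counts
```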