# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Property Map is a full-stack geospatial application for visualizing UK property data on an interactive map. It combines Land Registry price-paid data, EPC energy certificates, postcode geolocation, TFL journey times, Index of Deprivation scores, and OpenStreetMap POIs into a single wide parquet file, then serves aggregated H3 hexagon statistics and POI data via a Rust backend.

## Commands

All commands use [Task](https://taskfile.dev) runner.

```bash
task prepare     # Full setup: install deps, download data (~GB), run pipeline
task server      # Rust backend on :8001 (cargo run --release)
task frontend    # Webpack dev server on :3030 (proxies /api to :8001)
task lint        # Lint Python (ruff) + TypeScript (ESLint + Prettier)
task format      # Auto-fix formatting (ruff + ESLint + Prettier)
task typecheck   # TypeScript type checking
task check       # All checks (lint + typecheck + build)
task test        # Run Python tests (fuzzy join)
task build       # Build frontend for production
```

Python commands use `uv run`. Frontend commands use `npm run` from `frontend/`.

## Architecture

### Data Pipeline (`pipeline/`)

Python + Polars. Orchestrated by `pipeline/run.py` which builds `data_sources/processed/wide.parquet`:

1. **Download** (`pipeline/download/`) — Fetches raw data into `data_sources/`:
   - `arcgis.py` — Postcode → lat/lon/LSOA mappings
   - `price_paid.py` — Land Registry price-paid records
   - `pois/` — OpenStreetMap POIs via osmium (PBF parsing)
   - `deprivation_data.py` — English Indices of Deprivation 2025
2. **Join** (`pipeline/epc_pp.py`) — Fuzzy-joins EPC certificates with price-paid by address within postcode buckets → `epc_pp.parquet`
3. **Widen** (`pipeline/run.py`) — Joins epc_pp with GPS coords, journey times, IoD scores, POI proximity counts, derives `price_per_sqm` and numeric `construction_age_band`
4. **Transform POIs** (`pipeline/download/pois/transform.py`) — Drops unwanted categories, remaps to friendly names + emoji → `filtered_uk_pois.parquet`

Shared utilities live in `pipeline/utils/` (haversine distance for both numpy and Polars expressions, fuzzy address matching).

### Backend (`server-rs/`)

Rust + Axum. Loads `wide.parquet` and `filtered_uk_pois.parquet` into memory at startup with precomputed H3 indices (resolutions 7–11) and grid-based spatial indices (0.01° cells).

**API endpoints:**

- `GET /api/features` — Numeric column metadata with histograms and percentiles
- `GET /api/hexagons` — H3 aggregates filtered by bounds, resolution, and feature min/max
- `GET /api/pois` — POIs by bounds with optional category filter (max 5000)
- `GET /api/poi-categories` — Available POI categories

Also serves `frontend/dist/` as static fallback.

### Frontend (`frontend/`)

React 18 + TypeScript SPA. deck.gl `H3HexagonLayer` over MapLibre GL basemap. Debounces API calls (150ms) on viewport changes. TailwindCSS for styling.

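To make the backend's grid-based spatial index (0.01° cells) easier to picture, here is a minimal Python sketch of the idea. It is an illustration only and not the Rust implementation in `server-rs/`: the function names (`cell_of`, `build_grid`, `query_bounds`) and the tuple layout are hypothetical, and the `limit` default simply mirrors the 5000 cap listed for `GET /api/pois`.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # cell size taken from the backend description


def cell_of(lat, lon):
    """Map a coordinate to its integer (row, col) grid cell."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def build_grid(points):
    """Bucket (lat, lon, payload) tuples by grid cell."""
    grid = defaultdict(list)
    for lat, lon, payload in points:
        grid[cell_of(lat, lon)].append((lat, lon, payload))
    return grid


def query_bounds(grid, south, west, north, east, limit=5000):
    """Return up to `limit` payloads inside the bounding box.

    Only cells overlapping the box are visited. Each candidate is then
    checked exactly, because edge cells extend beyond the box.
    """
    row_min, col_min = cell_of(south, west)
    row_max, col_max = cell_of(north, east)
    hits = []
    for row in range(row_min, row_max + 1):
        for col in range(col_min, col_max + 1):
            for lat, lon, payload in grid.get((row, col), ()):
                if south <= lat <= north and west <= lon <= east:
                    hits.append(payload)
                    if len(hits) >= limit:
                        return hits
    return hits
```

A lookup then touches only the cells under the viewport rather than every loaded point, for example `query_bounds(build_grid(points), 51.50, -0.13, 51.51, -0.12)`.
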
## Key Implementation Details

- Bounds quantized to 0.01° to improve cache hits on both backend and frontend
- H3 hexagon results capped at 50,000 per request (truncated flag in response)
- POI proximity counting uses a spatial grid (0.05° cells, ~5km) to avoid O(n×m) distance checks
- Fuzzy address matching uses `thefuzz.token_sort_ratio` with numeric token compatibility checks, parallelized across postcode buckets
- The Rust server writes JSON via direct string buffer (avoids serde_json::Value allocations)
- POI transform validates exhaustive category coverage — pipeline fails if any OSM category is unmapped

## Data Sources

- **Land Registry** — Price Paid bulk download
- **EPC** — Energy Performance Certificates (domestic)
- **ArcGIS** — Postcode → GPS/LSOA lookup
- **OpenStreetMap** — POIs from Geofabrik Great Britain PBF
- **IoD 2025** — English Indices of Deprivation (LSOA-level scores)
- **TFL API** — Journey time calculations to configurable destinations
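
The fuzzy address matching noted under Key Implementation Details (token-sorted similarity plus a numeric token check, applied within one postcode bucket) can be sketched roughly as below. This is not the code in `pipeline/epc_pp.py`. To stay dependency-free it swaps in the standard library's `difflib.SequenceMatcher` for `thefuzz.token_sort_ratio`, so scores will differ from the real pipeline. The compatibility rule (both addresses must carry the same set of number-bearing tokens), the threshold of 85, and all function names are assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def tokens(address):
    """Lowercase the address and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", address.lower())


def token_sort_similarity(a, b):
    """Score 0-100 after sorting tokens, so word order does not matter.

    Stand-in for thefuzz.token_sort_ratio; uses difflib instead.
    """
    sorted_a = " ".join(sorted(tokens(a)))
    sorted_b = " ".join(sorted(tokens(b)))
    return round(100 * SequenceMatcher(None, sorted_a, sorted_b).ratio())


def numbers_compatible(a, b):
    """Assumed rule: both addresses contain the same number-bearing tokens."""
    nums_a = {t for t in tokens(a) if any(c.isdigit() for c in t)}
    nums_b = {t for t in tokens(b) if any(c.isdigit() for c in t)}
    return nums_a == nums_b


def best_match(epc_address, candidates, threshold=85):
    """Return the best compatible candidate from one postcode bucket, or None."""
    best, best_score = None, threshold - 1
    for candidate in candidates:
        # Reject mismatched house/flat numbers before scoring the text.
        if not numbers_compatible(epc_address, candidate):
            continue
        score = token_sort_similarity(epc_address, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best
```

The numeric check runs first because "12 High Street" and "21 High Street" are textually close but refer to different properties, so a similarity score alone would pair them too readily.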