# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

NEVER EVER RUN GIT COMMANDS!!

## Project Overview

Property Map is a full-stack geospatial application for visualizing UK property data on an interactive map. It combines Land Registry price-paid data, EPC energy certificates, postcode geolocation, TFL journey times, Index of Deprivation scores, crime statistics, ethnicity data, broadband speeds, school ratings, road noise, and OpenStreetMap POIs into a single wide parquet file, then serves aggregated H3 hexagon statistics and POI data via a Rust backend.

## Commands

All commands use [Task](https://taskfile.dev) runner. Python uses `uv run`. Frontend uses `npm run` from `frontend/`.

```bash
# Development servers
task dev:server        # Rust backend on :8001 (cargo run --release)
task dev:frontend      # Webpack dev server on :3030 (proxies /api to :8001)

# Data pipeline
task prepare           # Build wide.parquet from all pre-downloaded sources

# Quality
task lint              # Lint all: Python (ruff) + TypeScript (ESLint+Prettier) + Rust (clippy+fmt)
task format            # Auto-fix formatting for all languages
task test              # Python tests (fuzzy join, haversine, POI counts)
task check             # Full validation: lint + build + test

# Building
task build:frontend    # TypeScript typecheck + webpack production build
task build:server      # cargo build --release (NOTE: dir is wrong in Taskfile, run from server-rs/)

# Granular lint/format
task lint:python       # uv run ruff check .
task lint:frontend     # eslint + prettier --check
task lint:rust         # cargo clippy -- -D warnings && cargo fmt --check
task format:python     # ruff check --fix && ruff format
task format:frontend   # eslint --fix + prettier --write
task format:rust       # cargo fmt --all
```

Running individual tests:

```bash
uv run pytest pipeline/utils/test_haversine.py                  # Single test file
uv run pytest pipeline/utils/test_haversine.py -k "test_name"   # Single test
```

## Architecture

### Data Flow

```
Raw sources → [Download scripts] → data/*.parquet
  → [Fuzzy join EPC ↔ Price-Paid] → epc_pp.parquet
  → [Merge all datasets] → wide.parquet
  → [Rust server loads into memory + precomputes H3 + spatial grid]
  → [Frontend renders deck.gl H3HexagonLayer over MapLibre GL]
```

### Data Pipeline (`pipeline/`)

Python + Polars. Two phases:

1. **Download** (`pipeline/download/`) — Each script fetches one raw dataset into `data/`
2. **Transform** (`pipeline/transform/`) — Joins and derives features:
   - `join_epc_pp.py` — Fuzzy-joins EPC ↔ price-paid by address within postcode buckets
   - `merge.py` — **Main pipeline**: joins all datasets → `wide.parquet` with human-readable column names
   - `transform_poi.py` — Filters POIs, maps to friendly names + emoji (exhaustive category validation)
   - `poi_proximity.py` — Counts POIs within 2km per postcode using 0.05° spatial grid
   - `crime.py` — Aggregates crime CSVs into yearly averages by LSOA

**Critical: column renaming in `merge.py`** — The pipeline renames columns from snake_case to human-readable names before writing `wide.parquet`. The Rust server auto-discovers features from whatever column names exist in the parquet. Key renames:

- `pp_address` → `Address per Property Register`
- `postcode` → `Postcode`
- `latest_price` → `Last known price`
- `duration` → `Leashold/Freehold`
- `total_floor_area` → `Total floor area (sqm)`
- `current_energy_rating` → `Current energy rating`

The server and frontend must handle these human-readable names.
See the full rename map in `merge.py`.

### Backend (`server-rs/`)

Rust + Axum. Loads parquet into memory at startup.

**Structure:**

- `data/property.rs` — Loads `wide.parquet`, auto-discovers numeric + enum features, computes histograms, sorts rows by spatial locality, precomputes H3 cells (resolutions 4–12)
- `data/poi.rs` — Loads `filtered_uk_pois.parquet`
- `index.rs` — `GridIndex`: 0.01° spatial grid for O(1) cell lookup
- `filter.rs` — Parses filter strings and checks rows. Format: `name:min:max` (numeric), `name:val1|val2` (enum)
- `routes/` — One file per endpoint
- `consts.rs` — Key constants (histogram bins, H3 range, max enum cardinality, excluded columns)

**API endpoints:**

- `GET /api/features` — Feature metadata with histograms and 2nd/98th percentiles
- `GET /api/hexagons?resolution=&bounds=&filters=` — H3 aggregates (min/max per feature per hex)
- `GET /api/hexagon-properties?h3=&resolution=&filters=&limit=&offset=` — Paginated properties within a hexagon
- `GET /api/pois?bounds=&categories=` — POIs by bounds (max 5000)
- `GET /api/poi-categories` — Available POI category names

Serves `frontend/dist/` as static fallback in production.

**Data representation:**

- Numeric features: row-major flat `Vec`, NaN = null
- Enum features: `Vec` indices into value list, 255 = null
- String fields (address, postcode): `Vec`, empty = null
- The server accepts the parquet path as a CLI argument (defaults to `data_sources/processed/wide.parquet`)

### Frontend (`frontend/`)

React 18 + TypeScript. deck.gl `H3HexagonLayer` over MapLibre GL. TailwindCSS. No state management library — pure React hooks.
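
As a rough illustration of how client code might produce the filter strings that `filter.rs` parses, and the zoom thresholds listed under Key patterns, here is a minimal sketch. The helper names `numericFilter`, `enumFilter`, and `zoomToH3Resolution` are hypothetical and do not come from the repository, and the energy-rating values are example data only.

```typescript
// Hypothetical helpers (not taken from the repository).
// They only mirror the formats documented in this file.

/** Numeric range filter in the documented `name:min:max` form. */
function numericFilter(name: string, min: number, max: number): string {
  return `${name}:${min}:${max}`;
}

/** Enum filter in the documented `name:val1|val2` form. */
function enumFilter(name: string, values: string[]): string {
  return `${name}:${values.join("|")}`;
}

/** Documented zoom mapping: <7→7, <9.5→8, <11→9, <13→10, ≥13→11. */
function zoomToH3Resolution(zoom: number): number {
  if (zoom < 7) return 7;
  if (zoom < 9.5) return 8;
  if (zoom < 11) return 9;
  if (zoom < 13) return 10;
  return 11;
}

// Feature names are the human-readable column names from wide.parquet.
const priceFilter = numericFilter("Last known price", 100000, 500000);
// "A" and "B" are illustrative values, not confirmed enum members.
const ratingFilter = enumFilter("Current energy rating", ["A", "B"]);

console.log(priceFilter);
console.log(ratingFilter);
console.log(zoomToH3Resolution(10));
```

Feature names contain spaces and punctuation, so a value like this would normally be passed through `encodeURIComponent` before being placed in a query string. How several filters are combined inside the `filters=` parameter is not specified in this file, so the sketch builds one filter at a time.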

**Key patterns:**

- `App.tsx` manages all state, API fetching (150ms debounce), and URL state sync (300ms debounce)
- URL encodes view/filters/POI categories/active tab as query params for shareable links
- AbortControllers cancel in-flight requests on new queries
- Zoom → H3 resolution: `<7→7, <9.5→8, <11→9, <13→10, ≥13→11`
- Bounds quantized to 0.01° to match backend caching
- Properties pane uses feature names from API response (human-readable), not hardcoded field names
- Proxy: dev server on :3030 proxies `/api` to :8001; also handles VS Code `/proxy/PORT` patterns

## Frontend Design Guide (STRICT — must be followed for all UI changes)

The frontend uses Tailwind's `darkMode: 'class'` strategy. The `dark` class is toggled on ``. Every visible element must have both light and dark styles. **Never add a light-only color class without its `dark:` counterpart.** Run `task build:frontend` after any UI change to verify.

### Theme System

- **State**: `App.tsx` owns a `theme` state (`'light' | 'dark' | 'system'`), persisted in `localStorage` under the key `theme`, default `'system'`.
- **Effective theme**: When `'system'`, resolved via `window.matchMedia('(prefers-color-scheme: dark)')`. A `change` listener re-renders on OS preference flip.
- **Toggle cycle**: light → dark → system → light. Three-way, not binary.
- **Flash prevention**: `index.html` contains an inline `