# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

NEVER EVER RUN GIT COMMANDS!!

## Project Overview

Property Map is a full-stack geospatial application for visualizing UK property data on an interactive map. It combines Land Registry price-paid data, EPC energy certificates, postcode geolocation, TFL journey times, Index of Deprivation scores, crime statistics, ethnicity data, broadband speeds, school ratings, road noise, and OpenStreetMap POIs into a single wide parquet file, then serves aggregated H3 hexagon statistics and POI data via a Rust backend.

## Commands

All commands use [Task](https://taskfile.dev) runner. Python uses `uv run`. Frontend uses `npm run` from `frontend/`.

```bash
# Development servers
task dev:server                 # Rust backend on :8001 (cargo run --release)
task dev:frontend               # Webpack dev server on :3001 (proxies /api to :8001)

# Data pipeline (uses Make, not Task — see Makefile.data)
make -f Makefile.data prepare   # Build properties.parquet (merge + price estimation)
make -f Makefile.data merge     # Just the merge step (no price estimation)

# Assets
task download:map-assets        # Download font glyphs + twemoji PNGs into frontend/public/assets/

# Quality
task lint                       # Lint all: Python (ruff) + TypeScript (ESLint+Prettier) + Rust (clippy+fmt)
task format                     # Auto-fix formatting for all languages
task test                       # Python tests (fuzzy join, haversine, POI counts)
task check                      # Full validation: lint + build + test

# Building
task build:frontend             # TypeScript typecheck + webpack production build
task build:server               # cargo build --release (NOTE: dir is wrong in Taskfile, run from server-rs/)

# Granular lint/format
task lint:python                # uv run ruff check .
task lint:frontend              # eslint + prettier --check
task lint:rust                  # cargo clippy -- -D warnings && cargo fmt --check
task format:python              # ruff check --fix && ruff format
task format:frontend            # eslint --fix + prettier --write
task format:rust                # cargo fmt --all
```

Running individual tests:

```bash
uv run pytest pipeline/utils/test_haversine.py                  # Single test file
uv run pytest pipeline/utils/test_haversine.py -k "test_name"   # Single test
```

## Architecture

### Data Flow

```
Raw sources → [Download scripts] → data/*.parquet
  → [Fuzzy join EPC ↔ Price-Paid] → epc_pp.parquet
  → [Merge all datasets] → properties.parquet
  → [Price estimation] → properties.parquet (augmented with estimated prices)
  → [Rust server loads into memory + precomputes H3 + spatial grid]
  → [Frontend renders deck.gl H3HexagonLayer over MapLibre GL]
```

### Data Pipeline (`pipeline/`)

Python + Polars. Orchestrated by `Makefile.data` (Make DAG with sentinel files like `.merge_done`, `.prices_done`). Two phases:

1. **Download** (`pipeline/download/`) — Each script fetches one raw dataset into `data/`
2. **Transform** (`pipeline/transform/`) — Joins and derives features:
   - `join_epc_pp.py` — Fuzzy-joins EPC ↔ price-paid by address within postcode buckets
   - `merge.py` — **Main pipeline**: joins all datasets → `properties.parquet` with human-readable column names
   - `price_estimation/` — Post-merge step: adds "Estimated current price" and "Est. price per sqm" columns to `properties.parquet`. Uses repeat-sales price index + kNN spatial blending. Requires `price_index.parquet` (built by `price_estimation/index.py`). Run via `make -f Makefile.data prepare` (the `merge` target alone skips this).
   - `transform_poi.py` — Filters POIs, maps to friendly names + emoji (exhaustive category validation)
   - `poi_proximity.py` — Counts POIs within 2km per postcode using 0.05° spatial grid
   - `crime.py` — Aggregates crime CSVs into yearly averages by LSOA

**Critical: column renaming in `merge.py`** — The pipeline renames columns from snake_case to human-readable names before writing `properties.parquet`. The Rust server and frontend use **only** these human-readable names — there are no fallbacks to snake_case. Key renames:

- `pp_address` → `Address per Property Register`
- `postcode` → `Postcode`
- `latest_price` → `Last known price`
- `duration` → `Leasehold/Freehold`
- `total_floor_area` → `Total floor area (sqm)`
- `current_energy_rating` → `Current energy rating`

The server requires these exact column names at startup (will error if missing). See the full rename map in `merge.py`.

### Backend (`server-rs/`)

Rust + Axum. Loads parquet into memory at startup.

**Structure** (uses Rust 2018 module style — `foo.rs` + `foo/` directory, not `foo/mod.rs`):

- `data.rs` + `data/` — Property and POI data loading
- `parsing.rs` + `parsing/` — Filter parsing and bounds parsing
- `routes.rs` + `routes/` — One file per endpoint.
  `properties.rs` exports shared `build_property()` used by both hexagon and postcode property endpoints
- `utils.rs` + `utils/` — GridIndex, hashing, interned columns
- `consts.rs` — Key constants (histogram bins, H3 range, max enum cardinality, excluded columns)

**API endpoints:**

- `GET /api/features` — Feature metadata with histograms and 2nd/98th percentiles
- `GET /api/hexagons?resolution=&bounds=&filters=&fields=` — H3 aggregates (min/max per feature per hex), AABB-filtered to bounds
- `GET /api/postcodes?bounds=&filters=&fields=` — Postcode polygon aggregates, AABB-filtered to bounds
- `GET /api/postcode/:postcode` — Single postcode lookup (centroid + polygon)
- `GET /api/hexagon-properties?h3=&resolution=&filters=&limit=&offset=` — Paginated properties within a hexagon
- `GET /api/postcode-properties?postcode=&filters=&limit=&offset=` — Paginated properties within a postcode
- `GET /api/pois?bounds=&categories=` — POIs by bounds (max 5000)
- `GET /api/poi-categories` — Available POI category names

Serves `frontend/dist/` as static fallback in production **only** when `--dist` is explicitly provided. When `--dist` is set, the server panics at startup if `index.html` is unreadable. When omitted (dev mode), static serving and OG injection are disabled.

**Data representation (unified model):**

- All features (numeric and enum): row-major flat `Vec`, NaN = null
- Enum features: stored as f32 indices (0.0, 1.0, 2.0...) with `enum_values: FxHashMap>` mapping feature index → string values
- String fields (address, postcode): interned/packed for memory efficiency
- All CLI args are required (no hidden defaults). Optional services use `Option`: `r5_url` (travel time disabled when None), `pocketbase_admin_email`/`password` (collection auto-creation skipped when None). Required config like `gemini_model` and `public_url` must be explicitly provided via env or CLI.

### Frontend (`frontend/`)

React 18 + TypeScript. deck.gl `H3HexagonLayer` over MapLibre GL. TailwindCSS.
No state management library — pure React hooks.

**Architecture:**

- `App.tsx` — Minimal router: loads features/POI categories, handles page navigation. Page type is `'home' | 'dashboard' | 'learn' | 'pricing' | 'account' | 'saved' | 'invites' | 'invite'`. Auth-required pages (`account`, `saved`, `invites`) redirect to home with login modal when unauthenticated. `pageToPath()` / `pathToPage()` map between Page values and URL paths.
- `AccountPage.tsx` — Exports three separate page components: `SavedPage` (`/saved` — saved searches + saved properties with sub-tabs), `InvitesPage` (`/invites` — invite link generation + history), and `AccountPage` (default export, `/account` — email, subscription, newsletter, support). Note: `'invite'` (singular, `/invite/:code`) is the invite *redemption* flow — distinct from `'invites'` (plural, `/invites`) which is the invite *management* page.
- `MapPage.tsx` — Dashboard layout: composes map + left/right panes, uses custom hooks for all logic
- Custom hooks in `hooks/` encapsulate stateful logic:
  - `useMapData` — Hexagon/postcode fetching, bounds, loading state, color range calculation
  - `useFilters` — Filter state and handlers (add/remove/change/drag/pin)
  - `useHexagonSelection` — Selection state, area stats, properties fetching (supports both hexagons and postcodes)
  - `usePOIData` — POI fetching with debounce
  - `usePaneResize` — Reusable pane resize handlers
  - `useTheme` — Theme state with localStorage persistence
  - `useUrlSync` — URL state synchronization

**Key patterns:**

- URL encodes view/filters/POI categories/active tab as query params for shareable links. Only the current format is supported — no legacy parameter parsing (old `v=`, `f=`, or tab abbreviations are not handled). `tmode` is always serialized when travel time is active (no implicit default); parsing throws if `tmode` is missing when `dest` is present.
- AbortControllers cancel in-flight requests on new queries (150ms debounce)
- Zoom → H3 resolution defined in `consts.ts` `ZOOM_TO_RESOLUTION_THRESHOLDS`: `<7.5→5, <9.5→6, <10.5→8, <12→9, ≥12→10`
- `POSTCODE_ZOOM_THRESHOLD = 15`: below 15 shows H3 hexagons, at/above 15 shows postcode polygons
- Viewport bounds computed via `getBoundsFromViewState()` in `map-utils.ts` — uses Web Mercator math with **TILE_SIZE=512** (MapLibre/deck.gl convention, NOT 256)
- Properties pane uses feature names from API response (human-readable), not hardcoded field names
- Proxy: dev server on :3001 proxies `/api` to :8001; also handles VS Code `/proxy/PORT` patterns
- **Nav links must be `` tags, not `