perfect-postcode/CLAUDE.md
2026-01-31 10:19:48 +00:00

77 lines
4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Property Map is a full-stack geospatial application for visualizing UK property data on an interactive map. It combines Land Registry price-paid data, EPC energy certificates, postcode geolocation, TfL journey times, English Indices of Deprivation scores, and OpenStreetMap POIs into a single wide Parquet file, then serves aggregated H3 hexagon statistics and POI data from a Rust backend.
## Commands
All commands use [Task](https://taskfile.dev) runner.
```bash
task prepare # Full setup: install deps, download data (~GB), run pipeline
task server # Rust backend on :8001 (cargo run --release)
task frontend # Webpack dev server on :3030 (proxies /api to :8001)
task lint # Lint Python (ruff) + TypeScript (ESLint + Prettier)
task format # Auto-fix formatting (ruff + ESLint + Prettier)
task typecheck # TypeScript type checking
task check # All checks (lint + typecheck + build)
task test # Run Python tests (fuzzy join)
task build # Build frontend for production
```
Python commands use `uv run`. Frontend commands use `npm run` from `frontend/`.
## Architecture
### Data Pipeline (`pipeline/`)
Python + Polars. Orchestrated by `pipeline/run.py` which builds `data_sources/processed/wide.parquet`:
1. **Download** (`pipeline/download/`) — Fetches raw data into `data_sources/`:
   - `arcgis.py` — Postcode → lat/lon/LSOA mappings
   - `price_paid.py` — Land Registry price-paid records
   - `pois/` — OpenStreetMap POIs via osmium (PBF parsing)
   - `deprivation_data.py` — English Indices of Deprivation 2025
2. **Join** (`pipeline/epc_pp.py`) — Fuzzy-joins EPC certificates with price-paid records by address within postcode buckets → `epc_pp.parquet`
3. **Widen** (`pipeline/run.py`) — Joins `epc_pp` with GPS coordinates, journey times, IoD scores, and POI proximity counts, then derives `price_per_sqm` and a numeric `construction_age_band`
4. **Transform POIs** (`pipeline/download/pois/transform.py`) — Drops unwanted categories and remaps the rest to friendly names + emoji → `filtered_uk_pois.parquet`

Shared utilities live in `pipeline/utils/`: haversine distance (implemented for both NumPy and Polars expressions) and fuzzy address matching.
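
The fuzzy join in step 2 can be sketched as follows. This is a minimal stand-in, not the code in `pipeline/epc_pp.py` or `pipeline/utils/`: it uses the standard library's `difflib.SequenceMatcher` on sorted tokens in place of `thefuzz`, so scores will differ slightly from thefuzz's. The numeric-token rule (house and flat numbers must match exactly), the helper names, and the threshold of 90 are all hypothetical.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher


def token_sort_ratio(a: str, b: str) -> int:
    """Stand-in for thefuzz's token_sort_ratio (scores differ slightly).

    Lowercases, keeps alphanumeric tokens, sorts them, and scores the
    rejoined strings with difflib on a 0-100 scale.
    """
    def norm(s: str) -> str:
        return " ".join(sorted(re.findall(r"[a-z0-9]+", s.lower())))

    return round(100 * SequenceMatcher(None, norm(a), norm(b)).ratio())


def numbers_compatible(a: str, b: str) -> bool:
    """Hypothetical rule: both addresses must contain the same numeric tokens."""
    return set(re.findall(r"\d+", a)) == set(re.findall(r"\d+", b))


def fuzzy_join(epc_rows, pp_rows, threshold=90):
    """Match price-paid addresses to EPC addresses within the same postcode.

    Both inputs are lists of (postcode, address) tuples. Returns a list of
    (pp_address, best_epc_address) pairs whose score reaches `threshold`.
    """
    # Bucket EPC addresses by postcode so comparisons never cross postcodes.
    buckets = defaultdict(list)
    for postcode, address in epc_rows:
        buckets[postcode].append(address)

    matches = []
    for postcode, pp_address in pp_rows:
        best, best_score = None, -1
        for epc_address in buckets.get(postcode, []):
            # Reject candidates whose house/flat numbers disagree outright.
            if not numbers_compatible(pp_address, epc_address):
                continue
            score = token_sort_ratio(pp_address, epc_address)
            if score >= threshold and score > best_score:
                best, best_score = epc_address, score
        if best is not None:
            matches.append((pp_address, best))
    return matches


epc = [("SW1A 1AA", "Flat 2, 10 Downing Street"),
       ("SW1A 1AA", "Flat 3, 10 Downing Street")]
pp = [("SW1A 1AA", "10 DOWNING STREET FLAT 2")]
print(fuzzy_join(epc, pp))
# [('10 DOWNING STREET FLAT 2', 'Flat 2, 10 Downing Street')]
```

Because each postcode bucket is matched independently, buckets could be handed to separate worker processes to get the parallelism the pipeline uses; the sketch keeps the loop sequential for clarity.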
### Backend (`server-rs/`)
Rust + Axum. Loads `wide.parquet` and `filtered_uk_pois.parquet` into memory at startup with precomputed H3 indices (resolutions 7 to 11) and grid-based spatial indices (0.01° cells).
**API endpoints:**
- `GET /api/features` — Numeric column metadata with histograms and percentiles
- `GET /api/hexagons` — H3 aggregates filtered by bounds, resolution, and feature min/max
- `GET /api/pois` — POIs by bounds with optional category filter (max 5000)
- `GET /api/poi-categories` — Available POI categories

Also serves `frontend/dist/` as a static-file fallback.
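
A bounds query against a grid index like the one the backend builds can be sketched in Python (the server itself is Rust + Axum, and this is not its code). The sketch assumes points are `(lat, lon, payload)` tuples bucketed into 0.01° cells, returns at most `limit` matches (5000 mirrors the `/api/pois` cap) plus a truncated flag, and also shows one way to snap viewport bounds outward to whole 0.01° cells so nearby viewports share a cache key. All function names are hypothetical.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # grid cell size in degrees, matching the backend's 0.01° cells


def cell_of(lat, lon, cell=CELL_DEG):
    """Integer (row, col) of the grid cell containing a point."""
    return math.floor(lat / cell), math.floor(lon / cell)


def build_grid(points, cell=CELL_DEG):
    """Bucket (lat, lon, payload) tuples by grid cell."""
    grid = defaultdict(list)
    for p in points:
        grid[cell_of(p[0], p[1], cell)].append(p)
    return grid


def query_bounds(grid, south, west, north, east, limit=5000, cell=CELL_DEG):
    """Return (points inside the box, truncated) with at most `limit` points."""
    r0, c0 = cell_of(south, west, cell)
    r1, c1 = cell_of(north, east, cell)
    found = []
    for r in range(r0, r1 + 1):
        for c in range(c0, c1 + 1):
            for p in grid.get((r, c), ()):
                # Cells on the edge of the box can hold points outside it.
                if south <= p[0] <= north and west <= p[1] <= east:
                    if len(found) == limit:
                        return found, True  # at least one match was dropped
                    found.append(p)
    return found, False


def quantize_bounds(south, west, north, east, step=CELL_DEG):
    """Snap bounds outward to whole `step` cells, as integer cell indices.

    Nearby viewports map to the same tuple, so it can serve as a cache key.
    """
    return (math.floor(south / step), math.floor(west / step),
            math.ceil(north / step), math.ceil(east / step))


pois = [(51.5010, -0.1420, "a"), (51.5050, -0.1300, "b"), (51.6000, -0.1300, "c")]
grid = build_grid(pois)
print(query_bounds(grid, 51.50, -0.15, 51.51, -0.12))
```

Only the cells overlapping the box are visited, and the exact coordinate check inside each cell drops points that sit in a boundary cell but outside the box. Using integer cell indices for the cache key avoids comparing floating-point bounds for equality.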
### Frontend (`frontend/`)
React 18 + TypeScript SPA. Renders a deck.gl `H3HexagonLayer` over a MapLibre GL basemap and debounces API calls by 150 ms on viewport changes. TailwindCSS for styling.
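
The 150 ms debounce behaves like a trailing debounce: every viewport change restarts the timer, and a request goes out only once the map has been still for 150 ms. The frontend implements this in TypeScript; the Python function below only models the timing on a list of event timestamps, and its name and the tie-breaking rule (an event arriving exactly at the deadline counts as arriving after the fire) are assumptions.

```python
def debounced_fire_times(event_times_ms, delay_ms=150):
    """Times (ms) at which a trailing debounce fires for the given events.

    Each event restarts the timer, so a request goes out `delay_ms` after the
    last event of each burst. An event arriving exactly at the deadline is
    treated as coming after the fire (an assumption for ties).
    """
    times = sorted(event_times_ms)
    fires = []
    for i, t in enumerate(times):
        is_last = i + 1 == len(times)
        if is_last or times[i + 1] - t >= delay_ms:
            fires.append(t + delay_ms)
    return fires


# Six viewport changes in two bursts produce two API calls.
print(debounced_fire_times([0, 40, 90, 130, 400, 420]))  # [280, 570]
```

Four pan events at 0, 40, 90 and 130 ms followed by two at 400 and 420 ms produce just two requests, at 280 ms and 570 ms, instead of six.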
## Key Implementation Details
- Bounds quantized to 0.01° to improve cache hits on both backend and frontend
- H3 hexagon results capped at 50,000 per request (truncated flag in response)
- POI proximity counting uses a spatial grid (0.05° cells, about 5.6 km north–south and 2.7 to 3.6 km east–west across Great Britain) to avoid O(n×m) distance checks
- Fuzzy address matching uses `thefuzz.fuzz.token_sort_ratio` with numeric-token compatibility checks, parallelized across postcode buckets
- The Rust server writes JSON directly into a string buffer, avoiding `serde_json::Value` allocations
- POI transform validates exhaustive category coverage — pipeline fails if any OSM category is unmapped
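
POI proximity counting with a spatial grid can be sketched as below, with a plain-Python haversine in place of the NumPy/Polars versions in `pipeline/utils/`. Each property checks only the 3×3 block of 0.05° cells around it instead of every POI. That block is only guaranteed to contain every POI within the radius while the radius is smaller than a cell's east–west width, which shrinks to about 2.7 km at 61°N, so the sketch rejects radii above a hypothetical 2,500 m limit. The function names and that limit are assumptions, not the pipeline's actual parameters.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius
CELL_DEG = 0.05               # grid cell size from the notes above
# Hypothetical safety limit: below this radius the 3x3 neighbourhood is
# guaranteed to contain every match anywhere in Great Britain (<61°N), where
# a 0.05° cell is still about 2.7 km wide east-west.
MAX_RADIUS_M = 2_500.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cell_of(lat, lon):
    """Integer (row, col) of the 0.05° cell containing a point."""
    return math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG)


def count_pois_within(properties, pois, radius_m):
    """Count POIs within `radius_m` of each (lat, lon) property."""
    if radius_m > MAX_RADIUS_M:
        raise ValueError("radius too large for a 3x3 cell neighbourhood")
    grid = defaultdict(list)
    for lat, lon in pois:
        grid[cell_of(lat, lon)].append((lat, lon))

    counts = []
    for lat, lon in properties:
        r, c = cell_of(lat, lon)
        n = 0
        # Only the property's own cell and its eight neighbours can hold matches.
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                for plat, plon in grid.get((r + dr, c + dc), ()):
                    if haversine_m(lat, lon, plat, plon) <= radius_m:
                        n += 1
        counts.append(n)
    return counts


# Hypothetical coordinates: two POIs lie within 1 km, the third ~11 km north.
homes = [(51.5007, -0.1246)]
pois = [(51.5010, -0.1250), (51.5033, -0.1196), (51.6000, -0.1246)]
print(count_pois_within(homes, pois, 1000))  # [2]
```

With POIs spread over many cells, each property is compared against only the POIs in nine cells rather than all m of them, which is what removes the O(n×m) cost.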
## Data Sources
- **Land Registry** — Price Paid bulk download
- **EPC** — Energy Performance Certificates (domestic)
- **ArcGIS** — Postcode → GPS/LSOA lookup
- **OpenStreetMap** — POIs from Geofabrik Great Britain PBF
- **IoD 2025** — English Indices of Deprivation (LSOA-level scores)
- **TfL API** — Journey time calculations to configurable destinations