perfect-postcode/CLAUDE.md
2026-02-01 19:29:07 +00:00

# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
NEVER EVER RUN GIT COMMANDS!!
## Project Overview
Property Map is a full-stack geospatial application for visualizing UK property data on an interactive map. It combines Land Registry price-paid data, EPC energy certificates, postcode geolocation, TfL journey times, Index of Deprivation scores, crime statistics, ethnicity data, broadband speeds, school ratings, road noise, and OpenStreetMap POIs into a single wide parquet file, then serves aggregated H3 hexagon statistics and POI data via a Rust backend.
## Commands
All commands use [Task](https://taskfile.dev) runner. Python uses `uv run`. Frontend uses `npm run` from `frontend/`.
```bash
# Development servers
task dev:server # Rust backend on :8001 (cargo run --release)
task dev:frontend # Webpack dev server on :3030 (proxies /api to :8001)
# Data pipeline
task prepare # Build wide.parquet from all pre-downloaded sources
# Quality
task lint # Lint all: Python (ruff) + TypeScript (ESLint+Prettier) + Rust (clippy+fmt)
task format # Auto-fix formatting for all languages
task test # Python tests (fuzzy join, haversine, POI counts)
task check # Full validation: lint + build + test
# Building
task build:frontend # TypeScript typecheck + webpack production build
task build:server # cargo build --release (NOTE: dir is wrong in Taskfile, run from server-rs/)
# Granular lint/format
task lint:python # uv run ruff check .
task lint:frontend # eslint + prettier --check
task lint:rust # cargo clippy -- -D warnings && cargo fmt --check
task format:python # ruff check --fix && ruff format
task format:frontend # eslint --fix + prettier --write
task format:rust # cargo fmt --all
```
Running individual tests:
```bash
uv run pytest pipeline/utils/test_haversine.py # Single test file
uv run pytest pipeline/utils/test_haversine.py -k "test_name" # Single test
```
## Architecture
### Data Flow
```
Raw sources → [Download scripts] → data/*.parquet
→ [Fuzzy join EPC ↔ Price-Paid] → epc_pp.parquet
→ [Merge all datasets] → wide.parquet
→ [Rust server loads into memory + precomputes H3 + spatial grid]
→ [Frontend renders deck.gl H3HexagonLayer over MapLibre GL]
```
### Data Pipeline (`pipeline/`)
Python + Polars. Two phases:
1. **Download** (`pipeline/download/`) — Each script fetches one raw dataset into `data/`
2. **Transform** (`pipeline/transform/`) — Joins and derives features:
- `join_epc_pp.py` — Fuzzy-joins EPC ↔ price-paid by address within postcode buckets
- `merge.py` — **Main pipeline**: joins all datasets → `wide.parquet` with human-readable column names
- `transform_poi.py` — Filters POIs, maps to friendly names + emoji (exhaustive category validation)
- `poi_proximity.py` — Counts POIs within 2km per postcode using 0.05° spatial grid
- `crime.py` — Aggregates crime CSVs into yearly averages by LSOA
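The fuzzy join in `join_epc_pp.py` can be pictured roughly as follows. This is a minimal sketch for a single postcode bucket, not the pipeline's actual code: `difflib.SequenceMatcher` stands in for `thefuzz.token_sort_ratio` so the example has no third-party dependency, the function names and the `min_score` threshold are hypothetical, and "numeric token compatibility" is assumed to mean that both addresses must contain the same set of numbers.

```python
from __future__ import annotations

import re
from difflib import SequenceMatcher


def numeric_tokens(address: str) -> set[str]:
    """Return the digit runs in an address (house numbers, flat numbers)."""
    return set(re.findall(r"\d+", address))


def token_sort_similarity(a: str, b: str) -> float:
    """Stand-in for thefuzz.token_sort_ratio: compare with tokens sorted."""
    norm_a = " ".join(sorted(re.findall(r"[a-z0-9]+", a.lower())))
    norm_b = " ".join(sorted(re.findall(r"[a-z0-9]+", b.lower())))
    return 100.0 * SequenceMatcher(None, norm_a, norm_b).ratio()


def greedy_match(epc: list[str], pp: list[str], min_score: float = 80.0) -> dict[int, int]:
    """Match EPC addresses to price-paid addresses within one postcode.

    Returns {epc_index: pp_index}. Pairs whose numeric tokens differ are
    never matched (assumed meaning of "numeric token compatibility").
    """
    candidates = []
    for i, a in enumerate(epc):
        for j, b in enumerate(pp):
            if numeric_tokens(a) != numeric_tokens(b):
                continue
            score = token_sort_similarity(a, b)
            if score >= min_score:
                candidates.append((score, i, j))

    # Greedy assignment: best-scoring pair first, each row used at most once.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    matches: dict[int, int] = {}
    used_pp: set[int] = set()
    for _score, i, j in candidates:
        if i not in matches and j not in used_pp:
            matches[i] = j
            used_pp.add(j)
    return matches
```

Greedy assignment is simple and fast, but it can miss the globally best pairing when scores are close; that trade-off is inherent to the approach rather than to this sketch.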
**Critical: column renaming in `merge.py`** — The pipeline renames columns from snake_case to human-readable names before writing `wide.parquet`. The Rust server auto-discovers features from whatever column names exist in the parquet. Key renames:
- `pp_address` → `Address per Property Register`
- `postcode` → `Postcode`
- `latest_price` → `Last known price`
- `duration` → `Leashold/Freehold`
- `total_floor_area` → `Total floor area (sqm)`
- `current_energy_rating` → `Current energy rating`
The server and frontend must handle these human-readable names. See the full rename map in `merge.py`.
### Backend (`server-rs/`)
Rust + Axum. Loads parquet into memory at startup.
**Structure:**
- `data/property.rs` — Loads `wide.parquet`, auto-discovers numeric + enum features, computes histograms, sorts rows by spatial locality, precomputes H3 cells (resolutions 4–12)
- `data/poi.rs` — Loads `filtered_uk_pois.parquet`
- `index.rs` — `GridIndex`: 0.01° spatial grid for O(1) cell lookup
- `filter.rs` — Parses filter strings and checks rows. Format: `name:min:max` (numeric), `name:val1|val2` (enum)
- `routes/` — One file per endpoint
- `consts.rs` — Key constants (histogram bins, H3 range, max enum cardinality, excluded columns)
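The histogram computation listed for `data/property.rs` is described later in this file as an O(n) two-pass algorithm that interpolates percentiles without sorting. The sketch below shows that idea in Python for readability; the server itself is Rust. The function name, the default of 100 bins, and the linear interpolation inside a bin are assumptions (the real bin count lives in `consts.rs`).

```python
import math


def histogram_percentiles(values, percentiles=(2.0, 98.0), bins=100):
    """Approximate percentiles from a fixed-width histogram, without sorting.

    NaN entries are treated as nulls and skipped. `values` is iterated twice.
    """
    # Pass 1: range and count of the non-null values.
    lo, hi, n = math.inf, -math.inf, 0
    for v in values:
        if math.isnan(v):
            continue
        lo, hi, n = min(lo, v), max(hi, v), n + 1
    if n == 0:
        return [math.nan for _ in percentiles]
    if hi == lo:
        return [lo for _ in percentiles]

    # Pass 2: fill fixed-width bins; the maximum is clamped into the last bin.
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if math.isnan(v):
            continue
        counts[min(int((v - lo) / width), bins - 1)] += 1

    # Walk the cumulative counts and interpolate inside the target bin.
    results = []
    for p in percentiles:
        target = p / 100.0 * n
        cumulative, estimate = 0, hi
        for b, count in enumerate(counts):
            if count and cumulative + count >= target:
                fraction = (target - cumulative) / count
                estimate = lo + (b + fraction) * width
                break
            cumulative += count
        results.append(estimate)
    return results
```

The result is accurate to about one bin width, which is usually enough for choosing colour-scale limits such as the 2nd and 98th percentiles.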
**API endpoints:**
- `GET /api/features` — Feature metadata with histograms and 2nd/98th percentiles
- `GET /api/hexagons?resolution=&bounds=&filters=` — H3 aggregates (min/max per feature per hex)
- `GET /api/hexagon-properties?h3=&resolution=&filters=&limit=&offset=` — Paginated properties within a hexagon
- `GET /api/pois?bounds=&categories=` — POIs by bounds (max 5000)
- `GET /api/poi-categories` — Available POI category names
Serves `frontend/dist/` as static fallback in production.
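The `filters=` parameter uses the format handled by `filter.rs`: `name:min:max` for numeric features and `name:val1|val2` for enums. The following Python sketch illustrates how one such filter string could be parsed and checked against a value. It is not the server's Rust code, and several points are assumptions: feature names contain no colons, range bounds are inclusive, null values never match, and only a single filter is parsed (how multiple filters are delimited is not stated here).

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NumericFilter:
    name: str
    minimum: float
    maximum: float

    def matches(self, value: float) -> bool:
        # NaN marks a null numeric value; assumed never to pass a range filter.
        return not math.isnan(value) and self.minimum <= value <= self.maximum


@dataclass(frozen=True)
class EnumFilter:
    name: str
    allowed: frozenset[str]

    def matches(self, value: str | None) -> bool:
        # None stands for a null enum value; assumed never to match.
        return value is not None and value in self.allowed


def parse_filter(text: str) -> NumericFilter | EnumFilter:
    """Parse `name:min:max` (numeric) or `name:val1|val2` (enum)."""
    parts = text.split(":")
    if len(parts) == 3:
        name, lo, hi = parts
        return NumericFilter(name, float(lo), float(hi))
    if len(parts) == 2:
        name, values = parts
        return EnumFilter(name, frozenset(values.split("|")))
    raise ValueError(f"unrecognised filter: {text!r}")
```

For example, `parse_filter("Last known price:100000:500000")` yields a numeric filter keyed by the human-readable column name, which is why the renamed columns matter to API clients.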
**Data representation:**
- Numeric features: row-major flat `Vec<f64>`, NaN = null
- Enum features: `Vec<u8>` indices into value list, 255 = null
- String fields (address, postcode): `Vec<String>`, empty = null
- The server accepts the parquet path as a CLI argument (defaults to `data_sources/processed/wide.parquet`)
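To make the storage layout concrete, here is a small Python model of the representation described above: one flat, row-major buffer of 64-bit floats with NaN as null, and enum columns stored as `u8` indices with 255 as null. The class and function names are hypothetical, and the real server implements this in Rust with `Vec<f64>` and `Vec<u8>`.

```python
import math
from array import array

ENUM_NULL = 255  # sentinel index for a missing enum value


class PropertyTable:
    """Row-major numeric storage: all features of one row are contiguous."""

    def __init__(self, numeric_names, rows):
        self.numeric_names = list(numeric_names)
        self.num_features = len(self.numeric_names)
        self.feature_data = array("d")  # flat buffer of f64 values
        for row in rows:
            if len(row) != self.num_features:
                raise ValueError("row length does not match feature count")
            # None is stored as NaN, mirroring the NaN = null convention.
            self.feature_data.extend(math.nan if v is None else float(v) for v in row)

    def get(self, row, name):
        feat_idx = self.numeric_names.index(name)
        value = self.feature_data[row * self.num_features + feat_idx]
        return None if math.isnan(value) else value


def encode_enum(values):
    """Encode strings as u8 indices into a sorted value list; None -> 255."""
    labels = sorted({v for v in values if v is not None})
    if len(labels) > ENUM_NULL:
        raise ValueError("too many distinct values for a u8 index")
    lookup = {label: i for i, label in enumerate(labels)}
    codes = array("B", (ENUM_NULL if v is None else lookup[v] for v in values))
    return labels, codes
```

Because 255 is reserved for null, a `u8` column can hold at most 255 distinct values; the actual cardinality limit used by the server is defined in `consts.rs`.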
### Frontend (`frontend/`)
React 18 + TypeScript. deck.gl `H3HexagonLayer` over MapLibre GL. TailwindCSS. No state management library — pure React hooks.
**Key patterns:**
- `App.tsx` manages all state, API fetching (150ms debounce), and URL state sync (300ms debounce)
- URL encodes view/filters/POI categories/active tab as query params for shareable links
- AbortControllers cancel in-flight requests on new queries
- Zoom → H3 resolution: `<7→7, <9.5→8, <11→9, <13→10, ≥13→11`
- Bounds quantized to 0.01° to match backend caching
- Properties pane uses feature names from API response (human-readable), not hardcoded field names
- Proxy: dev server on :3030 proxies `/api` to :8001; also handles VS Code `/proxy/PORT` patterns
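The zoom-to-resolution thresholds and bounds quantization above can be exercised from a script when testing the API by hand. The sketch below is written in Python rather than the frontend's TypeScript, and the helper names are hypothetical. Two things are assumptions: bounds are snapped outward to the 0.01° grid (the frontend's exact rounding rule is not stated here), and the backend is reachable on the dev port `:8001`.

```python
import math
from urllib.parse import urlencode

API_BASE = "http://localhost:8001"  # dev backend port from `task dev:server`


def zoom_to_resolution(zoom: float) -> int:
    """Map a map zoom level to an H3 resolution using the listed thresholds."""
    if zoom < 7:
        return 7
    if zoom < 9.5:
        return 8
    if zoom < 11:
        return 9
    if zoom < 13:
        return 10
    return 11


def quantize_bounds(south, west, north, east):
    """Snap bounds outward to a 0.01 degree grid (outward snapping is assumed)."""
    def down(x):
        return math.floor(round(x * 100, 6)) / 100

    def up(x):
        return math.ceil(round(x * 100, 6)) / 100

    return down(south), down(west), up(north), up(east)


def hexagons_url(zoom, south, west, north, east, filters=""):
    """Build a /api/hexagons URL; bounds order is south,west,north,east."""
    s, w, n, e = quantize_bounds(south, west, north, east)
    params = {
        "resolution": zoom_to_resolution(zoom),
        "bounds": f"{s:.2f},{w:.2f},{n:.2f},{e:.2f}",
    }
    if filters:
        params["filters"] = filters
    return f"{API_BASE}/api/hexagons?{urlencode(params)}"
```

Quantizing before building the URL means that small pans produce identical query strings, which is what lets repeated requests hit the same cache entry.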
## Frontend Design Guide (STRICT — must be followed for all UI changes)
The frontend uses Tailwind's `darkMode: 'class'` strategy. The `dark` class is toggled on `<html>`. Every visible element must have both light and dark styles. **Never add a light-only color class without its `dark:` counterpart.** Run `task build:frontend` after any UI change to verify.
### Theme System
- **State**: `App.tsx` owns a `theme` state (`'light' | 'dark' | 'system'`), persisted in `localStorage` under the key `theme`, default `'system'`.
- **Effective theme**: When `'system'`, resolved via `window.matchMedia('(prefers-color-scheme: dark)')`. A `change` listener re-renders on OS preference flip.
- **Toggle cycle**: light → dark → system → light. Three-way, not binary.
- **Flash prevention**: `index.html` contains an inline `<script>` that applies the `dark` class before first paint. Its localStorage/matchMedia logic must stay in sync with `App.tsx`: if either one changes, update the other to match.
- **Prop plumbing**: `effectiveTheme` (`'light' | 'dark'`) is passed as a prop to `<Map>` and `<HomePage>`. Components that need the resolved theme must receive it as a prop — do not read localStorage or matchMedia inside child components.
### Color Token Reference
Every UI element must use the correct token from this table. Do not invent new pairings.
| Role | Light class | Dark class | Hex (dark) |
|------|------------|------------|------------|
| **Page / pane background** | `bg-warm-50` or `bg-white` | `dark:bg-warm-900` | #1c1917 |
| **Card / elevated surface** | `bg-white` | `dark:bg-warm-800` | #292524 |
| **Inset / recessed surface** | `bg-warm-100` or `bg-warm-50` | `dark:bg-warm-800` | #292524 |
| **Input / select background** | `bg-white` | `dark:bg-warm-800` or `dark:bg-warm-900` | |
| **Primary border** | `border-warm-200` | `dark:border-warm-700` | #44403c |
| **Subtle border (dividers)** | `border-warm-100` | `dark:border-warm-800` | #292524 |
| **Primary text (headings)** | `text-navy-950` or implicit dark | `dark:text-warm-100` | #f5f5f4 |
| **Body text** | `text-warm-700` | `dark:text-warm-300` | #d6d3d1 |
| **Secondary text (labels, hints)** | `text-warm-500` or `text-warm-600` | `dark:text-warm-400` | #a8a29e |
| **Disabled / placeholder text** | `text-warm-400` / `placeholder-warm-400` | `dark:text-warm-500` / `dark:placeholder-warm-500` | #78716c |
| **Accent text (links, actions)** | `text-teal-600` | `dark:text-teal-400` | #1de4c3 |
| **Accent hover text** | `hover:text-teal-800` | `dark:hover:text-teal-300` | #51f7d9 |
| **Accent background (highlights)** | `bg-teal-50` | `dark:bg-teal-900/30` | |
| **Active ring / focus ring** | `ring-teal-400` | same — works in both | |
| **Price / key metric text** | `text-teal-700` | `dark:text-teal-400` | |
| **Remove / close button** | `text-warm-400 hover:text-warm-700` | `dark:hover:text-warm-300` | |
| **Checkbox accent** | `accent-teal-600` | same — works in both | |
| **Header (unchanged both modes)** | `bg-navy-900 text-white` | same | |
### Mapping Rules for Specific Contexts
**Sidebars (Filters, POIPane, PropertiesPane, right-pane tabs):**
- Container: `bg-white dark:bg-warm-900`
- Inner cards / dropdown menus: `bg-white dark:bg-warm-800`
- Borders: `border-warm-200 dark:border-warm-700`
- Tab text (active): add `dark:text-warm-100`
- Tab text (inactive): `text-warm-600 dark:text-warm-400`
**Map overlays (PostcodeSearch, MapLegend, POI popup, loading indicator):**
- Background: `bg-white dark:bg-warm-800`
- Text: `dark:text-warm-200`
- Semi-transparent variants: use `/90` opacity suffix (e.g. `dark:bg-warm-800/90`)
- Deck.gl tooltip (inline styles, not Tailwind): use `#292524` bg / `#e7e5e4` text / `rgba(0,0,0,0.5)` shadow in dark.
- Deck.gl postcode labels (RGB arrays): `[220,220,220,220]` text / `[30,30,30,200]` outline in dark; inverse in light.
**Map basemaps:**
- Light: `https://basemaps.cartocdn.com/gl/voyager-gl-style/style.json`
- Dark: `https://basemaps.cartocdn.com/gl/dark-matter-gl-style/style.json`
- `handleMapLoad` must only apply label/water tweaks in light mode. Dark Matter has good defaults.
**HomePage (landing page):**
- Page bg: `bg-warm-50 dark:bg-warm-900`
- Cards: `bg-white dark:bg-warm-800` with `border-warm-200 dark:border-warm-700`
- Backdrop-blur panels: use `/60` or `/40` opacity on both `bg-warm-50` and `dark:bg-warm-900`
- HexCanvas: reads `isDark` ref; uses dimmer fill (`#058172`) and stroke (`#0a665b`) at 60% opacity multiplier.
- All headings: `dark:text-warm-100`. All body: `dark:text-warm-300` or `dark:text-warm-400`.
**DataSourcesPage:**
- Same card pattern as above. Footer is already dark (`bg-navy-900`) — no changes needed.
- License badges: `bg-warm-100 dark:bg-warm-700 text-warm-600 dark:text-warm-300`
- Links: `text-teal-600 dark:text-teal-400`
**DataSources floating button (on map):**
- `bg-white/90 dark:bg-warm-800/90` with `text-teal-600 dark:text-teal-400`
### Rules for New Components
1. **Every `bg-white` needs `dark:bg-warm-800` or `dark:bg-warm-900`.** Pane-level = warm-900, card-level = warm-800.
2. **Every `border-warm-200` needs `dark:border-warm-700`.**
3. **Every `text-warm-*` needs a `dark:text-warm-*` counterpart.** Follow the token table — don't guess.
4. **Every `text-teal-600` needs `dark:text-teal-400`.** Every `hover:text-teal-800` needs `dark:hover:text-teal-300`.
5. **Every `bg-teal-50` needs `dark:bg-teal-900/30`.**
6. **Every `hover:bg-warm-50` needs `dark:hover:bg-warm-700` or `dark:hover:bg-warm-800`.**
7. **Inputs and selects**: always add `dark:bg-warm-800 dark:text-warm-200 dark:border-warm-700`. Placeholders get `dark:placeholder-warm-500`.
8. **Checkboxes**: always include `accent-teal-600 rounded`.
9. **Do not use Tailwind `dark:` classes inside deck.gl layers or canvas code.** Use the `theme` prop / ref and conditional JS values.
10. **Do not add `transition-*` classes for theme switching.** The global CSS rule in `index.css` handles transitions for `background-color`, `border-color`, and `color` on all standard HTML elements. Adding per-element transition classes will conflict.
11. **Never hardcode hex colors in JSX `style=` props for themed elements** (except deck.gl tooltip and canvas, which can't use Tailwind). Use the Tailwind classes from the token table instead.
12. **The header (`bg-navy-900`) is identical in both themes.** Do not add dark variants to it.
### Verification Checklist (for any UI PR)
- [ ] `task build:frontend` passes with no errors
- [ ] Every new `bg-*`, `text-*`, `border-*` class has a `dark:` counterpart (search your diff)
- [ ] Toggle through all three modes (light → dark → system) with no flash
- [ ] Map basemap switches when theme changes
- [ ] Sidebars, dropdowns, and popups are readable in both modes
- [ ] HomePage and DataSourcesPage adapt correctly
## Key Implementation Details
- **Spatial sort**: Rows sorted by 0.01° grid cell at load time for cache-friendly sequential access
- **Row-major layout**: `feature_data[row * num_features + feat_idx]` — all features for one property are contiguous
- **H3 precomputation**: Resolutions 4–12 computed in parallel (rayon) at startup
- **Histogram percentiles without sorting**: O(n) two-pass algorithm — build histogram, interpolate percentiles
- **Direct JSON writing**: Hexagon endpoint writes JSON via string buffer, avoids serde_json::Value allocations
- **POI transform validation**: Fails if any OSM category is unmapped — guarantees exhaustive coverage
- **Fuzzy join**: Groups by postcode, uses `thefuzz.token_sort_ratio` with numeric token compatibility, greedy assignment from highest score
- **Filter bounds format**: `south,west,north,east` (not the conventional `west,south,east,north` bbox order)
- **POI proximity**: Uses 0.05° grid (~5km cells) to reduce candidates before haversine distance check
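The POI proximity step described in the last bullet can be sketched as below: bucket POIs into 0.05° cells, then run the haversine check only against the 3×3 block of cells around each postcode. This is an illustrative sketch, not the code in `poi_proximity.py`; the function names and the Earth radius constant are assumptions. Checking only adjacent cells is sufficient for a 2 km radius at UK latitudes, where a 0.05° cell is roughly 2.7 km wide at its narrowest.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)
CELL_DEG = 0.05              # grid cell size from the pipeline description
RADIUS_KM = 2.0              # search radius from the pipeline description


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cell_of(lat, lon):
    """Integer grid cell containing a coordinate."""
    return math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG)


def build_grid(pois):
    """Bucket (lat, lon) POIs by grid cell."""
    grid = defaultdict(list)
    for lat, lon in pois:
        grid[cell_of(lat, lon)].append((lat, lon))
    return grid


def count_within(grid, lat, lon, radius_km=RADIUS_KM):
    """Count POIs within radius_km, checking only the 3x3 neighbouring cells."""
    row, col = cell_of(lat, lon)
    count = 0
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            for p_lat, p_lon in grid.get((row + d_row, col + d_col), ()):
                if haversine_km(lat, lon, p_lat, p_lon) <= radius_km:
                    count += 1
    return count
```

The grid turns an all-pairs comparison into a lookup over a handful of cells per postcode, so the haversine formula runs only on nearby candidates.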