CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

NEVER EVER RUN GIT COMMANDS!!

Project Overview

Property Map is a full-stack geospatial application for visualizing UK property data on an interactive map. The pipeline combines Land Registry price-paid data, EPC energy certificates, postcode geolocation, TfL journey times, Index of Deprivation scores, crime statistics, ethnicity data, broadband speeds, school ratings, road noise, and OpenStreetMap POIs into a single wide parquet file. A Rust backend then serves aggregated H3 hexagon statistics and POI data from that file.

Commands

All commands run through the Task runner (task). Python commands use uv run; frontend commands use npm run from frontend/.

# Development servers
task dev:server           # Rust backend on :8001 (cargo run --release)
task dev:frontend         # Webpack dev server on :3001 (proxies /api to :8001)

# Data pipeline
task prepare              # Build wide.parquet from all pre-downloaded sources

# Assets
task download:map-assets  # Download font glyphs + twemoji PNGs into frontend/public/assets/

# Quality
task lint                 # Lint all: Python (ruff) + TypeScript (ESLint+Prettier) + Rust (clippy+fmt)
task format               # Auto-fix formatting for all languages
task test                 # Python tests (fuzzy join, haversine, POI counts)
task check                # Full validation: lint + build + test

# Building
task build:frontend       # TypeScript typecheck + webpack production build
task build:server         # cargo build --release (NOTE: dir is wrong in Taskfile, run from server-rs/)

# Granular lint/format
task lint:python          # uv run ruff check .
task lint:frontend        # eslint + prettier --check
task lint:rust            # cargo clippy -- -D warnings && cargo fmt --check
task format:python        # ruff check --fix && ruff format
task format:frontend      # eslint --fix + prettier --write
task format:rust          # cargo fmt --all

Running individual tests:

uv run pytest pipeline/utils/test_haversine.py       # Single test file
uv run pytest pipeline/utils/test_haversine.py -k "test_name"  # Single test

Architecture

Data Flow

Raw sources → [Download scripts] → data/*.parquet
  → [Fuzzy join EPC ↔ Price-Paid] → epc_pp.parquet
  → [Merge all datasets] → wide.parquet
  → [Rust server loads into memory + precomputes H3 + spatial grid]
  → [Frontend renders deck.gl H3HexagonLayer over MapLibre GL]

Data Pipeline (pipeline/)

Python + Polars. Two phases:

  1. Download (pipeline/download/) — Each script fetches one raw dataset into data/
  2. Transform (pipeline/transform/) — Joins and derives features:
    • join_epc_pp.py — Fuzzy-joins EPC ↔ price-paid by address within postcode buckets
    • merge.py — Main pipeline: joins all datasets → wide.parquet with human-readable column names
    • transform_poi.py — Filters POIs, maps to friendly names + emoji (exhaustive category validation)
    • poi_proximity.py — Counts POIs within 2km per postcode using 0.05° spatial grid
    • crime.py — Aggregates crime CSVs into yearly averages by LSOA

Critical: column renaming in merge.py — The pipeline renames columns from snake_case to human-readable names before writing wide.parquet. The Rust server auto-discovers features from whatever column names exist in the parquet. Key renames:

  • pp_address → Address per Property Register
  • postcode → Postcode
  • latest_price → Last known price
  • duration → Leashold/Freehold
  • total_floor_area → Total floor area (sqm)
  • current_energy_rating → Current energy rating

The server and frontend must handle these human-readable names. See the full rename map in merge.py.

Backend (server-rs/)

Rust + Axum. Loads parquet into memory at startup.

Structure (uses Rust 2018 module style — foo.rs + foo/ directory, not foo/mod.rs):

  • data.rs + data/ — Property and POI data loading
  • parsing.rs + parsing/ — Filter parsing and bounds parsing
  • routes.rs + routes/ — One file per endpoint
  • utils.rs + utils/ — GridIndex, hashing, interned columns
  • consts.rs — Key constants (histogram bins, H3 range, max enum cardinality, excluded columns)

API endpoints:

  • GET /api/features — Feature metadata with histograms and 2nd/98th percentiles
  • GET /api/hexagons?resolution=&bounds=&filters=&fields= — H3 aggregates (min/max per feature per hex), AABB-filtered to bounds
  • GET /api/postcodes?bounds=&filters=&fields= — Postcode polygon aggregates, AABB-filtered to bounds
  • GET /api/postcode/:postcode — Single postcode lookup (centroid + polygon)
  • GET /api/hexagon-properties?h3=&resolution=&filters=&limit=&offset= — Paginated properties within a hexagon
  • GET /api/pois?bounds=&categories= — POIs by bounds (max 5000)
  • GET /api/poi-categories — Available POI category names
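
The bounds= parameter and the AABB filtering mentioned for /api/hexagons and /api/postcodes can be sketched as follows. This is a minimal illustration, assuming the south,west,north,east order noted under Key Implementation Details. The Bounds struct and both function signatures are assumptions for illustration, not the actual contents of parsing/bounds.rs.

```rust
/// Axis-aligned bounding box in degrees (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Bounds {
    pub south: f64,
    pub west: f64,
    pub north: f64,
    pub east: f64,
}

/// Parse a `bounds=` value given as "south,west,north,east".
/// Returns None for a wrong field count, a non-finite field, or inverted edges.
pub fn parse_bounds(raw: &str) -> Option<Bounds> {
    let parts: Vec<f64> = raw
        .split(',')
        .map(|p| p.trim().parse::<f64>().ok())
        .collect::<Option<Vec<f64>>>()?;
    if parts.len() != 4 || parts.iter().any(|v| !v.is_finite()) {
        return None;
    }
    let (south, west, north, east) = (parts[0], parts[1], parts[2], parts[3]);
    if south > north || west > east {
        return None;
    }
    Some(Bounds { south: south, west: west, north: north, east: east })
}

/// Two boxes intersect when they overlap on both axes (touching edges count).
pub fn bounds_intersect(a: &Bounds, b: &Bounds) -> bool {
    a.south <= b.north && a.north >= b.south && a.west <= b.east && a.east >= b.west
}
```

A route handler would parse the query bounds once, then keep only hexagons or postcodes whose own bounding box passes bounds_intersect against it.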

Serves frontend/dist/ as static fallback in production.

Data representation (unified model):

  • All features (numeric and enum): row-major flat Vec<f32>, NaN = null
  • Enum features: stored as f32 indices (0.0, 1.0, 2.0...) with enum_values: FxHashMap<usize, Vec<String>> mapping feature index → string values
  • String fields (address, postcode): interned/packed for memory efficiency
  • The server accepts the parquet path as a CLI argument (defaults to data_sources/processed/wide.parquet)
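
As a rough sketch of the unified model described above, the snippet below stores numeric and enum features in one row-major Vec<f32> and decodes enum cells through a per-feature label table. FeatureTable and its methods are hypothetical names, and std's HashMap stands in for FxHashMap so the example needs no external crate.

```rust
use std::collections::HashMap;

/// Hypothetical container illustrating the unified layout.
pub struct FeatureTable {
    pub num_features: usize,
    /// Row-major: all features of row 0, then all features of row 1, and so on.
    pub feature_data: Vec<f32>,
    /// Feature index -> label list, present only for enum features.
    pub enum_values: HashMap<usize, Vec<String>>,
}

impl FeatureTable {
    /// Raw cell value, or None when the cell is null (stored as NaN).
    pub fn get(&self, row: usize, feat_idx: usize) -> Option<f32> {
        let v = self.feature_data[row * self.num_features + feat_idx];
        if v.is_nan() {
            None
        } else {
            Some(v)
        }
    }

    /// Decode an enum cell to its label.
    /// Returns None for null cells and for features that are not enums.
    pub fn enum_label(&self, row: usize, feat_idx: usize) -> Option<&str> {
        let labels = self.enum_values.get(&feat_idx)?;
        let idx = self.get(row, feat_idx)? as usize;
        labels.get(idx).map(|s| s.as_str())
    }
}
```

Because enum cells are plain f32 indices, filtering and aggregation code can walk one contiguous slice per row without a separate enum code path.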

Frontend (frontend/)

React 18 + TypeScript. deck.gl H3HexagonLayer over MapLibre GL. TailwindCSS. No state management library — pure React hooks.

Architecture:

  • App.tsx — Minimal router: loads features/POI categories, handles page navigation (home/dashboard/data-sources/faq)
  • MapPage.tsx — Dashboard layout: composes map + left/right panes, uses custom hooks for all logic
  • Custom hooks in hooks/ encapsulate stateful logic:
    • useMapData — Hexagon/postcode fetching, bounds, loading state, color range calculation
    • useFilters — Filter state and handlers (add/remove/change/drag/pin)
    • useHexagonSelection — Selection state, area stats, properties fetching
    • usePOIData — POI fetching with debounce
    • usePaneResize — Reusable pane resize handlers
    • useTheme — Theme state with localStorage persistence
    • useUrlSync — URL state synchronization

Key patterns:

  • URL encodes view/filters/POI categories/active tab as query params for shareable links
  • AbortControllers cancel in-flight requests on new queries (150ms debounce)
  • Zoom → H3 resolution defined in consts.ts ZOOM_TO_RESOLUTION_THRESHOLDS: <7.5→5, <9.5→6, <10.5→8, <12→9, ≥12→10
  • POSTCODE_ZOOM_THRESHOLD = 15: below 15 shows H3 hexagons, at/above 15 shows postcode polygons
  • Viewport bounds computed via getBoundsFromViewState() in map-utils.ts — uses Web Mercator math with TILE_SIZE=512 (MapLibre/deck.gl convention, NOT 256)
  • Properties pane uses feature names from API response (human-readable), not hardcoded field names
  • Proxy: dev server on :3001 proxies /api to :8001; also handles VS Code /proxy/PORT patterns

Shared UI Components (frontend/src/components/ui/):

  • icons/ — One file per icon (CloseIcon, InfoIcon, EyeIcon, PlusIcon, ChevronIcon, FilterIcon, LightbulbIcon, DownloadIcon, MapPinIcon, CheckIcon, ClipboardIcon, SunIcon, MoonIcon, SpinnerIcon). All accept className prop. Never inline SVGs — always extract to this folder.
  • IconButton.tsx — Reusable icon button wrapper with consistent hover states. Accepts active prop for teal highlight.
  • SearchInput.tsx — Styled search input with dark mode support. Used in Filters, POIPane, PropertiesPane.
  • PaneHeader.tsx — Reusable pane header with title, optional subtitle, info button, and close button.
  • SelectionButtons.tsx — "All" / "None" selection buttons for checkbox lists.
  • TabButton.tsx — Tab button with active state styling. Used in right pane tabs.
  • EmptyState.tsx — Empty state display with icon, title, description. Also exports PaneEmptyState for centered pane messages.
  • CheckboxList.tsx — Checkbox list with toggle logic. Variants for array and Set-based selection.

Shared Components (frontend/src/components/):

  • FeatureInfoPopup.tsx — Popup showing feature name, description, detail, and "View data source" link.
  • FeatureIcons.tsx — FeatureActions component combining eye/info/add/remove icons for feature rows.

Shared Utilities (frontend/src/lib/):

  • api.ts — apiUrl(endpoint, params?) builds API URLs. logNonAbortError(label, err) and isAbortError(err) for error handling.
  • features.ts — groupFeaturesByCategory(features) groups FeatureMeta[] by their group field.
  • format.ts — formatNumber(value, decimals) for number formatting. calculateHistogramMean(histogram) for weighted mean calculation.
  • property-fields.ts — getNum(property, ...keys) for getting numeric property values with fallback field names.

When adding new UI, prefer using these shared components over inline implementations to maintain consistency.

When to extract vs inline:

  • Extract to hooks/: Stateful logic with useState/useEffect/useCallback that can be named as a cohesive unit (e.g., useFilters, useMapData). If a component has 5+ related state variables and handlers, extract them to a hook.
  • Extract to page component: Layout + hook composition for a major view (e.g., MapPage composes useMapData + useFilters + child components). Keep App.tsx focused on routing.
  • Extract to ui/ component: Repeated 3+ times with same styling (buttons, inputs, icons)
  • Extract to lib/: Pure functions used across components (formatting, calculations, lookups)
  • Keep inline: One-off UI specific to a single component

Component size guideline: If a component exceeds ~300 lines, look for extraction opportunities. Large components are usually doing too much — split into hooks (for logic) and child components (for UI sections).

Naming conventions:

  • UI components: PascalCase, noun-based (TabButton, EmptyState)
  • Utilities: camelCase verb-based (formatNumber, calculateHistogramMean)

Frontend Design Guide (STRICT — must be followed for all UI changes)

The frontend uses Tailwind's darkMode: 'class' strategy. The dark class is toggled on <html>. Every visible element must have both light and dark styles. Never add a light-only color class without its dark: counterpart. Run task build:frontend after any UI change to verify.

Theme System

  • State: App.tsx owns a theme state ('light' | 'dark' | 'system'), persisted in localStorage under the key theme, default 'system'.
  • Effective theme: When 'system', resolved via window.matchMedia('(prefers-color-scheme: dark)'). A change listener re-renders on OS preference flip.
  • Toggle cycle: light → dark → system → light. Three-way, not binary.
  • Flash prevention: index.html contains an inline <script> that applies the dark class before first paint. That script duplicates the localStorage/matchMedia logic in App.tsx, so if either one changes, update the other to match.
  • Prop plumbing: effectiveTheme ('light' | 'dark') is passed as a prop to <Map> and <HomePage>. Components that need the resolved theme must receive it as a prop — do not read localStorage or matchMedia inside child components.

Color Token Reference

Every UI element must use the correct token from this table. Do not invent new pairings.

| Role | Light class | Dark class | Hex (dark) |
|---|---|---|---|
| Page / pane background | bg-warm-50 or bg-white | dark:bg-warm-900 | #1c1917 |
| Card / elevated surface | bg-white | dark:bg-warm-800 | #292524 |
| Inset / recessed surface | bg-warm-100 or bg-warm-50 | dark:bg-warm-800 | #292524 |
| Input / select background | bg-white | dark:bg-warm-800 or dark:bg-warm-900 | |
| Primary border | border-warm-200 | dark:border-warm-700 | #44403c |
| Subtle border (dividers) | border-warm-100 | dark:border-warm-800 | #292524 |
| Primary text (headings) | text-navy-950 or implicit dark | dark:text-warm-100 | #f5f5f4 |
| Body text | text-warm-700 | dark:text-warm-300 | #d6d3d1 |
| Secondary text (labels, hints) | text-warm-500 or text-warm-600 | dark:text-warm-400 | #a8a29e |
| Disabled / placeholder text | text-warm-400 / placeholder-warm-400 | dark:text-warm-500 / dark:placeholder-warm-500 | #78716c |
| Accent text (links, actions) | text-teal-600 | dark:text-teal-400 | #1de4c3 |
| Accent hover text | hover:text-teal-800 | dark:hover:text-teal-300 | #51f7d9 |
| Accent background (highlights) | bg-teal-50 | dark:bg-teal-900/30 | |
| Active ring / focus ring | ring-teal-400 | same (works in both) | |
| Price / key metric text | text-teal-700 | dark:text-teal-400 | |
| Remove / close button | text-warm-400 hover:text-warm-700 | dark:hover:text-warm-300 | |
| Checkbox accent | accent-teal-600 | same (works in both) | |
| Header (unchanged both modes) | bg-navy-900 text-white | same | |

Mapping Rules for Specific Contexts

Sidebars (Filters, POIPane, PropertiesPane, right-pane tabs):

  • Container: bg-white dark:bg-warm-900
  • Inner cards / dropdown menus: bg-white dark:bg-warm-800
  • Borders: border-warm-200 dark:border-warm-700
  • Tab text (active): add dark:text-warm-100
  • Tab text (inactive): text-warm-600 dark:text-warm-400

Map overlays (PostcodeSearch, MapLegend, POI popup, loading indicator):

  • Background: bg-white dark:bg-warm-800
  • Text: dark:text-warm-200
  • Semi-transparent variants: use /90 opacity suffix (e.g. dark:bg-warm-800/90)
  • Deck.gl tooltip (inline styles, not Tailwind): use #292524 bg / #e7e5e4 text / rgba(0,0,0,0.5) shadow in dark.
  • Deck.gl postcode labels (RGB arrays): [220,220,220,220] text / [30,30,30,200] outline in dark; inverse in light.

Map basemaps:

  • Self-hosted Protomaps tiles served from PMTiles via /api/tiles/{z}/{x}/{y}
  • Style built by @protomaps/basemaps library with namedFlavor(theme) for light/dark
  • Font glyphs and twemoji PNGs served locally from frontend/public/assets/ (no external CDN deps at runtime)
  • CopyWebpackPlugin copies frontend/public/ → dist/ on build; Rust ServeDir fallback serves them in prod
  • Download assets with task download:map-assets (script: pipeline/download/map_assets.py)

HomePage (landing page):

  • Page bg: bg-warm-50 dark:bg-warm-900
  • Cards: bg-white dark:bg-warm-800 with border-warm-200 dark:border-warm-700
  • Backdrop-blur panels: use /60 or /40 opacity on both bg-warm-50 and dark:bg-warm-900
  • HexCanvas: reads isDark ref; uses dimmer fill (#058172) and stroke (#0a665b) at 60% opacity multiplier.
  • All headings: dark:text-warm-100. All body: dark:text-warm-300 or dark:text-warm-400.

DataSourcesPage:

  • Same card pattern as above. Footer is already dark (bg-navy-900) — no changes needed.
  • License badges: bg-warm-100 dark:bg-warm-700 text-warm-600 dark:text-warm-300
  • Links: text-teal-600 dark:text-teal-400

DataSources floating button (on map):

  • bg-white/90 dark:bg-warm-800/90 with text-teal-600 dark:text-teal-400

Rules for New Components

  1. Every bg-white needs dark:bg-warm-800 or dark:bg-warm-900. Pane-level = warm-900, card-level = warm-800.
  2. Every border-warm-200 needs dark:border-warm-700.
  3. Every text-warm-* needs a dark:text-warm-* counterpart. Follow the token table — don't guess.
  4. Every text-teal-600 needs dark:text-teal-400. Every hover:text-teal-800 needs dark:hover:text-teal-300.
  5. Every bg-teal-50 needs dark:bg-teal-900/30.
  6. Every hover:bg-warm-50 needs dark:hover:bg-warm-700 or dark:hover:bg-warm-800.
  7. Inputs and selects: always add dark:bg-warm-800 dark:text-warm-200 dark:border-warm-700. Placeholders get dark:placeholder-warm-500.
  8. Checkboxes: always include accent-teal-600 rounded.
  9. Do not use Tailwind dark: classes inside deck.gl layers or canvas code. Use the theme prop / ref and conditional JS values.
  10. Do not add transition-* classes for theme switching. The global CSS rule in index.css handles transitions for background-color, border-color, and color on all standard HTML elements. Adding per-element transition classes will conflict.
  11. Never hardcode hex colors in JSX style= props for themed elements (except deck.gl tooltip and canvas, which can't use Tailwind). Use the Tailwind classes from the token table instead.
  12. The header (bg-navy-900) is identical in both themes. Do not add dark variants to it.

Verification Checklist (for any UI PR)

  • task build:frontend passes with no errors
  • Every new bg-*, text-*, border-* class has a dark: counterpart (search your diff)
  • Toggle through all three modes (light → dark → system) with no flash
  • Map basemap switches when theme changes
  • Sidebars, dropdowns, and popups are readable in both modes
  • HomePage and DataSourcesPage adapt correctly

Coding Preferences

  • Unified data models over special-casing: Prefer storing different data types uniformly (e.g., enums as f32 indices alongside numeric features) rather than maintaining separate code paths
  • Terse tests: Test what matters in as few tests as possible — don't overcomplicate with excessive setup or edge cases that don't add value
  • Extract and organize: Group related utilities into proper modules (e.g., utils/, parsing/) rather than leaving helpers scattered
  • Inline module tests: Place #[cfg(test)] mod tests { } at the bottom of each module file rather than in separate test files
  • Decompose large React components: Extract stateful logic into custom hooks (useXxx), extract page layouts into page components. App.tsx should only handle routing and initial data loading. Each hook should encapsulate one cohesive concern (e.g., useFilters owns filter state + all filter handlers).
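
The inline-test and terse-test preferences above might look like this in a module file. The helper normalize_postcode is a hypothetical function invented for the example; only the placement of the #[cfg(test)] block at the bottom of the file reflects the stated convention.

```rust
/// Hypothetical helper: uppercase a postcode and strip all whitespace.
pub fn normalize_postcode(raw: &str) -> String {
    raw.chars()
        .filter(|c| !c.is_whitespace())
        .flat_map(|c| c.to_uppercase())
        .collect()
}

// Tests live at the bottom of the same file, not in a separate tests/ file.
#[cfg(test)]
mod tests {
    use super::*;

    // One terse test covering the behaviours that matter.
    #[test]
    fn normalizes_case_and_whitespace() {
        assert_eq!(normalize_postcode(" sw1a 1aa "), "SW1A1AA");
        assert_eq!(normalize_postcode(""), "");
    }
}
```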

Rust Code Style (server-rs)

Follow these conventions in all Rust code:

  1. Module style: Use Rust 2018 module naming — foo.rs + foo/ directory, NOT foo/mod.rs
  2. Imports over inline paths: Import items at the top of the file, don't use crate:: inline in code
    // Good
    use crate::utils::generate_priorities;
    let p = generate_priorities(n);
    
    // Bad
    let p = crate::utils::generate_priorities(n);
    
  3. Tracing macros: Import and use short form, not fully qualified
    // Good
    use tracing::{info, warn};
    info!("message");
    
    // Bad
    tracing::info!("message");
    
  4. JSON serialization: Use serde_json with #[derive(Serialize)] structs, not manual string building
  5. Precompute at startup: For static/rarely-changing responses, compute once at startup and store in AppState
  6. Unique placeholders: When injecting content into HTML, use distinctive markers like __PERFECT_POSTCODES_OG_TAGS__ that won't accidentally match other content
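
To illustrate rule 6, here is a minimal sketch of marker-based injection using only the standard library. The function name inject_at_marker and its Option return are assumptions; how the real middleware locates and replaces the marker is not shown in this file.

```rust
/// Distinctive marker, unlikely to collide with real page content.
pub const OG_MARKER: &str = "__PERFECT_POSTCODES_OG_TAGS__";

/// Replace the first occurrence of `marker` in `html` with `content`.
/// Returns None when the marker is absent, so a template that silently
/// lost its placeholder is noticed instead of being served unchanged.
pub fn inject_at_marker(html: &str, marker: &str, content: &str) -> Option<String> {
    if !html.contains(marker) {
        return None;
    }
    Some(html.replacen(marker, content, 1))
}
```

A short generic marker such as {{tags}} could also appear in inlined scripts or user-visible text, which is the accidental match the rule is guarding against.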

Key Implementation Details

  • Spatial sort: Rows sorted by 0.01° grid cell at load time for cache-friendly sequential access
  • Row-major layout: feature_data[row * num_features + feat_idx] — all features (numeric and enum) for one property are contiguous
  • H3 precomputation: Resolutions 4–12 computed in parallel (rayon) at startup
  • Histogram percentiles without sorting: O(n) two-pass algorithm — build histogram, interpolate percentiles
  • Startup precomputation: Static responses (like /api/features) are computed once at startup and cached in AppState
  • POI transform validation: Fails if any OSM category is unmapped — guarantees exhaustive coverage
  • Fuzzy join: Groups by postcode, uses thefuzz.token_sort_ratio with numeric token compatibility, greedy assignment from highest score
  • Filter bounds format: south,west,north,east (not standard bbox order)
  • Server-side AABB filtering: Both /api/hexagons and /api/postcodes filter results by bounding-box intersection with query bounds. Hexagons use h3_cell_bounds() (h3o returns degrees, not radians). Postcodes compute polygon AABB from vertices. See bounds_intersect() in parsing/bounds.rs.
  • GridIndex returns slightly more than requested: The 0.01° grid cells mean properties up to ~1km outside the viewport may be returned. The AABB filter in the route handlers catches these extras.
  • POI proximity: Uses 0.05° grid (~5km cells) to reduce candidates before haversine distance check
  • OG tag injection: Uses <meta name="x-og-placeholder" content="__PERFECT_POSTCODES_OG_TAGS__"/> placeholder in HTML, replaced at runtime by middleware
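
The histogram-based percentile idea listed above can be sketched as below. This is one plausible reading of "build histogram, interpolate percentiles", not the server's code: the function name, the choice of f64 outputs, and the linear interpolation inside a bin are all assumptions.

```rust
/// Approximate percentiles (given in 0..=100) with two O(n) passes and no sort.
/// NaN values are treated as nulls and skipped.
/// Returns None when there are no non-null values or `bins` is zero.
pub fn histogram_percentiles(values: &[f32], bins: usize, percentiles: &[f64]) -> Option<Vec<f64>> {
    // Pass 1: min, max and count of non-null values.
    let mut min = f64::INFINITY;
    let mut max = f64::NEG_INFINITY;
    let mut n = 0usize;
    for &v in values {
        if v.is_nan() {
            continue;
        }
        min = min.min(v as f64);
        max = max.max(v as f64);
        n += 1;
    }
    if n == 0 || bins == 0 {
        return None;
    }
    if min == max {
        return Some(vec![min; percentiles.len()]);
    }

    // Pass 2: fixed-width histogram over [min, max].
    let width = (max - min) / bins as f64;
    let mut counts = vec![0usize; bins];
    for &v in values {
        if v.is_nan() {
            continue;
        }
        let b = ((v as f64 - min) / width) as usize;
        counts[b.min(bins - 1)] += 1; // the max value lands in the last bin
    }

    // Walk cumulative counts; interpolate linearly inside the target bin.
    let result: Vec<f64> = percentiles
        .iter()
        .map(|&p| {
            let target = p / 100.0 * n as f64;
            let mut cum = 0.0;
            for (i, &c) in counts.iter().enumerate() {
                let c = c as f64;
                if c > 0.0 && cum + c >= target {
                    let frac = (target - cum) / c;
                    return min + (i as f64 + frac) * width;
                }
                cum += c;
            }
            max
        })
        .collect();
    Some(result)
}
```

Calling it with percentiles &[2.0, 98.0] would give bounds of the kind /api/features reports. Accuracy is limited by the bin width, which is the trade-off for avoiding an O(n log n) sort.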

Rust Performance Patterns (server-rs)

Lookup optimization:

  • AppState.feature_name_to_index: FxHashMap<String, usize> for O(1) feature lookups (used in filter parsing, field selection)
  • Never use .position() on feature_names in hot paths — always use the prebuilt HashMap
  • Enum filters use FxHashSet<u32> (f32 bits) for O(1) contains checks instead of Vec::contains
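
A small sketch of both lookup patterns, assuming nothing about the real AppState beyond what is listed above. The function names are hypothetical, and std's HashMap and HashSet stand in for FxHashMap and FxHashSet to keep the example dependency-free.

```rust
use std::collections::{HashMap, HashSet};

/// Build the name -> column index map once at startup, so request handlers
/// never scan feature_names with .position().
pub fn build_feature_index(feature_names: &[String]) -> HashMap<String, usize> {
    feature_names
        .iter()
        .enumerate()
        .map(|(i, name)| (name.clone(), i))
        .collect()
}

/// f32 implements neither Hash nor Eq, so allowed enum indices are stored
/// as their raw bit patterns.
pub fn build_enum_filter(allowed: &[f32]) -> HashSet<u32> {
    allowed.iter().map(|v| v.to_bits()).collect()
}

/// O(1) membership test for one enum cell.
/// Null cells (NaN) fail unless a NaN bit pattern was explicitly allowed.
pub fn passes_enum_filter(cell: f32, allowed_bits: &HashSet<u32>) -> bool {
    allowed_bits.contains(&cell.to_bits())
}
```

Comparing bit patterns is exact, which suits enum indices because they are always written as whole non-negative values such as 0.0, 1.0, 2.0.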

Hot loop patterns:

  • Hoist conditional branches outside loops when possible (e.g., if has_selective check moved outside aggregation loop in hexagons.rs)
  • Use into_par_iter() for file I/O (postcode GeoJSON loading) and CPU-bound startup work (H3 precomputation)

Cardinality counting:

  • Use FxHashSet with f32::to_bits() for O(n) unique value counting instead of collect→sort→dedup O(n log n)
  • For enum ordering, convert order slice to FxHashSet before filtering to get O(1) contains
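
The unique-value counting pattern can be sketched in a few lines. count_unique is a hypothetical name, std's HashSet stands in for FxHashSet, and skipping NaN is an assumption based on NaN being the null marker.

```rust
use std::collections::HashSet;

/// Count distinct non-null values in one pass.
/// Values are hashed by bit pattern because f32 is not Hash.
pub fn count_unique(values: &[f32]) -> usize {
    let mut seen: HashSet<u32> = HashSet::new();
    for &v in values {
        if !v.is_nan() {
            seen.insert(v.to_bits());
        }
    }
    seen.len()
}
```

One caveat of hashing bits: 0.0 and -0.0 count as two values. That should not matter for enum indices, but it may for signed numeric features.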

Data structure choices:

  • CSR (Compressed Sparse Row) for GridIndex — single flat values array + offsets array eliminates per-cell Vec overhead
  • Box<[f32]> for fixed-size aggregation arrays — avoids Vec capacity field (8 bytes saved per cell)
  • Bit-packed booleans for flags like is_approx_build_date — 8x memory savings vs Vec<bool>
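
To make the CSR layout concrete, here is a minimal sketch of a grid index built in three passes (bounding box, count, fill). The struct fields, method names, and clamping behaviour are assumptions for illustration, not the server's actual GridIndex.

```rust
/// Hypothetical CSR grid index over (lat, lon) points.
pub struct GridIndex {
    min_lat: f64,
    min_lon: f64,
    cell: f64,
    rows: usize,
    cols: usize,
    /// values[offsets[c]..offsets[c + 1]] holds the row ids that fall in cell c.
    offsets: Vec<u32>,
    values: Vec<u32>,
}

/// Map a coordinate to a cell index along one axis, clamped into 0..n.
fn axis_cell(v: f64, min: f64, cell: f64, n: usize) -> usize {
    (((v - min) / cell).max(0.0) as usize).min(n - 1)
}

impl GridIndex {
    pub fn build(points: &[(f64, f64)], cell: f64) -> GridIndex {
        // Pass 1: bounding box of all points.
        let (mut min_lat, mut max_lat) = (f64::INFINITY, f64::NEG_INFINITY);
        let (mut min_lon, mut max_lon) = (f64::INFINITY, f64::NEG_INFINITY);
        for &(lat, lon) in points {
            min_lat = min_lat.min(lat);
            max_lat = max_lat.max(lat);
            min_lon = min_lon.min(lon);
            max_lon = max_lon.max(lon);
        }
        if points.is_empty() {
            min_lat = 0.0;
            max_lat = 0.0;
            min_lon = 0.0;
            max_lon = 0.0;
        }
        let rows = ((max_lat - min_lat) / cell) as usize + 1;
        let cols = ((max_lon - min_lon) / cell) as usize + 1;
        let cell_of = |lat: f64, lon: f64| {
            axis_cell(lat, min_lat, cell, rows) * cols + axis_cell(lon, min_lon, cell, cols)
        };

        // Pass 2: count points per cell, then prefix-sum the counts into offsets.
        let mut offsets = vec![0u32; rows * cols + 1];
        for &(lat, lon) in points {
            offsets[cell_of(lat, lon) + 1] += 1;
        }
        for i in 1..offsets.len() {
            offsets[i] += offsets[i - 1];
        }

        // Pass 3: write each row id into its cell's slice of the flat array.
        let mut cursor = offsets.clone();
        let mut values = vec![0u32; points.len()];
        for (row, &(lat, lon)) in points.iter().enumerate() {
            let c = cell_of(lat, lon);
            values[cursor[c] as usize] = row as u32;
            cursor[c] += 1;
        }

        GridIndex {
            min_lat: min_lat,
            min_lon: min_lon,
            cell: cell,
            rows: rows,
            cols: cols,
            offsets: offsets,
            values: values,
        }
    }

    /// Row ids in every cell touched by the box: a superset of the exact
    /// answer, so callers still need a precise bounds check afterwards.
    pub fn query(&self, south: f64, west: f64, north: f64, east: f64) -> Vec<u32> {
        let r0 = axis_cell(south, self.min_lat, self.cell, self.rows);
        let r1 = axis_cell(north, self.min_lat, self.cell, self.rows);
        let c0 = axis_cell(west, self.min_lon, self.cell, self.cols);
        let c1 = axis_cell(east, self.min_lon, self.cell, self.cols);
        let mut out = Vec::new();
        for r in r0..(r1 + 1) {
            // Cells c0..=c1 of one grid row are adjacent in CSR order,
            // so they form a single contiguous slice of `values`.
            let start = self.offsets[r * self.cols + c0] as usize;
            let end = self.offsets[r * self.cols + c1 + 1] as usize;
            out.extend_from_slice(&self.values[start..end]);
        }
        out
    }
}
```

The two flat arrays replace one Vec per cell, which is where the per-cell overhead saving comes from. The cursor copy in pass 3 costs memory proportional to the number of cells rather than the number of points.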

What NOT to optimize:

  • String cloning in JSON responses (~10-20 small strings) — negligible vs serialization overhead
  • GridIndex 3-pass build (min/max → count → fill) — necessary for CSR without O(n) extra memory
  • Arc for enum values — complexity not worth modest benefit