perfect-postcode/CLAUDE.md
2026-01-31 10:19:48 +00:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Property Map is a full-stack geospatial application for visualizing UK property data on an interactive map. It combines Land Registry price-paid data, EPC energy certificates, postcode geolocation, TfL journey times, English Indices of Deprivation (IoD) scores, and OpenStreetMap POIs into a single wide parquet file, then serves aggregated H3 hexagon statistics and POI data via a Rust backend.

Commands

All commands use the Task runner:

task prepare        # Full setup: install deps, download data (on the order of GBs), run pipeline
task server         # Rust backend on :8001 (cargo run --release)
task frontend       # Webpack dev server on :3030 (proxies /api to :8001)

task lint           # Lint Python (ruff) + TypeScript (ESLint + Prettier)
task format         # Auto-fix formatting (ruff + ESLint + Prettier)
task typecheck      # TypeScript type checking
task check          # All checks (lint + typecheck + build)
task test           # Run Python tests (fuzzy join)
task build          # Build frontend for production

Python commands use uv run. Frontend commands use npm run from frontend/.

Architecture

Data Pipeline (pipeline/)

Python + Polars. Orchestrated by pipeline/run.py, which builds data_sources/processed/wide.parquet:

  1. Download (pipeline/download/) — Fetches raw data into data_sources/:
    • arcgis.py — Postcode → lat/lon/LSOA mappings
    • price_paid.py — Land Registry price-paid records
    • pois/ — OpenStreetMap POIs via osmium (PBF parsing)
    • deprivation_data.py — English Indices of Deprivation 2025
  2. Join (pipeline/epc_pp.py) — Fuzzy-joins EPC certificates with price-paid by address within postcode buckets → epc_pp.parquet
  3. Widen (pipeline/run.py) — Joins epc_pp with GPS coordinates, journey times, IoD scores and POI proximity counts, and derives price_per_sqm and a numeric construction_age_band
  4. Transform POIs (pipeline/download/pois/transform.py) — Drops unwanted categories, remaps to friendly names + emoji → filtered_uk_pois.parquet
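Step 4's exhaustive-coverage rule (the pipeline fails if any OSM category is unmapped) can be sketched in Python as below. This is a minimal illustration, not the code in pipeline/download/pois/transform.py: the DROPPED and REMAP tables, the "category" key and the UnmappedCategoryError name are all hypothetical.

```python
"""Sketch of a POI transform with an exhaustive category-coverage check.

Hypothetical tables: DROPPED holds unwanted categories, REMAP maps each
kept category to (friendly name, emoji). The real tables are not shown.
"""

DROPPED = {"amenity=bench", "amenity=waste_basket"}
REMAP = {
    "amenity=cafe": ("Cafe", "☕"),
    "shop=supermarket": ("Supermarket", "🛒"),
    "railway=station": ("Train station", "🚆"),
}


class UnmappedCategoryError(ValueError):
    """Raised when an OSM category is neither dropped nor remapped."""


def transform_pois(pois):
    """Drop unwanted categories and remap the rest, failing on any gap.

    `pois` is an iterable of dicts with at least a "category" key.
    """
    pois = list(pois)
    seen = {p["category"] for p in pois}
    unmapped = sorted(seen - DROPPED - set(REMAP))
    if unmapped:
        # Fail the whole run rather than silently losing POIs.
        raise UnmappedCategoryError(f"unmapped OSM categories: {unmapped}")

    out = []
    for p in pois:
        if p["category"] in DROPPED:
            continue
        name, emoji = REMAP[p["category"]]
        out.append({**p, "category": name, "emoji": emoji})
    return out
```

Checking the full set of categories before transforming means a newly appearing OSM tag stops the run instead of quietly vanishing from filtered_uk_pois.parquet.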

Shared utilities live in pipeline/utils/ (haversine distance for both numpy and Polars expressions, fuzzy address matching).
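The fuzzy address matching behind the Join step (a token-sort similarity plus a numeric-token check, compared only within a postcode bucket) might look roughly like this. Assumptions: difflib.SequenceMatcher stands in for thefuzz, so scores will not equal fuzz.token_sort_ratio exactly; the "same set of numeric tokens" rule and the threshold of 90 are guesses at what the compatibility check does; fuzzy_join, best_match and the (postcode, address) tuple layout are hypothetical.

```python
"""Sketch of fuzzy address matching within postcode buckets.

Assumed, not taken from pipeline/utils/: difflib replaces thefuzz, the
numeric-compatibility rule is "identical numeric token sets", and the
score threshold is 90.
"""
import re
from collections import defaultdict
from difflib import SequenceMatcher

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokens(address):
    """Lowercase and split on anything that is not a letter or digit."""
    return TOKEN_RE.findall(address.lower())


def token_sort_ratio(a, b):
    """Similarity (0..100) of the two addresses after sorting tokens."""
    sa = " ".join(sorted(tokens(a)))
    sb = " ".join(sorted(tokens(b)))
    return round(100 * SequenceMatcher(None, sa, sb).ratio())


def numeric_tokens(address):
    """Tokens containing a digit, e.g. house or flat numbers like '3a'."""
    return {t for t in tokens(address) if any(c.isdigit() for c in t)}


def best_match(address, candidates, threshold=90):
    """Best-scoring candidate with compatible numbers, or None."""
    nums = numeric_tokens(address)
    best, best_score = None, threshold - 1
    for cand in candidates:
        if numeric_tokens(cand) != nums:
            continue  # "Flat 2" must never match "Flat 3"
        score = token_sort_ratio(address, cand)
        if score > best_score:
            best, best_score = cand, score
    return best


def fuzzy_join(epc_rows, pp_rows):
    """Match (postcode, address) rows, comparing only within a postcode."""
    pp_by_postcode = defaultdict(list)
    for postcode, address in pp_rows:
        pp_by_postcode[postcode].append(address)
    matches = []
    for postcode, address in epc_rows:
        match = best_match(address, pp_by_postcode.get(postcode, []))
        if match is not None:
            matches.append((postcode, address, match))
    return matches
```

Bucketing by postcode keeps each candidate set small, and because buckets are independent they can be processed in parallel (for example with a process pool); this sketch handles them sequentially.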

Backend (server-rs/)

Rust + Axum. Loads wide.parquet and filtered_uk_pois.parquet into memory at startup, with precomputed H3 indices (resolutions 7 to 11) and grid-based spatial indices (0.01° cells).
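A grid spatial index of this kind can be sketched as follows: rows are bucketed by floor(coordinate / 0.01), and a bounding-box query visits only the cells that overlap the box. This Python version only illustrates the idea; the real index is Rust code in server-rs/, and the GridIndex class and its query signature are hypothetical.

```python
"""Sketch of a grid-based spatial index with 0.01-degree cells."""
import math
from collections import defaultdict

CELL_DEG = 0.01


def cell_of(lat, lon, cell=CELL_DEG):
    """Integer (row, col) of the grid cell containing the point."""
    return (math.floor(lat / cell), math.floor(lon / cell))


class GridIndex:
    def __init__(self, points, cell=CELL_DEG):
        """`points` is a sequence of (lat, lon); cells store row ids."""
        self.cell = cell
        self.points = list(points)
        self.cells = defaultdict(list)
        for i, (lat, lon) in enumerate(self.points):
            self.cells[cell_of(lat, lon, cell)].append(i)

    def query(self, south, west, north, east):
        """Sorted row ids of points inside the box (edges inclusive)."""
        r0, c0 = cell_of(south, west, self.cell)
        r1, c1 = cell_of(north, east, self.cell)
        hits = []
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                for i in self.cells.get((r, c), ()):
                    lat, lon = self.points[i]
                    # Edge cells can hold points just outside the box.
                    if south <= lat <= north and west <= lon <= east:
                        hits.append(i)
        return sorted(hits)
```

The cost of a query grows with the number of cells the box covers rather than the total row count, which is why the cell size is a trade-off between per-cell list length and the number of cells visited.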

API endpoints:

  • GET /api/features — Numeric column metadata with histograms and percentiles
  • GET /api/hexagons — H3 aggregates filtered by bounds, resolution, and feature min/max
  • GET /api/pois — POIs by bounds with optional category filter (at most 5,000 results)
  • GET /api/poi-categories — Available POI categories

Also serves frontend/dist/ as static fallback.
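The filtering and capping behind /api/hexagons can be sketched like this, assuming each row already stores its precomputed H3 cell id for the requested resolution (so resolution selection is just picking which id to group by). The row dictionaries, the aggregate_hexagons name, the count/mean aggregate and the response shape are hypothetical; only the bounds and feature min/max filters, the 50,000 cap and the truncated flag come from this document.

```python
"""Sketch of H3 aggregation with bounds/feature filters and a result cap."""
from collections import defaultdict

MAX_HEXAGONS = 50_000


def aggregate_hexagons(rows, bounds, filters, value_col, limit=MAX_HEXAGONS):
    """Filter rows, group by precomputed cell id, and cap the output.

    rows:    iterable of dicts with "h3", "lat", "lon" and feature keys
    bounds:  (south, west, north, east)
    filters: {feature: (min, max)}, both ends inclusive
    """
    south, west, north, east = bounds
    count = defaultdict(int)
    total = defaultdict(float)
    for row in rows:
        if not (south <= row["lat"] <= north and west <= row["lon"] <= east):
            continue
        if any(not (lo <= row[f] <= hi) for f, (lo, hi) in filters.items()):
            continue
        count[row["h3"]] += 1
        total[row["h3"]] += row[value_col]

    cells = sorted(count)  # deterministic order before truncating
    truncated = len(cells) > limit
    hexagons = [
        {"h3": h, "count": count[h], "mean": total[h] / count[h]}
        for h in cells[:limit]
    ]
    return {"hexagons": hexagons, "truncated": truncated}
```

Sorting cell ids before truncating keeps a capped response deterministic; the real server may order or select hexagons differently.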

Frontend (frontend/)

React 18 + TypeScript SPA. deck.gl H3HexagonLayer over a MapLibre GL basemap. API calls are debounced by 150 ms on viewport changes. TailwindCSS for styling.
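To illustrate the 150 ms debounce, here is a deterministic Python simulation (not the frontend's TypeScript code) of a trailing-edge debounce: every viewport event restarts the timer, and a request is sent only once the viewport has been still for 150 ms. Whether the frontend debounces on the leading or trailing edge is an assumption, and debounce_fire_times is a hypothetical name.

```python
"""Simulation of a trailing-edge debounce; times are in milliseconds."""


def debounce_fire_times(event_times_ms, wait_ms=150):
    """Return the times at which a debounced request would be sent."""
    fires = []
    events = sorted(event_times_ms)
    for i, t in enumerate(events):
        nxt = events[i + 1] if i + 1 < len(events) else None
        # The timer started at t survives only if the next event arrives
        # no earlier than its expiry at t + wait_ms.
        if nxt is None or nxt - t >= wait_ms:
            fires.append(t + wait_ms)
    return fires
```

For a pan gesture that emits events at 0, 40, 80 and 120 ms and then stops, only one request goes out, at 270 ms.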

Key Implementation Details

  • Bounds quantized to 0.01° to improve cache hits on both backend and frontend
  • H3 hexagon results capped at 50,000 per request (the response carries a truncated flag when the cap is hit)
  • POI proximity counting uses a spatial grid (0.05° cells, about 5.5 km north-south and narrower east-west at UK latitudes) to avoid O(n×m) distance checks
  • Fuzzy address matching uses thefuzz's fuzz.token_sort_ratio with numeric token compatibility checks, parallelized across postcode buckets
  • The Rust server writes JSON via direct string buffer (avoids serde_json::Value allocations)
  • POI transform validates exhaustive category coverage — pipeline fails if any OSM category is unmapped
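The grid approach to POI proximity counts can be sketched as: bucket POIs into 0.05° cells, then for each property scan only the cells its search radius can reach, with an exact haversine check on each candidate. Assumptions: count_nearby, the 1 km default radius and the one-cell slack are illustrative, and the haversine helper is written out here rather than imported from pipeline/utils/.

```python
"""Sketch of grid-accelerated POI proximity counting."""
import math
from collections import defaultdict

CELL_DEG = 0.05
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def count_nearby(properties, pois, radius_km=1.0, cell=CELL_DEG):
    """For each (lat, lon) property, count POIs within radius_km."""
    grid = defaultdict(list)
    for lat, lon in pois:
        grid[(math.floor(lat / cell), math.floor(lon / cell))].append((lat, lon))

    # Latitude half-span of the search circle, in degrees.
    deg_lat = math.degrees(radius_km / EARTH_RADIUS_KM)
    counts = []
    for lat, lon in properties:
        # Longitude degrees shrink with cos(latitude), so the circle spans
        # more cells east-west; +1 cell of slack covers rounding/curvature.
        deg_lon = deg_lat / max(math.cos(math.radians(lat)), 1e-6)
        reach_lat = math.ceil(deg_lat / cell) + 1
        reach_lon = math.ceil(deg_lon / cell) + 1
        r, c = math.floor(lat / cell), math.floor(lon / cell)
        n = 0
        for dr in range(-reach_lat, reach_lat + 1):
            for dc in range(-reach_lon, reach_lon + 1):
                for plat, plon in grid.get((r + dr, c + dc), ()):
                    if haversine_km(lat, lon, plat, plon) <= radius_km:
                        n += 1
        counts.append(n)
    return counts
```

Each property inspects only a few cells instead of all m POIs, which is where the saving over an O(n×m) scan comes from; the counts equal a brute-force scan because every candidate still gets an exact distance check.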

Data Sources

  • Land Registry — Price Paid bulk download
  • EPC — Energy Performance Certificates (domestic)
  • ArcGIS — Postcode → GPS/LSOA lookup
  • OpenStreetMap — POIs from Geofabrik Great Britain PBF
  • IoD 2025 — English Indices of Deprivation (LSOA-level scores)
  • TfL API — Journey time calculations to configurable destinations