Andras Schmelczer 2026-02-14 12:53:29 +00:00
parent 3a3f899ea2
commit 128b3191e7
68 changed files with 28060 additions and 1152 deletions


@ -102,13 +102,13 @@ Rust + Axum. Loads parquet into memory at startup.
- `GET /api/pois?bounds=&categories=` — POIs by bounds (max 5000)
- `GET /api/poi-categories` — Available POI category names
Serves `frontend/dist/` as static fallback in production.
Serves `frontend/dist/` as static fallback in production **only** when `--dist` is explicitly provided. When `--dist` is set, the server panics at startup if `index.html` is unreadable. When omitted (dev mode), static serving and OG injection are disabled.
**Data representation (unified model):**
- All features (numeric and enum): row-major flat `Vec<f32>`, NaN = null
- Enum features: stored as f32 indices (0.0, 1.0, 2.0...) with `enum_values: FxHashMap<usize, Vec<String>>` mapping feature index → string values
- String fields (address, postcode): interned/packed for memory efficiency
- The server accepts the parquet path as a CLI argument (defaults to `data_sources/processed/wide.parquet`)
- All CLI args are required (no hidden defaults). Optional services use `Option<String>`: `r5_url` (travel time disabled when None), `pocketbase_admin_email`/`password` (collection auto-creation skipped when None). Required config like `ollama_model` and `public_url` must be explicitly provided via env or CLI.
### Frontend (`frontend/`)
@ -127,7 +127,7 @@ React 18 + TypeScript. deck.gl `H3HexagonLayer` over MapLibre GL. TailwindCSS. N
- `useUrlSync` — URL state synchronization
**Key patterns:**
- URL encodes view/filters/POI categories/active tab as query params for shareable links. Only the current format is supported — no legacy parameter parsing (old `v=`, `f=`, or tab abbreviations are not handled).
- URL encodes view/filters/POI categories/active tab as query params for shareable links. Only the current format is supported — no legacy parameter parsing (old `v=`, `f=`, or tab abbreviations are not handled). `tmode` is always serialized when travel time is active (no implicit default); parsing throws if `tmode` is missing when `dest` is present.
- AbortControllers cancel in-flight requests on new queries (150ms debounce)
- Zoom → H3 resolution defined in `consts.ts` `ZOOM_TO_RESOLUTION_THRESHOLDS`: `<7.5→5, <9.5→6, <10.5→8, <12→9, ≥12→10`
- `POSTCODE_ZOOM_THRESHOLD = 15`: below 15 shows H3 hexagons, at/above 15 shows postcode polygons
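The zoom thresholds listed above amount to a small lookup. The sketch below is a Python illustration only: the real mapping lives in the TypeScript `consts.ts`, and the names `ZOOM_UPPER_BOUNDS`, `RESOLUTIONS`, `zoom_to_resolution` and `layer_for_zoom` are hypothetical.

```python
import bisect

# Exclusive upper zoom bounds and the H3 resolution used below each bound,
# copied from the thresholds listed above. Names are illustrative assumptions.
ZOOM_UPPER_BOUNDS = [7.5, 9.5, 10.5, 12.0]
RESOLUTIONS = [5, 6, 8, 9, 10]  # the last entry applies at zoom >= 12
POSTCODE_ZOOM_THRESHOLD = 15


def zoom_to_resolution(zoom: float) -> int:
    """Return the H3 resolution for a map zoom level."""
    # bisect_right counts the bounds that are <= zoom, so zoom == 7.5 maps to 6.
    return RESOLUTIONS[bisect.bisect_right(ZOOM_UPPER_BOUNDS, zoom)]


def layer_for_zoom(zoom: float) -> str:
    """Below the postcode threshold show hexagons, at or above it show postcodes."""
    return "postcodes" if zoom >= POSTCODE_ZOOM_THRESHOLD else "hexagons"
```

Note that resolution 7 is skipped on purpose: the listed thresholds jump from 6 to 8.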
@ -271,7 +271,12 @@ Every UI element must use the correct token from this table. Do not invent new p
## Coding Preferences
- **No backwards compatibility, no silent fallbacks**: Never add fallback codepaths for old data formats, legacy URL parameters, or alternate field names. Never silently swallow errors — always error loudly (return an error, panic, or at minimum log). If something is wrong, the code should fail visibly. One canonical name per field, one format per API, one way to do things.
- **No backwards compatibility, no silent fallbacks**: Never add fallback codepaths for old data formats, legacy URL parameters, or alternate field names. Never silently swallow errors — always error loudly (return an error, panic, or at minimum log). If something is wrong, the code should fail visibly. One canonical name per field, one format per API, one way to do things. Specific patterns:
- Use `Option<String>` for truly optional config, never `default_value = ""` with `.is_empty()` checks
- Use `expect()` not `unwrap_or(0.0)` when a value is logically guaranteed to be present
- Return error responses on upstream failures, never silently drop results
- Don't add `#[serde(default)]` on `Option<T>` fields — serde already defaults them to `None`
- Required query params should be non-Option types so serde rejects missing params with 400 automatically
- **Unified data models over special-casing**: Prefer storing different data types uniformly (e.g., enums as f32 indices alongside numeric features) rather than maintaining separate code paths
- **Terse tests**: Test what matters in as few tests as possible — don't overcomplicate with excessive setup or edge cases that don't add value
- **Extract and organize**: Group related utilities into proper modules (e.g., `utils/`, `parsing/`) rather than leaving helpers scattered
@ -316,6 +321,7 @@ Follow these conventions in all Rust code:
- **Fuzzy join**: Groups by postcode, uses `thefuzz.token_sort_ratio` with numeric token compatibility, greedy assignment from highest score
- **Filter parsing is strict**: `parse_filters()` returns `Result` — malformed entries, unknown feature names, and unparseable numbers all return 400 Bad Request. No silent skipping of invalid filters.
- **Data loading is strict**: `extract_string_col` and `lookup_enum_value` take a single column name (no fallback names). H3 precomputation panics on invalid coordinates. Required parquet columns must exist at startup.
- **Travel time is strict**: `mode` param is required (400) when `destination` is set — no silent default to "car". R5 failures return 502 Bad Gateway, not silent omission. `r5_url` is `Option<String>` — returns 503 if travel time requested without R5 configured.
- **Filter bounds format**: `south,west,north,east` (not standard bbox order)
- **Server-side AABB filtering**: Both `/api/hexagons` and `/api/postcodes` filter results by bounding-box intersection with query bounds. Hexagons use `h3_cell_bounds()` (h3o returns degrees, not radians). Postcodes compute polygon AABB from vertices. See `bounds_intersect()` in `parsing/bounds.rs`.
- **GridIndex returns slightly more than requested**: The 0.01° grid cells mean properties up to ~1km outside the viewport may be returned. The AABB filter in the route handlers catches these extras.
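The bounds format and the AABB check described above can be sketched as follows. This is a Python illustration rather than the Rust code in `parsing/bounds.rs`; `Bounds` and `parse_bounds` are hypothetical names, and rejecting inverted boxes is an assumption in the spirit of the strict-parsing rules, not something the source states.

```python
from typing import NamedTuple


class Bounds(NamedTuple):
    south: float
    west: float
    north: float
    east: float


def parse_bounds(raw: str) -> Bounds:
    """Parse 'south,west,north,east'; raise ValueError on any malformed input."""
    parts = raw.split(",")
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated numbers, got {len(parts)}")
    south, west, north, east = (float(p) for p in parts)  # float() raises on junk
    if south > north or west > east:
        # Assumption: inverted boxes are rejected rather than silently swapped.
        raise ValueError("bounds must satisfy south <= north and west <= east")
    return Bounds(south, west, north, east)


def bounds_intersect(a: Bounds, b: Bounds) -> bool:
    """Axis-aligned boxes overlap when they overlap on both axes."""
    return (
        a.south <= b.north
        and b.south <= a.north
        and a.west <= b.east
        and b.west <= a.east
    )
```

For example, a hexagon box of `51.45,-0.15,51.46,-0.14` intersects a viewport of `51.4,-0.2,51.6,0.1`, while one lying entirely east of the viewport does not.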


@ -15,7 +15,7 @@ RUN cargo build --release
# Stage 3: Runtime
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates && rm -rf /var/lib/apt/lists/*
RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates curl && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY --from=server /app/server-rs/target/release/property-map-server ./
COPY --from=frontend /app/frontend/dist ./frontend/dist/
@ -27,5 +27,7 @@ COPY property-data/uk.pmtiles ./data/
COPY manual-data/postcode_boundaries ./data/postcode_boundaries/
EXPOSE 8001
HEALTHCHECK --interval=30s --timeout=5s --start-period=120s --retries=3 \
CMD curl -f http://localhost:8001/health || exit 1
ENTRYPOINT ["./property-map-server"]
CMD ["--data", "/app/data/wide.parquet", "--pois", "/app/data/filtered_uk_pois.parquet", "--places", "/app/data/places.parquet", "--tiles", "/app/data/uk.pmtiles", "--postcodes", "/app/data/postcode_boundaries"]
CMD ["--data", "/app/data/wide.parquet", "--pois", "/app/data/filtered_uk_pois.parquet", "--places", "/app/data/places.parquet", "--tiles", "/app/data/uk.pmtiles", "--postcodes", "/app/data/postcode_boundaries", "--dist", "/app/frontend/dist"]


@ -24,6 +24,8 @@ POI_PROXIMITY := $(DATA_DIR)/poi_proximity.parquet
EPC_PP := $(DATA_DIR)/epc_pp.parquet
WIDE := $(DATA_DIR)/wide.parquet
PRICE_INDEX := $(DATA_DIR)/price_index.parquet
RENO_PREMIUM := $(DATA_DIR)/renovation_premium.parquet
HEDONIC_MODEL := $(DATA_DIR)/hedonic_model.json
PRICES_STAMP := $(DATA_DIR)/.prices_done
EPC := $(MANUAL_DATA)/certificates.csv
JT_BANK := $(MANUAL_DATA)/journey_times_bank.parquet
@ -263,6 +265,13 @@ $(WIDE): $(EPC_PP) $(ARCGIS) $(IOD) $(POI_PROXIMITY) $(JT_BANK) $(JT_FITZROVIA)
$(PRICE_INDEX): $(WIDE)
	uv run python -m pipeline.transform.price_index --input $(WIDE) --output $@
$(PRICES_STAMP): $(WIDE) $(PRICE_INDEX)
	uv run python -m pipeline.transform.price_estimate --input $(WIDE) --index $(PRICE_INDEX)
$(RENO_PREMIUM): $(WIDE) $(PRICE_INDEX)
	uv run python -m pipeline.transform.renovation_premium --input $(WIDE) --index $(PRICE_INDEX) --output $@
$(HEDONIC_MODEL): $(WIDE)
	uv run python -m pipeline.transform.hedonic_quality --input $(WIDE) --output $@
$(PRICES_STAMP): $(WIDE) $(PRICE_INDEX) $(RENO_PREMIUM) $(HEDONIC_MODEL)
	uv run python -m pipeline.transform.price_estimate --input $(WIDE) --index $(PRICE_INDEX) \
		--renovation-premium $(RENO_PREMIUM) --hedonic-model $(HEDONIC_MODEL)
	@touch $@


@ -34,9 +34,6 @@ rm data/d29f0314840ef7dcbb5cde66e383fe08059dab5a.zip
https://xploria.co.uk/data-sources/
EPC opt out
We all care about different things in our homes and living environments. Some of us are wary of noise and would like to avoid living next to a loud airfield as much as possible. And some of us are avid plane spotters.
@ -77,17 +74,9 @@ make -f Makefile.data tiles
Add licensing to the app. By default, anonymous users can use the map, but only in central London. If they try zooming out, the server refuses to provide data and the users are prompted to buy a lifetime license to continue (or zoom back in). Just before buying a license, they have to register by providing their email address and password; then they need to complete the Stripe checkout workflow. Implement the full PocketBase/server/frontend integration. For admins, give an option to generate an invite link; opening it prompts you to register and gives you a free license forever. Show a cool animation with party poppers when a license is successfully acquired. For non-admin users, allow inviting friends for 30% off the price. Also add a support page that shows my email address, and add a FAQ on the same page too.
-
- the area statistics are missing for postcodes; they only work for hexagons
- add blue/green rollout
Stop wrapping everything in cards. Be bold and stop being lazy around text formatting.
uv run python scripts/remove_bg.py house-og.png 200 house.png


@ -19,7 +19,7 @@ tasks:
download:places:
desc: Extract place names from OSM PBF
cmds:
- uv run python -m pipeline.download.places --output ./property_data/places.parquet {{.CLI_ARGS}}
- uv run python -m pipeline.download.places --output ./property-data/places.parquet {{.CLI_ARGS}}
test:
desc: Run all tests (Python and Rust)


@ -1,3 +1,7 @@
x-credentials:
pb-email: &pb-email admin@propertymap.local
pb-password: &pb-password propertymap-dev-2024
services:
server:
image: rust:1.84
@ -21,11 +25,14 @@ services:
- ./property-data:/app/data:ro
environment:
POCKETBASE_URL: http://pocketbase:8090
POCKETBASE_ADMIN_EMAIL: ${POCKETBASE_ADMIN_EMAIL:-}
POCKETBASE_ADMIN_PASSWORD: ${POCKETBASE_ADMIN_PASSWORD:-}
POCKETBASE_ADMIN_EMAIL: *pb-email
POCKETBASE_ADMIN_PASSWORD: *pb-password
SCREENSHOT_URL: http://screenshot:8002
OLLAMA_URL: http://host.docker.internal:11434
OLLAMA_MODEL: gpt-oss:20b
PUBLIC_URL: https://perfectpostcodes.schmelczer.dev
R5_URL: http://r5:8003
GOOGLE_MAPS_API_KEY: "AIzaSyBgBn9LjrxHCjb9j1LZbLYpEdCJj-NkHPY"
depends_on:
pocketbase:
condition: service_healthy
@ -83,6 +90,9 @@ services:
- pb-data:/pb/pb_data
networks:
- dev-network
environment:
PB_ADMIN_EMAIL: *pb-email
PB_ADMIN_PASSWORD: *pb-password
healthcheck:
test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8090/api/health"]
interval: 10s
@ -90,6 +100,47 @@ services:
retries: 3
start_period: 5s
gluetun:
image: qmcgaw/gluetun:v3.40.4
volumes:
- gluetun-cache-v2:/gluetun
- gluetun-auth:/gluetun/auth:ro
environment:
# See https://github.com/qdm12/gluetun-wiki/tree/main/setup#setup
VPN_SERVICE_PROVIDER: mullvad
VPN_TYPE: wireguard
WIREGUARD_PRIVATE_KEY: "8FFKmtTvDsZlShnKl/opDDwCwb9v2ox4+Kkl3wX+9Gw="
WIREGUARD_ADDRESSES: "10.66.109.86/32"
OWNED_ONLY: "yes"
UPDATER_PERIOD: 24h
SERVER_COUNTRIES: Serbia,Slovakia,Croatia,Austria,Denmark,Finland
TZ: $TIME_ZONE
restart: unless-stopped
ports:
- "1234:1234"
healthcheck:
test: "wget -q https://www.google.com || exit 1"
interval: 1m
timeout: 15s
retries: 2
cap_add:
- NET_ADMIN
devices:
- /dev/net/tun:/dev/net/tun
finder:
build: ./finder
init: true
network_mode: service:gluetun
volumes:
- ./finder:/app
- ./property-data/arcgis_data.parquet:/data/arcgis_data.parquet:ro
depends_on:
gluetun:
condition: service_healthy
restart: unless-stopped
r5:
init: true
build: ./r5-java
@ -119,6 +170,8 @@ volumes:
frontend-node-modules:
screenshot-cache:
r5-network:
gluetun-cache-v2:
gluetun-auth:
networks:
dev-network:

11
finder/Dockerfile Normal file

@ -0,0 +1,11 @@
FROM python:3.12-slim
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /app
COPY pyproject.toml ./
RUN uv pip install --system -r pyproject.toml
COPY main.py ./
CMD ["python3", "main.py"]

710
finder/main.py Normal file

@ -0,0 +1,710 @@
import logging
import math
import os
import random
import re
import threading
import time
from collections import defaultdict
from dataclasses import dataclass, field
from pathlib import Path
import httpx
import polars as pl
from flask import Flask, jsonify, send_from_directory
# ---------------------------------------------------------------------------
# Logging
# ---------------------------------------------------------------------------
LOG_DIR = Path("/app/data")
LOG_DIR.mkdir(parents=True, exist_ok=True)
logging.basicConfig(
level=logging.DEBUG,
format="%(asctime)s [%(levelname)s] %(message)s",
handlers=[
logging.StreamHandler(),
logging.FileHandler(LOG_DIR / "rightmove.log"),
],
)
log = logging.getLogger("rightmove")
log.setLevel(logging.DEBUG)
logging.getLogger("httpx").setLevel(logging.WARNING)
logging.getLogger("httpcore").setLevel(logging.WARNING)
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
ARCGIS_PATH = os.environ.get("ARCGIS_PATH", "/data/arcgis_data.parquet")
DATA_DIR = Path("/app/data")
PAGE_SIZE = 24
MAX_PAGES_PER_OUTCODE = 42 # 24*42 = 1008, safety cap per outcode
DELAY_BETWEEN_PAGES = 1.0
DELAY_BETWEEN_OUTCODES = 2.0
MAX_RETRIES = 3
RETRY_BASE_DELAY = 2.0
GRID_CELL_SIZE = 0.01 # degrees for postcode spatial index
SEED = 42
TYPEAHEAD_URL = "https://los.rightmove.co.uk/typeahead"
SEARCH_URL = "https://www.rightmove.co.uk/api/property-search/listing/search"
RIGHTMOVE_BASE = "https://www.rightmove.co.uk"
PROPERTY_TYPE_MAP = {
"Detached": "Detached",
"Semi-Detached": "Semi-Detached",
"Terraced": "Terraced",
"End of Terrace": "Terraced",
"Mid Terrace": "Terraced",
"Flat": "Flat",
"Maisonette": "Flat",
"Studio": "Flat",
"Apartment": "Flat",
"Penthouse": "Flat",
"Ground Flat": "Flat",
"Detached Bungalow": "Detached",
"Semi-Detached Bungalow": "Semi-Detached",
"Town House": "Terraced",
"Link Detached": "Detached",
"Link Detached House": "Detached",
"Bungalow": "Other",
"Cottage": "Other",
"Park Home": "Other",
"Land": "Other",
"Farm / Barn": "Other",
"House": "Detached",
"Not Specified": "Other",
"Chalet": "Other",
"Barn Conversion": "Other",
"Coach House": "Other",
"Character Property": "Other",
"Cluster House": "Other",
"Retirement Property": "Flat",
"Plot": "Other",
"Garages": "Other",
"Mews": "Terraced",
}
CHANNELS = [
{"channel": "BUY", "transactionType": "BUY", "sortType": "2"},
{"channel": "RENT", "transactionType": "LETTING", "sortType": "6"},
]
# ---------------------------------------------------------------------------
# Postcode spatial index
# ---------------------------------------------------------------------------
class PostcodeSpatialIndex:
    """Grid-based spatial index over arcgis postcodes for nearest-lookup."""
    def __init__(self, lats: list[float], lngs: list[float], postcodes: list[str]):
        self.grid: dict[tuple[int, int], list[tuple[float, float, str]]] = defaultdict(list)
        for lat, lng, pcd in zip(lats, lngs, postcodes):
            gx = int(math.floor(lng / GRID_CELL_SIZE))
            gy = int(math.floor(lat / GRID_CELL_SIZE))
            self.grid[(gx, gy)].append((lat, lng, pcd))
        log.info("Postcode spatial index: %d cells, %d postcodes", len(self.grid), len(lats))
    def nearest(self, lat: float, lng: float) -> str | None:
        gx = int(math.floor(lng / GRID_CELL_SIZE))
        gy = int(math.floor(lat / GRID_CELL_SIZE))
        best_dist = float("inf")
        best_pcd = None
        for dx in range(-1, 2):
            for dy in range(-1, 2):
                for plat, plng, pcd in self.grid.get((gx + dx, gy + dy), []):
                    d = (plat - lat) ** 2 + (plng - lng) ** 2
                    if d < best_dist:
                        best_dist = d
                        best_pcd = pcd
        return best_pcd
# ---------------------------------------------------------------------------
# Scrape status
# ---------------------------------------------------------------------------
@dataclass
class ScrapeStatus:
    state: str = "idle"  # idle | running | done | error
    channel: str = ""
    outcode: str = ""
    outcodes_done: int = 0
    outcodes_total: int = 0
    properties_buy: int = 0
    properties_rent: int = 0
    errors: list[str] = field(default_factory=list)
    started_at: float = 0.0
    finished_at: float = 0.0
status = ScrapeStatus()
status_lock = threading.Lock()
debug_data: dict = {"last_response": None, "outcode_cache": {}}
# ---------------------------------------------------------------------------
# HTTP helpers
# ---------------------------------------------------------------------------
USER_AGENT = (
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
)
# Gluetun control API — runs on port 8000 inside the gluetun container.
# Since finder uses network_mode: service:gluetun, localhost IS gluetun.
GLUETUN_API = "http://127.0.0.1:8000"
_ip_rotate_lock = threading.Lock()
def rotate_ip() -> bool:
    """Ask gluetun to reconnect to a different VPN server, getting a new IP.
    Returns True if the IP changed successfully."""
    with _ip_rotate_lock:
        log.info("Rotating VPN IP via gluetun...")
        try:
            # Get current IP
            with httpx.Client(timeout=10) as ctl:
                old_ip_resp = ctl.get(f"{GLUETUN_API}/v1/publicip/ip")
                old_ip = old_ip_resp.json().get("public_ip", "unknown") if old_ip_resp.status_code == 200 else "unknown"
                log.info("Current IP: %s", old_ip)
                # Trigger server change: stop the VPN, then start it again so that
                # gluetun reconnects (ideally to a different server)
                resp = ctl.put(f"{GLUETUN_API}/v1/vpn/status", json={"status": "stopped"})
                if resp.status_code != 200:
                    log.error("Failed to stop VPN: %d %s", resp.status_code, resp.text)
                    return False
                time.sleep(2)
                resp = ctl.put(f"{GLUETUN_API}/v1/vpn/status", json={"status": "running"})
                if resp.status_code != 200:
                    log.error("Failed to start VPN: %d %s", resp.status_code, resp.text)
                    return False
            # Wait for reconnection
            for _ in range(30):
                time.sleep(2)
                try:
                    with httpx.Client(timeout=10) as ctl:
                        new_ip_resp = ctl.get(f"{GLUETUN_API}/v1/publicip/ip")
                        if new_ip_resp.status_code == 200:
                            new_ip = new_ip_resp.json().get("public_ip", "")
                            if new_ip and new_ip != old_ip:
                                log.info("IP rotated: %s -> %s", old_ip, new_ip)
                                return True
                except Exception:
                    pass  # VPN still reconnecting
            log.warning("IP rotation timed out (may still be same IP)")
            return False
        except Exception as e:
            log.error("IP rotation failed: %s", e)
            return False
def make_client() -> httpx.Client:
    return httpx.Client(
        timeout=30,
        headers={"User-Agent": USER_AGENT, "Accept": "application/json"},
        follow_redirects=True,
    )
def fetch_with_retry(
    client: httpx.Client, url: str, params: dict | None = None, on_403: bool = True
) -> dict | None:
    """GET JSON with retries on 429/5xx/connection errors. Returns None on permanent failure.
    On 403, triggers IP rotation and retries once."""
    for attempt in range(MAX_RETRIES):
        try:
            resp = client.get(url, params=params)
            if resp.status_code == 200:
                return resp.json()
            if resp.status_code == 403 and on_403:
                log.warning("HTTP 403 — IP likely blocked, rotating...")
                if rotate_ip():
                    # Retry once with new IP (but don't recurse on 403 again)
                    return fetch_with_retry(client, url, params, on_403=False)
                log.error("IP rotation failed, giving up on %s", url)
                return None
            if resp.status_code in (429, 500, 502, 503, 504):
                delay = RETRY_BASE_DELAY * (2**attempt) + random.uniform(0, 1)
                log.warning("HTTP %d from %s, retry %d/%d in %.1fs", resp.status_code, url, attempt + 1, MAX_RETRIES, delay)
                time.sleep(delay)
                continue
            log.error("HTTP %d from %s (non-retryable)", resp.status_code, url)
            return None
        except (httpx.ConnectError, httpx.ReadTimeout, httpx.WriteTimeout, httpx.PoolTimeout) as e:
            delay = RETRY_BASE_DELAY * (2**attempt) + random.uniform(0, 1)
            log.warning("%s from %s, retry %d/%d in %.1fs", type(e).__name__, url, attempt + 1, MAX_RETRIES, delay)
            time.sleep(delay)
    log.error("All %d retries exhausted for %s", MAX_RETRIES, url)
    return None
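The retry delays used by `fetch_with_retry` follow exponential backoff with jitter. A minimal standalone sketch of that schedule, reusing the constants from this file; `backoff_delay` and `delay_bounds` are hypothetical helper names that do not exist in `main.py`:

```python
import random

MAX_RETRIES = 3
RETRY_BASE_DELAY = 2.0


def backoff_delay(attempt: int, rng: random.Random) -> float:
    """Delay before the next retry: base * 2**attempt plus up to 1s of jitter."""
    return RETRY_BASE_DELAY * (2**attempt) + rng.uniform(0, 1)


def delay_bounds(attempt: int) -> tuple[float, float]:
    """Smallest and largest possible delay for a given zero-based attempt."""
    base = RETRY_BASE_DELAY * (2**attempt)
    return base, base + 1.0


for attempt in range(MAX_RETRIES):
    low, high = delay_bounds(attempt)
    print(f"attempt {attempt + 1}: {low:.0f}s to {high:.0f}s")
```

With these constants the waits are roughly 2, 4 and 8 seconds, so a request that keeps failing with retryable errors gives up after about 14 to 17 seconds of sleeping, not counting the request time itself.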
# ---------------------------------------------------------------------------
# Rightmove API
# ---------------------------------------------------------------------------
def resolve_outcode_id(client: httpx.Client, outcode: str) -> str | None:
    """Look up Rightmove's internal ID for an outcode via typeahead API."""
    if outcode in debug_data["outcode_cache"]:
        return debug_data["outcode_cache"][outcode]
    data = fetch_with_retry(client, TYPEAHEAD_URL, {"query": outcode, "limit": "10", "exclude": "STREET"})
    if not data:
        return None
    for match in data.get("matches", []):
        if match.get("type") == "OUTCODE" and match.get("displayName") == outcode:
            rid = str(match["id"])
            debug_data["outcode_cache"][outcode] = rid
            return rid
    log.debug("Outcode %s not found in typeahead results", outcode)
    return None
def search_outcode(
    client: httpx.Client,
    outcode_id: str,
    outcode: str,
    channel_cfg: dict,
    pc_index: PostcodeSpatialIndex,
) -> list[dict]:
    """Paginate through search results for one outcode+channel. Returns transformed properties."""
    properties = []
    index = 0
    for page in range(MAX_PAGES_PER_OUTCODE):
        params = {
            "useLocationIdentifier": "true",
            "locationIdentifier": f"OUTCODE^{outcode_id}",
            "index": str(index),
            "sortType": channel_cfg["sortType"],
            "channel": channel_cfg["channel"],
            "transactionType": channel_cfg["transactionType"],
        }
        data = fetch_with_retry(client, SEARCH_URL, params)
        if not data:
            log.warning("Failed to fetch page %d for %s/%s", page, outcode, channel_cfg["channel"])
            break
        debug_data["last_response"] = data
        raw_props = data.get("properties", [])
        if not raw_props:
            break
        for prop in raw_props:
            transformed = transform_property(prop, outcode, pc_index)
            if transformed:
                properties.append(transformed)
        # Check if there are more pages
        result_count_str = data.get("resultCount", "0")
        result_count = int(result_count_str.replace(",", ""))
        index += PAGE_SIZE
        if index >= result_count:
            break
        if page < MAX_PAGES_PER_OUTCODE - 1:
            time.sleep(DELAY_BETWEEN_PAGES)
    return properties
# ---------------------------------------------------------------------------
# Property transformation
# ---------------------------------------------------------------------------
def parse_display_size(display_size: str | None) -> float | None:
    """Parse displaySize like '499 sq. ft.' or '4,124 sq. ft.' to sqm."""
    if not display_size:
        return None
    # Try sq. ft. first
    m = re.search(r"([\d,]+(?:\.\d+)?)\s*sq\.?\s*ft", display_size, re.IGNORECASE)
    if m:
        sqft = float(m.group(1).replace(",", ""))
        return round(sqft * 0.092903, 1)
    # Try sq. m.
    m = re.search(r"([\d,]+(?:\.\d+)?)\s*sq\.?\s*m", display_size, re.IGNORECASE)
    if m:
        return round(float(m.group(1).replace(",", "")), 1)
    return None
def map_property_type(sub_type: str | None) -> str:
    """Map propertySubType to canonical type."""
    if not sub_type:
        return "Other"
    canonical = PROPERTY_TYPE_MAP.get(sub_type)
    if canonical:
        return canonical
    log.warning("Unknown propertySubType: %r — mapping to Other", sub_type)
    return "Other"
def extract_tenure(tenure_obj: dict | None) -> str | None:
    """Extract tenure string from tenure object."""
    if not tenure_obj:
        return None
    tt = tenure_obj.get("tenureType", "")
    if tt == "FREEHOLD":
        return "Freehold"
    if tt == "LEASEHOLD":
        return "Leasehold"
    return None
def fix_coords(lat: float, lng: float) -> tuple[float, float]:
    """Swap lat/lng if they look reversed. England: lat ~49 to 56, lng ~-7 to 2."""
    if 49 <= lat <= 56 and -7 <= lng <= 2:
        return lat, lng
    if 49 <= lng <= 56 and -7 <= lat <= 2:
        log.debug("Swapping reversed coords: lat=%.4f lng=%.4f → lat=%.4f lng=%.4f", lat, lng, lng, lat)
        return lng, lat
    log.warning("Coords outside England bounds even after swap attempt: lat=%.4f lng=%.4f", lat, lng)
    return lat, lng
def normalize_price(amount: int, frequency: str) -> int:
    """Normalize price to monthly for rentals (weekly × 52/12, yearly ÷ 12)."""
    if frequency == "weekly":
        return round(amount * 52 / 12)
    if frequency == "yearly":
        return round(amount / 12)
    return amount
def transform_property(prop: dict, outcode: str, pc_index: PostcodeSpatialIndex) -> dict | None:
    """Transform a raw Rightmove property dict into our output schema."""
    loc = prop.get("location")
    if not loc:
        return None
    raw_lat = loc.get("latitude")
    raw_lng = loc.get("longitude")
    if raw_lat is None or raw_lng is None:
        return None
    lat, lng = fix_coords(raw_lat, raw_lng)
    price_obj = prop.get("price", {})
    amount = price_obj.get("amount")
    if amount is None:
        return None
    frequency = price_obj.get("frequency", "")
    price = normalize_price(int(amount), frequency)
    display_prices = price_obj.get("displayPrices", [])
    price_qualifier = display_prices[0].get("displayPriceQualifier", "") if display_prices else ""
    sub_type = prop.get("propertySubType", "")
    bedrooms = prop.get("bedrooms", 0) or 0
    bathrooms = prop.get("bathrooms", 0) or 0
    key_features = [kf.get("description", "") for kf in prop.get("keyFeatures", []) if kf.get("description")]
    listing_update = prop.get("listingUpdate", {})
    update_date = listing_update.get("listingUpdateDate", "")
    postcode = pc_index.nearest(lat, lng)
    return {
        "id": prop.get("id"),
        "bedrooms": bedrooms,
        "bathrooms": bathrooms,
        "total_rooms": bedrooms + bathrooms,
        "longitude": lng,
        "latitude": lat,
        "postcode": postcode,
        "address": prop.get("displayAddress", ""),
        "tenure": extract_tenure(prop.get("tenure")),
        "property_type": map_property_type(sub_type),
        "property_sub_type": sub_type or "Unknown",
        "price": price,
        "price_frequency": frequency,
        "price_qualifier": price_qualifier,
        "floorspace_sqm": parse_display_size(prop.get("displaySize")),
        "url": RIGHTMOVE_BASE + prop.get("propertyUrl", ""),
        "features": key_features,
        "first_visible_date": prop.get("firstVisibleDate", ""),
        "update_date": update_date,
        "outcode": outcode,
        "house_share": sub_type == "House Share",
    }
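The unit conversions used in this section can be checked with a little arithmetic. The sketch below is standalone and uses the same conversion factor and rounding as `parse_display_size` and `normalize_price`; the helper names `sqft_to_sqm` and `monthly_rent` are hypothetical and exist only for this illustration.

```python
SQFT_TO_SQM = 0.092903  # same factor as in parse_display_size


def sqft_to_sqm(sqft: float) -> float:
    """Convert square feet to square metres, rounded to one decimal place."""
    return round(sqft * SQFT_TO_SQM, 1)


def monthly_rent(amount: int, frequency: str) -> int:
    """Normalise a rent to a monthly figure; other frequencies pass through."""
    if frequency == "weekly":
        return round(amount * 52 / 12)  # 52 weeks spread over 12 months
    if frequency == "yearly":
        return round(amount / 12)
    return amount


print(sqft_to_sqm(499))               # 46.4
print(sqft_to_sqm(4124))              # 383.1
print(monthly_rent(300, "weekly"))    # 1300
print(monthly_rent(12000, "yearly"))  # 1000
```

Purchase prices have an empty or non-rental frequency, so they pass through unchanged.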
# ---------------------------------------------------------------------------
# Parquet writing
# ---------------------------------------------------------------------------
def write_parquet(properties: list[dict], path: Path) -> None:
    """Write properties list to parquet using Polars."""
    if not properties:
        log.warning("No properties to write to %s", path)
        return
    df = pl.DataFrame(
        {
            "id": [p["id"] for p in properties],
            "bedrooms": [p["bedrooms"] for p in properties],
            "bathrooms": [p["bathrooms"] for p in properties],
            "total_rooms": [p["total_rooms"] for p in properties],
            "longitude": [p["longitude"] for p in properties],
            "latitude": [p["latitude"] for p in properties],
            "postcode": [p["postcode"] for p in properties],
            "address": [p["address"] for p in properties],
            "tenure": [p["tenure"] for p in properties],
            "property_type": [p["property_type"] for p in properties],
            "property_sub_type": [p["property_sub_type"] for p in properties],
            "price": [p["price"] for p in properties],
            "price_frequency": [p["price_frequency"] for p in properties],
            "price_qualifier": [p["price_qualifier"] for p in properties],
            "floorspace_sqm": [p["floorspace_sqm"] for p in properties],
            "url": [p["url"] for p in properties],
            "features": [p["features"] for p in properties],
            "first_visible_date": [p["first_visible_date"] for p in properties],
            "update_date": [p["update_date"] for p in properties],
            "outcode": [p["outcode"] for p in properties],
            "house_share": [p["house_share"] for p in properties],
        },
        schema={
            "id": pl.Int64,
            "bedrooms": pl.Int32,
            "bathrooms": pl.Int32,
            "total_rooms": pl.Int32,
            "longitude": pl.Float64,
            "latitude": pl.Float64,
            "postcode": pl.Utf8,
            "address": pl.Utf8,
            "tenure": pl.Utf8,
            "property_type": pl.Utf8,
            "property_sub_type": pl.Utf8,
            "price": pl.Int64,
            "price_frequency": pl.Utf8,
            "price_qualifier": pl.Utf8,
            "floorspace_sqm": pl.Float64,
            "url": pl.Utf8,
            "features": pl.List(pl.Utf8),
            "first_visible_date": pl.Utf8,
            "update_date": pl.Utf8,
            "outcode": pl.Utf8,
            "house_share": pl.Boolean,
        },
    )
    df.write_parquet(path)
    log.info("Wrote %d properties to %s", len(df), path)
# ---------------------------------------------------------------------------
# Scrape orchestration
# ---------------------------------------------------------------------------
def load_outcodes() -> list[str]:
    """Load England-only outcodes from arcgis parquet."""
    log.info("Loading outcodes from %s", ARCGIS_PATH)
    df = pl.read_parquet(ARCGIS_PATH, columns=["pcd", "ctry", "lat", "long"])
    england = df.filter(pl.col("ctry") == "E92000001")
    log.info("England postcodes: %d", len(england))
    outcodes = (
        england.select(pl.col("pcd").str.extract(r"^([A-Z]{1,2}\d[A-Z0-9]?)", 1).alias("outcode"))
        .drop_nulls()
        .get_column("outcode")
        .unique()
        .sort()
        .to_list()
    )
    log.info("Unique England outcodes: %d", len(outcodes))
    return outcodes
def build_postcode_index() -> PostcodeSpatialIndex:
    """Build spatial index from arcgis England postcodes."""
    log.info("Building postcode spatial index from %s", ARCGIS_PATH)
    df = pl.read_parquet(ARCGIS_PATH, columns=["pcd", "ctry", "lat", "long"])
    england = df.filter(pl.col("ctry") == "E92000001").drop_nulls(subset=["lat", "long"])
    return PostcodeSpatialIndex(
        england.get_column("lat").to_list(),
        england.get_column("long").to_list(),
        england.get_column("pcd").to_list(),
    )
def run_scrape(outcodes: list[str], pc_index: PostcodeSpatialIndex) -> None:
    """Main scrape loop — runs in background thread."""
    global status
    with status_lock:
        status.state = "running"
        status.started_at = time.time()
        status.errors = []
        status.properties_buy = 0
        status.properties_rent = 0
    # Shuffle for geographic diversity
    shuffled = list(outcodes)
    random.seed(SEED)
    random.shuffle(shuffled)
    client = make_client()
    try:
        for channel_cfg in CHANNELS:
            channel_name = channel_cfg["channel"]
            file_suffix = "buy" if channel_name == "BUY" else "rent"
            all_properties: dict[int, dict] = {}  # dedup by id
            with status_lock:
                status.channel = channel_name
                status.outcodes_done = 0
                status.outcodes_total = len(shuffled)
            log.info("=== Starting %s channel (%d outcodes) ===", channel_name, len(shuffled))
            for i, outcode in enumerate(shuffled):
                with status_lock:
                    status.outcode = outcode
                    status.outcodes_done = i
                log.debug("Outcode %s (%d/%d) — %d properties so far",
                          outcode, i + 1, len(shuffled), len(all_properties))
                try:
                    outcode_id = resolve_outcode_id(client, outcode)
                    if not outcode_id:
                        log.debug("No Rightmove ID for outcode %s, skipping", outcode)
                        continue
                    props = search_outcode(client, outcode_id, outcode, channel_cfg, pc_index)
                    for p in props:
                        pid = p["id"]
                        if pid not in all_properties:
                            all_properties[pid] = p
                    with status_lock:
                        if channel_name == "BUY":
                            status.properties_buy = len(all_properties)
                        else:
                            status.properties_rent = len(all_properties)
                    log.info("Outcode %s: got %d properties (total: %d)", outcode, len(props), len(all_properties))
                except Exception as e:
                    msg = f"Error scraping {outcode}/{channel_name}: {e}"
                    log.error(msg)
                    with status_lock:
                        status.errors.append(msg)
                if i < len(shuffled) - 1:
                    time.sleep(DELAY_BETWEEN_OUTCODES)
            # Write parquet
            deduped = list(all_properties.values())
            output_path = DATA_DIR / f"rightmove_{file_suffix}.parquet"
            write_parquet(deduped, output_path)
            with status_lock:
                if channel_name == "BUY":
                    status.properties_buy = len(deduped)
                else:
                    status.properties_rent = len(deduped)
                status.outcodes_done = len(shuffled)
            log.info("=== %s channel complete: %d unique properties ===", channel_name, len(deduped))
        with status_lock:
            status.state = "done"
            status.finished_at = time.time()
        elapsed = status.finished_at - status.started_at
        log.info("Scrape complete in %.0fs — buy: %d, rent: %d",
                 elapsed, status.properties_buy, status.properties_rent)
    except Exception as e:
        log.exception("Fatal scrape error")
        with status_lock:
            status.state = "error"
            status.errors.append(f"Fatal: {e}")
            status.finished_at = time.time()
    finally:
        client.close()
# ---------------------------------------------------------------------------
# Startup: load data
# ---------------------------------------------------------------------------

log.info("Loading arcgis data...")
OUTCODES = load_outcodes()
PC_INDEX = build_postcode_index()
log.info("Ready — %d outcodes, postcode index built", len(OUTCODES))

# ---------------------------------------------------------------------------
# Flask app
# ---------------------------------------------------------------------------

app = Flask(__name__)


@app.route("/run", methods=["POST"])
def trigger_run():
    with status_lock:
        if status.state == "running":
            return jsonify({"error": "Scrape already running"}), 409
        status.state = "running"
    thread = threading.Thread(target=run_scrape, args=(OUTCODES, PC_INDEX), daemon=True)
    thread.start()
    return jsonify({"message": "Scrape started"}), 200


@app.route("/status")
def get_status():
    with status_lock:
        elapsed = 0.0
        if status.started_at:
            end = status.finished_at if status.finished_at else time.time()
            elapsed = end - status.started_at
        return jsonify({
            "state": status.state,
            "channel": status.channel,
            "outcode": status.outcode,
            "outcodes_done": status.outcodes_done,
            "outcodes_total": status.outcodes_total,
            "properties_buy": status.properties_buy,
            "properties_rent": status.properties_rent,
            "errors": status.errors[-20:],  # last 20 errors
            "elapsed_seconds": round(elapsed, 1),
        })


@app.route("/debug")
def get_debug():
    return jsonify({
        "last_response": debug_data["last_response"],
        "outcode_cache_size": len(debug_data["outcode_cache"]),
        "outcode_cache_sample": dict(list(debug_data["outcode_cache"].items())[:20]),
    })


@app.route("/data/<filename>")
def serve_data(filename):
    if not filename.endswith(".parquet"):
        return jsonify({"error": "Only parquet files served"}), 400
    return send_from_directory(DATA_DIR, filename)


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=1234, debug=False)

View file

@ -0,0 +1,6 @@
Request the URL below, passing the outcode as location-id and the page number as page. For E13, page 2, that is:
https://www.onthemarket.com/async/search/properties-v2/?search-type=for-sale&location-id=e13&page=2&view=map-list
An example response is in [[response.json]].
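
A minimal sketch of building that URL for any outcode and page. The helper name `onthemarket_search_url` is hypothetical, and lowercasing the outcode is an assumption based only on the `e13` in the example URL above.

```python
from urllib.parse import urlencode

# Endpoint copied from the example URL in this note.
BASE_URL = "https://www.onthemarket.com/async/search/properties-v2/"


def onthemarket_search_url(outcode: str, page: int) -> str:
    """Build the search URL for one outcode and page (hypothetical helper)."""
    params = {
        "search-type": "for-sale",
        # Assumption: the example URL uses a lowercase outcode, so lowercase here.
        "location-id": outcode.lower(),
        "page": page,
        "view": "map-list",
    }
    return f"{BASE_URL}?{urlencode(params)}"


print(onthemarket_search_url("E13", 2))
```

For E13, page 2, this reproduces the example URL exactly.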

File diff suppressed because it is too large

9
finder/pyproject.toml Normal file
View file

@ -0,0 +1,9 @@
[project]
name = "finder"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = [
"flask",
"httpx",
"polars",
]

10918
finder/rightmove/buy.json Normal file

File diff suppressed because it is too large

View file

@ -0,0 +1,52 @@
The API works as follows. Searches are done per outcode, such as E11. First request https://los.rightmove.co.uk/typeahead?query=E11&limit=10&exclude=STREET, which returns something like:
{
  "matches": [
    {
      "id": "746",
      "type": "OUTCODE",
      "displayName": "E11",
      "highlighting": "<span class='highlightLetter'>E11</span>",
      "highlights": [
        {
          "text": "E11",
          "highlighted": true
        }
      ]
    },
    {
      "id": "749",
      "type": "OUTCODE",
      "displayName": "E14",
      "highlighting": "displayName",
      "highlights": []
    },
    {
      "id": "752",
      "type": "OUTCODE",
      "displayName": "E17",
      "highlighting": "displayName",
      "highlights": []
    },
    ...
  ]
}
Find the object with "type": "OUTCODE" whose displayName matches the outcode searched for (here E11); its id is 746. Then request the search endpoint with that id to get the properties for that outcode:
https://www.rightmove.co.uk/api/property-search/listing/search?useLocationIdentifier=true&locationIdentifier=OUTCODE%5E746&buy=For+sale&_includeSSTC=on&index=0&sortType=2&channel=BUY&transactionType=BUY&displayLocationIdentifier=E12.html
An example response is in [[buy.json]].
Set locationIdentifier=OUTCODE%5E{id} (%5E is the URL-encoded ^), where id is the value found above, so here it is locationIdentifier=OUTCODE%5E746. Paging works by increasing index by the number of results per page, which is 24: the first page is index=0, the next is index=24, then index=48, and so on.
The rental endpoint works similarly:
https://www.rightmove.co.uk/api/property-search/listing/search?locationIdentifier=OUTCODE%5E745&index=0&sortType=6&channel=RENT&transactionType=LETTING&displayLocationIdentifier=E16.html
https://www.rightmove.co.uk/api/property-search/listing/search?locationIdentifier=OUTCODE%5E752&index=48&sortType=6&channel=RENT&transactionType=LETTING&displayLocationIdentifier=E17.html
An example response for the rental endpoint is in [[rental.json]].
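
A rough sketch of assembling the search URL for either channel from the parameters visible in the example URLs above. The helper `rightmove_search_url` and the `CHANNEL_PARAMS` table are hypothetical. The sketch leaves out displayLocationIdentifier and the buy-only parameters (useLocationIdentifier, buy, _includeSSTC); whether the endpoint accepts requests without them is an untested assumption.

```python
from urllib.parse import urlencode

# Endpoint copied from the example URLs in this note.
SEARCH_URL = "https://www.rightmove.co.uk/api/property-search/listing/search"

# Per-channel values taken from the buy and rental example URLs.
CHANNEL_PARAMS = {
    "BUY": {"sortType": "2", "channel": "BUY", "transactionType": "BUY"},
    "RENT": {"sortType": "6", "channel": "RENT", "transactionType": "LETTING"},
}


def rightmove_search_url(outcode_id: str, channel: str, index: int = 0) -> str:
    """Build a search URL for one outcode id, channel and page offset."""
    params = {
        # urlencode percent-encodes "^" as %5E, matching OUTCODE%5E{id}.
        "locationIdentifier": f"OUTCODE^{outcode_id}",
        "index": index,
        **CHANNEL_PARAMS[channel],
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


# Third page (index=48) of rentals for outcode id 752.
print(rightmove_search_url("752", "RENT", 48))
```

Apart from the omitted displayLocationIdentifier, the printed URL matches the second rental example.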

8247
finder/rightmove/rental.json Normal file

File diff suppressed because it is too large

View file

@ -1,7 +1,7 @@
import { useState, useEffect, useCallback, useRef, useMemo } from 'react';
import MapComponent from '../map/Map';
import { Slider } from '../ui/Slider';
import { apiUrl, authHeaders, logNonAbortError } from '../../lib/api';
import { apiUrl, assertOk, authHeaders, logNonAbortError } from '../../lib/api';
import { formatValue } from '../../lib/format';
import { FEATURE_GRADIENT, DENSITY_GRADIENT, DENSITY_GRADIENT_DARK } from '../../lib/consts';
import { gradientToCss } from '../../lib/utils';
@ -88,7 +88,10 @@ export default function HomeDemo({ features, theme }: HomeDemoProps) {
abortRef.current = new AbortController();
setFetching(true);
fetch(apiUrl('hexagons', params), authHeaders({ signal: abortRef.current.signal }))
.then((res) => res.json())
.then((res) => {
assertOk(res, 'hexagons');
return res.json();
})
.then((data: { features: HexagonData[] }) => {
setHexData(data.features);
setLoading(false);
@ -142,7 +145,10 @@ export default function HomeDemo({ features, theme }: HomeDemoProps) {
dragAbortRef.current?.abort();
dragAbortRef.current = new AbortController();
fetch(apiUrl('hexagons', params), authHeaders({ signal: dragAbortRef.current.signal }))
.then((res) => res.json())
.then((res) => {
assertOk(res, 'hexagons');
return res.json();
})
.then((data: { features: HexagonData[] }) => setDragHexData(data.features))
.catch((err) => logNonAbortError('Failed to fetch demo drag data', err));
},

View file

@ -4,10 +4,11 @@ import { SpinnerIcon } from '../ui/icons/SpinnerIcon';
interface AiFilterInputProps {
loading: boolean;
error: string | null;
notes: string | null;
onSubmit: (query: string) => void;
}
export default memo(function AiFilterInput({ loading, error, onSubmit }: AiFilterInputProps) {
export default memo(function AiFilterInput({ loading, error, notes, onSubmit }: AiFilterInputProps) {
const [query, setQuery] = useState('');
const handleSubmit = useCallback(
@ -48,6 +49,11 @@ export default memo(function AiFilterInput({ loading, error, onSubmit }: AiFilte
{error}
</p>
)}
{notes && !error && (
<p className="mt-1 text-xs text-warm-500 dark:text-warm-400 italic">
{notes}
</p>
)}
</div>
);
});

View file

@ -155,10 +155,10 @@ export default function AreaPane({
const stackedEnumCharts = STACKED_ENUM_GROUPS[group.name];
// Features that are part of a stacked enum config (rendered as compact charts)
const stackedEnumFeatureNames = new Set(
(stackedEnumCharts?.flatMap((c) =>
[c.feature, ...c.components].filter(Boolean)
) as string[]) ?? []
const stackedEnumFeatureNames = new Set<string>(
stackedEnumCharts?.flatMap((c) =>
[c.feature, ...c.components].filter((s): s is string => Boolean(s))
) ?? []
);
const isExpanded = !collapsedGroups.has(group.name);

View file

@ -11,6 +11,7 @@ import { FeatureActions } from '../ui/FeatureIcons';
import { FeatureLabel } from '../ui/FeatureLabel';
import { RouteIcon, PlusIcon } from '../ui/icons';
import { IconButton } from '../ui/IconButton';
import { TRANSPORT_MODES, MODE_LABELS, type TransportMode } from '../../hooks/useTravelTime';
interface FeatureBrowserProps {
availableFeatures: FeatureMeta[];
@ -21,8 +22,8 @@ interface FeatureBrowserProps {
onNavigateToSource?: (slug: string, featureName: string) => void;
openInfoFeature?: string | null;
onClearOpenInfoFeature?: () => void;
travelTimeEnabled?: boolean;
onEnableTravelTime?: () => void;
activeTravelModes: TransportMode[];
onEnableTravelMode: (mode: TransportMode) => void;
}
export default function FeatureBrowser({
@ -34,8 +35,8 @@ export default function FeatureBrowser({
onNavigateToSource,
openInfoFeature,
onClearOpenInfoFeature,
travelTimeEnabled,
onEnableTravelTime,
activeTravelModes,
onEnableTravelMode,
}: FeatureBrowserProps) {
const [search, setSearch] = useState('');
const [infoFeature, setInfoFeature] = useState<FeatureMeta | null>(null);
@ -60,32 +61,42 @@ export default function FeatureBrowser({
// When searching, expand all groups so results are visible
const isSearching = search.length > 0;
// Inactive modes available to add
const inactiveModes = useMemo(
() => TRANSPORT_MODES.filter((m) => !activeTravelModes.includes(m)),
[activeTravelModes]
);
const showTravelModes =
inactiveModes.length > 0 &&
(!search || 'travel time journey commute car bicycle walking transit'.includes(search.toLowerCase()));
return (
<>
<div className="shrink-0 p-2 border-b border-warm-200 dark:border-navy-700">
<SearchInput value={search} onChange={setSearch} placeholder="Search features..." />
</div>
<div className="md:min-h-0 md:flex-1 md:overflow-y-auto flex flex-col">
{!travelTimeEnabled && onEnableTravelTime && (!search || 'travel time journey commute'.includes(search.toLowerCase())) && (
<div className="shrink-0 border-b border-warm-200 dark:border-warm-700">
{showTravelModes && inactiveModes.map((mode) => (
<div key={mode} className="shrink-0 border-b border-warm-200 dark:border-warm-700">
<div className="flex items-start justify-between px-3 py-2 hover:bg-teal-50 dark:hover:bg-teal-900/30 cursor-pointer">
<div className="flex items-center gap-2 min-w-0" onClick={onEnableTravelTime}>
<div className="flex items-center gap-2 min-w-0" onClick={() => onEnableTravelMode(mode)}>
<RouteIcon className="w-4 h-4 text-teal-600 dark:text-teal-400 shrink-0" />
<div className="min-w-0">
<span className="text-sm font-medium text-navy-950 dark:text-warm-100">
Travel Time
Travel Time ({MODE_LABELS[mode]})
</span>
<span className="text-xs text-warm-400 dark:text-warm-500 block">
Color by journey time to a destination
</span>
</div>
</div>
<IconButton onClick={() => onEnableTravelTime()} title="Add travel time">
<IconButton onClick={() => onEnableTravelMode(mode)} title={`Add ${MODE_LABELS[mode]} travel time`}>
<PlusIcon className="w-3.5 h-3.5" />
</IconButton>
</div>
</div>
)}
))}
{grouped.map((group) => {
const isExpanded = isSearching || expandedGroups.has(group.name);
return (
@ -128,7 +139,7 @@ export default function FeatureBrowser({
</div>
);
})}
{grouped.length === 0 ? (
{grouped.length === 0 && !showTravelModes ? (
<EmptyState
icon={<FilterIcon className="w-8 h-8 text-warm-300 dark:text-warm-600" />}
title={search ? 'No matching features' : 'All features are active'}

View file

@ -17,18 +17,25 @@ import { FeatureLabel } from '../ui/FeatureLabel';
import AiFilterInput from './AiFilterInput';
import FeatureBrowser from './FeatureBrowser';
import { TravelTimeCard } from './TravelTimeCard';
import type { TransportMode } from '../../hooks/useTravelTime';
import {
TRANSPORT_MODES,
type TransportMode,
type TravelTimeEntries,
} from '../../hooks/useTravelTime';
function SliderLabels({
min,
max,
value,
displayValues,
absoluteMax,
}: {
min: number;
max: number;
value: [number, number];
displayValues?: [number, number];
/** When true and slider is at max, append "+" to indicate unrestricted upper bound */
absoluteMax?: boolean;
}) {
const range = max - min || 1;
const leftPct = ((value[0] - min) / range) * 100;
@ -46,7 +53,7 @@ function SliderLabels({
className="absolute -translate-x-1/2"
style={{ left: `${rightPct}%` }}
>
{formatFilterValue(labels[1])}
{formatFilterValue(labels[1])}{absoluteMax && value[1] >= max ? '+' : ''}
</span>
</div>
);
@ -70,19 +77,15 @@ interface FiltersProps {
onNavigateToSource?: (slug: string, featureName: string) => void;
openInfoFeature?: string | null;
onClearOpenInfoFeature?: () => void;
travelTimeEnabled: boolean;
travelTimeDestination: [number, number] | null;
travelTimeDestinationLabel: string;
travelTimeMode: TransportMode;
travelTimeRange: [number, number] | null;
travelTimeDataRange: [number, number] | null;
onTravelTimeEnable: () => void;
onTravelTimeDisable: () => void;
onTravelTimeSetDestination: (lat: number, lon: number, label: string) => void;
onTravelTimeModeChange: (mode: TransportMode) => void;
onTravelTimeRangeChange: (range: [number, number]) => void;
travelTimeEntries: TravelTimeEntries;
travelTimeDataRanges: Partial<Record<TransportMode, [number, number]>>;
onTravelTimeEnableMode: (mode: TransportMode) => void;
onTravelTimeDisableMode: (mode: TransportMode) => void;
onTravelTimeSetDestination: (mode: TransportMode, lat: number, lon: number, label: string) => void;
onTravelTimeRangeChange: (mode: TransportMode, range: [number, number]) => void;
aiFilterLoading: boolean;
aiFilterError: string | null;
aiFilterNotes: string | null;
onAiFilterSubmit: (query: string) => void;
}
@ -104,19 +107,15 @@ export default memo(function Filters({
onNavigateToSource,
openInfoFeature,
onClearOpenInfoFeature,
travelTimeEnabled,
travelTimeDestination,
travelTimeDestinationLabel,
travelTimeMode,
travelTimeRange,
travelTimeDataRange,
onTravelTimeEnable,
onTravelTimeDisable,
travelTimeEntries,
travelTimeDataRanges,
onTravelTimeEnableMode,
onTravelTimeDisableMode,
onTravelTimeSetDestination,
onTravelTimeModeChange,
onTravelTimeRangeChange,
aiFilterLoading,
aiFilterError,
aiFilterNotes,
onAiFilterSubmit,
}: FiltersProps) {
const availableFeatures = features.filter((f) => !enabledFeatures.has(f.name));
@ -127,6 +126,11 @@ export default memo(function Filters({
const [activeInfoFeature, setActiveInfoFeature] = useState<FeatureMeta | null>(null);
const [collapsedGroups, toggleGroup] = useCollapsibleGroups();
const activeModes = useMemo(
() => TRANSPORT_MODES.filter((m) => m in travelTimeEntries),
[travelTimeEntries]
);
const handleAddAndScroll = useCallback(
(name: string) => {
onAddFilter(name);
@ -144,17 +148,19 @@ export default memo(function Filters({
const percentileScales = useMemo(() => {
const scales = new Map<string, PercentileScale>();
for (const f of features) {
if (f.type === 'numeric' && f.histogram) {
if (f.type === 'numeric' && f.histogram && !f.absolute) {
scales.set(f.name, buildPercentileScale(f.histogram));
}
}
return scales;
}, [features]);
const badgeCount = enabledFeatureList.length + activeModes.length;
return (
<div ref={containerRef} className="flex flex-col bg-white dark:bg-navy-950 overflow-y-auto md:overflow-hidden h-full">
<div className="shrink-0 border-b border-warm-200 dark:border-navy-700">
<AiFilterInput loading={aiFilterLoading} error={aiFilterError} onSubmit={onAiFilterSubmit} />
<AiFilterInput loading={aiFilterLoading} error={aiFilterError} notes={aiFilterNotes} onSubmit={onAiFilterSubmit} />
<div className="flex items-center gap-2 px-3 pb-2">
<button
onClick={() => setShowPhilosophy(true)}
@ -171,32 +177,34 @@ export default memo(function Filters({
<span className="text-sm font-semibold text-navy-950 dark:text-warm-100">
Active Filters
</span>
{(enabledFeatureList.length > 0 || travelTimeEnabled) && (
{badgeCount > 0 && (
<span className="text-xs font-medium px-1.5 py-0.5 rounded-full bg-teal-50 dark:bg-teal-900/30 text-teal-600 dark:text-teal-400">
{enabledFeatureList.length + (travelTimeEnabled ? 1 : 0)}
{badgeCount}
</span>
)}
</div>
</div>
<div className="md:flex-1 md:overflow-y-auto">
{travelTimeEnabled && (
<div className="px-2 py-1">
<TravelTimeCard
destination={travelTimeDestination}
destinationLabel={travelTimeDestinationLabel}
mode={travelTimeMode}
timeRange={travelTimeRange}
dataRange={travelTimeDataRange}
onSetDestination={onTravelTimeSetDestination}
onModeChange={onTravelTimeModeChange}
onTimeRangeChange={onTravelTimeRangeChange}
onRemove={onTravelTimeDisable}
/>
</div>
)}
{activeModes.map((mode) => {
const entry = travelTimeEntries[mode]!;
return (
<div key={mode} className="px-2 py-1">
<TravelTimeCard
mode={mode}
destination={entry.destination}
destinationLabel={entry.destinationLabel}
timeRange={entry.timeRange}
dataRange={travelTimeDataRanges[mode] ?? null}
onSetDestination={(lat, lon, label) => onTravelTimeSetDestination(mode, lat, lon, label)}
onTimeRangeChange={(range) => onTravelTimeRangeChange(mode, range)}
onRemove={() => onTravelTimeDisableMode(mode)}
/>
</div>
);
})}
{enabledFeatureList.length === 0 && !travelTimeEnabled && (
{enabledFeatureList.length === 0 && activeModes.length === 0 && (
<p className="px-3 py-1.5 text-xs text-warm-400 dark:text-warm-500">
Browse features below and click + to add a filter
</p>
@ -300,6 +308,7 @@ export default memo(function Filters({
max={scale ? 100 : feature.max!}
value={sliderValue}
displayValues={scale ? displayValue : undefined}
absoluteMax={feature.absolute}
/>
</div>
</div>
@ -327,8 +336,8 @@ export default memo(function Filters({
onNavigateToSource={onNavigateToSource}
openInfoFeature={openInfoFeature}
onClearOpenInfoFeature={onClearOpenInfoFeature}
travelTimeEnabled={travelTimeEnabled}
onEnableTravelTime={onTravelTimeEnable}
activeTravelModes={activeModes}
onEnableTravelMode={onTravelTimeEnableMode}
/>
</div>
</div>

View file

@ -0,0 +1,143 @@
import { useState, useCallback, useRef, useEffect } from 'react';
import type { PostcodeGeometry } from '../../types';
import { authHeaders } from '../../lib/api';
import { useIsMobile } from '../../hooks/useIsMobile';
import { useLocationSearch, type SearchResult } from '../../hooks/useLocationSearch';
import { PlaceSearchInput } from '../ui/PlaceSearchInput';
import { SearchIcon } from '../ui/icons/SearchIcon';
export interface SearchedLocation {
postcode: string;
geometry: PostcodeGeometry;
}
const ZOOM_FOR_TYPE: Record<string, number> = {
city: 10,
borough: 12,
town: 13,
suburb: 14,
quarter: 14,
neighbourhood: 14,
village: 14,
station: 15,
island: 12,
locality: 14,
hamlet: 15,
isolated_dwelling: 16,
};
export default function LocationSearch({
onFlyTo,
onLocationSearched,
}: {
onFlyTo: (lat: number, lng: number, zoom: number) => void;
onLocationSearched?: (postcode: SearchedLocation | null) => void;
}) {
const search = useLocationSearch();
const [error, setError] = useState<string | null>(null);
const [loading, setLoading] = useState(false);
const [expanded, setExpanded] = useState(false);
const isMobile = useIsMobile();
const containerRef = useRef<HTMLDivElement>(null);
const inputRef = useRef<HTMLInputElement>(null);
// Close on outside click
useEffect(() => {
const handler = (e: MouseEvent) => {
if (containerRef.current && !containerRef.current.contains(e.target as Node)) {
search.close();
if (isMobile) setExpanded(false);
}
};
document.addEventListener('mousedown', handler);
return () => document.removeEventListener('mousedown', handler);
}, [isMobile, search]);
// Focus input when expanding on mobile
useEffect(() => {
if (isMobile && expanded) {
inputRef.current?.focus();
}
}, [isMobile, expanded]);
const selectResult = useCallback(
async (result: SearchResult) => {
if (result.type === 'place') {
const zoom = ZOOM_FOR_TYPE[result.place_type] ?? 14;
onFlyTo(result.lat, result.lon, zoom);
onLocationSearched?.(null);
search.clear();
if (isMobile) setExpanded(false);
return;
}
// Postcode — fetch geometry
setError(null);
setLoading(true);
search.close();
try {
const res = await fetch(
`/api/postcode/${encodeURIComponent(result.label)}`,
authHeaders(),
);
if (!res.ok) {
setError('Postcode not found');
return;
}
const json: {
postcode: string;
latitude: number;
longitude: number;
geometry: PostcodeGeometry;
} = await res.json();
onFlyTo(json.latitude, json.longitude, 16);
onLocationSearched?.({ postcode: json.postcode, geometry: json.geometry });
search.clear();
if (isMobile) setExpanded(false);
} catch {
setError('Lookup failed');
} finally {
setLoading(false);
}
},
[onFlyTo, onLocationSearched, isMobile, search],
);
// Mobile collapsed state: just a search icon button
if (isMobile && !expanded) {
return (
<button
type="button"
onClick={() => setExpanded(true)}
className="absolute top-3 left-3 z-10 p-2 bg-white dark:bg-warm-800 rounded shadow-lg"
aria-label="Search places or postcodes"
>
<SearchIcon className="w-5 h-5 text-warm-600 dark:text-warm-300" />
</button>
);
}
return (
<div ref={containerRef} className="absolute top-3 left-3 z-10 flex flex-col">
<div className="flex items-center shadow-lg rounded overflow-hidden bg-white dark:bg-warm-800">
<SearchIcon className="w-4 h-4 text-warm-400 dark:text-warm-500 ml-3 shrink-0" />
<PlaceSearchInput
search={search}
onSelect={selectResult}
loading={loading}
placeholder="Search places or postcodes..."
size="sm"
inputClassName="px-2 py-2 text-sm w-56 border-none outline-none bg-transparent text-warm-700 dark:text-warm-200 placeholder-warm-400 dark:placeholder-warm-500"
inputRef={inputRef}
onInputChange={() => setError(null)}
/>
</div>
{error && (
<span className="text-xs text-red-600 dark:text-red-300 bg-white/90 dark:bg-warm-800/90 rounded px-2 py-0.5 shadow mt-1">
{error}
</span>
)}
</div>
);
}

View file

@ -6,6 +6,7 @@ import 'maplibre-gl/dist/maplibre-gl.css';
import type {
HexagonData,
PostcodeFeature,
PostcodeGeometry,
ViewState,
ViewChangeParams,
POI,
@ -15,11 +16,12 @@ import type {
import { zoomToResolution, getBoundsFromViewState, getMapStyle } from '../../lib/map-utils';
import { INITIAL_VIEW_STATE, MAP_MIN_ZOOM, MAP_BOUNDS } from '../../lib/consts';
import PostcodeSearch, { type SearchedPostcode } from './PostcodeSearch';
import LocationSearch, { type SearchedLocation } from './LocationSearch';
import MapLegend from './MapLegend';
import HoverCard from './HoverCard';
import type { FeatureFilters } from '../../types';
import { useDeckLayers, osmIdToUrl } from '../../hooks/useDeckLayers';
import { MODE_LABELS, type TransportMode, type TravelTimeEntries } from '../../hooks/useTravelTime';
interface MapProps {
data: HexagonData[];
@ -42,14 +44,12 @@ interface MapProps {
screenshotMode?: boolean;
ogMode?: boolean;
filters?: FeatureFilters;
searchedPostcode?: SearchedPostcode | null;
onPostcodeSearched?: (postcode: SearchedPostcode | null) => void;
selectedPostcodeGeometry?: PostcodeGeometry | null;
onLocationSearched?: (location: SearchedLocation | null) => void;
bounds?: Bounds | null;
hideLegend?: boolean;
travelTimeEnabled?: boolean;
travelTimeDestination?: [number, number] | null;
travelTimeColorRange?: [number, number] | null;
travelTimeRange?: [number, number] | null;
travelTimeEntries?: TravelTimeEntries;
travelTimeColorRanges?: Partial<Record<TransportMode, [number, number]>>;
}
interface Dimensions {
@ -98,14 +98,12 @@ export default memo(function Map({
screenshotMode = false,
ogMode = false,
filters = {},
searchedPostcode,
onPostcodeSearched,
selectedPostcodeGeometry,
onLocationSearched,
bounds: viewportBounds,
hideLegend = false,
travelTimeEnabled = false,
travelTimeDestination,
travelTimeColorRange,
travelTimeRange,
travelTimeEntries = {},
travelTimeColorRanges = {},
}: MapProps) {
const containerRef = useRef<HTMLDivElement>(null);
const [viewState, setViewState] = useState<ViewState>(initialViewState || INITIAL_VIEW_STATE);
@ -168,6 +166,7 @@ export default memo(function Map({
postcodeCountRange,
colorFeatureMeta,
handleMouseLeave,
primaryTravelMode,
} = useDeckLayers({
data,
postcodeData,
@ -182,12 +181,10 @@ export default memo(function Map({
onHexagonClick,
onHexagonHover,
theme,
searchedPostcode,
selectedPostcodeGeometry,
bounds: viewportBounds,
travelTimeEnabled,
travelTimeDestination,
travelTimeColorRange,
travelTimeRange,
travelTimeEntries,
travelTimeColorRanges,
});
return (
@ -222,12 +219,12 @@ export default memo(function Map({
) : null
) : (
<>
<PostcodeSearch onFlyTo={handleFlyTo} onPostcodeSearched={onPostcodeSearched} />
<LocationSearch onFlyTo={handleFlyTo} onLocationSearched={onLocationSearched} />
{!hideLegend &&
(travelTimeEnabled && travelTimeDestination && travelTimeColorRange ? (
(primaryTravelMode && travelTimeColorRanges[primaryTravelMode] ? (
<MapLegend
featureLabel="Travel time"
range={travelTimeColorRange}
featureLabel={`Travel time (${MODE_LABELS[primaryTravelMode]})`}
range={travelTimeColorRanges[primaryTravelMode]!}
showCancel={false}
onCancel={onCancelPin}
mode="feature"

View file

@ -1,6 +1,6 @@
import { useState, useEffect, useMemo, useCallback } from 'react';
import type { FeatureMeta, FeatureFilters, POICategoryGroup, ViewState } from '../../types';
import type { SearchedPostcode } from './PostcodeSearch';
import type { SearchedLocation } from './LocationSearch';
import type { Page } from '../ui/Header';
import Map from './Map';
import Filters from './Filters';
@ -18,8 +18,14 @@ import { usePaneResize } from '../../hooks/usePaneResize';
import { useAiFilters } from '../../hooks/useAiFilters';
import { useAreaSummary } from '../../hooks/useAreaSummary';
import { useUrlSync } from '../../hooks/useUrlSync';
import { useTravelTime, type TravelTimeInitial } from '../../hooks/useTravelTime';
import { apiUrl, buildFilterString } from '../../lib/api';
import {
useTravelTime,
TRANSPORT_MODES,
MODE_LABELS,
type TransportMode,
type TravelTimeInitial,
} from '../../hooks/useTravelTime';
import { apiUrl, assertOk, buildFilterString, logNonAbortError } from '../../lib/api';
import { SpinnerIcon } from '../ui/icons/SpinnerIcon';
import { MapPinIcon } from '../ui/icons/MapPinIcon';
@ -65,7 +71,6 @@ export default function MapPage({
isMobile = false,
initialTravelTime,
}: MapPageProps) {
const [searchedPostcode, setSearchedPostcode] = useState<SearchedPostcode | null>(null);
const [selectedPOICategories, setSelectedPOICategories] =
useState<Set<string>>(initialPOICategories);
@ -109,7 +114,7 @@ export default function MapPage({
const handleAiFilterSubmit = useCallback(
async (query: string) => {
const result = await aiFilters.fetchAiFilters(query);
if (result) handleSetFilters(result);
if (result) handleSetFilters(result.filters);
},
[aiFilters.fetchAiFilters, handleSetFilters]
);
@ -125,9 +130,7 @@ export default function MapPage({
activeFeature,
dragValue,
dragData,
travelTimeEnabled: travelTime.enabled,
travelTimeDestination: travelTime.destination,
travelTimeMode: travelTime.mode,
travelTimeEntries: travelTime.entries,
});
// Keep filter bounds in sync with map data
@ -142,24 +145,42 @@ export default function MapPage({
resolution: mapData.resolution,
});
// Location search handler — selects postcode + shows stats
const handleLocationSearchResult = useCallback(
(result: SearchedLocation | null) => {
if (result) {
selection.handleLocationSearch(result.postcode, result.geometry);
if (isMobile) setMobileDrawerOpen(true);
} else {
selection.handleCloseSelection();
}
},
[selection.handleLocationSearch, selection.handleCloseSelection, isMobile]
);
// POI data
const pois = usePOIData(mapData.bounds, selectedPOICategories);
// Compute data range for travel time slider
const travelTimeDataRange = useMemo((): [number, number] | null => {
if (!travelTime.enabled || !travelTime.destination) return null;
const vals: number[] = [];
for (const item of mapData.data) {
const val = item.travel_time;
if (typeof val === 'number' && !isNaN(val)) vals.push(val);
// Compute data range for travel time slider per mode (full min/max for slider bounds)
const travelTimeDataRanges = useMemo((): Partial<Record<TransportMode, [number, number]>> => {
const ranges: Partial<Record<TransportMode, [number, number]>> = {};
for (const mode of TRANSPORT_MODES) {
const entry = travelTime.entries[mode];
if (!entry?.destination) continue;
const vals: number[] = [];
for (const item of mapData.data) {
const val = item[`travel_time_${mode}`];
if (typeof val === 'number' && !isNaN(val)) vals.push(val);
}
if (vals.length === 0) continue;
vals.sort((a, b) => a - b);
ranges[mode] = [vals[0], vals[vals.length - 1]];
}
if (vals.length === 0) return null;
vals.sort((a, b) => a - b);
return [vals[0], vals[vals.length - 1]];
}, [travelTime.enabled, travelTime.destination, mapData.data]);
return ranges;
}, [travelTime.entries, mapData.data]);
// Sync current state to URL
useUrlSync(mapData.currentView, filters, features, selectedPOICategories, selection.rightPaneTab, travelTime);
useUrlSync(mapData.currentView, filters, features, selectedPOICategories, selection.rightPaneTab, travelTime.entries);
// Set initial view and tab from URL state
useEffect(() => {
@ -238,7 +259,7 @@ export default function MapPage({
link.click();
URL.revokeObjectURL(link.href);
})
.catch((err) => console.error('Export failed:', err))
.catch((err) => logNonAbortError('Export failed', err))
.finally(() => setExporting(false));
}, [mapData.bounds, filters, features, exporting]);
@ -258,10 +279,7 @@ export default function MapPage({
let min = Infinity;
let max = -Infinity;
for (const d of items) {
const c =
'count' in d
? (d as { count: number }).count
: (d as { properties: { count: number } }).properties.count;
const c = 'count' in d ? d.count : d.properties.count;
if (c < min) min = c;
if (c > max) max = c;
}
@ -301,10 +319,8 @@ export default function MapPage({
screenshotMode
ogMode={ogMode}
bounds={mapData.bounds}
travelTimeEnabled={travelTime.enabled}
travelTimeDestination={travelTime.destination}
travelTimeColorRange={mapData.travelTimeColorRange}
travelTimeRange={travelTime.timeRange}
travelTimeEntries={travelTime.entries}
travelTimeColorRanges={mapData.travelTimeColorRanges}
/>
</div>
);
@ -373,19 +389,15 @@ export default function MapPage({
onCancelPin={handleCancelPin}
openInfoFeature={pendingInfoFeature}
onClearOpenInfoFeature={onClearPendingInfoFeature}
travelTimeEnabled={travelTime.enabled}
travelTimeDestination={travelTime.destination}
travelTimeDestinationLabel={travelTime.destinationLabel}
travelTimeMode={travelTime.mode}
travelTimeRange={travelTime.timeRange}
travelTimeDataRange={travelTimeDataRange}
onTravelTimeEnable={travelTime.handleEnable}
onTravelTimeDisable={travelTime.handleDisable}
travelTimeEntries={travelTime.entries}
travelTimeDataRanges={travelTimeDataRanges}
onTravelTimeEnableMode={travelTime.handleEnableMode}
onTravelTimeDisableMode={travelTime.handleDisableMode}
onTravelTimeSetDestination={travelTime.handleSetDestination}
onTravelTimeModeChange={travelTime.handleModeChange}
onTravelTimeRangeChange={travelTime.handleTimeRangeChange}
aiFilterLoading={aiFilters.loading}
aiFilterError={aiFilters.error}
aiFilterNotes={aiFilters.notes}
onAiFilterSubmit={handleAiFilterSubmit}
/>
);
@ -426,14 +438,12 @@ export default function MapPage({
initialViewState={initialViewState}
theme={theme}
filters={filters}
searchedPostcode={searchedPostcode}
onPostcodeSearched={setSearchedPostcode}
selectedPostcodeGeometry={selection.selectedPostcodeGeometry}
onLocationSearched={handleLocationSearchResult}
bounds={mapData.bounds}
hideLegend
travelTimeEnabled={travelTime.enabled}
travelTimeDestination={travelTime.destination}
travelTimeColorRange={mapData.travelTimeColorRange}
travelTimeRange={travelTime.timeRange}
travelTimeEntries={travelTime.entries}
travelTimeColorRanges={mapData.travelTimeColorRanges}
/>
{mapData.loading && (
<div className="absolute bottom-2 left-2 bg-white dark:bg-navy-800 dark:text-warm-200 px-2 py-1 rounded shadow text-xs">
@ -461,43 +471,54 @@ export default function MapPage({
style={{ flex: '55 0 0' }}
>
{/* Legend */}
{travelTime.enabled && travelTime.destination && mapData.travelTimeColorRange ? (
<MapLegend
featureLabel="Travel time"
range={mapData.travelTimeColorRange}
showCancel={false}
onCancel={handleCancelPin}
mode="feature"
theme={theme}
inline
suffix=" min"
/>
) : viewFeature && mapData.colorRange && mobileLegendMeta ? (
<MapLegend
featureLabel={
viewSource === 'eye'
? `Previewing \u201c${mobileLegendMeta.name}\u201d`
: mobileLegendMeta.name
}
range={mapData.colorRange}
showCancel={viewSource === 'eye'}
onCancel={handleCancelPin}
mode="feature"
enumValues={mobileLegendMeta.type === 'enum' ? mobileLegendMeta.values : undefined}
theme={theme}
inline
/>
) : (
<MapLegend
featureLabel="Property density"
range={mobileDensityRange}
showCancel={false}
onCancel={handleCancelPin}
mode="density"
theme={theme}
inline
/>
)}
{(() => {
const primaryMode = TRANSPORT_MODES.find(
(m) => travelTime.entries[m]?.destination && mapData.travelTimeColorRanges[m]
);
if (primaryMode) {
return (
<MapLegend
featureLabel={`Travel time (${MODE_LABELS[primaryMode]})`}
range={mapData.travelTimeColorRanges[primaryMode]!}
showCancel={false}
onCancel={handleCancelPin}
mode="feature"
theme={theme}
inline
suffix=" min"
/>
);
}
if (viewFeature && mapData.colorRange && mobileLegendMeta) {
return (
<MapLegend
featureLabel={
viewSource === 'eye'
? `Previewing \u201c${mobileLegendMeta.name}\u201d`
: mobileLegendMeta.name
}
range={mapData.colorRange}
showCancel={viewSource === 'eye'}
onCancel={handleCancelPin}
mode="feature"
enumValues={mobileLegendMeta.type === 'enum' ? mobileLegendMeta.values : undefined}
theme={theme}
inline
/>
);
}
return (
<MapLegend
featureLabel="Property density"
range={mobileDensityRange}
showCancel={false}
onCancel={handleCancelPin}
mode="density"
theme={theme}
inline
/>
);
})()}
{/* Filters content */}
<div className="flex-1 min-h-0">
{renderFilters()}
@ -565,13 +586,11 @@ export default function MapPage({
initialViewState={initialViewState}
theme={theme}
filters={filters}
searchedPostcode={searchedPostcode}
onPostcodeSearched={setSearchedPostcode}
selectedPostcodeGeometry={selection.selectedPostcodeGeometry}
onLocationSearched={handleLocationSearchResult}
bounds={mapData.bounds}
travelTimeEnabled={travelTime.enabled}
travelTimeDestination={travelTime.destination}
travelTimeColorRange={mapData.travelTimeColorRange}
travelTimeRange={travelTime.timeRange}
travelTimeEntries={travelTime.entries}
travelTimeColorRanges={mapData.travelTimeColorRanges}
/>
{mapData.loading && (
<div className="absolute bottom-4 left-4 bg-white dark:bg-navy-800 dark:text-warm-200 px-3 py-1 rounded shadow">

@ -1,300 +0,0 @@
import { useState, useCallback, useRef, useEffect } from 'react';
import type { PostcodeGeometry, PlaceResult } from '../../types';
import { authHeaders, logNonAbortError } from '../../lib/api';
import { useIsMobile } from '../../hooks/useIsMobile';
import { SearchIcon } from '../ui/icons/SearchIcon';
import { MapPinIcon } from '../ui/icons/MapPinIcon';
export interface SearchedPostcode {
postcode: string;
geometry: PostcodeGeometry;
}
const POSTCODE_RE = /^[A-Z]{1,2}\d[A-Z\d]?\s*\d?[A-Z]{0,2}$/i;
function looksLikePostcode(s: string) {
return POSTCODE_RE.test(s.trim());
}
type SearchResult =
| { type: 'postcode'; label: string }
| { type: 'place'; name: string; place_type: string; lat: number; lon: number };
const ZOOM_FOR_TYPE: Record<string, number> = {
city: 10,
borough: 12,
town: 13,
suburb: 14,
neighbourhood: 14,
village: 14,
locality: 14,
hamlet: 15,
isolated_dwelling: 16,
};
export default function PostcodeSearch({
onFlyTo,
onPostcodeSearched,
}: {
onFlyTo: (lat: number, lng: number, zoom: number) => void;
onPostcodeSearched?: (postcode: SearchedPostcode | null) => void;
}) {
const [query, setQuery] = useState('');
const [results, setResults] = useState<SearchResult[]>([]);
const [activeIndex, setActiveIndex] = useState(-1);
const [open, setOpen] = useState(false);
const [error, setError] = useState<string | null>(null);
const [loading, setLoading] = useState(false);
const [expanded, setExpanded] = useState(false);
const isMobile = useIsMobile();
const containerRef = useRef<HTMLDivElement>(null);
const inputRef = useRef<HTMLInputElement>(null);
const abortRef = useRef<AbortController | null>(null);
const debounceRef = useRef<ReturnType<typeof setTimeout>>();
// Close on outside click
useEffect(() => {
const handler = (e: MouseEvent) => {
if (containerRef.current && !containerRef.current.contains(e.target as Node)) {
setOpen(false);
if (isMobile) setExpanded(false);
}
};
document.addEventListener('mousedown', handler);
return () => document.removeEventListener('mousedown', handler);
}, [isMobile]);
// Focus input when expanding on mobile
useEffect(() => {
if (isMobile && expanded) {
inputRef.current?.focus();
}
}, [isMobile, expanded]);
const selectPostcode = useCallback(
async (postcode: string) => {
setError(null);
setLoading(true);
setOpen(false);
try {
const res = await fetch(
`/api/postcode/${encodeURIComponent(postcode.trim())}`,
authHeaders()
);
if (!res.ok) {
setError('Postcode not found');
return;
}
const json: {
postcode: string;
latitude: number;
longitude: number;
geometry: PostcodeGeometry;
} = await res.json();
onFlyTo(json.latitude, json.longitude, 16);
onPostcodeSearched?.({ postcode: json.postcode, geometry: json.geometry });
setQuery('');
setResults([]);
if (isMobile) setExpanded(false);
} catch {
setError('Lookup failed');
} finally {
setLoading(false);
}
},
[onFlyTo, onPostcodeSearched, isMobile]
);
const selectPlace = useCallback(
(place: { name: string; place_type: string; lat: number; lon: number }) => {
const zoom = ZOOM_FOR_TYPE[place.place_type] ?? 14;
onFlyTo(place.lat, place.lon, zoom);
setQuery('');
setResults([]);
setOpen(false);
if (isMobile) setExpanded(false);
},
[onFlyTo, isMobile]
);
const selectResult = useCallback(
(result: SearchResult) => {
if (result.type === 'postcode') {
selectPostcode(result.label);
} else {
selectPlace(result);
}
},
[selectPostcode, selectPlace]
);
const handleInputChange = useCallback((value: string) => {
setQuery(value);
setError(null);
setActiveIndex(-1);
// Cancel in-flight request
abortRef.current?.abort();
if (debounceRef.current) clearTimeout(debounceRef.current);
const trimmed = value.trim();
if (!trimmed) {
setResults([]);
setOpen(false);
return;
}
if (looksLikePostcode(trimmed)) {
setResults([{ type: 'postcode', label: trimmed.toUpperCase() }]);
setOpen(true);
return;
}
if (trimmed.length < 2) {
setResults([]);
setOpen(false);
return;
}
// Debounced place search
debounceRef.current = setTimeout(async () => {
const controller = new AbortController();
abortRef.current = controller;
try {
const params = new URLSearchParams({ q: trimmed, limit: '7' });
const res = await fetch(
`/api/places?${params}`,
authHeaders({ signal: controller.signal })
);
if (!res.ok) return;
const json: { places: PlaceResult[] } = await res.json();
const placeResults: SearchResult[] = json.places.map((p) => ({
type: 'place' as const,
...p,
}));
setResults(placeResults);
setOpen(placeResults.length > 0);
} catch (err) {
logNonAbortError('places search', err);
}
}, 200);
}, []);
const handleKeyDown = useCallback(
(e: React.KeyboardEvent) => {
if (e.key === 'ArrowDown') {
e.preventDefault();
setActiveIndex((prev) => (prev < results.length - 1 ? prev + 1 : prev));
} else if (e.key === 'ArrowUp') {
e.preventDefault();
setActiveIndex((prev) => (prev > 0 ? prev - 1 : -1));
} else if (e.key === 'Enter') {
e.preventDefault();
if (activeIndex >= 0 && activeIndex < results.length) {
selectResult(results[activeIndex]);
} else if (looksLikePostcode(query)) {
selectPostcode(query);
}
} else if (e.key === 'Escape') {
setOpen(false);
inputRef.current?.blur();
}
},
[results, activeIndex, query, selectResult, selectPostcode]
);
// Cleanup on unmount
useEffect(() => {
return () => {
abortRef.current?.abort();
if (debounceRef.current) clearTimeout(debounceRef.current);
};
}, []);
// Mobile collapsed state: just a search icon button
if (isMobile && !expanded) {
return (
<button
type="button"
onClick={() => setExpanded(true)}
className="absolute top-3 left-3 z-10 p-2 bg-white dark:bg-warm-800 rounded shadow-lg"
aria-label="Search places or postcodes"
>
<SearchIcon className="w-5 h-5 text-warm-600 dark:text-warm-300" />
</button>
);
}
return (
<div ref={containerRef} className="absolute top-3 left-3 z-10 flex flex-col">
<div className="relative">
<div className="flex items-center shadow-lg rounded overflow-hidden bg-white dark:bg-warm-800">
<SearchIcon className="w-4 h-4 text-warm-400 dark:text-warm-500 ml-3 shrink-0" />
<input
ref={inputRef}
type="text"
value={query}
onChange={(e) => handleInputChange(e.target.value)}
onFocus={() => {
if (results.length > 0) setOpen(true);
}}
onKeyDown={handleKeyDown}
placeholder="Search places or postcodes..."
className="px-2 py-2 text-sm w-56 border-none outline-none bg-transparent text-warm-700 dark:text-warm-200 placeholder-warm-400 dark:placeholder-warm-500"
/>
{loading && (
<div className="mr-3 w-4 h-4 border-2 border-warm-300 dark:border-warm-600 border-t-teal-500 rounded-full animate-spin" />
)}
</div>
{open && results.length > 0 && (
<div className="absolute top-full left-0 right-0 mt-1 bg-white dark:bg-warm-800 rounded shadow-lg border border-warm-200 dark:border-warm-700 max-h-64 overflow-y-auto">
{results.map((result, idx) => (
<button
key={
result.type === 'postcode'
? `pc-${result.label}`
: `pl-${result.name}-${result.lat}`
}
type="button"
className={`w-full text-left px-3 py-2 flex items-center gap-2 text-sm cursor-pointer ${
idx === activeIndex
? 'bg-teal-50 dark:bg-teal-900/30'
: 'hover:bg-warm-50 dark:hover:bg-warm-700'
}`}
onMouseEnter={() => setActiveIndex(idx)}
onMouseDown={(e) => {
e.preventDefault();
selectResult(result);
}}
>
{result.type === 'postcode' ? (
<>
<SearchIcon className="w-4 h-4 text-warm-400 dark:text-warm-500 shrink-0" />
<span className="text-warm-700 dark:text-warm-200">{result.label}</span>
<span className="text-warm-400 dark:text-warm-500 text-xs ml-auto">
postcode
</span>
</>
) : (
<>
<MapPinIcon className="w-4 h-4 text-warm-400 dark:text-warm-500 shrink-0" />
<span className="text-warm-700 dark:text-warm-200">{result.name}</span>
<span className="text-warm-400 dark:text-warm-500 text-xs ml-auto">
{result.place_type}
</span>
</>
)}
</button>
))}
</div>
)}
</div>
{error && (
<span className="text-xs text-red-600 dark:text-red-300 bg-white/90 dark:bg-warm-800/90 rounded px-2 py-0.5 shadow mt-1">
{error}
</span>
)}
</div>
);
}
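
The removed component chose between a direct postcode lookup and a debounced place search using a loose regular expression. The following is a minimal sketch of that decision in isolation. `POSTCODE_RE` and `looksLikePostcode` are copied from the deleted file; the `classifyQuery` helper and its labels are hypothetical and only mirror the branching order of `handleInputChange`.

```typescript
// Pattern copied from the deleted PostcodeSearch component. It is deliberately
// loose: it accepts full postcodes and partial ones (outward code only).
const POSTCODE_RE = /^[A-Z]{1,2}\d[A-Z\d]?\s*\d?[A-Z]{0,2}$/i;

function looksLikePostcode(s: string): boolean {
  return POSTCODE_RE.test(s.trim());
}

// Hypothetical labels for the branches taken by handleInputChange.
type QueryKind = 'empty' | 'postcode' | 'too-short' | 'place';

// Hypothetical helper: same order of checks as the original handler.
function classifyQuery(value: string): QueryKind {
  const trimmed = value.trim();
  if (!trimmed) return 'empty'; // clear results, close dropdown
  if (looksLikePostcode(trimmed)) return 'postcode'; // single synthetic result, no network call
  if (trimmed.length < 2) return 'too-short'; // not worth querying
  return 'place'; // debounced request to the places endpoint
}
```

Because the postcode check runs before the length check, a short but postcode-shaped input such as `E1` is offered as a postcode result instead of being discarded.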

@ -221,7 +221,7 @@ function PropertyCard({ property }: { property: Property }) {
{age !== undefined && (
<div>
<span className="text-warm-500 dark:text-warm-400">Built:</span>{' '}
{formatAge(age, property.is_construction_date_approximate ?? true)}
{formatAge(age, property.is_construction_date_approximate)}
</div>
)}
{property.current_energy_rating && (

@ -1,10 +1,52 @@
import { useEffect, useState } from 'react';
import type { HexagonLocation } from '../../lib/external-search';
import { apiUrl, logNonAbortError } from '../../lib/api';
interface StreetViewEmbedProps {
location: HexagonLocation;
}
type Status = 'loading' | 'ok' | 'none' | 'error';
export default function StreetViewEmbed({ location }: StreetViewEmbedProps) {
const [status, setStatus] = useState<Status>('loading');
const [panoId, setPanoId] = useState<string | null>(null);
useEffect(() => {
setStatus('loading');
setPanoId(null);
const controller = new AbortController();
const params = new URLSearchParams({
lat: String(location.lat),
lon: String(location.lon),
});
fetch(apiUrl('streetview', params), { signal: controller.signal })
.then((res) => {
if (!res.ok) throw new Error(`HTTP ${res.status}`);
return res.json();
})
.then((data: { status: string; pano_id?: string }) => {
if (data.status === 'OK' && data.pano_id) {
setPanoId(data.pano_id);
setStatus('ok');
} else {
setStatus('none');
}
})
.catch((err) => {
logNonAbortError('streetview', err);
if (!controller.signal.aborted) {
setStatus('error');
}
});
return () => controller.abort();
}, [location.lat, location.lon]);
if (status === 'none' || status === 'error') return null;
return (
<div>
<div className="px-3 py-1.5 text-xs font-bold text-warm-500 bg-warm-50 dark:bg-warm-900 dark:text-warm-400 sticky top-0">
@ -12,13 +54,20 @@ export default function StreetViewEmbed({ location }: StreetViewEmbedProps) {
</div>
<div className="px-3 py-2">
<div className="rounded overflow-hidden border border-warm-200 dark:border-warm-700">
<iframe
className="w-full"
style={{ height: 240, border: 0 }}
loading="lazy"
referrerPolicy="no-referrer-when-downgrade"
src={`https://maps.google.com/maps?layer=c&cbll=${location.lat},${location.lon}&cbp=11,0,0,0,0&output=svembed`}
/>
{status === 'loading' ? (
<div
className="w-full animate-pulse bg-warm-200 dark:bg-warm-700"
style={{ height: 240 }}
/>
) : (
<iframe
className="w-full"
style={{ height: 240, border: 0 }}
loading="lazy"
referrerPolicy="no-referrer-when-downgrade"
src={`https://maps.google.com/maps?layer=c&panoid=${panoId}&cbp=11,0,0,0,0&output=svembed`}
/>
)}
</div>
</div>
</div>
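
The change above replaces the coordinate-based embed with a two-step flow: look up a panorama id first, then embed by `panoid`. The following is a small sketch of the two pure pieces of that flow, separated from React state. `resolvePano` and `streetViewEmbedUrl` are hypothetical helper names, and wrapping the id in `encodeURIComponent` is an added precaution that the diff itself does not apply.

```typescript
type Status = 'loading' | 'ok' | 'none' | 'error';

// Response shape as typed in the component's .then() callback.
interface StreetViewResponse {
  status: string;
  pano_id?: string;
}

// Hypothetical helper: maps the lookup response to the component's state.
// Anything other than status "OK" with a pano id is treated as "no imagery".
function resolvePano(data: StreetViewResponse): { status: Status; panoId: string | null } {
  if (data.status === 'OK' && data.pano_id) {
    return { status: 'ok', panoId: data.pano_id };
  }
  return { status: 'none', panoId: null };
}

// Hypothetical helper: builds the iframe src used once a panorama is known.
// encodeURIComponent is an assumption added here, not part of the diff.
function streetViewEmbedUrl(panoId: string): string {
  return `https://maps.google.com/maps?layer=c&panoid=${encodeURIComponent(panoId)}&cbp=11,0,0,0,0&output=svembed`;
}
```

Since the component returns `null` for both `'none'` and `'error'`, the section header only renders while loading or when a panorama exists.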

@ -1,61 +1,69 @@
import { useState, useCallback } from 'react';
import { useState, useCallback, useRef, useEffect } from 'react';
import { Slider } from '../ui/Slider';
import { PillToggle } from '../ui/PillToggle';
import { PillGroup } from '../ui/PillGroup';
import { IconButton } from '../ui/IconButton';
import { PlaceSearchInput } from '../ui/PlaceSearchInput';
import { CloseIcon } from '../ui/icons/CloseIcon';
import { MapPinIcon } from '../ui/icons/MapPinIcon';
import { RouteIcon } from '../ui/icons/RouteIcon';
import { formatFilterValue } from '../../lib/format';
import { authHeaders } from '../../lib/api';
import type { TransportMode } from '../../hooks/useTravelTime';
const MODES: { value: TransportMode; label: string }[] = [
{ value: 'car', label: 'Car' },
{ value: 'bicycle', label: 'Bicycle' },
{ value: 'walking', label: 'Walking' },
{ value: 'transit', label: 'Transit' },
];
import { authHeaders, logNonAbortError } from '../../lib/api';
import { useLocationSearch, type SearchResult } from '../../hooks/useLocationSearch';
import { MODE_LABELS, type TransportMode } from '../../hooks/useTravelTime';
interface TravelTimeCardProps {
mode: TransportMode;
destination: [number, number] | null;
destinationLabel: string;
mode: TransportMode;
timeRange: [number, number] | null;
dataRange: [number, number] | null;
onSetDestination: (lat: number, lon: number, label: string) => void;
onModeChange: (mode: TransportMode) => void;
onTimeRangeChange: (range: [number, number]) => void;
onRemove: () => void;
}
export function TravelTimeCard({
mode,
destination,
destinationLabel,
mode,
timeRange,
dataRange,
onSetDestination,
onModeChange,
onTimeRangeChange,
onRemove,
}: TravelTimeCardProps) {
const [query, setQuery] = useState('');
const search = useLocationSearch();
const [loading, setLoading] = useState(false);
const [error, setError] = useState<string | null>(null);
const containerRef = useRef<HTMLDivElement>(null);
const handleSearch = useCallback(
async (e: React.FormEvent) => {
e.preventDefault();
const trimmed = query.trim();
if (!trimmed) return;
// Close dropdown on outside click
useEffect(() => {
const handler = (e: MouseEvent) => {
if (containerRef.current && !containerRef.current.contains(e.target as Node)) {
search.close();
}
};
document.addEventListener('mousedown', handler);
return () => document.removeEventListener('mousedown', handler);
}, [search]);
const selectResult = useCallback(
async (result: SearchResult) => {
if (result.type === 'place') {
onSetDestination(result.lat, result.lon, result.name);
search.clear();
setError(null);
return;
}
// Postcode — fetch coordinates
setError(null);
setLoading(true);
search.close();
try {
const res = await fetch(
`/api/postcode/${encodeURIComponent(trimmed)}`,
authHeaders()
`/api/postcode/${encodeURIComponent(result.label)}`,
authHeaders(),
);
if (!res.ok) {
setError('Postcode not found');
@ -64,14 +72,15 @@ export function TravelTimeCard({
const json: { postcode: string; latitude: number; longitude: number } =
await res.json();
onSetDestination(json.latitude, json.longitude, json.postcode);
setQuery('');
} catch {
search.clear();
} catch (err) {
logNonAbortError('Postcode lookup failed', err);
setError('Lookup failed');
} finally {
setLoading(false);
}
},
[query, onSetDestination]
[onSetDestination, search],
);
const sliderMin = dataRange ? Math.floor(dataRange[0]) : 0;
@ -85,7 +94,7 @@ export function TravelTimeCard({
<div className="flex items-center gap-1.5">
<RouteIcon className="w-4 h-4 text-teal-600 dark:text-teal-400" />
<span className="text-sm font-medium text-navy-950 dark:text-warm-100">
Travel Time
Travel Time ({MODE_LABELS[mode]})
</span>
</div>
<IconButton onClick={() => onRemove()} title="Remove travel time">
@ -94,26 +103,17 @@ export function TravelTimeCard({
</div>
{/* Destination search */}
<div>
<form onSubmit={handleSearch} className="flex gap-1">
<input
type="text"
value={query}
onChange={(e) => {
setQuery(e.target.value);
setError(null);
}}
placeholder={destination ? 'Change destination...' : 'Enter postcode...'}
className="flex-1 min-w-0 px-2 py-1 text-xs rounded border border-warm-200 dark:border-warm-600 bg-white dark:bg-warm-800 text-navy-950 dark:text-warm-200 placeholder-warm-400 dark:placeholder-warm-500 outline-none focus:ring-1 focus:ring-teal-400"
/>
<button
type="submit"
disabled={loading || !query.trim()}
className="px-2 py-1 text-xs rounded bg-teal-600 text-white hover:bg-teal-700 disabled:opacity-50"
>
{loading ? '...' : 'Go'}
</button>
</form>
<div ref={containerRef} className="relative">
<PlaceSearchInput
search={search}
onSelect={selectResult}
loading={loading}
placeholder={destination ? 'Change destination...' : 'Search destination...'}
size="xs"
inputClassName="w-full px-2 py-1 text-xs rounded border border-warm-200 dark:border-warm-600 bg-white dark:bg-warm-800 text-navy-950 dark:text-warm-200 placeholder-warm-400 dark:placeholder-warm-500 outline-none focus:ring-1 focus:ring-teal-400"
onInputChange={() => setError(null)}
/>
{error && (
<p className="text-xs text-red-600 dark:text-red-400 mt-0.5">{error}</p>
)}
@ -127,24 +127,6 @@ export function TravelTimeCard({
)}
</div>
{/* Mode selector */}
<div>
<span className="text-[10px] font-medium text-warm-500 dark:text-warm-400 uppercase tracking-wide">
Mode
</span>
<PillGroup className="mt-0.5">
{MODES.map((m) => (
<PillToggle
key={m.value}
label={m.label}
active={mode === m.value}
onClick={() => onModeChange(m.value)}
size="xs"
/>
))}
</PillGroup>
</div>
{/* Time range slider — only show when we have data */}
{destination && dataRange && (
<div>

@ -0,0 +1,123 @@
import type React from 'react';
import type { SearchResult } from '../../hooks/useLocationSearch';
import { SearchIcon } from './icons/SearchIcon';
import { MapPinIcon } from './icons/MapPinIcon';
interface SearchHook {
query: string;
results: SearchResult[];
activeIndex: number;
setActiveIndex: (idx: number) => void;
open: boolean;
setOpen: (open: boolean) => void;
handleInputChange: (value: string) => void;
handleKeyDown: (
e: React.KeyboardEvent,
onSelect: (result: SearchResult) => void,
) => void;
}
interface PlaceSearchInputProps {
search: SearchHook;
onSelect: (result: SearchResult) => void;
loading?: boolean;
placeholder?: string;
size?: 'sm' | 'xs';
inputClassName?: string;
inputRef?: React.Ref<HTMLInputElement>;
onInputChange?: () => void;
}
export function PlaceSearchInput({
search,
onSelect,
loading,
placeholder,
size = 'sm',
inputClassName,
inputRef,
onInputChange,
}: PlaceSearchInputProps) {
const sm = size === 'sm';
const iconSize = sm ? 'w-4 h-4' : 'w-3 h-3';
const spinnerSize = sm ? 'w-4 h-4' : 'w-3 h-3';
return (
<div className="relative flex-1 min-w-0">
<input
ref={inputRef}
type="text"
value={search.query}
onChange={(e) => {
search.handleInputChange(e.target.value);
onInputChange?.();
}}
onFocus={() => {
if (search.results.length > 0) search.setOpen(true);
}}
onKeyDown={(e) => search.handleKeyDown(e, onSelect)}
placeholder={placeholder}
className={inputClassName}
/>
{loading && (
<div
className={`absolute right-2 top-1/2 -translate-y-1/2 ${spinnerSize} border-2 border-warm-300 dark:border-warm-600 border-t-teal-500 rounded-full animate-spin`}
/>
)}
{search.open && search.results.length > 0 && (
<div
className={`absolute top-full left-0 right-0 mt-1 bg-white dark:bg-warm-800 rounded shadow-lg border border-warm-200 dark:border-warm-700 ${sm ? 'max-h-64' : 'max-h-48'} overflow-y-auto z-20`}
>
{search.results.map((result, idx) => (
<button
key={
result.type === 'postcode'
? `pc-${result.label}`
: `pl-${result.name}-${result.lat}`
}
type="button"
className={`w-full text-left flex items-center cursor-pointer ${
sm ? 'px-3 py-2 gap-2 text-sm' : 'px-2 py-1.5 gap-1.5 text-xs'
} ${
idx === search.activeIndex
? 'bg-teal-50 dark:bg-teal-900/30'
: 'hover:bg-warm-50 dark:hover:bg-warm-700'
}`}
onMouseEnter={() => search.setActiveIndex(idx)}
onMouseDown={(e) => {
e.preventDefault();
onSelect(result);
}}
>
{result.type === 'postcode' ? (
<>
<SearchIcon
className={`${iconSize} text-warm-400 dark:text-warm-500 shrink-0`}
/>
<span className="text-warm-700 dark:text-warm-200">{result.label}</span>
</>
) : (
<>
<MapPinIcon
className={`${iconSize} text-warm-400 dark:text-warm-500 shrink-0`}
/>
<span className="text-warm-700 dark:text-warm-200">
{result.name}
{result.city && (
<span className="text-warm-400 dark:text-warm-500">
{' '}
({result.city})
</span>
)}
</span>
</>
)}
</button>
))}
</div>
)}
</div>
);
}

@ -2,24 +2,32 @@ import { useState, useCallback, useRef } from 'react';
import type { FeatureFilters } from '../types';
import { apiUrl, authHeaders, logNonAbortError } from '../lib/api';
interface AiFiltersResult {
filters: FeatureFilters;
notes: string;
}
interface UseAiFiltersResult {
fetchAiFilters: (query: string) => Promise<FeatureFilters | null>;
fetchAiFilters: (query: string) => Promise<AiFiltersResult | null>;
loading: boolean;
error: string | null;
notes: string | null;
}
export function useAiFilters(): UseAiFiltersResult {
const [loading, setLoading] = useState(false);
const [error, setError] = useState<string | null>(null);
const [notes, setNotes] = useState<string | null>(null);
const abortRef = useRef<AbortController | null>(null);
const fetchAiFilters = useCallback(async (query: string): Promise<FeatureFilters | null> => {
const fetchAiFilters = useCallback(async (query: string): Promise<AiFiltersResult | null> => {
abortRef.current?.abort();
const controller = new AbortController();
abortRef.current = controller;
setLoading(true);
setError(null);
setNotes(null);
try {
const url = apiUrl('ai-filters');
@ -39,8 +47,13 @@ export function useAiFilters(): UseAiFiltersResult {
}
const json = await response.json();
const result: AiFiltersResult = {
filters: json.filters as FeatureFilters,
notes: json.notes || '',
};
setNotes(result.notes || null);
setLoading(false);
return json.filters as FeatureFilters;
return result;
} catch (err) {
if (controller.signal.aborted) return null;
logNonAbortError('ai-filters', err);
@ -51,5 +64,5 @@ export function useAiFilters(): UseAiFiltersResult {
}
}, []);
return { fetchAiFilters, loading, error };
return { fetchAiFilters, loading, error, notes };
}
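
The hook now treats `notes` in two slightly different ways: the returned result always carries a string, while the hook state uses `null` for "no notes". The following is a hedged sketch of that normalization as two pure functions. The helper names are hypothetical, and the `FeatureFilters` shape is an assumption based on how `useFilters` reads filter values.

```typescript
// Assumed shape: numeric features map to [min, max], enum features to string[].
type FeatureFilters = Record<string, [number, number] | string[]>;

interface AiFiltersResult {
  filters: FeatureFilters;
  notes: string;
}

// Hypothetical helper: what fetchAiFilters returns to its caller.
// A missing or empty notes field becomes the empty string.
function toAiFiltersResult(json: { filters: FeatureFilters; notes?: string | null }): AiFiltersResult {
  return { filters: json.filters, notes: json.notes || '' };
}

// Hypothetical helper: what the hook stores in its `notes` state.
// The empty string collapses to null so consumers can test for presence.
function notesForState(result: AiFiltersResult): string | null {
  return result.notes || null;
}
```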

@ -7,13 +7,14 @@ export interface AuthUser {
verified: boolean;
}
// PocketBase RecordModel stores user fields as dynamic properties
// eslint-disable-next-line @typescript-eslint/no-explicit-any
function recordToUser(record: any): AuthUser {
function recordToUser(record: { id: string; [key: string]: unknown }): AuthUser {
if (typeof record.email !== 'string') {
throw new Error('PocketBase record missing email field');
}
return {
id: record.id || '',
email: record.email || '',
verified: record.verified || false,
id: record.id,
email: record.email,
verified: typeof record.verified === 'boolean' ? record.verified : false,
};
}
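
The rewritten `recordToUser` drops the `any` parameter and the silent `|| ''` fallbacks in favour of explicit narrowing. The following standalone sketch shows the resulting behaviour. The function body follows the diff, while the `AuthUser` fields other than `verified` are reconstructed from the return statement and should be read as an assumption.

```typescript
// id and email are inferred from the function's return value (assumption).
interface AuthUser {
  id: string;
  email: string;
  verified: boolean;
}

function recordToUser(record: { id: string; [key: string]: unknown }): AuthUser {
  // Fail loudly instead of producing a user with an empty email.
  if (typeof record.email !== 'string') {
    throw new Error('PocketBase record missing email field');
  }
  return {
    id: record.id,
    email: record.email,
    // Only a real boolean counts; any other value is treated as unverified.
    verified: typeof record.verified === 'boolean' ? record.verified : false,
  };
}
```

With this shape, a record that lacks an email is rejected at the conversion boundary and cannot propagate as a half-populated user.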

@ -6,13 +6,18 @@ import type {
HexagonData,
PostcodeFeature,
PostcodeProperties,
PostcodeGeometry,
POI,
FeatureMeta,
Bounds,
} from '../types';
import type { SearchedPostcode } from '../components/map/PostcodeSearch';
import { DENSITY_GRADIENT, DENSITY_GRADIENT_DARK } from '../lib/consts';
import { emojiToTwemojiUrl, getFeatureFillColor } from '../lib/map-utils';
import {
TRANSPORT_MODES,
type TransportMode,
type TravelTimeEntries,
} from './useTravelTime';
/** Convert POI id (e.g. "n12345") to OpenStreetMap URL */
function osmIdToUrl(id: string): string | null {
@ -38,12 +43,10 @@ interface UseDeckLayersProps {
onHexagonClick: (id: string, isPostcode?: boolean) => void;
onHexagonHover: (h3: string | null, x?: number, y?: number) => void;
theme: 'light' | 'dark';
searchedPostcode?: SearchedPostcode | null;
selectedPostcodeGeometry?: PostcodeGeometry | null;
bounds?: Bounds | null;
travelTimeEnabled?: boolean;
travelTimeDestination?: [number, number] | null;
travelTimeColorRange?: [number, number] | null;
travelTimeRange?: [number, number] | null;
travelTimeEntries?: TravelTimeEntries;
travelTimeColorRanges?: Partial<Record<TransportMode, [number, number]>>;
}
export interface PopupInfo {
@ -54,6 +57,17 @@ export interface PopupInfo {
id: string;
}
/** Find the primary travel mode: first mode (in canonical order) with a destination and color range. */
function getPrimaryTravelMode(
entries: TravelTimeEntries,
colorRanges: Partial<Record<TransportMode, [number, number]>>
): TransportMode | null {
for (const mode of TRANSPORT_MODES) {
if (entries[mode]?.destination && colorRanges[mode]) return mode;
}
return null;
}
export function useDeckLayers({
data,
postcodeData,
@ -68,12 +82,10 @@ export function useDeckLayers({
onHexagonClick,
onHexagonHover,
theme,
searchedPostcode,
selectedPostcodeGeometry,
bounds: viewportBounds,
travelTimeEnabled = false,
travelTimeDestination,
travelTimeColorRange,
travelTimeRange,
travelTimeEntries = {},
travelTimeColorRanges = {},
}: UseDeckLayersProps) {
const [popupInfo, setPopupInfo] = useState<PopupInfo | null>(null);
const [hoverPosition, setHoverPosition] = useState<{ x: number; y: number } | null>(null);
@ -103,14 +115,17 @@ export function useDeckLayers({
const hoveredPostcodeRef = useRef(hoveredPostcode);
hoveredPostcodeRef.current = hoveredPostcode;
const travelTimeEnabledRef = useRef(travelTimeEnabled);
travelTimeEnabledRef.current = travelTimeEnabled;
const travelTimeDestinationRef = useRef(travelTimeDestination);
travelTimeDestinationRef.current = travelTimeDestination;
const travelTimeColorRangeRef = useRef(travelTimeColorRange);
travelTimeColorRangeRef.current = travelTimeColorRange;
const travelTimeRangeRef = useRef(travelTimeRange);
travelTimeRangeRef.current = travelTimeRange;
const travelTimeEntriesRef = useRef(travelTimeEntries);
travelTimeEntriesRef.current = travelTimeEntries;
const travelTimeColorRangesRef = useRef(travelTimeColorRanges);
travelTimeColorRangesRef.current = travelTimeColorRanges;
const primaryTravelMode = useMemo(
() => getPrimaryTravelMode(travelTimeEntries, travelTimeColorRanges),
[travelTimeEntries, travelTimeColorRanges]
);
const primaryTravelModeRef = useRef(primaryTravelMode);
primaryTravelModeRef.current = primaryTravelMode;
const colorFeatureMeta = useMemo(
() => (viewFeature ? features.find((f) => f.name === viewFeature) || null : null),
@ -238,7 +253,17 @@ export function useDeckLayers({
}, []);
// --- Color triggers ---
const ttTrigger = `${travelTimeEnabled}|${travelTimeColorRange?.[0]}|${travelTimeColorRange?.[1]}|${travelTimeRange?.[0]}|${travelTimeRange?.[1]}|${travelTimeDestination?.[0]}|${travelTimeDestination?.[1]}`;
// Build travel time trigger from all entries
const ttTrigger = useMemo(() => {
const parts: string[] = [];
for (const mode of TRANSPORT_MODES) {
const entry = travelTimeEntries[mode];
const cr = travelTimeColorRanges[mode];
parts.push(`${mode}:${entry?.destination?.[0]}|${entry?.destination?.[1]}|${cr?.[0]}|${cr?.[1]}|${entry?.timeRange?.[0]}|${entry?.timeRange?.[1]}`);
}
return parts.join(';');
}, [travelTimeEntries, travelTimeColorRanges]);
const colorTrigger = `${viewFeature}|${colorRange?.[0]}|${colorRange?.[1]}|${filterRange?.[0]}|${filterRange?.[1]}|${countRange.min}|${countRange.max}|${selectedHexagonId}|${hoveredHexagonId}|${theme}|${ttTrigger}`;
const postcodeColorTrigger = `${viewFeature}|${colorRange?.[0]}|${colorRange?.[1]}|${filterRange?.[0]}|${filterRange?.[1]}|${postcodeCountRange.min}|${postcodeCountRange.max}|${selectedPostcode}|${hoveredPostcode}|${theme}|${ttTrigger}`;
@ -251,17 +276,28 @@ export function useDeckLayers({
getHexagon: (d) => d.h3,
getFillColor: (d) => {
const dark = isDarkRef.current;
// Travel time coloring takes priority
if (travelTimeEnabledRef.current && travelTimeDestinationRef.current) {
const ttVal = d.travel_time;
const ttClr = travelTimeColorRangeRef.current;
const pm = primaryTravelModeRef.current;
const entries = travelTimeEntriesRef.current;
const colorRanges = travelTimeColorRangesRef.current;
// Travel time coloring: primary mode colors, others dim-filter
if (pm) {
const ttVal = d[`travel_time_${pm}`];
const ttClr = colorRanges[pm];
if (ttVal == null) {
return (dark ? [80, 70, 65, 80] : [128, 128, 128, 80]) as [number, number, number, number];
}
const ttFr = travelTimeRangeRef.current;
if (ttFr && ((ttVal as number) < ttFr[0] || (ttVal as number) > ttFr[1])) {
return (dark ? [60, 55, 50, 60] : [180, 180, 180, 60]) as [number, number, number, number];
// Check all modes with time ranges as filters (including primary)
for (const mode of TRANSPORT_MODES) {
const entry = entries[mode];
if (!entry?.timeRange) continue;
const modeVal = d[`travel_time_${mode}`];
if (modeVal == null || (modeVal as number) < entry.timeRange[0] || (modeVal as number) > entry.timeRange[1]) {
return (dark ? [60, 55, 50, 60] : [180, 180, 180, 60]) as [number, number, number, number];
}
}
if (ttClr) {
return getFeatureFillColor(
ttVal as number,
@ -464,19 +500,19 @@ export function useDeckLayers({
[pois, stablePoiHover]
);
// Check if the searched postcode has data (passes current filters)
const searchedPostcodeHasData = useMemo(() => {
if (!searchedPostcode) return false;
return postcodeData.some((f) => f.properties.postcode === searchedPostcode.postcode);
}, [searchedPostcode, postcodeData]);
// Check if the selected postcode has data (passes current filters)
const selectedPostcodeHasData = useMemo(() => {
if (!selectedPostcodeGeometry || !selectedHexagonId) return false;
return postcodeData.some((f) => f.properties.postcode === selectedHexagonId);
}, [selectedPostcodeGeometry, selectedHexagonId, postcodeData]);
// Highlight layer for searched postcode
const searchedPostcodeHighlightLayer = useMemo(() => {
if (!searchedPostcode) return null;
const hasData = searchedPostcodeHasData;
// Highlight layer for selected postcode (from search)
const selectedPostcodeHighlightLayer = useMemo(() => {
if (!selectedPostcodeGeometry) return null;
const hasData = selectedPostcodeHasData;
const feature = {
type: 'Feature' as const,
geometry: searchedPostcode.geometry,
geometry: selectedPostcodeGeometry,
properties: {},
};
return new GeoJsonLayer({
@ -494,13 +530,25 @@ export function useDeckLayers({
filled: true,
pickable: false,
});
}, [searchedPostcode, searchedPostcodeHasData]);
}, [selectedPostcodeGeometry, selectedPostcodeHasData]);
// Destination markers: one red dot per mode with a destination
const destinationMarkerData = useMemo(() => {
const points: { position: [number, number] }[] = [];
for (const mode of TRANSPORT_MODES) {
const entry = travelTimeEntries[mode];
if (entry?.destination) {
points.push({ position: [entry.destination[1], entry.destination[0]] });
}
}
return points;
}, [travelTimeEntries]);
const destinationMarkerLayer = useMemo(() => {
if (!travelTimeEnabled || !travelTimeDestination) return null;
if (destinationMarkerData.length === 0) return null;
return new ScatterplotLayer({
id: 'travel-time-destination',
data: [{ position: [travelTimeDestination[1], travelTimeDestination[0]] }],
id: 'travel-time-destinations',
data: destinationMarkerData,
getPosition: (d: { position: [number, number] }) => d.position,
getRadius: 8,
getFillColor: [239, 68, 68, 220],
@ -511,14 +559,14 @@ export function useDeckLayers({
stroked: true,
pickable: false,
});
}, [travelTimeEnabled, travelTimeDestination]);
}, [destinationMarkerData]);
const layers = useMemo(() => {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const baseLayers: any[] = usePostcodeView
? [postcodeLayer, postcodeLabelsLayer, poiLayer]
: [hexLayer, poiLayer];
if (searchedPostcodeHighlightLayer) baseLayers.push(searchedPostcodeHighlightLayer);
if (selectedPostcodeHighlightLayer) baseLayers.push(selectedPostcodeHighlightLayer);
if (destinationMarkerLayer) baseLayers.push(destinationMarkerLayer);
return baseLayers;
}, [
@ -527,7 +575,7 @@ export function useDeckLayers({
postcodeLayer,
postcodeLabelsLayer,
poiLayer,
searchedPostcodeHighlightLayer,
selectedPostcodeHighlightLayer,
destinationMarkerLayer,
]);
@ -548,5 +596,6 @@ export function useDeckLayers({
handleMouseLeave,
selectedPostcode,
hoveredPostcode,
primaryTravelMode,
};
}

View file

@ -78,7 +78,8 @@ export function useFilters({ initialFilters, features }: UseFiltersOptions) {
const m = features.find((f) => f.name === n);
if (m?.type === 'enum') return `${n}:${(value as string[]).join('|')}`;
const [min, max] = value as [number, number];
return `${n}:${min}:${max}`;
const maxStr = m?.absolute && max === m.max ? 'inf' : String(max);
return `${n}:${min}:${maxStr}`;
})
.join(',');
}
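
The change to the numeric branch above can be read in isolation. The following is a minimal sketch of that one rule, assuming (this is an interpretation of the diff, not something it states) that `absolute` marks features whose slider maximum stands for "no upper limit". `NumericMeta` and `serializeRange` are hypothetical names used only for illustration.

```typescript
// Hypothetical, trimmed-down stand-in for FeatureMeta: only the fields this rule reads.
interface NumericMeta {
  name: string;
  max: number;
  absolute?: boolean;
}

/** Serialize one numeric range as `name:min:max`, writing `inf` for an open-ended max. */
function serializeRange(meta: NumericMeta, range: [number, number]): string {
  const [min, max] = range;
  // Same condition as in the diff: only absolute features at their maximum become `inf`.
  const maxStr = meta.absolute && max === meta.max ? 'inf' : String(max);
  return `${meta.name}:${min}:${maxStr}`;
}

// Illustrative metadata, not taken from the real feature list.
const price: NumericMeta = { name: 'price', max: 1000000, absolute: true };
const rooms: NumericMeta = { name: 'rooms', max: 10 };

console.log(serializeRange(price, [100000, 1000000])); // price:100000:inf
console.log(serializeRange(price, [100000, 500000])); // price:100000:500000
console.log(serializeRange(rooms, [1, 10])); // rooms:1:10
```

Features without the `absolute` flag keep their numeric maximum even when the slider is at the top of its range.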

View file

@ -3,10 +3,11 @@ import type {
FeatureMeta,
FeatureFilters,
Property,
PostcodeGeometry,
HexagonPropertiesResponse,
HexagonStatsResponse,
} from '../types';
import { buildFilterString, apiUrl, logNonAbortError, authHeaders } from '../lib/api';
import { buildFilterString, apiUrl, assertOk, logNonAbortError, authHeaders } from '../lib/api';
interface SelectedHexagon {
id: string;
@ -30,6 +31,8 @@ export function useHexagonSelection({ filters, features, resolution }: UseHexago
const [loadingAreaStats, setLoadingAreaStats] = useState(false);
const [hoveredHexagon, setHoveredHexagon] = useState<string | null>(null);
const [rightPaneTab, setRightPaneTab] = useState<'properties' | 'area'>('area');
const [selectedPostcodeGeometry, setSelectedPostcodeGeometry] =
useState<PostcodeGeometry | null>(null);
const fetchHexagonStats = useCallback(
async (h3: string, res: number, signal?: AbortSignal, fields?: string[]) => {
@ -43,6 +46,7 @@ export function useHexagonSelection({ filters, features, resolution }: UseHexago
params.set('fields', fields.join(','));
}
const response = await fetch(apiUrl('hexagon-stats', params), authHeaders({ signal }));
assertOk(response, 'hexagon-stats');
return (await response.json()) as HexagonStatsResponse;
},
[filters, features]
@ -54,6 +58,7 @@ export function useHexagonSelection({ filters, features, resolution }: UseHexago
const filterStr = buildFilterString(filters, features);
if (filterStr) params.append('filters', filterStr);
const response = await fetch(apiUrl('postcode-stats', params), authHeaders({ signal }));
assertOk(response, 'postcode-stats');
return (await response.json()) as HexagonStatsResponse;
},
[filters, features]
@ -74,6 +79,7 @@ export function useHexagonSelection({ filters, features, resolution }: UseHexago
if (filterStr) params.append('filters', filterStr);
const response = await fetch(apiUrl('hexagon-properties', params), authHeaders());
assertOk(response, 'hexagon-properties');
const data: HexagonPropertiesResponse = await response.json();
if (offset === 0) {
@ -84,7 +90,7 @@ export function useHexagonSelection({ filters, features, resolution }: UseHexago
setPropertiesTotal(data.total);
setPropertiesOffset(offset + data.properties.length);
} catch (err) {
console.error('Failed to fetch properties:', err);
logNonAbortError('Failed to fetch properties', err);
} finally {
setLoadingProperties(false);
}
@ -94,6 +100,7 @@ export function useHexagonSelection({ filters, features, resolution }: UseHexago
const handleHexagonClick = useCallback(
(id: string, isPostcode = false) => {
setSelectedPostcodeGeometry(null);
if (selectedHexagon?.id === id) {
setSelectedHexagon(null);
setProperties([]);
@ -154,8 +161,27 @@ export function useHexagonSelection({ filters, features, resolution }: UseHexago
setSelectedHexagon(null);
setProperties([]);
setAreaStats(null);
setSelectedPostcodeGeometry(null);
}, []);
const handleLocationSearch = useCallback(
(postcode: string, geometry: PostcodeGeometry) => {
setSelectedHexagon({ id: postcode, type: 'postcode', resolution });
setSelectedPostcodeGeometry(geometry);
setProperties([]);
setPropertiesTotal(0);
setPropertiesOffset(0);
setRightPaneTab('area');
setLoadingAreaStats(true);
fetchPostcodeStats(postcode)
.then((stats) => setAreaStats(stats))
.catch((error) => logNonAbortError('Failed to fetch postcode stats', error))
.finally(() => setLoadingAreaStats(false));
},
[resolution, fetchPostcodeStats]
);
return {
selectedHexagon,
properties,
@ -172,5 +198,7 @@ export function useHexagonSelection({ filters, features, resolution }: UseHexago
handlePropertiesTabClick,
handleLoadMoreProperties,
handleCloseSelection,
selectedPostcodeGeometry,
handleLocationSearch,
};
}

View file

@ -0,0 +1,123 @@
import { useState, useCallback, useRef, useEffect } from 'react';
import type { PlaceResult } from '../types';
import { authHeaders, logNonAbortError } from '../lib/api';
const POSTCODE_RE = /^[A-Z]{1,2}\d[A-Z\d]?\s*\d?[A-Z]{0,2}$/i;
export function looksLikePostcode(s: string) {
return POSTCODE_RE.test(s.trim());
}
export type SearchResult =
| { type: 'postcode'; label: string }
| { type: 'place'; name: string; place_type: string; lat: number; lon: number; city?: string };
export function useLocationSearch() {
const [query, setQuery] = useState('');
const [results, setResults] = useState<SearchResult[]>([]);
const [activeIndex, setActiveIndex] = useState(-1);
const [open, setOpen] = useState(false);
const abortRef = useRef<AbortController | null>(null);
const debounceRef = useRef<ReturnType<typeof setTimeout>>();
const handleInputChange = useCallback((value: string) => {
setQuery(value);
setActiveIndex(-1);
abortRef.current?.abort();
if (debounceRef.current) clearTimeout(debounceRef.current);
const trimmed = value.trim();
if (!trimmed) {
setResults([]);
setOpen(false);
return;
}
if (looksLikePostcode(trimmed)) {
setResults([{ type: 'postcode', label: trimmed.toUpperCase() }]);
setOpen(true);
return;
}
if (trimmed.length < 2) {
setResults([]);
setOpen(false);
return;
}
debounceRef.current = setTimeout(async () => {
const controller = new AbortController();
abortRef.current = controller;
try {
const params = new URLSearchParams({ q: trimmed, limit: '7' });
const res = await fetch(
`/api/places?${params}`,
authHeaders({ signal: controller.signal }),
);
if (!res.ok) return;
const json: { places: PlaceResult[] } = await res.json();
const placeResults: SearchResult[] = json.places.map((p) => ({
type: 'place' as const,
...p,
}));
setResults(placeResults);
setOpen(placeResults.length > 0);
} catch (err) {
logNonAbortError('places search', err);
}
}, 200);
}, []);
const close = useCallback(() => setOpen(false), []);
const clear = useCallback(() => {
setQuery('');
setResults([]);
setOpen(false);
setActiveIndex(-1);
}, []);
const handleKeyDown = useCallback(
(e: React.KeyboardEvent, onSelect: (result: SearchResult) => void) => {
if (e.key === 'ArrowDown') {
e.preventDefault();
setActiveIndex((prev) => (prev < results.length - 1 ? prev + 1 : prev));
} else if (e.key === 'ArrowUp') {
e.preventDefault();
setActiveIndex((prev) => (prev > 0 ? prev - 1 : -1));
} else if (e.key === 'Enter') {
e.preventDefault();
if (activeIndex >= 0 && activeIndex < results.length) {
onSelect(results[activeIndex]);
} else if (looksLikePostcode(query)) {
onSelect({ type: 'postcode', label: query.trim().toUpperCase() });
}
} else if (e.key === 'Escape') {
setOpen(false);
}
},
[results, activeIndex, query],
);
// Cleanup on unmount
useEffect(() => {
return () => {
abortRef.current?.abort();
if (debounceRef.current) clearTimeout(debounceRef.current);
};
}, []);
return {
query,
results,
activeIndex,
setActiveIndex,
open,
setOpen,
handleInputChange,
handleKeyDown,
close,
clear,
};
}
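
In the hook above, a query that looks like a postcode skips the `/api/places` request and is offered directly as a single result. The sketch below copies `POSTCODE_RE` and `looksLikePostcode` from the diff and runs them on a few made-up inputs to show how permissive the pattern is: it accepts outward codes alone and postcodes typed without a space. The sample strings are illustrative only.

```typescript
// Copied from the diff: a loose, case-insensitive UK postcode shape check.
const POSTCODE_RE = /^[A-Z]{1,2}\d[A-Z\d]?\s*\d?[A-Z]{0,2}$/i;

function looksLikePostcode(s: string): boolean {
  return POSTCODE_RE.test(s.trim());
}

// Illustrative inputs: full postcodes, a bare outward code, and two place-like strings.
const samples = ['SW1A 1AA', 'e1 6an', 'SW1A1AA', 'M1', 'Camden', '12 High St'];

for (const s of samples) {
  console.log(`${s} -> ${looksLikePostcode(s)}`);
}
```

Because the check is purely about shape, a string can pass it without being a real postcode, so the lookup that follows still has to handle a miss.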

View file

@ -8,9 +8,10 @@ import type {
ViewChangeParams,
ApiResponse,
} from '../types';
import { buildFilterString, apiUrl, logNonAbortError, authHeaders } from '../lib/api';
import { buildFilterString, apiUrl, assertOk, logNonAbortError, authHeaders } from '../lib/api';
import { POSTCODE_ZOOM_THRESHOLD } from '../lib/consts';
import { COLOR_RANGE_LOW_PERCENTILE, COLOR_RANGE_HIGH_PERCENTILE } from '../lib/consts';
import { TRANSPORT_MODES, type TransportMode, type TravelTimeEntries } from './useTravelTime';
/** Return the p-th percentile (0-100) from a sorted array via linear interpolation. */
function percentile(sorted: number[], p: number): number {
@ -32,9 +33,7 @@ interface UseMapDataOptions {
activeFeature: string | null;
dragValue: [number, number] | null;
dragData: HexagonData[] | null;
travelTimeEnabled: boolean;
travelTimeDestination: [number, number] | null;
travelTimeMode: string;
travelTimeEntries: TravelTimeEntries;
}
export function useMapData({
@ -44,9 +43,7 @@ export function useMapData({
activeFeature,
dragValue,
dragData,
travelTimeEnabled,
travelTimeDestination,
travelTimeMode,
travelTimeEntries,
}: UseMapDataOptions) {
const [rawData, setRawData] = useState<HexagonData[]>([]);
const [postcodeData, setPostcodeData] = useState<PostcodeFeature[]>([]);
@ -71,6 +68,18 @@ export function useMapData({
[filters, features]
);
// Build the travel param string from entries with destinations
const travelParam = useMemo((): string => {
const segments: string[] = [];
for (const mode of TRANSPORT_MODES) {
const entry = travelTimeEntries[mode];
if (entry?.destination) {
segments.push(`${entry.destination[0]},${entry.destination[1]},${mode}`);
}
}
return segments.join('|');
}, [travelTimeEntries]);
// Fetch hexagons or postcodes when bounds/filters change
useEffect(() => {
if (!bounds) return;
@ -100,6 +109,7 @@ export function useMapData({
signal: abortControllerRef.current.signal,
})
);
assertOk(res, 'postcodes');
const json: { features: PostcodeFeature[] } = await res.json();
setPostcodeData(json.features);
setRawData([]);
@ -110,9 +120,8 @@ export function useMapData({
});
if (filtersStr) params.set('filters', filtersStr);
params.set('fields', viewFeature || '');
if (travelTimeEnabled && travelTimeDestination) {
params.set('destination', `${travelTimeDestination[0]},${travelTimeDestination[1]}`);
params.set('mode', travelTimeMode);
if (travelParam) {
params.set('travel', travelParam);
}
const res = await fetch(
apiUrl('hexagons', params),
@ -120,6 +129,7 @@ export function useMapData({
signal: abortControllerRef.current.signal,
})
);
assertOk(res, 'hexagons');
const json: ApiResponse = await res.json();
setRawData(json.features);
setPostcodeData([]);
@ -136,7 +146,7 @@ export function useMapData({
clearTimeout(debounceRef.current);
}
};
}, [resolution, bounds, filters, buildFilterParam, viewFeature, usePostcodeView, travelTimeEnabled, travelTimeDestination, travelTimeMode]);
}, [resolution, bounds, filters, buildFilterParam, viewFeature, usePostcodeView, travelParam]);
const data = dragData ?? rawData;
@ -159,7 +169,7 @@ export function useMapData({
if (lat < bounds.south || lat > bounds.north || lng < bounds.west || lng > bounds.east)
continue;
}
const val = feat.properties[`avg_${viewFeature}`] ?? feat.properties[`min_${viewFeature}`];
const val = feat.properties[`avg_${viewFeature}`];
if (typeof val === 'number' && !isNaN(val)) vals.push(val);
}
} else {
@ -170,7 +180,7 @@ export function useMapData({
if (lat < bounds.south || lat > bounds.north || lon < bounds.west || lon > bounds.east)
continue;
}
const val = item[`avg_${viewFeature}`] ?? item[`min_${viewFeature}`];
const val = item[`avg_${viewFeature}`];
if (typeof val === 'number' && !isNaN(val)) vals.push(val);
}
}
@ -197,26 +207,32 @@ export function useMapData({
return null;
}, [viewFeature, features, dataRange, activeFeature, dragValue]);
// Color range for travel time (computed from response data)
const travelTimeColorRange = useMemo((): [number, number] | null => {
if (!travelTimeEnabled || !travelTimeDestination) return null;
const vals: number[] = [];
for (const item of data) {
if (bounds) {
const { lat, lon } = item;
if (lat < bounds.south || lat > bounds.north || lon < bounds.west || lon > bounds.east)
continue;
// Color ranges for travel time per mode (computed from response data)
const travelTimeColorRanges = useMemo((): Partial<Record<TransportMode, [number, number]>> => {
const ranges: Partial<Record<TransportMode, [number, number]>> = {};
for (const mode of TRANSPORT_MODES) {
const entry = travelTimeEntries[mode];
if (!entry?.destination) continue;
const fieldName = `travel_time_${mode}`;
const vals: number[] = [];
for (const item of data) {
if (bounds) {
const { lat, lon } = item;
if (lat < bounds.south || lat > bounds.north || lon < bounds.west || lon > bounds.east)
continue;
}
const val = item[fieldName];
if (typeof val === 'number' && !isNaN(val)) vals.push(val);
}
const val = item.travel_time;
if (typeof val === 'number' && !isNaN(val)) vals.push(val);
if (vals.length === 0) continue;
vals.sort((a, b) => a - b);
ranges[mode] = [
percentile(vals, COLOR_RANGE_LOW_PERCENTILE),
percentile(vals, COLOR_RANGE_HIGH_PERCENTILE),
];
}
if (vals.length === 0) return null;
vals.sort((a, b) => a - b);
return [
percentile(vals, COLOR_RANGE_LOW_PERCENTILE),
percentile(vals, COLOR_RANGE_HIGH_PERCENTILE),
];
}, [travelTimeEnabled, travelTimeDestination, data, bounds]);
return ranges;
}, [travelTimeEntries, data, bounds]);
const handleViewChange = useCallback(
({
@ -257,7 +273,7 @@ export function useMapData({
currentView,
usePostcodeView,
colorRange,
travelTimeColorRange,
travelTimeColorRanges,
handleViewChange,
setInitialView,
};
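
The single `travel` query parameter replaces the earlier `destination` and `mode` pair. A minimal sketch of its format, assuming only what the `travelParam` memo above shows: one `lat,lon,mode` segment per mode that has a destination, joined with `|`, in the fixed order of `TRANSPORT_MODES`. `buildTravelParam` and the trimmed `Entry` type are hypothetical names for this example.

```typescript
type TransportMode = 'car' | 'bicycle' | 'walking' | 'transit';
const TRANSPORT_MODES: TransportMode[] = ['car', 'bicycle', 'walking', 'transit'];

// Trimmed stand-in for TravelTimeEntry: only the field the param builder reads.
interface Entry {
  destination: [number, number] | null; // [lat, lon]
}
type Entries = Partial<Record<TransportMode, Entry>>;

/** Build the `travel` param: `lat,lon,mode` segments joined by `|`. */
function buildTravelParam(entries: Entries): string {
  const segments: string[] = [];
  for (const mode of TRANSPORT_MODES) {
    const dest = entries[mode]?.destination;
    // Modes that are enabled but have no destination yet contribute nothing.
    if (dest) segments.push(`${dest[0]},${dest[1]},${mode}`);
  }
  return segments.join('|');
}

// Illustrative coordinates; insertion order of the keys does not affect the output order.
const example: Entries = {
  transit: { destination: [51.5, -0.12] },
  car: { destination: [51.5074, -0.1278] },
  walking: { destination: null },
};

console.log(buildTravelParam(example)); // 51.5074,-0.1278,car|51.5,-0.12,transit
```

Iterating `TRANSPORT_MODES` rather than the object's keys keeps the string stable for the same set of destinations, which matters because the string is an effect dependency.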

View file

@ -1,67 +1,83 @@
import { useState, useCallback } from 'react';
import { useState, useCallback, useMemo } from 'react';
export type TransportMode = 'car' | 'bicycle' | 'walking' | 'transit';
export interface TravelTimeState {
enabled: boolean;
export const TRANSPORT_MODES: TransportMode[] = ['car', 'bicycle', 'walking', 'transit'];
export const MODE_LABELS: Record<TransportMode, string> = {
car: 'Car',
bicycle: 'Bicycle',
walking: 'Walking',
transit: 'Transit',
};
export interface TravelTimeEntry {
destination: [number, number] | null; // [lat, lon]
destinationLabel: string;
mode: TransportMode;
timeRange: [number, number] | null;
}
export type TravelTimeEntries = Partial<Record<TransportMode, TravelTimeEntry>>;
export interface TravelTimeInitial {
destination?: [number, number];
destinationLabel?: string;
mode?: TransportMode;
timeRange?: [number, number];
entries?: TravelTimeEntries;
}
export function useTravelTime(initial?: TravelTimeInitial) {
const [enabled, setEnabled] = useState(!!initial?.destination);
const [destination, setDestination] = useState<[number, number] | null>(
initial?.destination ?? null
);
const [destinationLabel, setDestinationLabel] = useState(initial?.destinationLabel ?? '');
const [mode, setMode] = useState<TransportMode>(initial?.mode ?? 'car');
const [timeRange, setTimeRange] = useState<[number, number] | null>(
initial?.timeRange ?? null
const [entries, setEntries] = useState<TravelTimeEntries>(initial?.entries ?? {});
const activeModes = useMemo(
() => TRANSPORT_MODES.filter((m) => m in entries),
[entries]
);
const handleEnable = useCallback(() => {
setEnabled(true);
const modesWithDestination = useMemo(
() => TRANSPORT_MODES.filter((m) => entries[m]?.destination != null),
[entries]
);
const handleEnableMode = useCallback((mode: TransportMode) => {
setEntries((prev) => ({
...prev,
[mode]: { destination: null, destinationLabel: '', timeRange: null },
}));
}, []);
const handleDisable = useCallback(() => {
setEnabled(false);
setDestination(null);
setDestinationLabel('');
setTimeRange(null);
const handleDisableMode = useCallback((mode: TransportMode) => {
setEntries((prev) => {
const next = { ...prev };
delete next[mode];
return next;
});
}, []);
const handleSetDestination = useCallback((lat: number, lon: number, label: string) => {
setDestination([lat, lon]);
setDestinationLabel(label);
}, []);
const handleSetDestination = useCallback(
(mode: TransportMode, lat: number, lon: number, label: string) => {
setEntries((prev) => ({
...prev,
[mode]: { ...prev[mode], destination: [lat, lon] as [number, number], destinationLabel: label },
}));
},
[]
);
const handleModeChange = useCallback((newMode: TransportMode) => {
setMode(newMode);
}, []);
const handleTimeRangeChange = useCallback((range: [number, number]) => {
setTimeRange(range);
}, []);
const handleTimeRangeChange = useCallback(
(mode: TransportMode, range: [number, number]) => {
setEntries((prev) => ({
...prev,
[mode]: { ...prev[mode], timeRange: range },
}));
},
[]
);
return {
enabled,
destination,
destinationLabel,
mode,
timeRange,
handleEnable,
handleDisable,
entries,
activeModes,
modesWithDestination,
handleEnableMode,
handleDisableMode,
handleSetDestination,
handleModeChange,
handleTimeRangeChange,
};
}
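
The hook now keeps one optional entry per transport mode instead of a single global destination. The state transitions can be sketched as pure functions over `TravelTimeEntries`, which makes them easy to check outside React. The function names here are hypothetical. One deliberate difference from the hook: `setDestination` falls back to an empty entry when the mode has not been enabled, whereas the hook spreads `prev[mode]` as-is.

```typescript
type TransportMode = 'car' | 'bicycle' | 'walking' | 'transit';

interface TravelTimeEntry {
  destination: [number, number] | null; // [lat, lon]
  destinationLabel: string;
  timeRange: [number, number] | null;
}
type TravelTimeEntries = Partial<Record<TransportMode, TravelTimeEntry>>;

const EMPTY_ENTRY: TravelTimeEntry = { destination: null, destinationLabel: '', timeRange: null };

/** Enabling a mode adds an empty entry; the key's presence is what marks it active. */
function enableMode(prev: TravelTimeEntries, mode: TransportMode): TravelTimeEntries {
  return { ...prev, [mode]: { ...EMPTY_ENTRY } };
}

/** Disabling a mode removes the key entirely. */
function disableMode(prev: TravelTimeEntries, mode: TransportMode): TravelTimeEntries {
  const next = { ...prev };
  delete next[mode];
  return next;
}

/** Setting a destination keeps the mode's existing time range. */
function setDestination(
  prev: TravelTimeEntries,
  mode: TransportMode,
  lat: number,
  lon: number,
  label: string
): TravelTimeEntries {
  const base = prev[mode] ?? EMPTY_ENTRY; // assumption: tolerate a mode that was never enabled
  const entry: TravelTimeEntry = { ...base, destination: [lat, lon], destinationLabel: label };
  return { ...prev, [mode]: entry };
}

// Illustrative walk-through with made-up coordinates.
let state: TravelTimeEntries = {};
state = enableMode(state, 'car');
state = setDestination(state, 'car', 51.5, -0.1, 'Bank');
console.log(state.car?.destinationLabel); // Bank
state = disableMode(state, 'car');
console.log('car' in state); // false
```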

View file

@ -1,15 +1,7 @@
import { useEffect, useRef } from 'react';
import type { FeatureMeta, FeatureFilters } from '../types';
import { stateToParams } from '../lib/url-state';
import type { TransportMode } from './useTravelTime';
export interface TravelTimeUrlState {
enabled: boolean;
destination: [number, number] | null;
destinationLabel: string;
mode: TransportMode;
timeRange: [number, number] | null;
}
import type { TravelTimeEntries } from './useTravelTime';
const URL_DEBOUNCE_MS = 300;
@ -19,7 +11,7 @@ export function useUrlSync(
features: FeatureMeta[],
selectedPOICategories: Set<string>,
rightPaneTab: 'properties' | 'area',
travelTime?: TravelTimeUrlState
travelTimeEntries?: TravelTimeEntries
) {
const urlDebounceRef = useRef<ReturnType<typeof setTimeout> | null>(null);
@ -34,7 +26,7 @@ export function useUrlSync(
features,
selectedPOICategories,
rightPaneTab,
travelTime
travelTimeEntries
);
const search = params.toString();
const newUrl = search ? `${window.location.pathname}?${search}` : window.location.pathname;
@ -44,5 +36,5 @@ export function useUrlSync(
return () => {
if (urlDebounceRef.current) clearTimeout(urlDebounceRef.current);
};
}, [currentView, filters, features, selectedPOICategories, rightPaneTab, travelTime]);
}, [currentView, filters, features, selectedPOICategories, rightPaneTab, travelTimeEntries]);
}

View file

@ -10,6 +10,17 @@ export function logNonAbortError(label: string, error: unknown): void {
console.error(`${label}:`, error);
}
export function isAbortError(error: unknown): boolean {
return error instanceof Error && error.name === 'AbortError';
}
/** Throw if response is not 2xx. Call before `.json()`. */
export function assertOk(res: Response, label: string): void {
if (!res.ok) {
throw new Error(`${label}: HTTP ${res.status} ${res.statusText}`);
}
}
export function authHeaders(init?: RequestInit): RequestInit {
const headers: Record<string, string> = {};
if (pb.authStore.isValid && pb.authStore.token) {
@ -69,7 +80,8 @@ export function buildFilterString(filters: FeatureFilters, features: FeatureMeta
return `${name}:${(value as string[]).join('|')}`;
}
const [min, max] = value as [number, number];
return `${name}:${min}:${max}`;
const maxStr = meta?.absolute && max === meta.max ? 'inf' : String(max);
return `${name}:${min}:${maxStr}`;
})
.join(',');
}
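
The new `assertOk` helper is called between `fetch` and `.json()` throughout this commit, so an error status raises a labelled exception before any attempt to parse the body. A minimal sketch of that behaviour follows, using a structural stand-in (`ResponseLike`, a hypothetical name) for the three `Response` fields the helper reads so it can run without a network call. The function bodies are copied from the diff.

```typescript
// Hypothetical stand-in for the parts of the Fetch API `Response` that assertOk reads.
interface ResponseLike {
  ok: boolean;
  status: number;
  statusText: string;
}

/** Throw if response is not 2xx. Call before `.json()`. */
function assertOk(res: ResponseLike, label: string): void {
  if (!res.ok) {
    throw new Error(`${label}: HTTP ${res.status} ${res.statusText}`);
  }
}

function isAbortError(error: unknown): boolean {
  return error instanceof Error && error.name === 'AbortError';
}

// Illustrative failure: a 500 response produces a labelled error.
let message = '';
try {
  assertOk({ ok: false, status: 500, statusText: 'Internal Server Error' }, 'hexagons');
} catch (err) {
  message = err instanceof Error ? err.message : String(err);
}
console.log(message); // hexagons: HTTP 500 Internal Server Error

// An error named 'AbortError' is recognised; an ordinary error is not.
const aborted = new Error('The operation was aborted');
aborted.name = 'AbortError';
console.log(isAbortError(aborted), isAbortError(new Error('boom'))); // true false
```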

View file

@ -1,5 +1,10 @@
import type { FeatureMeta, FeatureFilters, ViewState } from '../types';
import type { TransportMode, TravelTimeInitial } from '../hooks/useTravelTime';
import {
TRANSPORT_MODES,
type TransportMode,
type TravelTimeEntries,
type TravelTimeInitial,
} from '../hooks/useTravelTime';
function parseFilters(params: URLSearchParams): FeatureFilters | undefined {
const filterParams = params.getAll('filter');
@ -65,26 +70,33 @@ export function parseUrlState(): {
result.tab = tab;
}
// Travel time
const dest = params.get('dest');
if (dest) {
const parts = dest.split(',').map(Number);
if (parts.length === 2 && parts.every((n) => !isNaN(n))) {
const tt: TravelTimeInitial = {
destination: [parts[0], parts[1]],
destinationLabel: params.get('destLabel') || '',
mode: (params.get('tmode') as TransportMode) || 'car',
};
const ttRange = params.get('tt');
if (ttRange) {
const [min, max] = ttRange.split(':').map(Number);
if (!isNaN(min) && !isNaN(max)) {
tt.timeRange = [min, max];
// Travel time: per-mode params (tt_car=lat,lon ttl_car=label ttr_car=min:max)
const entries: TravelTimeEntries = {};
for (const mode of TRANSPORT_MODES) {
const dest = params.get(`tt_${mode}`);
if (dest) {
const parts = dest.split(',').map(Number);
if (parts.length === 2 && parts.every((n) => !isNaN(n))) {
const label = params.get(`ttl_${mode}`) || '';
let timeRange: [number, number] | null = null;
const rangeStr = params.get(`ttr_${mode}`);
if (rangeStr) {
const [min, max] = rangeStr.split(':').map(Number);
if (!isNaN(min) && !isNaN(max)) {
timeRange = [min, max];
}
}
entries[mode] = {
destination: [parts[0], parts[1]],
destinationLabel: label,
timeRange,
};
}
result.travelTime = tt;
}
}
if (Object.keys(entries).length > 0) {
result.travelTime = { entries };
}
return result;
}
@ -95,7 +107,7 @@ export function stateToParams(
features: FeatureMeta[],
selectedPOICategories: Set<string>,
rightPaneTab: 'properties' | 'area',
travelTime?: { enabled: boolean; destination: [number, number] | null; destinationLabel: string; mode: string; timeRange: [number, number] | null }
travelTimeEntries?: TravelTimeEntries
): URLSearchParams {
const params = new URLSearchParams();
@ -123,16 +135,18 @@ export function stateToParams(
params.set('tab', 'properties');
}
if (travelTime?.enabled && travelTime.destination) {
params.set('dest', `${travelTime.destination[0].toFixed(5)},${travelTime.destination[1].toFixed(5)}`);
if (travelTime.destinationLabel) {
params.set('destLabel', travelTime.destinationLabel);
}
if (travelTime.mode !== 'car') {
params.set('tmode', travelTime.mode);
}
if (travelTime.timeRange) {
params.set('tt', `${travelTime.timeRange[0]}:${travelTime.timeRange[1]}`);
// Travel time: per-mode params
if (travelTimeEntries) {
for (const mode of TRANSPORT_MODES) {
const entry = travelTimeEntries[mode];
if (!entry?.destination) continue;
params.set(`tt_${mode}`, `${entry.destination[0].toFixed(5)},${entry.destination[1].toFixed(5)}`);
if (entry.destinationLabel) {
params.set(`ttl_${mode}`, entry.destinationLabel);
}
if (entry.timeRange) {
params.set(`ttr_${mode}`, `${entry.timeRange[0]}:${entry.timeRange[1]}`);
}
}
}
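
The per-mode URL scheme (`tt_<mode>` for `lat,lon`, `ttl_<mode>` for the label, `ttr_<mode>` for `min:max`) can be checked with a small round trip. The sketch below mirrors the logic of `stateToParams` and `parseUrlState` for travel time only. `entriesToParams` and `paramsToEntries` are hypothetical names, and the types are redeclared so the block stands alone.

```typescript
type TransportMode = 'car' | 'bicycle' | 'walking' | 'transit';
const TRANSPORT_MODES: TransportMode[] = ['car', 'bicycle', 'walking', 'transit'];

interface TravelTimeEntry {
  destination: [number, number] | null; // [lat, lon]
  destinationLabel: string;
  timeRange: [number, number] | null;
}
type TravelTimeEntries = Partial<Record<TransportMode, TravelTimeEntry>>;

/** Write one group of tt_/ttl_/ttr_ params per mode that has a destination. */
function entriesToParams(entries: TravelTimeEntries): URLSearchParams {
  const params = new URLSearchParams();
  for (const mode of TRANSPORT_MODES) {
    const entry = entries[mode];
    if (!entry?.destination) continue;
    const [lat, lon] = entry.destination;
    params.set(`tt_${mode}`, `${lat.toFixed(5)},${lon.toFixed(5)}`);
    if (entry.destinationLabel) params.set(`ttl_${mode}`, entry.destinationLabel);
    if (entry.timeRange) params.set(`ttr_${mode}`, `${entry.timeRange[0]}:${entry.timeRange[1]}`);
  }
  return params;
}

/** Read the params back; malformed destinations are skipped, malformed ranges become null. */
function paramsToEntries(params: URLSearchParams): TravelTimeEntries {
  const entries: TravelTimeEntries = {};
  for (const mode of TRANSPORT_MODES) {
    const dest = params.get(`tt_${mode}`);
    if (!dest) continue;
    const parts = dest.split(',').map(Number);
    if (parts.length !== 2 || parts.some((n) => isNaN(n))) continue;
    let timeRange: [number, number] | null = null;
    const rangeStr = params.get(`ttr_${mode}`);
    if (rangeStr) {
      const [min, max] = rangeStr.split(':').map(Number);
      if (!isNaN(min) && !isNaN(max)) timeRange = [min, max];
    }
    entries[mode] = {
      destination: [parts[0], parts[1]],
      destinationLabel: params.get(`ttl_${mode}`) || '',
      timeRange,
    };
  }
  return entries;
}

// Illustrative entry with made-up coordinates.
const original: TravelTimeEntries = {
  transit: { destination: [51.5133, -0.0889], destinationLabel: 'Bank tube station', timeRange: [0, 30] },
};
const params = entriesToParams(original);
console.log(params.get('tt_transit')); // 51.51330,-0.08890
console.log(paramsToEntries(params).transit?.timeRange); // [ 0, 30 ]
```

Coordinates are written with five decimal places, so a destination read back from a shared link can differ from the original by roughly a metre. Modes that are enabled but have no destination are not serialized at all.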

View file

@ -17,6 +17,7 @@ export interface FeatureMeta {
prefix?: string;
suffix?: string;
raw?: boolean;
absolute?: boolean;
}
export interface FeatureGroup {
@ -104,6 +105,7 @@ export interface PlaceResult {
place_type: string;
lat: number;
lon: number;
city?: string;
}
export interface RenovationEvent {

View file

@ -1,6 +1,7 @@
"""Extract place=* nodes from OSM PBF → data/places.parquet.
"""Extract place=* nodes and railway stations from OSM PBF → data/places.parquet.
Extracts named place nodes (cities, towns, suburbs, etc.) for typeahead search.
Extracts named place nodes (cities, towns, suburbs, etc.) and railway stations
(tube, national rail, DLR, etc.) for typeahead search.
Reuses the same great-britain-latest.osm.pbf as pois.py.
"""
@ -18,13 +19,54 @@ PLACE_TYPES = {
"borough",
"town",
"suburb",
"quarter",
"neighbourhood",
"village",
"hamlet",
"locality",
"island",
"isolated_dwelling",
}
# Suffixes to strip from raw station names before appending the typed suffix.
_STATION_STRIP = (
" tube station",
" underground station",
" railway station",
" dlr station",
" overground station",
" tram stop",
" station",
)
def _station_display_name(name: str, tags: dict[str, str]) -> str:
"""Build a descriptive station name like 'Bank tube station'."""
station_tag = tags.get("station", "")
network = tags.get("network", "").lower()
if station_tag == "subway" or "underground" in network:
suffix = "tube station"
elif "docklands" in network or "dlr" in network:
suffix = "DLR station"
elif "overground" in network:
suffix = "overground station"
elif "elizabeth" in network:
suffix = "Elizabeth line station"
elif station_tag == "light_rail" or "tramlink" in network or "tram" in network:
suffix = "tram stop"
else:
suffix = "railway station"
# Strip any existing station suffix from the raw name
lower = name.lower()
for s in _STATION_STRIP:
if lower.endswith(s):
name = name[: len(name) - len(s)].rstrip()
break
return f"{name} {suffix}"
class PlaceHandler(osmium.SimpleHandler):
def __init__(self, progress: tqdm) -> None:
@ -32,6 +74,12 @@ class PlaceHandler(osmium.SimpleHandler):
self._progress = progress
self.places: list[dict] = []
def _add(self, name: str, place_type: str, lat: float, lon: float, population: int) -> None:
self.places.append(
{"name": name, "place_type": place_type, "lat": lat, "lon": lon, "population": population}
)
self._progress.set_postfix(places=f"{len(self.places):,}", refresh=False)
def node(self, n: osmium.osm.Node) -> None:
self._progress.update(1)
if not n.location.valid:
@ -39,16 +87,28 @@ class PlaceHandler(osmium.SimpleHandler):
lat, lon = n.location.lat, n.location.lon
if not (UK_BBOX_SOUTH <= lat <= UK_BBOX_NORTH and UK_BBOX_WEST <= lon <= UK_BBOX_EAST):
return
place_type = n.tags.get("place")
if place_type not in PLACE_TYPES:
return
name = n.tags.get("name:en", n.tags.get("name", ""))
if not name:
return
self.places.append(
{"name": name, "place_type": place_type, "lat": lat, "lon": lon}
)
self._progress.set_postfix(places=f"{len(self.places):,}", refresh=False)
pop_str = n.tags.get("population", "")
try:
population = int(pop_str)
except ValueError:
population = 0
# place=* nodes (cities, towns, suburbs, etc.)
place_type = n.tags.get("place")
if place_type in PLACE_TYPES:
self._add(name, place_type, lat, lon, population)
return
# railway=station nodes (tube, national rail, DLR, tram, etc.)
if n.tags.get("railway") == "station":
display_name = _station_display_name(name, dict(n.tags))
self._add(display_name, "station", lat, lon, population)
return
def main() -> None:
@ -73,7 +133,7 @@ def main() -> None:
else:
print(f"Using cached PBF: {pbf_file}")
print(f"Extracting place nodes: {sorted(PLACE_TYPES)}")
print(f"Extracting place nodes: {sorted(PLACE_TYPES)} + railway=station")
with tqdm(
unit=" elements",
unit_scale=True,

View file

@ -0,0 +1,121 @@
"""Shared utilities for price index, price estimate, and renovation premium scripts."""
import numpy as np
import polars as pl
CURRENT_YEAR = 2025
TERRACE_TYPES = [
"Mid-Terrace",
"End-Terrace",
"Enclosed Mid-Terrace",
"Enclosed End-Terrace",
"Terraced",
]
FLAT_TYPES = ["Flats/Maisonettes", "Flat", "Maisonette"]
TYPE_GROUPS = ["Detached", "Semi-Detached", "Terraced", "Flats", "Bungalow"]
SHRINKAGE_K = 50
def type_group_expr():
"""Polars expression: Property type -> type_group."""
return (
pl.when(pl.col("Property type").is_in(TERRACE_TYPES))
.then(pl.lit("Terraced"))
.when(pl.col("Property type").is_in(FLAT_TYPES))
.then(pl.lit("Flats"))
.when(pl.col("Property type") == "Bungalow")
.then(pl.lit("Bungalow"))
.when(pl.col("Property type").is_in(["Detached", "Semi-Detached"]))
.then(pl.col("Property type"))
.otherwise(pl.lit(None))
.alias("type_group")
)
def sector_expr():
"""Polars expression: Postcode -> sector (drop last 2 chars, strip)."""
return (
pl.col("Postcode")
.str.slice(0, pl.col("Postcode").str.len_chars() - 2)
.str.strip_chars()
.alias("sector")
)
def hierarchy_keys(sector: str) -> tuple[str, str]:
"""Return (district, area) for a sector string."""
district = sector.rsplit(" ", 1)[0] if " " in sector else sector
area = ""
for ch in district:
if ch.isalpha():
area += ch
else:
break
return district, area
AGE_BREAKS = [1900, 1930, 1950, 1967, 1983, 2000, 2010]
AGE_LABELS = [
"pre-1900",
"1900-1929",
"1930-1949",
"1950-1966",
"1967-1982",
"1983-1999",
"2000-2009",
"2010+",
]
HEDONIC_COLUMNS = [
"Last known price",
"Date of last transaction",
"Property type",
"Total floor area (sqm)",
"Postcode",
]
def age_band_expr():
"""Polars expression: Construction age (UInt16 year) → age band string."""
expr = pl.when(pl.col("Construction age").is_null()).then(pl.lit(None))
for i, brk in enumerate(AGE_BREAKS):
expr = expr.when(pl.col("Construction age") < brk).then(pl.lit(AGE_LABELS[i]))
return expr.otherwise(pl.lit(AGE_LABELS[-1])).alias("age_band")
NON_REF_TYPES = ["Terraced", "Semi-Detached", "Flats", "Bungalow"]
def build_hedonic_features(df: pl.DataFrame) -> np.ndarray:
"""Build hedonic feature matrix from a DataFrame with type_group column.
Columns (5 total): log(floor_area), 4 type dummies (ref: Detached).
Sector fixed effects do the heavy lifting; additional property features
(EPC, rooms, age) add no predictive value after sector demeaning.
"""
fa = df["Total floor area (sqm)"].to_numpy().astype(np.float32)
log_fa = np.log(np.maximum(fa, 1.0)).reshape(-1, 1)
tg = df["type_group"].to_numpy()
parts = [log_fa]
for t in NON_REF_TYPES:
parts.append((tg == t).astype(np.float32).reshape(-1, 1))
return np.hstack(parts)
def extract_centroids(input_path) -> dict[str, tuple[float, float]]:
"""Compute mean lat/lon per postcode sector."""
print("Computing sector centroids...")
df = (
pl.scan_parquet(input_path)
.select("Postcode", "lat", "lon")
.filter(pl.col("Postcode").is_not_null(), pl.col("lat").is_not_null())
.with_columns(sector_expr())
.group_by("sector")
.agg(pl.col("lat").mean(), pl.col("lon").mean())
.collect()
)
centroids = {}
for row in df.iter_rows(named=True):
centroids[row["sector"]] = (row["lat"], row["lon"])
print(f" {len(centroids):,} sector centroids")
return centroids

View file

@ -0,0 +1,300 @@
"""Cross-Sectional Hedonic Model (Per-Type)
Trains separate OLS models per property type on recent sales (last 5 years)
with sector fixed effects via Frisch-Waugh-Lovell demeaning:
log(price) = beta_type * log(floor_area) + alpha_sector_type + epsilon
Each type gets its own floor area elasticity and sector intercepts, capturing
that detached houses (beta=0.74) have higher price sensitivity to size than
terraced houses (beta=0.60), and a sector's value differs by property type.
Sector intercepts are hierarchically shrunk (sector -> district -> area -> national)
and spatially smoothed via KD-tree nearest neighbors.
Output: hedonic_model.json with per-type betas and sector intercepts.
"""
import argparse
import json
from pathlib import Path
import numpy as np
import polars as pl
from scipy.spatial import KDTree
from pipeline.transform._price_utils import (
CURRENT_YEAR,
HEDONIC_COLUMNS,
SHRINKAGE_K,
TYPE_GROUPS,
extract_centroids,
hierarchy_keys,
sector_expr,
type_group_expr,
)
TRAINING_YEARS = 5
SPATIAL_NEIGHBORS = 5
SPATIAL_BLEND_K = 30
def load_training_data(input_path: Path) -> pl.DataFrame:
"""Load recent sales with complete hedonic features."""
min_year = CURRENT_YEAR - TRAINING_YEARS
print(f"Loading training data (sales {min_year}-{CURRENT_YEAR})...")
df = (
pl.scan_parquet(input_path)
.select(*HEDONIC_COLUMNS)
.filter(
pl.col("Last known price").is_not_null(),
pl.col("Total floor area (sqm)").is_not_null(),
pl.col("Total floor area (sqm)") > 0,
pl.col("Postcode").is_not_null(),
)
.with_columns(
pl.col("Date of last transaction").dt.year().alias("sale_year"),
type_group_expr(),
sector_expr(),
)
.filter(
pl.col("type_group").is_not_null(),
pl.col("sale_year").is_not_null(),
pl.col("sale_year") >= min_year,
pl.col("sale_year") <= CURRENT_YEAR,
)
.collect()
)
print(f" {len(df):,} complete cases")
return df
def train_type_model(
df: pl.DataFrame, type_group: str
) -> tuple[float, dict[str, float], dict[str, int], float]:
"""Train hedonic model for a single property type.
Returns (beta_fa, sector_intercepts, sector_counts, national_intercept).
"""
t_df = df.filter(pl.col("type_group") == type_group)
y = np.log(t_df["Last known price"].to_numpy().astype(np.float64))
log_fa = np.log(
np.maximum(t_df["Total floor area (sqm)"].to_numpy().astype(np.float64), 1.0)
)
X = log_fa.reshape(-1, 1)
sectors = t_df["sector"].to_list()
# Group by sector for demeaning
sector_indices: dict[str, list[int]] = {}
for i, s in enumerate(sectors):
sector_indices.setdefault(s, []).append(i)
# Compute sector means and demean
X_demeaned = np.empty_like(X)
y_demeaned = np.empty_like(y)
sector_X_means: dict[str, np.ndarray] = {}
sector_y_means: dict[str, float] = {}
sector_counts: dict[str, int] = {}
for s, idxs in sector_indices.items():
idx = np.array(idxs)
X_mean = X[idx].mean(axis=0)
y_mean = y[idx].mean()
sector_X_means[s] = X_mean
sector_y_means[s] = y_mean
X_demeaned[idx] = X[idx] - X_mean
y_demeaned[idx] = y[idx] - y_mean
sector_counts[s] = len(idxs)
# OLS on demeaned data
beta = np.linalg.lstsq(X_demeaned, y_demeaned, rcond=None)[0]
beta_fa = float(beta[0])
# Recover sector intercepts
sector_intercepts = {}
for s in sector_indices:
sector_intercepts[s] = float(sector_y_means[s] - beta_fa * sector_X_means[s][0])
national_intercept = float(np.mean(list(sector_intercepts.values())))
# R-squared
y_pred = X[:, 0] * beta_fa
for i, s in enumerate(sectors):
y_pred[i] += sector_intercepts[s]
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(
f" {type_group:<15s}: n={len(t_df):>9,} β_fa={beta_fa:.4f} "
f"R²={r2:.4f} sectors={len(sector_intercepts):,}"
)
return beta_fa, sector_intercepts, sector_counts, national_intercept
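The within-sector demeaning in `train_type_model` can be sketched with the standard library alone. This is a minimal illustration, assuming a single regressor and made-up data; `fit_fixed_effects` is a hypothetical name and is not part of the pipeline, which uses `np.linalg.lstsq` instead of the closed-form slope shown here.

```python
from collections import defaultdict


def fit_fixed_effects(x, y, groups):
    """One-regressor fixed-effects fit via within-group demeaning.

    Returns (slope, {group: intercept}). Hypothetical helper for illustration.
    """
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)

    # Per-group means of x and y.
    x_mean = {g: sum(x[i] for i in idx) / len(idx) for g, idx in by_group.items()}
    y_mean = {g: sum(y[i] for i in idx) / len(idx) for g, idx in by_group.items()}

    # OLS slope on the demeaned data: sum(xd * yd) / sum(xd ** 2).
    num = den = 0.0
    for i, g in enumerate(groups):
        xd = x[i] - x_mean[g]
        yd = y[i] - y_mean[g]
        num += xd * yd
        den += xd * xd
    slope = num / den if den > 0 else 0.0

    # Recover each group's intercept from its means and the shared slope.
    intercepts = {g: y_mean[g] - slope * x_mean[g] for g in by_group}
    return slope, intercepts


# Made-up data: y = 2 * x + group offset (A: 1.0, B: 5.0).
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
groups = ["A", "A", "A", "B", "B", "B"]
y = [3.0, 5.0, 7.0, 7.0, 9.0, 11.0]
slope, intercepts = fit_fixed_effects(x, y, groups)
```

Demeaning removes each group's level, so the slope is estimated only from variation inside groups; the intercepts are then read off the group means.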
def shrink_intercepts(
sector_intercepts: dict[str, float],
sector_counts: dict[str, int],
) -> dict[str, float]:
"""Hierarchical shrinkage: sector -> district -> area -> national."""
national = float(np.mean(list(sector_intercepts.values())))
sector_to_dist: dict[str, str] = {}
dist_to_area: dict[str, str] = {}
for s in sector_intercepts:
d, a = hierarchy_keys(s)
sector_to_dist[s] = d
dist_to_area[d] = a
# Area-level intercepts (weighted mean of sectors in area)
area_vals: dict[str, list[tuple[float, int]]] = {}
for s, val in sector_intercepts.items():
d = sector_to_dist[s]
a = dist_to_area[d]
area_vals.setdefault(a, []).append((val, sector_counts.get(s, 0)))
area_intercepts: dict[str, float] = {}
area_counts: dict[str, int] = {}
for a, entries in area_vals.items():
total_n = sum(n for _, n in entries)
if total_n > 0:
area_intercepts[a] = sum(v * n for v, n in entries) / total_n
else:
area_intercepts[a] = sum(v for v, _ in entries) / len(entries)
area_counts[a] = total_n
# District-level intercepts
dist_vals: dict[str, list[tuple[float, int]]] = {}
for s, val in sector_intercepts.items():
d = sector_to_dist[s]
dist_vals.setdefault(d, []).append((val, sector_counts.get(s, 0)))
dist_intercepts: dict[str, float] = {}
dist_counts: dict[str, int] = {}
for d, entries in dist_vals.items():
total_n = sum(n for _, n in entries)
if total_n > 0:
dist_intercepts[d] = sum(v * n for v, n in entries) / total_n
else:
dist_intercepts[d] = sum(v for v, _ in entries) / len(entries)
dist_counts[d] = total_n
# Shrink: area -> national
area_shrunk: dict[str, float] = {}
for a, val in area_intercepts.items():
n = area_counts[a]
w = n / (n + SHRINKAGE_K)
area_shrunk[a] = w * val + (1 - w) * national
# Shrink: district -> area
dist_shrunk: dict[str, float] = {}
for d, val in dist_intercepts.items():
a = dist_to_area[d]
parent = area_shrunk.get(a, national)
n = dist_counts[d]
w = n / (n + SHRINKAGE_K)
dist_shrunk[d] = w * val + (1 - w) * parent
# Shrink: sector -> district
result: dict[str, float] = {}
for s, val in sector_intercepts.items():
d = sector_to_dist[s]
parent = dist_shrunk.get(d, national)
n = sector_counts.get(s, 0)
w = n / (n + SHRINKAGE_K)
result[s] = w * val + (1 - w) * parent
return result
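A small worked example may help in reading the `n / (n + SHRINKAGE_K)` weights used by `shrink_intercepts`. The sketch below assumes `K = 50` and invented intercepts and counts; `shrink` is a hypothetical helper, not a function from the pipeline.

```python
def shrink(value, n, parent, k):
    """Pull `value` towards `parent`; the weight on `value` is n / (n + k)."""
    w = n / (n + k)
    return w * value + (1 - w) * parent


K = 50  # assumed for illustration; the pipeline imports SHRINKAGE_K from _price_utils
national = 12.0

# Apply top-down so each level shrinks towards its already-shrunk parent.
area = shrink(12.4, n=450, parent=national, k=K)   # w = 0.9
district = shrink(12.6, n=50, parent=area, k=K)    # w = 0.5
sector = shrink(13.0, n=0, parent=district, k=K)   # w = 0.0, takes the parent value
```

A sector with no sales inherits its district's shrunk intercept, while a sector with many sales keeps most of its own estimate.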
def spatial_smooth_intercepts(
sector_intercepts: dict[str, float],
centroids: dict[str, tuple[float, float]],
sector_counts: dict[str, int],
) -> dict[str, float]:
"""Blend sparse sector intercepts with K nearest neighbors."""
sectors_with_coords = [s for s in sector_intercepts if s in centroids]
if len(sectors_with_coords) < SPATIAL_NEIGHBORS + 1:
return sector_intercepts
coords = np.array([centroids[s] for s in sectors_with_coords])
mean_lat = np.mean(coords[:, 0])
scale = np.cos(np.radians(mean_lat))
scaled_coords = np.column_stack([coords[:, 0], coords[:, 1] * scale])
tree = KDTree(scaled_coords)
result = dict(sector_intercepts)
for i, sec in enumerate(sectors_with_coords):
n = sector_counts.get(sec, 0)
self_w = n / (n + SPATIAL_BLEND_K)
if self_w > 0.95:
continue
dists, idxs = tree.query(scaled_coords[i], k=SPATIAL_NEIGHBORS + 1)
neighbor_dists = dists[1:]
neighbor_idxs = idxs[1:]
inv_dists = []
neighbor_vals = []
for d, j in zip(neighbor_dists, neighbor_idxs):
ns = sectors_with_coords[j]
if d > 0 and ns in sector_intercepts:
inv_dists.append(1.0 / d)
neighbor_vals.append(sector_intercepts[ns])
if not neighbor_vals:
continue
total_inv = sum(inv_dists)
nbr_w = 1.0 - self_w
blended = self_w * sector_intercepts[sec]
for val, iw in zip(neighbor_vals, inv_dists):
blended += nbr_w * (iw / total_inv) * val
result[sec] = blended
return result
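The blend performed per sector in `spatial_smooth_intercepts` can be sketched without SciPy. The example assumes the neighbor distances are already known (the pipeline obtains them from a `KDTree` query) and uses invented values; `idw_blend` and `scaled_distance` are hypothetical names.

```python
import math


def idw_blend(own, n, neighbors, blend_k):
    """Blend `own` with the inverse-distance-weighted mean of neighbor values.

    neighbors: list of (distance, value); zero-distance entries are skipped.
    """
    self_w = n / (n + blend_k)
    usable = [(1.0 / d, v) for d, v in neighbors if d > 0]
    if not usable:
        return own
    total_inv = sum(iw for iw, _ in usable)
    nbr_mean = sum(iw * v for iw, v in usable) / total_inv
    return self_w * own + (1.0 - self_w) * nbr_mean


def scaled_distance(a, b, mean_lat):
    """Planar distance in degrees, with longitude shrunk by cos(mean latitude)."""
    scale = math.cos(math.radians(mean_lat))
    return math.hypot(a[0] - b[0], (a[1] - b[1]) * scale)


# A sector with 10 sales and two neighbors at distances 1.0 and 2.0.
blended = idw_blend(10.0, n=10, neighbors=[(1.0, 12.0), (2.0, 9.0)], blend_k=30)
```

With `blend_k=30` (an assumed value here) the sector keeps a weight of 0.25 on its own intercept; the function above skips the blend entirely once that weight exceeds 0.95.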
def main():
parser = argparse.ArgumentParser(description="Train cross-sectional hedonic model")
parser.add_argument(
"--input", type=Path, required=True, help="Path to wide.parquet"
)
parser.add_argument(
"--output", type=Path, required=True, help="Output hedonic_model.json"
)
args = parser.parse_args()
df = load_training_data(args.input)
centroids = extract_centroids(args.input)
print("\nTraining per-type models...")
type_models = {}
total_sectors = 0
for tg in TYPE_GROUPS:
beta_fa, raw_intercepts, sector_counts, national = train_type_model(df, tg)
shrunk = shrink_intercepts(raw_intercepts, sector_counts)
smoothed = spatial_smooth_intercepts(shrunk, centroids, sector_counts)
total_sectors += len(smoothed)
type_models[tg] = {
"beta_fa": beta_fa,
"sector_intercepts": smoothed,
"national_intercept": national,
}
# Output
args.output.parent.mkdir(parents=True, exist_ok=True)
with open(args.output, "w") as f:
json.dump({"type_models": type_models}, f, indent=2)
size_kb = args.output.stat().st_size / 1024
print(f"\nWrote {args.output} ({size_kb:.0f} KB)")
print(f" {len(TYPE_GROUPS)} type models, {total_sectors:,} total sector intercepts")
if __name__ == "__main__":
main()

View file

@ -223,7 +223,6 @@ def _build_wide(
)
.drop(
"inspection_date",
"floor_height",
"_bedrooms",
"LSOA name (2021)",
"Local Authority District code (2024)",
@ -276,6 +275,7 @@ def _build_wide(
"shrink_swell_risk": "Shrink-swell risk",
"soluble_rocks_risk": "Soluble rocks risk",
"median_monthly_rent": "Estimated monthly rent",
"floor_height": "Interior height (m)",
}
)
)

View file

@ -9,45 +9,60 @@ Output: backtest_results.parquet with predictions vs actuals.
"""
import argparse
import json
from pathlib import Path
import numpy as np
import polars as pl
CURRENT_YEAR = 2025
from pipeline.transform._price_utils import (
CURRENT_YEAR,
HEDONIC_COLUMNS,
sector_expr,
type_group_expr,
)
TEST_YEAR_MIN = 2022
TERRACE_TYPES = ["Mid-Terrace", "End-Terrace", "Enclosed Mid-Terrace", "Enclosed End-Terrace"]
def type_group_expr():
return (
pl.when(pl.col("Property type").is_in(TERRACE_TYPES)).then(pl.lit("Terraced"))
.when(pl.col("Property type") == "Flats/Maisonettes").then(pl.lit("Flats"))
.when(pl.col("Property type").is_in(["Detached", "Semi-Detached"])).then(pl.col("Property type"))
.otherwise(pl.lit(None))
.alias("type_group")
)
def extract_test_set(input_path: Path) -> pl.DataFrame:
def extract_test_set(
input_path: Path, include_hedonic_cols: bool = False
) -> pl.DataFrame:
"""Extract test pairs: second-to-last sale as input, last sale as ground truth."""
print("Loading test set...")
cols = ["Postcode", "historical_prices", "Property type"]
if include_hedonic_cols:
for c in HEDONIC_COLUMNS:
if c not in cols:
cols.append(c)
df = (
pl.scan_parquet(input_path)
.select("Postcode", "historical_prices", "Property type")
.select(cols)
.filter(
pl.col("Postcode").is_not_null(),
pl.col("historical_prices").list.len() >= 2,
)
.with_columns(
pl.col("Postcode").str.slice(0, pl.col("Postcode").str.len_chars() - 2).str.strip_chars().alias("sector"),
sector_expr(),
type_group_expr(),
# Last sale (ground truth)
pl.col("historical_prices").list.last().struct.field("year").alias("actual_year"),
pl.col("historical_prices").list.last().struct.field("price").alias("actual_price"),
pl.col("historical_prices")
.list.last()
.struct.field("year")
.alias("actual_year"),
pl.col("historical_prices")
.list.last()
.struct.field("price")
.alias("actual_price"),
# Second-to-last sale (input)
pl.col("historical_prices").list.get(-2).struct.field("year").alias("input_year"),
pl.col("historical_prices").list.get(-2).struct.field("price").alias("input_price"),
pl.col("historical_prices")
.list.get(-2)
.struct.field("year")
.alias("input_year"),
pl.col("historical_prices")
.list.get(-2)
.struct.field("price")
.alias("input_price"),
)
.filter(
pl.col("actual_year") >= TEST_YEAR_MIN,
@ -71,7 +86,9 @@ def predict(test: pl.DataFrame, index: pl.DataFrame) -> pl.DataFrame:
# Join type-specific index at input year
test = test.join(
idx_typed.select("sector", "type_group", "year", pl.col("log_index").alias("li_in_typed")),
idx_typed.select(
"sector", "type_group", "year", pl.col("log_index").alias("li_in_typed")
),
left_on=["sector", "type_group", "input_year"],
right_on=["sector", "type_group", "year"],
how="left",
@ -85,7 +102,12 @@ def predict(test: pl.DataFrame, index: pl.DataFrame) -> pl.DataFrame:
)
# Join type-specific index at actual year
test = test.join(
idx_typed.select("sector", "type_group", "year", pl.col("log_index").alias("li_act_typed")),
idx_typed.select(
"sector",
"type_group",
"year",
pl.col("log_index").alias("li_act_typed"),
),
left_on=["sector", "type_group", "actual_year"],
right_on=["sector", "type_group", "year"],
how="left",
@ -99,19 +121,27 @@ def predict(test: pl.DataFrame, index: pl.DataFrame) -> pl.DataFrame:
)
test = test.with_columns(
pl.col("li_in_typed").fill_null(pl.col("li_in_all")).alias("log_index_input"),
pl.col("li_act_typed").fill_null(pl.col("li_act_all")).alias("log_index_actual"),
pl.col("li_in_typed")
.fill_null(pl.col("li_in_all"))
.alias("log_index_input"),
pl.col("li_act_typed")
.fill_null(pl.col("li_act_all"))
.alias("log_index_actual"),
)
else:
# Unstratified index
test = test.join(
index.select("sector", "year", pl.col("log_index").alias("log_index_input")),
index.select(
"sector", "year", pl.col("log_index").alias("log_index_input")
),
left_on=["sector", "input_year"],
right_on=["sector", "year"],
how="left",
)
test = test.join(
index.select("sector", "year", pl.col("log_index").alias("log_index_actual")),
index.select(
"sector", "year", pl.col("log_index").alias("log_index_actual")
),
left_on=["sector", "actual_year"],
right_on=["sector", "year"],
how="left",
@ -121,7 +151,9 @@ def predict(test: pl.DataFrame, index: pl.DataFrame) -> pl.DataFrame:
(
pl.col("input_price").cast(pl.Float64)
* (pl.col("log_index_actual") - pl.col("log_index_input")).exp()
).fill_null(pl.col("input_price").cast(pl.Float64)).alias("predicted"),
)
.fill_null(pl.col("input_price").cast(pl.Float64))
.alias("predicted"),
)
return test
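The index prediction at the end of `predict` amounts to scaling the earlier price by the change in the log index, with a fallback to the unadjusted price when either index value is missing. A scalar sketch, with `roll_forward` as a hypothetical name and made-up numbers:

```python
import math


def roll_forward(input_price, log_index_input, log_index_actual):
    """Scale a past price by the index change; fall back to the price itself."""
    if log_index_input is None or log_index_actual is None:
        return float(input_price)
    return float(input_price) * math.exp(log_index_actual - log_index_input)


# The index rose by 50% between the two sale years.
predicted = roll_forward(200_000, 0.0, math.log(1.5))
# No index value for the input year: the prediction equals the input price.
fallback = roll_forward(200_000, None, 0.1)
```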
@ -150,7 +182,15 @@ def print_metrics_table(metrics_by_stage: dict):
print("BACKTEST RESULTS")
print("=" * 55)
metric_names = ["MdAPE (%)", "% within 10%", "% within 20%", "% within 30%", "MAE (£)", "Mean signed error (£)", "n"]
metric_names = [
"MdAPE (%)",
"% within 10%",
"% within 20%",
"% within 30%",
"MAE (£)",
"Mean signed error (£)",
"n",
]
stages = list(metrics_by_stage.keys())
header = f"{'Metric':<25s}"
@ -176,20 +216,37 @@ def print_metrics_table(metrics_by_stage: dict):
def main():
parser = argparse.ArgumentParser(description="Backtest price estimation model")
parser.add_argument("--input", type=Path, required=True, help="Path to wide.parquet")
parser.add_argument("--index", type=Path, required=True, help="Path to price_index.parquet")
parser.add_argument("--output", type=Path, required=True, help="Output backtest_results.parquet")
parser.add_argument(
"--input", type=Path, required=True, help="Path to wide.parquet"
)
parser.add_argument(
"--index", type=Path, required=True, help="Path to price_index.parquet"
)
parser.add_argument(
"--output", type=Path, required=True, help="Output backtest_results.parquet"
)
parser.add_argument(
"--hedonic-model",
type=Path,
default=None,
help="Path to hedonic_model.json (optional)",
)
args = parser.parse_args()
index = pl.read_parquet(args.index)
has_type_group = "type_group" in index.columns
if has_type_group:
print(f"Price index: {len(index):,} rows, {index['sector'].n_unique():,} sectors, "
f"{index['type_group'].n_unique()} type groups")
print(
f"Price index: {len(index):,} rows, {index['sector'].n_unique():,} sectors, "
f"{index['type_group'].n_unique()} type groups"
)
else:
print(f"Price index: {len(index):,} rows, {index['sector'].n_unique():,} sectors")
print(
f"Price index: {len(index):,} rows, {index['sector'].n_unique():,} sectors"
)
test = extract_test_set(args.input)
has_hedonic = args.hedonic_model is not None
test = extract_test_set(args.input, include_hedonic_cols=has_hedonic)
print("\nPredicting with price index...")
test = predict(test, index)
@ -197,19 +254,126 @@ def main():
# Compute and print metrics
actual = test["actual_price"].to_numpy().astype(np.float64)
metrics = {
"Naive": compute_metrics(actual, test["input_price"].to_numpy().astype(np.float64)),
"Index": compute_metrics(actual, test["predicted"].to_numpy().astype(np.float64)),
"Naive": compute_metrics(
actual, test["input_price"].to_numpy().astype(np.float64)
),
"Index": compute_metrics(
actual, test["predicted"].to_numpy().astype(np.float64)
),
}
# Hedonic blending
if has_hedonic:
print("\nApplying hedonic blending...")
with open(args.hedonic_model) as f:
model = json.load(f)
type_models = model["type_models"]
# Identify eligible rows for hedonic estimate
hedonic_mask = (
pl.col("Total floor area (sqm)").is_not_null()
& (pl.col("Total floor area (sqm)") > 0)
& pl.col("type_group").is_not_null()
)
eligible_mask = test.select(hedonic_mask).to_series()
eligible = test.filter(eligible_mask)
if len(eligible) > 0:
log_fa = np.log(
np.maximum(
eligible["Total floor area (sqm)"].to_numpy().astype(np.float64),
1.0,
)
)
sectors = eligible["sector"].to_list()
types = eligible["type_group"].to_list()
# Per-type hedonic prediction
log_hedonic = np.empty(len(eligible))
for i in range(len(eligible)):
tm = type_models.get(types[i])
if tm is None:
log_hedonic[i] = np.nan
continue
alpha = tm["sector_intercepts"].get(
sectors[i], tm["national_intercept"]
)
log_hedonic[i] = tm["beta_fa"] * log_fa[i] + alpha
valid = np.isfinite(log_hedonic)
# Hold years: input_year to actual_year (simulating real prediction)
input_years = eligible["input_year"].to_numpy().astype(np.float64)
actual_years = eligible["actual_year"].to_numpy().astype(np.float64)
hold_years = np.maximum(actual_years - input_years, 0.0)
log_index_pred = np.log(
np.maximum(eligible["predicted"].to_numpy().astype(np.float64), 1.0)
)
# Sweep tau values over all eligible rows; rows without a finite hedonic estimate keep the index prediction
tau_values = [5.0, 10.0, 15.0, 20.0, 30.0]
actual_eligible = eligible["actual_price"].to_numpy().astype(np.float64)
best_tau = 15.0
best_mdape = float("inf")
print(f"\n tau sweep ({valid.sum():,} eligible properties):")
for tau in tau_values:
blend_w = hold_years / (hold_years + tau)
log_blended = np.where(
valid,
(1 - blend_w) * log_index_pred + blend_w * log_hedonic,
log_index_pred,
)
blended = np.exp(log_blended)
m = compute_metrics(actual_eligible, blended)
marker = ""
if m["MdAPE (%)"] < best_mdape:
best_mdape = m["MdAPE (%)"]
best_tau = tau
marker = " <-- best"
print(
f" tau={tau:>4.0f}: MdAPE={m['MdAPE (%)']:>5.1f}%, "
f"within 10%={m['% within 10%']:>5.1f}%{marker}"
)
print(f"\n Best tau = {best_tau}")
# Recompute blended predictions for the eligible rows using the best tau
blend_w = hold_years / (hold_years + best_tau)
log_blended = np.where(
valid,
(1 - blend_w) * log_index_pred + blend_w * log_hedonic,
log_index_pred,
)
blended_eligible = np.exp(log_blended)
# Merge back: for non-eligible rows, use index prediction
blended_all = test["predicted"].to_numpy().astype(np.float64).copy()
eligible_indices = eligible_mask.arg_true()
for i, idx in enumerate(eligible_indices):
blended_all[idx] = blended_eligible[i]
test = test.with_columns(
pl.Series("blended", blended_all, dtype=pl.Float64),
)
metrics["Blended"] = compute_metrics(actual, blended_all)
print_metrics_table(metrics)
# Save results
result = test.select(
"Postcode", "sector",
"input_year", "input_price",
"actual_year", "actual_price",
result_cols = [
"Postcode",
"sector",
"input_year",
"input_price",
"actual_year",
"actual_price",
"predicted",
)
]
if "blended" in test.columns:
result_cols.append("blended")
result = test.select(result_cols)
result.write_parquet(args.output)
size_mb = args.output.stat().st_size / (1024 * 1024)
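The hold-period weighting used in the hedonic blending step can be sketched for a single property. This is an illustration under assumed inputs, not the pipeline's vectorized code; `blend_log_prices` is a hypothetical name.

```python
import math


def blend_log_prices(index_price, hedonic_log_price, hold_years, tau):
    """Blend index and hedonic estimates in log space.

    The weight on the hedonic estimate is hold_years / (hold_years + tau),
    so it grows with the time elapsed since the input sale.
    """
    hold_years = max(hold_years, 0.0)
    w = hold_years / (hold_years + tau)
    log_blended = (1 - w) * math.log(index_price) + w * hedonic_log_price
    return math.exp(log_blended)


# hold_years == tau gives equal weights, i.e. the geometric mean of the two estimates.
equal_weight = blend_log_prices(200_000, math.log(300_000), hold_years=15, tau=15)
# A property sold in the current year keeps its index estimate.
no_hold = blend_log_prices(200_000, math.log(300_000), hold_years=0, tau=15)
```

Blending in log space means the result is a weighted geometric mean, which matches the log-price target of the hedonic model.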

View file

@ -4,32 +4,56 @@ Joins the precomputed repeat-sales price index (from price_index.py) with each
property's last known sale to produce an inflation-adjusted current price estimate.
Uses type-stratified index when available, falling back to "All" type.
Optionally applies renovation premiums from renovation_premium.py: for properties
with post-sale renovation events, the estimated price is adjusted upward based on
data-driven per-area premiums with time decay.
Modifies wide.parquet in-place, adding the "Estimated current price" column.
"""
import argparse
import json
import math
from pathlib import Path
import numpy as np
import polars as pl
CURRENT_YEAR = 2025
TERRACE_TYPES = ["Mid-Terrace", "End-Terrace", "Enclosed Mid-Terrace", "Enclosed End-Terrace"]
from pipeline.transform._price_utils import (
CURRENT_YEAR,
sector_expr,
type_group_expr,
)
def type_group_expr():
return (
pl.when(pl.col("Property type").is_in(TERRACE_TYPES)).then(pl.lit("Terraced"))
.when(pl.col("Property type") == "Flats/Maisonettes").then(pl.lit("Flats"))
.when(pl.col("Property type").is_in(["Detached", "Semi-Detached"])).then(pl.col("Property type"))
.otherwise(pl.lit(None))
.alias("type_group")
)
HALF_LIFE = 10.0
DECAY_RATE = math.log(2) / HALF_LIFE
def main():
parser = argparse.ArgumentParser(description="Augment wide.parquet with estimated current prices")
parser.add_argument("--input", type=Path, required=True, help="Path to wide.parquet (modified in-place)")
parser.add_argument("--index", type=Path, required=True, help="Path to price_index.parquet")
parser = argparse.ArgumentParser(
description="Augment wide.parquet with estimated current prices"
)
parser.add_argument(
"--input",
type=Path,
required=True,
help="Path to wide.parquet (modified in-place)",
)
parser.add_argument(
"--index", type=Path, required=True, help="Path to price_index.parquet"
)
parser.add_argument(
"--renovation-premium",
type=Path,
default=None,
help="Path to renovation_premium.parquet (optional)",
)
parser.add_argument(
"--hedonic-model",
type=Path,
default=None,
help="Path to hedonic_model.json (optional)",
)
args = parser.parse_args()
print("Loading wide.parquet...")
@ -49,7 +73,7 @@ def main():
)
df = df.with_columns(
pl.col("Postcode").str.slice(0, pl.col("Postcode").str.len_chars() - 2).str.strip_chars().alias("_sector"),
sector_expr().alias("_sector"),
pl.col("Date of last transaction").dt.year().alias("_sale_year"),
type_group_expr().alias("_type_group"),
)
@ -57,10 +81,14 @@ def main():
index = pl.read_parquet(args.index)
has_type_group = "type_group" in index.columns
if has_type_group:
print(f" Price index: {len(index):,} rows, {index['sector'].n_unique():,} sectors, "
f"{index['type_group'].n_unique()} type groups")
print(
f" Price index: {len(index):,} rows, {index['sector'].n_unique():,} sectors, "
f"{index['type_group'].n_unique()} type groups"
)
else:
print(f" Price index: {len(index):,} rows, {index['sector'].n_unique():,} sectors (unstratified)")
print(
f" Price index: {len(index):,} rows, {index['sector'].n_unique():,} sectors (unstratified)"
)
print("\nApplying repeat-sales index...")
@ -70,49 +98,63 @@ def main():
# Join type-specific index at sale year
df = df.join(
idx_typed.select("sector", "type_group", "year", pl.col("log_index").alias("log_idx_sale_typed")),
idx_typed.select(
"sector",
"type_group",
"year",
pl.col("log_index").alias("log_idx_sale_typed"),
),
left_on=["_sector", "_type_group", "_sale_year"],
right_on=["sector", "type_group", "year"],
how="left",
)
# Join "All" index at sale year
df = df.join(
idx_all.select("sector", "year", pl.col("log_index").alias("log_idx_sale_all")),
idx_all.select(
"sector", "year", pl.col("log_index").alias("log_idx_sale_all")
),
left_on=["_sector", "_sale_year"],
right_on=["sector", "year"],
how="left",
)
# Join type-specific index at current year
df = df.join(
idx_typed.filter(pl.col("year") == CURRENT_YEAR)
.select("sector", "type_group", pl.col("log_index").alias("log_idx_cur_typed")),
idx_typed.filter(pl.col("year") == CURRENT_YEAR).select(
"sector", "type_group", pl.col("log_index").alias("log_idx_cur_typed")
),
left_on=["_sector", "_type_group"],
right_on=["sector", "type_group"],
how="left",
)
# Join "All" index at current year
df = df.join(
idx_all.filter(pl.col("year") == CURRENT_YEAR)
.select("sector", pl.col("log_index").alias("log_idx_cur_all")),
idx_all.filter(pl.col("year") == CURRENT_YEAR).select(
"sector", pl.col("log_index").alias("log_idx_cur_all")
),
left_on="_sector",
right_on="sector",
how="left",
)
df = df.with_columns(
pl.col("log_idx_sale_typed").fill_null(pl.col("log_idx_sale_all")).alias("_log_index_sale"),
pl.col("log_idx_cur_typed").fill_null(pl.col("log_idx_cur_all")).alias("_log_index_current"),
pl.col("log_idx_sale_typed")
.fill_null(pl.col("log_idx_sale_all"))
.alias("_log_index_sale"),
pl.col("log_idx_cur_typed")
.fill_null(pl.col("log_idx_cur_all"))
.alias("_log_index_current"),
)
else:
df = df.join(
index.select("sector", "year", pl.col("log_index").alias("_log_index_sale")),
index.select(
"sector", "year", pl.col("log_index").alias("_log_index_sale")
),
left_on=["_sector", "_sale_year"],
right_on=["sector", "year"],
how="left",
)
index_current = (
index.filter(pl.col("year") == CURRENT_YEAR)
.select("sector", pl.col("log_index").alias("_log_index_current"))
index_current = index.filter(pl.col("year") == CURRENT_YEAR).select(
"sector", pl.col("log_index").alias("_log_index_current")
)
df = df.join(index_current, left_on="_sector", right_on="sector", how="left")
@ -127,6 +169,224 @@ def main():
.alias("Estimated current price"),
)
n_adjusted = df.filter(has_price & pl.col("_log_index_sale").is_not_null()).height
n_with_price = df.filter(has_price).height
print(
f" {n_adjusted:,} of {n_with_price:,} properties adjusted by index ({n_adjusted / max(n_with_price, 1) * 100:.1f}%)"
)
# Apply hedonic blending if model provided
if args.hedonic_model is not None:
print("\nApplying hedonic blending...")
with open(args.hedonic_model) as f:
model = json.load(f)
type_models = model["type_models"]
tau = model.get("tau", 15.0)
print(f" tau = {tau}, {len(type_models)} type models")
# Add type_group for per-type lookup
df = df.with_columns(type_group_expr())
hedonic_mask = (
has_price
& pl.col("Estimated current price").is_not_null()
& pl.col("Total floor area (sqm)").is_not_null()
& (pl.col("Total floor area (sqm)") > 0)
& pl.col("type_group").is_not_null()
)
eligible = df.filter(hedonic_mask)
if len(eligible) > 0:
log_fa = np.log(
np.maximum(
eligible["Total floor area (sqm)"].to_numpy().astype(np.float64),
1.0,
)
)
sectors = eligible["_sector"].to_list()
types = eligible["type_group"].to_list()
# Per-type hedonic prediction
log_hedonic = np.empty(len(eligible))
for i in range(len(eligible)):
tm = type_models.get(types[i])
if tm is None:
log_hedonic[i] = np.nan
continue
alpha = tm["sector_intercepts"].get(
sectors[i], tm["national_intercept"]
)
log_hedonic[i] = tm["beta_fa"] * log_fa[i] + alpha
valid = np.isfinite(log_hedonic)
# Hold years and blend weight
sale_years = eligible["_sale_year"].to_numpy().astype(np.float64)
hold_years = np.maximum(CURRENT_YEAR - sale_years, 0.0)
blend_w = hold_years / (hold_years + tau)
# Blend in log space
log_index_est = np.log(
eligible["Estimated current price"].to_numpy().astype(np.float64)
)
log_blended = np.where(
valid,
(1 - blend_w) * log_index_est + blend_w * log_hedonic,
log_index_est,
)
blended_prices = np.exp(log_blended)
# Write back into df
eligible_indices = df.select(hedonic_mask).to_series().arg_true()
price_arr = df["Estimated current price"].to_numpy().astype(np.float64)
for i, idx in enumerate(eligible_indices):
price_arr[idx] = blended_prices[i]
df = df.with_columns(
pl.Series("Estimated current price", price_arr, dtype=pl.Float64),
)
n_blended = int(valid.sum())
avg_w = float(np.mean(blend_w[valid]))
print(
f" {n_blended:,} properties with hedonic blending (avg blend weight: {avg_w:.3f})"
)
else:
print(" No eligible properties for hedonic blending")
# Apply renovation premiums if provided
if args.renovation_premium is not None:
print("\nApplying renovation premiums...")
reno_prem = pl.read_parquet(args.renovation_premium)
print(f" Loaded {len(reno_prem):,} premium rows")
# Find properties with post-sale renovation events
has_reno = (
pl.col("renovation_history").is_not_null()
& (pl.col("renovation_history").list.len() > 0)
& pl.col("Estimated current price").is_not_null()
)
# Explode renovation events, filter to post-sale only
reno_rows = (
df.lazy()
.filter(has_reno)
.select("_sector", "_type_group", "_sale_year", "renovation_history")
.with_row_index("_row_idx")
.explode("renovation_history")
.with_columns(
pl.col("renovation_history").struct.field("year").alias("_event_year"),
pl.col("renovation_history").struct.field("event").alias("_event_type"),
)
.filter(pl.col("_event_year") > pl.col("_sale_year"))
.collect()
)
if len(reno_rows) > 0:
# Take most recent event per (row, event_type)
latest = (
reno_rows.lazy()
.group_by("_row_idx", "_event_type", "_sector", "_type_group")
.agg(pl.col("_event_year").max().alias("_event_year"))
.collect()
)
# Compute time-decayed premium
latest = latest.with_columns(
(-DECAY_RATE * (CURRENT_YEAR - pl.col("_event_year")).cast(pl.Float64))
.exp()
.alias("_decay"),
)
# Join with renovation_premium.parquet — try typed first, fall back to "All"
rp_typed = reno_prem.filter(pl.col("type_group") != "All")
rp_all = reno_prem.filter(pl.col("type_group") == "All")
latest = (
latest.join(
rp_typed.select(
"sector",
"type_group",
"event_type",
pl.col("log_premium").alias("_lp_typed"),
),
left_on=["_sector", "_type_group", "_event_type"],
right_on=["sector", "type_group", "event_type"],
how="left",
)
.join(
rp_all.select(
"sector", "event_type", pl.col("log_premium").alias("_lp_all")
),
left_on=["_sector", "_event_type"],
right_on=["sector", "event_type"],
how="left",
)
.with_columns(
pl.col("_lp_typed")
.fill_null(pl.col("_lp_all"))
.fill_null(0.0)
.alias("_log_premium"),
)
)
# Compute total decayed log premium per property
per_property = (
latest.lazy()
.with_columns(
(pl.col("_log_premium") * pl.col("_decay")).alias("_decayed_lp"),
)
.group_by("_row_idx")
.agg(pl.col("_decayed_lp").sum().alias("_reno_log_premium"))
.collect()
)
# _row_idx numbers only the has_reno-filtered rows, so recover the matching
# row positions in the full df before writing premiums back.
reno_mask = df.select(has_reno).to_series()
actual_indices = reno_mask.arg_true()
# Build a mapping: _row_idx -> actual df row
idx_map = per_property.with_columns(
pl.col("_row_idx")
.map_elements(
lambda i: int(actual_indices[i]),
return_dtype=pl.UInt32,
)
.alias("_df_row"),
)
# Create a full-length column of zeros, then fill in premium values
reno_log_prem = [0.0] * len(df)
for row in idx_map.iter_rows(named=True):
reno_log_prem[row["_df_row"]] = row["_reno_log_premium"]
df = df.with_columns(
pl.Series("_reno_log_premium", reno_log_prem, dtype=pl.Float64),
)
# Apply: multiply estimated price by exp(reno_log_premium) where the premium is non-zero
df = df.with_columns(
pl.when(pl.col("_reno_log_premium") != 0.0)
.then(
pl.col("Estimated current price")
* pl.col("_reno_log_premium").exp()
)
.otherwise(pl.col("Estimated current price"))
.alias("Estimated current price"),
)
n_with_premium = idx_map.height
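nonzero_premiums = per_property["_reno_log_premium"].filter(
per_property["_reno_log_premium"] != 0.0
)
# mean() returns None on an empty Series, so guard against math.exp(None)
avg_multiplier = (
math.exp(nonzero_premiums.mean()) if len(nonzero_premiums) > 0 else 1.0
)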
print(f" {n_with_premium:,} properties with renovation premium applied")
print(
f" Average premium multiplier: {avg_multiplier:.3f} ({avg_multiplier - 1:.1%} uplift)"
)
else:
print(" No properties with post-sale renovation events")
# Derive estimated price per sqm where both estimated price and floor area exist
df = df.with_columns(
(pl.col("Estimated current price") / pl.col("Total floor area (sqm)"))
@ -135,20 +395,19 @@ def main():
.alias("Est. price per sqm"),
)
n_adjusted = df.filter(
has_price & pl.col("_log_index_sale").is_not_null()
).height
n_with_price = df.filter(has_price).height
print(f" {n_adjusted:,} of {n_with_price:,} properties adjusted by index ({n_adjusted / max(n_with_price, 1) * 100:.1f}%)")
# Drop all temporary columns
temp_cols = [c for c in df.columns if c.startswith("_") or c.startswith("log_idx_")]
# Also drop hedonic-derived column if it was added
if "type_group" in df.columns:
temp_cols.append("type_group")
df = df.drop(temp_cols)
df.write_parquet(args.input)
size_mb = args.input.stat().st_size / (1024 * 1024)
print(f"\nWrote {args.input} ({size_mb:.1f} MB)")
print(f" {len(df):,} rows, {len(df.columns)} columns (including 'Estimated current price')")
print(
f" {len(df):,} rows, {len(df.columns)} columns (including 'Estimated current price')"
)
if __name__ == "__main__":
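The time decay applied to renovation premiums follows from `HALF_LIFE` and `DECAY_RATE`. The sketch below works on one property with invented premiums; `decayed_multiplier` is a hypothetical name, and the pipeline does the same arithmetic with Polars expressions.

```python
import math

HALF_LIFE = 10.0  # years, the value defined in the diff above
DECAY_RATE = math.log(2) / HALF_LIFE


def decayed_multiplier(events, current_year):
    """Price multiplier from (event_year, log_premium) pairs.

    Each log premium decays exponentially with the years since the event;
    decayed log premiums add, so the resulting multipliers compound.
    """
    total = 0.0
    for event_year, log_premium in events:
        age = current_year - event_year
        total += log_premium * math.exp(-DECAY_RATE * age)
    return math.exp(total)


# A 10% premium from an event one half-life ago retains half of its log premium.
multiplier = decayed_multiplier([(2015, math.log(1.10))], current_year=2025)
```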

View file

@ -19,66 +19,38 @@ from scipy.sparse.linalg import lsqr
from scipy.spatial import KDTree
from tqdm import tqdm
from pipeline.transform._price_utils import (
CURRENT_YEAR,
SHRINKAGE_K,
TYPE_GROUPS,
build_hedonic_features,
extract_centroids,
hierarchy_keys,
sector_expr,
type_group_expr,
)
# --- Constants ---
MIN_PAIRS = 5
SHRINKAGE_K = 50
OUTLIER_THRESHOLD = 3.0 # hard pre-filter; Huber handles the rest
HUBER_K = 1.345
IRLS_ITERATIONS = 5
SPATIAL_NEIGHBORS = 5
SPATIAL_BLEND_K = 30
CURRENT_YEAR = 2025
TYPE_GROUPS = ["Detached", "Semi-Detached", "Terraced", "Flats"]
TERRACE_TYPES = ["Mid-Terrace", "End-Terrace", "Enclosed Mid-Terrace", "Enclosed End-Terrace"]
AGE_BREAKS = [1900, 1930, 1950, 1967, 1983, 2000, 2010]
AGE_LABELS = ["pre-1900", "1900-1929", "1930-1949", "1950-1966", "1967-1982", "1983-1999", "2000-2009", "2010+"]
def type_group_expr():
"""Polars expression: Property type → type_group."""
return (
pl.when(pl.col("Property type").is_in(TERRACE_TYPES)).then(pl.lit("Terraced"))
.when(pl.col("Property type") == "Flats/Maisonettes").then(pl.lit("Flats"))
.when(pl.col("Property type").is_in(["Detached", "Semi-Detached"])).then(pl.col("Property type"))
.otherwise(pl.lit(None))
.alias("type_group")
)
def age_band_expr():
"""Polars expression: Construction age (UInt16 year) → age band string."""
expr = pl.when(pl.col("Construction age").is_null()).then(pl.lit(None))
for i, brk in enumerate(AGE_BREAKS):
expr = expr.when(pl.col("Construction age") < brk).then(pl.lit(AGE_LABELS[i]))
return expr.otherwise(pl.lit(AGE_LABELS[-1])).alias("age_band")
def sector_expr():
"""Polars expression: Postcode → sector (drop last 2 chars, strip)."""
return pl.col("Postcode").str.slice(0, pl.col("Postcode").str.len_chars() - 2).str.strip_chars().alias("sector")
def hierarchy_keys(sector: str) -> tuple[str, str]:
"""Return (district, area) for a sector string."""
district = sector.rsplit(" ", 1)[0] if " " in sector else sector
area = ""
for ch in district:
if ch.isalpha():
area += ch
else:
break
return district, area
# --- Pair extraction ---
def extract_pairs(input_path: Path) -> pl.DataFrame:
print("Extracting repeat-sale pairs...")
df = (
pl.scan_parquet(input_path)
.select("Postcode", "historical_prices", "Property type")
.filter(pl.col("Postcode").is_not_null(), pl.col("historical_prices").list.len() >= 2)
.filter(
pl.col("Postcode").is_not_null(),
pl.col("historical_prices").list.len() >= 2,
)
.with_columns(sector_expr(), type_group_expr())
.collect()
)
@ -87,7 +59,9 @@ def extract_pairs(input_path: Path) -> pl.DataFrame:
pairs = (
df.lazy()
.with_columns(
pl.col("historical_prices").list.slice(0, pl.col("historical_prices").list.len() - 1).alias("from_txn"),
pl.col("historical_prices")
.list.slice(0, pl.col("historical_prices").list.len() - 1)
.alias("from_txn"),
pl.col("historical_prices").list.slice(1).alias("to_txn"),
)
.explode("from_txn", "to_txn")
@ -98,10 +72,18 @@ def extract_pairs(input_path: Path) -> pl.DataFrame:
pl.col("to_txn").struct.field("price").alias("price2"),
)
.select("sector", "type_group", "year1", "price1", "year2", "price2")
.filter(pl.col("price1") > 0, pl.col("price2") > 0, pl.col("year2") > pl.col("year1"))
.filter(
pl.col("price1") > 0,
pl.col("price2") > 0,
pl.col("year2") > pl.col("year1"),
)
.with_columns(
(pl.col("price2").cast(pl.Float64) / pl.col("price1").cast(pl.Float64)).log().alias("log_ratio"),
(1.0 / (pl.col("year2") - pl.col("year1")).cast(pl.Float64).sqrt()).alias("weight"),
(pl.col("price2").cast(pl.Float64) / pl.col("price1").cast(pl.Float64))
.log()
.alias("log_ratio"),
(1.0 / (pl.col("year2") - pl.col("year1")).cast(pl.Float64).sqrt()).alias(
"weight"
),
)
.filter(pl.col("log_ratio").abs() <= OUTLIER_THRESHOLD)
.collect()
@ -118,31 +100,14 @@ def extract_pairs(input_path: Path) -> pl.DataFrame:
return pairs
# --- Sector centroids ---
def extract_centroids(input_path: Path) -> dict[str, tuple[float, float]]:
print("Computing sector centroids...")
df = (
pl.scan_parquet(input_path)
.select("Postcode", "lat", "lon")
.filter(pl.col("Postcode").is_not_null(), pl.col("lat").is_not_null())
.with_columns(sector_expr())
.group_by("sector")
.agg(pl.col("lat").mean(), pl.col("lon").mean())
.collect()
)
centroids = {}
for row in df.iter_rows(named=True):
centroids[row["sector"]] = (row["lat"], row["lon"])
print(f" {len(centroids):,} sector centroids")
return centroids
# --- Robust IRLS solver ---
def solve_robust_index(
years1: np.ndarray, years2: np.ndarray,
log_ratios: np.ndarray, base_weights: np.ndarray,
years1: np.ndarray,
years2: np.ndarray,
log_ratios: np.ndarray,
base_weights: np.ndarray,
) -> dict[int, float]:
"""IRLS Huber M-estimation for the Case-Shiller repeat-sales model."""
n = len(years1)
@ -205,11 +170,16 @@ def solve_robust_index(
def compute_indices_for_level(pairs: pl.DataFrame, group_col: str):
"""Solve robust indices for each group. Returns (indices, n_pairs) dicts."""
groups = pairs.group_by(group_col).agg(
pl.col("year1"), pl.col("year2"), pl.col("log_ratio"), pl.col("weight"),
pl.col("year1"),
pl.col("year2"),
pl.col("log_ratio"),
pl.col("weight"),
)
indices = {}
n_pairs = {}
for row in tqdm(groups.iter_rows(named=True), total=len(groups), desc=f" {group_col}"):
for row in tqdm(
groups.iter_rows(named=True), total=len(groups), desc=f" {group_col}"
):
key = row[group_col]
y1 = np.array(row["year1"], dtype=np.int32)
y2 = np.array(row["year2"], dtype=np.int32)
@ -224,28 +194,28 @@ def compute_indices_for_level(pairs: pl.DataFrame, group_col: str):
# --- Hedonic model ---
def compute_hedonic_index(input_path: Path, min_year: int, max_year: int) -> dict[int, float]:
def compute_hedonic_index(
input_path: Path, min_year: int, max_year: int
) -> dict[int, float]:
"""Two-step hedonic index: regress log(price) on features, average residual by year."""
print("Computing hedonic index...")
df = (
pl.scan_parquet(input_path)
.select(
"Last known price", "Date of last transaction", "Property type",
"Total floor area (sqm)", "Current energy rating",
"Number of bedrooms & living rooms", "Construction age",
"Last known price",
"Date of last transaction",
"Property type",
"Total floor area (sqm)",
)
.filter(
pl.col("Last known price").is_not_null(),
pl.col("Total floor area (sqm)").is_not_null(),
pl.col("Total floor area (sqm)") > 0,
pl.col("Current energy rating").is_in(["A", "B", "C", "D", "E", "F", "G"]),
pl.col("Number of bedrooms & living rooms").is_not_null(),
pl.col("Construction age").is_not_null(),
)
.with_columns(
pl.col("Date of last transaction").dt.year().alias("sale_year"),
type_group_expr(),
age_band_expr(),
)
.filter(
pl.col("type_group").is_not_null(),
@ -261,29 +231,9 @@ def compute_hedonic_index(input_path: Path, min_year: int, max_year: int) -> dic
log_price = np.log(df["Last known price"].to_numpy().astype(np.float64))
sale_years = df["sale_year"].to_numpy()
# Build feature matrix
parts = []
# log(floor_area)
fa = df["Total floor area (sqm)"].to_numpy().astype(np.float32)
parts.append(np.log(np.maximum(fa, 1.0)).reshape(-1, 1))
# Type dummies (ref: Detached)
tg = df["type_group"].to_numpy()
for t in ["Terraced", "Semi-Detached", "Flats"]:
parts.append((tg == t).astype(np.float32).reshape(-1, 1))
# EPC dummies (ref: D)
epc = df["Current energy rating"].to_numpy()
for r in ["A", "B", "C", "E", "F", "G"]:
parts.append((epc == r).astype(np.float32).reshape(-1, 1))
# Rooms
parts.append(df["Number of bedrooms & living rooms"].to_numpy().astype(np.float32).reshape(-1, 1))
# Age band dummies (ref: pre-1900)
ab = df["age_band"].to_numpy()
for band in AGE_LABELS[1:]:
parts.append((ab == band).astype(np.float32).reshape(-1, 1))
# Intercept
parts.append(np.ones((len(df), 1), dtype=np.float32))
F = np.hstack(parts)
# Build feature matrix (18 hedonic features + intercept)
X = build_hedonic_features(df)
F = np.hstack([X, np.ones((len(df), 1), dtype=np.float32)])
print(f" Feature matrix: {F.shape[0]:,} × {F.shape[1]}")
# Step 1: regress log(price) on features → quality score
@ -303,12 +253,15 @@ def compute_hedonic_index(input_path: Path, min_year: int, max_year: int) -> dic
for y in hedonic:
hedonic[y] -= base
print(f" Hedonic index: {len(hedonic)} years, range {min(hedonic.values()):.3f} to {max(hedonic.values()):.3f}")
print(
f" Hedonic index: {len(hedonic)} years, range {min(hedonic.values()):.3f} to {max(hedonic.values()):.3f}"
)
return hedonic
# --- Shrinkage ---
def shrink_index(raw: dict, parent: dict, n_pairs: int, k: int = SHRINKAGE_K) -> dict:
w = n_pairs / (n_pairs + k)
result = {}
@ -320,9 +273,18 @@ def shrink_index(raw: dict, parent: dict, n_pairs: int, k: int = SHRINKAGE_K) ->
def apply_shrinkage(
sector_idx, sector_n, district_idx, district_n,
area_idx, area_n, national_idx, national_n,
hedonic_idx, all_sectors, sector_to_dist, dist_to_area,
sector_idx,
sector_n,
district_idx,
district_n,
area_idx,
area_n,
national_idx,
national_n,
hedonic_idx,
all_sectors,
sector_to_dist,
dist_to_area,
):
"""Top-down hierarchical shrinkage: national→hedonic, area→national, etc."""
# National → hedonic
@ -361,8 +323,11 @@ def apply_shrinkage(
# --- Spatial smoothing ---
def spatial_smooth(
sector_indices: dict, centroids: dict, n_pairs_map: dict,
sector_indices: dict,
centroids: dict,
n_pairs_map: dict,
) -> dict:
"""Blend sparse sector indices with K nearest neighbors."""
# Build coordinate arrays for sectors with centroids
@ -420,6 +385,7 @@ def spatial_smooth(
# --- Forward fill ---
def forward_fill(index: dict, min_year: int, max_year: int) -> dict:
filled = {}
last = 0.0
@ -432,8 +398,11 @@ def forward_fill(index: dict, min_year: int, max_year: int) -> dict:
# --- Main ---
def main():
parser = argparse.ArgumentParser(description="Build improved repeat-sales price index")
parser = argparse.ArgumentParser(
description="Build improved repeat-sales price index"
)
parser.add_argument("--input", type=Path, required=True)
parser.add_argument("--output", type=Path, required=True)
args = parser.parse_args()
@ -474,8 +443,10 @@ def main():
# National
np_arrs = typed.select("year1", "year2", "log_ratio", "weight")
national_idx = solve_robust_index(
np_arrs["year1"].to_numpy(), np_arrs["year2"].to_numpy(),
np_arrs["log_ratio"].to_numpy(), np_arrs["weight"].to_numpy(),
np_arrs["year1"].to_numpy(),
np_arrs["year2"].to_numpy(),
np_arrs["log_ratio"].to_numpy(),
np_arrs["weight"].to_numpy(),
)
national_n = len(typed)
print(f" National: {len(national_idx)} years")
@ -485,14 +456,25 @@ def main():
area_idx, area_n = compute_indices_for_level(typed, "area")
district_idx, district_n = compute_indices_for_level(typed, "district")
sector_idx, sector_n = compute_indices_for_level(typed, "sector")
print(f" {len(area_idx)} areas, {len(district_idx)} districts, {len(sector_idx)} sectors")
print(
f" {len(area_idx)} areas, {len(district_idx)} districts, {len(sector_idx)} sectors"
)
# Shrinkage
print(" Applying shrinkage...")
sector_shrunk = apply_shrinkage(
sector_idx, sector_n, district_idx, district_n,
area_idx, area_n, national_idx, national_n,
hedonic_idx, all_sectors, sector_to_dist, dist_to_area,
sector_idx,
sector_n,
district_idx,
district_n,
area_idx,
area_n,
national_idx,
national_n,
hedonic_idx,
all_sectors,
sector_to_dist,
dist_to_area,
)
# Spatial smoothing
@ -519,15 +501,22 @@ def main():
result = pl.DataFrame(
rows,
schema={"sector": pl.String, "type_group": pl.String, "year": pl.Int32,
"log_index": pl.Float64, "n_pairs": pl.Int64},
schema={
"sector": pl.String,
"type_group": pl.String,
"year": pl.Int32,
"log_index": pl.Float64,
"n_pairs": pl.Int64,
},
orient="row",
).sort("type_group", "sector", "year")
result.write_parquet(args.output)
size_mb = args.output.stat().st_size / (1024 * 1024)
print(f"\nWrote {args.output} ({size_mb:.1f} MB)")
print(f" {result['sector'].n_unique():,} sectors × {len(all_type_groups)} types × {max_year - min_year + 1} years = {len(result):,} rows")
print(
f" {result['sector'].n_unique():,} sectors × {len(all_type_groups)} types × {max_year - min_year + 1} years = {len(result):,} rows"
)
if __name__ == "__main__":
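The pair-construction step in `extract_pairs` can be illustrated without Polars. The following is a minimal sketch, not the pipeline's code: `build_pairs` is a hypothetical helper, plain `(year, price)` tuples stand in for the `historical_prices` structs, and `OUTLIER_THRESHOLD = 3.0` is an assumption borrowed from the renovation-premium script, since this diff does not show the value used by the index script.

```python
import math

# Assumption: the real threshold is defined outside this diff; 3.0 matches
# the value used in the renovation-premium script.
OUTLIER_THRESHOLD = 3.0


def build_pairs(history):
    """Turn a chronological list of (year, price) sales into repeat-sale pairs.

    Hypothetical helper mirroring the filters in extract_pairs: positive
    prices, strictly increasing years, and |log_ratio| <= OUTLIER_THRESHOLD.
    """
    pairs = []
    # Consecutive transactions: (t0, t1), (t1, t2), ...
    for (year1, price1), (year2, price2) in zip(history[:-1], history[1:]):
        if price1 <= 0 or price2 <= 0 or year2 <= year1:
            continue
        log_ratio = math.log(price2 / price1)
        if abs(log_ratio) > OUTLIER_THRESHOLD:
            continue
        # Longer holding periods are noisier, so they get less weight.
        weight = 1.0 / math.sqrt(year2 - year1)
        pairs.append(
            {"year1": year1, "year2": year2, "log_ratio": log_ratio, "weight": weight}
        )
    return pairs


# Four sales; the two 2004 sales form a same-year pair, which is dropped.
example = build_pairs(
    [(2000, 100_000), (2004, 150_000), (2004, 155_000), (2013, 300_000)]
)
```

Here the 2000 to 2004 pair gets weight `1/sqrt(4)` and the 2004 to 2013 pair gets weight `1/sqrt(9)`, which is the same inverse-square-root weighting the Polars expression computes.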


@ -0,0 +1,572 @@
"""Estimate per-area renovation premiums from repeat-sale residuals.
For each repeat-sale pair, computes the residual after removing the price-index
predicted return. Pairs where renovation events occurred between sales should have
systematically higher residuals. A WLS regression estimates the log-premium per
event type, with hierarchical shrinkage and spatial smoothing.
Output: renovation_premium.parquet (sector × type_group × event_type → log_premium)
"""
import argparse
import math
from pathlib import Path
import numpy as np
import polars as pl
from scipy.spatial import KDTree
from pipeline.transform._price_utils import (
SHRINKAGE_K,
TYPE_GROUPS,
extract_centroids,
hierarchy_keys,
sector_expr,
type_group_expr,
)
HALF_LIFE = 10.0
DECAY_RATE = math.log(2) / HALF_LIFE
OUTLIER_THRESHOLD = 3.0
MIN_PAIRS = 10
SPATIAL_NEIGHBORS = 5
SPATIAL_BLEND_K = 30
EVENT_TYPES = ["Extension", "Renovation", "Remodeling"]
def extract_pairs_with_events(input_path: Path, index_path: Path) -> pl.DataFrame:
"""Extract repeat-sale pairs with renovation events and index residuals."""
print("Extracting repeat-sale pairs with renovation events...")
df = (
pl.scan_parquet(input_path)
.select("Postcode", "historical_prices", "Property type", "renovation_history")
.filter(
pl.col("Postcode").is_not_null(),
pl.col("historical_prices").list.len() >= 2,
)
.with_columns(sector_expr(), type_group_expr())
.collect()
)
print(f" {len(df):,} properties with 2+ transactions")
# Build consecutive pairs
pairs = (
df.lazy()
.with_columns(
pl.col("historical_prices")
.list.slice(0, pl.col("historical_prices").list.len() - 1)
.alias("from_txn"),
pl.col("historical_prices").list.slice(1).alias("to_txn"),
)
.explode("from_txn", "to_txn")
.with_columns(
pl.col("from_txn").struct.field("year").alias("year1"),
pl.col("from_txn").struct.field("price").alias("price1"),
pl.col("to_txn").struct.field("year").alias("year2"),
pl.col("to_txn").struct.field("price").alias("price2"),
)
.select(
"sector",
"type_group",
"year1",
"price1",
"year2",
"price2",
"renovation_history",
)
.filter(
pl.col("price1") > 0,
pl.col("price2") > 0,
pl.col("year2") > pl.col("year1"),
)
.with_columns(
(pl.col("price2").cast(pl.Float64) / pl.col("price1").cast(pl.Float64))
.log()
.alias("log_ratio"),
)
.filter(pl.col("log_ratio").abs() <= OUTLIER_THRESHOLD)
.collect()
)
print(f" {len(pairs):,} repeat-sale pairs")
# Join price index to compute residuals
index = pl.read_parquet(index_path)
has_type_group = "type_group" in index.columns
if has_type_group:
idx_typed = index.filter(pl.col("type_group") != "All")
idx_all = index.filter(pl.col("type_group") == "All")
# Join at year1
pairs = pairs.join(
idx_typed.select(
"sector", "type_group", "year", pl.col("log_index").alias("li1_typed")
),
left_on=["sector", "type_group", "year1"],
right_on=["sector", "type_group", "year"],
how="left",
).join(
idx_all.select("sector", "year", pl.col("log_index").alias("li1_all")),
left_on=["sector", "year1"],
right_on=["sector", "year"],
how="left",
)
# Join at year2
pairs = pairs.join(
idx_typed.select(
"sector", "type_group", "year", pl.col("log_index").alias("li2_typed")
),
left_on=["sector", "type_group", "year2"],
right_on=["sector", "type_group", "year"],
how="left",
).join(
idx_all.select("sector", "year", pl.col("log_index").alias("li2_all")),
left_on=["sector", "year2"],
right_on=["sector", "year"],
how="left",
)
pairs = pairs.with_columns(
(pl.col("li1_typed").fill_null(pl.col("li1_all"))).alias("_li1"),
(pl.col("li2_typed").fill_null(pl.col("li2_all"))).alias("_li2"),
)
else:
pairs = pairs.join(
index.select("sector", "year", pl.col("log_index").alias("_li1")),
left_on=["sector", "year1"],
right_on=["sector", "year"],
how="left",
).join(
index.select("sector", "year", pl.col("log_index").alias("_li2")),
left_on=["sector", "year2"],
right_on=["sector", "year"],
how="left",
)
# Compute residual = log_ratio - (index2 - index1)
pairs = pairs.with_columns(
(
pl.col("log_ratio")
- (pl.col("_li2").fill_null(0.0) - pl.col("_li1").fill_null(0.0))
).alias("residual"),
(1.0 / (pl.col("year2") - pl.col("year1")).cast(pl.Float64).sqrt()).alias(
"weight"
),
)
# For each pair, compute time-decayed renovation indicators
# Use row index for unique identification (composite keys aren't unique per pair)
pairs = pairs.with_row_index("_pair_idx")
for et in EVENT_TYPES:
col_name = f"has_{et.lower()}"
pairs = pairs.with_columns(pl.lit(0.0).alias(col_name))
# Process properties that have renovation history
has_reno = pairs.filter(
pl.col("renovation_history").is_not_null()
& (pl.col("renovation_history").list.len() > 0)
)
if len(has_reno) > 0:
reno_exploded = (
has_reno.select("_pair_idx", "year1", "year2", "renovation_history")
.explode("renovation_history")
.with_columns(
pl.col("renovation_history").struct.field("year").alias("event_year"),
pl.col("renovation_history").struct.field("event").alias("event_type"),
)
# Only events between the two sales
.filter(
(pl.col("event_year") > pl.col("year1"))
& (pl.col("event_year") <= pl.col("year2"))
)
)
if len(reno_exploded) > 0:
# For each pair + event type, take the most recent event
latest_events = reno_exploded.group_by(
"_pair_idx", "event_type", "year2"
).agg(pl.col("event_year").max().alias("latest_event_year"))
# Compute time-decayed indicator: exp(-decay_rate * (year2 - event_year))
latest_events = latest_events.with_columns(
(
-DECAY_RATE
* (pl.col("year2") - pl.col("latest_event_year")).cast(pl.Float64)
)
.exp()
.alias("decayed_indicator"),
)
# Pivot to wide format using _pair_idx for unique join
for et in EVENT_TYPES:
et_data = latest_events.filter(pl.col("event_type") == et)
if len(et_data) > 0:
col_name = f"has_{et.lower()}"
pairs = (
pairs.join(
et_data.select(
"_pair_idx",
pl.col("decayed_indicator").alias(f"_{col_name}"),
),
on="_pair_idx",
how="left",
)
.with_columns(
pl.col(f"_{col_name}").fill_null(0.0).alias(col_name),
)
.drop(f"_{col_name}")
)
pairs = pairs.drop("_pair_idx")
# Add hierarchy columns
pairs = pairs.with_columns(
pl.col("sector").str.replace(r"\s+\d+$", "").alias("district"),
).with_columns(
pl.col("district").str.replace(r"\d.*$", "").alias("area"),
)
# Count reno pairs
reno_mask = (
(pl.col("has_extension") > 0)
| (pl.col("has_renovation") > 0)
| (pl.col("has_remodeling") > 0)
)
n_reno = pairs.filter(reno_mask).height
print(
f" {n_reno:,} pairs with renovation events ({n_reno / len(pairs) * 100:.1f}%)"
)
# Drop temporary columns from index join + renovation_history (no longer needed)
temp_cols = [
c
for c in pairs.columns
if c.startswith("_li") or c.startswith("li1_") or c.startswith("li2_")
]
pairs = pairs.drop(temp_cols + ["renovation_history"])
return pairs
def wls_regression(
residuals: np.ndarray,
weights: np.ndarray,
X: np.ndarray,
) -> np.ndarray:
"""Weighted least squares: residual ~ X (with intercept column in X).
Uses sqrt(weights) scaling to avoid building a full N×N diagonal matrix.
"""
sqrt_w = np.sqrt(weights)[:, np.newaxis]
Xw = X * sqrt_w
yw = residuals * sqrt_w.ravel()
try:
betas = np.linalg.lstsq(Xw, yw, rcond=None)[0]
except np.linalg.LinAlgError:
betas = np.zeros(X.shape[1])
return betas
def compute_premiums_for_group(df: pl.DataFrame) -> dict[str, float]:
"""Run WLS regression for a group, return {event_type: log_premium}."""
n = len(df)
if n < MIN_PAIRS:
return {}
residuals = df["residual"].to_numpy().astype(np.float64)
weights = df["weight"].to_numpy().astype(np.float64)
# Build design matrix: intercept + 3 event indicators
X = np.column_stack(
[
np.ones(n),
df["has_extension"].to_numpy().astype(np.float64),
df["has_renovation"].to_numpy().astype(np.float64),
df["has_remodeling"].to_numpy().astype(np.float64),
]
)
# Check if we have any renovation pairs in this group
reno_sum = X[:, 1:].sum()
if reno_sum < 1.0:
return {}
betas = wls_regression(residuals, weights, X)
# betas[0] is intercept, betas[1:4] are the premiums
return {
"Extension": float(betas[1]),
"Renovation": float(betas[2]),
"Remodeling": float(betas[3]),
}
def compute_premiums_for_level(
pairs: pl.DataFrame, group_col: str
) -> tuple[dict, dict]:
"""Compute premiums per group at a given hierarchy level.
Returns (premiums, n_reno_pairs) dicts keyed by group value.
premiums[key] = {event_type: log_premium}
"""
groups = pairs.group_by(group_col)
premiums = {}
n_reno_pairs = {}
for key, group_df in groups:
key_val = key[0]
result = compute_premiums_for_group(group_df)
if result:
premiums[key_val] = result
# Count pairs with any reno indicator
reno_mask = (
(group_df["has_extension"].to_numpy() > 0)
| (group_df["has_renovation"].to_numpy() > 0)
| (group_df["has_remodeling"].to_numpy() > 0)
)
n_reno_pairs[key_val] = int(reno_mask.sum())
return premiums, n_reno_pairs
def shrink_premium(
raw: dict[str, float], parent: dict[str, float], n: int
) -> dict[str, float]:
"""Shrink raw premiums toward parent level."""
w = n / (n + SHRINKAGE_K)
result = {}
for et in EVENT_TYPES:
r = raw.get(et, parent.get(et, 0.0))
p = parent.get(et, raw.get(et, 0.0))
result[et] = w * r + (1 - w) * p
return result
def apply_shrinkage(
sector_prem,
sector_n,
district_prem,
district_n,
area_prem,
area_n,
national_prem,
national_n,
all_sectors,
sector_to_dist,
dist_to_area,
):
"""Top-down hierarchical shrinkage for premiums."""
# Area -> national
area_shrunk = {}
for area, prem in area_prem.items():
area_shrunk[area] = shrink_premium(prem, national_prem, area_n.get(area, 0))
# District -> area
district_shrunk = {}
for dist, prem in district_prem.items():
a = dist_to_area.get(dist, "")
parent = area_shrunk.get(a, national_prem)
district_shrunk[dist] = shrink_premium(prem, parent, district_n.get(dist, 0))
# Sector -> district
sector_shrunk = {}
for sec, prem in sector_prem.items():
d = sector_to_dist.get(sec, "")
parent = district_shrunk.get(d, national_prem)
sector_shrunk[sec] = shrink_premium(prem, parent, sector_n.get(sec, 0))
# Fill missing sectors
for sec in all_sectors:
if sec not in sector_shrunk:
d = sector_to_dist.get(sec, "")
a = dist_to_area.get(d, "")
sector_shrunk[sec] = district_shrunk.get(
d, area_shrunk.get(a, national_prem)
)
return sector_shrunk
def spatial_smooth(
sector_premiums: dict[str, dict[str, float]],
centroids: dict[str, tuple[float, float]],
n_reno_map: dict[str, int],
) -> dict[str, dict[str, float]]:
"""Blend sparse sector premiums with K nearest neighbors."""
sectors_with_coords = [s for s in sector_premiums if s in centroids]
if len(sectors_with_coords) < SPATIAL_NEIGHBORS + 1:
return sector_premiums
coords = np.array([centroids[s] for s in sectors_with_coords])
mean_lat = np.mean(coords[:, 0])
scale = np.cos(np.radians(mean_lat))
scaled_coords = np.column_stack([coords[:, 0], coords[:, 1] * scale])
tree = KDTree(scaled_coords)
result = dict(sector_premiums)
for i, sec in enumerate(sectors_with_coords):
n = n_reno_map.get(sec, 0)
self_w = n / (n + SPATIAL_BLEND_K)
if self_w > 0.95:
continue
dists, idxs = tree.query(scaled_coords[i], k=SPATIAL_NEIGHBORS + 1)
neighbor_dists = dists[1:]
neighbor_idxs = idxs[1:]
inv_dists = []
neighbor_prems = []
for d, j in zip(neighbor_dists, neighbor_idxs):
ns = sectors_with_coords[j]
if d > 0 and ns in sector_premiums:
inv_dists.append(1.0 / d)
neighbor_prems.append(sector_premiums[ns])
if not neighbor_prems:
continue
total_inv = sum(inv_dists)
nbr_w = 1.0 - self_w
ws = [iw / total_inv * nbr_w for iw in inv_dists]
blended = {}
for et in EVENT_TYPES:
val = self_w * sector_premiums[sec].get(et, 0.0)
for np_dict, w in zip(neighbor_prems, ws):
val += w * np_dict.get(et, 0.0)
blended[et] = val
result[sec] = blended
return result
def main():
parser = argparse.ArgumentParser(
description="Estimate renovation premiums from repeat-sale residuals"
)
parser.add_argument(
"--input", type=Path, required=True, help="Path to wide.parquet"
)
parser.add_argument(
"--index", type=Path, required=True, help="Path to price_index.parquet"
)
parser.add_argument(
"--output", type=Path, required=True, help="Output renovation_premium.parquet"
)
args = parser.parse_args()
pairs = extract_pairs_with_events(args.input, args.index)
centroids = extract_centroids(args.input)
# Precompute hierarchy
all_sectors = pairs["sector"].unique().to_list()
sector_to_dist = {}
dist_to_area = {}
for s in all_sectors:
d, a = hierarchy_keys(s)
sector_to_dist[s] = d
dist_to_area[d] = a
all_type_groups = ["All"] + TYPE_GROUPS
rows = []
for tg in all_type_groups:
print(f"\n--- {tg} ---")
typed = pairs if tg == "All" else pairs.filter(pl.col("type_group") == tg)
if len(typed) < MIN_PAIRS:
print(f" Skipping (only {len(typed)} pairs)")
continue
print(f" {len(typed):,} pairs")
# National
national_prem = compute_premiums_for_group(typed)
national_reno = typed.filter(
(pl.col("has_extension") > 0)
| (pl.col("has_renovation") > 0)
| (pl.col("has_remodeling") > 0)
).height
if not national_prem:
print(" No renovation pairs at national level, skipping")
continue
print(
" National premiums: "
+ ", ".join(
f"{et}: {v:.4f} ({math.exp(v) - 1:.1%})"
for et, v in national_prem.items()
)
)
# Per-level
print(" Computing per-level premiums:")
area_prem, area_n = compute_premiums_for_level(typed, "area")
district_prem, district_n = compute_premiums_for_level(typed, "district")
sector_prem, sector_n = compute_premiums_for_level(typed, "sector")
print(
f" {len(area_prem)} areas, {len(district_prem)} districts, {len(sector_prem)} sectors with data"
)
# Shrinkage
print(" Applying shrinkage...")
sector_shrunk = apply_shrinkage(
sector_prem,
sector_n,
district_prem,
district_n,
area_prem,
area_n,
national_prem,
national_reno,
all_sectors,
sector_to_dist,
dist_to_area,
)
# Spatial smoothing
print(" Spatial smoothing...")
sector_smoothed = spatial_smooth(sector_shrunk, centroids, sector_n)
# Collect rows
for sec in all_sectors:
prem = sector_smoothed.get(sec, national_prem)
n = sector_n.get(sec, 0)
for et in EVENT_TYPES:
rows.append((sec, tg, et, prem.get(et, 0.0), n))
result = pl.DataFrame(
rows,
schema={
"sector": pl.String,
"type_group": pl.String,
"event_type": pl.String,
"log_premium": pl.Float64,
"n_reno_pairs": pl.Int64,
},
orient="row",
).sort("type_group", "sector", "event_type")
result.write_parquet(args.output)
size_mb = args.output.stat().st_size / (1024 * 1024)
print(f"\nWrote {args.output} ({size_mb:.1f} MB)")
print(
f" {result['sector'].n_unique():,} sectors x {len(all_type_groups)} types x {len(EVENT_TYPES)} events = {len(result):,} rows"
)
# Print summary statistics
print("\nNational premium summary:")
national = (
result.filter(pl.col("type_group") == "All")
.group_by("event_type")
.agg(
pl.col("log_premium").mean().alias("mean_log_premium"),
)
)
for row in national.iter_rows(named=True):
et = row["event_type"]
lp = row["mean_log_premium"]
print(f" {et}: log_premium={lp:.4f} ({math.exp(lp) - 1:.1%} price uplift)")
if __name__ == "__main__":
main()
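Two small formulas carry most of the script's behaviour: the time-decayed event indicator and the shrinkage weight. The sketch below restates them in plain Python under stated assumptions: `decayed_indicator`, `shrink`, and `uplift` are hypothetical names, and `SHRINKAGE_K = 50` is a placeholder, because the real constant is imported from `_price_utils` and its value is not visible in this diff.

```python
import math

HALF_LIFE = 10.0  # years, as in the script
DECAY_RATE = math.log(2) / HALF_LIFE
SHRINKAGE_K = 50  # assumption: placeholder for the constant in _price_utils


def decayed_indicator(event_year, sale_year):
    """exp(-decay_rate * years since the event), evaluated at the second sale."""
    return math.exp(-DECAY_RATE * (sale_year - event_year))


def shrink(raw, parent, n, k=SHRINKAGE_K):
    """Blend a group's raw estimate with its parent; w grows with sample size n."""
    w = n / (n + k)
    return w * raw + (1 - w) * parent


def uplift(log_premium):
    """Convert a log-premium into a fractional price uplift."""
    return math.exp(log_premium) - 1
```

With a 10-year half-life, an extension completed 10 years before the second sale contributes an indicator of 0.5 rather than 1. A group with no renovation pairs falls back entirely to its parent's premium, and a group with `n == k` pairs sits halfway between its own estimate and the parent's.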


@ -163,7 +163,7 @@ public class App {
case "bicycle":
task.fromTime = 8 * 3600;
task.toTime = 8 * 3600 + 60;
task.maxTripDurationMinutes = 90;
task.maxTripDurationMinutes = 120;
task.accessModes = EnumSet.of(LegMode.BICYCLE);
task.egressModes = EnumSet.of(LegMode.BICYCLE);
task.directModes = EnumSet.of(LegMode.BICYCLE);
@ -172,7 +172,7 @@ public class App {
case "walking":
task.fromTime = 8 * 3600;
task.toTime = 8 * 3600 + 60;
task.maxTripDurationMinutes = 60;
task.maxTripDurationMinutes = 120;
task.accessModes = EnumSet.of(LegMode.WALK);
task.egressModes = EnumSet.of(LegMode.WALK);
task.directModes = EnumSet.of(LegMode.WALK);
@ -181,7 +181,7 @@ public class App {
default: // transit
task.fromTime = 8 * 3600;
task.toTime = 8 * 3600 + 60; // single RAPTOR sweep
task.maxTripDurationMinutes = 90;
task.maxTripDurationMinutes = 120;
task.maxRides = 4;
task.accessModes = EnumSet.of(LegMode.WALK);
task.egressModes = EnumSet.of(LegMode.WALK);


@ -79,13 +79,18 @@ async fn validate_token(
.header("Authorization", format!("Bearer {token}"))
.send()
.await
.map_err(|err| warn!("Token validation request failed: {err}"))
.ok()?;
if !res.status().is_success() {
return None;
}
let body: AuthRefreshResponse = res.json().await.ok()?;
let body: AuthRefreshResponse = res
.json()
.await
.map_err(|err| warn!("Failed to parse auth refresh response: {err}"))
.ok()?;
Some(body.record)
}


@ -18,6 +18,5 @@ pub const AREA_SUMMARY_SYSTEM_PROMPT: &str = "You are an experienced estate agen
pub const AREA_SUMMARY_MAX_TOKENS: usize = 300;
pub const AREA_SUMMARY_TEMPERATURE: f32 = 0.3;
pub const AI_FILTERS_SYSTEM_PROMPT: &str = "You are a property search assistant. The user will describe their ideal property or area in natural language. Your job is to translate their description into filter settings. ONLY set filters the user explicitly mentioned or clearly implied. Leave everything else out. Do not guess or add extra filters. If a request is ambiguous, prefer a wider range. Output valid JSON matching the provided schema.";
pub const AI_FILTERS_MAX_TOKENS: usize = 2000;
pub const AI_FILTERS_TEMPERATURE: f32 = 0.0;


@ -26,6 +26,8 @@ pub struct FeatureConfig {
pub suffix: &'static str,
/// If true, show full integer (no k/M abbreviation)
pub raw: bool,
/// If true, the slider uses absolute min/max/step instead of percentile scaling
pub absolute: bool,
}
/// Features whose histogram bins should be exactly 1 unit wide (one per integer).
@ -85,6 +87,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "£",
suffix: "",
raw: false,
absolute: true,
},
FeatureConfig {
name: "Estimated current price",
@ -94,11 +97,12 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
},
step: 10000.0,
description: "Inflation-adjusted estimate of the current property value",
detail: "Estimated by applying a repeat-sales price index to the last known sale price. The index tracks price changes within each postcode sector and property type. Properties sold recently will have estimates close to their sale price; older sales are adjusted more. Coverage depends on having enough repeat sales in the local area to build the index.",
detail: "Estimated by applying a repeat-sales price index to the last known sale price, plus a renovation premium for properties with post-sale improvements detected from EPC records (extensions, renovations, remodeling). The index tracks price changes within each postcode sector and property type. Renovation premiums are estimated per area from observed repeat-sale pairs and decay over time. Properties sold recently will have estimates close to their sale price; older sales are adjusted more.",
source: "price-paid",
prefix: "£",
suffix: "",
raw: false,
absolute: true,
},
FeatureConfig {
name: "Price per sqm",
@ -113,6 +117,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "£",
suffix: "",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Est. price per sqm",
@ -122,11 +127,12 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
},
step: 100.0,
description: "Estimated current price divided by total floor area",
detail: "Calculated by dividing the inflation-adjusted estimated current price by the total floor area from the EPC certificate. Provides a more up-to-date price-per-area comparison than the historical sale price per sqm.",
detail: "Calculated by dividing the inflation-adjusted estimated current price (including any renovation premium) by the total floor area from the EPC certificate. Provides a more up-to-date price-per-area comparison than the historical sale price per sqm.",
source: "price-paid",
prefix: "£",
suffix: "",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Total floor area (sqm)",
@ -141,12 +147,28 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: " sqm",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Interior height (m)",
bounds: Bounds::Percentile {
low: 2.0,
high: 98.0,
},
step: 0.1,
description: "Average storey height from the EPC survey",
detail: "Average internal floor-to-ceiling height in metres as recorded during the Energy Performance Certificate assessment. Calculated by dividing the total internal volume by the total floor area.",
source: "epc",
prefix: "",
suffix: " m",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Number of bedrooms & living rooms",
bounds: Bounds::Fixed {
min: 1.0,
max: 10.0,
max: 12.0,
},
step: 1.0,
description: "Count of habitable rooms from the EPC survey",
@ -155,6 +177,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: " rooms",
raw: false,
absolute: true,
},
FeatureConfig {
name: "Estimated monthly rent",
@ -166,6 +189,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "£",
suffix: "/mo",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Date of last transaction",
@ -180,6 +204,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "",
raw: true,
absolute: false,
},
FeatureConfig {
name: "Construction age",
@ -194,6 +219,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "",
raw: true,
absolute: false,
},
],
},
@ -213,6 +239,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: " mins",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Public transport to Fitzrovia (mins)",
@ -227,6 +254,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: " mins",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Cycling to Bank (mins)",
@ -241,6 +269,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: " mins",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Cycling to Fitzrovia (mins)",
@ -255,6 +284,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: " mins",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Number of public transport stations within 2km",
@ -269,6 +299,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "",
raw: false,
absolute: false,
},
],
},
@ -288,6 +319,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Good+ primary schools within 5km",
@ -302,6 +334,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Good+ secondary schools within 5km",
@ -316,6 +349,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "",
raw: false,
absolute: false,
},
],
},
@ -332,6 +366,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Employment Score (rate)",
@ -343,6 +378,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Health Deprivation and Disability Score",
@ -357,6 +393,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Living Environment Score",
@ -371,6 +408,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Indoors Sub-domain Score",
@ -385,6 +423,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Outdoors Sub-domain Score",
@ -399,6 +438,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "",
raw: false,
absolute: false,
},
],
},
@ -418,6 +458,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "/yr",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Violence and sexual offences (avg/yr)",
@ -432,6 +473,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "/yr",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Criminal damage and arson (avg/yr)",
@ -446,6 +488,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "/yr",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Burglary (avg/yr)",
@ -460,6 +503,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "/yr",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Vehicle crime (avg/yr)",
@ -474,6 +518,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "/yr",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Robbery (avg/yr)",
@ -488,6 +533,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "/yr",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Other theft (avg/yr)",
@ -502,6 +548,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "/yr",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Shoplifting (avg/yr)",
@ -516,6 +563,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "/yr",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Drugs (avg/yr)",
@ -530,6 +578,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "/yr",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Possession of weapons (avg/yr)",
@ -544,6 +593,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "/yr",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Public order (avg/yr)",
@ -558,6 +608,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "/yr",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Bicycle theft (avg/yr)",
@ -572,6 +623,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "/yr",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Theft from the person (avg/yr)",
@ -586,6 +638,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "/yr",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Other crime (avg/yr)",
@ -600,6 +653,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "/yr",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Serious crime (avg/yr)",
@ -614,6 +668,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "/yr",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Minor crime (avg/yr)",
@ -628,6 +683,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "/yr",
raw: false,
absolute: false,
},
],
},
@ -647,6 +703,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "%",
raw: false,
absolute: false,
},
FeatureConfig {
name: "% Asian",
@ -661,6 +718,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "%",
raw: false,
absolute: false,
},
FeatureConfig {
name: "% Black",
@ -675,6 +733,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "%",
raw: false,
absolute: false,
},
FeatureConfig {
name: "% Mixed",
@ -689,6 +748,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "%",
raw: false,
absolute: false,
},
FeatureConfig {
name: "% Other",
@ -703,6 +763,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "%",
raw: false,
absolute: false,
},
],
},
@ -722,6 +783,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Number of grocery shops and supermarkets within 2km",
@ -736,6 +798,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Number of parks within 2km",
@ -750,6 +813,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: "",
raw: false,
absolute: false,
},
],
},
@ -769,6 +833,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: " dB",
raw: false,
absolute: false,
},
FeatureConfig {
name: "Max available download speed (Mbps)",
@ -783,6 +848,7 @@ pub static FEATURE_GROUPS: &[FeatureGroup] = &[
prefix: "",
suffix: " Mbps",
raw: true,
absolute: false,
},
],
},


@ -6,6 +6,7 @@ mod features;
mod metrics;
mod og_middleware;
pub mod parsing;
mod pocketbase;
mod routes;
mod state;
pub mod utils;
@ -23,7 +24,7 @@ use tower_http::compression::CompressionLayer;
use tower_http::cors::{Any, CorsLayer};
use tower_http::services::{ServeDir, ServeFile};
use tower_http::trace::TraceLayer;
use tracing::{info, warn};
use tracing::info;
use tracing_subscriber::EnvFilter;
use state::AppState;
@ -39,6 +40,10 @@ struct Cli {
#[arg(long)]
pois: PathBuf,
/// Path to the places parquet file
#[arg(long)]
places: PathBuf,
/// Path to the postcode boundaries directory
#[arg(long)]
postcodes: PathBuf,
@ -56,28 +61,36 @@ struct Cli {
screenshot_url: String,
/// Public-facing URL for absolute og:image URLs
#[arg(
long,
env = "PUBLIC_URL",
default_value = "https://perfectpostcodes.schmelczer.dev"
)]
#[arg(long, env = "PUBLIC_URL")]
public_url: String,
/// PocketBase server URL for authentication (e.g. http://localhost:8090)
#[arg(long, env = "POCKETBASE_URL")]
pocketbase_url: String,
/// PocketBase superuser email (for auto-creating collections at startup)
#[arg(long, env = "POCKETBASE_ADMIN_EMAIL")]
pocketbase_admin_email: Option<String>,
/// PocketBase superuser password (for auto-creating collections at startup)
#[arg(long, env = "POCKETBASE_ADMIN_PASSWORD")]
pocketbase_admin_password: Option<String>,
/// Ollama server URL for AI area summaries (e.g. http://ollama:11434)
#[arg(long, env = "OLLAMA_URL")]
ollama_url: String,
/// Ollama model name for area summaries
#[arg(long, env = "OLLAMA_MODEL", default_value = "gemma3:12b")]
#[arg(long, env = "OLLAMA_MODEL")]
ollama_model: String,
/// R5 routing service URL for real-time travel times (e.g. http://r5:8003)
#[arg(long, env = "R5_URL", default_value = "")]
r5_url: String,
/// R5 routing service URL for all travel times (e.g. http://r5:8003)
#[arg(long, env = "R5_URL")]
r5_url: Option<String>,
/// Google Maps API key for Street View metadata lookups
#[arg(long, env = "GOOGLE_MAPS_API_KEY")]
google_maps_api_key: String,
}
#[tokio::main]
@ -138,6 +151,15 @@ async fn main() -> anyhow::Result<()> {
info!("Building POI spatial grid index");
let poi_grid = utils::GridIndex::build(&poi_data.lat, &poi_data.lng, consts::GRID_CELL_SIZE);
// Load place data
let places_path = &cli.places;
if !places_path.exists() {
bail!("Places parquet file not found: {}", places_path.display());
}
info!("Loading place data from {}", places_path.display());
let place_data = data::PlaceData::load(places_path)?;
info!(places = place_data.name.len(), "Place data loaded");
// Load postcode boundaries
let postcodes_path = &cli.postcodes;
if !postcodes_path.exists() {
@ -191,26 +213,15 @@ async fn main() -> anyhow::Result<()> {
let poi_category_groups = poi_data.category_groups()?;
// Read index.html at startup for crawler OG injection
let frontend_dist = cli
.dist
.unwrap_or_else(|| PathBuf::from("frontend/dist"));
let index_html = {
let index_path = frontend_dist.join("index.html");
match std::fs::read_to_string(&index_path) {
Ok(html) => {
info!("Loaded index.html for OG injection");
Some(html)
}
Err(err) => {
warn!(
"Could not read {}: {} (OG injection disabled)",
index_path.display(),
err
);
None
}
}
let (frontend_dist, index_html) = if let Some(dist) = cli.dist {
let index_path = dist.join("index.html");
let html = std::fs::read_to_string(&index_path)
.with_context(|| format!("Failed to read {}", index_path.display()))?;
info!("Loaded index.html for OG injection");
(Some(dist), Some(html))
} else {
info!("No --dist provided, static serving and OG injection disabled");
(None, None)
};
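The `--dist` handling in this hunk treats "flag omitted" and "flag given but unreadable" as different cases. A minimal stdlib-only sketch of that contract follows. The helper name `load_frontend` is hypothetical, and it returns `io::Result` where the real code uses `anyhow` with `with_context`.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical helper mirroring the `--dist` handling (not the code in main.rs).
/// `None` disables static serving and OG injection; `Some(dir)` requires a
/// readable `index.html` and propagates the I/O error otherwise.
fn load_frontend(dist: Option<PathBuf>) -> io::Result<(Option<PathBuf>, Option<String>)> {
    match dist {
        Some(dir) => {
            // Fail fast: an explicitly configured directory must contain index.html.
            let html = fs::read_to_string(dir.join("index.html"))?;
            Ok((Some(dir), Some(html)))
        }
        // Dev mode: nothing to serve, nothing to inject.
        None => Ok((None, None)),
    }
}
```

Returning both values from one `match` keeps them in step: either both are `Some` or both are `None`, which is what the later `if let Some(ref dist) = frontend_dist` branch relies on.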
let http_client = reqwest::Client::new();
@ -223,6 +234,10 @@ async fn main() -> anyhow::Result<()> {
"Precomputed features response"
);
let ai_filters_schema = routes::build_ollama_schema(&features_response);
let ai_filters_system_prompt = routes::build_system_prompt(&features_response);
info!("Precomputed AI filters schema and system prompt");
// Record data loading metrics
metrics::record_data_stats(
property_data.lat.len(),
@ -231,12 +246,21 @@ async fn main() -> anyhow::Result<()> {
);
info!("PocketBase configured: {}", cli.pocketbase_url);
if let (Some(ref email), Some(ref password)) =
(&cli.pocketbase_admin_email, &cli.pocketbase_admin_password)
{
pocketbase::ensure_collections(&http_client, &cli.pocketbase_url, email, password).await?;
} else {
info!("PocketBase admin credentials not set — skipping collection auto-creation");
}
info!(
"Ollama configured: {} (model: {})",
cli.ollama_url, cli.ollama_model
);
if !cli.r5_url.is_empty() {
info!("R5 routing service configured: {}", cli.r5_url);
if let Some(ref url) = cli.r5_url {
info!("R5 routing service configured: {}", url);
} else {
info!("R5 routing service not configured (travel time queries disabled)");
}
@ -249,6 +273,7 @@ async fn main() -> anyhow::Result<()> {
h3_cells,
poi_data,
poi_grid,
place_data,
postcode_data,
feature_name_to_index,
min_keys,
@ -265,6 +290,9 @@ async fn main() -> anyhow::Result<()> {
ollama_model: cli.ollama_model,
r5_url: cli.r5_url,
token_cache,
ai_filters_schema,
ai_filters_system_prompt,
google_maps_api_key: cli.google_maps_api_key,
});
let cors = CorsLayer::new()
@ -286,8 +314,11 @@ async fn main() -> anyhow::Result<()> {
let state_pb = state.clone();
let state_postcode_stats = state.clone();
let state_area_summary = state.clone();
let state_places = state.clone();
let state_shorten = state.clone();
let state_short_url = state.clone();
let state_ai_filters = state.clone();
let state_streetview = state.clone();
let api = Router::new()
.route(
@ -314,6 +345,10 @@ async fn main() -> anyhow::Result<()> {
"/api/poi-categories",
get(move || routes::get_poi_categories(state_poi_categories.clone())),
)
.route(
"/api/places",
get(move |query| routes::get_places(state_places.clone(), query)),
)
.route(
"/api/hexagon-properties",
get(move |query| {
@ -345,6 +380,14 @@ async fn main() -> anyhow::Result<()> {
"/api/shorten",
post(move |body| routes::post_shorten(state_shorten.clone(), body)),
)
.route(
"/api/ai-filters",
post(move |body| routes::post_ai_filters(state_ai_filters.clone(), body)),
)
.route(
"/api/streetview",
get(move |query| routes::get_streetview(state_streetview.clone(), query)),
)
.route(
"/s/{code}",
get(move |path| routes::get_short_url(state_short_url.clone(), path)),
@ -364,6 +407,7 @@ async fn main() -> anyhow::Result<()> {
routes::get_style(axum::extract::State(reader_style.clone()), headers, query)
}),
)
.route("/health", get(|| async { "ok" }))
.route(
"/metrics",
get(move || metrics::metrics_handler(metrics_handle.clone())),
@ -373,10 +417,9 @@ async fn main() -> anyhow::Result<()> {
any(move |req| routes::proxy_to_pocketbase(state_pb.clone(), req)),
);
let app = if frontend_dist.exists() {
let app = if let Some(ref dist) = frontend_dist {
api.fallback_service(
ServeDir::new(&frontend_dist)
.not_found_service(ServeFile::new(frontend_dist.join("index.html"))),
ServeDir::new(dist).not_found_service(ServeFile::new(dist.join("index.html"))),
)
} else {
api


@ -38,11 +38,8 @@ pub fn cell_for_row(
if !need_parent || max_cell == 0 {
return max_cell;
}
h3o::CellIndex::try_from(max_cell)
.ok()
.and_then(|ci| ci.parent(h3_res))
.map(u64::from)
.unwrap_or(0)
let cell = h3o::CellIndex::try_from(max_cell).expect("precomputed H3 cell must be valid");
u64::from(cell.parent(h3_res).expect("parent resolution must be valid for precomputed cell"))
}
/// Whether the given resolution requires computing a parent from precomputed cells.

server-rs/src/pocketbase.rs Normal file

@ -0,0 +1,235 @@
use reqwest::Client;
use serde::{Deserialize, Serialize};
use tracing::info;
#[derive(Deserialize)]
struct AuthResponse {
token: String,
}
#[derive(Deserialize)]
struct CollectionList {
items: Vec<CollectionItem>,
}
#[derive(Deserialize)]
struct CollectionItem {
name: String,
}
#[derive(Serialize)]
struct CreateCollection {
name: String,
r#type: String,
fields: Vec<Field>,
}
#[derive(Serialize)]
#[serde(rename_all = "camelCase")]
struct Field {
name: String,
r#type: String,
#[serde(skip_serializing_if = "Option::is_none")]
required: Option<bool>,
#[serde(skip_serializing_if = "Option::is_none")]
max_select: Option<u32>,
#[serde(skip_serializing_if = "Option::is_none")]
collection_id: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
max_size: Option<u64>,
#[serde(skip_serializing_if = "Option::is_none")]
mime_types: Option<Vec<String>>,
}
impl Field {
fn text(name: &str, required: bool) -> Self {
Self {
name: name.to_string(),
r#type: "text".to_string(),
required: Some(required),
max_select: None,
collection_id: None,
max_size: None,
mime_types: None,
}
}
fn file(name: &str, mime_types: Vec<&str>) -> Self {
Self {
name: name.to_string(),
r#type: "file".to_string(),
required: Some(false),
max_select: Some(1),
collection_id: None,
max_size: Some(10 * 1024 * 1024), // 10 MB
mime_types: Some(mime_types.into_iter().map(String::from).collect()),
}
}
fn relation(name: &str, collection_id: &str) -> Self {
Self {
name: name.to_string(),
r#type: "relation".to_string(),
required: Some(true),
max_select: Some(1),
collection_id: Some(collection_id.to_string()),
max_size: None,
mime_types: None,
}
}
}
async fn auth_superuser(
client: &Client,
base_url: &str,
email: &str,
password: &str,
) -> anyhow::Result<String> {
let url = format!("{base_url}/api/collections/_superusers/auth-with-password");
let resp = client
.post(&url)
.json(&serde_json::json!({
"identity": email,
"password": password,
}))
.send()
.await?;
if !resp.status().is_success() {
let status = resp.status();
let text = resp.text().await.unwrap_or_default();
anyhow::bail!("PocketBase superuser auth failed ({status}): {text}");
}
let body: AuthResponse = resp.json().await?;
Ok(body.token)
}
async fn list_collections(
client: &Client,
base_url: &str,
token: &str,
) -> anyhow::Result<Vec<String>> {
let url = format!("{base_url}/api/collections?perPage=200");
let resp = client
.get(&url)
.header("Authorization", format!("Bearer {token}"))
.send()
.await?;
if !resp.status().is_success() {
let status = resp.status();
let text = resp.text().await.unwrap_or_default();
anyhow::bail!("Failed to list PocketBase collections ({status}): {text}");
}
let body: CollectionList = resp.json().await?;
Ok(body.items.into_iter().map(|c| c.name).collect())
}
async fn create_collection(
client: &Client,
base_url: &str,
token: &str,
collection: CreateCollection,
) -> anyhow::Result<()> {
let name = collection.name.clone();
let resp = client
.post(&format!("{base_url}/api/collections"))
.header("Authorization", format!("Bearer {token}"))
.json(&collection)
.send()
.await?;
if !resp.status().is_success() {
let status = resp.status();
let text = resp.text().await.unwrap_or_default();
anyhow::bail!("Failed to create collection '{name}' ({status}): {text}");
}
info!("Created PocketBase collection: {name}");
Ok(())
}
/// Look up the internal ID of the `users` auth collection.
async fn find_users_collection_id(
client: &Client,
base_url: &str,
token: &str,
) -> anyhow::Result<String> {
let url = format!("{base_url}/api/collections/users");
let resp = client
.get(&url)
.header("Authorization", format!("Bearer {token}"))
.send()
.await?;
if !resp.status().is_success() {
let status = resp.status();
let text = resp.text().await.unwrap_or_default();
anyhow::bail!("Failed to fetch users collection ({status}): {text}");
}
let body: serde_json::Value = resp.json().await?;
let id = body["id"]
.as_str()
.ok_or_else(|| anyhow::anyhow!("users collection has no id field"))?;
Ok(id.to_string())
}
/// Ensure the `saved_searches` and `short_urls` collections exist in PocketBase.
/// Authenticates as superuser, checks existing collections, and creates any that are missing.
pub async fn ensure_collections(
client: &Client,
base_url: &str,
admin_email: &str,
admin_password: &str,
) -> anyhow::Result<()> {
let base_url = base_url.trim_end_matches('/');
let token = auth_superuser(client, base_url, admin_email, admin_password).await?;
let existing = list_collections(client, base_url, &token).await?;
if !existing.iter().any(|n| n == "saved_searches") {
let users_id = find_users_collection_id(client, base_url, &token).await?;
create_collection(
client,
base_url,
&token,
CreateCollection {
name: "saved_searches".to_string(),
r#type: "base".to_string(),
fields: vec![
Field::relation("user", &users_id),
Field::text("name", true),
Field::text("params", true),
Field::file("screenshot", vec!["image/png", "image/jpeg", "image/webp"]),
],
},
)
.await?;
} else {
info!("PocketBase collection 'saved_searches' already exists");
}
if !existing.iter().any(|n| n == "short_urls") {
create_collection(
client,
base_url,
&token,
CreateCollection {
name: "short_urls".to_string(),
r#type: "base".to_string(),
fields: vec![
Field::text("code", true),
Field::text("params", true),
],
},
)
.await?;
} else {
info!("PocketBase collection 'short_urls' already exists");
}
Ok(())
}
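`ensure_collections` is idempotent because it only creates collections that are missing from the listing. That decision step can be sketched in isolation with the standard library alone. The helper name `missing_collections` is an assumption for illustration and does not exist in `pocketbase.rs`.

```rust
/// Hypothetical helper: which of the `wanted` collection names are absent
/// from the names PocketBase reported as `existing`?
fn missing_collections<'a>(existing: &[String], wanted: &[&'a str]) -> Vec<&'a str> {
    wanted
        .iter()
        .copied()
        // Keep only names that no existing collection matches exactly.
        .filter(|name| !existing.iter().any(|have| have.as_str() == *name))
        .collect()
}
```

Running the same check on every startup means a second run against an already provisioned PocketBase finds nothing to create, so no create request is sent.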


@ -14,10 +14,11 @@ pub(crate) mod properties;
mod screenshot;
mod shorten;
mod stats;
mod streetview;
mod tiles;
pub(crate) mod travel_time;
pub use ai_filters::{build_feature_prompt, build_ollama_schema, post_ai_filters};
pub use ai_filters::{build_ollama_schema, build_system_prompt, post_ai_filters};
pub use area_summary::post_area_summary;
pub use export::get_export;
pub use features::{build_features_response, get_features, FeatureInfo, FeaturesResponse};
@ -32,4 +33,5 @@ pub use postcodes::{get_postcode_lookup, get_postcodes};
pub use properties::get_hexagon_properties;
pub use screenshot::get_screenshot;
pub use shorten::{get_short_url, post_shorten};
pub use streetview::get_streetview;
pub use tiles::{get_style, get_tile, init_tile_reader};


@ -0,0 +1,334 @@
use std::sync::Arc;
use axum::http::StatusCode;
use axum::response::Json;
use serde::{Deserialize, Serialize};
use serde_json::{json, Value};
use tracing::{info, warn};
use crate::consts::{AI_FILTERS_MAX_TOKENS, AI_FILTERS_TEMPERATURE};
use crate::routes::{FeatureInfo, FeaturesResponse};
use crate::state::AppState;
use crate::utils::{extract_ollama_content, ollama_chat, strip_think_blocks};
#[derive(Deserialize)]
pub struct AiFiltersRequest {
query: String,
}
#[derive(Serialize)]
pub struct AiFiltersResponse {
filters: Value,
/// What the LLM couldn't map to existing filters (empty if everything matched)
#[serde(skip_serializing_if = "String::is_empty")]
notes: String,
}
/// Build a JSON schema for Ollama structured output.
///
/// Uses two arrays (`numeric_filters` and `enum_filters`) instead of one property
/// per feature, because Ollama converts JSON schema to GBNF grammar and a schema
/// with 50+ optional keys causes a combinatorial explosion that crashes the parser.
/// Array-based schema keeps the grammar small and constant-size.
pub fn build_ollama_schema(_features: &FeaturesResponse) -> Value {
json!({
"type": "object",
"properties": {
"numeric_filters": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": { "type": "string" },
"min": { "type": "number" },
"max": { "type": "number" }
},
"required": ["name"]
}
},
"enum_filters": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": { "type": "string" },
"values": { "type": "array", "items": { "type": "string" } }
},
"required": ["name", "values"]
}
},
"notes": {
"type": "string"
}
}
})
}
/// Build the complete system prompt for AI filters.
///
/// Contains: role instructions, feature catalogue, few-shot examples, output rules.
/// Precomputed at startup and cached in AppState.
pub fn build_system_prompt(features: &FeaturesResponse) -> String {
let mut parts = Vec::new();
// Role and task description
parts.push(
"You are a UK property search assistant. \
The user describes their ideal property or area in natural language. \
Translate their description into filter settings using ONLY the features listed below.\n\
\n\
Rules:\n\
- ONLY set filters the user explicitly mentioned or clearly implied.\n\
- Leave out any filter the user did not mention. Empty arrays are fine.\n\
- For numeric filters, omit \"min\" to leave the lower bound open, \
omit \"max\" to leave the upper bound open.\n\
- Use EXACT feature names from the list: spelling, capitalisation, and punctuation must match.\n\
- \"cheap\" / \"affordable\" = lower price range. \"expensive\" = higher price range.\n\
- \"low crime\" / \"safe\" = low values on crime features. \
\"quiet\" = low Noise (dB). \"green\" / \"near parks\" = high Number of parks within 2km.\n\
- When the user says a number like \"under 400k\", interpret it as 400000.\n\
- When the user says \"3 bed\" or \"3 bedroom\", use Number of bedrooms & living rooms \
(note: this counts bedrooms + living rooms combined, so 3 bed ~ min 4).\n\
- If the user mentions something that has no matching filter, put it in \"notes\" \
as a short phrase (e.g. \"No filter for: garden, sea view\"). \
If everything was matched, set \"notes\" to an empty string."
.to_string(),
);
// Feature catalogue
parts.push("\n--- AVAILABLE FEATURES ---\n".to_string());
for group in &features.groups {
parts.push(format!("## {}", group.name));
for feature in &group.features {
match feature {
FeatureInfo::Numeric {
name,
min,
max,
description,
prefix,
suffix,
..
} => {
parts.push(format!(
"- \"{}\" (numeric, {}{:.0}{} to {}{:.0}{}): {}",
name, prefix, min, suffix, prefix, max, suffix, description
));
}
FeatureInfo::Enum {
name,
values,
description,
..
} => {
parts.push(format!(
"- \"{}\" (enum, values: [{}]): {}",
name,
values
.iter()
.map(|val| format!("\"{}\"", val))
.collect::<Vec<_>>()
.join(", "),
description
));
}
}
}
}
// Few-shot examples
parts.push("\n--- EXAMPLES ---\n".to_string());
parts.push(
"User: \"cheap freehold house under 400k\"\n\
Output: {\"numeric_filters\": [{\"name\": \"Last known price\", \"max\": 400000}], \
\"enum_filters\": [{\"name\": \"Leashold/Freehold\", \"values\": [\"Freehold\"]}, \
{\"name\": \"Property type\", \"values\": [\"Detached\", \"Semi-Detached\", \"Terraced\"]}], \
\"notes\": \"\"}"
.to_string(),
);
parts.push(
"\nUser: \"safe quiet area with good schools and parks\"\n\
Output: {\"numeric_filters\": [\
{\"name\": \"Violence and sexual offences (avg/yr)\", \"max\": 20}, \
{\"name\": \"Burglary (avg/yr)\", \"max\": 10}, \
{\"name\": \"Noise (dB)\", \"max\": 55}, \
{\"name\": \"Good+ primary schools within 5km\", \"min\": 5}, \
{\"name\": \"Good+ secondary schools within 5km\", \"min\": 2}, \
{\"name\": \"Number of parks within 2km\", \"min\": 3}], \
\"enum_filters\": [], \"notes\": \"\"}"
.to_string(),
);
parts.push(
"\nUser: \"3 bed flat under 300k with fast broadband near the beach\"\n\
Output: {\"numeric_filters\": [\
{\"name\": \"Last known price\", \"max\": 300000}, \
{\"name\": \"Number of bedrooms & living rooms\", \"min\": 4}, \
{\"name\": \"Max available download speed (Mbps)\", \"min\": 100}], \
\"enum_filters\": [{\"name\": \"Property type\", \"values\": [\"Flat\"]}], \
\"notes\": \"No filter for: beach proximity\"}"
.to_string(),
);
parts.push(
"\nUser: \"large family home with a garden near restaurants\"\n\
Output: {\"numeric_filters\": [\
{\"name\": \"Total floor area (sqm)\", \"min\": 100}, \
{\"name\": \"Number of bedrooms & living rooms\", \"min\": 5}, \
{\"name\": \"Number of restaurants within 2km\", \"min\": 10}], \
\"enum_filters\": [{\"name\": \"Property type\", \
\"values\": [\"Detached\", \"Semi-Detached\"]}], \
\"notes\": \"No filter for: garden\"}"
.to_string(),
);
// Output format reminder
parts.push(
"\n--- OUTPUT FORMAT ---\n\
{\"numeric_filters\": [...], \"enum_filters\": [...], \"notes\": \"...\"}\n\
Respond with ONLY the JSON object. No explanation."
.to_string(),
);
parts.join("\n")
}
pub async fn post_ai_filters(
state: Arc<AppState>,
Json(req): Json<AiFiltersRequest>,
) -> Result<Json<AiFiltersResponse>, (StatusCode, String)> {
info!(query = %req.query, "POST /api/ai-filters");
// Use Ollama native API with structured output
let url = format!("{}/api/chat", state.ollama_url);
let body = json!({
"model": state.ollama_model,
"messages": [
{ "role": "system", "content": state.ai_filters_system_prompt },
{ "role": "user", "content": req.query }
],
"stream": false,
"format": state.ai_filters_schema,
"options": {
"temperature": AI_FILTERS_TEMPERATURE,
"num_predict": AI_FILTERS_MAX_TOKENS,
}
});
let json_resp = ollama_chat(&state.http_client, &url, &body).await?;
let content = extract_ollama_content(&json_resp)?;
let content = strip_think_blocks(content);
let content = content.trim();
let raw: Value = serde_json::from_str(content).map_err(|err| {
warn!(error = %err, content = %content, "Failed to parse LLM JSON output");
(
StatusCode::BAD_GATEWAY,
format!("Failed to parse LLM output as JSON: {}", err),
)
})?;
// Validate and convert to FeatureFilters format
let filters = validate_and_convert(&raw, &state.features_response);
let notes = raw
.get("notes")
.and_then(|val| val.as_str())
.unwrap_or("")
.to_string();
Ok(Json(AiFiltersResponse { filters, notes }))
}
/// Validate LLM output against feature metadata and convert to FeatureFilters format.
///
/// Input format (array-based, grammar-friendly):
/// ```json
/// {
/// "numeric_filters": [{"name": "Last known price", "min": 0, "max": 300000}],
/// "enum_filters": [{"name": "Leashold/Freehold", "values": ["Freehold"]}]
/// }
/// ```
///
/// Output format (FeatureFilters):
/// ```json
/// { "Last known price": [0, 300000], "Leashold/Freehold": ["Freehold"] }
/// ```
fn validate_and_convert(raw: &Value, features: &FeaturesResponse) -> Value {
let mut result = serde_json::Map::new();
// Build lookup maps from feature metadata
let mut numeric_features: rustc_hash::FxHashMap<&str, (f32, f32)> =
rustc_hash::FxHashMap::default();
let mut enum_features: rustc_hash::FxHashMap<&str, &[String]> =
rustc_hash::FxHashMap::default();
for group in &features.groups {
for feature in &group.features {
match feature {
FeatureInfo::Numeric { name, min, max, .. } => {
numeric_features.insert(name, (*min, *max));
}
FeatureInfo::Enum { name, values, .. } => {
enum_features.insert(name, values);
}
}
}
}
// Process numeric filters
if let Some(arr) = raw.get("numeric_filters").and_then(|val| val.as_array()) {
for item in arr {
let name = match item.get("name").and_then(|val| val.as_str()) {
Some(name) => name,
None => continue,
};
let (feat_min, feat_max) = match numeric_features.get(name) {
Some(range) => *range,
None => continue,
};
let filter_min = item
.get("min")
.and_then(|val| val.as_f64())
.map(|num| num.max(feat_min as f64).min(feat_max as f64) as f32)
.unwrap_or(feat_min);
let filter_max = item
.get("max")
.and_then(|val| val.as_f64())
.map(|num| num.max(feat_min as f64).min(feat_max as f64) as f32)
.unwrap_or(feat_max);
// Only include if range is narrower than full range
if filter_min > feat_min || filter_max < feat_max {
result.insert(name.to_string(), json!([filter_min, filter_max]));
}
}
}
// Process enum filters
if let Some(arr) = raw.get("enum_filters").and_then(|val| val.as_array()) {
for item in arr {
let name = match item.get("name").and_then(|val| val.as_str()) {
Some(name) => name,
None => continue,
};
let valid_values = match enum_features.get(name) {
Some(values) => *values,
None => continue,
};
if let Some(selected) = item.get("values").and_then(|val| val.as_array()) {
let valid: Vec<&str> = selected
.iter()
.filter_map(|item| item.as_str())
.filter(|str_val| valid_values.iter().any(|known| known == str_val))
.collect();
if !valid.is_empty() && valid.len() < valid_values.len() {
result.insert(name.to_string(), json!(valid));
}
}
}
}
Value::Object(result)
}
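The numeric branch of `validate_and_convert` clamps each requested bound to the feature's known range and drops the filter when nothing is narrowed. A stand-alone sketch of that rule is below. The function name `clamp_numeric_filter` and its signature are assumptions for illustration, and it works on plain numbers instead of `serde_json::Value`.

```rust
/// Hypothetical stand-alone version of the numeric clamping rule.
/// Returns `None` when the clamped range is no narrower than the full range.
fn clamp_numeric_filter(
    requested_min: Option<f64>,
    requested_max: Option<f64>,
    feat_min: f32,
    feat_max: f32,
) -> Option<(f32, f32)> {
    // Clamp a requested bound into [feat_min, feat_max].
    let clamp = |value: f64| value.max(feat_min as f64).min(feat_max as f64) as f32;
    // A missing bound falls back to the feature's own limit (open-ended).
    let lo = requested_min.map(&clamp).unwrap_or(feat_min);
    let hi = requested_max.map(&clamp).unwrap_or(feat_max);
    if lo > feat_min || hi < feat_max {
        Some((lo, hi))
    } else {
        None
    }
}
```

As in the code above, this sketch does not reorder an inverted request: a `min` above `max` passes through as given, so a caller that needs `lo <= hi` would have to check it separately.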


@ -3,12 +3,13 @@ use std::sync::Arc;
use axum::http::StatusCode;
use axum::response::Json;
use serde::{Deserialize, Serialize};
use tracing::{info, warn};
use tracing::info;
use crate::consts::{
AREA_SUMMARY_MAX_TOKENS, AREA_SUMMARY_SYSTEM_PROMPT, AREA_SUMMARY_TEMPERATURE,
};
use crate::state::AppState;
use crate::utils::{extract_openai_content, ollama_chat, strip_think_blocks};
#[derive(Deserialize)]
pub struct NumericStat {
@ -89,22 +90,6 @@ fn build_prompt(req: &AreaSummaryRequest) -> String {
result
}
/// Strip `<think>...</think>` blocks from model output
pub(crate) fn strip_think_blocks(text: &str) -> String {
let mut result = String::new();
let mut remaining = text;
while let Some(start) = remaining.find("<think>") {
result.push_str(&remaining[..start]);
if let Some(end) = remaining[start..].find("</think>") {
remaining = &remaining[start + end + 8..];
} else {
return result;
}
}
result.push_str(remaining);
result
}
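The helper removed here is now imported from `crate::utils`. Its behaviour can be checked in isolation with the restatement below, which swaps the literal `8` for named tag constants. That swap is an editorial variation; the code in `utils` may differ.

```rust
const OPEN: &str = "<think>";
const CLOSE: &str = "</think>";

/// Remove every `<think>...</think>` span from `text`.
fn strip_think_blocks(text: &str) -> String {
    let mut result = String::new();
    let mut remaining = text;
    while let Some(start) = remaining.find(OPEN) {
        // Keep everything before the opening tag.
        result.push_str(&remaining[..start]);
        match remaining[start..].find(CLOSE) {
            // `end` is relative to `start`, so skip past the closing tag.
            Some(end) => remaining = &remaining[start + end + CLOSE.len()..],
            // Unclosed block: drop everything from the opening tag onwards.
            None => return result,
        }
    }
    result.push_str(remaining);
    result
}
```

The unclosed case matters in practice if a response is cut off by the token limit partway through a reasoning block: the partial reasoning is discarded and never reaches the user.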
pub async fn post_area_summary(
state: Arc<AppState>,
Json(req): Json<AreaSummaryRequest>,
@ -124,45 +109,8 @@ pub async fn post_area_summary(
"max_tokens": AREA_SUMMARY_MAX_TOKENS,
});
let response = state
.http_client
.post(&url)
.json(&body)
.send()
.await
.map_err(|err| {
warn!(error = %err, "Failed to connect to Ollama");
(
StatusCode::BAD_GATEWAY,
format!("Failed to connect to Ollama: {}", err),
)
})?;
if !response.status().is_success() {
let status = response.status();
let body_text = response.text().await.unwrap_or_default();
warn!(status = %status, body = %body_text, "Ollama returned error");
return Err((
StatusCode::BAD_GATEWAY,
format!("Ollama error {}: {}", status, body_text),
));
}
let json: serde_json::Value = response.json().await.map_err(|err| {
warn!(error = %err, "Failed to parse Ollama response");
(
StatusCode::BAD_GATEWAY,
format!("Failed to parse Ollama response: {}", err),
)
})?;
let content = json
.get("choices")
.and_then(|ch| ch.get(0))
.and_then(|ch| ch.get("message"))
.and_then(|msg| msg.get("content"))
.and_then(|ct| ct.as_str())
.unwrap_or("");
let json = ollama_chat(&state.http_client, &url, &body).await?;
let content = extract_openai_content(&json)?;
let summary = strip_think_blocks(content).trim().to_string();


@ -530,13 +530,19 @@ pub async fn get_export(
}
// Column widths
sheet.set_column_width(0, 12).ok();
sheet.set_column_width(1, 12).ok();
sheet
.set_column_width(0, 12)
.map_err(|e| format!("Failed to set column width: {e}"))?;
sheet
.set_column_width(1, 12)
.map_err(|e| format!("Failed to set column width: {e}"))?;
for col_offset in 0..feat_indices.len() {
let col = (col_offset + 2) as u16;
let feat_name = &feature_names[feat_indices[col_offset]];
let width = (feat_name.len() as f64 * 1.1).clamp(10.0, 30.0);
sheet.set_column_width(col, width).ok();
sheet
.set_column_width(col, width)
.map_err(|e| format!("Failed to set column width: {e}"))?;
}
}


@ -35,6 +35,8 @@ pub enum FeatureInfo {
suffix: &'static str,
#[serde(skip_serializing_if = "is_false")]
raw: bool,
#[serde(skip_serializing_if = "is_false")]
absolute: bool,
},
#[serde(rename = "enum")]
Enum {
@ -99,6 +101,7 @@ pub fn build_features_response(data: &PropertyData) -> FeaturesResponse {
prefix: feature_config.prefix,
suffix: feature_config.suffix,
raw: feature_config.raw,
absolute: feature_config.absolute,
});
}
}


@ -6,7 +6,7 @@ use axum::response::Json;
use rustc_hash::FxHashMap;
use serde::{Deserialize, Serialize};
use serde_json::{Map, Value};
use tracing::{info, warn};
use tracing::info;
use crate::aggregation::Aggregator;
use crate::consts::MAX_CELLS_PER_REQUEST;
@ -33,10 +33,55 @@ pub struct HexagonParams {
/// When present (even if empty), only listed features are aggregated and written.
/// When absent, all features are included (backward compatible).
fields: Option<String>,
/// Destination point as "lat,lon" for real-time travel time calculation via R5.
destination: Option<String>,
/// Transport mode for travel time: "transit" (default), "car", or "bicycle".
mode: Option<String>,
/// Pipe-separated travel time entries: `lat,lon,mode|lat,lon,mode`
/// Each entry requests travel time from hex centroids to that destination via the given mode.
travel: Option<String>,
}
struct TravelEntry {
lat: f64,
lon: f64,
mode: String,
}
const VALID_MODES: &[&str] = &["car", "bicycle", "walking", "transit"];
/// Parse `travel` param into a list of travel entries.
/// Format: `lat,lon,mode|lat,lon,mode`
fn parse_travel_entries(s: &str) -> Result<Vec<TravelEntry>, String> {
let mut entries = Vec::new();
let mut seen_modes = Vec::new();
for segment in s.split('|') {
let parts: Vec<&str> = segment.split(',').collect();
if parts.len() != 3 {
return Err(format!(
"each travel entry must be 'lat,lon,mode', got '{}'",
segment
));
}
let lat: f64 = parts[0]
.trim()
.parse()
.map_err(|_| format!("invalid travel latitude in '{}'", segment))?;
let lon: f64 = parts[1]
.trim()
.parse()
.map_err(|_| format!("invalid travel longitude in '{}'", segment))?;
let mode = parts[2].trim().to_string();
if !VALID_MODES.contains(&mode.as_str()) {
return Err(format!(
"invalid travel mode '{}', must be one of: {}",
mode,
VALID_MODES.join(", ")
));
}
if seen_modes.contains(&mode) {
return Err(format!("duplicate travel mode '{}'", mode));
}
seen_modes.push(mode.clone());
entries.push(TravelEntry { lat, lon, mode });
}
Ok(entries)
}
/// Build feature maps from aggregated cell data, filtering to only cells that intersect the query bounds.
@ -104,23 +149,6 @@ fn build_feature_maps(
features
}
/// Parse "lat,lon" string into (lat, lon) tuple.
fn parse_destination(s: &str) -> Result<[f64; 2], String> {
let parts: Vec<&str> = s.split(',').collect();
if parts.len() != 2 {
return Err("destination must be 'lat,lon'".into());
}
let lat: f64 = parts[0]
.trim()
.parse()
.map_err(|_| "invalid destination latitude")?;
let lon: f64 = parts[1]
.trim()
.parse()
.map_err(|_| "invalid destination longitude")?;
Ok([lat, lon])
}
pub async fn get_hexagons(
state: Arc<AppState>,
Query(params): Query<HexagonParams>,
@ -141,16 +169,17 @@ pub async fn get_hexagons(
let field_indices = parse_field_indices(params.fields.as_deref(), &state.feature_name_to_index);
// Parse destination for travel time (before moving into blocking closure)
let destination = params
.destination
// Parse travel entries
let travel_entries = params
.travel
.as_deref()
.map(parse_destination)
.filter(|s| !s.is_empty())
.map(parse_travel_entries)
.transpose()
.map_err(|e| (StatusCode::BAD_REQUEST, e))?;
let mode = params.mode.clone().unwrap_or_else(|| "car".into());
.map_err(|e| (StatusCode::BAD_REQUEST, e))?
.unwrap_or_default();
// Capture what we need for the R5 call before moving state into spawn_blocking
// Capture what we need for the R5 calls before moving state into spawn_blocking
let r5_url = state.r5_url.clone();
let http_client = state.http_client.clone();
@ -250,14 +279,12 @@ pub async fn get_hexagons(
.map_err(|error| (StatusCode::INTERNAL_SERVER_ERROR, error.to_string()))?
.map_err(|error| (StatusCode::INTERNAL_SERVER_ERROR, error))?;
// If a destination was requested and R5 is configured, fetch travel times.
if let Some(dest) = destination {
if r5_url.is_empty() {
return Err((
StatusCode::SERVICE_UNAVAILABLE,
"Travel time queries require routing service (R5_URL not configured)".into(),
));
}
// If travel entries were requested, require R5 (503 when not configured) and fetch travel times concurrently.
if !travel_entries.is_empty() {
let url = r5_url.as_deref().ok_or((
StatusCode::SERVICE_UNAVAILABLE,
"Travel time queries require routing service (R5_URL not configured)".into(),
))?;
// Collect hex centroids
let origins: Vec<[f64; 2]> = response
@ -267,39 +294,56 @@ pub async fn get_hexagons(
let lat = f
.get("lat")
.and_then(|v| v.as_f64())
.unwrap_or(0.0);
.expect("lat must be present in feature map");
let lon = f
.get("lon")
.and_then(|v| v.as_f64())
.unwrap_or(0.0);
.expect("lon must be present in feature map");
[lat, lon]
})
.collect();
match fetch_travel_times(&http_client, &r5_url, origins, dest, &mode).await {
Ok(travel_times) => {
for (feature, tt) in response.features.iter_mut().zip(travel_times) {
match tt {
Some(minutes) => {
if let Some(num) = serde_json::Number::from_f64(minutes) {
feature.insert("travel_time".into(), Value::Number(num));
}
}
None => {
feature.insert("travel_time".into(), Value::Null);
// Fire concurrent R5 calls for each travel entry
let mut handles = Vec::with_capacity(travel_entries.len());
for entry in &travel_entries {
let client = http_client.clone();
let url = url.to_string();
let origins = origins.clone();
let dest = [entry.lat, entry.lon];
let mode = entry.mode.clone();
handles.push(tokio::spawn(async move {
fetch_travel_times(&client, &url, origins, dest, &mode).await
}));
}
let mut results = Vec::with_capacity(handles.len());
for handle in handles {
results.push(handle.await);
}
for (entry, result) in travel_entries.iter().zip(results) {
let travel_times = result
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?
.map_err(|err| (StatusCode::BAD_GATEWAY, err))?;
let field_name = format!("travel_time_{}", entry.mode);
for (feature, tt) in response.features.iter_mut().zip(&travel_times) {
match tt {
Some(minutes) => {
if let Some(num) = serde_json::Number::from_f64(*minutes) {
feature.insert(field_name.clone(), Value::Number(num));
}
}
None => {
feature.insert(field_name.clone(), Value::Null);
}
}
info!(
hexagons = response.features.len(),
destination = format_args!("{},{}", dest[0], dest[1]),
mode = mode,
"Travel times merged"
);
}
Err(err) => {
warn!("Travel time query failed, returning hexagons without travel_time: {}", err);
}
info!(
hexagons = response.features.len(),
destination = format_args!("{},{}", entry.lat, entry.lon),
mode = entry.mode,
"Travel times merged"
);
}
}
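
To make the new `travel` parameter concrete, here is a compact, std-only restatement of the validation in this diff. It is a sketch, not the handler itself: the function name `travel_field_names` is hypothetical, and instead of building `TravelEntry` values it returns the `travel_time_{mode}` response fields that each entry would populate.

```rust
const VALID_MODES: &[&str] = &["car", "bicycle", "walking", "transit"];

/// Hypothetical helper: validate a `lat,lon,mode|lat,lon,mode` string
/// with the same rules as the diff and return the response field names.
fn travel_field_names(param: &str) -> Result<Vec<String>, String> {
    let mut fields: Vec<String> = Vec::new();
    for segment in param.split('|') {
        let parts: Vec<&str> = segment.split(',').map(str::trim).collect();
        // Exactly three comma-separated parts are required.
        let [lat, lon, mode] = parts.as_slice() else {
            return Err(format!(
                "each travel entry must be 'lat,lon,mode', got '{segment}'"
            ));
        };
        lat.parse::<f64>()
            .map_err(|_| format!("invalid travel latitude in '{segment}'"))?;
        lon.parse::<f64>()
            .map_err(|_| format!("invalid travel longitude in '{segment}'"))?;
        if !VALID_MODES.contains(mode) {
            return Err(format!(
                "invalid travel mode '{}', must be one of: {}",
                mode,
                VALID_MODES.join(", ")
            ));
        }
        // One entry per mode, because the mode names the output field.
        let field = format!("travel_time_{mode}");
        if fields.contains(&field) {
            return Err(format!("duplicate travel mode '{mode}'"));
        }
        fields.push(field);
    }
    Ok(fields)
}
```

For example, `travel=51.5,-0.12,transit|51.5,-0.12,car` yields the fields `travel_time_transit` and `travel_time_car`, while two entries that share a mode are rejected. The duplicate check appears to follow from the field naming: two entries with the same mode would write to the same key.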

@ -53,11 +53,14 @@ pub async fn proxy_to_pocketbase(state: Arc<AppState>, req: Request) -> impl Int
if name == "transfer-encoding" {
continue;
}
response = response.header(
HeaderName::from_bytes(name.as_ref())
.unwrap_or(HeaderName::from_static("x-invalid")),
value.clone(),
);
match HeaderName::from_bytes(name.as_ref()) {
Ok(header_name) => {
response = response.header(header_name, value.clone());
}
Err(err) => {
warn!(header = ?name, error = %err, "Skipping unparseable upstream header");
}
}
}
match upstream.bytes().await {

@ -14,6 +14,8 @@ pub struct PlaceResult {
place_type: String,
lat: f32,
lon: f32,
#[serde(skip_serializing_if = "Option::is_none")]
city: Option<String>,
}
#[derive(Serialize)]
@ -24,7 +26,7 @@ pub struct PlacesResponse {
#[derive(Deserialize)]
#[allow(clippy::min_ident_chars)]
pub struct PlacesParams {
q: Option<String>,
q: String,
limit: Option<usize>,
}
@ -32,10 +34,11 @@ pub async fn get_places(
state: Arc<AppState>,
Query(params): Query<PlacesParams>,
) -> Result<Json<PlacesResponse>, (StatusCode, String)> {
let query = params
.q
.filter(|val| !val.is_empty())
.ok_or((StatusCode::BAD_REQUEST, "Missing 'q' parameter".to_string()))?;
let query = if params.q.is_empty() {
return Err((StatusCode::BAD_REQUEST, "'q' must not be empty".into()));
} else {
params.q
};
let limit = params.limit.unwrap_or(7).min(20);
@ -45,26 +48,37 @@ pub async fn get_places(
let pd = &state.place_data;
// Linear scan — ~50-100k rows, <1ms
let mut matches: Vec<(usize, bool, u8, usize)> = pd
// Tuple: (row_idx, is_exact, is_prefix, type_rank, population, name_len)
let mut matches: Vec<(usize, bool, bool, u8, u32, usize)> = pd
.name_lower
.iter()
.enumerate()
.filter_map(|(idx, name)| {
if name.contains(&query_lower) {
let is_exact = name.len() == query_lower.len();
let is_prefix = name.starts_with(&query_lower);
Some((idx, is_prefix, pd.type_rank[idx], pd.name[idx].len()))
Some((
idx,
is_exact,
is_prefix,
pd.type_rank[idx],
pd.population[idx],
pd.name[idx].len(),
))
} else {
None
}
})
.collect();
// Sort: prefix first, then by type rank (cities before hamlets), then shorter names first
// Sort: exact first, then prefix, then type rank asc, then population desc, then name length asc
matches.sort_unstable_by(|lhs, rhs| {
rhs.1
.cmp(&lhs.1)
.then(lhs.2.cmp(&rhs.2))
.then(rhs.2.cmp(&lhs.2))
.then(lhs.3.cmp(&rhs.3))
.then(rhs.4.cmp(&lhs.4))
.then(lhs.5.cmp(&rhs.5))
});
matches.truncate(limit);
@ -76,6 +90,7 @@ pub async fn get_places(
place_type: pd.place_type.get(idx).to_string(),
lat: pd.lat[idx],
lon: pd.lon[idx],
city: pd.city[idx].clone(),
})
.collect();
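
The new ordering (exact match, then prefix match, then type rank ascending, population descending, name length ascending) can also be written as a single tuple key. The sketch below is a std-only illustration under stated assumptions: the `Place` struct and the `rank` function are hypothetical stand-ins for the columnar `place_data`, and `std::cmp::Reverse` replaces the swapped `rhs`/`lhs` comparisons used in the diff.

```rust
use std::cmp::Reverse;

/// Hypothetical stand-in for one row of the place table.
struct Place {
    name: &'static str,
    type_rank: u8,
    population: u32,
}

/// Return the names of matching places, best match first.
fn rank(places: &[Place], query: &str) -> Vec<&'static str> {
    let query = query.to_lowercase();
    let mut hits: Vec<&Place> = places
        .iter()
        .filter(|p| p.name.to_lowercase().contains(&query))
        .collect();
    hits.sort_unstable_by_key(|p| {
        let lower = p.name.to_lowercase();
        (
            Reverse(lower == query),            // exact matches first
            Reverse(lower.starts_with(&query)), // then prefix matches
            p.type_rank,                        // lower rank first
            Reverse(p.population),              // larger places first
            p.name.len(),                       // shorter names first
        )
    });
    hits.into_iter().map(|p| p.name).collect()
}
```

With the query `york`, an exact `York` outranks every `York…` prefix match regardless of population, and `New York` comes last because it only contains the query.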

@ -146,6 +146,9 @@ pub async fn get_hexagon_properties(
}
});
// Sort so properties with addresses come first, unknown addresses last
matching_rows.sort_unstable_by_key(|&row| state.data.address(row).trim().is_empty());
let total = matching_rows.len();
let limit = params
.limit
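
The address sort relies on `false < true` for `bool` keys: rows whose address is blank get the key `true` and move to the end. The sketch below illustrates this with std only; the function name and the slice-of-strings input are hypothetical. It uses the stable `sort_by_key` on the assumption that the incoming row order should be kept within each group, whereas the diff uses `sort_unstable_by_key`, which does not guarantee that.

```rust
/// Hypothetical helper: move rows with a blank address to the end.
/// `rows` holds indices into `addresses`.
fn addressed_first(rows: &mut [usize], addresses: &[&str]) {
    // Key is `true` for blank addresses, and `false` sorts before `true`.
    // The stable sort keeps the original order inside each group.
    rows.sort_by_key(|&row| addresses[row].trim().is_empty());
}
```

Whitespace-only addresses count as blank because of the `trim()` call, matching the expression in the diff.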

@ -0,0 +1,84 @@
use std::sync::Arc;
use axum::http::StatusCode;
use axum::response::{IntoResponse, Json};
use serde::{Deserialize, Serialize};
use tracing::warn;
use crate::state::AppState;
#[derive(Deserialize)]
pub struct StreetViewQuery {
lat: f64,
lon: f64,
}
#[derive(Deserialize)]
struct GoogleMetadataResponse {
status: String,
#[serde(default)]
pano_id: String,
}
#[derive(Serialize)]
struct StreetViewResponse {
status: String,
#[serde(skip_serializing_if = "Option::is_none")]
pano_id: Option<String>,
}
pub async fn get_streetview(
state: Arc<AppState>,
query: axum::extract::Query<StreetViewQuery>,
) -> impl IntoResponse {
let url = format!(
"https://maps.googleapis.com/maps/api/streetview/metadata?location={},{}&radius=1000&source=outdoor&key={}",
query.lat, query.lon, state.google_maps_api_key
);
let resp = match state.http_client.get(&url).send().await {
Ok(r) => r,
Err(e) => {
warn!("Street View metadata request failed: {e}");
return (
StatusCode::BAD_GATEWAY,
Json(StreetViewResponse {
status: "ERROR".to_string(),
pano_id: None,
}),
);
}
};
let meta: GoogleMetadataResponse = match resp.json().await {
Ok(m) => m,
Err(e) => {
warn!("Failed to parse Street View metadata: {e}");
return (
StatusCode::BAD_GATEWAY,
Json(StreetViewResponse {
status: "ERROR".to_string(),
pano_id: None,
}),
);
}
};
if meta.status == "OK" {
(
StatusCode::OK,
Json(StreetViewResponse {
status: "OK".to_string(),
pano_id: Some(meta.pano_id),
}),
)
} else {
(
StatusCode::OK,
Json(StreetViewResponse {
status: meta.status,
pano_id: None,
}),
)
}
}

@ -35,7 +35,6 @@ pub async fn get_tile(
#[derive(Deserialize)]
pub struct StyleParams {
#[serde(default)]
theme: Option<String>,
}
@ -43,26 +42,26 @@ pub async fn get_style(
State(reader): State<Arc<TileReader>>,
headers: HeaderMap,
Query(params): Query<StyleParams>,
) -> Response {
) -> Result<Response, (StatusCode, String)> {
let is_dark = params.theme.as_deref() == Some("dark");
// Metadata is returned as a JSON string
let metadata_str = match reader.get_metadata().await {
Ok(meta) => meta,
Err(err) => {
warn!(error = %err, "Failed to get PMTiles metadata");
return StatusCode::INTERNAL_SERVER_ERROR.into_response();
}
};
let metadata_str = reader.get_metadata().await.map_err(|err| {
warn!(error = %err, "Failed to get PMTiles metadata");
(
StatusCode::INTERNAL_SERVER_ERROR,
format!("Failed to get PMTiles metadata: {err}"),
)
})?;
// Parse the JSON string
let metadata: serde_json::Value = match serde_json::from_str(&metadata_str) {
Ok(val) => val,
Err(err) => {
warn!(error = %err, "Failed to parse PMTiles metadata JSON");
serde_json::Value::Object(serde_json::Map::new())
}
};
let metadata: serde_json::Value = serde_json::from_str(&metadata_str).map_err(|err| {
warn!(error = %err, "Failed to parse PMTiles metadata JSON");
(
StatusCode::INTERNAL_SERVER_ERROR,
format!("Failed to parse PMTiles metadata: {err}"),
)
})?;
// Extract tilestats for layer info if available
let layers: Vec<serde_json::Value> = metadata
@ -75,16 +74,19 @@ pub async fn get_style(
let host = headers
.get(header::HOST)
.and_then(|hv| hv.to_str().ok())
.unwrap_or("localhost:8001");
.ok_or((
StatusCode::BAD_REQUEST,
"Missing Host header".into(),
))?;
let tile_url = format!("http://{}/api/tiles/{{z}}/{{x}}/{{y}}", host);
let style = build_style(is_dark, &layers, &tile_url);
(
Ok((
StatusCode::OK,
[(header::CONTENT_TYPE, "application/json")],
serde_json::to_string(&style).unwrap(),
)
.into_response()
.into_response())
}
fn build_style(is_dark: bool, layers: &[serde_json::Value], tile_url: &str) -> serde_json::Value {

@ -44,12 +44,14 @@ pub struct AppState {
pub ollama_url: String,
/// Ollama model name for area summaries (e.g. gemma3:12b)
pub ollama_model: String,
/// R5 routing service URL for all travel times (empty = disabled)
pub r5_url: String,
/// R5 routing service URL for all travel times (None = disabled)
pub r5_url: Option<String>,
/// Token validation cache (60s TTL)
pub token_cache: Arc<TokenCache>,
/// JSON schema for Ollama structured output in AI filters
pub ai_filters_schema: serde_json::Value,
/// Feature listing portion of the AI filters prompt
pub ai_filters_feature_prompt: String,
/// Complete system prompt for AI filters (features + examples + instructions)
pub ai_filters_system_prompt: String,
/// Google Maps API key for Street View metadata lookups
pub google_maps_api_key: String,
}
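
Changing `r5_url` from `String` to `Option<String>` replaces the empty-string sentinel with a type-level "disabled" state. The sketch below shows the resulting lookup with std only. `Config`, `ApiError`, and `routing_url` are hypothetical names, and a plain `u16` stands in for axum's `StatusCode`.

```rust
/// Hypothetical slice of the application state: only the routing URL.
struct Config {
    r5_url: Option<String>,
}

/// (status code, message), standing in for `(StatusCode, String)`.
type ApiError = (u16, String);

/// Borrow the routing URL, or report 503 when the service is disabled.
fn routing_url(config: &Config) -> Result<&str, ApiError> {
    config.r5_url.as_deref().ok_or((
        503,
        "Travel time queries require routing service (R5_URL not configured)".to_string(),
    ))
}
```

`Option::as_deref` turns `&Option<String>` into `Option<&str>` without cloning, so a handler can use `?` on the result and borrow the URL from the state.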

@ -6,4 +6,4 @@ mod llm;
pub use grid_index::GridIndex;
pub use hash::{generate_priorities, splitmix64_hash};
pub use interned_column::InternedColumn;
pub use llm::strip_think_blocks;
pub use llm::{extract_ollama_content, extract_openai_content, ollama_chat, strip_think_blocks};

@ -1,3 +1,75 @@
use axum::http::StatusCode;
use serde_json::Value;
use tracing::warn;
pub type LlmError = (StatusCode, String);
/// Send a chat request to Ollama and return the parsed JSON response.
///
/// Handles connection errors, non-success status codes, and JSON parse failures
/// uniformly as `BAD_GATEWAY` errors.
pub async fn ollama_chat(
client: &reqwest::Client,
url: &str,
body: &Value,
) -> Result<Value, LlmError> {
let response = client.post(url).json(body).send().await.map_err(|err| {
warn!(error = %err, "Failed to connect to Ollama");
(
StatusCode::BAD_GATEWAY,
format!("Failed to connect to Ollama: {}", err),
)
})?;
if !response.status().is_success() {
let status = response.status();
let body_text = response.text().await.unwrap_or_default();
warn!(status = %status, body = %body_text, "Ollama returned error");
return Err((
StatusCode::BAD_GATEWAY,
format!("Ollama error {}: {}", status, body_text),
));
}
response.json().await.map_err(|err| {
warn!(error = %err, "Failed to parse Ollama response");
(
StatusCode::BAD_GATEWAY,
format!("Failed to parse Ollama response: {}", err),
)
})
}
/// Extract content from OpenAI-compatible response (`choices[0].message.content`)
pub fn extract_openai_content(json: &Value) -> Result<&str, LlmError> {
json.get("choices")
.and_then(|ch| ch.get(0))
.and_then(|ch| ch.get("message"))
.and_then(|msg| msg.get("content"))
.and_then(|ct| ct.as_str())
.ok_or_else(|| {
warn!("Malformed OpenAI response: missing choices[0].message.content");
(
StatusCode::BAD_GATEWAY,
"Malformed LLM response: missing choices[0].message.content".into(),
)
})
}
/// Extract content from Ollama native response (`message.content`)
pub fn extract_ollama_content(json: &Value) -> Result<&str, LlmError> {
json.get("message")
.and_then(|msg| msg.get("content"))
.and_then(|ct| ct.as_str())
.ok_or_else(|| {
warn!("Malformed Ollama response: missing message.content");
(
StatusCode::BAD_GATEWAY,
"Malformed LLM response: missing message.content".into(),
)
})
}
/// Strip `<think>...</think>` blocks from model output
pub fn strip_think_blocks(text: &str) -> String {
let mut result = String::new();