Fun changes
Some checks failed
CI / Python (lint + test) (push) Failing after 3m38s
CI / Rust (lint + test) (push) Failing after 3m32s
CI / Frontend (lint + typecheck) (push) Failing after 4m12s
Build and publish Docker image / build-and-push (push) Failing after 4m48s

Andras Schmelczer 2026-04-04 22:59:44 +01:00
parent cd778dd088
commit 349a6c1d53
60 changed files with 1260 additions and 2600 deletions


@@ -98,8 +98,8 @@ Rust + Axum. Loads parquet into memory at startup.
**API endpoints:**
- `GET /api/features` — Feature metadata with histograms and 2nd/98th percentiles
- `GET /api/hexagons?resolution=&bounds=&filters=&fields=` — H3 aggregates (min/max per feature per hex), AABB-filtered to bounds
- `GET /api/postcodes?bounds=&filters=&fields=` — Postcode polygon aggregates, AABB-filtered to bounds
- `GET /api/hexagons?resolution=&bounds=&filters=&fields=&enum_dist=` — H3 aggregates (min/max per feature per hex), AABB-filtered to bounds. Optional `enum_dist=FeatureName` adds `dist_FeatureName: [count_per_value...]` arrays for pie chart visualization.
- `GET /api/postcodes?bounds=&filters=&fields=&enum_dist=` — Postcode polygon aggregates, AABB-filtered to bounds. Same `enum_dist` support as hexagons.
- `GET /api/postcode/:postcode` — Single postcode lookup (centroid + polygon)
- `GET /api/hexagon-properties?h3=&resolution=&filters=&limit=&offset=` — Paginated properties within a hexagon
- `GET /api/postcode-properties?postcode=&filters=&limit=&offset=` — Paginated properties within a postcode
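A minimal sketch of the per-cell min/max aggregation behind `/api/hexagons` and `/api/postcodes`. It assumes rows have already been grouped by cell and that NaN marks a null value, as in the data representation notes. `MinMax` and `aggregate_cell` are hypothetical names, not the server's `Aggregator` API.

```rust
/// Running min/max for one feature within one cell.
/// Hypothetical type for illustration; not the server's `Aggregator`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MinMax {
    pub min: f32,
    pub max: f32,
}

/// Fold the rows belonging to one cell into per-feature min/max.
/// `values` is row-major with `n_features` columns; NaN means null and is skipped.
/// A feature whose values are all null in this cell yields `None`.
pub fn aggregate_cell(values: &[f32], n_features: usize, rows: &[usize]) -> Vec<Option<MinMax>> {
    let mut out: Vec<Option<MinMax>> = vec![None; n_features];
    for &row in rows {
        let start = row * n_features;
        for (f, &v) in values[start..start + n_features].iter().enumerate() {
            if v.is_nan() {
                continue; // null value: contributes nothing to min/max
            }
            out[f] = Some(match out[f] {
                None => MinMax { min: v, max: v },
                Some(m) => MinMax { min: m.min.min(v), max: m.max.max(v) },
            });
        }
    }
    out
}
```

Skipping NaN rather than propagating it keeps a single null row from wiping out a cell's range.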
@@ -110,7 +110,8 @@ Serves `frontend/dist/` as static fallback in production **only** when `--dist`
**Data representation (unified model):**
- All features (numeric and enum): row-major flat `Vec<f32>`, NaN = null
- Enum features: stored as f32 indices (0.0, 1.0, 2.0...) with `enum_values: FxHashMap<usize, Vec<String>>` mapping feature index → string values
- Enum features: stored as f32 indices (0.0, 1.0, 2.0...) with `enum_values: FxHashMap<usize, Vec<String>>` mapping feature index → string values. Raw u16 indices are used directly for distribution counting (no dequantization needed for enums).
- Enum distribution: `Aggregator` optionally tracks per-value counts via `EnumDist` struct (configured by `EnumDistConfig`). Emitted as `dist_FeatureName: [count_val0, count_val1, ...]` in hex/postcode responses when `enum_dist` param is set.
- String fields (address, postcode): interned/packed for memory efficiency
- All CLI args are required (no hidden defaults). Optional services use `Option<String>`: `r5_url` (travel time disabled when None), `pocketbase_admin_email`/`password` (collection auto-creation skipped when None). Required config like `gemini_model` and `public_url` must be explicitly provided via env or CLI.
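A minimal sketch of the unified model. It stores all values in one row-major `Vec<f32>` indexed as `row * n_features + feature`, reads NaN back as `None`, and resolves enum features through a per-feature label list. It uses `std::collections::HashMap` in place of `FxHashMap`, which has the same map interface with a different hasher. `FeatureTable` and `enum_label` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the server's in-memory feature store.
pub struct FeatureTable {
    pub n_features: usize,
    /// Row-major: value for (row, feature) lives at row * n_features + feature.
    pub values: Vec<f32>,
    /// Feature index -> string labels; only enum features have an entry.
    pub enum_values: HashMap<usize, Vec<String>>,
}

impl FeatureTable {
    /// Numeric read; NaN (null) becomes None.
    pub fn get(&self, row: usize, feature: usize) -> Option<f32> {
        let v = self.values[row * self.n_features + feature];
        if v.is_nan() { None } else { Some(v) }
    }

    /// Enum read: the stored f32 is an index (0.0, 1.0, ...) into enum_values[feature].
    /// Returns None for nulls, non-enum features, or out-of-range indices.
    pub fn enum_label(&self, row: usize, feature: usize) -> Option<&str> {
        let idx = self.get(row, feature)? as usize;
        self.enum_values.get(&feature)?.get(idx).map(String::as_str)
    }
}
```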
@@ -161,6 +162,7 @@ React 18 + TypeScript. deck.gl `H3HexagonLayer` over MapLibre GL. TailwindCSS. N
- `features.ts`: `groupFeaturesByCategory(features)` groups FeatureMeta[] by their `group` field.
- `format.ts`: `formatNumber(value, decimals)` for number formatting. `calculateHistogramMean(histogram)` for weighted mean calculation.
- `property-fields.ts`: `getNum(property, key)` for getting a single numeric property value. Takes exactly one key; no fallback names.
- `PieHexExtension.ts` — deck.gl `LayerExtension` that turns polygon fills into hexagonal pie charts. Injects GLSL that computes angle from fragment position to centroid, picks slice color from ENUM_PALETTE. See "deck.gl LayerExtension patterns" below.
When adding new UI, prefer using these shared components over inline implementations to maintain consistency.
@@ -171,6 +173,21 @@ When adding new UI, prefer using these shared components over inline implementat
- Extract to `lib/`: Pure functions used across components (formatting, calculations, lookups)
- Keep inline: One-off UI specific to a single component
**deck.gl LayerExtension patterns (CRITICAL — hard-won knowledge):**
Creating custom `LayerExtension`s that add per-instance attributes to the sublayers of CompositeLayers (H3HexagonLayer, PolygonLayer, GeoJsonLayer) requires following the exact canonical pattern. Getting any part wrong fails silently (attributes read as zero).
1. **`static defaultProps` with `type: 'accessor'`** — This is what tells `LayerExtension.getSubLayerProps()` to wrap accessors via `getSubLayerAccessor()`, which unwraps `__source.object` to reach the original data item through CompositeLayer sublayer chains. Without this, accessors receive `undefined` or binary data objects instead of the original data.
2. **`stepMode: 'dynamic'`** instead of `addInstanced()` — Use `am.add({...})` with `stepMode: 'dynamic'`, not `am.addInstanced({...})`. Dynamic step mode handles per-instance counting automatically for variable-geometry layers like SolidPolygonLayer.
3. **`isEnabled(layer)` must guard all hooks** — Check in `getShaders()` and `initializeState()`. For polygon fills, use `layer.id.endsWith('-fill')` to skip PathLayer (stroke) sublayers.
4. **Change layer ID when extensions change** — deck.gl recycles layers with the same ID. If you conditionally add/remove extensions, use a different layer ID (e.g., `'h3-hexagons-pie'` vs `'h3-hexagons'`) to force full teardown/rebuild. Otherwise `initializeState` never re-runs and attributes are never populated.
5. **Include `data` in updateTriggers for extension accessors** — When API data changes (e.g., new response with `dist_` fields), `colorTrigger` may not change. Include the `data` array reference in the extension accessor updateTriggers so the attribute manager re-runs the accessors on fresh data.
6. **FragmentGeometry only has `uv`**: In deck.gl v9's fragment shader, `geometry.position` does NOT exist. The `VertexGeometry` struct has `position`, `worldPosition`, `normal`, etc., but `FragmentGeometry` only has `uv`. To get the fragment position in the fragment shader, capture `geometry.position.xy` in the vertex shader into a custom varying.
7. **Binary attribute overrides go in `data.attributes`** — In deck.gl v9, `props.instanceFoo` is rejected with "has been removed". Use `data.attributes.instanceFoo` instead. But for extensions using the accessor pattern above, this isn't needed.
8. **`getSubLayerProps` only forwards whitelisted props** — Custom props (binary buffers, accessors) set on a CompositeLayer are NOT automatically forwarded to sublayers. The `defaultProps` + `getSubLayerProps()` mechanism in step 1 is the ONLY reliable way to get extension data through the chain.
See `PieHexExtension.ts` for a working example and `DataFilterExtension` / `FillStyleExtension` in `@deck.gl/extensions` for reference implementations.
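The per-fragment slice choice that `PieHexExtension` makes in GLSL can be prototyped on the CPU before touching shader code. This is a minimal sketch in Rust. It assumes slices run counter-clockwise from the +x axis in enum-index order, each spanning an angle proportional to its count. The real shader's start angle, winding (which depends on the y direction of the coordinate space), and palette lookup may differ. `pie_slice` is a hypothetical name; its returned index is what would select a color from ENUM_PALETTE.

```rust
use std::f32::consts::PI;

/// Which enum value's slice does a point fall in?
/// `frag` and `center` are 2D positions in the same space (e.g. the position
/// captured into a varying, and the hex centroid). `counts` is one
/// `dist_FeatureName` array. Returns None if all counts are zero.
pub fn pie_slice(frag: (f32, f32), center: (f32, f32), counts: &[u32]) -> Option<usize> {
    let total: u32 = counts.iter().sum();
    if total == 0 {
        return None;
    }
    // atan2 returns (-PI, PI]; rem_euclid maps it to [0, 2PI), then scale to a fraction of a turn.
    let angle = (frag.1 - center.1).atan2(frag.0 - center.0);
    let turn = angle.rem_euclid(2.0 * PI) / (2.0 * PI);
    // Walk cumulative fractions until the point's fraction is covered.
    let mut cumulative = 0.0_f32;
    for (i, &c) in counts.iter().enumerate() {
        cumulative += c as f32 / total as f32;
        if turn < cumulative {
            return Some(i);
        }
    }
    // Float rounding can leave `turn` at or just above the final sum: use the last non-empty slice.
    counts.iter().rposition(|&c| c > 0)
}
```

Zero-count values get zero-width slices, so they are never selected.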
**Component size guideline:** If a component exceeds ~300 lines, look for extraction opportunities. Large components are usually doing too much — split into hooks (for logic) and child components (for UI sections).
**Naming conventions:**
@@ -376,7 +393,7 @@ Follow these conventions in all Rust code:
- **POI transform validation**: Fails if any OSM category is unmapped — guarantees exhaustive coverage
- **Fuzzy join**: Groups by postcode, uses `thefuzz.token_sort_ratio` with numeric token compatibility, greedy assignment from highest score
- **Filter parsing is strict**: `parse_filters()` returns `Result` — malformed entries, unknown feature names, and unparseable numbers all return 400 Bad Request. No silent skipping of invalid filters.
- **Data loading is strict**: `extract_string_col` and `lookup_enum_value` take a single column name (no fallback names). H3 precomputation panics on invalid coordinates. All configured features (defined in `features.rs`) must exist in at least one data source — the server panics at startup if any are missing (no NaN placeholders). This means all pipeline steps must be complete before starting the server. Polars `diagonal: true` concat fills nulls for features that exist in some but not all sources (e.g. "Listing date" from listings only).
- **Data loading is strict**: `extract_string_col` and `lookup_enum_value` take a single column name (no fallback names). H3 precomputation panics on invalid coordinates. All configured features (defined in `features.rs`) must exist in the data — the server panics at startup if any are missing (no NaN placeholders). This means all pipeline steps must be complete before starting the server.
- **Travel time is strict**: `mode` param is required (400) when `destination` is set — no silent default to "car". R5 failures return 502 Bad Gateway, not silent omission. `r5_url` is `Option<String>` — returns 503 if travel time requested without R5 configured.
- **Filter bounds format**: `south,west,north,east`, which is not the standard bbox order (`west,south,east,north`, as used by GeoJSON)
- **Server-side AABB filtering**: Both `/api/hexagons` and `/api/postcodes` filter results by bounding-box intersection with query bounds. Hexagons use `h3_cell_bounds()` (h3o returns degrees, not radians). Postcodes compute polygon AABB from vertices. See `bounds_intersect()` in `parsing/bounds.rs`.
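Because the `bounds` order differs from the usual bbox convention, the parser and the AABB test are easy to get subtly wrong. This is a minimal sketch. It assumes `bounds` is exactly four comma-separated decimal degrees in `south,west,north,east` order and that no box crosses the antimeridian. `Bounds`, `parse_bounds`, and this `bounds_intersect` signature are illustrative, not copied from `parsing/bounds.rs`. Malformed input comes back as `Err` so a handler can map it to 400 Bad Request instead of skipping it.

```rust
/// Axis-aligned box in degrees. Hypothetical type for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Bounds {
    pub south: f64,
    pub west: f64,
    pub north: f64,
    pub east: f64,
}

/// Parse `south,west,north,east`. Malformed input is an error (-> 400),
/// never silently ignored.
pub fn parse_bounds(s: &str) -> Result<Bounds, String> {
    let parts: Vec<&str> = s.split(',').collect();
    if parts.len() != 4 {
        return Err(format!("bounds needs 4 values, got {}", parts.len()));
    }
    let mut v = [0.0_f64; 4];
    for (slot, p) in v.iter_mut().zip(&parts) {
        *slot = p.trim().parse::<f64>().map_err(|e| format!("bad bounds value {p:?}: {e}"))?;
    }
    let [south, west, north, east] = v;
    // Assumed sanity check for this sketch; the real server may not enforce it.
    if south > north || west > east {
        return Err("bounds must satisfy south <= north and west <= east".to_string());
    }
    Ok(Bounds { south, west, north, east })
}

/// Two boxes intersect iff they overlap on both axes (touching edges count).
pub fn bounds_intersect(a: &Bounds, b: &Bounds) -> bool {
    a.west <= b.east && b.west <= a.east && a.south <= b.north && b.south <= a.north
}
```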
@@ -384,6 +401,7 @@ Follow these conventions in all Rust code:
- **GridIndex returns slightly more than requested**: The 0.01° grid cells mean properties up to ~1km outside the viewport may be returned. The AABB filter in the route handlers catches these extras.
- **POI proximity**: Uses 0.05° grid (~5km cells) to reduce candidates before haversine distance check
- **OG tag injection**: Uses `<meta name="x-og-placeholder" content="__PERFECT_POSTCODES_OG_TAGS__"/>` placeholder in HTML, replaced at runtime by middleware
- **Enum distribution (pie charts)**: When `enum_dist=FeatureName` is set on `/api/hexagons` or `/api/postcodes`, each cell includes `dist_FeatureName: [count_for_val0, count_for_val1, ...]`. The `Aggregator` struct has optional `EnumDist` that counts raw u16 enum indices per cell. `parse_enum_dist()` in `parsing/fields.rs` validates the feature name and confirms it's an enum. On the frontend, `PieHexExtension` (LayerExtension) injects GLSL into SolidPolygonLayer's fragment shader: computes angle from fragment position to hex centroid (passed as `instancePieCenter` varying), picks slice color from ENUM_PALETTE. `useMapData` adds the `enum_dist` query param when `viewFeatureIsEnum` is true.
- **Dev invite code**: The code `devdevdevdev` is recognized as a valid admin invite in dev mode only (`state.index_html.is_none()`, i.e., `--dist` not passed). Both `get_invite` and `post_redeem_invite` short-circuit for this code, returning a fake valid admin invite / no-op "licensed" response without hitting PocketBase. Preview at `http://localhost:3001/invite/devdevdevdev`.
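The enum-distribution counting described above amounts to one counter array per cell, indexed by the raw enum index. This is a minimal sketch. It assumes each row's enum value arrives as `Option<u16>` (None for null) and that the number of distinct values is known from the feature's `enum_values` list. This `EnumDist` and its methods are illustrative and may not match the server's struct of the same name. The hand-built JSON also assumes feature names need no escaping; a real handler would more likely use a JSON serializer.

```rust
/// Per-cell value counts for one enum feature. Illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct EnumDist {
    counts: Vec<u32>,
}

impl EnumDist {
    /// `n_values` = number of labels in the feature's enum_values list.
    pub fn new(n_values: usize) -> Self {
        EnumDist { counts: vec![0; n_values] }
    }

    /// Count one row. Nulls are skipped. An index outside the label list
    /// panics on the slice index, treating it as a data bug.
    pub fn add(&mut self, raw: Option<u16>) {
        if let Some(i) = raw {
            self.counts[i as usize] += 1;
        }
    }

    /// Render as the `"dist_FeatureName":[...]` field of a cell object.
    pub fn to_json_field(&self, feature: &str) -> String {
        let items: Vec<String> = self.counts.iter().map(|c| c.to_string()).collect();
        format!("\"dist_{}\":[{}]", feature, items.join(","))
    }
}
```

Using the raw index as the array position means no dequantization or string lookup happens in the hot loop. Labels are only needed on the frontend, which maps position `i` to both the label and the palette color.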
## Rust Performance Patterns (server-rs)