Use self-hosted tiles

Andras Schmelczer 2026-02-03 21:44:49 +00:00
parent 1cee9c38ce
commit 69de6d75af
6 changed files with 342 additions and 21 deletions

@@ -83,12 +83,11 @@ The server and frontend must handle these human-readable names. See the full ren
Rust + Axum. Loads parquet into memory at startup.
**Structure:**
- `data/property.rs` — Loads `wide.parquet`, auto-discovers numeric + enum features, computes histograms, sorts rows by spatial locality, precomputes H3 cells (resolutions 4–12)
- `data/poi.rs` — Loads `filtered_uk_pois.parquet`
- `index.rs` — `GridIndex`: 0.01° spatial grid for O(1) cell lookup
- `filter.rs` — Parses filter strings and checks rows. Format: `name:min:max` (numeric), `name:val1|val2` (enum)
- `routes/` — One file per endpoint
**Structure** (uses Rust 2018 module style — `foo.rs` + `foo/` directory, not `foo/mod.rs`):
- `data.rs` + `data/` — Property and POI data loading
- `parsing.rs` + `parsing/` — Filter parsing and bounds parsing
- `routes.rs` + `routes/` — One file per endpoint
- `utils.rs` + `utils/` — GridIndex, hashing, interned columns
- `consts.rs` — Key constants (histogram bins, H3 range, max enum cardinality, excluded columns)
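
As a rough illustration of the filter string format (`name:min:max` for numeric features, `name:val1|val2` for enum features), the sketch below parses one filter string into a typed value. The names `Filter` and `parse_filter` are hypothetical and not taken from the server code, and the error handling (returning `None` on malformed input) is an assumption.

```rust
/// A parsed filter: a numeric range or a set of allowed enum values.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub enum Filter {
    Range { name: String, min: f32, max: f32 },
    OneOf { name: String, values: Vec<String> },
}

/// Parses `name:min:max` (numeric) or `name:val1|val2` (enum).
/// Returns `None` for malformed input.
pub fn parse_filter(s: &str) -> Option<Filter> {
    let mut parts = s.splitn(3, ':');
    let name = parts.next().filter(|n| !n.is_empty())?.to_string();
    let second = parts.next()?;
    match parts.next() {
        // Three segments: treat as a numeric range.
        Some(third) => {
            let min: f32 = second.parse().ok()?;
            let max: f32 = third.parse().ok()?;
            Some(Filter::Range { name, min, max })
        }
        // Two segments: treat as enum values separated by '|'.
        None => {
            let values: Vec<String> = second
                .split('|')
                .filter(|v| !v.is_empty())
                .map(str::to_string)
                .collect();
            if values.is_empty() {
                None
            } else {
                Some(Filter::OneOf { name, values })
            }
        }
    }
}
```

With this sketch, the number of `:`-separated segments alone decides whether a filter is numeric or enum, so a single-value enum filter such as `tenure:freehold` still parses as an enum filter.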
**API endpoints:**
@@ -100,10 +99,10 @@ Rust + Axum. Loads parquet into memory at startup.
Serves `frontend/dist/` as static fallback in production.
**Data representation:**
- Numeric features: row-major flat `Vec<f64>`, NaN = null
- Enum features: `Vec<u8>` indices into value list, 255 = null
- String fields (address, postcode): `Vec<String>`, empty = null
**Data representation (unified model):**
- All features (numeric and enum): row-major flat `Vec<f32>`, NaN = null
- Enum features: stored as f32 indices (0.0, 1.0, 2.0...) with `enum_values: FxHashMap<usize, Vec<String>>` mapping feature index → string values
- String fields (address, postcode): interned/packed for memory efficiency
- The server accepts the parquet path as a CLI argument (defaults to `data_sources/processed/wide.parquet`)
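
A minimal sketch of the unified model, assuming a row-major `Vec<f32>` with NaN as null and a per-feature table of enum labels. The type and method names (`FeatureTable`, `Value`, `get`) are hypothetical, and `std::collections::HashMap` stands in for `FxHashMap` so the example has no external dependencies.

```rust
use std::collections::HashMap;

/// Hypothetical container illustrating the unified layout.
pub struct FeatureTable {
    pub num_features: usize,
    /// Row-major: data[row * num_features + feat_idx]; NaN means null.
    pub data: Vec<f32>,
    /// Feature index -> enum labels; present only for enum features.
    pub enum_values: HashMap<usize, Vec<String>>,
}

/// A decoded cell value.
#[derive(Debug, PartialEq)]
pub enum Value<'a> {
    Null,
    Number(f32),
    Label(&'a str),
}

impl FeatureTable {
    /// Reads one cell and decodes it according to the feature's kind.
    pub fn get(&self, row: usize, feat_idx: usize) -> Value<'_> {
        let raw = self.data[row * self.num_features + feat_idx];
        if raw.is_nan() {
            return Value::Null;
        }
        match self.enum_values.get(&feat_idx) {
            // Enum feature: the f32 is an index into the label list.
            Some(labels) => labels
                .get(raw as usize)
                .map(|l| Value::Label(l.as_str()))
                .unwrap_or(Value::Null),
            // Numeric feature: the f32 is the value itself.
            None => Value::Number(raw),
        }
    }
}
```

Both feature kinds share one read path here; only the final decode step branches on whether the feature index has an entry in `enum_values`.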
### Frontend (`frontend/`)
@@ -216,14 +215,49 @@ Every UI element must use the correct token from this table. Do not invent new p
- [ ] Sidebars, dropdowns, and popups are readable in both modes
- [ ] HomePage and DataSourcesPage adapt correctly
## Coding Preferences
- **Unified data models over special-casing**: Prefer storing different data types uniformly (e.g., enums as f32 indices alongside numeric features) rather than maintaining separate code paths
- **Terse tests**: Test what matters in as few tests as possible — don't overcomplicate with excessive setup or edge cases that don't add value
- **Extract and organize**: Group related utilities into proper modules (e.g., `utils/`, `parsing/`) rather than leaving helpers scattered
- **Inline module tests**: Place `#[cfg(test)] mod tests { }` at the bottom of each module file rather than in separate test files
## Rust Code Style (server-rs)
Follow these conventions in all Rust code:
1. **Module style**: Use Rust 2018 module naming — `foo.rs` + `foo/` directory, NOT `foo/mod.rs`
2. **Imports over inline paths**: Import items at the top of the file, don't use `crate::` inline in code
```rust
// Good
use crate::utils::generate_priorities;
let p = generate_priorities(n);
// Bad
let p = crate::utils::generate_priorities(n);
```
3. **Tracing macros**: Import and use short form, not fully qualified
```rust
// Good
use tracing::{info, warn};
info!("message");
// Bad
tracing::info!("message");
```
4. **JSON serialization**: Use `serde_json` with `#[derive(Serialize)]` structs, not manual string building
5. **Precompute at startup**: For static/rarely-changing responses, compute once at startup and store in `AppState`
6. **Unique placeholders**: When injecting content into HTML, use distinctive markers like `__NARROWIT_OG_TAGS__` that won't accidentally match other content
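
To illustrate convention 6, here is a small sketch of placeholder substitution using only the standard library. It assumes the marker sits inside a placeholder `<meta>` element that is swapped out whole; the function name `inject_og_tags` and the "exactly one occurrence" check are assumptions, not the server's actual middleware.

```rust
/// Placeholder element expected in the HTML template (assumed form).
const OG_PLACEHOLDER: &str =
    r#"<meta name="x-og-placeholder" content="__NARROWIT_OG_TAGS__"/>"#;

/// Replaces the placeholder element with the given tags.
/// Returns `None` unless the placeholder occurs exactly once, so a
/// missing or duplicated marker is caught instead of silently ignored.
pub fn inject_og_tags(html: &str, og_tags: &str) -> Option<String> {
    if html.matches(OG_PLACEHOLDER).count() != 1 {
        return None;
    }
    Some(html.replacen(OG_PLACEHOLDER, og_tags, 1))
}
```

A distinctive marker makes the occurrence check meaningful: a generic token such as `{{tags}}` could plausibly appear elsewhere in bundled scripts or styles.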
## Key Implementation Details
- **Spatial sort**: Rows sorted by 0.01° grid cell at load time for cache-friendly sequential access
- **Row-major layout**: `feature_data[row * num_features + feat_idx]` — all features for one property are contiguous
- **Row-major layout**: `feature_data[row * num_features + feat_idx]` — all features (numeric and enum) for one property are contiguous
- **H3 precomputation**: Resolutions 4–12 computed in parallel (rayon) at startup
- **Histogram percentiles without sorting**: O(n) two-pass algorithm — build histogram, interpolate percentiles
- **Direct JSON writing**: Hexagon endpoint writes JSON via string buffer, avoids serde_json::Value allocations
- **Startup precomputation**: Static responses (like `/api/features`) are computed once at startup and cached in `AppState`
- **POI transform validation**: Fails if any OSM category is unmapped — guarantees exhaustive coverage
- **Fuzzy join**: Groups by postcode, uses `thefuzz.token_sort_ratio` with numeric token compatibility, greedy assignment from highest score
- **Filter bounds format**: `south,west,north,east` (not standard bbox order)
- **POI proximity**: Uses 0.05° grid (~5km cells) to reduce candidates before haversine distance check
- **OG tag injection**: Uses `<meta name="x-og-placeholder" content="__NARROWIT_OG_TAGS__"/>` placeholder in HTML, replaced at runtime by middleware
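
The histogram-based percentile idea listed above can be sketched as follows. This is one possible reading, not the project's implementation: the first pass finds the value range, the second pass fills fixed-width bins, and the percentile is linearly interpolated inside the bin where the cumulative count crosses the target. The function name `approx_percentile` and its parameters are hypothetical.

```rust
/// Approximates the p-th percentile (p in 0.0..=1.0) of the non-NaN
/// values in O(n) time, without sorting. Accuracy is limited by the
/// bin width.
pub fn approx_percentile(values: &[f32], p: f64, bins: usize) -> Option<f32> {
    // Pass 1: range and count of non-null values.
    let (mut lo, mut hi, mut n) = (f32::INFINITY, f32::NEG_INFINITY, 0usize);
    for &v in values {
        if !v.is_nan() {
            lo = lo.min(v);
            hi = hi.max(v);
            n += 1;
        }
    }
    if n == 0 || bins == 0 {
        return None;
    }
    if lo == hi {
        return Some(lo);
    }

    // Pass 2: fill fixed-width bins; the maximum lands in the last bin.
    let width = (hi - lo) as f64 / bins as f64;
    let mut counts = vec![0usize; bins];
    for &v in values {
        if !v.is_nan() {
            let b = (((v - lo) as f64 / width) as usize).min(bins - 1);
            counts[b] += 1;
        }
    }

    // Walk cumulative counts and interpolate inside the crossing bin.
    let target = p.clamp(0.0, 1.0) * n as f64;
    let mut cum = 0.0;
    for (i, &c) in counts.iter().enumerate() {
        let next = cum + c as f64;
        if c > 0 && next >= target {
            let frac = (target - cum) / c as f64;
            return Some((lo as f64 + (i as f64 + frac) * width) as f32);
        }
        cum = next;
    }
    Some(hi)
}
```

The result is approximate: the error is bounded by one bin width, which is the trade-off for avoiding an O(n log n) sort over every feature column.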