Use self-hosted tiles
parent 1cee9c38ce
commit 69de6d75af
6 changed files with 342 additions and 21 deletions
58
CLAUDE.md
@@ -83,12 +83,11 @@ The server and frontend must handle these human-readable names. See the full ren

Rust + Axum. Loads parquet into memory at startup.

**Structure:**
- `data/property.rs` — Loads `wide.parquet`, auto-discovers numeric + enum features, computes histograms, sorts rows by spatial locality, precomputes H3 cells (resolutions 4–12)
- `data/poi.rs` — Loads `filtered_uk_pois.parquet`
- `index.rs` — `GridIndex`: 0.01° spatial grid for O(1) cell lookup
- `filter.rs` — Parses filter strings and checks rows. Format: `name:min:max` (numeric), `name:val1|val2` (enum)
- `routes/` — One file per endpoint
**Structure** (uses Rust 2018 module style — `foo.rs` + `foo/` directory, not `foo/mod.rs`):
- `data.rs` + `data/` — Property and POI data loading
- `parsing.rs` + `parsing/` — Filter parsing and bounds parsing
- `routes.rs` + `routes/` — One file per endpoint
- `utils.rs` + `utils/` — GridIndex, hashing, interned columns
- `consts.rs` — Key constants (histogram bins, H3 range, max enum cardinality, excluded columns)
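
As a rough illustration of the filter string format mentioned in this hunk (`name:min:max` for numeric features, `name:val1|val2` for enum features), a parser could look like the sketch below. The `Filter` enum and `parse_filter` function are hypothetical names, not the actual contents of `parsing/`, and the sketch assumes feature names never contain `:`.

```rust
/// Hypothetical parsed form of one filter string.
#[derive(Debug, PartialEq)]
enum Filter {
    /// `name:min:max`
    Numeric { name: String, min: f32, max: f32 },
    /// `name:val1|val2`
    Enum { name: String, values: Vec<String> },
}

/// Parse a single filter string, returning `None` on malformed input.
/// Assumption: feature names do not contain ':'.
fn parse_filter(s: &str) -> Option<Filter> {
    let parts: Vec<&str> = s.split(':').collect();
    match parts.as_slice() {
        // Three fields: numeric range filter.
        [name, min, max] if !name.is_empty() => Some(Filter::Numeric {
            name: name.to_string(),
            min: min.parse().ok()?,
            max: max.parse().ok()?,
        }),
        // Two fields: enum filter with '|'-separated allowed values.
        [name, vals] if !name.is_empty() => {
            let values: Vec<String> = vals.split('|').map(str::to_string).collect();
            // Reject empty values such as "tenure:" or "tenure:a|".
            if values.iter().any(|v| v.is_empty()) {
                return None;
            }
            Some(Filter::Enum { name: name.to_string(), values })
        }
        _ => None,
    }
}
```

Returning `Option` keeps the sketch short; a real endpoint would probably want an error type that says which field failed to parse.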

**API endpoints:**

@@ -100,10 +99,10 @@ Rust + Axum. Loads parquet into memory at startup.

Serves `frontend/dist/` as static fallback in production.

**Data representation:**
- Numeric features: row-major flat `Vec<f64>`, NaN = null
- Enum features: `Vec<u8>` indices into value list, 255 = null
- String fields (address, postcode): `Vec<String>`, empty = null
**Data representation (unified model):**
- All features (numeric and enum): row-major flat `Vec<f32>`, NaN = null
- Enum features: stored as f32 indices (0.0, 1.0, 2.0...) with `enum_values: FxHashMap<usize, Vec<String>>` mapping feature index → string values
- String fields (address, postcode): interned/packed for memory efficiency
- The server accepts the parquet path as a CLI argument (defaults to `data_sources/processed/wide.parquet`)
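
To make the unified model concrete, here is a minimal sketch of how a single cell might be read back from the flat `Vec<f32>`. It is an illustration under assumptions, not the server's code: `FeatureTable`, `Cell`, `get`, and `sample_table` are hypothetical names, the feature names in the sample are invented, and `std::collections::HashMap` stands in for `FxHashMap` so the example needs no external crate.

```rust
use std::collections::HashMap;

/// Hypothetical container following the unified model described above.
struct FeatureTable {
    num_features: usize,
    /// Row-major: all features of one row are contiguous; NaN = null.
    feature_data: Vec<f32>,
    /// Feature index -> enum labels (std HashMap used in place of FxHashMap).
    enum_values: HashMap<usize, Vec<String>>,
}

/// Decoded view of one cell.
#[derive(Debug, PartialEq)]
enum Cell<'a> {
    Null,
    Number(f32),
    Label(&'a str),
}

impl FeatureTable {
    /// Read one cell; enum features are decoded through `enum_values`.
    fn get(&self, row: usize, feat_idx: usize) -> Cell<'_> {
        let raw = self.feature_data[row * self.num_features + feat_idx];
        if raw.is_nan() {
            return Cell::Null;
        }
        match self.enum_values.get(&feat_idx) {
            // Enum feature: the f32 holds an index into the label list.
            Some(labels) => labels
                .get(raw as usize)
                .map_or(Cell::Null, |s| Cell::Label(s.as_str())),
            // Numeric feature: the f32 is the value itself.
            None => Cell::Number(raw),
        }
    }
}

/// Two rows, two features: 0 = numeric (say, price), 1 = enum (say, tenure).
fn sample_table() -> FeatureTable {
    let mut enum_values = HashMap::new();
    enum_values.insert(1, vec!["Freehold".to_string(), "Leasehold".to_string()]);
    FeatureTable {
        num_features: 2,
        feature_data: vec![250000.0, 1.0, f32::NAN, 0.0],
        enum_values,
    }
}
```

With this layout the numeric and enum paths share one null check (`is_nan`) and one indexing expression, which is the point of storing enums as f32 indices.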

### Frontend (`frontend/`)

@@ -216,14 +215,49 @@ Every UI element must use the correct token from this table. Do not invent new p
- [ ] Sidebars, dropdowns, and popups are readable in both modes
- [ ] HomePage and DataSourcesPage adapt correctly

## Coding Preferences

- **Unified data models over special-casing**: Prefer storing different data types uniformly (e.g., enums as f32 indices alongside numeric features) rather than maintaining separate code paths
- **Terse tests**: Test what matters in as few tests as possible — don't overcomplicate with excessive setup or edge cases that don't add value
- **Extract and organize**: Group related utilities into proper modules (e.g., `utils/`, `parsing/`) rather than leaving helpers scattered
- **Inline module tests**: Place `#[cfg(test)] mod tests { }` at the bottom of each module file rather than in separate test files

## Rust Code Style (server-rs)

Follow these conventions in all Rust code:

1. **Module style**: Use Rust 2018 module naming — `foo.rs` + `foo/` directory, NOT `foo/mod.rs`
2. **Imports over inline paths**: Import items at the top of the file, don't use `crate::` inline in code
```rust
// Good
use crate::utils::generate_priorities;
let p = generate_priorities(n);

// Bad
let p = crate::utils::generate_priorities(n);
```
3. **Tracing macros**: Import and use short form, not fully qualified
```rust
// Good
use tracing::{info, warn};
info!("message");

// Bad
tracing::info!("message");
```
4. **JSON serialization**: Use `serde_json` with `#[derive(Serialize)]` structs, not manual string building
5. **Precompute at startup**: For static/rarely-changing responses, compute once at startup and store in `AppState`
6. **Unique placeholders**: When injecting content into HTML, use distinctive markers like `__NARROWIT_OG_TAGS__` that won't accidentally match other content
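
A minimal sketch of the placeholder convention in item 6, using only the standard library. The placeholder tag is the one quoted under Key Implementation Details; `inject_og_tags` and `escape_attr` are hypothetical helper names, and replacing the whole `<meta>` tag (rather than only the marker text) is an assumption about how the middleware behaves.

```rust
/// Placeholder tag quoted in this document. The marker is distinctive enough
/// that it should not collide with real page content.
const OG_PLACEHOLDER: &str =
    r#"<meta name="x-og-placeholder" content="__NARROWIT_OG_TAGS__"/>"#;

/// Escape text for use inside a double-quoted HTML attribute.
fn escape_attr(s: &str) -> String {
    s.replace('&', "&amp;")
        .replace('"', "&quot;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
}

/// Hypothetical helper: swap the placeholder tag for real Open Graph tags.
/// If the placeholder is absent, the HTML is returned unchanged.
fn inject_og_tags(html: &str, title: &str, description: &str) -> String {
    let tags = format!(
        r#"<meta property="og:title" content="{}"/><meta property="og:description" content="{}"/>"#,
        escape_attr(title),
        escape_attr(description)
    );
    // Replace only the first occurrence; the marker should appear once.
    html.replacen(OG_PLACEHOLDER, &tags, 1)
}
```

Because the marker cannot plausibly occur in ordinary page text, a plain string replacement is enough here and no HTML parsing is needed.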

## Key Implementation Details

- **Spatial sort**: Rows sorted by 0.01° grid cell at load time for cache-friendly sequential access
- **Row-major layout**: `feature_data[row * num_features + feat_idx]` — all features for one property are contiguous
- **Row-major layout**: `feature_data[row * num_features + feat_idx]` — all features (numeric and enum) for one property are contiguous
- **H3 precomputation**: Resolutions 4–12 computed in parallel (rayon) at startup
- **Histogram percentiles without sorting**: O(n) two-pass algorithm — build histogram, interpolate percentiles
- **Direct JSON writing**: Hexagon endpoint writes JSON via string buffer, avoids serde_json::Value allocations
- **Startup precomputation**: Static responses (like `/api/features`) are computed once at startup and cached in `AppState`
- **POI transform validation**: Fails if any OSM category is unmapped — guarantees exhaustive coverage
- **Fuzzy join**: Groups by postcode, uses `thefuzz.token_sort_ratio` with numeric token compatibility, greedy assignment from highest score
- **Filter bounds format**: `south,west,north,east` (not standard bbox order)
- **POI proximity**: Uses 0.05° grid (~5km cells) to reduce candidates before haversine distance check
- **OG tag injection**: Uses `<meta name="x-og-placeholder" content="__NARROWIT_OG_TAGS__"/>` placeholder in HTML, replaced at runtime by middleware
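
The "histogram percentiles without sorting" item can be sketched as follows. This is one plausible reading of "build histogram, interpolate percentiles", not the server's actual routine: `histogram_percentile` and its `bins` parameter are hypothetical, inputs are assumed to be finite or NaN, and the result is approximate to within about one bin width.

```rust
/// Approximate the `p`-th percentile (0..=100) of `values` in O(n) without
/// sorting. NaN entries are treated as null and skipped.
fn histogram_percentile(values: &[f32], p: f32, bins: usize) -> Option<f32> {
    // Pass 1: find the range and count of non-null values.
    let (mut min, mut max, mut n) = (f32::INFINITY, f32::NEG_INFINITY, 0usize);
    for &v in values {
        if v.is_nan() {
            continue;
        }
        min = min.min(v);
        max = max.max(v);
        n += 1;
    }
    if n == 0 || bins == 0 {
        return None;
    }
    if min == max {
        return Some(min); // all values identical
    }

    // Pass 2: count values per equal-width bin.
    let width = (max - min) / bins as f32;
    let mut counts = vec![0usize; bins];
    for &v in values {
        if v.is_nan() {
            continue;
        }
        let b = (((v - min) / width) as usize).min(bins - 1);
        counts[b] += 1;
    }

    // Walk the cumulative counts to the bin holding the target rank, then
    // interpolate linearly inside that bin.
    let target = p.clamp(0.0, 100.0) / 100.0 * n as f32;
    let mut cum = 0usize;
    for (b, &c) in counts.iter().enumerate() {
        if c > 0 && (cum + c) as f32 >= target {
            let frac = (target - cum as f32) / c as f32;
            return Some(min + (b as f32 + frac) * width);
        }
        cum += c;
    }
    Some(max)
}
```

More bins tighten the estimate at the cost of a larger counts array; the two passes over the data stay linear either way.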