perfect-postcode/pipeline/transform/postcode_boundaries/__init__.py

"""Generate postcode boundary polygons from OA boundaries, INSPIRE parcels, and UPRN data.

Produces per-district GeoJSON files compatible with the Rust server's postcode loader.
Each postcode gets a polygon (or MultiPolygon) guaranteed to be contained within its
Output Area(s), with 100% OA coverage and no overlaps between postcodes within an OA.

Algorithm per OA:
  1. Single-postcode OA → entire OA polygon assigned to that postcode
  2. Multi-postcode OA:
     a. Assign INSPIRE parcels to postcodes via UPRN point-in-polygon majority vote
     b. Union INSPIRE parcels per postcode, clip to OA → "claimed" area
     c. Distribute remaining (unclaimed) OA area via Voronoi of UPRN points
     d. Final polygon = claimed + Voronoi share

Memory-efficient design (<12GB total):
  - INSPIRE polygons stored as raw coordinate bytes in parquet; Shapely objects built
    lazily per-OA via numpy bbox pre-filter (~100-500 candidates at a time)
  - UPRNs kept as sorted polars DataFrame with offset dict (Arrow storage, ~1.2GB)
  - OA processing runs sequentially (no multiprocess INSPIRE duplication)

Output format: {output}/units/{DISTRICT}.geojson with properties.postcodes and
properties.mapit_code fields matching server-rs/src/data/postcodes.rs expectations.
"""