More posts

Andras Schmelczer 2026-05-28 08:46:39 +01:00
parent 7d0f895074
commit 1b0a5c0b5d
20 changed files with 551 additions and 45 deletions

@@ -15,8 +15,8 @@
"purpose": "maskable"
}
],
"theme_color": "#fbfaf7",
"background_color": "#fbfaf7",
"theme_color": "#201f1d",
"background_color": "#201f1d",
"display": "standalone",
"start_url": "/",
"scope": "/"

@@ -6,8 +6,12 @@ const year = new Date().getFullYear();
<footer class="site-footer">
<div class="footer-meta">
<span>© {year} {site.name}</span>
{/* address wraps only the author's contact details, per HTML spec. */}
<span class="footer-copyright">
<span>&copy;</span>
<span>{year}</span>
<span class="footer-name">{site.name}</span>
</span>
{/* address marks only the author's contact details, per HTML spec. */}
<address class="footer-contact">
<a href={`mailto:${site.email}`}>Email</a>
<a href={site.cv} rel="noopener">CV</a>

@@ -102,7 +102,7 @@ const headerNavItems = navItems.filter((item) => item.href !== '/' && !item.foot
if (!switcher) return;
// Keep in sync with --color-bg in global.css and theme-init.js.
var THEME_BG = { light: '#fbfaf7', dark: '#151514' };
var THEME_BG = { light: '#fbfaf7', dark: '#201f1d' };
var themeColorMetas = document.querySelectorAll('meta[name="theme-color"]');
function sync(theme) {

Binary file not shown (image, 47 KiB).

Binary file not shown (image, 3.4 MiB).

Binary file not shown (image, 377 KiB).

@@ -0,0 +1,102 @@
---
title: Backing Up Running Databases Without Stopping Them
description: A Bash container around BorgBackup. BTRFS snapshots give atomic consistency, numeric env vars give multi-target 3-2-1, and the scheduler is a sleep loop, not cron.
date: 2026-05-29
draft: true
projectPeriod: '2024-2026'
thumbnail:
src: ./_assets/backup.png
alt: Placeholder thumbnail for the backup container post.
tags: ['systems', 'tools']
role: Container and script author
stack: ['Bash', 'BorgBackup', 'BTRFS', 'Alpine', 'Docker', 'SSH', 'zstd']
scale: One container, multiple targets per host, two years of restored incidents
outcome: A self-hosted backup that has survived every actual incident I've thrown at it
audience: technical
links:
- label: Source
url: https://github.com/schmelczer/backup-container
- label: Container image
url: https://github.com/schmelczer/backup-container/pkgs/container/backup-container
---
**The short version:**
- One Alpine container, ~75 lines of Bash, that snapshots a BTRFS volume and pushes the snapshot to one or more [Borg](https://borgbackup.readthedocs.io/) repositories on a fixed interval. The snapshot is the only thing standing between "consistent backup" and "corrupt database in the archive."
- Multi-target via numeric env vars (`BORG_REPO_0`, `BORG_REPO_1`, ...). The wrapper iterates until the next index isn't set. No config format, no DSL — the env file is the configuration.
- Two years of self-hosting, multiple restored incidents, zero data loss I noticed.
## The problem the snapshot solves
I self-host several databases that are mid-write at every moment of the day. `tar | borg create` against the live volume is a race: a Postgres or SQLite file that's half-written when borg reads it goes into the archive in a state nothing on Earth can replay. The "right" answer is to coordinate a quiesce with every database — a fan-out of `pg_dump`, SQLite `.backup`, Redis `BGSAVE`, and so on, all with retry, timeouts, and per-app credentials.
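The cheaper answer, if you've put everything on one BTRFS volume, is `btrfs subvolume snapshot`. It returns almost instantly with a copy-on-write fork of the subvolume. Nested subvolumes are not included and show up as empty directories. Every file in it is frozen at the same instant, so the snapshot is crash-consistent: each database finds exactly the state it would see after a power cut, which is the state its WAL or journal is built to recover from. Run borg against the snapshot, not against the live volume.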
```bash
btrfs subvolume snapshot /btrfs-root /snapshot
cd "/snapshot/btrfs-root${BACKUP_RELATIVE_PATH:-}"
borg create ... ::"{hostname}-{now:%Y-%m-%dT%H:%M:%S}" .
```
The snapshot lives only for the duration of the borg run. A `trap cleanup EXIT` deletes the subvolume whether the backup succeeded, failed, or was interrupted. SIGKILL is the one exit no trap can catch. The next run snapshots fresh.
This shifts the entire correctness argument from "did I quiesce every database in time" to "does BTRFS give me a consistent snapshot." It does. That's why everything built on top of it can be a shell script.
## Multi-target as numeric env vars
The 3-2-1 backup rule wants three copies, on two different media, with one offsite. My answer is a remote (rsync.net) and a local HDD, both fed from the same snapshot. Together with the live data, that makes three copies. The wire format for "multiple targets" is just numbered env vars:
```sh
BORG_PASSPHRASE_0=...
BORG_REMOTE_PATH_0=borg1
BORG_REPO_0=username@username.rsync.net:~/backup
BORG_PASSPHRASE_1=...
BORG_REPO_1=/local-backup
```
`backup-wrapper.sh` loops `index=0` upward, exports `BORG_PASSPHRASE` / `BORG_REPO` / `BORG_REMOTE_PATH` from the indexed copies, runs `backup.sh`, unsets them, increments. Stops the first time the next index has no passphrase.
There's also a no-index fallback (`BORG_REPO=...` with no number) for the single-target case. Same script, no extra config plane.
I keep coming back to this pattern for small-system orchestration. The env file *is* the data structure. There's no YAML parsing, no JSON schema, no config-validation layer between you and the variable that actually matters.
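A minimal sketch of that iteration, written in Rust rather than the wrapper's Bash. It assumes the variables have already been read into a map, so the logic can be tested without touching the process environment. `Target` and `collect_targets` are hypothetical names. Applying the unnumbered fallback only when there is no `_0` passphrase is my assumption about how the two modes interact.

```rust
use std::collections::HashMap;

/// One Borg destination resolved from the indexed env vars (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Target {
    pub repo: String,
    pub passphrase: String,
    pub remote_path: Option<String>,
}

/// Walks BORG_PASSPHRASE_0, _1, ... and stops at the first index with no passphrase.
/// Falls back to the unnumbered variables when no indexed target exists (assumption).
pub fn collect_targets(env: &HashMap<String, String>) -> Vec<Target> {
    let get = |key: String| env.get(&key).cloned();
    let mut targets = Vec::new();
    let mut index = 0;
    while let Some(passphrase) = get(format!("BORG_PASSPHRASE_{index}")) {
        targets.push(Target {
            // A missing repo becomes an empty string, which borg itself would reject.
            repo: get(format!("BORG_REPO_{index}")).unwrap_or_default(),
            passphrase,
            remote_path: get(format!("BORG_REMOTE_PATH_{index}")),
        });
        index += 1;
    }
    // Single-target fallback: plain BORG_REPO / BORG_PASSPHRASE, no number.
    if targets.is_empty() {
        if let (Some(repo), Some(passphrase)) =
            (get("BORG_REPO".to_string()), get("BORG_PASSPHRASE".to_string()))
        {
            targets.push(Target {
                repo,
                passphrase,
                remote_path: get("BORG_REMOTE_PATH".to_string()),
            });
        }
    }
    targets
}
```

The gap rule falls out of the loop. An orphaned `BORG_REPO_3` with no `_2` is never reached.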
## The scheduler is a sleep, not cron
```bash
while true; do
/src/backup-wrapper.sh 2>&1 | log_message
sleep "$SLEEP_TIME"
done
```
A comment in the file says it out loud: "Using a simple sleep loop to schedule backups instead of cron to avoid concurrency issues." Cron with a one-hour cadence and a backup that occasionally takes 70 minutes will eventually overlap itself. The sleep-loop can't: the next run starts when the previous one is done, plus the interval. One process, one snapshot, one borg invocation. Concurrency bugs you can't have are concurrency bugs you don't have.
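The overlap argument is plain arithmetic, so a toy simulation can check it. This hypothetical Rust sketch works in minutes. Cron starts run *i* at `i * period` no matter what the previous run is doing. The sleep loop starts each run `sleep` minutes after the previous one finished. The durations in the usage are invented for illustration.

```rust
/// Start times (minutes) under a fixed cron cadence: run i starts at i * period.
pub fn cron_starts(durations: &[u64], period: u64) -> Vec<u64> {
    (0..durations.len() as u64).map(|i| i * period).collect()
}

/// Start times under a sleep loop: next start = previous start + duration + sleep.
pub fn sleep_loop_starts(durations: &[u64], sleep: u64) -> Vec<u64> {
    let mut starts = Vec::with_capacity(durations.len());
    let mut t = 0;
    for &d in durations {
        starts.push(t);
        t += d + sleep;
    }
    starts
}

/// True if any run is still going when the next one starts.
pub fn overlaps(starts: &[u64], durations: &[u64]) -> bool {
    starts
        .windows(2)
        .zip(durations)
        .any(|(pair, &d)| pair[0] + d > pair[1])
}
```

Take runs of 40, 70 and 40 minutes on a 60-minute cadence. Cron starts the third run while the second is still going. The sleep loop just drifts later.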
## Healthcheck is a file mtime
`borg create` succeeded? Write `date > /health/backup_completion_time.log`. The Docker healthcheck runs every 10 seconds and compares that file's mtime against `MAX_BACKUP_AGE_SECONDS` (default 86400, one day). If the file is older than that, the container is marked unhealthy, and whatever's watching containers (in my case a notification hook) finds out.
Two subtleties worth naming:
- **First-boot grace period.** If `backup_completion_time.log` doesn't exist yet (fresh container, first backup still running), fall back to `container_start_time.log` so the container isn't reported unhealthy during the first scheduled run.
- **Partial success is not success.** In multi-target mode, the completion log is only written if *every* target succeeded. One repo failing keeps the healthcheck red even if the others are fine. Stale-but-quiet was the failure mode I wanted to make impossible.
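A sketch of the healthcheck decision, written in Rust for illustration. The container does this in shell, and the function names here are hypothetical. The pure `is_healthy` takes timestamps so that the first-boot fallback can be tested. `check` reads real mtimes through `std::fs::metadata(..).modified()`. The partial-success rule doesn't appear because it lives on the write side: the completion file simply isn't written unless every target succeeded.

```rust
use std::fs;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// Healthy if the reference timestamp is no older than `max_age`.
/// `completion` is the mtime of the last fully successful run, if any.
/// `start` is the container start time, used as a first-boot grace period.
pub fn is_healthy(
    now: SystemTime,
    completion: Option<SystemTime>,
    start: Option<SystemTime>,
    max_age: Duration,
) -> bool {
    match completion.or(start) {
        // A timestamp in the future (clock skew) counts as age zero.
        Some(t) => now.duration_since(t).unwrap_or(Duration::ZERO) <= max_age,
        // Neither file exists: nothing proves a recent backup.
        None => false,
    }
}

/// Reads both mtimes from disk (missing files become None) and applies `is_healthy`.
pub fn check(completion_log: &Path, start_log: &Path, max_age: Duration) -> bool {
    let mtime = |p: &Path| fs::metadata(p).and_then(|m| m.modified()).ok();
    is_healthy(SystemTime::now(), mtime(completion_log), mtime(start_log), max_age)
}
```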
## Smaller calls
- **`borg break-lock` at the start of every run.** If the previous container was killed mid-backup, the repo stays locked and the next `borg create` fails to acquire the lock. Just break it. There's only ever one writer because of the sleep loop.
- **`set -e` after `borg init`, not before.** The init line is the only one allowed to fail (first run on a fresh repo). Everything after halts on error.
- **`BORG_RSH='ssh -oBatchMode=yes'`.** Fail fast if SSH would have prompted, instead of hanging forever inside a detached container.
- **`ServerAliveInterval 30` in `ssh_config`.** Long borg transfers across home-ISP NAT get killed if nothing flows for a few minutes. Keepalives keep the tunnel open.
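- **`--files-cache=ctime,size,inode`.** Borg has defaulted to this since 1.1, and spelling it out pins the behaviour. The older `mtime,size,inode` mode trusts mtime, which userspace can reset at will (`touch -m`, archive extraction). Ctime can't be set that way, so it's the more honest signal of "this file actually changed."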
- **`compression=zstd,12`.** The sweet spot for backup data on my hardware: substantially better than zlib, not so slow it dominates the run.
- **`borg compact --threshold=5 --cleanup-commits`.** Reclaims the space left behind by pruned archives by compacting any segment file in which at least 5% of the space can be freed.
- **`IGNORE_GIT_UNTRACKED=true`.** Optional. Walks every `.git` dir under the snapshot, runs `git ls-files --others --exclude-standard`, and feeds the result into `--exclude-from`. Skips `target/`, `node_modules/`, build caches — anything the repo already knows isn't worth keeping.
- **`SYS_ADMIN` capability on the container.** Needed for `btrfs subvolume snapshot` and `delete` from inside the namespace. The narrower capability set didn't have a way through.
## What I'd change
- **A test rig that restores into an empty volume on a schedule.** "Backups exist" is not the property I care about. "Backups restore" is. I have anecdotal evidence after every incident; I don't have a green checkmark before one.
- **A failure notifier separate from the healthcheck.** Docker healthcheck-unhealthy is one signal; I'd also want an explicit push (ntfy, email, Telegram) on first failure of a run, so I don't have to be watching the container state.
- **Parallel targets when network and disk don't compete.** The current loop is strictly sequential: rsync.net, then the local HDD. They share neither bandwidth nor spindles, so running them in parallel would cut the wall-clock to roughly that of the slower target. Sequential made the wrapper trivial; the trade was knowable and I made it.
Two years in, the part I'd defend hardest is the snapshot. Everything above it is a wrapper that could be rewritten in an afternoon. The snapshot is what makes the wrapper allowed to be one.

@@ -0,0 +1,77 @@
---
title: An E-Ink Photo Frame That Sleeps When the House Is Empty
description: A Pi, a 6-colour e-ink panel, and a self-hosted Immich library. Photos chosen by date and favourites, gated on Home Assistant presence, dithered with Atkinson.
date: 2026-05-27
projectPeriod: '2026'
thumbnail:
src: ./_assets/frame.jpg
alt: The e-ink frame on the wall showing a dithered landscape photo with the capture age and EXIF location painted into the bottom corners.
tags: ['embedded', 'systems', 'tools']
role: Frame builder and pipeline author
stack: ['Python', 'Raspberry Pi Zero 2W', 'Waveshare 7.3" 6-colour panel', 'Immich', 'Home Assistant', 'numba', 'Atkinson dither']
scale: One panel, one household, ~64 refreshes a day at peak
outcome: A wall-mounted photo frame that pulls from self-hosted Immich, gated on home presence, with no cloud dependencies
audience: general
links:
- label: Source
url: https://home.schmelczer.dev/git/andras/frame
media:
- type: image
src: ./_assets/frame.jpg
alt: The frame on the wall showing a 6-colour Atkinson-dithered landscape photo, with "2 years ago" and a location label painted into the bottom corners.
caption: The bottom corners carry the photo's age and EXIF location. Painted as text on top, so the dither can't smear them.
---
**The short version:**
- A Raspberry Pi Zero 2W drives Waveshare's [PhotoPainter](https://www.waveshare.com/wiki/PhotoPainter), a 7.3" 6-colour ACeP e-ink panel. Cron fires every 15 minutes; if [Home Assistant](https://www.home-assistant.io/) says the house is empty (or it's between midnight and 7am), the script quits.
- Photo source is my self-hosted [Immich](https://immich.app/) library. The pool is weighted toward "on this day," favourites, and recent uploads, with a 7-day rolling history to avoid repeats.
- Each accepted candidate is face-aware cropped, boosted in contrast and saturation (e-ink lacks both), Atkinson-dithered to the 6-colour palette, then labelled with capture age and EXIF location before pushing. A few hundred lines of Python, with `numba` for the dither loop, on top of Waveshare's reference driver.
## Why a stupid amount of engineering for a picture on a wall
That's the point. Albert Borgmann once distinguished *devices* — which efficiently deliver a commodity and disappear into the wall — from *focal things*, which gather a practice around them. A Nest Hub is a device; it shows you photos the way a microwave delivers heat. The frame is a focal thing. I curated the weights. I hung it where the light was right. I tweak it when something feels off. It doesn't sell my attention back to me; it asks me to pay some.
The medium helps. E-ink doesn't glow and doesn't beep. From across the room it reads as *image*, not as *screen* — and that one perceptual difference changes how often I actually look at it.
## The presence gate
The cron line does most of the work. Every 15 minutes, the script checks the time of day, then asks Home Assistant whether anyone in `HA_PRESENCE` is home. If not, it quits. The panel keeps showing the last photo, because e-ink — so you walk in to whatever was there when the house emptied.
The point isn't power saving. John Berger drew a line between photographs kept inside a context of lived meaning — private — and ones severed and circulated — public. Google Photos hands you the public mode dressed as the private. A wall in the hallway, lit only when your people are home, restores the context. The same photograph means something different surfacing while you're cooking dinner than it does in a feed at 11pm.
## How a photo gets picked
The pool is biased the way memory is biased. There are four buckets, with relative weights of roughly 30 for "on this day" (dropping to ~10 if only the ±3-day fallback fires), 18 for favourites, 36 for the last 30 days and 36 for everything else. Within those buckets, a photo whose orientation matches the frame's gets 4x the weight of a mismatched one, because cropping landscape to portrait works less often than the reverse.
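A minimal sketch of the two-stage draw in Rust, under a few assumptions. A bucket is picked by weight first, then a photo inside that bucket with the 4:1 orientation bias. The uniform random numbers are passed in from whatever RNG you like, so the function stays deterministic. Weights are normalised by their sum, so they don't need to add up to 100. The 7-day history filter and the ±3-day fallback are left out, and every name here is hypothetical.

```rust
/// Photo or frame orientation.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Orientation {
    Portrait,
    Landscape,
}

/// Weighted draw: returns the index whose cumulative weight covers `u * total`.
/// `u` must be a uniform sample in [0, 1). Returns None if every weight is zero.
pub fn weighted_index(weights: &[f64], u: f64) -> Option<usize> {
    let total: f64 = weights.iter().sum();
    if total <= 0.0 {
        return None;
    }
    let mut target = u * total;
    for (i, &w) in weights.iter().enumerate() {
        if target < w {
            return Some(i);
        }
        target -= w;
    }
    // Rounding can leave `target` just past the end: fall back to the last positive weight.
    weights.iter().rposition(|&w| w > 0.0)
}

/// Two-stage pick: a bucket by its weight, then a photo in it, favouring the
/// frame's orientation 4:1. Returns (bucket index, photo index).
pub fn pick(
    buckets: &[(f64, Vec<Orientation>)],
    frame: Orientation,
    u_bucket: f64,
    u_photo: f64,
) -> Option<(usize, usize)> {
    // An empty bucket gets zero weight, so it can never be drawn.
    let bucket_weights: Vec<f64> = buckets
        .iter()
        .map(|(w, photos)| if photos.is_empty() { 0.0 } else { *w })
        .collect();
    let b = weighted_index(&bucket_weights, u_bucket)?;
    let photo_weights: Vec<f64> = buckets[b]
        .1
        .iter()
        .map(|&o| if o == frame { 4.0 } else { 1.0 })
        .collect();
    let p = weighted_index(&photo_weights, u_photo)?;
    Some((b, p))
}
```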
A 7-day rolling history filters repeats. Before accepting a candidate, the picker runs `heads_fit_in_crop` against Immich's detected face boxes, extended upward to cover the skull and padded by `HEAD_SAFETY_MARGIN`: if the planned crop would slice into any visible head, that candidate is rejected and another is drawn. A wall photo with half a face in it is worse than the same photo not on the wall at all.
`face_aware_crop` does the actual cropping — resize-cropping to fill the frame while biasing the window around detected faces. A landscape shot with room around the subject usually crops cleanly to portrait this way; the guardrail above catches the ones that don't.
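Both steps can be sketched with plain rectangle arithmetic. This Rust version is hypothetical in several ways. It centres the crop on the mean of the face-box centres, whereas the real `face_aware_crop` may weight faces differently. `skull` (how far to extend each box upward, as a fraction of its height) and `margin` stand in for the real extension and `HEAD_SAFETY_MARGIN`, whose values the post doesn't give. A head lying entirely outside the crop is treated as acceptable, since nothing slices it.

```rust
/// Axis-aligned rectangle in pixels, [x0, x1) × [y0, y1), with y growing downward.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect {
    pub x0: f64,
    pub y0: f64,
    pub x1: f64,
    pub y1: f64,
}

impl Rect {
    fn intersects(&self, o: &Rect) -> bool {
        self.x0 < o.x1 && o.x0 < self.x1 && self.y0 < o.y1 && o.y0 < self.y1
    }
    fn contains(&self, o: &Rect) -> bool {
        self.x0 <= o.x0 && o.x1 <= self.x1 && self.y0 <= o.y0 && o.y1 <= self.y1
    }
}

/// Largest window of aspect `target` (width / height) that fits the image,
/// centred on the mean face centre where possible, then clamped inside the image.
pub fn face_aware_crop(img_w: f64, img_h: f64, target: f64, faces: &[Rect]) -> Rect {
    let (w, h) = if img_w / img_h > target {
        (img_h * target, img_h) // wider than the frame: trim the sides
    } else {
        (img_w, img_w / target) // taller than the frame: trim top and bottom
    };
    let (cx, cy) = if faces.is_empty() {
        (img_w / 2.0, img_h / 2.0)
    } else {
        let n = faces.len() as f64;
        let sx: f64 = faces.iter().map(|f| (f.x0 + f.x1) / 2.0).sum();
        let sy: f64 = faces.iter().map(|f| (f.y0 + f.y1) / 2.0).sum();
        (sx / n, sy / n)
    };
    // max(0.0) guards against rounding making the window a hair larger than the image.
    let x0 = (cx - w / 2.0).clamp(0.0, (img_w - w).max(0.0));
    let y0 = (cy - h / 2.0).clamp(0.0, (img_h - h).max(0.0));
    Rect { x0, y0, x1: x0 + w, y1: y0 + h }
}

/// False if the crop cuts through any head. Each face box is extended upward by
/// `skull` × its height and padded by `margin` px on every side. Heads already
/// cut by the photo's own edge would need clipping to the image first; omitted here.
pub fn heads_fit_in_crop(crop: &Rect, faces: &[Rect], skull: f64, margin: f64) -> bool {
    faces.iter().all(|f| {
        let head = Rect {
            x0: f.x0 - margin,
            y0: f.y0 - skull * (f.y1 - f.y0) - margin,
            x1: f.x1 + margin,
            y1: f.y1 + margin,
        };
        // Wholly inside or wholly outside is fine; straddling the edge is not.
        crop.contains(&head) || !crop.intersects(&head)
    })
}
```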
## Tuning the pipeline somewhere else
Iterating on the Pi means waiting 12+ seconds per refresh. Both the face-aware crop and the dither were tuned in Jupyter against a local pool of a few hundred photos, then frozen and shipped.
The dither is where the choice visibly matters. The panel can only show black, white, red, yellow, blue, green — no intensity control, every pixel is one of those six. I compared Floyd-Steinberg, Stucki, and a couple of ordered variants. Atkinson kept the highest perceived contrast on the 6-colour palette without smearing skin tones into the nearest yellow. Pure-Python Atkinson on the Pi Zero was unusably slow, so the inner loop runs through `numba` with perceptual-weighted nearest-colour matching (0.299/0.587/0.114). Roughly 100x faster after the JIT cache warms.
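Here is a Rust sketch of Atkinson dithering with the same luma-weighted nearest-colour match. It stands in for the numba loop and is not the author's code. Atkinson passes 1/8 of the quantisation error to each of six neighbours, so only 6/8 of the error is diffused and the rest is dropped. That loss is one plausible reason it holds contrast better on a six-ink palette than Floyd-Steinberg, which diffuses all of it. The palette uses idealised RGB values, which is an assumption: a real panel's inks are duller and are usually measured.

```rust
/// Idealised RGB for the six inks (assumed; real panel colours differ).
pub const PALETTE: [[f32; 3]; 6] = [
    [0.0, 0.0, 0.0],       // black
    [255.0, 255.0, 255.0], // white
    [255.0, 0.0, 0.0],     // red
    [255.0, 255.0, 0.0],   // yellow
    [0.0, 0.0, 255.0],     // blue
    [0.0, 255.0, 0.0],     // green
];

/// Index of the palette entry nearest to `c` by luma-weighted squared distance.
fn nearest(c: [f32; 3]) -> usize {
    const W: [f32; 3] = [0.299, 0.587, 0.114];
    let dist = |p: &[f32; 3]| (0..3).map(|i| W[i] * (c[i] - p[i]).powi(2)).sum::<f32>();
    (0..PALETTE.len())
        .min_by(|&a, &b| dist(&PALETTE[a]).total_cmp(&dist(&PALETTE[b])))
        .unwrap()
}

/// Atkinson-dithers a row-major RGB image and returns one palette index per pixel.
pub fn atkinson(pixels: &[[f32; 3]], width: usize, height: usize) -> Vec<u8> {
    // (dx, dy) offsets of the six neighbours that each receive 1/8 of the error.
    const NEIGHBOURS: [(isize, usize); 6] = [(1, 0), (2, 0), (-1, 1), (0, 1), (1, 1), (0, 2)];
    let mut buf = pixels.to_vec();
    let mut out = vec![0u8; width * height];
    for y in 0..height {
        for x in 0..width {
            let i = y * width + x;
            let old = buf[i];
            let idx = nearest(old);
            out[i] = idx as u8;
            let chosen = PALETTE[idx];
            let err = [
                (old[0] - chosen[0]) / 8.0,
                (old[1] - chosen[1]) / 8.0,
                (old[2] - chosen[2]) / 8.0,
            ];
            for &(dx, dy) in &NEIGHBOURS {
                let nx = x as isize + dx;
                let ny = y + dy;
                if nx >= 0 && (nx as usize) < width && ny < height {
                    let j = ny * width + nx as usize;
                    for c in 0..3 {
                        buf[j][c] += err[c];
                    }
                }
            }
        }
    }
    out
}
```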
## The weekend-reimplementable rule
Hundred Rabbits, a couple who live aboard a sailboat and practise permacomputing, hold themselves to a rule: any system they depend on should be reimplementable in a weekend. The frame meets the bar: a few hundred lines of Python on a documented panel, reading from an HTTP endpoint that returns JPEGs. It came together over an afternoon with Claude Code plus a couple of weekends tuning the picker and the dither; the repo is public partly as a reference for anyone wanting to do something similar. If Immich disappears tomorrow, the selection logic is eighty lines I can repoint at whatever replaces it.
This stopped being hobbyist territory around 2024, when researchers found family-blog photos of Brazilian children inside the LAION training set. Self-hosting your photos used to be a preference; it's becoming a safeguarding decision. Don't ask whether the hassle is worth it now. Ask what state you'd be in if any one of your platforms went dark — and notice that this isn't a hypothetical. Nixplay's cloud-tied frames have bricked. Funimation deleted libraries people had paid for. The parenthetical in *useless when the company closes its doors* does the whole argument's work.
## Smaller calls
- **Capture age and EXIF location painted as text.** White on a black stroke, written *after* dithering, so the labels stay sharp on the 6-colour palette.
- **CLI flags for the awkward photos.** `--album`, `--people`, `-o 90` (portrait), and `--saturation`/`--contrast`/`--gamma` are flags on the cron command. The defaults are tuned for the average photo; the flags exist for the few that aren't.
- **`flock` around the render.** A slow refresh can't overlap the next 15-minute tick.
- **Wifi power-save reconnect job.** The Pi Zero 2W's wifi drops if power-save kicks in. A separate `wifi-check.sh` every five minutes brings it back.
- **Swap masked, journald volatile.** The SD card is the most likely thing to die on this build. Don't write to it unless you have to.
## What I'd change
- **Lower-power hardware.** The Pi Zero 2W is overkill and idles 14 minutes out of every 15. The Waveshare board didn't have an RTC interrupt pin soldered, and rather than hack one in, I'd reach for an ESP32 next time. It could deep-sleep between refreshes and still have plenty of time to do the image work inside each 15-minute window.
- **A bigger panel and a small light.** The [Inky Impression](https://shop.pimoroni.com/products/inky-impression) 13" with a custom frame and integrated lighting would help most in the evenings, when the e-ink reads muddled under warm lamps.
- **A daytime cadence curve.** 15 minutes is constant. It should slow at night and speed up around the times we're actually in the hallway.
The frame is small, slow, and almost entirely silent. It does one thing for one household and doesn't tell anyone about it. The smallness is the point. There should be more of this kind of thing.

@@ -0,0 +1,113 @@
---
title: 25 Million UK Property Rows in a Single Rust Process
description: Notes on the perfect-postcode.co.uk server. Every numeric feature is u16-quantised in a flat row-major array, so each filter costs three integer compares per row.
date: 2026-05-28
projectPeriod: '2026'
thumbnail:
src: ./_assets/perfect-postcode.jpg
alt: The Perfect Postcode dashboard with active filters on property type, price, transit time, and crime, showing a Manchester map with matching properties highlighted as a heatmap.
tags: ['systems', 'web', 'tools']
role: Server architect and operator
stack: ['Rust', 'Axum', 'Polars', 'h3o', 'rayon', 'PocketBase', 'PMTiles', 'MapLibre', 'deck.gl', 'Conveyal R5', 'Gemini']
scale: ~25M historical properties, ~2.5M postcodes, ~150 numeric features per row, all in RAM on a single VM
outcome: A single-binary UK property-intelligence service with sub-100ms hexagon aggregations under filter
audience: technical
links:
- label: Site
url: https://perfect-postcode.co.uk
media:
- type: image
src: ./_assets/perfect-postcode.jpg
alt: A Perfect Postcode dashboard view of Manchester with five active filters (property type, price, public-transport time to Manchester city centre, crime, noise) and a hex heatmap of 1,247 matching properties.
caption: A normal user pan triggers a hexagon aggregation under filter. The hot path holds itself to three u16 compares per row, per filter.
---
**The short version:**
- One Rust binary (Axum, Polars, h3o, rayon) holds the entire UK property history in RAM: ~25M historical transactions, ~150 numeric features per row, plus postcode features, POIs, places, sparse travel-time matrices, and PMTiles. The whole resident set fits inside a VM you can rent.
- The hot loop dictates the data layout. Every numeric feature is u16-quantised against a per-feature `(min, scale)`. Filter evaluation per row, per filter, is `raw != NAN_U16 && raw >= min_u16 && raw <= max_u16` — three integer compares, no floats, no decoding.
- An H3 cell is precomputed per property at resolution 12. A CSR-laid-out 0.01°-cell grid handles bbox queries. Aggregation goes serial under 50,000 candidate rows and parallel above it.
## The constraint that shapes everything
The answer to *"what's the median price in this hexagon, filtered to four-bedroom terraces under £450k with a 35-minute transit to Manchester"* needs to come back inside a single map pan. Per visible cell, per request, every time the user moves anything. That's the work.
At the resolution we want, the inputs are roughly 25M historical transactions, each with around 150 numeric features (price, EPC, deprivation deciles, school catchment metrics, POI proximities, noise, crime, …). Naively f32 per cell, that's ~15 GB before you count anything else — postcodes, POIs, places, tiles, travel times. The rest of the architecture is the consequence of insisting it all lives in one process on one rentable box.
## u16 quantisation in a row-major flat array
Every numeric feature is encoded as `((value - feature_min) / feature_range) * 65534`. Dequant is `raw * dequant_a + quant_min`, with `dequant_a = feature_range / 65534`. `u16::MAX` is reserved as `NAN_U16`, the explicit missing-value sentinel. Valid codes therefore run from 0 to 65534, and the scale factor is 65534, not 65535. Per feature we keep a `(min, scale, p1, p99)` tuple and a 100-bucket histogram for the UI sliders.
Storage is a single `Vec<u16>` laid out row-major: `feature_data[row * num_features + feat_idx]`. Thirty-two u16 features fit in one 64-byte cache line, so a ~150-feature row is about 300 bytes (under five cache lines). Consecutive rows sit back to back in memory, which keeps a row scan a plain forward walk. With 25M rows × ~150 features × 2 bytes, the property matrix is around 7.5 GB, comfortably inside a 16 GB instance once the rest of the data joins it.
The precision loss is real but bounded: 0.01–0.1% per feature on the data we have, below the noise floor of any downstream statistic. The win is that the hot loop never touches an `f32`.
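A minimal Rust sketch of the encode/decode pair. `Quant`, `encode` and `decode` are illustrative names. Rounding to the nearest code (rather than truncating) and clamping out-of-range inputs are my assumptions. With 65,534 steps, the worst-case round-trip error from rounding is half a step, `range / 131068`.

```rust
/// Missing-value sentinel; valid codes run 0..=65534.
pub const NAN_U16: u16 = u16::MAX;
const STEPS: f32 = 65534.0;

/// Per-feature quantisation parameters (illustrative field names).
pub struct Quant {
    pub min: f32,
    pub range: f32,
}

impl Quant {
    pub fn new(min: f32, max: f32) -> Self {
        // Guard against a constant feature, which would otherwise divide by zero.
        Quant { min, range: (max - min).max(f32::EPSILON) }
    }

    /// ((value - min) / range) * 65534, rounded and clamped; NaN maps to the sentinel.
    pub fn encode(&self, value: f32) -> u16 {
        if value.is_nan() {
            return NAN_U16;
        }
        (((value - self.min) / self.range) * STEPS).round().clamp(0.0, STEPS) as u16
    }

    /// raw * dequant_a + min, where dequant_a = range / 65534; the sentinel maps to None.
    pub fn decode(&self, raw: u16) -> Option<f32> {
        (raw != NAN_U16).then(|| raw as f32 * (self.range / STEPS) + self.min)
    }
}
```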
## The hot loop is three integer compares
`ParsedFilter` carries `min_u16` and `max_u16` — the user's bounds requantised against the same per-feature `(min, scale)` at parse time. The row test is literal:
```rust
let raw = feature_data[base + filter.feat_idx];
raw != NAN_U16 && raw >= filter.min_u16 && raw <= filter.max_u16
```
No string keys. No `f32` decoding. Enum features go through a pre-built `FxHashSet<u16>` of allowed raw values, same shape.
Two small parse-time choices made this fast in practice:
- **Sort filters by range width, a cheap proxy for selectivity.** `numeric.sort_unstable_by_key(|f| f.max_u16.saturating_sub(f.min_u16))` puts the narrowest ranges first. A 50-filter request usually short-circuits on filter two or three.
- **Reject inverted ranges at parse time.** `min > max` errors out. Otherwise `saturating_sub` would clamp the sort key to 0, so the inverted filter would sort first as the "narrowest" and silently match nothing.
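The parse-time checks and the hot loop fit in a few lines. Here is a hypothetical Rust sketch; the filter fields are named as in the post and everything else is assumed. `prepare` rejects inverted ranges and sorts narrowest-first. `row_matches` relies on `Iterator::all` stopping at the first filter that fails.

```rust
/// Missing-value sentinel, as in the text.
pub const NAN_U16: u16 = u16::MAX;

/// A numeric filter whose bounds were already requantised to u16 at parse time.
#[derive(Debug, Clone, Copy)]
pub struct ParsedFilter {
    pub feat_idx: usize,
    pub min_u16: u16,
    pub max_u16: u16,
}

/// Rejects inverted ranges, then orders filters narrowest-first.
pub fn prepare(mut filters: Vec<ParsedFilter>) -> Result<Vec<ParsedFilter>, String> {
    if let Some(f) = filters.iter().find(|f| f.min_u16 > f.max_u16) {
        return Err(format!("feature {}: min > max", f.feat_idx));
    }
    // Range width as a cheap selectivity proxy; safe now that min <= max.
    filters.sort_unstable_by_key(|f| f.max_u16.saturating_sub(f.min_u16));
    Ok(filters)
}

/// True if `row` passes every filter; `all` short-circuits on the first failure.
pub fn row_matches(data: &[u16], num_features: usize, row: usize, filters: &[ParsedFilter]) -> bool {
    let base = row * num_features;
    filters.iter().all(|f| {
        let raw = data[base + f.feat_idx];
        raw != NAN_U16 && raw >= f.min_u16 && raw <= f.max_u16
    })
}
```

The explicit sentinel check matters whenever a filter's upper bound can reach `u16::MAX`. Without it, a missing value would pass.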
## Spatial: a CSR grid plus precomputed H3
Two indexes, used for different things.
A regular 0.01° grid (roughly 1.1 km north-south by 0.7 km east-west at UK latitudes) in CSR layout answers bbox queries. The layout is a single flat `values: Vec<u32>` of row indices plus an `offsets: Vec<u32>` of per-cell starts. CSR avoids the 24-byte-per-cell `Vec` header you'd pay with `Vec<Vec<u32>>`, which is the difference between a few MB and a few hundred MB at UK scale. `for_each_in_bounds` is the variant that skips the result allocation; aggregators stream into it directly.
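A minimal CSR grid in Rust, built with a two-pass counting sort. The flat `offsets`/`values` layout follows the text. The constructor signature, the edge clamping and the one-slice-per-grid-row trick in `for_each_in_bounds` are assumptions of this sketch, not necessarily what the server does.

```rust
/// Row indices bucketed into a regular lat/lon grid, stored CSR-style:
/// the rows in cell `c` are `values[offsets[c]..offsets[c + 1]]`.
pub struct CsrGrid {
    min_lat: f64,
    min_lon: f64,
    cell: f64,
    rows: usize,
    cols: usize,
    offsets: Vec<u32>, // rows * cols + 1 prefix sums
    values: Vec<u32>,  // row indices, grouped by cell
}

impl CsrGrid {
    /// Grid cell of a coordinate; points outside the bounds clamp to the edge cells.
    fn cell_of(&self, lat: f64, lon: f64) -> (usize, usize) {
        let r = ((lat - self.min_lat) / self.cell).floor();
        let c = ((lon - self.min_lon) / self.cell).floor();
        (
            r.clamp(0.0, (self.rows - 1) as f64) as usize,
            c.clamp(0.0, (self.cols - 1) as f64) as usize,
        )
    }

    /// Two passes: count per cell, prefix-sum into offsets, then scatter row indices.
    pub fn build(points: &[(f64, f64)], min: (f64, f64), max: (f64, f64), cell: f64) -> Self {
        let rows = (((max.0 - min.0) / cell).ceil() as usize).max(1);
        let cols = (((max.1 - min.1) / cell).ceil() as usize).max(1);
        let mut grid = CsrGrid {
            min_lat: min.0,
            min_lon: min.1,
            cell,
            rows,
            cols,
            offsets: vec![0; rows * cols + 1],
            values: vec![0; points.len()],
        };
        let ids: Vec<usize> = points
            .iter()
            .map(|&(lat, lon)| {
                let (r, c) = grid.cell_of(lat, lon);
                r * cols + c
            })
            .collect();
        for &id in &ids {
            grid.offsets[id + 1] += 1;
        }
        for i in 0..rows * cols {
            let prev = grid.offsets[i];
            grid.offsets[i + 1] += prev;
        }
        let mut cursor = grid.offsets.clone();
        for (row, &id) in ids.iter().enumerate() {
            grid.values[cursor[id] as usize] = row as u32;
            cursor[id] += 1;
        }
        grid
    }

    /// Streams every row in cells overlapping the bbox into `f`, without allocating.
    /// Cells are coarse, so callers that need exact bounds re-check coordinates.
    pub fn for_each_in_bounds(&self, min: (f64, f64), max: (f64, f64), mut f: impl FnMut(u32)) {
        let (r0, c0) = self.cell_of(min.0, min.1);
        let (r1, c1) = self.cell_of(max.0, max.1);
        if c0 > c1 {
            return;
        }
        for r in r0..=r1 {
            // Cells c0..=c1 of one grid row are adjacent in CSR order: one contiguous slice.
            let start = self.offsets[r * self.cols + c0] as usize;
            let end = self.offsets[r * self.cols + c1 + 1] as usize;
            for &row in &self.values[start..end] {
                f(row);
            }
        }
    }
}
```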
An H3 cell at resolution 12 is precomputed per property at boot, stored as `Vec<u64>`. Lower-resolution cells are derived via `CellIndex::parent()` — fast and exact. The hexagon endpoint thresholds at `PARALLEL_THRESHOLD = 50_000`: below, plain serial aggregation; above, `rayon::par_chunks()` with `chunk = max(1000, rows / num_threads)`. Below the threshold, rayon's per-chunk overhead dominates the work it's parallelising — it's worse than the obvious thing. Above, the slope flips.
A small per-thread `FxHashMap<u64, u64>` H3 cache inside each rayon chunk takes care of properties touched by multiple aggregations within the same chunk.
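The threshold switch, sketched with `std::thread::scope` so that it compiles without dependencies. The server uses rayon's `par_chunks` and aggregates into per-hexagon maps rather than the single sum used here. The threshold and the chunk formula come from the post; the rest is illustrative.

```rust
use std::thread;

/// Below this many candidate rows, stay serial (value from the post).
pub const PARALLEL_THRESHOLD: usize = 50_000;

/// Sums a per-row value serially below the threshold, and in chunks of
/// max(1000, rows / threads) on scoped threads above it.
pub fn aggregate(rows: &[u32], value: impl Fn(u32) -> u64 + Sync) -> u64 {
    if rows.len() < PARALLEL_THRESHOLD {
        return rows.iter().map(|&r| value(r)).sum();
    }
    let threads = thread::available_parallelism().map_or(1, |n| n.get());
    let chunk = (rows.len() / threads).max(1000);
    let value = &value; // shared by reference across the scoped threads
    thread::scope(|s| {
        let handles: Vec<_> = rows
            .chunks(chunk)
            .map(|part| s.spawn(move || part.iter().map(|&r| value(r)).sum::<u64>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}
```

Because addition is associative, the parallel path returns the same total as the serial one. Only the wall-clock changes.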
## State is an Arc-clone away
`AppState` is large and immutable after the boot-time loads. `SharedState = RwLock<Arc<AppState>>` wraps it; every handler does `shared.load_state()` — a brief read lock, an `Arc::clone`, no further lock contention for the request.
The standard read-mostly pattern, but worth naming for one reason: it makes hot-reloading the parquet trivial later. Build a new `AppState` from disk, take the write lock, swap the `Arc`, drop the old one when the last in-flight request finishes. None of the handlers need to change.
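The pattern in a few lines of std-only Rust. `SharedState` is a newtype here rather than a type alias so that it can carry methods. `swap` is a hypothetical version of the hot-reload path the text describes, and `AppState` is a one-field stand-in.

```rust
use std::sync::{Arc, RwLock};

/// Stand-in for the real, much larger state.
pub struct AppState {
    pub version: u32,
}

pub struct SharedState(RwLock<Arc<AppState>>);

impl SharedState {
    pub fn new(state: AppState) -> Self {
        SharedState(RwLock::new(Arc::new(state)))
    }

    /// Brief read lock, clone the Arc, release: the handler keeps a consistent snapshot.
    pub fn load_state(&self) -> Arc<AppState> {
        Arc::clone(&self.0.read().unwrap())
    }

    /// Hot reload: build the new state outside the lock, swap it in under a short write lock.
    pub fn swap(&self, next: AppState) {
        let next = Arc::new(next);
        // The write guard is a temporary, released at the end of this statement...
        let old = std::mem::replace(&mut *self.0.write().unwrap(), next);
        // ...so dropping our reference (possibly the last) happens outside the lock.
        drop(old);
    }
}
```

A request that loaded the state before a swap keeps reading the old `AppState` until it drops its `Arc`.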
On top of that there's a per-endpoint `ConcurrencyLimitLayer::new(N)`. The expensive endpoints (filter-counts, hexagon-stats, screenshot, export) get 3–5; the cheap ones get 20–30. It is the simplest backpressure you can write and it does most of the work.
## PocketBase as the distributed lock
For mutations that need exclusion (subscription state transitions, redeem-invite races), there is no Redis. Instead, `acquire_pocketbase_lock` does an optimistic create against a `locks` collection. If create succeeds, we own it. If it fails on conflict, we fetch the existing lock and check `expires_at_unix`; if it has expired, we delete it and retry. The owner ID is a 24-char random string, so ownership checks never depend on host identity; only expiry relies on the wall clock.
Release is a `Drop` handler that spawns a tokio task to delete the record — async cleanup keeps the synchronous drop path free of I/O. 100 ms retry, 10-second acquire deadline. Coarse, but correct, audit-loggable in PocketBase, and adds zero new infrastructure to operate.
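Here is a synchronous Rust sketch of the acquire loop against an abstract store, with an in-memory map standing in for PocketBase. The `LockStore` trait and all names are hypothetical. Release happens directly in `Drop` instead of in a spawned tokio task, and the owner ID is passed in rather than generated. The sketch shares a caveat with any get-then-delete scheme. A real implementation should delete by record ID, so that it can't remove a lock someone else re-created in between.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

/// What the lock needs from the `locks` collection (hypothetical trait).
pub trait LockStore {
    /// Creates the record; false if one with this key already exists.
    fn create(&self, key: &str, owner: &str, expires_at_unix: u64) -> bool;
    fn get(&self, key: &str) -> Option<(String, u64)>;
    fn delete(&self, key: &str);
}

/// In-memory stand-in for PocketBase.
pub struct MemStore(pub Mutex<HashMap<String, (String, u64)>>);

impl LockStore for MemStore {
    fn create(&self, key: &str, owner: &str, expires_at_unix: u64) -> bool {
        let mut m = self.0.lock().unwrap();
        if m.contains_key(key) {
            return false;
        }
        m.insert(key.to_string(), (owner.to_string(), expires_at_unix));
        true
    }
    fn get(&self, key: &str) -> Option<(String, u64)> {
        self.0.lock().unwrap().get(key).cloned()
    }
    fn delete(&self, key: &str) {
        self.0.lock().unwrap().remove(key);
    }
}

fn now_unix() -> u64 {
    SystemTime::now().duration_since(UNIX_EPOCH).map_or(0, |d| d.as_secs())
}

/// A held lock, released when dropped.
pub struct LockGuard<'a, S: LockStore> {
    store: &'a S,
    key: String,
    owner: String,
}

impl<S: LockStore> Drop for LockGuard<'_, S> {
    fn drop(&mut self) {
        // Only delete if we still own it: it may have expired and been taken over.
        if self.store.get(&self.key).map_or(false, |(o, _)| o == self.owner) {
            self.store.delete(&self.key);
        }
    }
}

/// Optimistic create, expired-lock takeover, 100 ms retry until `deadline`.
pub fn acquire<'a, S: LockStore>(
    store: &'a S,
    key: &str,
    owner: &str,
    ttl: Duration,
    deadline: Duration,
) -> Option<LockGuard<'a, S>> {
    let give_up = Instant::now() + deadline;
    loop {
        if store.create(key, owner, now_unix() + ttl.as_secs()) {
            return Some(LockGuard { store, key: key.to_string(), owner: owner.to_string() });
        }
        // Conflict: an expired lock is deleted and creation retried immediately.
        if let Some((_, expires)) = store.get(key) {
            if expires <= now_unix() {
                store.delete(key);
                continue;
            }
        }
        if Instant::now() >= give_up {
            return None;
        }
        thread::sleep(Duration::from_millis(100));
    }
}
```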
## Cost-capping the LLM endpoint
The AI filter parser is a Gemini call. Two structural choices made it cheap enough to leave on:
- **One system prompt, computed once.** `build_system_prompt(features, mode_destinations)` runs at boot. The feature catalogue, the enum of available travel modes, the few-shot examples — all concatenated once into a `String` on `AppState`. Every request reuses the same bytes, which Gemini's input cache likes.
- **A `search_destinations` tool with a closed enum of modes.** The LLM doesn't get to invent place slugs. It can call the function; the server slugifies and resolves against the loaded travel-time directory using a word-overlap matcher tolerant of `kings-cross` vs `King's Cross`.
On top: a per-week token budget (`AI_FILTERS_WEEKLY_TOKEN_LIMIT = 10_000_000`) and a 2,000-token output cap. The budget is the actual cost guarantee; the per-call cap is belt-and-braces.
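A hypothetical Rust version of the slug normalisation and word-overlap match. Dropping apostrophes before slugging (so `King's` becomes `kings`) and scoring by the number of shared words are assumptions about how the tolerance works, not the server's exact rules.

```rust
use std::collections::HashSet;

/// Lowercase, drop apostrophes, and collapse any other non-alphanumeric run into one '-'.
pub fn slugify(s: &str) -> String {
    let mut out = String::new();
    for ch in s.chars().flat_map(char::to_lowercase) {
        if ch == '\'' || ch == '\u{2019}' {
            continue; // "King's" -> "kings", lining up with a stored "kings-cross"
        }
        if ch.is_alphanumeric() {
            out.push(ch);
        } else if !out.is_empty() && !out.ends_with('-') {
            out.push('-');
        }
    }
    out.trim_end_matches('-').to_string()
}

/// Returns the known slug sharing the most words with `query` (ties: first wins).
pub fn best_match<'a>(query: &str, known: &[&'a str]) -> Option<&'a str> {
    let q_slug = slugify(query);
    let q: HashSet<&str> = q_slug.split('-').filter(|w| !w.is_empty()).collect();
    let mut best: Option<(&'a str, usize)> = None;
    for &slug in known {
        let overlap = slug.split('-').filter(|w| q.contains(w)).count();
        if overlap > 0 && best.map_or(true, |(_, n)| overlap > n) {
            best = Some((slug, overlap));
        }
    }
    best.map(|(s, _)| s)
}
```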
## Smaller calls
- **`mlockall(MCL_CURRENT | MCL_FUTURE)` at startup.** The hot dataset has to never page out. With `CAP_IPC_LOCK` it works; without it we log and continue.
- **`malloc_trim(0)` after each big load.** Polars leaves a high allocator water-mark after parquet scans. Trimming after each major load gives back hundreds of MB of RSS before steady state.
- **Prometheus path normalisation.** `/api/tiles/5/16/10` becomes `/api/tiles/:z/:x/:y` before it becomes a label. Otherwise `/.env`, `/wp-admin/...`, and bot scans explode cardinality.
- **Median-half eviction over LRU.** Token, share-bounds, and superuser-token caches evict the older half on overflow instead of one entry at a time. Cheap, and it spreads the re-validation cost instead of triggering a thundering herd.
- **`spawn_blocking` for Polars I/O.** Parquet scans are CPU-bound. Run on a tokio worker thread, they stall every other task queued on that worker; `spawn_blocking` moves them onto tokio's dedicated blocking pool.
- **`Box<[T]>` instead of `Vec<T>` for aggregator accumulators.** No `capacity` field, 8 bytes saved per slot. At hundreds of hexagons × six features per request it adds up.
- **String interning, three times.** Postcodes (~2.5M unique from 25M rows) live in a `lasso::RodeoReader`; each row stores a 4-byte `Spur`. Address tokens are flattened into one buffer with per-row `(offset, length)` arrays. Enum value strings use the same pattern.
- **Free-zone bbox check, not point check.** Unlicensed queries must have their *entire* bbox inside `FREE_ZONE_BOUNDS`. Point-in-zone would be convenient and wrong — it would let users pan to anywhere from a free-zone centre.
- **Share-link bounds are server-computed.** `bounds_from_view(lat, lon, zoom)` derives the bbox from a UK-aware longitude/latitude span (`half_lat = half_lon * 0.6`) and clamps it. The 0.6 factor is close to cos 52°, which keeps the box roughly square on the ground at UK latitudes. Legacy short URLs without server-stored bounds grant nothing.
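The difference between the point check and the bbox check is easiest to see side by side. This is a Rust sketch; `BBox` and the coordinates in the usage are invented, since the post doesn't give `FREE_ZONE_BOUNDS`.

```rust
/// Axis-aligned lat/lon box.
#[derive(Clone, Copy, Debug)]
pub struct BBox {
    pub min_lat: f64,
    pub min_lon: f64,
    pub max_lat: f64,
    pub max_lon: f64,
}

impl BBox {
    /// True only if `inner` lies entirely within `self`: the check the text argues for.
    pub fn contains_box(&self, inner: &BBox) -> bool {
        self.min_lat <= inner.min_lat
            && self.min_lon <= inner.min_lon
            && inner.max_lat <= self.max_lat
            && inner.max_lon <= self.max_lon
    }

    /// The convenient-but-wrong alternative: only the query's centre is tested.
    pub fn contains_point(&self, lat: f64, lon: f64) -> bool {
        (self.min_lat..=self.max_lat).contains(&lat) && (self.min_lon..=self.max_lon).contains(&lon)
    }
}
```

A wide query centred inside the zone passes the point test but fails the box test. That failure is exactly the loophole the full-bbox rule closes.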
## What I'd change
- **Pin the allocator.** I rely on `malloc_trim` to keep RSS predictable. A jemalloc with explicit purge would behave better than glibc plus periodic trimming, especially under sustained load.
- **One bench for the hot loop.** I trust the structure but I have no number for *filter throughput per row per filter under typical load*. That number would tell me when the u16 trick stops being enough.
- **Move free-zone bounds to PocketBase.** `FREE_ZONE_BOUNDS` is a `const`. It's been right for the demo region for a year. The next time it changes I'll regret hardcoding it.
- **A typed query DSL instead of `;;`-separated strings.** The current filter wire format is `name:min:max;;name:val1|val2`. Cheap to parse, awful to evolve. A small JSON envelope would survive the next feature.
There's something a little embarrassing about a binary that just memory-maps a country. But the architecture made the latencies trivial, and the latencies are most of what a user feels.

Binary file not shown (image, 47 KiB).

Binary file not shown (image, 3.4 MiB).

Binary file not shown (image, 377 KiB).

@@ -0,0 +1,17 @@
---
title: Backup Container
description: A Bash container around BorgBackup. BTRFS snapshot for atomic consistency, numeric env vars for multi-target 3-2-1, sleep-loop instead of cron.
thumbnail:
src: ./_assets/backup.png
alt: Placeholder thumbnail for the backup container project.
period: '2024-2026'
sortDate: 2024-06-01
technologies: ['Bash', 'BorgBackup', 'BTRFS', 'Alpine', 'Docker', 'SSH', 'zstd']
selected: false
essay: backup-container-btrfs-borg
links:
- label: Source
url: https://github.com/schmelczer/backup-container
- label: Container image
url: https://github.com/schmelczer/backup-container/pkgs/container/backup-container
---

@@ -0,0 +1,15 @@
---
title: Frame
description: A LAN-only e-ink photo frame. Pulls from self-hosted Immich, gated on Home Assistant presence, Atkinson-dithered to 6 colours, no cloud.
thumbnail:
src: ./_assets/frame.jpg
alt: The e-ink frame on the wall showing a dithered landscape photo with the capture age and EXIF location painted into the bottom corners.
period: '2026'
sortDate: 2026-05-01
technologies: ['Python', 'Raspberry Pi Zero 2W', 'Waveshare PhotoPainter', 'Immich', 'Home Assistant', 'numba', 'Atkinson dither']
selected: true
essay: frame-eink-photo-display
links:
- label: Source
url: https://home.schmelczer.dev/git/andras/frame
---

@@ -0,0 +1,15 @@
---
title: Perfect Postcode
description: A UK property-intelligence map. ~25M historical transactions, ~150 features per row, all u16-quantised in RAM, served from a single Rust binary.
thumbnail:
src: ./_assets/perfect-postcode.jpg
alt: The Perfect Postcode dashboard with active filters on property type, price, transit time, and crime, showing a Manchester map with matching properties as a heatmap.
period: '2026'
sortDate: 2026-05-01
technologies: ['Rust', 'Axum', 'Polars', 'h3o', 'rayon', 'PocketBase', 'PMTiles', 'MapLibre', 'deck.gl', 'Conveyal R5', 'Gemini']
selected: true
essay: perfect-postcode-rust-property-server
links:
- label: Site
url: https://perfect-postcode.co.uk
---

@@ -101,7 +101,7 @@ const jsonLdStrings = jsonLdEntries.map((entry) =>
{noindex && <meta name="robots" content="noindex,follow" />}
{!noindex && <link rel="canonical" href={canonical} />}
<meta name="theme-color" content="#fbfaf7" media="(prefers-color-scheme: light)" />
<meta name="theme-color" content="#151514" media="(prefers-color-scheme: dark)" />
<meta name="theme-color" content="#201f1d" media="(prefers-color-scheme: dark)" />
<script is:inline data-theme-script set:html={themeInit} />
<link
rel="preload"

@@ -2,16 +2,19 @@
import type { ComponentProps } from 'astro/types';
import Base from './Base.astro';
type Props = Omit<ComponentProps<typeof Base>, 'title'> & { title: string };
type Props = Omit<ComponentProps<typeof Base>, 'title'> & {
title: string;
fullWidth?: boolean;
};
const { title, description } = Astro.props;
const { title, description, fullWidth } = Astro.props;
if (!title) {
throw new Error('Page layout requires a `title` prop.');
}
---
<Base {...Astro.props}>
<div class="page-shell">
<div class:list={['page-shell', fullWidth && 'page-shell--full-width']}>
<header class="page-header">
<slot name="breadcrumbs" />
<h1>{title}</h1>

@ -20,12 +20,11 @@ const startingPoints = posts
.slice(0, STARTING_POINTS);
const STARTING_POINT_NOTES: Record<string, string> = {
'greatai-ai-deployment-api': 'Small API as policy.',
'reconcile-text-3-way-merge':
'Constraints (no history, three runtimes) pick the design.',
'sdf-2d-ray-tracing': 'Mobile GPU as the architecture.',
'life-towers-immutable-tries': 'Data structure as the protocol.',
'nuclear-cooling-simulation': 'Two graphs are simpler than one big one.',
'greatai-ai-deployment-api': 'Policy expressed as a small API.',
'reconcile-text-3-way-merge': 'A merge design shaped by no history and three editors.',
'sdf-2d-ray-tracing': 'Mobile GPU limits drive the rendering architecture.',
'life-towers-immutable-tries': 'Immutable tries make sync cheap and explicit.',
'nuclear-cooling-simulation': 'Separate graph passes keep simulation logic readable.',
};
const startingPointsAnnotated = startingPoints.map((post) => ({
@ -65,8 +64,9 @@ const personJsonLd = buildPersonJsonLd({
description="A few sentences about the two moves I keep reaching for, and the posts that show them in different shapes."
jsonLd={personJsonLd}
ogType="profile"
fullWidth
>
<div class="prose">
<div class="prose about-copy">
<p>
I'm Andras. I write software for a living, and have done so for about six years. MSc
in CS. The first non-trivial thing I finished was a Raspberry Pi music visualiser
@ -119,12 +119,15 @@ const personJsonLd = buildPersonJsonLd({
</dl>
</section>
<section class="about-section">
<section class="about-section about-section--starting-points">
<div class="section-heading">
<h2 id="why-these-five">Why these five</h2>
<div class="section-heading__text">
<h2 id="selected-writeups">Selected writeups</h2>
<p>Finished projects where a hard constraint did most of the design work.</p>
</div>
<a href="/articles/">All articles <span aria-hidden="true">→</span></a>
</div>
<ol class="starting-points" aria-label="Starting point articles">
<ol class="starting-points" aria-label="Selected article writeups">
{
startingPointsAnnotated.map(({ post, href, note }) => (
<li>
@ -151,7 +154,7 @@ const personJsonLd = buildPersonJsonLd({
<section class="about-section facts">
<h2 id="working-style">A few things I believe</h2>
<div class="prose">
<div class="prose about-copy">
<ul>
<li>
Most "interesting algorithm" problems are actually data-structure problems

@ -10,13 +10,14 @@
document.documentElement.classList.add('js');
var STORAGE_KEY = 'theme';
var THEME_BG = { light: '#fbfaf7', dark: '#151514' };
var THEME_BG = { light: '#fbfaf7', dark: '#201f1d' };
var saved = null;
try {
var value = localStorage.getItem(STORAGE_KEY);
if (value === 'light' || value === 'dark') saved = value;
} catch (e) {}
var theme = saved || (matchMedia('(prefers-color-scheme: dark)').matches ? 'dark' : 'light');
var theme =
saved || (matchMedia('(prefers-color-scheme: dark)').matches ? 'dark' : 'light');
document.documentElement.dataset.theme = theme;
document.documentElement.style.colorScheme = theme;
var themeColorMetas = document.querySelectorAll('meta[name="theme-color"]');

@ -33,28 +33,28 @@
--font-mono: 'IBM Plex Mono', 'JetBrains Mono', ui-monospace, SFMono-Regular, monospace;
/* Palette: light-dark() pairs each token (light, dark) */
--color-bg: light-dark(#fbfaf7, #151514);
--color-fg: light-dark(#181817, #f1eee7);
/* Contrast with --color-bg: light ~5.4:1, dark ~7.1:1 (both clear WCAG AA
--color-bg: light-dark(#fbfaf7, #201f1d);
--color-fg: light-dark(#181817, #d8d0c3);
/* Contrast with --color-bg: light ~5.4:1, dark ~6.5:1 (both clear WCAG AA
4.5:1 for normal text). Darken-on-light / lighten-on-dark slightly from
the previous values that fell just below threshold. */
--color-muted: light-dark(#3d3b35, #c8c0b3);
--color-link: light-dark(#285f74, #8ab8c8);
--color-muted: light-dark(#3d3b35, #aaa299);
--color-link: light-dark(#285f74, #7fa8b7);
--color-link-hover: light-dark(
color-mix(in oklch, #285f74 70%, black 30%),
color-mix(in oklch, #8ab8c8 70%, white 30%)
color-mix(in oklch, #7fa8b7 80%, white 20%)
);
--color-accent: light-dark(oklch(55% 0.13 15), oklch(72% 0.13 15));
--color-rule: light-dark(#d9d5ca, #39352f);
--color-rule-medium: light-dark(#7a7466, #8a8478);
--color-rule-strong: light-dark(#4a4340, #d0c5b7);
--color-code-bg: light-dark(#efede6, #2f2c27);
--color-callout-bg: light-dark(#f4f1e8, #211f1c);
--color-accent: light-dark(oklch(55% 0.13 15), oklch(68% 0.11 15));
--color-rule: light-dark(#d9d5ca, #5a5247);
--color-rule-medium: light-dark(#7a7466, #70695f);
--color-rule-strong: light-dark(#4a4340, #aaa196);
--color-code-bg: light-dark(#efede6, #2a2824);
--color-callout-bg: light-dark(#f4f1e8, #292723);
--color-selection-bg: light-dark(#ecddd0, #4a3a2e);
--theme-switcher-track: var(--color-rule-medium);
--theme-switcher-icon-light: #f0e2b6;
--theme-switcher-icon-dark: #f1eee7;
--theme-switcher-icon-dark: #d8d0c3;
/* Typography */
--fs-xs: 0.75rem;
@ -109,6 +109,16 @@
color-scheme: dark;
}
@media (max-width: 700px) {
:root {
--fs-body: 1.0625rem;
--fs-dek: 1rem;
--fs-lg: 1.125rem;
--fs-xl: 1.375rem;
--fs-3xl: 1.75rem;
}
}
/* =========================================================================
Reset
========================================================================= */
@ -339,17 +349,18 @@
.footer-meta {
display: flex;
align-items: center;
flex-wrap: wrap;
flex-wrap: nowrap;
gap: var(--space-2) var(--space-5);
margin: 0;
padding: 0;
list-style: none;
color: var(--color-muted);
font-size: var(--fs-caption);
white-space: nowrap;
}
.footer-meta a,
.footer-meta span {
.footer-meta > span {
min-height: 44px;
display: inline-flex;
align-items: center;
@ -360,11 +371,16 @@
margin-inline: calc(-1 * var(--space-1));
}
.footer-copyright {
gap: 0.25em;
}
.footer-contact {
display: flex;
align-items: center;
flex-wrap: wrap;
flex-wrap: nowrap;
gap: var(--space-2) var(--space-5);
min-width: 0;
}
/* Page header (shared by .home-intro, .page-header, .post-header) */
@ -395,12 +411,25 @@
font-size: var(--fs-dek);
}
.page-header,
.post-header {
max-width: var(--measure-wide);
padding-block: var(--space-10) var(--space-6);
}
.page-header {
max-width: var(--measure-wide);
padding-block: var(--space-2) var(--space-6);
}
.page-shell--full-width .page-header,
.page-shell--full-width .page-header p,
.page-shell--full-width .about-copy,
.page-shell--full-width > .about-copy > p {
width: 100%;
max-width: none;
max-inline-size: none;
}
.post-header .dek {
margin-block: var(--space-4) 0;
}
@ -469,6 +498,19 @@
text-underline-offset: 0.25em;
}
.section-heading__text {
flex: 1 1 11rem;
min-width: 0;
}
.section-heading__text p {
max-width: var(--measure);
margin-top: var(--space-1);
color: var(--color-muted);
font-size: var(--fs-caption);
line-height: 1.4;
}
/* -- Breadcrumbs ------------------------------------------------------ */
.breadcrumbs {
@ -528,11 +570,11 @@
}
.tag-list a {
min-height: 44px;
min-height: 2rem;
display: inline-flex;
align-items: center;
padding-inline: var(--space-2);
margin-inline: calc(-1 * var(--space-2));
padding-inline: var(--space-1);
margin-inline: calc(-1 * var(--space-1));
color: var(--color-muted);
text-decoration: none;
}
@ -645,6 +687,8 @@
.project-list p {
margin: var(--space-1) 0 0;
color: var(--color-muted);
font-size: var(--fs-base);
line-height: var(--leading-snug);
}
/* -- Thumbnail -------------------------------------------------------- */
@ -929,6 +973,17 @@
margin-top: var(--space-10);
}
.about-section--starting-points {
margin-top: var(--space-6);
padding-top: var(--space-6);
border-top: 1px solid var(--color-rule);
}
.about-section--starting-points .section-heading {
align-items: flex-start;
padding-top: 0;
}
.about-section.facts {
max-width: none;
}
@ -937,13 +992,17 @@
margin-top: var(--space-4);
}
.prose.about-copy {
width: 100%;
max-inline-size: none;
}
.starting-points {
display: grid;
grid-template-columns: repeat(5, minmax(0, 1fr));
gap: var(--space-3);
margin: 0;
padding: var(--space-4) 0 0;
border-top: 1px solid var(--color-rule);
list-style: none;
}
@ -1109,6 +1168,8 @@
.prose {
max-inline-size: var(--measure);
line-height: var(--leading-prose);
hyphens: auto;
hyphenate-limit-chars: 7 3 3;
}
.prose > * + * {
@ -1689,7 +1750,15 @@
}
.home-intro {
padding-block: var(--space-8) var(--space-6);
padding-block: var(--space-6) var(--space-5);
}
.home-section {
margin-top: var(--space-6);
}
.page-shell {
margin-top: 0;
}
.at-a-glance__row,
@ -1782,13 +1851,48 @@
.page-header,
.post-header {
padding-block: var(--space-8) var(--space-5);
padding-block: var(--space-6) var(--space-4);
}
.post > .prose {
margin-top: var(--space-6);
}
.prose {
line-height: 1.55;
}
.prose code {
overflow-wrap: anywhere;
}
.prose pre {
overflow-x: hidden;
scrollbar-gutter: auto;
white-space: pre-wrap;
}
.prose pre code,
.prose pre .line {
overflow-wrap: anywhere;
white-space: pre-wrap;
}
.tag-filter {
display: block;
margin-bottom: var(--space-5);
padding-block: var(--space-2);
}
.tag-filter .tag-list {
gap: 0 var(--space-3);
margin-top: var(--space-2);
}
.tag-list {
gap: 0 var(--space-3);
}
:focus-visible {
outline-offset: 1px;
}
@ -1821,6 +1925,58 @@
.site-nav {
gap: var(--space-1) var(--space-6);
}
.footer-meta,
.footer-contact {
gap: var(--space-2);
}
.footer-meta {
font-size: var(--fs-xs);
}
}
@media (min-width: 430px) and (max-width: 700px) {
.post > .prose > p,
.page-shell--full-width > .about-copy > p {
text-align: justify;
text-align-last: start;
text-justify: auto;
overflow-wrap: normal;
word-break: normal;
}
}
@media (min-width: 701px) {
.page-shell--full-width > .about-copy > p {
text-align: justify;
text-align-last: start;
text-justify: auto;
overflow-wrap: normal;
word-break: normal;
}
}
@media (max-width: 360px) {
.site-header {
position: relative;
}
.theme-switcher {
position: absolute;
inset-block-start: var(--space-6);
inset-inline-end: 0;
margin: 0;
}
.header-actions,
.site-nav {
column-gap: var(--space-3);
}
.footer-name {
display: none;
}
}
/* Reduced motion */