# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project shape

VaultLink is a self-hosted Obsidian file-sync system. Two halves of one repo:

- `sync-server/`: Rust (axum + sqlx/SQLite). Source of truth for vault state, broadcasts changes via WebSocket.
- `frontend/`: npm workspaces. The sync engine (`sync-client`) is consumed by an Obsidian plugin, a standalone CLI, a fuzz E2E harness, a scripted determinism harness, and a history UI.

The HTTP/WS API types are generated from Rust (`ts-rs`) and mirrored into the TS workspaces. **Never hand-edit files in `frontend/sync-client/src/services/types/` or `frontend/history-ui/src/lib/types/`**; run `scripts/update-api-types.sh` after changing anything Serde-derived in the server.

### Frontend workspaces

- `sync-client`: the sync engine; published to consumers via `dist/`. All other TS workspaces depend on it via `file:../sync-client`.
- `obsidian-plugin`: Obsidian plugin built from `sync-client`.
- `local-client-cli`: same engine wrapped as a standalone CLI.
- `history-ui`: vault-history web UI.
- `test-client`: fuzz E2E harness (random ops across N processes).
- `deterministic-tests`: scripted multi-client tests with an in-memory FS, run against a real server.

## Common commands

Pre-push hygiene (formats, lints, runs tests, requires clean git state):

```sh
scripts/check.sh --fix
```

Run the fuzz E2E (N parallel processes):

```sh
scripts/e2e.sh 12
# Logs land in logs/log_.log. Clean with scripts/clean-up.sh
```

Run deterministic tests (require a release-built server in `sync-server/target/release/sync_server`; they spawn it themselves):

```sh
cd sync-server && cargo build --release && cd ..
cd frontend
npm run build -w sync-client -w deterministic-tests
node deterministic-tests/dist/cli.js                   # all
node deterministic-tests/dist/cli.js --filter=rename   # subset
node deterministic-tests/dist/cli.js --filter=… -j 4   # cap parallelism
```

Run a single sync-client unit test by file:

```sh
cd frontend/sync-client && npx tsx --test 'src/**/sync-event-queue.test.ts'
```

Server: dev runs from `sync-server/` against `config-e2e.yml`:

```sh
cd sync-server
cargo run config-e2e.yml   # dev
cargo build --release      # used by both e2e harnesses
cargo test                 # unit + ts-rs binding export tests
```

Frontend dev (sync-client + obsidian-plugin watch in parallel):

```sh
cd frontend && npm install && npm run dev
```

Regenerate TS bindings from Rust types (touches `frontend/{sync-client,history-ui}/src/.../types/`):

```sh
scripts/update-api-types.sh
```

## SQLite / sqlx

The server uses `sqlx::query!` macros that need a prepared `.sqlx` cache to compile offline. Touching any SQL means regenerating it:

```sh
cd sync-server
sqlx database create --database-url sqlite://db.sqlite3
sqlx migrate run --source src/app_state/database/migrations --database-url sqlite://db.sqlite3
cargo sqlx prepare --workspace
```

New migrations: `sqlx migrate add --source src/app_state/database/migrations `.

## Sync engine architecture

Read `frontend/sync-client/src/sync-operations/` to follow the sync engine; the rest of `sync-client` is plumbing (filesystem ops, persistence, services, telemetry).

**`SyncEventQueue`** (`sync-event-queue.ts`) holds two things:

- `documents: Map`: the local "settled" view of tracked docs.
- `events: SyncEvent[]`: pending operations (creates, updates, deletes, remote changes) in FIFO drain order.

The map is keyed by `record.path`; the invariant `documents.get(record.path) === record` is maintained by every mutation point (constructor, `setDocument`, the rename branch in `enqueue`).
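
As a rough illustration of that invariant, the sketch below re-keys a map and updates the record's `path` in a single step. `DocumentRecord`, `DocumentIndex`, `relocate`, and `invariantHolds` are hypothetical, simplified names chosen for this example; they are not the real `SyncEventQueue` API.

```typescript
// Hypothetical, simplified shapes; the real queue tracks more state.
interface DocumentRecord {
    path: string;
    documentId: string;
}

class DocumentIndex {
    private readonly documents = new Map<string, DocumentRecord>();

    constructor(records: DocumentRecord[] = []) {
        for (const record of records) {
            this.documents.set(record.path, record);
        }
    }

    // Moves a record to a new key while keeping
    // documents.get(record.path) === record true afterwards.
    relocate(record: DocumentRecord, newPath: string): void {
        if (this.documents.get(record.path) === record) {
            this.documents.delete(record.path);
        }
        record.path = newPath; // same object, new key
        this.documents.set(newPath, record);
    }

    get(path: string): DocumentRecord | undefined {
        return this.documents.get(path);
    }

    // Returns true when every key equals the path of the record stored under it.
    invariantHolds(): boolean {
        let ok = true;
        this.documents.forEach((record, path) => {
            if (record.path !== path) ok = false;
        });
        return ok;
    }
}

const note: DocumentRecord = { path: "notes/a.md", documentId: "doc-1" };
const index = new DocumentIndex([note]);
index.relocate(note, "notes/b.md");
```

Because the record object is reused rather than replaced, any code that still holds `note` reads the new path without a second lookup.
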
`setDocument` mutates the same record object in place when relocating, so callers holding a reference to the record see path changes on the next read. This is load-bearing for `Syncer`'s drain handlers, which await across HTTP roundtrips and would otherwise act on a stale path captured as a string. Always read `record.path` live; only snapshot it into a local for the explicit "did the path change during my await" comparison (`pathBeforeRoundtrip` in `handleMaybeMergingResponse` / `processRemoteUpdate`).

**`Syncer`** (`syncer.ts`) drains events one at a time. Local creates/updates/deletes round-trip to the server over HTTP; remote changes arrive over the WebSocket and are enqueued as `RemoteChange` events that the same drain processes. `handleMaybeMergingResponse` is the shared response handler for create-and-update flows.

**Conflict-uuid paths.** When a remote create or remote-rename can't claim its server-side path locally (the slot is occupied), the local file lands at `conflict--` and `record.intendedPath` records the path the server has it at. All server-bound requests honor `intendedPath`/`event.originalPath`, so the conflict-uuid path never leaks to the server. There is no automatic unwinding; convergence at conflict points is left to manual user resolution.

**Watermark.** `lastSeenUpdateId` uses a `MinCovered` (a contiguous-prefix tracker over a stream of integers): we only advance the published min when the next consecutive id has been processed, so out-of-order RemoteChange ids don't fool the WebSocket handshake into requesting a too-recent catch-up.

**Server catch-up.** The server's WS handshake replays events newer than the client's `last_seen_vault_update_id` from the `latest_document_versions` view (one row per doc, the latest).
On those replayed rows `is_new_file` means *new to this client* (`creation_vault_update_id > last_seen_vault_update_id`), not "this row is the doc's first version". This is necessary because the catch-up only carries the latest version; if a doc was created and updated past the watermark, the client would otherwise never see its create.

## Edge-case patterns the sync engine has to survive

These are non-obvious from reading any single file; they fall out of the interaction between the queue, the watcher, the WebSocket, and the server's commit ordering. Treat the engine as a black box, and what follows are the kinds of bugs you should expect to see:

**FIFO drain order ≠ user's perceived order.** The queue is single-consumer and FIFO at processing time, but the producers are concurrent and reach it through asynchronous indirection: user FS actions go through watcher → microtask → enqueue (several microtasks deep), while WS messages go through the onmessage handler. A WS-driven event can land in the queue *between* two user actions even when the user "did them in order". When you read a log, "Decided to ..." timestamps mark the user's intent; they do **not** map to the order of `events.push`.

**`event.path` is a side channel through disk.** Drain serialises which event runs, but it can't lock disk between events. Between an event's enqueue and its drain, another in-band event can have rewritten the file at that path (a remote-create that landed on the slot, a delete + re-create cycle by the user). Reading at drain time gets *current* disk content (which may be a different doc's bytes), and uploading it as the queued event's content is a duplicate-create / wrong-content bug.

**Pending-create docId is a `Promise`, not a string, until the create acks.** Any event queued behind a still-in-flight LocalCreate that references the same doc carries the create's `resolvers.promise` as its `documentId`.
Two consequences: (a) `===` comparisons against the resolved string in any rewrite loop silently fail; (b) the order of "swap Promise→docId" vs "rewrite paths in events" matters: swap first, or the rewrite walks past the events you wanted to retarget. This is load-bearing in any code that touches the queue right after a create resolves.

**`record.path` is mutated in place across awaits.** When a user rename runs while a drain handler is awaiting an HTTP roundtrip, the queue mutates the in-flight event's record so subsequent reads see the new path. Snapshotting `record.path` into a local at function entry and using it after an `await` writes/reads from a now-vacated slot. Snapshot only for the *deliberate* "did the path change while I was awaiting" comparison; everywhere else, read `record.path` live.

**Conflict-uuid stashes are local-only divergence.** Whenever a slot collision deflects a doc to `conflict--…`, only the agent that deflected has that file. The cross-agent fuzz assertion ("every path matches across clients") will fire on it. By design these are awaiting manual user resolution, but if your fix silently creates one in a race that *would* converge given more time, the e2e fuzz will show it.

**`MoveOnConflict.NEW` vs `EXISTING` is a policy choice, not a default.** NEW preserves the occupant and stashes us at conflict-uuid; EXISTING evicts the occupant and stashes *them*. Picking wrong creates either an orphaned stash on us or an orphaned tracking entry on the occupant. The right choice depends on whether the occupant is tracked, whether they have a pending RemoteChange that will move them, and which side the server has already committed to.

**Pause / disable-sync mid-flight is a destabiliser.** A request whose HTTP committed server-side but whose response was discarded by an abort leaves the server holding a doc the client has no record of. The next re-enable's offline scan re-derives state from disk vs. the (now incomplete) `documents` map and emits a fresh LocalCreate, a duplicate of a doc already on the server, with a new docId. The catch-up then delivers the orphan as a "new" doc and writes it to disk. Final state: two files, two docIds, same content. Anything that aborts in-flight HTTPs (start-reset, vault change, destroy) needs the queue's documents map to be wiped or rebuilt from the server, not just the events array.

**`scheduleSyncForOfflineChanges` clears `events[]` but not `documents`.** Every enable-sync wipes pending local events. The offline scan re-derives them by comparing disk to the documents map (matching by content hash to recognise renames). This is correct *if* the documents map reflects the last server state we committed to. If it lags (an in-flight create whose response we lost; a remote update we haven't applied yet), the scan misclassifies: a real rename becomes a delete + create with a new docId; a still-tracked doc whose file we deleted becomes a delete the server hasn't seen.

**Watermark advancement is load-bearing both ways.** Branches that *skip* a remote event without advancing `lastSeenUpdateId` create permanent gaps that re-deliver forever. Branches that *advance* the watermark without applying the content lose data: the server has no further event to re-deliver, the catch-up only carries the latest version, and any state in between is gone. When in doubt: don't advance unless the event was actually applied (or deliberately discarded after weighing both halves).

**`isNewFile` semantics differ between catch-up and real-time.** On WS handshake replay it means *new to this client* (`creation_vault_update_id > last_seen_vault_update_id`); on real-time broadcasts it means *this version is the create* (`creation_vault_update_id == vault_update_id`). A handler that receives "untracked doc + isNewFile=false" and decides based on one of the two interpretations will be wrong on the other channel.
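
To make the difference concrete, here is a small sketch that puts the two definitions side by side. The flag is described above as coming from the server; this function only mirrors the two stated conditions. The `Channel` tag, the `VersionRow` shape, and the function name are assumptions made for illustration, not names from the codebase.

```typescript
// Hypothetical names: the channel tag and row shape are illustrative only.
type Channel = "catch-up" | "real-time";

interface VersionRow {
    vaultUpdateId: number;
    creationVaultUpdateId: number;
}

// Mirrors the two definitions of the flag, one per delivery channel.
function isNewFile(row: VersionRow, channel: Channel, lastSeenVaultUpdateId: number): boolean {
    if (channel === "catch-up") {
        // New to this client: created after the client's watermark.
        return row.creationVaultUpdateId > lastSeenVaultUpdateId;
    }
    // Real-time: this exact version is the create.
    return row.creationVaultUpdateId === row.vaultUpdateId;
}

// A doc created at update 11 and edited at update 14,
// observed by a client whose watermark is 10.
const latest: VersionRow = { vaultUpdateId: 14, creationVaultUpdateId: 11 };
const onCatchUp = isNewFile(latest, "catch-up", 10); // true
const onRealTime = isNewFile(latest, "real-time", 10); // false
```

The same row yields opposite answers, which is why a handler for an untracked doc cannot interpret the flag without knowing the channel.
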
Reasoning about whether to fetch-and-treat-as-new vs. ignore needs to know which channel delivered the event.

**Race-shape catalogue.** Bugs in this codebase tend to fall into a small set of shapes; recognising the shape from the log gets you most of the way to the cause:

- *Same-path dedup race*: two clients create at the same path. Server deconflicts the second to `path (1)`. The losing client must relocate locally; mishandling routes the local file to a stash.
- *Concurrent rename of same doc*: both clients rename. Server applies in commit order; the loser's local-rename HTTP must rebase against the server's new path or be dropped.
- *Local rename + remote rename of same doc*: the local rename's HTTP needs to find the doc at the (now-different) server path; the matching disk file needs to follow without stranding.
- *Pending create + remote create at same path*: the agent's pending file is already at the slot the remote wants; the remote's pending bytes will reach the slot the agent is trying to upload from.
- *Create + delete + remote create at same path*: the user's local cycle queues two events; a remote create lands in between. The queued LocalCreate (or a re-emitted offline-scan one) reads disk content placed by the remote and uploads it as a third doc.
- *Pause-mid-flight*: in-flight HTTP committed server-side, response abandoned client-side. After re-enable the offline scan can't tell the doc was already created and creates a duplicate.

When triaging a fuzz failure, find the divergent file in `e2e-run.log`'s final dump (it shows each agent's tracked docs), grep the `log_.log` for that path/docId, and match the lifecycle against this catalogue before going deeper.

## Two complementary E2E harnesses

- **`test-client` (fuzz):** random ops across N parallel processes for many minutes. Used by `scripts/e2e.sh`. Catches bugs nobody thought to write a test for, but reproductions are noisy.
- **`deterministic-tests`:** scripted scenarios with an in-memory FS pinned to a real server. Used to *capture* a fuzz-discovered bug as a minimal repro before fixing it. See `frontend/deterministic-tests/README.md` for the step grammar (`pause-server`, `pause-websocket`, `barrier`, `assert-consistent`, etc.).

When a fuzz failure surfaces, the workflow is: root-cause from logs → write a deterministic test that fails on the bug → fix → confirm both the deterministic test and `e2e.sh` pass.

## Style

- TS: 4-space indent, no tabs, LF, prettier (`trailingComma: "none"`). YAML/MD use 2-space indent.
- Rust: `rustfmt.toml` enforces 4-space indent, LF.
- Lint: ESLint for TS, Clippy for Rust, `cargo machete` for unused deps. All wired into `scripts/check.sh`.