Andras Schmelczer 2c37e7fa62
2026-05-25 21:31:09 +01:00


---
title: A 3-Way Text Merger That Never Shows Conflict Markers
description: reconcile-text merges Markdown notes from three editors I don't control, with no operation history. Here's why git, CRDTs, and diff-match-patch each failed me.
date: 2026-05-21
projectPeriod: 2025
thumbnail:
  src: ./_assets/reconcile.png
  alt: The reconcile-text logo and tagline "Conflict-free 3-way text merging".
tags:
  - systems
  - tools
  - web
featuredOrder: 2
role: Library author
stack:
  - Rust
  - WebAssembly
  - Python
  - pyo3
  - wasm-bindgen
scale: One Rust core, three published packages (crates.io, npm, PyPI), driving an Obsidian sync plugin
outcome: A small Rust library that auto-resolves prose conflicts, with WASM and Python bindings
audience: recruiter-relevant
links:
  - label: Demo
    url: /reconcile/
  - label: Source
    url: https://github.com/schmelczer/reconcile
  - label: crates.io
    url: https://crates.io/crates/reconcile-text
  - label: npm
    url: https://www.npmjs.com/package/reconcile-text
  - label: PyPI
    url: https://pypi.org/project/reconcile-text/
media:
  - type: image
    src: ./_assets/reconcile.png
    alt: The reconcile-text logo, a stylised merge arrow, with the tagline "Conflict-free 3-way text merging".
    caption: reconcile-text weaves conflicting edits together instead of asking a human to choose.
---

The two-bullet version:

  • Given a parent text and two edited versions, return one merged string. No conflict markers, no dropped edits, no operation log required.
  • Single Rust core, shipped as a crate, an npm package (via wasm-bindgen), and a PyPI package (via pyo3). The Obsidian sync plugin I wrote alongside it is the first consumer.

Why I wrote it

I keep Markdown notes in three editors whose internals I don't control: Vim on my laptop, VS Code on my work machine, and Obsidian on my phone. When two of them edit the same note between syncs, I end up with three versions of it: the last-synced parent and two divergent children. That's the input. I want one merged file out, and I want to hand it back to the editors without conflict markers, because <<<<<<< HEAD is not something a notes app should ever show me.

Every existing tool got close and missed:

  • git merge-file does exactly the right thing structurally, then writes markers into the output. That's correct for source code and wrong for prose.
  • CRDTs and operational transformation (OT) both assume you own the editing pipeline down to the keystroke. I don't. I'm looking at three files.
  • diff-match-patch doesn't take a common ancestor. On adjacent edits it quietly produces wrong output. I have a runnable example in the repo.

So the library does exactly one thing: it is a pure function from three strings to one. No async, no networking, no concurrency, no plugins. Anything outside that boundary belongs in somebody else's library.

The decisions worth naming

Myers diff per side, then weave the diffs. Each child is diffed against the parent; the two edit scripts are then optimised so adjacent changes group cleanly; finally, a single weaving pass interleaves them into one ordered sequence of operations that produces the merged text. The weave borrows the shape of operational transformation, but its inputs are complete, batched diffs rather than live keystrokes, so it runs once per merge.
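Here's a deliberately small sketch of the diff-then-weave shape. It is not the library's code: `align` and `merge3` are hypothetical names, an O(n·m) LCS table stands in for Myers' O(ND) algorithm, the edit-script optimisation step is skipped, and the weave is a plain gap-by-gap interleave rather than an OT-shaped pass. Each side's alignment records which parent tokens survive and what was inserted into each gap. The weave keeps every insertion (left before right, identical ones once) and applies every deletion, so nothing is dropped and no markers appear.

```rust
/// Align `child` against `parent`. Returns which parent tokens survive, plus
/// the tokens inserted into each gap (gap `i` sits before `parent[i]`; gap
/// `parent.len()` is the end). An LCS table stands in for Myers here.
fn align<T: PartialEq + Clone>(parent: &[T], child: &[T]) -> (Vec<bool>, Vec<Vec<T>>) {
    let (n, m) = (parent.len(), child.len());
    // lcs[i][j] = length of the longest common subsequence of parent[i..] and child[j..]
    let mut lcs = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            lcs[i][j] = if parent[i] == child[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }
    let (mut keep, mut ins) = (vec![false; n], vec![Vec::new(); n + 1]);
    let (mut i, mut j) = (0, 0);
    while i < n && j < m {
        if parent[i] == child[j] {
            keep[i] = true; // token unchanged on this side
            i += 1;
            j += 1;
        } else if lcs[i + 1][j] >= lcs[i][j + 1] {
            i += 1; // parent[i] was deleted on this side
        } else {
            ins[i].push(child[j].clone()); // child[j] was inserted before parent[i]
            j += 1;
        }
    }
    ins[n].extend(child[j..].iter().cloned()); // anything left over was appended
    (keep, ins)
}

/// Weave both alignments gap by gap: every insertion is kept (left before
/// right, identical insertions once) and a deletion on either side wins.
fn merge3<T: PartialEq + Clone>(parent: &[T], left: &[T], right: &[T]) -> Vec<T> {
    let (keep_l, ins_l) = align(parent, left);
    let (keep_r, ins_r) = align(parent, right);
    let mut out = Vec::new();
    for i in 0..=parent.len() {
        out.extend(ins_l[i].iter().cloned());
        if ins_r[i] != ins_l[i] {
            out.extend(ins_r[i].iter().cloned());
        }
        if i < parent.len() && keep_l[i] && keep_r[i] {
            out.push(parent[i].clone());
        }
    }
    out
}

fn main() {
    let words = |s: &str| s.split_whitespace().map(String::from).collect::<Vec<_>>();
    let parent = words("the quick fox jumps");
    let left = words("the quick brown fox jumps");
    let right = words("the quick fox jumps high");
    println!("{}", merge3(&parent, &left, &right).join(" ")); // the quick brown fox jumps high
}
```

Even this toy shows the trade-off: if both sides replace the same word (parent `x`, children `y` and `z`), both replacements survive as `y z`. The result is a slightly clumsy sentence, but there is no marker.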

Tokeniser is the user knob. This is the choice I'd defend hardest. Most of what people want when they say "merge differently" isn't a new algorithm; it's a different unit. Word-level tokenisation turns most prose "conflicts" into two adjacent edits that coexist. Line-level makes it behave like git merge-file. Markdown-level merges on headings and list items. Same engine, but what you call a token decides which product you get.
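To make the "unit, not algorithm" point concrete, here is a toy check that runs the same pair of edits through two tokenisers. The helpers `lines`, `words`, `changed_range`, and `conflicts_under` are hypothetical and are not the library's API. Each side's changed region is found by trimming the common prefix and suffix, which only works for a single contiguous edit per side. Regions that overlap or merely touch count as a conflict, a deliberately conservative rule.

```rust
/// Line tokens: each line keeps its trailing newline.
fn lines(s: &str) -> Vec<&str> {
    s.split_inclusive('\n').collect()
}

/// Word tokens: whitespace-separated words (whitespace dropped; we only compare here).
fn words(s: &str) -> Vec<&str> {
    s.split_whitespace().collect()
}

/// Cheap single-edit diff: trim the common prefix and suffix. The returned
/// half-open range of parent tokens is what this side changed (an empty range
/// means a pure insertion at that position). Assumes one contiguous edit per side.
fn changed_range(parent: &[&str], child: &[&str]) -> (usize, usize) {
    let prefix = parent.iter().zip(child).take_while(|(p, c)| p == c).count();
    let max_suffix = parent.len().min(child.len()) - prefix;
    let suffix = parent
        .iter()
        .rev()
        .zip(child.iter().rev())
        .take(max_suffix)
        .take_while(|(p, c)| p == c)
        .count();
    (prefix, parent.len() - suffix)
}

/// Two edits conflict if their parent ranges overlap or touch.
fn conflicts_under(tokenize: fn(&str) -> Vec<&str>, parent: &str, left: &str, right: &str) -> bool {
    let p = tokenize(parent);
    let (a_start, a_end) = changed_range(&p, &tokenize(left));
    let (b_start, b_end) = changed_range(&p, &tokenize(right));
    a_start <= b_end && b_start <= a_end
}

fn main() {
    let parent = "The cat sat.\nIt was happy.\n";
    let left = "The black cat sat.\nIt was happy.\n";
    let right = "The cat sat down.\nIt was happy.\n";
    println!("line-level conflict: {}", conflicts_under(lines, parent, left, right)); // true
    println!("word-level conflict: {}", conflicts_under(words, parent, left, right)); // false
}
```

Both edits land on the first line, so line tokens see one contested token. Word tokens see an insertion before "cat" and a change to "sat.", which don't touch.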

Cursors are first-class merge inputs. Each cursor has a stable ID and rides through the merge, so a collaborative editor can ask "where did this cursor go?" without reconstructing the answer from the output text. This is what made the library useful beyond my own sync script.
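The library's actual cursor API isn't shown in this post. As a minimal sketch, assume a merge result can be expressed as retain/insert/delete operations over one input. That is a common OT-style representation, not necessarily what reconcile-text uses internally. Under that assumption, carrying a cursor with a stable ID through the merge looks roughly like this. `Op`, `Cursor`, `map_position`, and `map_cursors` are hypothetical names, and offsets are counted in chars.

```rust
/// A hypothetical edit operation over the input text, in char units.
enum Op {
    Retain(usize),
    Insert(String),
    Delete(usize),
}

/// A cursor with a stable ID, so callers can find it again after the merge.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Cursor {
    id: u32,
    pos: usize, // char offset into the input text
}

/// Map one char offset through the ops. Inserts at the cursor push it right;
/// a cursor inside a deleted span collapses to where that span used to start.
fn map_position(pos: usize, ops: &[Op]) -> usize {
    let (mut old, mut new) = (0usize, 0usize); // chars consumed from input / produced in output
    for op in ops {
        match op {
            Op::Retain(n) => {
                if pos < old + *n {
                    return new + (pos - old);
                }
                old += *n;
                new += *n;
            }
            Op::Insert(s) => new += s.chars().count(),
            Op::Delete(n) => {
                if pos < old + *n {
                    return new;
                }
                old += *n;
            }
        }
    }
    new + (pos - old) // cursor at (or past) the end of the input
}

/// Move every cursor, keeping its ID so the caller can look it up afterwards.
fn map_cursors(cursors: &[Cursor], ops: &[Op]) -> Vec<Cursor> {
    cursors
        .iter()
        .map(|c| Cursor { id: c.id, pos: map_position(c.pos, ops) })
        .collect()
}

fn main() {
    // "hello world" -> "hello, world": keep 5 chars, insert ",", keep the rest.
    let ops = [Op::Retain(5), Op::Insert(",".into()), Op::Retain(6)];
    let cursors = [Cursor { id: 1, pos: 6 }, Cursor { id: 2, pos: 2 }];
    for c in map_cursors(&cursors, &ops) {
        println!("cursor {} -> {}", c.id, c.pos); // cursor 1 -> 7, then cursor 2 -> 2
    }
}
```

The tie-breaking rules (an insert at the cursor pushes it right; a cursor inside deleted text snaps to the deletion point) are one reasonable convention among several.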

The Rust core is generic; the FFI surface is not. Inside Rust, the tokeniser is a dyn Fn(&str) -> Vec<Token<T>>. That dies the moment you try to pass it through wasm-bindgen or pyo3. The fix was a closed enum of built-in tokenisers for non-Rust callers, with the generic version reserved for Rust users. Not elegant, but the alternative was per-binding glue forever.
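Here is a minimal sketch of that split. The names `tokenize_with` and `BuiltinTokenizer` are hypothetical, and `String` tokens stand in for `Token<T>`. Rust callers can pass any closure, while the closed enum covers the built-in cases as plain data.

```rust
/// Rust callers: any closure works, including ones that capture state.
fn tokenize_with(text: &str, tokenizer: &dyn Fn(&str) -> Vec<String>) -> Vec<String> {
    tokenizer(text)
}

/// FFI callers: a closed, fieldless enum. Unlike a closure, it is plain data.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BuiltinTokenizer {
    Character,
    Word,
    Line,
}

impl BuiltinTokenizer {
    fn tokenize(self, text: &str) -> Vec<String> {
        match self {
            Self::Character => text.chars().map(String::from).collect(),
            // Keep trailing whitespace attached so the tokens concatenate back to `text`.
            Self::Word => text.split_inclusive(char::is_whitespace).map(String::from).collect(),
            Self::Line => text.split_inclusive('\n').map(String::from).collect(),
        }
    }
}

fn main() {
    let text = "merge two\nnotes";
    // The enum is just a named, closed subset of what the closure path accepts.
    let via_enum = BuiltinTokenizer::Word.tokenize(text);
    let via_closure = tokenize_with(text, &|s: &str| BuiltinTokenizer::Word.tokenize(s));
    assert_eq!(via_enum, via_closure);
    println!("{via_enum:?}"); // ["merge ", "two\n", "notes"]

    // A Rust-only caller can still pass something the enum can't express.
    let sentences = tokenize_with("One. Two.", &|s: &str| -> Vec<String> {
        s.split_inclusive(". ").map(String::from).collect()
    });
    println!("{sentences:?}"); // ["One. ", "Two."]
}
```

A fieldless enum like this is the kind of type wasm-bindgen (`#[wasm_bindgen]`) and pyo3 (`#[pyclass]`) can export directly. A `dyn Fn` has no direct equivalent on the JavaScript or Python side without per-binding glue.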

WASM size mattered enough to tune for it. The release profile is aggressive about size, and the JS package ships a small leak detector that warns if you forget to free wasm-bindgen objects. I lost an afternoon to one of those leaks the first time and didn't want anyone else to.

What's held up, what I'd change

  • Kept: the never-emits-markers, never-drops-edits guarantee. It's the only reason a sync engine can call this library without an escape hatch.
  • Kept: the comparison example against diff-match-patch. It's a runnable program in the repo showing exact inputs where the alternative is wrong. Way more convincing than a benchmark table.
  • Cut: the snapshot tests do well on regressions and badly on unknown edge cases. Three-way merging is exactly what property-based testing (proptest, in Rust) was made for, and I should have written generators on day one.
  • Next: I want to be more explicit about the boundary. reconcile-text is a merge primitive, not a live collab engine. If you have a keystroke stream and a real-time channel, use Yjs or Automerge. This library is for when you don't.
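For a sense of what those generators might check, here is a dependency-free sketch of a property test. proptest itself would supply the generators and shrinking; here a tiny xorshift PRNG stands in so the example runs on its own. The merge under test is a hypothetical toy, not reconcile-text: it LCS-aligns each side with the parent, keeps every insertion, and applies every deletion. The properties are the kind the guarantees above imply: untouched inputs pass through, a one-sided edit survives intact, identical edits apply once, and swapping sides never changes how much survives.

```rust
/// Tiny xorshift64 PRNG (a stand-in for proptest's generators; no shrinking).
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
    fn below(&mut self, n: usize) -> usize {
        (self.next() % n as u64) as usize
    }
}

/// Toy alignment: which parent tokens survive, and what was inserted before each.
fn align<T: PartialEq + Clone>(parent: &[T], child: &[T]) -> (Vec<bool>, Vec<Vec<T>>) {
    let (n, m) = (parent.len(), child.len());
    let mut lcs = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            lcs[i][j] = if parent[i] == child[j] { lcs[i + 1][j + 1] + 1 } else { lcs[i + 1][j].max(lcs[i][j + 1]) };
        }
    }
    let (mut keep, mut ins) = (vec![false; n], vec![Vec::new(); n + 1]);
    let (mut i, mut j) = (0, 0);
    while i < n && j < m {
        if parent[i] == child[j] {
            keep[i] = true;
            i += 1;
            j += 1;
        } else if lcs[i + 1][j] >= lcs[i][j + 1] {
            i += 1; // deleted
        } else {
            ins[i].push(child[j].clone()); // inserted
            j += 1;
        }
    }
    ins[n].extend(child[j..].iter().cloned());
    (keep, ins)
}

/// Toy merge under test: keep all insertions (identical ones once), apply all deletions.
fn merge3<T: PartialEq + Clone>(parent: &[T], left: &[T], right: &[T]) -> Vec<T> {
    let ((keep_l, ins_l), (keep_r, ins_r)) = (align(parent, left), align(parent, right));
    let mut out = Vec::new();
    for i in 0..=parent.len() {
        out.extend(ins_l[i].iter().cloned());
        if ins_r[i] != ins_l[i] {
            out.extend(ins_r[i].iter().cloned());
        }
        if i < parent.len() && keep_l[i] && keep_r[i] {
            out.push(parent[i].clone());
        }
    }
    out
}

/// Generator: a few random inserts/deletes over a tiny alphabet, so collisions are common.
fn random_edit(rng: &mut XorShift, parent: &[u8]) -> Vec<u8> {
    let mut child = parent.to_vec();
    for _ in 0..rng.below(4) {
        if !child.is_empty() && rng.below(2) == 0 {
            let at = rng.below(child.len());
            child.remove(at);
        } else {
            let at = rng.below(child.len() + 1);
            child.insert(at, b"abc"[rng.below(3)]);
        }
    }
    child
}

fn main() {
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15);
    for _ in 0..1_000 {
        let p = random_edit(&mut rng, b"abcab"); // random parent
        let (a, b) = (random_edit(&mut rng, &p), random_edit(&mut rng, &p));
        assert_eq!(merge3(&p, &p, &p), p); // no edits, no change
        assert_eq!(merge3(&p, &a, &p), a); // a one-sided edit survives intact
        assert_eq!(merge3(&p, &p, &b), b);
        assert_eq!(merge3(&p, &a, &a), a); // identical edits apply once
        // Swapping sides may reorder concurrent inserts but never changes how many tokens survive.
        assert_eq!(merge3(&p, &a, &b).len(), merge3(&p, &b, &a).len());
    }
    println!("all properties held");
}
```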

If you take one idea from this

Prose deserves a merger that prefers a slightly clumsy sentence over a marker. Code doesn't. That one asymmetry is the whole reason the library exists in the shape it does; everything else fell out of taking it seriously.