schmelczer-dev/src/content/posts/reconcile-text-3-way-merge.md at 2c37e7fa62ceb79020f9f3821762c8850a837183

Andras Schmelczer 2c37e7fa62

Non-cringify

2026-05-25 21:31:09 +01:00


title: A 3-Way Text Merger That Never Shows Conflict Markers
description: reconcile-text merges Markdown notes from three editors I don't control, with no operation history. Here's why git, CRDTs, and diff-match-patch each failed me.
date: 2026-05-21
projectPeriod: 2025
thumbnail:
  src: ./_assets/reconcile.png
  alt: The reconcile-text logo and tagline "Conflict-free 3-way text merging".
tags:
  - systems
  - tools
  - web
featuredOrder:
role: Library author
stack:
  - Rust
  - WebAssembly
  - Python
  - pyo3
  - wasm-bindgen
scale: One Rust core, three published packages (crates.io, npm, PyPI), driving an Obsidian sync plugin
outcome: A small Rust library that auto-resolves prose conflicts, with WASM and Python bindings
audience: recruiter-relevant
links:
  - label: Demo
    url: /reconcile/
  - label: Source
    url: https://github.com/schmelczer/reconcile
  - label: crates.io
    url: https://crates.io/crates/reconcile-text
  - label: npm
    url: https://www.npmjs.com/package/reconcile-text
  - label: PyPI
    url: https://pypi.org/project/reconcile-text/
media:
  - type: image
    src: ./_assets/reconcile.png
    alt: The reconcile-text logo, a stylised merge arrow, with the tagline "Conflict-free 3-way text merging".
    caption: reconcile-text weaves conflicting edits together instead of asking a human to choose.

The two-bullet version:

- Given a parent text and two edited versions, return one merged string. No conflict markers, no dropped edits, no operation log required.
- Single Rust core, shipped as a crate, an npm package (via wasm-bindgen), and a PyPI package (via pyo3). The Obsidian sync plugin I wrote alongside it is the first consumer.

Why I wrote it

I keep Markdown notes in three editors whose internals I don't control: Vim on my laptop, VS Code on my work machine, Obsidian on my phone. When two of them edit the same note between syncs, I end up with three files: the last-synced parent and two divergent children. That's the input. I want one merged file out, and I want to hand it back to the editors without conflict markers, because <<<<<<< HEAD is not something a notes app should ever show me.

Every existing tool got close and missed:

- git merge-file does exactly the right thing structurally, then writes markers into the output. That's correct for source code and wrong for prose.
- CRDTs and OT both assume you own the editing pipeline down to the keystroke. I don't. I'm looking at three files.
- diff-match-patch doesn't take a common ancestor. On adjacent edits it quietly produces wrong output. I have a runnable example in the repo.

So the library does exactly one thing: it is a pure function from three strings to one. No async, no networking, no concurrency, no plugins. Anything outside that boundary is somebody else's library.

The decisions worth naming

Myers diff per side, then weave the diffs. Each child is diffed against the parent, the two edit scripts are optimised so adjacent changes group cleanly, then a single weaving pass interleaves them into one ordered op sequence that produces the merged text. The weave borrows the shape of operational transformation, but the inputs are batched complete diffs, not live keystrokes, so it only runs once per merge.
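To make the diff-then-weave shape concrete, here is a deliberately small sketch. It is not the library's code: it swaps Myers diff for a plain longest-common-subsequence table, skips the optimisation pass, and uses a naive weave that emits the left side's insertions before the right side's. The names `SideDiff`, `tokenize`, `diff`, and `merge` are all made up for illustration.

```rust
/// One side's edits, expressed relative to the parent's tokens (hypothetical type).
struct SideDiff {
    /// kept[i] is true when parent token i survives in this child.
    kept: Vec<bool>,
    /// inserts[i] holds tokens this child added just before parent token i;
    /// inserts[parent_len] holds tokens appended at the end.
    inserts: Vec<Vec<String>>,
}

/// Word-level tokeniser: each token keeps its trailing space, so concatenating
/// the tokens reproduces the input exactly.
fn tokenize(text: &str) -> Vec<&str> {
    text.split_inclusive(' ').collect()
}

/// Diff `child` against `parent` using an LCS table (a stand-in for Myers diff).
fn diff(parent: &[&str], child: &[&str]) -> SideDiff {
    let (n, m) = (parent.len(), child.len());
    // lcs[i][j] = length of the longest common subsequence of parent[i..] and child[j..].
    let mut lcs = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            lcs[i][j] = if parent[i] == child[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }
    let mut side = SideDiff { kept: vec![false; n], inserts: vec![Vec::new(); n + 1] };
    let (mut i, mut j) = (0, 0);
    while i < n && j < m {
        if parent[i] == child[j] {
            side.kept[i] = true;
            i += 1;
            j += 1;
        } else if lcs[i + 1][j] >= lcs[i][j + 1] {
            i += 1; // parent token i was deleted by this child
        } else {
            side.inserts[i].push(child[j].to_string());
            j += 1;
        }
    }
    // Any leftover child tokens are insertions at the very end (then i == n).
    side.inserts[i].extend(child[j..].iter().map(|t| t.to_string()));
    side
}

/// Pure function from three strings to one: weave both sides' edits together.
pub fn merge(parent: &str, left: &str, right: &str) -> String {
    let base = tokenize(parent);
    let l = diff(&base, &tokenize(left));
    let r = diff(&base, &tokenize(right));
    let mut out = String::new();
    for i in 0..=base.len() {
        for t in &l.inserts[i] {
            out.push_str(t);
        }
        // Identical insertions from both sides are emitted only once.
        if r.inserts[i] != l.inserts[i] {
            for t in &r.inserts[i] {
                out.push_str(t);
            }
        }
        // A parent token survives only if neither side deleted it.
        if i < base.len() && l.kept[i] && r.kept[i] {
            out.push_str(base[i]);
        }
    }
    out
}
```

With parent "the quick fox jumps", left "the quick brown fox jumps", and right "the quick fox leaps", this sketch returns "the quick brown fox leaps". Whitespace handling at the edges of insertions is left rough; a real tokeniser has to be more careful there.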

Tokeniser is the user knob. This is the choice I'd defend hardest. Most of what people want when they say "merge differently" isn't a new algorithm, it's a different unit. Word-level tokenisation turns most "conflicts" in prose into two adjacent edits that coexist. Line-level makes it behave like git merge-file. Markdown-level merges on headings and list items. Same engine, a different product depending on what you call a token.
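A rough way to see why the unit matters: take the same pair of edits and ask whether both sides touched the same token. The sketch below is illustrative only. The helper names are invented, and `changed` assumes in-place substitutions (same token count on both sides), which a real diff would not need.

```rust
/// Word-level unit: split on whitespace.
fn words(text: &str) -> Vec<&str> {
    text.split_whitespace().collect()
}

/// Line-level unit: split on newlines.
fn lines(text: &str) -> Vec<&str> {
    text.lines().collect()
}

/// Indices of parent tokens that differ in the child.
/// Simplifying assumption: edits are substitutions, so positions line up.
fn changed(parent: &[&str], child: &[&str]) -> Vec<usize> {
    parent
        .iter()
        .zip(child.iter())
        .enumerate()
        .filter(|(_, (p, c))| p != c)
        .map(|(i, _)| i)
        .collect()
}

/// True when both children modified the same parent token under this tokeniser.
fn edits_collide(
    tokenize: impl Fn(&str) -> Vec<&str>,
    parent: &str,
    left: &str,
    right: &str,
) -> bool {
    let base = tokenize(parent);
    let l = changed(&base, &tokenize(left));
    let r = changed(&base, &tokenize(right));
    l.iter().any(|i| r.contains(i))
}
```

For parent "the quick fox jumps", left "the slow fox jumps", and right "the quick fox leaps", `edits_collide(lines, ...)` is true because both sides rewrote the only line, while `edits_collide(words, ...)` is false because the edits land on different words.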

Cursors are first-class merge inputs. Each cursor has a stable ID and rides through the merge so a collaborative editor can ask "where did this cursor go?" without reconstructing it from the output text. This is the bit that made it useful to anything that wasn't just my sync script.
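As a sketch of what carrying a cursor through an edit can look like, the code below maps a cursor offset through a single edit script. The `Op` and `Cursor` types and the `move_cursor` function are hypothetical, not the library's API. Offsets count characters, and a cursor sitting exactly at an insertion point is pushed to the right of the inserted text, which is one possible convention among several.

```rust
/// One step of an edit script over the old text (hypothetical representation).
pub enum Op {
    Retain(usize),  // keep the next n characters
    Delete(usize),  // drop the next n characters
    Insert(String), // add new text at the current position
}

/// A cursor with a stable ID and a character offset into the text.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Cursor {
    pub id: u32,
    pub offset: usize,
}

/// Map a cursor from the old text to the text produced by `ops`.
pub fn move_cursor(ops: &[Op], cursor: Cursor) -> Cursor {
    // `old` tracks progress through the old text, `new` through the new text.
    let (mut old, mut new) = (0usize, 0usize);
    for op in ops {
        match op {
            Op::Retain(n) => {
                if cursor.offset < old + n {
                    // Cursor sits inside retained text: keep its relative position.
                    return Cursor { id: cursor.id, offset: new + (cursor.offset - old) };
                }
                old += n;
                new += n;
            }
            Op::Delete(n) => {
                if cursor.offset < old + n {
                    // Cursor sat inside deleted text: collapse to the deletion point.
                    return Cursor { id: cursor.id, offset: new };
                }
                old += n;
            }
            // Text inserted at or before the cursor shifts it to the right.
            Op::Insert(text) => new += text.chars().count(),
        }
    }
    Cursor { id: cursor.id, offset: new + cursor.offset.saturating_sub(old) }
}
```

Applying `[Retain(6), Insert("big "), Retain(5)]` to "hello world" gives "hello big world", and a cursor at offset 8 moves to offset 12 while keeping its ID.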

The Rust core is generic; the FFI surface is not. Inside Rust, the tokeniser is a dyn Fn(&str) -> Vec<Token<T>>. That dies the moment you try to pass it through wasm-bindgen or pyo3. The fix was a closed enum of built-in tokenisers for non-Rust callers, with the generic version reserved for Rust users. Not elegant, but the alternative was per-binding glue forever.
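The split between a generic core and a closed FFI surface can be sketched roughly like this. The enum variants and function names here are assumptions for illustration, and tokens are simplified to `String` rather than the `Token<T>` mentioned above.

```rust
/// Generic path for Rust callers: any closure can act as the tokeniser.
pub fn token_count_with(text: &str, tokenizer: &dyn Fn(&str) -> Vec<String>) -> usize {
    tokenizer(text).len()
}

/// Closed set of tokenisers: a plain value that is easy to pass across
/// an FFI boundary, unlike a closure. Variant names are hypothetical.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum BuiltinTokenizer {
    Character,
    Word,
    Line,
}

impl BuiltinTokenizer {
    pub fn tokenize(self, text: &str) -> Vec<String> {
        match self {
            Self::Character => text.chars().map(String::from).collect(),
            Self::Word => text.split_whitespace().map(String::from).collect(),
            Self::Line => text.lines().map(String::from).collect(),
        }
    }
}

/// Binding-facing entry point: takes the enum and delegates to the generic core.
pub fn token_count_builtin(text: &str, tokenizer: BuiltinTokenizer) -> usize {
    token_count_with(text, &|t: &str| tokenizer.tokenize(t))
}
```

A Rust caller can still hand `token_count_with` any closure, for example one that splits on commas, while a binding only ever needs to marshal a small enum value.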

WASM size mattered enough to tune for it. The release profile is aggressive about size, and the JS package ships a small leak detector that warns if you forget to free wasm-bindgen objects. I lost an afternoon to that the first time and didn't want anyone else to.
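For readers who have not tuned a release profile for size before, these are the standard Cargo settings people usually reach for. Which of them the project actually sets is not stated here, so treat this as a generic starting point rather than the repo's configuration.

```toml
# Cargo.toml: a size-oriented release profile (illustrative, not the project's actual file)
[profile.release]
opt-level = "z"     # optimise for binary size rather than speed
lto = true          # link-time optimisation across crates
codegen-units = 1   # fewer codegen units, better whole-program optimisation
panic = "abort"     # drop unwinding machinery
strip = true        # strip symbols from the output
```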

What's held up, what I'd change

- Kept: the never-emits-markers, never-drops-edits guarantee. It's the only reason a sync engine can call this library without an escape hatch.
- Kept: the comparison example against diff-match-patch. It's a runnable program in the repo showing exact inputs where the alternative is wrong. Way more convincing than a benchmark table.
- Would change: the testing. Snapshot tests do well on regressions and badly on unknown edge cases. Three-way merging is exactly what proptest was made for, and I should have written generators on day one.
- Next: I want to be more explicit about the boundary. reconcile-text is a merge primitive, not a live collab engine. If you have a keystroke stream and a real-time channel, use Yjs or Automerge. This library is for when you don't.
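To give a flavour of the property-style tests mentioned above, here is a std-only sketch. It does not use proptest: a tiny xorshift generator stands in for proptest's strategies, and there is no shrinking. The three laws checked are ones any three-way merge should plausibly satisfy; the `Rng` type and `find_counterexample` function are invented for this example.

```rust
/// Tiny deterministic pseudo-random generator (xorshift64),
/// standing in for proptest's input strategies.
struct Rng(u64);

impl Rng {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }

    /// Random sentence of 0 to 5 words drawn from a small vocabulary.
    fn sentence(&mut self) -> String {
        const WORDS: [&str; 4] = ["alpha", "beta", "gamma", "delta"];
        let len = (self.next() % 6) as usize;
        (0..len)
            .map(|_| WORDS[(self.next() % 4) as usize])
            .collect::<Vec<_>>()
            .join(" ")
    }
}

/// Check three laws on random inputs and return the first (parent, edited)
/// pair that breaks one of them, or None if all cases pass.
fn find_counterexample(
    merge: impl Fn(&str, &str, &str) -> String,
    cases: usize,
) -> Option<(String, String)> {
    let mut rng = Rng(0x9E3779B97F4A7C15);
    for _ in 0..cases {
        let (parent, edited) = (rng.sentence(), rng.sentence());
        let laws_hold = merge(&parent, &parent, &edited) == edited // only right changed
            && merge(&parent, &edited, &parent) == edited // only left changed
            && merge(&parent, &edited, &edited) == edited; // both made the same change
        if !laws_hold {
            return Some((parent, edited));
        }
    }
    None
}
```

Any function with the shape `Fn(&str, &str, &str) -> String` can be plugged in. A merge that always returns the left side, for instance, breaks the first law as soon as the generator produces a parent that differs from the edited text.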

If you take one idea from this

Prose deserves a merger that prefers a slightly clumsy sentence over a marker. Code doesn't. That one asymmetry is the whole reason the library exists in the shape it does; everything else fell out of taking it seriously.
