Change style

Andras Schmelczer 2026-03-10 20:42:09 +00:00
parent 408ce5268f
commit deffa195b3
23 changed files with 72 additions and 76 deletions

```diff
@@ -13,11 +13,11 @@ A Rust and TypeScript library for merging conflicting text edits without manual
 ## Key features
-- **No conflict markers** Clean, merged output without Git's `<<<<<<<` markers
-- **Cursor tracking** Automatically repositions cursors and selections throughout the merging process
-- **Flexible tokenisation** Word-level (default), character-level, line-level, or custom tokenisation strategies
-- **Unicode support** Full UTF-8 support with proper handling of complex scripts and grapheme clusters
-- **Cross-platform** Native Rust performance with WebAssembly bindings for JavaScript environments
+- **No conflict markers** - Clean, merged output without Git's `<<<<<<<` markers
+- **Cursor tracking** - Automatically repositions cursors and selections throughout the merging process
+- **Flexible tokenisation** - Word-level (default), character-level, line-level, or custom tokenisation strategies
+- **Unicode support** - Full UTF-8 support with proper handling of complex scripts and grapheme clusters
+- **Cross-platform** - Native Rust performance with WebAssembly bindings for JavaScript environments
 ## Quick start
@@ -93,12 +93,12 @@ Differential sync is implemented by [universal-sync](https://github.com/invisibl
 `reconcile-text` starts off similarly to `diff3` ([4], [5]) but adds automated conflict resolution. Given a **parent** document and two modified versions (`left` and `right`), the following happens:
-1. **Tokenisation** Input texts are split into meaningful units (words, characters, etc.) for granular merging
-2. **Diff computation** Myers' algorithm calculates differences between (parent ↔ left) and (parent ↔ right)
-3. **Diff optimisation** Operations are reordered and consolidated to maximise chained changes
-4. **Operational Transformation** Edits are woven together using OT principles, preserving all modifications and updating cursors
+1. **Tokenisation** - Input texts are split into meaningful units (words, characters, etc.) for granular merging
+2. **Diff computation** - Myers' algorithm calculates differences between (parent ↔ left) and (parent ↔ right)
+3. **Diff optimisation** - Operations are reordered and consolidated to maximise chained changes
+4. **Operational Transformation** - Edits are woven together using OT principles, preserving all modifications and updating cursors
-Whilst the primary goal of `reconcile-text` isn't to implement OT, it provides an elegant way to merge Myers' diff outputs. (For a dedicated Rust OT implementation, see [operational-transform-rs](https://github.com/spebern/operational-transform-rs).) The same could be achieved with CRDTs, which many libraries implement well for textsee [Loro](https://github.com/loro-dev/loro/), [cola](https://github.com/nomad/cola), and [automerge](https://github.com/automerge/automerge) as excellent examples.
+Whilst the primary goal of `reconcile-text` isn't to implement OT, it provides an elegant way to merge Myers' diff outputs. (For a dedicated Rust OT implementation, see [operational-transform-rs](https://github.com/spebern/operational-transform-rs).) The same could be achieved with CRDTs, which many libraries implement well for text (see [Loro](https://github.com/loro-dev/loro/), [cola](https://github.com/nomad/cola), and [automerge](https://github.com/automerge/automerge)).
 However, when only the end result of concurrent changes is observable, merge quality depends entirely on the quality of the underlying 2-way diffs. For instance, `move` operations cannot be supported because Myers' algorithm decomposes them into separate `insert` and `delete` operations, regardless of the merging algorithm used.
@@ -114,17 +114,17 @@ Tools like `diff3` ([4]) and Git produce **conflict markers** (`<<<<<<<` / `====
 The key differences from `reconcile-text`:
-- **2-way vs 3-way** diff-match-patch diffs two texts and applies the result as a patch. It has no concept of a common ancestor and cannot reason about "left changes" vs "right changes". `reconcile-text` performs true 3-way merging, understanding the intent behind each side's edits.
+- **2-way vs 3-way** - diff-match-patch diffs two texts and applies the result as a patch. It has no concept of a common ancestor and cannot reason about "left changes" vs "right changes". `reconcile-text` performs true 3-way merging, understanding the intent behind each side's edits.
-- **Character-level only** Word-level and line-level diffs require encoding tokens as single Unicode characters before diffing ([7]). `reconcile-text` supports word, character, line, and custom tokenisation natively.
+- **Character-level only** - Word-level and line-level diffs require encoding tokens as single Unicode characters before diffing ([7]). `reconcile-text` supports word, character, line, and custom tokenisation natively.
-- **Patches can fail** `patch_apply` returns a boolean array indicating success per patch; failed patches are silently dropped. In Differential Synchronisation, failures self-correct in the next cycle, but for one-shot merges edits can be lost. `reconcile-text` always produces a complete merged result.
+- **Patches can fail** - `patch_apply` returns a boolean array indicating success per patch; failed patches are silently dropped. In Differential Synchronisation, failures self-correct in the next cycle, but for one-shot merges edits can be lost. `reconcile-text` always produces a complete merged result.
-- **No cursor tracking or change provenance** diff-match-patch does not reposition cursors or track which side made which edit. `reconcile-text` does both automatically.
+- **No cursor tracking or change provenance** - diff-match-patch does not reposition cursors or track which side made which edit. `reconcile-text` does both automatically.
 See the [comparison example](examples/compare-with-diff-match-patch.rs) for concrete cases where diff-match-patch garbles adjacent edits and silently drops an entire sentence, while `reconcile-text` merges both users' changes correctly.
-> **When to use diff-match-patch instead**: when you don't have a common ancestor—for example, synchronising texts that have diverged through an unknown sequence of edits. If you have a common ancestor (as in most version control and collaborative editing scenarios), `reconcile-text` produces more reliable results.
+> **When to use diff-match-patch instead**: when you don't have a common ancestor, for example synchronising texts that have diverged through an unknown sequence of edits. If you have a common ancestor (as in most version control and collaborative editing scenarios), `reconcile-text` produces more reliable results.
 ### CRDTs (Yjs, Automerge, Loro, diamond-types)
@@ -132,13 +132,13 @@ Conflict-free Replicated Data Types guarantee convergence by mathematical constr
 CRDTs capture every individual keystroke or operation, assigning each a unique identity. This makes them ideal when you control the complete editing infrastructure: the editor, the transport layer, and the storage format. They work peer-to-peer, handle arbitrary numbers of concurrent editors, and never lose an edit.
-The trade-off is that CRDTs require **maintaining document state over time**an operation log or internal data structure that grows with the document's edit history. You cannot simply hand a CRDT library three plain strings and get a merged result. This makes them unsuitable for Differential Synchronisation scenarios where you only observe the final state of each document, which is exactly the niche `reconcile-text` fills.
+The trade-off is that CRDTs require **maintaining document state over time** - an operation log or internal data structure that grows with the document's edit history. You cannot simply hand a CRDT library three plain strings and get a merged result. This makes them unsuitable for Differential Synchronisation scenarios where you only observe the final state of each document, which is exactly the niche `reconcile-text` fills.
 > **When to use CRDTs instead**: if you control the complete editing stack and can capture every operation as it happens, CRDTs provide stronger convergence guarantees. They also support more than two concurrent editors naturally, whereas `reconcile-text` merges exactly two forks at a time (though merges can be chained).
 ### Operational Transformation (OT)
-OT libraries like [ot.js](https://ot.js.org/) and [ShareJS](https://github.com/josephg/ShareJS) transform concurrent operations against each other so that applying them in any order produces the same result. Like CRDTs, they capture individual operations and require infrastructure to coordinate themtypically a central server that determines the canonical operation order.
+OT libraries like [ot.js](https://ot.js.org/) and [ShareJS](https://github.com/josephg/ShareJS) transform concurrent operations against each other so that applying them in any order produces the same result. Like CRDTs, they capture individual operations and require infrastructure to coordinate them, typically a central server that determines the canonical operation order.
 `reconcile-text` borrows the *concept* of OT (transforming one side's edits against the other) but applies it to a different problem. Instead of transforming individual keystrokes in real time, it transforms the consolidated diff output of two complete edits. This means it doesn't need a server, doesn't need to capture operations as they happen, and works entirely offline.
```
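
The README context above notes that `move` operations cannot be supported because a 2-way diff breaks them into separate `insert` and `delete` operations. The minimal Rust sketch below shows this on word tokens. It makes one assumption: it uses a textbook longest-common-subsequence table rather than Myers' algorithm. Both produce a shortest edit script, although they may pick different scripts when several are equally short. `Edit` and `lcs_diff` are hypothetical names and are not part of `reconcile-text`.

```rust
/// One step of a token-level edit script (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub enum Edit {
    Equal(String),
    Delete(String),
    Insert(String),
}

/// Computes a shortest edit script between two token slices using the
/// classic LCS dynamic programme (a stand-in for Myers' algorithm).
pub fn lcs_diff(old: &[&str], new: &[&str]) -> Vec<Edit> {
    let (n, m) = (old.len(), new.len());
    // lcs[i][j] = length of the longest common subsequence of old[i..] and new[j..]
    let mut lcs = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            lcs[i][j] = if old[i] == new[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }
    // Walk the table from the start, emitting Equal/Delete/Insert steps.
    let (mut i, mut j) = (0, 0);
    let mut script = Vec::new();
    while i < n && j < m {
        if old[i] == new[j] {
            script.push(Edit::Equal(old[i].to_string()));
            i += 1;
            j += 1;
        } else if lcs[i + 1][j] >= lcs[i][j + 1] {
            script.push(Edit::Delete(old[i].to_string()));
            i += 1;
        } else {
            script.push(Edit::Insert(new[j].to_string()));
            j += 1;
        }
    }
    // Whatever remains on either side is a pure delete or a pure insert.
    script.extend(old[i..].iter().map(|t| Edit::Delete(t.to_string())));
    script.extend(new[j..].iter().map(|t| Edit::Insert(t.to_string())));
    script
}
```

Moving `alpha` from the front to the end yields `Delete("alpha")`, two `Equal`s, then `Insert("alpha")`. A merge that sees only these scripts has no record that a move happened.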

```diff
@@ -42,9 +42,9 @@ console.log(result.history); /*
 `reconcile-text` offers different approaches to split text for merging:
-- **Word tokeniser** (`"Word"`) Splits on word boundaries (recommended for prose)
-- **Character tokeniser** (`"Character"`) Individual characters (fine-grained control)
-- **Line tokeniser** (`"Line"`) Line-by-line (similar to `git merge` or more precisely [`git merge-file`](https://git-scm.com/docs/git-merge-file))
+- **Word tokeniser** (`"Word"`) - Splits on word boundaries (recommended for prose)
+- **Character tokeniser** (`"Character"`) - Individual characters (fine-grained control)
+- **Line tokeniser** (`"Line"`) - Line-by-line (similar to `git merge` or more precisely [`git merge-file`](https://git-scm.com/docs/git-merge-file))
 ## Cursor Tracking
```

```diff
@@ -47,7 +47,7 @@ fn try_merge(parent: &str, left: &str, right: &str) {
 }
 /// Demonstrates cases where diff-match-patch silently produces incorrect
-/// output, while reconcile-text preserves both users' edits correctly.
+/// output, while reconcile-text preserves both users' edits correctly
 ///
 /// Run it with:
 /// `cargo run --example compare-with-diff-match-patch`
```

```diff
@@ -8,12 +8,12 @@
 />
 <meta
 name="description"
-content="3-way text merging that automatically resolves conflicts. No more Git conflict markers just clean, merged results."
+content="3-way text merging that automatically resolves conflicts. No more Git conflict markers - just clean, merged results."
 />
 <meta property="og:title" content="3-Way Text Merge" />
 <meta
 property="og:description"
-content="3-way text merging that automatically resolves conflicts. No more Git conflict markers just clean, merged results."
+content="3-way text merging that automatically resolves conflicts. No more Git conflict markers - just clean, merged results."
 />
 <meta property="og:type" content="website" />
 <meta property="og:url" content="https://schmelczer.dev/reconcile" />
@@ -85,7 +85,7 @@
 >documentation</a
 >
 or try editing the text boxes below to see <code>reconcile-text</code> in
-action. Use the tokenisation options to experiment with different approaches
+action. Use the tokenisation options to experiment with different approaches -
 the Rust library also supports custom tokenisers.
 </p>
 </header>
@@ -145,7 +145,7 @@
 <div class="text-area-card diamond-left">
 <label
 for="left"
-title="First user's edits changes from this box appear in green in the result."
+title="First user's edits - changes from this box appear in green in the result."
 >
 First user's edits
 <div class="box Left"></div>
@@ -156,7 +156,7 @@
 <div class="text-area-card diamond-right">
 <label
 for="right"
-title="Second user's edits changes from this box appear in blue in the result."
+title="Second user's edits - changes from this box appear in blue in the result."
 >
 Second user's edits
 <div class="box Right"></div>
@@ -167,7 +167,7 @@
 <div class="text-area-card diamond-result">
 <label
 for="merged"
-title="The automatically merged result edit the boxes above to see changes in real-time."
+title="The automatically merged result - edit the boxes above to see changes in real-time."
 >
 Merged result
 <svg
```

```diff
@@ -54,11 +54,8 @@ export interface TextWithCursors {
 }
 /**
-* Represents a text document with associated cursor positions.
-*
-* This interface is used both as input to reconcile functions (to specify where
-* cursors are positioned in the original documents) and as output (with cursors
-* automatically repositioned after merging).
+* Like `TextWithCursors`, but cursors may be null or undefined (treated as empty).
+* Used as input where cursor tracking is optional.
 */
 export interface TextWithOptionalCursors {
 /** The document's entire content as a string */
@@ -97,7 +94,7 @@ export interface TextWithCursorsAndHistory {
 text: string;
 /**
-* Array of cursor positions within the merged text. Can empty if there are no cursors to track.
+* Array of cursor positions within the merged text. Can be empty if there are no cursors to track.
 * All cursors are automatically repositioned from the left and right documents.
 */
 cursors: CursorPosition[];
@@ -124,9 +121,9 @@ export interface SpanWithHistory {
 history: History;
 }
-const UNSUPPORTED_TOKENIZER_ERROR = `Unsupported tokenizer. Only ${BUILTIN_TOKENIZERS.join(
+const UNSUPPORTED_TOKENIZER_ERROR = `Unsupported tokenizer, only ${BUILTIN_TOKENIZERS.join(
 ', '
-)} are supported.`;
+)} are supported`;
 let isInitialised = false;
@@ -192,7 +189,7 @@ export function reconcile(
 * @param original - The original/base version of the text
 * @param changed - The modified version of the text (either string or TextWithCursors with cursor positions)
 * @param tokenizer - The tokenisation strategy, which is the same as used in `reconcile`.
-* @returns An array representing the compact diff, with inserts as strings and deletes as negative integers.
+* @returns An array of inserts (strings), deletes (negative integers), and retained spans (positive integers).
 */
 export function diff(
 original: string,
@@ -221,7 +218,7 @@ export function diff(
 * by the `diff` function) and reconstructs the modified text.
 *
 * @param original - The original/base version of the text
-* @param diff - The compact diff array representing changes (inserts as strings, deletes as negative integers)
+* @param diff - The compact diff array (inserts as strings, deletes as negative integers, retained spans as positive integers)
 * @param tokenizer - The tokenisation strategy, which is the same as used in `reconcile`.
 * @returns The reconstructed changed text as a string.
 */
@@ -242,7 +239,7 @@ export function undiff(
 /**
 * Merges three versions of text and returns detailed provenance information.
 *
-* This function behaves identically to `reconcile()` but additionally provides
+* This function behaves like `reconcile()` but also provides
 * detailed historical information about the origin of each text span in the result.
 * This is valuable for understanding how the merge was performed and which changes
 * came from which source.
```
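
The TypeScript `diff`/`undiff` doc comments above describe a compact diff format: inserts are strings, deletes are negative integers and retained spans are positive integers. The minimal Rust sketch below covers the `undiff` direction. It rests on two assumptions. First, counts are in characters, as a Rust doc comment elsewhere in this commit states. Second, any characters the diff does not reach are retained. `CompactItem` and `apply_compact_diff` are hypothetical names, not the library's API.

```rust
/// One entry of a compact diff (hypothetical type): a string inserts text,
/// a positive count retains characters, a negative count deletes them.
#[derive(Debug, Clone, PartialEq)]
pub enum CompactItem {
    Insert(String),
    Count(i64),
}

/// Rebuilds the changed text from `original` and a compact diff.
/// Returns `None` if the diff refers to characters past the end of `original`.
pub fn apply_compact_diff(original: &str, diff: &[CompactItem]) -> Option<String> {
    let mut source = original.chars();
    let mut out = String::with_capacity(original.len());
    for item in diff {
        match item {
            CompactItem::Insert(text) => out.push_str(text),
            CompactItem::Count(n) if *n > 0 => {
                // Retain: copy `n` characters from the original.
                for _ in 0..*n {
                    out.push(source.next()?);
                }
            }
            CompactItem::Count(n) => {
                // Delete: skip `|n|` characters of the original (0 is a no-op).
                for _ in 0..n.unsigned_abs() {
                    source.next()?;
                }
            }
        }
    }
    // Assumption of this sketch: characters not covered by the diff are kept.
    out.extend(source);
    Some(out)
}
```

Because the sketch walks `chars()`, a count of 1 spans the whole of `é`. The format never repeats retained text, so its size depends on the edits rather than on the length of the document.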

```diff
@@ -152,7 +152,7 @@
 //! );
 //! ```
 //!
-//! ## Efficiently serialize changes
+//! ## Compact change serialization
 //!
 //! The edits can be serialized into a compact representation without the full
 //! original text, making the size depend only on the changes made.
```

```diff
@@ -24,8 +24,7 @@ use crate::{Tokenizer, types::text_with_cursors::TextWithCursors};
 /// into that span, the inserted text will be present in the return
 /// value.
 ///
-/// The function supports UTF-8. The arguments are tokenized at the
-/// granularity of words.
+/// Supports UTF-8. Arguments are tokenized using the provided `tokenizer`.
 ///
 /// ```
 /// use reconcile_text::{reconcile, BuiltinTokenizer};
```

```diff
@@ -54,7 +54,7 @@ where
 T: PartialEq + Clone + Debug,
 {
 /// Create an `EditedText` from the given original and updated strings
-/// using the provided tokenizer.
+/// using the provided tokenizer
 pub fn from_strings_with_tokenizer(
 original: &'a str,
 updated: &TextWithCursors,
@@ -256,7 +256,7 @@ where
 )
 }
-/// Apply the operations to the text and return the resulting text.
+/// Apply the operations to the text and return the resulting text
 #[must_use]
 pub fn apply(&self) -> TextWithCursors {
 let mut builder: StringBuilder<'_> = StringBuilder::new(self.text);
@@ -355,8 +355,8 @@ where
 /// This is useful for sending text diffs over the network if there's a
 /// clear consensus on the original text.
 ///
-/// Inserts are represented as strings, deletes as negative integers,
-/// and equal spans as positive integers.
+/// Inserts are strings, deletes are negative integers (character count),
+/// and retained spans are positive integers (character count).
 ///
 /// # Panics
 ///
```

```diff
@@ -11,7 +11,7 @@ use crate::{
 },
 };
-/// Represents a change that can be applied on a `StringBuilder`.
+/// Represents a change that can be applied on a `StringBuilder`
 #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
 #[derive(Clone, PartialEq)]
 pub enum Operation<T>
@@ -47,7 +47,7 @@ where
 T: PartialEq + Clone + Debug,
 {
 /// Creates an equal (retain) operation starting at the given character
-/// offset in the original text.
+/// offset in the original text
 pub fn create_equal(order: usize, length: usize) -> Self {
 Operation::Equal {
 order,
@@ -69,13 +69,13 @@ where
 }
 /// Creates an insert operation at the given character offset with the
-/// given tokens.
+/// given tokens
 pub fn create_insert(order: usize, text: Vec<Token<T>>) -> Self {
 Operation::Insert { order, text }
 }
 /// Creates a delete operation at the given character offset for the
-/// specified number of characters.
+/// specified number of characters
 pub fn create_delete(order: usize, deleted_character_count: usize) -> Self {
 Operation::Delete {
 order,
```

```diff
@@ -3,7 +3,7 @@ use std::fmt::Debug;
 use crate::{operation_transformation::Operation, raw_operation::RawOperation};
 /// Turn raw operations into ordered operations while keeping track of the
-/// original token's indexes.
+/// original token's indexes
 pub fn cook_operations<I, T>(raw_operations: I) -> impl Iterator<Item = Operation<T>>
 where
 I: IntoIterator<Item = RawOperation<T>>,
```
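
The `Operation` constructors above take an `order` argument that their doc comments describe as a character offset in the original text. `cook_operations` turns raw operations into such ordered operations. The minimal sketch of that step below assumes three things: the offset advances by the character length of every `Equal` and `Delete`, it stays put for an `Insert`, and tokens are plain `String`s instead of `Token<T>`. `RawOp`, `Op`, `char_len` and `cook` are hypothetical stand-ins, not the crate's `RawOperation`, `Operation` or `cook_operations`.

```rust
/// Raw diff output: runs of tokens without positions (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub enum RawOp {
    Equal(Vec<String>),
    Insert(Vec<String>),
    Delete(Vec<String>),
}

/// Positioned operation; `order` is a character offset in the original text.
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    Equal { order: usize, length: usize },
    Insert { order: usize, text: String },
    Delete { order: usize, deleted_character_count: usize },
}

/// Total length of a token run, counted in `char`s.
fn char_len(tokens: &[String]) -> usize {
    tokens.iter().map(|t| t.chars().count()).sum()
}

/// Attaches an original-text character offset to every raw operation.
pub fn cook(raw: impl IntoIterator<Item = RawOp>) -> Vec<Op> {
    // Characters of the original text consumed so far.
    let mut offset = 0usize;
    raw.into_iter()
        .map(|op| match op {
            RawOp::Equal(tokens) => {
                let length = char_len(&tokens);
                let cooked = Op::Equal { order: offset, length };
                offset += length;
                cooked
            }
            // Inserts consume nothing from the original text.
            RawOp::Insert(tokens) => Op::Insert { order: offset, text: tokens.concat() },
            RawOp::Delete(tokens) => {
                let count = char_len(&tokens);
                let cooked = Op::Delete { order: offset, deleted_character_count: count };
                offset += count;
                cooked
            }
        })
        .collect()
}
```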

```diff
@@ -12,7 +12,7 @@ use wasm_bindgen::prelude::*;
 pub mod token;
-/// Type alias for tokenizer functions that split a string into tokens.
+/// Type alias for tokenizer functions that split a string into tokens
 pub type Tokenizer<T> = dyn Fn(&str) -> Vec<Token<T>>;
 #[cfg_attr(feature = "wasm", wasm_bindgen)]
```

```diff
@@ -1,6 +1,6 @@
 use super::token::Token;
-/// Splits text into UTF-8 characters.
+/// Splits text into UTF-8 characters
 ///
 /// ```not_rust
 /// "Hey!" -> ["H", "e", "y", "!"]
```

```diff
@@ -1,6 +1,6 @@
 use super::token::Token;
-/// Splits text into lines, preserving line endings as separate tokens.
+/// Splits text into lines, preserving line endings as separate tokens
 ///
 /// ## Example
 ///
```
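
The minimal sketch below follows the line tokeniser's doc comment: it splits text into lines and emits each line ending as its own token. It makes three assumptions. It returns plain `&str` slices instead of the crate's `Token<T>`. It treats only `\n` as a line ending, so a preceding `\r` stays with the line content. `split_lines_keep_endings` is a hypothetical name.

```rust
/// Splits `text` into line contents and `"\n"` tokens, in order, so that
/// concatenating the result reproduces the input.
pub fn split_lines_keep_endings(text: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut rest = text;
    while let Some(pos) = rest.find('\n') {
        // Skip empty line content (e.g. for consecutive newlines).
        if pos > 0 {
            tokens.push(&rest[..pos]);
        }
        // The line ending becomes its own token.
        tokens.push(&rest[pos..pos + 1]);
        rest = &rest[pos + 1..];
    }
    // Trailing text without a final newline.
    if !rest.is_empty() {
        tokens.push(rest);
    }
    tokens
}
```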

```diff
@@ -14,21 +14,21 @@ pub struct Token<T>
 where
 T: PartialEq + Clone + Debug,
 {
-/// The normalized form of the token used deriving the diff.
+/// The normalized form of the token used deriving the diff
 normalized: T,
-/// The original string, that should be inserted or deleted in the document.
+/// The original string, that should be inserted or deleted in the document
 original: String,
-/// Whether the token is semantically joinable with the previous token.
+/// Whether the token is semantically joinable with the previous token
 pub is_left_joinable: bool,
-/// Whether the token is semantically joinable with the next token.
+/// Whether the token is semantically joinable with the next token
 pub is_right_joinable: bool,
 }
 /// Trivial implementation of Token when the normalized form is the same as the
-/// original string.
+/// original string
 impl From<&str> for Token<String> {
 fn from(text: &str) -> Self { Token::new(text.to_owned(), text.to_owned(), true, true) }
 }
```

```diff
@@ -1,7 +1,7 @@
 use super::token::Token;
 /// Splits text on word boundaries, creating tokens of alternating words and
-/// whitespace with the whitespace getting unique IDs.
+/// whitespace with the whitespace getting unique IDs
 ///
 /// ## Example
 ///
```
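
The minimal sketch below follows the word tokeniser's documented behaviour: it produces alternating runs of words and whitespace, so concatenating the tokens reproduces the input. Two simplifying assumptions apply. Any run of non-whitespace characters counts as a word. The unique IDs the doc comment mentions for whitespace tokens are omitted. `split_words_and_whitespace` is a hypothetical name.

```rust
/// Splits `text` into alternating runs of non-whitespace and whitespace.
pub fn split_words_and_whitespace(text: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut start = 0;
    // Whether the previous character was whitespace (None before the first char).
    let mut prev_ws: Option<bool> = None;
    for (idx, ch) in text.char_indices() {
        let is_ws = ch.is_whitespace();
        // A change between word and whitespace closes the current run.
        if prev_ws.is_some_and(|p| p != is_ws) {
            tokens.push(&text[start..idx]);
            start = idx;
        }
        prev_ws = Some(is_ws);
    }
    if start < text.len() {
        tokens.push(&text[start..]);
    }
    tokens
}
```

Slicing at the byte offsets from `char_indices` keeps multi-byte characters such as `ö` intact.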

```diff
@@ -4,7 +4,7 @@ use serde::{Deserialize, Serialize};
 use wasm_bindgen::prelude::*;
 /// `CursorPosition` represents the position of an identifiable cursor in a text
-/// document based on its (UTF-8) character index.
+/// document based on its (UTF-8) character index
 #[allow(clippy::unsafe_derive_deserialize)]
 #[cfg_attr(feature = "wasm", wasm_bindgen)]
 #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
```
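
`CursorPosition` is documented as holding a character index, whereas a Rust `&str` is sliced by byte offset. The minimal sketch below converts between the two. It assumes that "character" means a Rust `char` (Unicode scalar value), and `char_index_to_byte_offset` is a hypothetical helper, not part of the crate.

```rust
/// Converts a character index into a byte offset usable for slicing `text`.
/// Indexes at or past the end clamp to `text.len()`.
pub fn char_index_to_byte_offset(text: &str, char_index: usize) -> usize {
    text.char_indices()
        .nth(char_index)
        .map_or(text.len(), |(byte, _)| byte)
}
```

For `"héllo"`, character index 2 (the first `l`) is byte offset 3, because `é` takes two bytes in UTF-8.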

```diff
@@ -15,7 +15,7 @@ pub enum History {
 RemovedFromRight = "RemovedFromRight",
 }
-/// Provenance label for each span returned by `apply_with_history`.
+/// Provenance label for each span returned by `apply_with_history`
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
 #[cfg(not(feature = "wasm"))]
 #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
```

```diff
@@ -4,7 +4,7 @@ use std::fmt::Display;
 use serde::{Deserialize, Serialize};
 /// Pretty-printable flag to tell which conflicting edit (side)
-/// an operation is associated with.
+/// an operation is associated with
 #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
 pub enum Side {
```

```diff
@@ -5,7 +5,7 @@ use wasm_bindgen::prelude::*;
 use crate::types::history::History;
-/// A text span annotated with its origin in a merge result.
+/// A text span annotated with its origin in a merge result
 #[allow(clippy::unsafe_derive_deserialize)]
 #[cfg_attr(feature = "wasm", wasm_bindgen)]
 #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
```

```diff
@@ -3,7 +3,7 @@ use std::fmt::Debug;
 use crate::Token;
 /// Given two lists of tokens, returns `length` where the `old` list
-/// somewhere within contains the `length` prefix of the `new` list.
+/// somewhere within contains the `length` prefix of the `new` list
 ///
 /// ## Example
 ///
```
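
One reading of the doc comment above is that the function returns the largest `length` such that `new[..length]` appears as a contiguous run somewhere in `old`. This is an assumption, because the full function is not shown in this diff. The naive O(|old|·|new|) sketch below implements that reading, under the hypothetical name `longest_prefix_contained`.

```rust
/// Largest `length` such that `new[..length]` occurs contiguously inside `old`.
pub fn longest_prefix_contained<T: PartialEq>(old: &[T], new: &[T]) -> usize {
    let mut best = 0;
    for start in 0..old.len() {
        // Length of the common run beginning at old[start] and new[0].
        let len = old[start..]
            .iter()
            .zip(new)
            .take_while(|(a, b)| a == b)
            .count();
        best = best.max(len);
    }
    best
}
```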

```diff
@@ -77,7 +77,7 @@ where
 /// We can't use a traditional Vec to represent `V` since we use `k` as an index
 /// and it can take on negative values. So instead `V` is represented as a
 /// light-weight wrapper around a Vec plus an `offset` which is the maximum
-/// value `k` can take on in order to map negative `k`'s back to a value >= 0.
+/// value `k` can take on to map negative `k`'s back to a value >= 0.
 #[derive(Debug)]
 struct V {
 offset: isize,
```
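
The doc comment above explains why Myers' `V` array is wrapped: the diagonal index `k` takes negative values. The minimal sketch below shows such a wrapper. It assumes `k` lies in `-max..=max` and that each slot stores a furthest-reaching x coordinate. The `v` field and the `new` constructor are illustrative, not the crate's definitions.

```rust
use std::ops::{Index, IndexMut};

/// Array indexed by a diagonal `k` in `-max..=max`, stored in a Vec by
/// shifting every index by `offset` (= max) so it becomes non-negative.
#[derive(Debug)]
struct V {
    offset: isize,
    v: Vec<usize>,
}

impl V {
    fn new(max: usize) -> Self {
        Self {
            offset: max as isize,
            // One slot per diagonal from -max to +max inclusive.
            v: vec![0; 2 * max + 1],
        }
    }
}

impl Index<isize> for V {
    type Output = usize;
    fn index(&self, k: isize) -> &usize {
        &self.v[(k + self.offset) as usize]
    }
}

impl IndexMut<isize> for V {
    fn index_mut(&mut self, k: isize) -> &mut usize {
        &mut self.v[(k + self.offset) as usize]
    }
}
```

Implementing `Index<isize>` lets the search loop write `v[k - 1]` and `v[k + 1]` directly, as in the paper's pseudocode.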

```diff
@@ -34,11 +34,11 @@ impl StringBuilder<'_> {
 }
 }
-/// Insert a string at the end of the built buffer.
+/// Insert a string at the end of the built buffer
 pub fn insert(&mut self, text: &str) { self.buffer.push_str(text); }
 /// Skip copying `length` characters from the original string to the built
-/// buffer.
+/// buffer
 pub fn delete(&mut self, length: usize) {
 if length == 0 {
 return;
@@ -52,7 +52,7 @@ impl StringBuilder<'_> {
 }
 }
-/// Copy `length` characters from the original string to the built buffer.
+/// Copy `length` characters from the original string to the built buffer
 pub fn retain(&mut self, length: usize) {
 self.buffer.extend(self.original.by_ref().take(length));
```
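
The `StringBuilder` methods above walk the original string once: `retain` copies characters, `delete` skips them and `insert` appends new text. The simplified sketch below follows the same idea. `MiniStringBuilder` and its `build` method are hypothetical. Dropping characters that were never visited is an assumption of this sketch, not documented behaviour.

```rust
use std::str::Chars;

/// Builds a new string from a single forward pass over `original`.
pub struct MiniStringBuilder<'a> {
    original: Chars<'a>,
    buffer: String,
}

impl<'a> MiniStringBuilder<'a> {
    pub fn new(original: &'a str) -> Self {
        Self { original: original.chars(), buffer: String::with_capacity(original.len()) }
    }

    /// Append `text` to the end of the buffer.
    pub fn insert(&mut self, text: &str) {
        self.buffer.push_str(text);
    }

    /// Skip `length` characters of the original without copying them.
    pub fn delete(&mut self, length: usize) {
        if length > 0 {
            // `nth(n)` consumes n + 1 items, so this skips exactly `length`.
            self.original.nth(length - 1);
        }
    }

    /// Copy `length` characters from the original into the buffer.
    pub fn retain(&mut self, length: usize) {
        self.buffer.extend(self.original.by_ref().take(length));
    }

    /// Finish, discarding any characters of the original not yet visited.
    pub fn build(self) -> String {
        self.buffer
    }
}
```

Usage: for `"Hello wörld"`, calling `retain(6)`, `delete(5)` and `insert("there")` yields `"Hello there"`. Counts are in `char`s, so `ö` counts as one.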

```diff
@@ -8,7 +8,7 @@ use crate::{BuiltinTokenizer, CursorPosition, EditedText, SpanWithHistory, TextW
 #[global_allocator]
 static ALLOC: wee_alloc::WeeAlloc<'_> = wee_alloc::WeeAlloc::INIT;
-/// WASM wrapper around `crate::reconcile` for merging text.
+/// WASM wrapper around `crate::reconcile` for merging text
 #[wasm_bindgen(js_name = reconcile)]
 #[must_use]
 pub fn reconcile(
@@ -22,7 +22,7 @@ pub fn reconcile(
 crate::reconcile(parent, left, right, &*tokenizer).apply()
 }
-/// WASM wrapper around `crate::reconcile` that also returns provenance history.
+/// WASM wrapper around `crate::reconcile` that also returns provenance history
 #[wasm_bindgen(js_name = reconcileWithHistory)]
 #[must_use]
 pub fn reconcile_with_history(
@@ -48,13 +48,13 @@ pub fn reconcile_with_history(
 ///
 /// # Arguments
 ///
-/// - `parent`: The common parent document.
-/// - `left`: The left document updated by one user.
-/// - `right`: The right document updated by another user.
+/// - `parent`: The common parent document
+/// - `left`: The left document updated by one user
+/// - `right`: The right document updated by another user
 ///
 /// # Returns
 ///
-/// The merged document.
+/// The merged document
 #[wasm_bindgen(js_name = genericReconcile)]
 #[must_use]
 pub fn generic_reconcile(
@@ -80,7 +80,7 @@ pub fn generic_reconcile(
 }
 /// WASM wrapper around getting a compact diff representation of two texts as a
-/// list of numbers and strings.
+/// list of numbers and strings
 #[wasm_bindgen(js_name = diff)]
 #[must_use]
 pub fn diff(parent: &str, changed: &TextWithCursors, tokenizer: BuiltinTokenizer) -> Vec<JsValue> {
@@ -94,7 +94,7 @@ pub fn diff(parent: &str, changed: &TextWithCursors, tokenizer: BuiltinTokenizer
 .collect()
 }
-/// Inverse of `diff`, applies a compact diff representation to a parent text.
+/// Inverse of `diff`, applies a compact diff representation to a parent text
 ///
 /// # Errors
 ///
```