From 077ba9416a5743f664730bccbe7e13d392f77c51 Mon Sep 17 00:00:00 2001 From: Andras Schmelczer Date: Sun, 6 Jul 2025 12:28:46 +0100 Subject: [PATCH] Improve docs --- README.md | 79 +++++++++++--------------------------- a.md | 1 - examples/website/README.md | 54 -------------------------- reconcile-js/src/index.ts | 6 --- scripts/build-js.sh | 7 ---- tests/examples/README.md | 38 ------------------ 6 files changed, 23 insertions(+), 162 deletions(-) delete mode 100644 a.md delete mode 100644 examples/website/README.md delete mode 100755 scripts/build-js.sh diff --git a/README.md b/README.md index 3920975..b7c0c2b 100644 --- a/README.md +++ b/README.md @@ -1,22 +1,13 @@ # Reconcile: conflict-free 3-way text merging -> `diff3` but with automatic conflict resolution. +> [`diff3`](https://www.gnu.org/software/diffutils/manual/html_node/Invoking-diff3.html) but with automatic conflict resolution. [![Check](https://github.com/schmelczer/reconcile/actions/workflows/check.yml/badge.svg)](https://github.com/schmelczer/reconcile/actions/workflows/check.yml) [![Publish to GitHub Pages](https://github.com/schmelczer/reconcile/actions/workflows/gh-pages.yml/badge.svg)](https://github.com/schmelczer/reconcile/actions/workflows/gh-pages.yml) -Reconcile is a Rust and JavaScript (through WebAssembly) library for merging text without user intervention. It automatically resolves conflicts that would typically require manual intervention in traditional 3-way merge tools. 
+TODO: add links for crates and npm -```rust -use reconcile::{reconcile, BuiltinTokenizer}; - -let parent = "Merging text is hard!"; -let left = "Merging text is easy!"; -let right = "With reconcile, merging documents is hard!"; - -let deconflicted = reconcile(parent, &left.into(), &right.into(), &*BuiltinTokenizer::Word); -assert_eq!(deconflicted.apply().text(), "With reconcile, merging documents is easy!"); -``` +Reconcile is a Rust and JavaScript (through WebAssembly) library for merging text without user intervention. It automatically resolves conflicts that would typically require user action in traditional 3-way merge tools. ## Features @@ -31,6 +22,7 @@ assert_eq!(deconflicted.apply().text(), "With reconcile, merging documents is ea ### Rust Add to your `Cargo.toml`: + ```toml [dependencies] reconcile = "0.4" @@ -54,7 +46,7 @@ npm install reconcile ``` ```javascript -import { init, reconcile } from 'reconcile'; +import { init, reconcile } from "reconcile"; // Initialize the WASM module (required before first use) await init(); @@ -73,9 +65,9 @@ console.log(result.text); // "Hi beautiful world" Reconcile supports different tokenization strategies: -- **Word tokenizer** (`BuiltinTokenizer::Word`): Splits text into words (default, recommended for most use cases) -- **Character tokenizer** (`BuiltinTokenizer::Character`): Splits text into individual characters (fine-grained merging) -- **Custom tokenizer**: Implement your own tokenization logic +- **Word tokenizer** (`BuiltinTokenizer::Word`): Splits text into words (default, recommended for most use cases) +- **Character tokenizer** (`BuiltinTokenizer::Character`): Splits text into individual characters (fine-grained merging) +- **Custom tokenizer**: Implement your own tokenization logic ### Cursor Tracking @@ -86,11 +78,11 @@ const result = reconcile( "Hello world", { text: "Hello beautiful world", - cursors: [{ id: 1, position: 6 }] // After "Hello " + cursors: [{ id: 1, position: 6 }], // After "Hello " }, { 
text: "Hi world", - cursors: [{ id: 2, position: 0 }] // At beginning + cursors: [{ id: 2, position: 0 }], // At beginning } ); @@ -113,10 +105,10 @@ The algorithm starts similarly to `diff3`. Its inputs are a **parent** document 1. **Diff calculation**: First, 2-way diffs of (parent & left) and (parent & right) are computed using Myers' algorithm 2. **Tokenization**: The text is split into tokens (words, characters, etc.) for granular merging -3. **Operation transformation**: The resulting edits are weaved together using operational transformation principles, ensuring no changes are lost -4. **Conflict resolution**: Unlike traditional 3-way merge tools, Reconcile automatically resolves conflicts without producing conflict markers +3. **Diff cleaning**: Tokens within the same diff are reordered and merged to maximise patch sizes +4. **Operation transformation (OT)**: The resulting edits are woven together using operational transformation principles, ensuring no changes are lost -The key insight is that both insertions and deletions are preserved: if either side inserted text, it appears in the result; if either side deleted text, the deletion is applied, but insertions into deleted regions are still preserved. +`EditedText` (at least in the Rust library) exposes an implementation of OT. The primary purpose of this library isn't to implement OT but to provide automated text merging; however, OT happens to provide an easy way of merging the output of Myers' diff. The same result could be achieved through many CRDT implementations as well. However, the merging quality is only as good as the 2-way diffs are. For instance, `reconcile` doesn't support `move` semantics because Myers' algorithm decomposes a move into an `insert` and a `delete` operation. 
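The point about `move` semantics can be illustrated with a small sketch. It is not `reconcile`'s code and it does not use Myers' algorithm: it swaps in a plain longest-common-subsequence diff over whitespace-separated words, which produces the same kinds of operations (equal, insert, delete). The names `Op`, `word_diff`, and `move_example` are hypothetical and exist only for this illustration.

```rust
// Illustrative only: a simple LCS-based word diff, NOT reconcile's
// implementation and NOT Myers' algorithm. All names are hypothetical.

#[derive(Debug, Clone, PartialEq)]
enum Op {
    Equal(String),
    Insert(String),
    Delete(String),
}

/// Diffs two texts token by token, where tokens are whitespace-separated words.
fn word_diff(old: &str, new: &str) -> Vec<Op> {
    let a: Vec<&str> = old.split_whitespace().collect();
    let b: Vec<&str> = new.split_whitespace().collect();

    // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
    let mut lcs = vec![vec![0usize; b.len() + 1]; a.len() + 1];
    for i in (0..a.len()).rev() {
        for j in (0..b.len()).rev() {
            lcs[i][j] = if a[i] == b[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }

    // Walk the table from the start, emitting one operation per step.
    let (mut i, mut j) = (0, 0);
    let mut ops = Vec::new();
    while i < a.len() && j < b.len() {
        if a[i] == b[j] {
            ops.push(Op::Equal(a[i].to_string()));
            i += 1;
            j += 1;
        } else if lcs[i + 1][j] >= lcs[i][j + 1] {
            ops.push(Op::Delete(a[i].to_string()));
            i += 1;
        } else {
            ops.push(Op::Insert(b[j].to_string()));
            j += 1;
        }
    }
    // Whatever remains on either side is a pure deletion or insertion.
    ops.extend(a[i..].iter().map(|t| Op::Delete(t.to_string())));
    ops.extend(b[j..].iter().map(|t| Op::Insert(t.to_string())));
    ops
}

/// Moves the first word to the end and diffs the two versions.
/// Result: [Delete("intro"), Equal("body"), Equal("outro"), Insert("intro")]
fn move_example() -> Vec<Op> {
    word_diff("intro body outro", "body outro intro")
}
```

The diff has no way to say that "intro" moved: it reports an unrelated delete and insert. If the other side edited the moved word concurrently, a merge built on such diffs would presumably apply that edit to the deleted occurrence rather than carry it to the new position.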
## Motivation @@ -124,7 +116,7 @@ Sometimes documents get edited concurrently by multiple users (or the same user To allow for offline editing, we could use CRDTs or Operational Transformation (OT) to come to a consistent resolution of the competing version. However, this requires capturing all user actions: insertions, deletes, move, copies, and pastes. In some applications, this is trivial if the document can only be edited through an editor that's in our control. But this isn't always the case. Users enjoy composable systems that don't lock them in. For example, one of the unique selling points of Obsidian is to provide an editor experience over a folder of Markdown files leaving the user free to change their technology of choice on a whim. -This means that files can be edited out-of-channel and the only information a text synchronization system can know is the current content of each tracked file. This is the same problem as what Git and similar version control systems solve. Although the problem is similar, there's a relevant difference between syncing source code and personal notes: in the case of the former, a semantically incorrect conflict resolution can wreak havoc in a code base, or worse, introduce a correctness bug unnoticed. Text notes are different though, humans are well-equipped to finding the signal in a noisy environment and "bad merges" might result in a clumsy sentence but the reader will likely still understand the gist and can fix it if necessary. +This means that files can be edited out-of-channel and the only information a text synchronization system can know is the current content of each tracked file. This is described as Differential Synchronization [1]. This is the same problem as what Git and similar version control systems solve but in a manual way. 
Although the problem is similar, there's a relevant difference between syncing source code and personal notes: in the case of the former, a semantically incorrect conflict resolution can wreak havoc in a code base, or worse, introduce a correctness bug unnoticed. Text notes are different, though: humans are well equipped to find the signal in a noisy environment, and a "bad merge" might result in a clumsy sentence, but the reader will likely still understand the gist and can fix it if necessary. > There are domains of human text which are less tolerant of mis-merges: for instance, two conflicting changes to a contract could result in a term getting negated in different ways from both sides, resulting in a double-negation, thus unknowingly changing the meaning. @@ -133,49 +125,24 @@ This means that files can be edited out-of-channel and the only information a te ### Prerequisites #### Install Node.js + - Install [nvm](https://github.com/nvm-sh/nvm): `curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash` - `nvm install 22` - `nvm use 22` - Optionally set the system-wide default: `nvm alias default 22` #### Set up Rust + - Install [`rustup`](https://rustup.rs): `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh` -- Install [`wasm-pack`](https://rustwasm.github.io/wasm-pack/installer): `curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh` -- `cargo install cargo-insta cargo-edit` - -### Building - -```bash -# Build Rust library -cargo build - -# Build WASM bindings -wasm-pack build --target web - -# Build JavaScript package -cd reconcile-js -npm install -npm run build -``` - -### Testing - -```bash -# Test Rust library -cargo test - -# Test JavaScript bindings -cd reconcile-js -npm test -``` +- `cargo install wasm-pack cargo-insta cargo-edit` ### Scripts -#### Publish new version -```sh -scripts/bump-version.sh patch -``` +- **Running tests**: `scripts/test.sh` +- **Formatting**: `scripts/lint.sh` +- **Building 
website**: `scripts/dev-website.sh` +- **Publishing new version**: `scripts/bump-version.sh patch` -## License +TODO: license -MIT +[1]: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35605.pdf diff --git a/a.md b/a.md deleted file mode 100644 index 24abc55..0000000 --- a/a.md +++ /dev/null @@ -1 +0,0 @@ -`EditedText` (at least in the Rust library) exposes an implementation of OT. The primary purpose of this library isn't to implement OT but to provide automated text merging, howver, OT happens to provide an easy way of merging the output of Myers' diff. The same result could be achieved through many CRDT implementations as well. However, the merging quality is only as good as the 2-way diffs are. For instance, `reconcile` doesn't support `move` operations the best as these are decomposed into an `insert` and `delete` operation by Myers'. diff --git a/examples/website/README.md b/examples/website/README.md deleted file mode 100644 index dd6e63f..0000000 --- a/examples/website/README.md +++ /dev/null @@ -1,54 +0,0 @@ -# Reconcile: Interactive Demo - -This is the interactive demo website for the Reconcile library. Visit [schmelczer.dev/reconcile](https://schmelczer.dev/reconcile) to try it out. - -## About the Demo - -The demo allows you to: - -- Enter three text versions (parent, left, right) -- See the reconciled result in real-time -- Experiment with different tokenization strategies -- Observe how cursor positions are updated during merging -- View the history of operations that led to the result - -## Features Demonstrated - -- **Conflict-free merging**: No conflict markers in the output -- **Cursor tracking**: See how cursor positions are automatically updated -- **Different tokenizers**: Compare word-level vs. character-level tokenization -- **Operation history**: Understand the merge process step-by-step - -## Running Locally - -```bash -# Build the WASM module first -cd ../.. 
-wasm-pack build --target web - -# Install dependencies and run the demo -cd examples/website -npm install -npm run dev -``` - -## Usage Examples - -Try these examples in the demo: - -### Basic merge -- **Parent**: "Hello world" -- **Left**: "Hello beautiful world" -- **Right**: "Hi world" -- **Result**: "Hi beautiful world" - -### Cursor tracking -- **Parent**: "The quick brown fox" -- **Left**: "The very quick brown fox" (cursor at position 4) -- **Right**: "The quick red fox" (cursor at position 10) -- **Result**: Cursors automatically repositioned - -### Character-level merging -Switch to character tokenizer for fine-grained merging of individual characters rather than whole words. - -For more examples and detailed documentation, see the [main README](../../README.md). diff --git a/reconcile-js/src/index.ts b/reconcile-js/src/index.ts index a2d66cb..75d64cf 100644 --- a/reconcile-js/src/index.ts +++ b/reconcile-js/src/index.ts @@ -16,9 +16,6 @@ export interface TextWithCursors { cursors: null | undefined | CursorPosition[]; } -/** - * Represents a cursor position with a unique identifier. - */ export interface CursorPosition { /** Unique identifier for the cursor */ id: number; @@ -42,9 +39,6 @@ export interface SpanWithHistory { history: History; } -/** - * Supported tokenizer types for text processing. - */ export type Tokenizer = "word" | "character"; let isInitialised = false; diff --git a/scripts/build-js.sh b/scripts/build-js.sh deleted file mode 100755 index 8097f25..0000000 --- a/scripts/build-js.sh +++ /dev/null @@ -1,7 +0,0 @@ -#!/bin/bash - -set -e - -rm -rf pkg -wasm-pack build --target web --features wasm,wee_alloc - diff --git a/tests/examples/README.md b/tests/examples/README.md index 848cbcc..d35440f 100644 --- a/tests/examples/README.md +++ b/tests/examples/README.md @@ -2,44 +2,6 @@ This directory contains YAML test cases that demonstrate various reconcile scenarios. 
-## Format - -Each YAML file contains test documents with the following structure: - -```yaml -parent: "Original text" -left: - text: "Left version" - cursors: - - id: 1 - char_index: 5 -right: - text: "Right version" - cursors: - - id: 2 - char_index: 10 -expected: - text: "Expected result" - cursors: - - id: 1 - char_index: 8 - - id: 2 - char_index: 12 -``` - ## Cursor Position Notation In some test cases, the `|` character is used to denote cursor positions within the text. These characters are stripped before the actual reconcile logic is run, making it easier to visualize where cursors should be positioned. - -## Running Tests - -These examples are automatically tested as part of the test suite: - -```bash -cargo test -``` - -The tests verify that: -1. Text is merged correctly without conflicts -2. Cursor positions are updated accurately -3. The merge result is consistent regardless of argument order (left/right swap)