vault-link/docs/architecture/sync-algorithm.md
2025-11-30 15:24:52 +00:00

Sync Algorithm

VaultLink uses operational transformation (OT) to handle concurrent edits and maintain consistency across clients.

Operational Transformation

Operational transformation is a technique for managing concurrent edits to the same document. It transforms operations (edits) so they can be applied in different orders while preserving user intent.

Why OT?

Traditional conflict resolution approaches:

  • Last write wins: Loses data, frustrating for users
  • Manual merging: Interrupts workflow, requires user intervention
  • Version branching: Complex, not suitable for real-time sync

Operational transformation:

  • Automatic: No user intervention required
  • Preserves all edits: No data loss
  • Real-time: Changes appear immediately
  • Intuitive: Behaviour matches user expectations

The reconcile-text Library

VaultLink uses the reconcile-text Rust library for operational transformation on text documents.

Why reconcile-text over CRDTs?

VaultLink faces a differential synchronisation challenge: users edit Obsidian vaults with various editors (Obsidian desktop, Obsidian mobile, Vim, VS Code, or any text editor), often while offline. This means we only observe the final state of each document after editing, not the individual keystrokes or operations that produced it.

The fundamental problem:

  • CRDTs and traditional OT require capturing every individual operation (each character insertion, deletion, cursor movement)
  • VaultLink's reality: Users edit files with arbitrary tools, sync happens after the fact
  • What we know: Parent version and two modified versions
  • What we don't know: The sequence of operations that created those modifications

Why reconcile-text wins for this use case:

  1. Works with end states only: reconcile-text performs conflict-free 3-way merging given just parent, left, and right versions—no operation history needed

  2. Editor-agnostic: Users can edit with any tool without requiring VaultLink-specific plugins or operation tracking

  3. Offline-first: Edits made while disconnected are merged cleanly when sync resumes, because we're diffing final states rather than replaying operations

  4. No conflict markers: Unlike Git merge, produces clean merged output without <<<<<<< markers that interrupt note-taking flow

  5. Human text forgiveness: For knowledge bases and documentation, a slightly imperfect merge (e.g., minor word order issues) is vastly preferable to manual conflict resolution

  6. Simpler infrastructure: No need for complex operation capture, transformation logs, or tombstone management that CRDTs require

The trade-off:

CRDTs excel when you control the entire editing infrastructure and can capture every operation. reconcile-text excels when you're synchronising independently-edited files—exactly VaultLink's scenario. The merge quality depends on Myers' diff algorithm rather than operation history, which is the correct trade-off for differential sync.

For note-taking workflows where users value editor freedom and offline editing, this approach provides superior user experience compared to either CRDTs (which would require operation tracking) or Git-style merging (which requires manual conflict resolution).

Learn more about reconcile-text →

How It Works

Given three versions (parent, left, right), reconcile-text produces a merged result in four steps:

  1. Tokenisation: Split text into words (using BuiltinTokenizer::Word)
  2. Three-way diff: Compare parent→left and parent→right changes
  3. Merge: Combine non-conflicting changes, prefer content preservation for conflicts
  4. Result: Merged text with both edits applied

Example:

Parent:  "The quick brown fox"
User A:  "The quick red fox"      (changes "brown" → "red")
User B:  "The very quick brown fox"  (inserts "very ")

Merged:  "The very quick red fox"  (both changes applied)
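The steps above can be illustrated with a toy word-level three-way merge. This is a minimal sketch and not reconcile-text's implementation: the names `Edits`, `diff`, and `merge3` are hypothetical, the diff is a plain LCS table rather than Myers' algorithm, and whitespace is normalised to single spaces.

```rust
/// Edits of one version relative to the parent, indexed by parent token.
/// (Hypothetical type for this sketch.)
struct Edits<'a> {
    kept: Vec<bool>,            // kept[i]: parent token i survives in this version
    inserts: Vec<Vec<&'a str>>, // inserts[g]: tokens added just before parent token g
}

/// Diff `child` against `parent` using a longest-common-subsequence table.
fn diff<'a>(parent: &[&str], child: &[&'a str]) -> Edits<'a> {
    let (n, m) = (parent.len(), child.len());
    // lcs[i][j] = LCS length of parent[i..] and child[j..]
    let mut lcs = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            lcs[i][j] = if parent[i] == child[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }
    let mut edits = Edits { kept: vec![false; n], inserts: vec![Vec::new(); n + 1] };
    let (mut i, mut j) = (0, 0);
    while i < n && j < m {
        if parent[i] == child[j] {
            edits.kept[i] = true;
            i += 1;
            j += 1;
        } else if lcs[i + 1][j] >= lcs[i][j + 1] {
            i += 1; // parent token i was deleted
        } else {
            edits.inserts[i].push(child[j]); // token inserted before parent token i
            j += 1;
        }
    }
    // Any child tokens left over were appended after the last aligned token.
    while j < m {
        edits.inserts[i].push(child[j]);
        j += 1;
    }
    edits
}

/// Merge two versions: keep every insertion, drop a parent token if either side deleted it.
fn merge3(parent: &str, left: &str, right: &str) -> String {
    let p: Vec<&str> = parent.split_whitespace().collect();
    let l: Vec<&str> = left.split_whitespace().collect();
    let r: Vec<&str> = right.split_whitespace().collect();
    let (dl, dr) = (diff(&p, &l), diff(&p, &r));
    let mut out: Vec<&str> = Vec::new();
    for gap in 0..=p.len() {
        out.extend(dl.inserts[gap].iter().copied());
        // Identical insertions on both sides are emitted only once.
        if dr.inserts[gap] != dl.inserts[gap] {
            out.extend(dr.inserts[gap].iter().copied());
        }
        if gap < p.len() && dl.kept[gap] && dr.kept[gap] {
            out.push(p[gap]);
        }
    }
    out.join(" ")
}
```

Calling `merge3("The quick brown fox", "The quick red fox", "The very quick brown fox")` reproduces the merged sentence from the example. A real merge also has to preserve whitespace and handle edits that touch the same tokens more carefully than this sketch does.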

Merge conditions: Only .md and .txt files with valid UTF-8 get merged. Binary files or other extensions use last-write-wins.

Operation Types

The algorithm handles these operations:

  • Insert: Add text at position
  • Delete: Remove text from position
  • Retain: Keep existing text unchanged

Transformation Process

  1. Client A makes edit and sends to server
  2. Client B makes concurrent edit and sends to server
  3. Server receives both edits
  4. Server transforms operations to account for concurrent changes
  5. Server applies merged result to database
  6. Server sends transformed operations to both clients
  7. Clients apply transformed operations locally

Sync State Management

VaultLink maintains sync state to track which changes have been applied.

Version Vectors

Each document has a version tracked by:

  • Server version: Incremented on each change
  • Client cursors: Track which version each client has seen

This enables:

  • Efficient syncing (only send changes since last sync)
  • Conflict detection (concurrent edits to same version)
  • Ordering of operations
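As a rough illustration of the conflict-detection point, an upload can be classified by comparing the version it was based on with the server's current version. The names `Upload` and `classify` are assumptions made for this sketch, not VaultLink's API.

```rust
/// Outcome of comparing an upload's base version with the server's current version.
/// (Hypothetical type for this sketch.)
#[derive(Debug, PartialEq)]
enum Upload {
    /// The client edited the latest version; the change can be applied directly.
    FastForward,
    /// Someone else changed the document since `base_version`; a merge is needed.
    NeedsMerge,
}

fn classify(base_version: u64, server_version: u64) -> Upload {
    if base_version == server_version {
        Upload::FastForward
    } else {
        Upload::NeedsMerge
    }
}
```

For example, `classify(10, 11)` returns `Upload::NeedsMerge`, because the upload was based on version 10 while the server has already moved to version 11.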

Cursor Management

Clients maintain a cursor position:

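use chrono::{DateTime, Utc}; // assumption: timestamps use the chrono crate

struct Cursor {
    vault_id: String,
    client_id: String,
    last_version: u64, // highest server version this client has seen
    last_updated: DateTime<Utc>,
}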

On sync:

  1. Client sends cursor (last seen version)
  2. Server returns all changes since that version
  3. Client applies changes and updates cursor
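The three sync steps might look like the following in-memory sketch. `Change`, `SyncCursor`, `changes_since`, and `advance` are hypothetical names, and the change log is a plain slice standing in for whatever storage the server actually uses.

```rust
/// One entry in the server's change log (hypothetical shape).
#[derive(Clone, Debug, PartialEq)]
struct Change {
    version: u64,
    path: String,
}

/// Reduced cursor: only the field needed for this sketch.
struct SyncCursor {
    last_version: u64,
}

/// Step 2: the server returns every change newer than the client's cursor.
fn changes_since(log: &[Change], cursor: &SyncCursor) -> Vec<Change> {
    log.iter()
        .filter(|c| c.version > cursor.last_version)
        .cloned()
        .collect()
}

/// Step 3: after applying the changes, the client moves its cursor forward.
fn advance(cursor: &mut SyncCursor, applied: &[Change]) {
    if let Some(newest) = applied.iter().map(|c| c.version).max() {
        cursor.last_version = cursor.last_version.max(newest);
    }
}
```

A client whose cursor is at version 10 would receive only versions 11 and 12 from a log covering 9 to 12, and a second sync immediately afterwards would return nothing.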

Conflict Resolution Flow

Scenario: Concurrent Edits

Two users edit the same paragraph simultaneously.

Initial state:

Version 10: "The quick brown fox jumps over the lazy dog."

User A's edit (version 11):

"The quick brown fox jumps over the very lazy dog."

Inserts "very " at position 35 (0-indexed, the start of "lazy")

User B's edit (also from version 10):

"The quick red fox jumps over the lazy dog."

Replaces "brown" with "red" at position 10

Server Processing

  1. Receive User A's operation:

    • Base: version 10
    • Operation: Insert("very ", position=35)
    • Apply to database → version 11
  2. Receive User B's operation:

    • Base: version 10
    • Operation: Replace("brown"→"red", position=10)
    • Conflict detected: Base is version 10, but current is version 11
  3. Transform User B's operation:

    • Transform against User A's operation
    • Adjust positions/content as needed
    • Apply transformed operation → version 12
  4. Broadcast updates:

    • Send User A's operation to User B
    • Send transformed User B's operation to User A

Final Result

Version 12: "The quick red fox jumps over the very lazy dog."

Both edits are preserved in the final document.
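To see why positions need adjusting, the scenario can be replayed in both orders. This is a sketch with hypothetical helpers (`insert_at`, `replace_at`, `shift_after_replace`, `both_orders`); positions are byte offsets looked up from the text rather than hard-coded.

```rust
/// Insert `text` at byte offset `pos`.
fn insert_at(doc: &str, pos: usize, text: &str) -> String {
    format!("{}{}{}", &doc[..pos], text, &doc[pos..])
}

/// Replace `old_len` bytes starting at `pos` with `new`.
fn replace_at(doc: &str, pos: usize, old_len: usize, new: &str) -> String {
    format!("{}{}{}", &doc[..pos], new, &doc[pos + old_len..])
}

/// Adjust a position that lies after a replacement applied earlier in the document.
fn shift_after_replace(pos: usize, rep_pos: usize, old_len: usize, new_len: usize) -> usize {
    if pos >= rep_pos + old_len {
        pos - old_len + new_len
    } else {
        pos
    }
}

/// Apply the two edits in both orders and return both results.
fn both_orders() -> (String, String) {
    let base = "The quick brown fox jumps over the lazy dog.";
    let a_pos = base.find("lazy").unwrap(); // User A inserts "very " here
    let b_pos = base.find("brown").unwrap(); // User B replaces "brown" here

    // A first: B's edit sits before A's insert, so B's position is unchanged.
    let a_then_b = replace_at(&insert_at(base, a_pos, "very "), b_pos, "brown".len(), "red");

    // B first: "red" is two bytes shorter than "brown", so A's insert point moves left by two.
    let a_shifted = shift_after_replace(a_pos, b_pos, "brown".len(), "red".len());
    let b_then_a = insert_at(&replace_at(base, b_pos, "brown".len(), "red"), a_shifted, "very ");

    (a_then_b, b_then_a)
}
```

Both orders produce the version 12 text, which is the convergence property the flow above relies on.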

Edge Cases

1. Delete vs Insert Conflict

Scenario: User A deletes a paragraph while User B edits it.

Resolution:

  • OT algorithm prioritises preservation of content
  • Insert operation is transformed to account for deletion
  • Typically results in inserted content appearing nearby

Example:

Base: "Line 1\nLine 2\nLine 3"

User A: Delete Line 2 → "Line 1\nLine 3"
User B: Edit Line 2 → "Line 1\nLine 2 modified\nLine 3"

Result: "Line 1\nLine 2 modified\nLine 3"

(Insert takes precedence, preserving user content)

2. Overlapping Edits

Scenario: Two users edit overlapping regions.

Resolution:

  • OT splits operations into non-overlapping segments
  • Applies each segment independently
  • Merges results

3. Delete vs Delete

Scenario: Two users delete overlapping text.

Resolution:

  • Deletes are merged
  • Final result has the union of deleted ranges removed
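Removing the union of deleted ranges could be sketched as follows. `union_ranges` and `delete_ranges` are illustrative names, and the ranges are byte offsets into the base text.

```rust
use std::ops::Range;

/// Merge overlapping or touching ranges into a sorted, disjoint list.
fn union_ranges(mut ranges: Vec<Range<usize>>) -> Vec<Range<usize>> {
    ranges.sort_by_key(|r| r.start);
    let mut out: Vec<Range<usize>> = Vec::new();
    for r in ranges {
        if let Some(last) = out.last_mut() {
            if r.start <= last.end {
                // Overlaps the previous range: extend it instead of adding a new one.
                last.end = last.end.max(r.end);
                continue;
            }
        }
        out.push(r);
    }
    out
}

/// Remove the union of `ranges` from `doc`.
fn delete_ranges(doc: &str, ranges: Vec<Range<usize>>) -> String {
    let mut out = String::new();
    let mut cursor = 0;
    for r in union_ranges(ranges) {
        out.push_str(&doc[cursor..r.start]); // keep the text before this deleted range
        cursor = r.end;
    }
    out.push_str(&doc[cursor..]);
    out
}
```

If one user deletes bytes 2..6 and another deletes 4..8 of `"abcdefghij"`, the union is 2..8 and the result is `"abij"`. Text in the overlap is removed once, not twice.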

4. Network Partitions

Scenario: Client loses connection, makes edits offline, reconnects.

Resolution:

  1. Client queues edits locally
  2. On reconnect, sends all queued operations
  3. Server applies OT against all operations that happened during partition
  4. Client receives transformed operations and applies
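The client side of steps 1 and 2 might be sketched as a simple FIFO queue. `OfflineQueue` and its methods are hypothetical, and the `send` callback stands in for the real upload call, returning `false` when the upload fails.

```rust
use std::collections::VecDeque;

/// Edits recorded while the client is disconnected (hypothetical type).
#[derive(Default)]
struct OfflineQueue {
    pending: VecDeque<String>,
}

impl OfflineQueue {
    /// Step 1: queue an edit locally.
    fn record(&mut self, edit: &str) {
        self.pending.push_back(edit.to_string());
    }

    /// Step 2: on reconnect, send queued edits oldest first.
    /// Stops at the first failed send and keeps the remaining edits queued.
    /// Returns the number of edits sent.
    fn flush(&mut self, mut send: impl FnMut(&str) -> bool) -> usize {
        let mut sent = 0;
        while let Some(edit) = self.pending.pop_front() {
            if !send(&edit) {
                // Put the edit back so it is retried on the next flush.
                self.pending.push_front(edit);
                break;
            }
            sent += 1;
        }
        sent
    }
}
```

Keeping failed edits at the front of the queue preserves the per-client ordering that the server expects.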

Performance Characteristics

Time Complexity

  • Single operation: O(1) for most operations
  • Transformation: O(n) where n is operation size
  • Conflict resolution: O(m × n), where m is the number of concurrent operations and n is the operation size

Space Complexity

  • Version history: Grows with number of changes
  • Cursors: O(clients × vaults)
  • Active operations: Minimal (processed in real-time)

Optimisation

VaultLink optimises for:

  • Small, frequent edits (typical typing patterns)
  • Text documents (not binary files)
  • Real-time processing (no batching delay)

Limitations

Binary and Non-Mergeable Files

Only .md and .txt files get automatic merging. Everything else uses last-write-wins.

Binary detection:

  • Files with NUL bytes (0x00)
  • Files failing UTF-8 validation

Even .md files are treated as binary if they fail UTF-8 checks.
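The rules above amount to a small decision function. The sketch below uses hypothetical names (`Strategy`, `choose_strategy`) and assumes the extension match is exact and case-sensitive, which the text does not specify.

```rust
use std::path::Path;

/// How a file is reconciled when two clients change it (hypothetical type).
#[derive(Debug, PartialEq)]
enum Strategy {
    Merge,
    LastWriteWins,
}

fn choose_strategy(path: &Path, content: &[u8]) -> Strategy {
    // Only .md and .txt are candidates for merging (assumed case-sensitive).
    let mergeable_ext = matches!(
        path.extension().and_then(|e| e.to_str()),
        Some("md") | Some("txt")
    );
    // Binary detection: any NUL byte, or content that is not valid UTF-8.
    let is_text = !content.contains(&0) && std::str::from_utf8(content).is_ok();
    if mergeable_ext && is_text {
        Strategy::Merge
    } else {
        Strategy::LastWriteWins
    }
}
```

A `.md` file containing invalid UTF-8 falls through to `Strategy::LastWriteWins`, matching the note that extension alone is not sufficient.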

Last-write-wins behaviour:

User A uploads image.png → Server version 1
User B uploads image.png → Server version 2 (A's upload lost)

Workaround: Avoid concurrent edits to non-text files. See all limitations →

Large Documents

Very large documents (> 1 MB) may have:

  • Higher transformation costs
  • Slower sync times
  • Increased memory usage

Workaround: Split large documents or increase timeout settings.

Complex Formatting

Markdown with complex structures may occasionally produce unexpected results:

  • Nested lists
  • Tables
  • Code blocks

Workaround: Clean up manually if needed, or minimise concurrent edits to complex structures.

Consistency Guarantees

Strong Eventual Consistency

VaultLink provides strong eventual consistency:

  • All clients eventually converge to the same state
  • Operations applied in causal order
  • No data loss under normal operation

Ordering Guarantees

  • Operations from the same client are applied in order
  • Concurrent operations may be applied in any order
  • Final result is independent of the order in which concurrent operations are applied (transformed operations commute)

Durability

  • Operations are written to SQLite before acknowledgment
  • SQLite ACID guarantees protect against data loss
  • Clients retry failed uploads

Comparison with Other Approaches

Git-style Merging

| Aspect | Git Merge | VaultLink OT |
|---|---|---|
| Real-time | No | Yes |
| Manual conflict resolution | Yes | No |
| Branching | Yes | No |
| Automatic merge | Limited | Always |
| Use case | Code changes | Collaborative documents |

CRDTs (Conflict-free Replicated Data Types)

| Aspect | CRDTs | VaultLink (reconcile-text) |
|---|---|---|
| Operation tracking | Required (every keystroke) | Not required (end states only) |
| Editor freedom | Limited (must use CRDT-aware editor) | Unlimited (any text editor works) |
| Offline editing | Requires operation log | Works with file comparison |
| Server required | No | Yes |
| Memory overhead | Higher (tombstones, metadata) | Lower (versions only) |
| Infrastructure complexity | Higher | Lower |
| Best for | Controlled editing environments | Independent file editing (Obsidian, Vim, VS Code) |

Key insight: CRDTs are superior when you can capture every operation. reconcile-text is superior when users edit files independently with arbitrary tools—exactly VaultLink's scenario.

Last Write Wins

| Aspect | LWW | VaultLink OT |
|---|---|---|
| Data loss | Yes | No |
| Simplicity | High | Medium |
| User experience | Poor | Excellent |
| Performance | Best | Good |

Algorithm Details

Transformation Rules

When transforming operation A against operation B:

  1. Insert vs Insert:

    • If positions equal: Order by client ID
    • If different positions: Adjust positions
  2. Insert vs Delete:

    • If insert in deleted range: Shift insert position to the start of the deleted range
    • If insert after delete: Shift insert position left by the deleted length
  3. Delete vs Delete:

    • If ranges overlap: Merge delete ranges
    • If ranges disjoint: Adjust positions
  4. Retain vs Any:

    • Retain operations don't conflict
    • Simply adjust positions
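As an illustration of rule 2, an insert position can be transformed against a concurrent delete as below. The function name `shift_insert` is hypothetical, and moving an insert that falls inside the deleted range to the start of that range is a common OT convention assumed here rather than something the rules above spell out.

```rust
/// Transform an insert position against a delete of `del_len` bytes at `del_start`.
fn shift_insert(pos: usize, del_start: usize, del_len: usize) -> usize {
    let del_end = del_start + del_len;
    if pos <= del_start {
        pos // insert is before the deleted range: unaffected
    } else if pos >= del_end {
        pos - del_len // insert is after the deleted range: shift left
    } else {
        del_start // insert is inside the deleted range: clamp to its start (assumed convention)
    }
}
```

With a delete of 3 bytes at offset 5, an insert at 2 stays at 2, an insert at 10 moves to 7, and an insert at 6 moves to 5.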

Transformation Example

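// Sketch of transformation: only the Insert vs Insert case is shown.
// Positions are byte offsets; client IDs break ties at equal positions.
#[derive(Clone, Debug, PartialEq)]
enum Operation {
    Insert { pos: usize, text: String, client_id: String },
    Delete { pos: usize, len: usize },
}

// Returns (a', b'): a' is applied after b, and b' is applied after a.
fn transform(op_a: &Operation, op_b: &Operation) -> (Operation, Operation) {
    use Operation::Insert;
    match (op_a, op_b) {
        (
            Insert { pos: pos_a, text: text_a, client_id: id_a },
            Insert { pos: pos_b, text: text_b, client_id: id_b },
        ) => {
            // A goes first if it is earlier, or on a tie if its client ID is smaller.
            if pos_a < pos_b || (pos_a == pos_b && id_a < id_b) {
                let b_shifted = Insert {
                    pos: pos_b + text_a.len(),
                    text: text_b.clone(),
                    client_id: id_b.clone(),
                };
                (op_a.clone(), b_shifted)
            } else {
                let a_shifted = Insert {
                    pos: pos_a + text_b.len(),
                    text: text_a.clone(),
                    client_id: id_a.clone(),
                };
                (a_shifted, op_b.clone())
            }
        }
        // ... other cases
        _ => unimplemented!("Insert vs Delete and Delete vs Delete are not shown"),
    }
}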

Best Practices

For Smooth Collaboration

  1. Small edits: Make small, focused changes for easier merging
  2. Coordinate major changes: Discuss large refactors with team
  3. Monitor sync status: Ensure changes are uploaded before signing off
  4. Test conflict resolution: Verify behaviour matches expectations

For Developers

  1. Text files preferred: OT works best on text
  2. Limit file sizes: Keep documents reasonably sized
  3. Binary files: Use versioning or avoid concurrent edits
  4. Testing: Test concurrent edit scenarios thoroughly

Further Reading