This commit is contained in:
Andras Schmelczer 2025-11-22 11:19:08 +00:00
parent 56c1f4d58b
commit 50a95b114d
19 changed files with 4663 additions and 1 deletions

View file

@ -0,0 +1,361 @@
# Sync Algorithm
VaultLink uses operational transformation (OT) to handle concurrent edits and maintain consistency across clients. This document explains how the algorithm works.
## Operational Transformation
Operational transformation is a technique for managing concurrent edits to the same document. It transforms operations (edits) so they can be applied in different orders while preserving user intent.
### Why OT?
Traditional conflict resolution approaches:
- **Last write wins**: Loses data, frustrating for users
- **Manual merging**: Interrupts workflow, requires user intervention
- **Version branching**: Complex, not suitable for real-time sync
Operational transformation:
- **Automatic**: No user intervention required
- **Preserves all edits**: No data loss
- **Real-time**: Changes appear immediately
- **Intuitive**: Behavior matches user expectations
## The reconcile-text Library
VaultLink uses the [`reconcile-text`](https://crates.io/crates/reconcile-text) Rust library for operational transformation on text documents.
### How It Works
Given a base document and two sets of changes, OT produces a merged result that includes both changes.
**Example**:
```
Base document: "Hello world"
User A: "Hello beautiful world" (inserts "beautiful ")
User B: "Hello world!" (inserts "!")
OT result: "Hello beautiful world!" (both changes applied)
```
### Operation Types
The algorithm handles these operations:
- **Insert**: Add text at position
- **Delete**: Remove text from position
- **Retain**: Keep existing text unchanged
### Transformation Process
1. **Client A** makes edit and sends to server
2. **Client B** makes concurrent edit and sends to server
3. **Server** receives both edits
4. **Server** transforms operations to account for concurrent changes
5. **Server** applies merged result to database
6. **Server** sends transformed operations to both clients
7. **Clients** apply transformed operations locally
## Sync State Management
VaultLink maintains sync state to track which changes have been applied.
### Version Vectors
Each document has a version tracked by:
- **Server version**: Incremented on each change
- **Client cursors**: Track which version each client has seen
This enables:
- Efficient syncing (only send changes since last sync)
- Conflict detection (concurrent edits to same version)
- Ordering of operations
### Cursor Management
Clients maintain a cursor position:
```rust
struct Cursor {
vault_id: String,
client_id: String,
last_version: u64,
last_updated: DateTime,
}
```
On sync:
1. Client sends cursor (last seen version)
2. Server returns all changes since that version
3. Client applies changes and updates cursor
## Conflict Resolution Flow
### Scenario: Concurrent Edits
Two users edit the same paragraph simultaneously.
**Initial state**:
```
Version 10: "The quick brown fox jumps over the lazy dog."
```
**User A's edit** (version 11):
```
"The quick brown fox jumps over the very lazy dog."
```
*Inserts "very " at position 40*
**User B's edit** (also from version 10):
```
"The quick red fox jumps over the lazy dog."
```
*Replaces "brown" with "red" at position 10*
### Server Processing
1. **Receive User A's operation**:
- Base: version 10
- Operation: Insert("very ", position=40)
- Apply to database → version 11
2. **Receive User B's operation**:
- Base: version 10
- Operation: Replace("brown"→"red", position=10)
- **Conflict detected**: Base is version 10, but current is version 11
3. **Transform User B's operation**:
- Transform against User A's operation
- Adjust positions/content as needed
- Apply transformed operation → version 12
4. **Broadcast updates**:
- Send User A's operation to User B
- Send transformed User B's operation to User A
### Final Result
```
Version 12: "The quick red fox jumps over the very lazy dog."
```
Both edits are preserved in the final document.
## Edge Cases
### 1. Delete vs Insert Conflict
**Scenario**: User A deletes a paragraph while User B edits it.
**Resolution**:
- OT algorithm prioritizes preservation of content
- Insert operation is transformed to account for deletion
- Typically results in inserted content appearing nearby
**Example**:
```
Base: "Line 1\nLine 2\nLine 3"
User A: Delete Line 2 → "Line 1\nLine 3"
User B: Edit Line 2 → "Line 1\nLine 2 modified\nLine 3"
Result: "Line 1\nLine 2 modified\nLine 3"
```
(Insert takes precedence, preserving user content)
### 2. Overlapping Edits
**Scenario**: Two users edit overlapping regions.
**Resolution**:
- OT splits operations into non-overlapping segments
- Applies each segment independently
- Merges results
### 3. Delete vs Delete
**Scenario**: Two users delete overlapping text.
**Resolution**:
- Deletes are merged
- Final result has the union of deleted ranges removed
### 4. Network Partitions
**Scenario**: Client loses connection, makes edits offline, reconnects.
**Resolution**:
1. Client queues edits locally
2. On reconnect, sends all queued operations
3. Server applies OT against all operations that happened during partition
4. Client receives transformed operations and applies
## Performance Characteristics
### Time Complexity
- **Single operation**: O(1) for most operations
- **Transformation**: O(n) where n is operation size
- **Conflict resolution**: O(m × n) where m is number of concurrent operations
### Space Complexity
- **Version history**: Grows with number of changes
- **Cursors**: O(clients × vaults)
- **Active operations**: Minimal (processed in real-time)
### Optimization
VaultLink optimizes for:
- Small, frequent edits (typical typing patterns)
- Text documents (not binary files)
- Real-time processing (no batching delay)
## Limitations
### Binary Files
OT works best for text files. Binary files:
- Cannot be meaningfully merged
- Use last-write-wins strategy
- May cause data loss on concurrent edits
**Workaround**: Avoid concurrent edits to binary files, or use versioning.
### Large Documents
Very large documents (> 1MB) may have:
- Higher transformation costs
- Slower sync times
- Increased memory usage
**Workaround**: Split large documents or increase timeout settings.
### Complex Formatting
Markdown with complex structures may occasionally produce unexpected results:
- Nested lists
- Tables
- Code blocks
**Workaround**: Manual cleanup if needed, or minimize concurrent edits to complex structures.
## Consistency Guarantees
### Strong Consistency
VaultLink provides **strong eventual consistency**:
- All clients eventually converge to the same state
- Operations applied in causal order
- No data loss under normal operation
### Ordering Guarantees
- Operations from the same client are applied in order
- Concurrent operations may be applied in any order
- Final result is independent of operation order (commutative)
### Durability
- Operations are written to SQLite before acknowledgment
- SQLite ACID guarantees protect against data loss
- Clients retry failed uploads
## Comparison with Other Approaches
### Git-style Merging
| Aspect | Git Merge | VaultLink OT |
|--------|-----------|--------------|
| Real-time | No | Yes |
| Manual conflict resolution | Yes | No |
| Branching | Yes | No |
| Automatic merge | Limited | Always |
| Use case | Code changes | Collaborative documents |
### CRDTs (Conflict-free Replicated Data Types)
| Aspect | CRDTs | VaultLink OT |
|--------|-------|--------------|
| Server required | No | Yes |
| Memory overhead | Higher | Lower |
| Complexity | Higher | Lower |
| Deletion handling | Complex (tombstones) | Simple |
| Best for | Distributed systems | Centralized sync |
### Last Write Wins
| Aspect | LWW | VaultLink OT |
|--------|-----|--------------|
| Data loss | Yes | No |
| Simplicity | High | Medium |
| User experience | Poor | Excellent |
| Performance | Best | Good |
## Algorithm Details
### Transformation Rules
When transforming operation `A` against operation `B`:
1. **Insert vs Insert**:
- If positions equal: Order by client ID
- If different positions: Adjust positions
2. **Insert vs Delete**:
- If insert in deleted range: Shift insert position
- If insert after delete: Adjust position by deleted length
3. **Delete vs Delete**:
- If ranges overlap: Merge delete ranges
- If ranges disjoint: Adjust positions
4. **Retain vs Any**:
- Retain operations don't conflict
- Simply adjust positions
### Transformation Example
```rust
// Pseudo-code for transformation
fn transform(op_a: Operation, op_b: Operation) -> (Operation, Operation) {
match (op_a, op_b) {
(Insert(pos_a, text_a), Insert(pos_b, text_b)) => {
if pos_a < pos_b {
(op_a, Insert(pos_b + text_a.len(), text_b))
} else if pos_a > pos_b {
(Insert(pos_a + text_b.len(), text_a), op_b)
} else {
// Same position, use client ID to break tie
if client_id_a < client_id_b {
(op_a, Insert(pos_b + text_a.len(), text_b))
} else {
(Insert(pos_a + text_b.len(), text_a), op_b)
}
}
}
// ... other cases
}
}
```
## Best Practices
### For Smooth Collaboration
1. **Small edits**: Make small, focused changes for easier merging
2. **Coordinate major changes**: Discuss large refactors with team
3. **Monitor sync status**: Ensure changes are uploaded before signing off
4. **Test conflict resolution**: Verify behavior matches expectations
### For Developers
1. **Text files preferred**: OT works best on text
2. **Limit file sizes**: Keep documents reasonably sized
3. **Binary files**: Use versioning or avoid concurrent edits
4. **Testing**: Test concurrent edit scenarios thoroughly
## Further Reading
- [reconcile-text library](https://crates.io/crates/reconcile-text)
- [Operational Transformation FAQ](https://en.wikipedia.org/wiki/Operational_transformation)
- [Data flow architecture →](/architecture/data-flow)