Add docs
This commit is contained in:
parent
56c1f4d58b
commit
50a95b114d
19 changed files with 4663 additions and 1 deletions
361
docs/architecture/sync-algorithm.md
Normal file
|
|
@ -0,0 +1,361 @@
|
|||
# Sync Algorithm
|
||||
|
||||
VaultLink uses operational transformation (OT) to handle concurrent edits and maintain consistency across clients. This document explains how the algorithm works.
|
||||
|
||||
## Operational Transformation
|
||||
|
||||
Operational transformation is a technique for managing concurrent edits to the same document. It transforms operations (edits) so they can be applied in different orders while preserving user intent.
|
||||
|
||||
### Why OT?
|
||||
|
||||
Traditional conflict resolution approaches:
|
||||
- **Last write wins**: Loses data, frustrating for users
|
||||
- **Manual merging**: Interrupts workflow, requires user intervention
|
||||
- **Version branching**: Complex, not suitable for real-time sync
|
||||
|
||||
Operational transformation:
|
||||
- **Automatic**: No user intervention required
|
||||
- **Preserves all edits**: No data loss
|
||||
- **Real-time**: Changes appear immediately
|
||||
- **Intuitive**: Behavior matches user expectations
|
||||
|
||||
## The reconcile-text Library
|
||||
|
||||
VaultLink uses the [`reconcile-text`](https://crates.io/crates/reconcile-text) Rust library for operational transformation on text documents.
|
||||
|
||||
### How It Works
|
||||
|
||||
Given a base document and two sets of changes, OT produces a merged result that includes both changes.
|
||||
|
||||
**Example**:
|
||||
|
||||
```
Base document: "Hello world"

User A: "Hello beautiful world" (inserts "beautiful ")
User B: "Hello world!" (inserts "!")

OT result: "Hello beautiful world!" (both changes applied)
```
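
The example above can be reproduced with a minimal sketch. This is not how `reconcile-text` works internally; `merge_inserts` is a hypothetical helper that only handles two plain inserts against the same base, with positions given as byte offsets.

```rust
/// Merges two concurrent inserts made against the same base text.
/// Each insert is (byte offset into `base`, text to insert).
/// Hypothetical helper for illustration only.
fn merge_inserts(base: &str, a: (usize, &str), b: (usize, &str)) -> String {
    // Apply the insert with the larger offset first, so the smaller
    // offset still points at the same place in the text.
    let (first, second) = if a.0 >= b.0 { (a, b) } else { (b, a) };
    let mut merged = base.to_string();
    merged.insert_str(first.0, first.1);
    merged.insert_str(second.0, second.1);
    merged
}
```

Calling `merge_inserts("Hello world", (6, "beautiful "), (11, "!"))` returns `"Hello beautiful world!"`, matching the OT result shown above.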
|
||||
|
||||
### Operation Types
|
||||
|
||||
The algorithm handles these operations:
|
||||
- **Insert**: Add text at position
|
||||
- **Delete**: Remove text from position
|
||||
- **Retain**: Keep existing text unchanged
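
As a rough illustration of these three operation types, the sketch below applies a sequence of operations to a document from left to right. The `Op` enum and `apply` function are assumptions made for this example and are not taken from the `reconcile-text` API; counts are in characters.

```rust
/// Hypothetical operation type mirroring the list above.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    Retain(usize),  // keep the next n characters
    Insert(String), // add text at the current position
    Delete(usize),  // drop the next n characters
}

/// Applies `ops` to `doc`, walking the document left to right.
fn apply(doc: &str, ops: &[Op]) -> String {
    let mut chars = doc.chars();
    let mut out = String::new();
    for op in ops {
        match op {
            Op::Retain(n) => out.extend(chars.by_ref().take(*n)),
            Op::Insert(text) => out.push_str(text),
            // Consume and discard n characters.
            Op::Delete(n) => chars.by_ref().take(*n).for_each(drop),
        }
    }
    // Anything the operations did not cover is kept unchanged.
    out.extend(chars);
    out
}
```

With this representation, User A's edit from the earlier example is `[Retain(6), Insert("beautiful "), Retain(5)]` applied to `"Hello world"`.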
|
||||
|
||||
### Transformation Process
|
||||
|
||||
1. **Client A** makes edit and sends to server
|
||||
2. **Client B** makes concurrent edit and sends to server
|
||||
3. **Server** receives both edits
|
||||
4. **Server** transforms operations to account for concurrent changes
|
||||
5. **Server** applies merged result to database
|
||||
6. **Server** sends transformed operations to both clients
|
||||
7. **Clients** apply transformed operations locally
|
||||
|
||||
## Sync State Management
|
||||
|
||||
VaultLink maintains sync state to track which changes have been applied.
|
||||
|
||||
### Version Vectors
|
||||
|
||||
Each document has a version tracked by:
|
||||
- **Server version**: Incremented on each change
|
||||
- **Client cursors**: Track which version each client has seen
|
||||
|
||||
This enables:
|
||||
- Efficient syncing (only send changes since last sync)
|
||||
- Conflict detection (concurrent edits to same version)
|
||||
- Ordering of operations
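
A minimal sketch of the conflict-detection check, assuming each incoming edit carries the server version it was based on. The `Decision` enum and `classify` function are hypothetical names used only for illustration.

```rust
use std::cmp::Ordering;

/// What the server should do with an incoming edit (hypothetical).
#[derive(Debug, PartialEq)]
enum Decision {
    /// The edit was made against the current version.
    ApplyDirectly,
    /// The edit was made against an older version and must be
    /// transformed against the `missed` operations first.
    Transform { missed: u64 },
    /// The edit claims a version the server has not produced yet.
    Reject,
}

/// Compares the edit's base version with the current server version.
fn classify(base_version: u64, server_version: u64) -> Decision {
    match base_version.cmp(&server_version) {
        Ordering::Equal => Decision::ApplyDirectly,
        Ordering::Less => Decision::Transform {
            missed: server_version - base_version,
        },
        Ordering::Greater => Decision::Reject,
    }
}
```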
|
||||
|
||||
### Cursor Management
|
||||
|
||||
Clients maintain a cursor position:
|
||||
|
||||
```rust
struct Cursor {
    vault_id: String,
    client_id: String,
    // Highest server version this client has applied.
    last_version: u64,
    // Timestamp type, e.g. chrono's DateTime<Utc> (an assumption).
    last_updated: DateTime,
}
```
|
||||
|
||||
On sync:
|
||||
1. Client sends cursor (last seen version)
|
||||
2. Server returns all changes since that version
|
||||
3. Client applies changes and updates cursor
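
The three sync steps can be sketched as follows, assuming the server keeps an in-memory change log sorted by version. `Change`, `changes_since`, and `advance_cursor` are hypothetical names; the actual storage schema is not described in this document.

```rust
/// Hypothetical change record.
#[derive(Debug, Clone, PartialEq)]
struct Change {
    version: u64,
    payload: String,
}

/// Step 2: returns the changes the client has not seen yet.
/// Assumes `log` is sorted by ascending version.
fn changes_since(log: &[Change], last_version: u64) -> &[Change] {
    let start = log.partition_point(|c| c.version <= last_version);
    &log[start..]
}

/// Step 3: advances the cursor to the newest applied version,
/// or leaves it unchanged when nothing was applied.
fn advance_cursor(last_version: u64, applied: &[Change]) -> u64 {
    applied.last().map_or(last_version, |c| c.version)
}
```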
|
||||
|
||||
## Conflict Resolution Flow
|
||||
|
||||
### Scenario: Concurrent Edits
|
||||
|
||||
Two users edit the same paragraph simultaneously.
|
||||
|
||||
**Initial state**:
|
||||
```
Version 10: "The quick brown fox jumps over the lazy dog."
```
|
||||
|
||||
**User A's edit** (version 11):
```
"The quick brown fox jumps over the very lazy dog."
```
*Inserts "very " at position 35 (zero-based)*

**User B's edit** (also from version 10):
```
"The quick red fox jumps over the lazy dog."
```
*Replaces "brown" with "red" at position 10 (zero-based)*

### Server Processing

1. **Receive User A's operation**:
   - Base: version 10
   - Operation: Insert("very ", position=35)
   - Apply to database → version 11

2. **Receive User B's operation**:
   - Base: version 10
   - Operation: Replace("brown"→"red", position=10)
   - **Conflict detected**: Base is version 10, but current is version 11

3. **Transform User B's operation**:
   - Transform against User A's operation
   - Adjust positions/content as needed (here the replace lies before the insert, so its position is unchanged)
   - Apply transformed operation → version 12

4. **Broadcast updates**:
   - Send User A's operation to User B (its position must be adjusted for User B's local edit before it is applied)
   - Send transformed User B's operation to User A
|
||||
|
||||
### Final Result
|
||||
|
||||
```
Version 12: "The quick red fox jumps over the very lazy dog."
```
|
||||
|
||||
Both edits are preserved in the final document.
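
The scenario can be checked with a small sketch. The helpers below are hypothetical and use zero-based byte offsets; they only cover an insert that lies outside a concurrent replace, which is the case in this scenario.

```rust
/// Inserts `text` at byte offset `pos`.
fn insert_at(doc: &str, pos: usize, text: &str) -> String {
    let mut out = doc.to_string();
    out.insert_str(pos, text);
    out
}

/// Replaces `len` bytes starting at `pos` with `text`.
fn replace_at(doc: &str, pos: usize, len: usize, text: &str) -> String {
    let mut out = doc.to_string();
    out.replace_range(pos..pos + len, text);
    out
}

/// Shifts an insert position to account for a replace that changed the
/// document length. Assumes the insert lies outside the replaced range.
fn shift_insert(insert_pos: usize, replace_pos: usize, old_len: usize, new_len: usize) -> usize {
    if insert_pos >= replace_pos + old_len {
        insert_pos - old_len + new_len
    } else {
        insert_pos
    }
}
```

On the server, `insert_at(v10, 35, "very ")` followed by `replace_at(&v11, 10, 5, "red")` yields version 12. On User B's side, the local document already contains "red", so User A's insert position becomes `shift_insert(35, 10, 5, 3)`, which is 33, and applying it there produces the same text.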
|
||||
|
||||
## Edge Cases
|
||||
|
||||
### 1. Delete vs Insert Conflict
|
||||
|
||||
**Scenario**: User A deletes a paragraph while User B edits it.
|
||||
|
||||
**Resolution**:
|
||||
- OT algorithm prioritizes preservation of content
|
||||
- Insert operation is transformed to account for deletion
|
||||
- Typically results in inserted content appearing nearby
|
||||
|
||||
**Example**:
|
||||
```
Base: "Line 1\nLine 2\nLine 3"

User A: Delete Line 2 → "Line 1\nLine 3"
User B: Edit Line 2 → "Line 1\nLine 2 modified\nLine 3"

Result: "Line 1\nLine 2 modified\nLine 3"
```
|
||||
(Insert takes precedence, preserving user content)
|
||||
|
||||
### 2. Overlapping Edits
|
||||
|
||||
**Scenario**: Two users edit overlapping regions.
|
||||
|
||||
**Resolution**:
|
||||
- OT splits operations into non-overlapping segments
|
||||
- Applies each segment independently
|
||||
- Merges results
|
||||
|
||||
### 3. Delete vs Delete
|
||||
|
||||
**Scenario**: Two users delete overlapping text.
|
||||
|
||||
**Resolution**:
|
||||
- Deletes are merged
|
||||
- Final result has the union of deleted ranges removed
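
A minimal sketch of merging two delete ranges into their union, assuming ranges are expressed as byte offsets into the base document. `union_deletes` and `delete_ranges` are hypothetical helpers for illustration.

```rust
use std::ops::Range;

/// Merges two delete ranges if they overlap or touch;
/// otherwise returns both, ordered by start position.
fn union_deletes(a: Range<usize>, b: Range<usize>) -> Vec<Range<usize>> {
    let (first, second) = if a.start <= b.start { (a, b) } else { (b, a) };
    if second.start <= first.end {
        vec![first.start..first.end.max(second.end)]
    } else {
        vec![first, second]
    }
}

/// Removes the given ranges from `doc`.
/// Assumes the ranges are sorted and do not overlap.
fn delete_ranges(doc: &str, ranges: &[Range<usize>]) -> String {
    let mut out = doc.to_string();
    // Work right to left so earlier offsets stay valid.
    for r in ranges.iter().rev() {
        out.replace_range(r.clone(), "");
    }
    out
}
```

For example, if one user deletes bytes `2..6` and another deletes `4..8` of `"abcdefghij"`, the union is `2..8` and the result is `"abij"`.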
|
||||
|
||||
### 4. Network Partitions
|
||||
|
||||
**Scenario**: Client loses connection, makes edits offline, reconnects.
|
||||
|
||||
**Resolution**:
|
||||
1. Client queues edits locally
|
||||
2. On reconnect, sends all queued operations
|
||||
3. Server applies OT against all operations that happened during partition
|
||||
4. Client receives transformed operations and applies
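
Steps 1 and 2 can be sketched as a client-side queue. `OfflineQueue` is a hypothetical type, and edits are represented as opaque strings here because the wire format is not described in this document.

```rust
use std::collections::VecDeque;

/// Hypothetical client-side queue for edits made while offline.
struct OfflineQueue {
    /// Last server version seen before the connection dropped.
    base_version: u64,
    pending: VecDeque<String>,
}

impl OfflineQueue {
    fn new(base_version: u64) -> Self {
        Self { base_version, pending: VecDeque::new() }
    }

    /// Step 1: queue an edit locally.
    fn record(&mut self, edit: impl Into<String>) {
        self.pending.push_back(edit.into());
    }

    /// Step 2: on reconnect, hand over all queued edits in their
    /// original order, together with the version they were based on.
    fn flush(&mut self) -> (u64, Vec<String>) {
        (self.base_version, self.pending.drain(..).collect())
    }
}
```

Sending the base version along with the queued edits lets the server work out which operations happened during the partition and transform against them.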
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
### Time Complexity
|
||||
|
||||
- **Single operation**: O(1) for most operations
|
||||
- **Transformation**: O(n) where n is operation size
|
||||
- **Conflict resolution**: O(m × n) where m is number of concurrent operations
|
||||
|
||||
### Space Complexity
|
||||
|
||||
- **Version history**: Grows with number of changes
|
||||
- **Cursors**: O(clients × vaults)
|
||||
- **Active operations**: Minimal (processed in real-time)
|
||||
|
||||
### Optimization
|
||||
|
||||
VaultLink optimizes for:
|
||||
- Small, frequent edits (typical typing patterns)
|
||||
- Text documents (not binary files)
|
||||
- Real-time processing (no batching delay)
|
||||
|
||||
## Limitations
|
||||
|
||||
### Binary Files
|
||||
|
||||
OT works best for text files. Binary files:
|
||||
- Cannot be meaningfully merged
|
||||
- Use last-write-wins strategy
|
||||
- May cause data loss on concurrent edits
|
||||
|
||||
**Workaround**: Avoid concurrent edits to binary files, or use versioning.
|
||||
|
||||
### Large Documents
|
||||
|
||||
Very large documents (> 1 MB) may have:
|
||||
- Higher transformation costs
|
||||
- Slower sync times
|
||||
- Increased memory usage
|
||||
|
||||
**Workaround**: Split large documents or increase timeout settings.
|
||||
|
||||
### Complex Formatting
|
||||
|
||||
Markdown with complex structures may occasionally produce unexpected results:
|
||||
- Nested lists
|
||||
- Tables
|
||||
- Code blocks
|
||||
|
||||
**Workaround**: Manual cleanup if needed, or minimize concurrent edits to complex structures.
|
||||
|
||||
## Consistency Guarantees
|
||||
|
||||
### Strong Eventual Consistency
|
||||
|
||||
VaultLink provides **strong eventual consistency**:
|
||||
- All clients eventually converge to the same state
|
||||
- Operations applied in causal order
|
||||
- No data loss under normal operation
|
||||
|
||||
### Ordering Guarantees
|
||||
|
||||
- Operations from the same client are applied in order
|
||||
- Concurrent operations may be applied in any order
|
||||
- Final result is independent of the order in which concurrent operations arrive (transformation guarantees convergence)
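
The order-independence property can be illustrated for two concurrent inserts. This is a sketch under assumptions: positions are byte offsets, operation A wins ties, and `transform_pos` and `both_orders` are hypothetical helpers.

```rust
/// Inserts `text` at byte offset `pos`.
fn insert_at(doc: &str, pos: usize, text: &str) -> String {
    let mut out = doc.to_string();
    out.insert_str(pos, text);
    out
}

/// Shifts `pos` right if the other insert lands before it, or at the
/// same position when the other operation wins the tie.
fn transform_pos(pos: usize, other_pos: usize, other_len: usize, other_wins_tie: bool) -> usize {
    if other_pos < pos || (other_pos == pos && other_wins_tie) {
        pos + other_len
    } else {
        pos
    }
}

/// Applies two concurrent inserts in both orders and returns both results.
fn both_orders(base: &str, a: (usize, &str), b: (usize, &str)) -> (String, String) {
    // A first, then B transformed against A (A wins ties).
    let after_a = insert_at(base, a.0, a.1);
    let ab = insert_at(&after_a, transform_pos(b.0, a.0, a.1.len(), true), b.1);
    // B first, then A transformed against B (B does not win ties).
    let after_b = insert_at(base, b.0, b.1);
    let ba = insert_at(&after_b, transform_pos(a.0, b.0, b.1.len(), false), a.1);
    (ab, ba)
}
```

Both returned strings are equal for any pair of inserts at valid offsets, which is the convergence the guarantee describes.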
|
||||
|
||||
### Durability
|
||||
|
||||
- Operations are written to SQLite before acknowledgment
|
||||
- SQLite ACID guarantees protect against data loss
|
||||
- Clients retry failed uploads
|
||||
|
||||
## Comparison with Other Approaches
|
||||
|
||||
### Git-style Merging
|
||||
|
||||
| Aspect | Git Merge | VaultLink OT |
|--------|-----------|--------------|
| Real-time | No | Yes |
| Manual conflict resolution | Yes | No |
| Branching | Yes | No |
| Automatic merge | Limited | Always |
| Use case | Code changes | Collaborative documents |
|
||||
|
||||
### CRDTs (Conflict-free Replicated Data Types)
|
||||
|
||||
| Aspect | CRDTs | VaultLink OT |
|--------|-------|--------------|
| Server required | No | Yes |
| Memory overhead | Higher | Lower |
| Complexity | Higher | Lower |
| Deletion handling | Complex (tombstones) | Simple |
| Best for | Distributed systems | Centralized sync |
|
||||
|
||||
### Last Write Wins
|
||||
|
||||
| Aspect | LWW | VaultLink OT |
|--------|-----|--------------|
| Data loss | Yes | No |
| Simplicity | High | Medium |
| User experience | Poor | Excellent |
| Performance | Best | Good |
|
||||
|
||||
## Algorithm Details
|
||||
|
||||
### Transformation Rules
|
||||
|
||||
When transforming operation `A` against operation `B`:
|
||||
|
||||
1. **Insert vs Insert**:
   - If positions are equal: order by client ID
   - If positions differ: shift the later insert by the length of the earlier one

2. **Insert vs Delete**:
   - If the insert falls inside the deleted range: shift it, typically to the start of the range
   - If the insert comes after the delete: reduce its position by the deleted length

3. **Delete vs Delete**:
   - If ranges overlap: merge the delete ranges
   - If ranges are disjoint: adjust positions

4. **Retain vs Any**:
   - Retain operations don't conflict
   - Simply adjust positions
|
||||
|
||||
### Transformation Example
|
||||
|
||||
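```rust
// Sketch of the insert-vs-insert case; positions are byte offsets.
// The `client_id` field is an assumption used only for tie-breaking.
#[derive(Debug, Clone, PartialEq)]
enum Operation {
    Insert { pos: usize, text: String, client_id: u64 },
    // ... Delete and Retain omitted
}

/// Returns (a', b'): `a` adjusted to apply after `b`,
/// and `b` adjusted to apply after `a`.
fn transform(op_a: Operation, op_b: Operation) -> (Operation, Operation) {
    match (op_a, op_b) {
        (
            Operation::Insert { pos: pos_a, text: text_a, client_id: id_a },
            Operation::Insert { pos: pos_b, text: text_b, client_id: id_b },
        ) => {
            // The insert at the lower position stays put;
            // at the same position, the lower client ID goes first.
            let a_first = pos_a < pos_b || (pos_a == pos_b && id_a < id_b);
            if a_first {
                let shifted = pos_b + text_a.len();
                (
                    Operation::Insert { pos: pos_a, text: text_a, client_id: id_a },
                    Operation::Insert { pos: shifted, text: text_b, client_id: id_b },
                )
            } else {
                let shifted = pos_a + text_b.len();
                (
                    Operation::Insert { pos: shifted, text: text_a, client_id: id_a },
                    Operation::Insert { pos: pos_b, text: text_b, client_id: id_b },
                )
            }
        }
        // ... other cases
    }
}
```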
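
The insert-vs-delete rule can be sketched in the same spirit. This is an illustrative assumption rather than the library's implementation: `transform_insert_against_delete` is a hypothetical function, positions are byte offsets, and an insert inside the deleted range is clamped to the start of that range.

```rust
/// Transforms an insert position against a concurrent delete of
/// `del_len` bytes starting at `del_pos`.
fn transform_insert_against_delete(ins_pos: usize, del_pos: usize, del_len: usize) -> usize {
    let del_end = del_pos + del_len;
    if ins_pos <= del_pos {
        // Before (or at the start of) the delete: unaffected.
        ins_pos
    } else if ins_pos >= del_end {
        // After the delete: shift left by the deleted length.
        ins_pos - del_len
    } else {
        // Inside the deleted range: clamp to the start of the range.
        del_pos
    }
}
```

Clamping keeps the inserted text in the document even though its surrounding context was deleted, which matches the content-preserving behavior described under Edge Cases.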
|
||||
|
||||
## Best Practices
|
||||
|
||||
### For Smooth Collaboration
|
||||
|
||||
1. **Small edits**: Make small, focused changes for easier merging
|
||||
2. **Coordinate major changes**: Discuss large refactors with team
|
||||
3. **Monitor sync status**: Ensure changes are uploaded before signing off
|
||||
4. **Test conflict resolution**: Verify behavior matches expectations
|
||||
|
||||
### For Developers
|
||||
|
||||
1. **Text files preferred**: OT works best on text
|
||||
2. **Limit file sizes**: Keep documents reasonably sized
|
||||
3. **Binary files**: Use versioning or avoid concurrent edits
|
||||
4. **Testing**: Test concurrent edit scenarios thoroughly
|
||||
|
||||
## Further Reading
|
||||
|
||||
- [reconcile-text library](https://crates.io/crates/reconcile-text)
|
||||
- [Operational transformation (Wikipedia)](https://en.wikipedia.org/wiki/Operational_transformation)
|
||||
- [Data flow architecture →](/architecture/data-flow)
|
||||