diff --git a/docs/.vitepress/config.mts b/docs/.vitepress/config.mts index 64d77100..d009127a 100644 --- a/docs/.vitepress/config.mts +++ b/docs/.vitepress/config.mts @@ -18,6 +18,7 @@ export default defineConfig({ items: [ { text: "What is VaultLink?", link: "/guide/what-is-vaultlink" }, { text: "Getting Started", link: "/guide/getting-started" }, + { text: "Limitations", link: "/guide/limitations" }, { text: "Comparison with Alternatives", link: "/guide/alternatives" } ] }, diff --git a/docs/architecture/sync-algorithm.md b/docs/architecture/sync-algorithm.md index eb55e9a4..35e63d50 100644 --- a/docs/architecture/sync-algorithm.md +++ b/docs/architecture/sync-algorithm.md @@ -60,19 +60,27 @@ For note-taking workflows where users value editor freedom and offline editing, ### How It Works -Given a base document and two sets of changes, OT produces a merged result that includes both changes. +Given three versions (parent, left, right), reconcile-text produces a merged result. + +**How reconcile-text works**: + +1. **Tokenisation**: Split text into words (using `BuiltinTokenizer::Word`) +2. **Three-way diff**: Compare parent→left and parent→right changes +3. **Merge**: Combine non-conflicting changes, prefer content preservation for conflicts +4. **Result**: Merged text with both edits applied **Example**: ``` -Base document: "Hello world" +Parent: "The quick brown fox" +User A: "The quick red fox" (changes "brown" → "red") +User B: "The very quick brown fox" (inserts "very ") -User A: "Hello beautiful world" (inserts "beautiful ") -User B: "Hello world!" (inserts "!") - -OT result: "Hello beautiful world!" (both changes applied) +Merged: "The very quick red fox" (both changes applied) ``` +**Merge conditions**: Only `.md` and `.txt` files with valid UTF-8 get merged. Binary files or other extensions use last-write-wins. 
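The merge conditions can be sketched as a small eligibility check. This is an illustration only, not VaultLink's actual code: the function names `is_binary` and `is_mergeable` are hypothetical, and case-sensitive extension matching is an assumption the docs do not confirm. The binary rule (NUL bytes or invalid UTF-8) is the one this document gives under Limitations.

```python
from pathlib import PurePosixPath

# Extensions eligible for merging, per the merge conditions in the docs.
MERGEABLE_EXTENSIONS = {".md", ".txt"}


def is_binary(data: bytes) -> bool:
    """Treat content as binary if it has a NUL byte or is not valid UTF-8."""
    if b"\x00" in data:
        return True
    try:
        data.decode("utf-8")  # strict decoding raises on invalid sequences
    except UnicodeDecodeError:
        return True
    return False


def is_mergeable(path: str, data: bytes) -> bool:
    """Return True if the file would be merged instead of last-write-wins."""
    # Assumption: extension matching is case-sensitive; the docs do not say.
    suffix = PurePosixPath(path).suffix
    return suffix in MERGEABLE_EXTENSIONS and not is_binary(data)


print(is_mergeable("notes/todo.md", "café".encode("utf-8")))  # True
print(is_mergeable("notes/todo.md", b"\xff\xfe broken"))      # False
print(is_mergeable("diagram.png", b"plain text"))             # False
```

Under this reading, a `.md` file that fails UTF-8 validation falls back to last-write-wins, even though its extension is on the mergeable list.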
+ ### Operation Types The algorithm handles these operations: @@ -263,15 +271,25 @@ VaultLink optimises for: ## Limitations -### Binary Files +### Binary and Non-Mergeable Files -OT works best for text files. Binary files: +Only **`.md`** and **`.txt`** files get automatic merging. Everything else uses last-write-wins. -- Cannot be meaningfully merged -- Use last-write-wins strategy -- May cause data loss on concurrent edits +**Binary detection**: -**Workaround**: Avoid concurrent edits to binary files, or use versioning. +- Files with NUL bytes (`0x00`) +- Files failing UTF-8 validation + +Even `.md` files are treated as binary if they fail UTF-8 checks. + +**Last-write-wins behaviour**: + +``` +User A uploads image.png → Server version 1 +User B uploads image.png → Server version 2 (A's upload lost) +``` + +**Workaround**: Avoid concurrent edits to non-text files. [See all limitations →](/guide/limitations) ### Large Documents diff --git a/docs/config/advanced.md b/docs/config/advanced.md index 72052d50..5275be93 100644 --- a/docs/config/advanced.md +++ b/docs/config/advanced.md @@ -55,26 +55,37 @@ done ### Version History Cleanup -To limit database growth, implement version history pruning (requires custom script): +VaultLink stores **all versions indefinitely** by default. Database grows with every change. + +**Database schema**: Each version stored in `documents` table with `vault_update_id` (sequential). + +Manual cleanup (keep last 100 versions per document): ```bash #!/bin/bash # prune-old-versions.sh -# Keep only last 100 versions per document for db in databases/*.db; do sqlite3 "$db" < /dev/null; then + if ! 
curl -sf http://localhost:3000/vaults/test/ping > /dev/null; then echo "Health check failed at $(date)" | mail -s "VaultLink Down" admin@example.com # Optionally restart # docker restart vaultlink-server diff --git a/docs/guide/getting-started.md b/docs/guide/getting-started.md index 0dc369df..02b20ae0 100644 --- a/docs/guide/getting-started.md +++ b/docs/guide/getting-started.md @@ -49,7 +49,7 @@ docker run -d \ /app/sync_server /data/config.yml ``` -Verify: `curl http://localhost:3000/vaults/test/ping` should return `pong` +Verify: `curl http://localhost:3000/vaults/test/ping` should return server version and auth status ## Step 2: Connect Client @@ -114,10 +114,12 @@ users: **Client can't connect**: -1. Verify: `curl http://your-server:3000/vaults/test/ping` +1. Verify server: `curl http://your-server:3000/vaults/test/ping` 2. Check URL: `ws://` for HTTP, `wss://` for HTTPS 3. Verify token matches config.yml +**Understanding limitations**: [See what VaultLink can and can't do →](/guide/limitations) + **Files not syncing**: Check client logs, verify vault name matches [Server setup →](/guide/server-setup) | [Architecture →](/architecture/) diff --git a/docs/guide/limitations.md b/docs/guide/limitations.md new file mode 100644 index 00000000..1c514939 --- /dev/null +++ b/docs/guide/limitations.md @@ -0,0 +1,192 @@ +# Limitations + +VaultLink works well for most Obsidian vaults, but has some constraints you should know about. + +## File Type Limitations + +### Mergeable Files + +Only **`.md`** and **`.txt`** files get automatic conflict-free merging. + +Other file types (images, PDFs, etc.) use last-write-wins: + +``` +User A updates diagram.png → Server stores version 1 +User B updates diagram.png → Server stores version 2 (overwrites A's changes) +``` + +**Workaround**: Avoid editing the same non-text file simultaneously. 
+ +### Binary Detection + +Files are treated as binary if they: + +- Contain NUL bytes (`0x00`) +- Fail UTF-8 validation + +Files with a `.md` or `.txt` extension that are detected as binary still get last-write-wins (no merge). + +## Performance Constraints + +### Server Limits (Configurable) + +| Resource | Default | Maximum Tested | +| ------------------------ | ------- | -------------- | +| Clients per vault | 256 | ~256 | +| Database connections | 12 | 20 | +| Max file size | 512 MB | 4096 MB | +| Request timeout | 60s | 180s | +| WebSocket cursor timeout | 60s | 300s | +| Database busy timeout | 3600s | - | + +### Vault Size + +- **Small vaults** (< 1000 files): Excellent performance +- **Medium vaults** (1000-10000 files): Good performance +- **Large vaults** (> 10000 files): Works, but initial sync is slower + +There is no hard file count limit; the practical constraints are disk space and sync time. + +### Resource Usage + +Rough estimates (varies by vault size and activity): + +- **RAM**: ~50-200 MB base + ~1-5 MB per active client +- **CPU**: Low (< 5%) for typical usage, spikes during merges +- **Disk**: Vault size + version history (grows over time) + +## Version History + +### Storage + +- All versions stored indefinitely (no automatic cleanup) +- Each vault is a separate SQLite database +- Deleted files marked as deleted (not purged) + +**Growth**: Version history grows with every change. A 10 MB vault with frequent edits might grow to 100+ MB over months. + +**Cleanup**: Manual only (see [Advanced Configuration](/config/advanced#version-history-cleanup)). 
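Because cleanup is manual, it can help to watch how large each vault database has become before deciding to prune. The sketch below is a hypothetical helper, not part of VaultLink. It assumes the per-vault SQLite files live in a `databases/` directory with a `.db` extension, as in the pruning script in Advanced Configuration; adjust the path for your deployment.

```python
from pathlib import Path


def database_sizes(directory: str) -> dict[str, int]:
    """Map each *.db file in `directory` to its size in bytes."""
    return {p.name: p.stat().st_size for p in sorted(Path(directory).glob("*.db"))}


def format_report(sizes: dict[str, int]) -> str:
    """Render one 'name: size MB' line per database (decimal megabytes)."""
    lines = [f"{name}: {size / 1_000_000:.1f} MB" for name, size in sizes.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumption: databases are stored under ./databases relative to the cwd.
    print(format_report(database_sizes("databases")))
```

Running this on a schedule and comparing the output over time gives a rough picture of version-history growth without querying the databases themselves.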
+ +### Implications + +- Disk usage grows over time +- Database size affects backup time +- No built-in retention policy + +## Merge Quality + +### Text Merging + +VaultLink uses word-level tokenisation for merging: + +```markdown +Parent: "The quick brown fox" +User A: "The quick red fox" +User B: "The very quick brown fox" +Result: "The very quick red fox" ← Both changes preserved +``` + +**Imperfect scenarios**: + +- Complex nested Markdown (tables, code blocks) +- Simultaneous edits to the same sentence +- Large structural changes (moving sections around) + +**Result**: Merged file might need manual cleanup in ~1-5% of concurrent edits. + +## Scalability + +### SQLite Limitations + +- One SQLite database per vault +- Single-server architecture (no built-in clustering) +- Write serialisation through database + +**For high concurrency**: Consider multiple vaults instead of one massive shared vault. + +### Horizontal Scaling + +Not currently supported. Running multiple servers requires manual vault partitioning. + +## Network Requirements + +### Latency + +- Real-time sync typically < 500ms on good connections +- Mobile/slow networks: 1-5s latency possible +- Timeout failures on very slow connections (> 60s) + +### Offline Behaviour + +- Clients queue changes locally +- On reconnect, sync all changes since last connection +- Conflicts resolved automatically (for mergeable files) + +**Limitation**: No offline conflict preview—merged result appears after reconnect. + +## Security + +### No End-to-End Encryption + +- Server sees all file contents +- Transport encryption only (WSS/TLS) +- Trust your server + +**Workaround**: Self-host on infrastructure you control. + +### Authentication + +- Token-based only (no OAuth, SAML, etc.) 
+- Tokens configured in server config file +- No runtime user management + +## Known Edge Cases + +### Simultaneous Deletes and Edits + +``` +User A deletes note.md +User B edits note.md +Result: Edit wins (file recreated with B's content) +``` + +Operational transformation prioritises content preservation. + +### Large File Uploads + +Files > 100 MB may time out on slow connections. Increase `response_timeout_seconds` or split large files. + +### Mobile Sync + +- Mobile networks may drop WebSocket connections frequently +- Client auto-reconnects, but causes sync delays +- Battery impact from constant reconnections + +## What VaultLink is NOT + +- **Not a backup solution**: Version history helps but isn't a backup (make backups!) +- **Not Git**: No branching, no commit messages, no diffs to review before merge +- **Not encrypted storage**: Server sees everything +- **Not multi-master**: One server, multiple clients (not peer-to-peer) + +## Recommendations + +### Good Use Cases + +- Personal multi-device sync (< 10 devices) +- Small team collaboration (< 20 people) +- Primarily text/Markdown content +- Trusted server environment + +### Poor Use Cases + +- Large teams (> 50 concurrent users per vault) +- Primarily binary files (images, videos, large PDFs) +- Untrusted server (need E2E encryption) +- Highly regulated environments (HIPAA, etc.) + +## Next Steps + +- [Server configuration limits →](/config/server) +- [Advanced tuning →](/config/advanced) +- [Architecture details →](/architecture/) diff --git a/docs/guide/server-setup.md b/docs/guide/server-setup.md index bf09c5e6..7754da54 100644 --- a/docs/guide/server-setup.md +++ b/docs/guide/server-setup.md @@ -280,10 +280,15 @@ Run daily via cron: The server exposes a ping endpoint: ```bash -curl http://localhost:3000/vaults/fake/ping -# Returns: pong +curl http://localhost:3000/vaults/test/ping +# Returns: {"server_version":"0.10.1","is_authenticated":false} ``` +Replace `test` with any vault name. 
The endpoint returns: + +- `server_version`: Current server version +- `is_authenticated`: Whether the request included a valid token + Docker health check is built-in and checks this endpoint every 30 seconds. #### Prometheus Metrics diff --git a/docs/guide/what-is-vaultlink.md b/docs/guide/what-is-vaultlink.md index a7dee7c7..070b312c 100644 --- a/docs/guide/what-is-vaultlink.md +++ b/docs/guide/what-is-vaultlink.md @@ -13,9 +13,11 @@ Syncing Obsidian vaults across devices or sharing with teammates sucks: ## VaultLink's Solution -Differential synchronisation with operational transformation. +Differential synchronisation with operational transformation for Markdown and text files. -Edit files with Obsidian, Vim, VS Code, or any editor. VaultLink compares versions and automatically merges all changes. No operation tracking required, no conflict markers, no data loss. +Edit `.md` and `.txt` files with Obsidian, Vim, VS Code, or any editor. VaultLink compares versions and automatically merges all changes. No operation tracking required, no conflict markers. + +**Note**: Binary files (images, PDFs, etc.) use last-write-wins. [See limitations →](/guide/limitations) ## How It Works diff --git a/docs/index.md b/docs/index.md index 705dd1b9..6a7d610d 100644 --- a/docs/index.md +++ b/docs/index.md @@ -37,7 +37,7 @@ features: **Edit with any tool.** Other solutions require CRDT-aware editors or break when you edit outside Obsidian. VaultLink uses differential sync: edit files however you want, sync handles the rest. -**No conflict markers.** Git forces manual merging. Other tools use last-write-wins. VaultLink's operational transformation automatically merges all changes without data loss or workflow interruption. +**No conflict markers.** Git forces manual merging. Other tools use last-write-wins. VaultLink's operational transformation automatically merges Markdown and text files without conflict markers or workflow interruption. 
[See what's supported →](/guide/limitations) [See how VaultLink compares to alternatives →](/guide/alternatives)