Improve network usage for small text changes #166

Merged
schmelczer merged 9 commits from asch/compact-reconcile into main 2025-11-16 22:10:22 +00:00
schmelczer commented 2025-11-16 20:41:48 +00:00 (Migrated from github.com)

Only send the changes to the server when the user is typing avoiding sending the entire document on each key stroke.

We only ever know the current state of a document by reading it from disk, so this PR adds a small cache with confirmed document version id - content pairs. When sending requests, it calculates the diff between them first. This results in higher overall CPU usage but a smaller network footprint which tends to be the bottleneck in practice.

Also fix flaky log rotation logic with Claude

Only send the changes to the server when the user is typing avoiding sending the entire document on each key stroke. We only ever know the current state of a document by reading it from disk, so this PR adds a small cache with confirmed document version id - content pairs. When sending requests, it calculates the diff between them first. This results in higher overall CPU usage but a smaller network footprint which tends to be the bottleneck in practice. Also fix flaky log rotation logic with Claude
schmelczer (Migrated from github.com) reviewed 2025-11-16 20:42:01 +00:00
schmelczer (Migrated from github.com) commented 2025-11-16 20:42:01 +00:00

It's a bit useless

It's a bit useless
schmelczer (Migrated from github.com) reviewed 2025-11-16 20:43:05 +00:00
@ -18,3 +18,3 @@
"p-queue": "^8.1.0",
"reconcile-text": "^0.5.0",
"reconcile-text": "^0.7.1",
"uuid": "^13.0.0"
schmelczer (Migrated from github.com) commented 2025-11-16 20:43:05 +00:00
Pick up https://github.com/schmelczer/reconcile/pull/21 & https://github.com/schmelczer/reconcile/pull/24
copilot-pull-request-reviewer[bot] (Migrated from github.com) reviewed 2025-11-16 21:25:23 +00:00
copilot-pull-request-reviewer[bot] (Migrated from github.com) left a comment

Pull Request Overview

This PR optimizes network usage by sending only text changes (diffs) instead of entire documents when the user is typing. The implementation introduces a client-side cache to track document content by version ID, enabling diff calculation for text documents while maintaining full content transfers for binary files. Additionally, it fixes flaky log rotation logic by properly reusing existing log files when appropriate.

  • Splits document update endpoints into separate /text and /binary routes
  • Implements a 2MB LRU cache on the client to store document contents for diff calculation
  • Fixes log rotation to reuse existing files instead of creating new ones prematurely

Reviewed Changes

Copilot reviewed 18 out of 20 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
sync-server/src/utils/rotating_file_writer.rs Adds logic to reuse existing log files before rotation time
sync-server/src/utils/is_binary.rs New utility function to detect binary vs text content
sync-server/src/utils.rs Exports new is_binary module
sync-server/src/server/update_document.rs Splits into separate text/binary handlers with diff support
sync-server/src/server/requests.rs Separates request types for text vs binary updates
sync-server/src/server.rs Routes split into /text and /binary endpoints
sync-server/Cargo.toml Updates reconcile-text to 0.7.1 with serde support
frontend/sync-client/src/utils/is-binary.ts Client-side binary detection matching server logic
frontend/sync-client/src/utils/fix-sized-cache.ts LRU cache implementation for document content
frontend/sync-client/src/utils/fix-sized-cache.test.ts Test coverage for cache eviction logic
frontend/sync-client/src/sync-operations/unrestricted-syncer.ts Integrates cache and calls appropriate text/binary endpoints
frontend/sync-client/src/sync-operations/syncer.ts Adds cache clearing on reset
frontend/sync-client/src/sync-client.ts Initializes cache with 2MB size
frontend/sync-client/src/services/types/UpdateTextDocumentVersion.ts Type definition for text update requests
frontend/sync-client/src/services/sync-service.ts Implements putText and putBinary methods
frontend/sync-client/src/file-operations/file-operations.ts Updates import to use local is-binary
frontend/sync-client/package.json Updates reconcile-text dependency
frontend/eslint.config.mjs Disables max-params rule
Files not reviewed (1)
  • frontend/package-lock.json: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

## Pull Request Overview This PR optimizes network usage by sending only text changes (diffs) instead of entire documents when the user is typing. The implementation introduces a client-side cache to track document content by version ID, enabling diff calculation for text documents while maintaining full content transfers for binary files. Additionally, it fixes flaky log rotation logic by properly reusing existing log files when appropriate. - Splits document update endpoints into separate `/text` and `/binary` routes - Implements a 2MB LRU cache on the client to store document contents for diff calculation - Fixes log rotation to reuse existing files instead of creating new ones prematurely ### Reviewed Changes Copilot reviewed 18 out of 20 changed files in this pull request and generated 4 comments. <details> <summary>Show a summary per file</summary> | File | Description | | ---- | ----------- | | sync-server/src/utils/rotating_file_writer.rs | Adds logic to reuse existing log files before rotation time | | sync-server/src/utils/is_binary.rs | New utility function to detect binary vs text content | | sync-server/src/utils.rs | Exports new is_binary module | | sync-server/src/server/update_document.rs | Splits into separate text/binary handlers with diff support | | sync-server/src/server/requests.rs | Separates request types for text vs binary updates | | sync-server/src/server.rs | Routes split into /text and /binary endpoints | | sync-server/Cargo.toml | Updates reconcile-text to 0.7.1 with serde support | | frontend/sync-client/src/utils/is-binary.ts | Client-side binary detection matching server logic | | frontend/sync-client/src/utils/fix-sized-cache.ts | LRU cache implementation for document content | | frontend/sync-client/src/utils/fix-sized-cache.test.ts | Test coverage for cache eviction logic | | frontend/sync-client/src/sync-operations/unrestricted-syncer.ts | Integrates cache and calls appropriate text/binary endpoints | | frontend/sync-client/src/sync-operations/syncer.ts | Adds cache clearing on reset | | frontend/sync-client/src/sync-client.ts | Initializes cache with 2MB size | | frontend/sync-client/src/services/types/UpdateTextDocumentVersion.ts | Type definition for text update requests | | frontend/sync-client/src/services/sync-service.ts | Implements putText and putBinary methods | | frontend/sync-client/src/file-operations/file-operations.ts | Updates import to use local is-binary | | frontend/sync-client/package.json | Updates reconcile-text dependency | | frontend/eslint.config.mjs | Disables max-params rule | </details> <details> <summary>Files not reviewed (1)</summary> * **frontend/package-lock.json**: Language not supported </details> --- 💡 <a href="/schmelczer/vault-link/new/main/.github/instructions?filename=*.instructions.md" class="Link--inTextBlock" target="_blank" rel="noopener noreferrer">Add Copilot custom instructions</a> for smarter, more guided reviews. <a href="https://docs.github.com/en/copilot/customizing-copilot/adding-repository-custom-instructions-for-github-copilot" class="Link--inTextBlock" target="_blank" rel="noopener noreferrer">Learn how to get started</a>.
@ -297,11 +347,19 @@ export class SyncService {
return `${safeRemoteUri}/vaults/${vaultName}${path}`;
copilot-pull-request-reviewer[bot] (Migrated from github.com) commented 2025-11-16 21:25:23 +00:00

Setting Content-Type: multipart/form-data manually will break multipart requests. The browser/fetch API must set this header automatically to include the correct boundary parameter. Remove this line and let the API set the header when FormData is used.

			// Do not set Content-Type for multipart/form-data; let the browser/fetch API handle it.
Setting `Content-Type: multipart/form-data` manually will break multipart requests. The browser/fetch API must set this header automatically to include the correct boundary parameter. Remove this line and let the API set the header when `FormData` is used. ```suggestion // Do not set Content-Type for multipart/form-data; let the browser/fetch API handle it. ```
@ -0,0 +1,113 @@
// Implements an in-memory fixed-size cache for document contents,
copilot-pull-request-reviewer[bot] (Migrated from github.com) commented 2025-11-16 21:25:22 +00:00

Filtering the entire usage order array on every get operation is O(n). Consider using a doubly-linked list or Map-based approach to track order in O(1) time. With frequent reads, this could become a significant performance bottleneck.

Filtering the entire usage order array on every `get` operation is O(n). Consider using a doubly-linked list or Map-based approach to track order in O(1) time. With frequent reads, this could become a significant performance bottleneck.
copilot-pull-request-reviewer[bot] (Migrated from github.com) commented 2025-11-16 21:25:22 +00:00

Filtering the entire usage order array on every put operation when updating existing entries is O(n). This compounds the performance issue noted in the get method. Consider using a doubly-linked list or Map-based approach.

Filtering the entire usage order array on every `put` operation when updating existing entries is O(n). This compounds the performance issue noted in the `get` method. Consider using a doubly-linked list or Map-based approach.
@ -0,0 +9,4 @@
// human-readable text.
return true;
}
copilot-pull-request-reviewer[bot] (Migrated from github.com) commented 2025-11-16 21:25:22 +00:00

The contains method performs a linear search through the entire byte slice. For large files, this could be inefficient. Consider using memchr::memchr(0, data).is_some() which uses SIMD optimizations, or at minimum check only a sample of the data (e.g., first few KB) since NUL bytes in text files typically appear early if present.

    if memchr::memchr(0, data).is_some() {
        // Even though the NUL character is valid in UTF-8, it's highly suspicious in
        // human-readable text.
        return true;
    }
use memchr;
The `contains` method performs a linear search through the entire byte slice. For large files, this could be inefficient. Consider using `memchr::memchr(0, data).is_some()` which uses SIMD optimizations, or at minimum check only a sample of the data (e.g., first few KB) since NUL bytes in text files typically appear early if present. ```suggestion if memchr::memchr(0, data).is_some() { // Even though the NUL character is valid in UTF-8, it's highly suspicious in // human-readable text. return true; } use memchr; ```
schmelczer (Migrated from github.com) reviewed 2025-11-16 21:27:52 +00:00
@ -0,0 +9,4 @@
// human-readable text.
return true;
}
schmelczer (Migrated from github.com) commented 2025-11-16 21:27:52 +00:00

Don't want to add another dep

Don't want to add another dep
Sign in to join this conversation.
No description provided.