Add tokenizer selector

Andras Schmelczer 2025-07-06 22:06:43 +01:00
parent 0ad3dee468
commit 56e08588ef
GPG key ID: FC8F2C3D3D1A718C
3 changed files with 310 additions and 123 deletions

@ -28,67 +28,163 @@
<div class="scroll-container"> <div class="scroll-container">
<div class="page-wrapper"> <div class="page-wrapper">
<header> <header>
<h1>3-Way Text Merge</h1> <h1>Reconcile: automated 3-way text merge</h1>
<p> <p>
The The
<a href="https://github.com/schmelczer/reconcile" target="_blank">reconcile</a> <a
solves a fundamental challenge in collaborative editing: what happens when href="https://github.com/schmelczer/reconcile"
multiple people edit the same text simultaneously? target="_blank"
<code>reconcile(parent: str, left: str, right: str) -> str</code> rel="noopener noreferrer"
takes conflicting concurrent edits and intelligently merges them into a unified >reconcile</a
result. Beyond basic conflict resolution, it offers sophisticated merging >
heuristics, flexible tokenization options, and cursor position tracking. library solves a fundamental challenge in collaborative editing: what happens
</p> when multiple users edit the same text simultaneously but we can only capture
<p> the end result, not the intermediary edits? Essentially, it's
The algorithm begins with your chosen tokenizer, then applies Myers' diff <a
algorithm to compare the original text with both conflicting versions. These href="https://www.gnu.org/software/diffutils/manual/html_node/Invoking-diff3.html"
diffs undergo transformation to preserve meaningful change sequences, before a target="_blank"
final merge strategy—inspired by Operational Transformation (OT)—reconciles all rel="noopener noreferrer"
conflicting modifications without losing any edits. >diff3</a
</p> >
<p> (or <code>git merge</code>) but with automatic conflict resolution.
For more details, see the </p>
<a href="https://github.com/schmelczer/reconcile" target="_blank">README</a>. <p>
</p> The
</header> <code>reconcile(parent: str, left: str, right: str) -> str</code>
function takes conflicting concurrent edits and intelligently merges them into a
unified result. Beyond basic conflict resolution, it offers sophisticated
merging heuristics, flexible tokenization options, and cursor position
tracking.
</p>
<p>
The algorithm begins with your chosen tokenizer, then applies Myers' diff
algorithm to compare the original text with both conflicting versions. These
diffs undergo transformation to preserve meaningful change sequences, before a
final merge strategy, inspired by Operational Transformation, reconciles all
conflicting modifications without losing any edits.
</p>
<p>
For more details, see the
<a href="https://github.com/schmelczer/reconcile" target="_blank">README</a>.
</p>
<main> <p>
<div class="text-area-card diamond-parent"> Use the tokenization options below to experiment with different strategies.
<label The library supports user-defined tokenizers as well.
for="original" </p>
title="The text document's content before any concurrent edits occurred." </header>
>Original</label
>
<textarea id="original" name="original"></textarea>
</div>
<div class="text-area-card diamond-left"> <main>
<label <section class="tokenizer-selector">
for="left" <div class="radio-group" role="radiogroup" aria-label="Tokenization strategy">
title="Colour-coded tokens mark the origin of each token in the result. This text box is marked with the colour green." <label class="radio-option">
> <input
First concurrent edit type="radio"
<div class="box Left"></div> name="tokenizer"
</label> value="Character"
<textarea id="left" name="left"></textarea> id="tokenizer-character"
</div> />
<span class="radio-custom" aria-hidden="true"></span>
<div class="radio-content">
<span class="radio-label">Character</span>
<span class="radio-description">Split by individual characters</span>
</div>
</label>
<label class="radio-option">
<input
type="radio"
name="tokenizer"
value="Word"
id="tokenizer-word"
checked
/>
<span class="radio-custom" aria-hidden="true"></span>
<div class="radio-content">
<span class="radio-label">Word</span>
<span class="radio-description">Split by words (default)</span>
</div>
</label>
<label class="radio-option">
<input type="radio" name="tokenizer" value="Line" id="tokenizer-line" />
<span class="radio-custom" aria-hidden="true"></span>
<div class="radio-content">
<span class="radio-label">Line</span>
<span class="radio-description"
>Split by lines similarly to <code>git merge</code></span
>
</div>
</label>
</div>
</section>
<div class="text-area-card diamond-right"> <div class="text-area-card diamond-parent">
<label <label
for="right" for="original"
title="Colour-coded tokens mark the origin of each token in the result. This text box is marked with the colour blue." title="The text document's content before any concurrent edits occurred."
> >Original</label
Second concurrent edit >
<div class="box Right"></div> <textarea id="original" name="original"></textarea>
</label> </div>
<textarea id="right" name="right"></textarea>
</div>
<div class="text-area-card diamond-result"> <div class="text-area-card diamond-left">
<label <label
title="Read-only. Change the above text boxes to change the content of this box." for="left"
title="Colour-coded tokens mark the origin of each token in the result. This text box is marked with the colour green."
>
First concurrent edit
<div class="box Left"></div>
</label>
<textarea id="left" name="left"></textarea>
</div>
<div class="text-area-card diamond-right">
<label
for="right"
title="Colour-coded tokens mark the origin of each token in the result. This text box is marked with the colour blue."
>
Second concurrent edit
<div class="box Right"></div>
</label>
<textarea id="right" name="right"></textarea>
</div>
<div class="text-area-card diamond-result">
<label
for="merged"
title="Read-only. Change the above text boxes to change the content of this box."
>
Deconflicted result
<svg
xmlns="http://www.w3.org/2000/svg"
width="24"
height="24"
viewBox="0 0 24 24"
fill="none"
stroke="currentColor"
stroke-width="2"
stroke-linecap="round"
stroke-linejoin="round"
aria-hidden="true"
>
<path stroke="none" d="M0 0h24v24H0z" fill="none"></path>
<path
d="M10 10l-6 6v4h4l6 -6m1.99 -1.99l2.504 -2.504a2.828 2.828 0 1 0 -4 -4l-2.5 2.5"
></path>
<path d="M13.5 6.5l4 4"></path>
<path d="M3 3l18 18"></path>
</svg>
</label>
<div id="merged" role="textbox" aria-readonly="true" aria-live="polite"></div>
</div>
</main>
<footer>
<p>2025 Andras Schmelczer</p>
<a
href="https://github.com/schmelczer/reconcile"
class="github-link"
aria-label="GitHub repository"
> >
Deconflicted result
<svg <svg
xmlns="http://www.w3.org/2000/svg" xmlns="http://www.w3.org/2000/svg"
width="24" width="24"
@ -102,45 +198,16 @@
> >
<path stroke="none" d="M0 0h24v24H0z" fill="none" /> <path stroke="none" d="M0 0h24v24H0z" fill="none" />
<path <path
d="M10 10l-6 6v4h4l6 -6m1.99 -1.99l2.504 -2.504a2.828 2.828 0 1 0 -4 -4l-2.5 2.5" d="M9 19c-4.3 1.4 -4.3 -2.5 -6 -3m12 5v-3.5c0 -1 .1 -1.4 -.5 -2c2.8 -.3 5.5 -1.4 5.5 -6a4.6 4.6 0 0 0 -1.3 -3.2a4.2 4.2 0 0 0 -.1 -3.2s-1.1 -.3 -3.5 1.3a12.3 12.3 0 0 0 -6.2 0c-2.4 -1.6 -3.5 -1.3 -3.5 -1.3a4.2 4.2 0 0 0 -.1 3.2a4.6 4.6 0 0 0 -1.3 3.2c0 4.6 2.7 5.7 5.5 6c-.6 .6 -.6 1.2 -.5 2v3.5"
/> />
<path d="M13.5 6.5l4 4" />
<path d="M3 3l18 18" />
</svg> </svg>
</label> </a>
<div id="merged"></div> </footer>
</div>
</main>
<footer>
<p>2025 Andras Schmelczer</p>
<a
href="https://github.com/schmelczer/reconcile"
class="github-link"
aria-label="GitHub repository"
>
<svg
xmlns="http://www.w3.org/2000/svg"
width="24"
height="24"
viewBox="0 0 24 24"
fill="none"
stroke="currentColor"
stroke-width="2"
stroke-linecap="round"
stroke-linejoin="round"
>
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
<path
d="M9 19c-4.3 1.4 -4.3 -2.5 -6 -3m12 5v-3.5c0 -1 .1 -1.4 -.5 -2c2.8 -.3 5.5 -1.4 5.5 -6a4.6 4.6 0 0 0 -1.3 -3.2a4.2 4.2 0 0 0 -.1 -3.2s-1.1 -.3 -3.5 1.3a12.3 12.3 0 0 0 -6.2 0c-2.4 -1.6 -3.5 -1.3 -3.5 -1.3a4.2 4.2 0 0 0 -.1 3.2a4.6 4.6 0 0 0 -1.3 3.2c0 4.6 2.7 5.7 5.5 6c-.6 .6 -.6 1.2 -.5 2v3.5"
/>
</svg>
</a>
</footer>
</div> </div>
</div> </div>
<noscript>JavaScript is required for this website.</noscript> <noscript>JavaScript is required for this website to function properly.</noscript>
<script inline inline-asset="index.js" inline-asset-delete></script> <script inline inline-asset="index.js" inline-asset-delete></script>
</body> </body>
</html> </html>
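
The three radio options added above (Character, Word, Line) correspond to different ways of splitting text before diffing. As a rough illustration of what such a split could look like, here is a minimal sketch. The `tokenize` helper and the `TokenizerName` type are hypothetical names made up for this example; they are not the tokenizers shipped with the reconcile library, whose actual behaviour (for instance around casing or whitespace) may differ.

```typescript
// Hypothetical illustration only: NOT the reconcile library's tokenizers.
type TokenizerName = 'Character' | 'Word' | 'Line';

function tokenize(text: string, strategy: TokenizerName): string[] {
  switch (strategy) {
    case 'Character':
      // Splits into UTF-16 code units; a production tokenizer would more
      // likely split on grapheme clusters.
      return text.split('');
    case 'Word':
      // Each run of whitespace and each run of non-whitespace becomes a
      // token, so joining the tokens reproduces the input exactly.
      return text.match(/\s+|\S+/g) ?? [];
    case 'Line':
      // Each line keeps its trailing newline; a final line without a
      // newline becomes the last token.
      return text.match(/[^\n]*\n|[^\n]+/g) ?? [];
  }
}
```

With coarser tokens (lines), two edits touching the same line collide as a whole; with finer tokens (words or characters), the same edits may be merged independently, which is why the demo's result depends on the selected strategy.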

@ -1,52 +1,54 @@
import { init, reconcileWithHistory } from 'reconcile'; import { init, reconcileWithHistory } from 'reconcile';
import type { Tokenizer } from 'reconcile';
import './style.scss'; import './style.scss';
const originalTextArea = document.getElementById('original') as HTMLTextAreaElement; const originalTextArea = document.getElementById('original') as HTMLTextAreaElement;
const leftTextArea = document.getElementById('left') as HTMLTextAreaElement; const leftTextArea = document.getElementById('left') as HTMLTextAreaElement;
const rightTextArea = document.getElementById('right') as HTMLTextAreaElement; const rightTextArea = document.getElementById('right') as HTMLTextAreaElement;
const mergedTextArea = document.getElementById('merged') as HTMLDivElement; const mergedTextArea = document.getElementById('merged') as HTMLDivElement;
const tokenizerRadios = document.querySelectorAll(
'input[name="tokenizer"]'
) as NodeListOf<HTMLInputElement>;
const sampleText = `The \`reconcile\` Rust library is embedded on this page a WASM module and it powers these text boxes. Experiment with the "Original", "First concurrent edit", and "Second concurrent edit" text boxes to watch competing changes merge in real-time within the "Deconflicted result" box. Here, you will see color-coded tokens marking the origin of each token, including ones that got deleted. The result highly depends on the tokenization strategy, for example, deciding how casing or white-spacing is taken into account.`; const sampleText = `The \`reconcile\` Rust library is embedded on this page as a WASM module and powers these text boxes. Experiment with changing the "Original", "First concurrent edit", and "Second concurrent edit" text boxes to see competing changes get merged in real-time within the "Deconflicted result" box. Here, you will see color-coded tokens marking the origin of each token, including ones that got deleted. The result highly depends on the tokenization strategy, for example, deciding how casing or whitespace is taken into account.`;
async function main(): Promise<void> { async function main(): Promise<void> {
await init(); await init();
originalTextArea?.addEventListener('input', updateMergedText); originalTextArea.addEventListener('input', updateMergedText);
leftTextArea?.addEventListener('input', updateMergedText); leftTextArea.addEventListener('input', updateMergedText);
rightTextArea?.addEventListener('input', updateMergedText); rightTextArea.addEventListener('input', updateMergedText);
window.addEventListener('resize', resizeTextAreas); window.addEventListener('resize', resizeTextAreas);
tokenizerRadios.forEach((radio) => {
radio.addEventListener('change', updateMergedText);
});
loadSample(); loadSample();
updateMergedText(); updateMergedText();
if (leftTextArea) focusTextArea(leftTextArea); focusTextArea(leftTextArea);
} }
function loadSample(): void { function loadSample(): void {
if (originalTextArea) originalTextArea.value = sampleText; originalTextArea.value = sampleText;
if (leftTextArea) { leftTextArea.value =
leftTextArea.value = sampleText.replace('color', 'colour') +
sampleText.replace('color', 'colour') + " Check out what's the most complex conflict you can come up with!";
" Check out what's the most complex conflict you can come up with!"; rightTextArea.value = sampleText
} .replace(', for example,', ' such as')
if (rightTextArea) { .replace('WASM', 'WebAssembly');
rightTextArea.value = sampleText
.replace(', for example,', ' such as')
.replace('WASM', 'WebAssembly');
}
} }
function updateMergedText(): void { function updateMergedText(): void {
resizeTextAreas(); resizeTextAreas();
if (!originalTextArea || !leftTextArea || !rightTextArea || !mergedTextArea) {
return;
}
const original = originalTextArea.value; const original = originalTextArea.value;
const left = leftTextArea.value; const left = leftTextArea.value;
const right = rightTextArea.value; const right = rightTextArea.value;
const results = reconcileWithHistory(original, left, right); const selectedTokenizer = getSelectedTokenizer();
const results = reconcileWithHistory(original, left, right, selectedTokenizer);
mergedTextArea.innerHTML = ''; mergedTextArea.innerHTML = '';
@ -58,11 +60,17 @@ function updateMergedText(): void {
} }
} }
function getSelectedTokenizer(): Tokenizer {
const selectedRadio = Array.from(tokenizerRadios).find((radio) => radio.checked);
return selectedRadio?.value as Tokenizer;
}
function resizeTextAreas(): void { function resizeTextAreas(): void {
// Only auto-resize if field-sizing CSS property is not supported, like in Safari as of now
if (!CSS.supports('field-sizing', 'content')) { if (!CSS.supports('field-sizing', 'content')) {
if (originalTextArea) autoResize(originalTextArea); autoResize(originalTextArea);
if (leftTextArea) autoResize(leftTextArea); autoResize(leftTextArea);
if (rightTextArea) autoResize(rightTextArea); autoResize(rightTextArea);
} }
} }
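
In the hunk above, `getSelectedTokenizer` finds the checked radio and casts its `value` to `Tokenizer`. A more defensive variant could validate the value and fall back to a default when nothing is checked. The sketch below shows one way to do that; `TokenizerName`, `RadioLike`, `isTokenizerName` and `pickTokenizer` are hypothetical names, and the assumption that the valid values are exactly the three radio values in the markup is mine, not something the library's `Tokenizer` type is confirmed to guarantee.

```typescript
// Assumed set of valid names, mirroring the radio values in the markup.
type TokenizerName = 'Character' | 'Word' | 'Line';

const TOKENIZER_NAMES: readonly TokenizerName[] = ['Character', 'Word', 'Line'];

// Minimal structural shape of a radio input; HTMLInputElement satisfies it.
interface RadioLike {
  value: string;
  checked: boolean;
}

// Type guard: narrows an arbitrary string to a known tokenizer name.
function isTokenizerName(value: string): value is TokenizerName {
  return (TOKENIZER_NAMES as readonly string[]).indexOf(value) !== -1;
}

// Returns the first checked radio's value if it is a known name,
// otherwise the fallback.
function pickTokenizer(
  radios: RadioLike[],
  fallback: TokenizerName = 'Word'
): TokenizerName {
  for (const radio of radios) {
    if (radio.checked && isTokenizerName(radio.value)) {
      return radio.value;
    }
  }
  return fallback;
}
```

Because the function only depends on `value` and `checked`, it could be called as `pickTokenizer(Array.from(tokenizerRadios))` in the page script and exercised with plain objects in tests, without a DOM.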

@ -68,7 +68,7 @@ header > p:not(:first-of-type) {
main { main {
display: grid; display: grid;
grid-template-rows: min-content; grid-template-rows: min-content min-content min-content;
grid-template-columns: 1fr 1fr; grid-template-columns: 1fr 1fr;
gap: 20px; gap: 20px;
justify-items: center; justify-items: center;
@ -76,23 +76,120 @@ main {
padding: 32px; padding: 32px;
} }
.tokenizer-selector {
grid-column: 1 / -1;
grid-row: 1;
width: 100%;
margin-bottom: 8px;
}
.radio-group {
display: flex;
gap: 16px;
justify-content: center;
flex-wrap: wrap;
}
.radio-option {
display: flex;
align-items: center;
gap: 12px;
padding: 16px 20px;
background: #fff;
border-radius: 12px;
box-shadow: 0 2px 8px rgba(36, 81, 166, 0.08);
cursor: pointer;
transition: all 0.2s ease;
border: 2px solid transparent;
min-width: 180px;
position: relative;
}
.radio-option:hover {
box-shadow: 0 4px 16px rgba(36, 81, 166, 0.12);
transform: translateY(-2px);
}
.radio-option:has(input:checked) {
background: #f0f7ff;
border-color: #2451a6;
box-shadow: 0 4px 16px rgba(36, 81, 166, 0.16);
}
.radio-option input[type='radio'] {
position: absolute;
opacity: 0;
pointer-events: none;
}
.radio-custom {
width: 20px;
height: 20px;
border: 2px solid #d1d5db;
border-radius: 50%;
position: relative;
transition: all 0.2s ease;
flex-shrink: 0;
}
.radio-option:has(input:checked) .radio-custom {
border-color: #2451a6;
background: #2451a6;
}
.radio-custom::after {
content: '';
position: absolute;
top: 50%;
left: 50%;
transform: translate(-50%, -50%) scale(0);
width: 8px;
height: 8px;
border-radius: 50%;
background: white;
transition: transform 0.2s ease;
}
.radio-option:has(input:checked) .radio-custom::after {
transform: translate(-50%, -50%) scale(1);
}
.radio-content {
display: flex;
flex-direction: column;
gap: 2px;
}
.radio-label {
font-weight: 600;
color: #2451a6;
font-size: 0.95rem;
}
.radio-description {
font-size: 0.8rem;
color: #6b7280;
line-height: 1.2;
}
.diamond-parent { .diamond-parent {
grid-column: 1 / -1; grid-column: 1 / -1;
grid-row: 2;
} }
.diamond-left { .diamond-left {
grid-column: 1; grid-column: 1;
grid-row: 2; grid-row: 3;
} }
.diamond-right { .diamond-right {
grid-column: 2; grid-column: 2;
grid-row: 2; grid-row: 3;
} }
.diamond-result { .diamond-result {
grid-column: 1 / -1; grid-column: 1 / -1;
grid-row: 3; grid-row: 4;
} }
.diamond-result label { .diamond-result label {
@ -196,28 +293,43 @@ textarea {
@media (max-width: 768px) { @media (max-width: 768px) {
main { main {
grid-template-columns: 1fr; grid-template-columns: 1fr;
grid-template-rows: auto auto auto auto; grid-template-rows: auto auto auto auto auto;
} }
.diamond-parent { .tokenizer-selector {
grid-column: 1; grid-column: 1;
grid-row: 1; grid-row: 1;
} }
.diamond-left { .diamond-parent {
grid-column: 1; grid-column: 1;
grid-row: 2; grid-row: 2;
} }
.diamond-right { .diamond-left {
grid-column: 1; grid-column: 1;
grid-row: 3; grid-row: 3;
} }
.diamond-result { .diamond-right {
grid-column: 1; grid-column: 1;
grid-row: 4; grid-row: 4;
} }
.radio-group {
flex-direction: column;
gap: 12px;
}
.radio-option {
min-width: unset;
width: 100%;
}
.diamond-result {
grid-column: 1;
grid-row: 5;
}
} }
footer { footer {