LexiDiff.Snowball
0.3.1
dotnet add package LexiDiff.Snowball --version 0.3.1
NuGet\Install-Package LexiDiff.Snowball -Version 0.3.1
<PackageReference Include="LexiDiff.Snowball" Version="0.3.1" />
<PackageVersion Include="LexiDiff.Snowball" Version="0.3.1" />
<PackageReference Include="LexiDiff.Snowball" />
paket add LexiDiff.Snowball --version 0.3.1
#r "nuget: LexiDiff.Snowball, 0.3.1"
#:package LexiDiff.Snowball@0.3.1
#addin nuget:?package=LexiDiff.Snowball&version=0.3.1
#tool nuget:?package=LexiDiff.Snowball&version=0.3.1
LexiDiff
Token-aware text diffs that favour readability over compactness.
| Pure Levenshtein | LexiDiff |
|------------------|----------|
| Alice was <del>beginning to get</del><ins>getting</ins> very tired <ins>t</ins>o<del>f</del> sit<del>ting</del> by her sister on the bank, <del>and of having</del><ins>with</ins> nothing to do. | Alice was <del>beginning to </del>get<ins>ting</ins> very tired <del>of</del><ins>to</ins> sit<del>ting</del> by her sister on the bank, <del>and of having</del><ins>with</ins> nothing to do. |
Produce readable diffs that never split randomly inside words, optionally promote changes to sentence or paragraph granularity, and render as unified diff or inline HTML.
- ICU word segmentation + Snowball stemming (multi-language)
- Token-aware diff (Diff Match Patch), no mid-token splits
- Optional promotion to Sentence or Paragraph
- Output as Delete-Add-Replace sequences, Unified Diff (line-level hunks) or Inline HTML
Install
Requirements: .NET Framework 4.8+ / .NET 8 / .NET 9
dotnet add package LexiDiff
Quick Start
```csharp
using System;
using LexiDiff;

LexiDiffResult result = Lexi.Compare(
    "Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do.",
    "Alice was getting very tired to sit by her sister on the bank, with nothing to do.");

// Print each span, wrapping insertions and deletions in HTML-like tags.
foreach (var span in result.Spans)
{
    switch (span.Op)
    {
        case LexiOp.Insert: Console.Write($"<ins>{span.Text}</ins>"); break;
        case LexiOp.Equal:  Console.Write(span.Text); break;
        case LexiOp.Delete: Console.Write($"<del>{span.Text}</del>"); break;
    }
}
```
The LexiDiffResult contains a list of operations (delete, insert, equal) which, applied to the original string, produce the second one. The loop above prints:
Alice was <del>beginning to </del>get<ins>ting</ins> very tired <del>of</del><ins>to</ins> sit<del>ting</del> by her sister on the bank, <del>and of having</del><ins>with</ins> nothing to do.
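The reconstruction property can be checked without the library: concatenating the Equal and Delete spans gives back the original text, and concatenating the Equal and Insert spans gives the revised text. The minimal sketch below assumes spans are plain (op, text) tuples with string labels standing in for LexiOp and result.Spans; those stand-ins are hypothetical and not LexiDiff's API. The spans are copied from the output above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for result.Spans: each span is (op, text),
// with op one of "equal", "delete", "insert" (LexiOp is not used here).
var spans = new List<(string Op, string Text)>
{
    ("equal", "Alice was "), ("delete", "beginning to "), ("equal", "get"),
    ("insert", "ting"), ("equal", " very tired "), ("delete", "of"),
    ("insert", "to"), ("equal", " sit"), ("delete", "ting"),
    ("equal", " by her sister on the bank, "), ("delete", "and of having"),
    ("insert", "with"), ("equal", " nothing to do."),
};

// The original text is everything except insertions.
string Original(IEnumerable<(string Op, string Text)> s) =>
    string.Concat(s.Where(x => x.Op != "insert").Select(x => x.Text));

// The revised text is everything except deletions.
string Revised(IEnumerable<(string Op, string Text)> s) =>
    string.Concat(s.Where(x => x.Op != "delete").Select(x => x.Text));

Console.WriteLine(Original(spans));
Console.WriteLine(Revised(spans));
```

Both lines printed match the two inputs passed to Lexi.Compare in the Quick Start.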
Notice that the diff is word- and stemming-aware:
- get<ins>ting</ins> is allowed, because the stemmer splits "getting" into the stem "get" plus the suffix "ting";
- <ins>t</ins>o<del>f</del>, which a pure Levenshtein diff prefers, would split inside the word "of", so it is not allowed and is rendered as <del>of</del><ins>to</ins> instead.
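To see why a split such as <ins>t</ins>o<del>f</del> cannot occur, here is a minimal token-level diff: because it compares whole tokens, the smallest unit that can change is a token. This is a sketch under stated assumptions, not LexiDiff's implementation: a regex tokenizer stands in for ICU word segmentation, a plain LCS stands in for Diff Match Patch, and there is no stemming step.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Tokenize into words, whitespace runs and single punctuation characters
// (a crude stand-in for ICU word segmentation).
List<string> Tokenize(string s) =>
    Regex.Matches(s, @"\w+|\s+|[^\w\s]").Cast<Match>().Select(m => m.Value).ToList();

// Longest-common-subsequence table over tokens, then a walk that turns it
// into '=' (equal), '-' (delete) and '+' (insert) operations.
List<(char Op, string Tok)> DiffTokens(List<string> a, List<string> b)
{
    var lcs = new int[a.Count + 1, b.Count + 1];
    for (int i = a.Count - 1; i >= 0; i--)
        for (int j = b.Count - 1; j >= 0; j--)
            lcs[i, j] = a[i] == b[j] ? lcs[i + 1, j + 1] + 1
                                     : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);

    var ops = new List<(char Op, string Tok)>();
    int x = 0, y = 0;
    while (x < a.Count && y < b.Count)
    {
        if (a[x] == b[y]) { ops.Add(('=', a[x])); x++; y++; }
        else if (lcs[x + 1, y] >= lcs[x, y + 1]) { ops.Add(('-', a[x])); x++; }
        else { ops.Add(('+', b[y])); y++; }
    }
    while (x < a.Count) ops.Add(('-', a[x++]));
    while (y < b.Count) ops.Add(('+', b[y++]));
    return ops;
}

// Render with the same <del>/<ins> markup used in this README.
string Render(List<(char Op, string Tok)> ops) => string.Concat(ops.Select(o =>
    o.Op == '-' ? $"<del>{o.Tok}</del>" : o.Op == '+' ? $"<ins>{o.Tok}</ins>" : o.Tok));

// Prints: very tired <del>of</del><ins>to</ins> <del>sitting</del><ins>sit</ins>
Console.WriteLine(Render(DiffTokens(Tokenize("very tired of sitting"), Tokenize("very tired to sit"))));
```

Without stemming, "sitting" versus "sit" can only be a whole-word replacement; the stem/suffix split is what lets LexiDiff show sit<del>ting</del> instead.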
Granularity Promotion (Sentence / Paragraph)
You can “promote” in-sentence edits to a whole-sentence (or whole-paragraph) replacement, which is often what reviewers want to see.
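```csharp
using System;
using System.Globalization;
using LexiDiff;

// Sample inputs
string a = "Alice was beginning to get very tired of sitting by her sister on the bank.";
string b = "Alice was getting very tired to sit by her sister on the bank.";

// Sentence-level promotion (locale-aware via ICU)
var sentenceDiff = Lexi.Compare(
    a, b,
    new LexiDiff.LexiOptions {
        PromoteTo = LexiDiff.LexGranularity.Sentence,
        SentenceCulture = CultureInfo.GetCultureInfo("en-US")
    });
Console.WriteLine(sentenceDiff.ToUnifiedDiff("a.txt", "b.txt"));

// Paragraph-level promotion
var paraDiff = LexiDiff.LexDiff.CompareParagraphs(a, b);
```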
Sentence boundaries use ICU’s Unicode Text Segmentation (UAX #29) with locale tailoring. Paragraphs are split on newlines (a blank line is its own paragraph).
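Promotion can be pictured as a post-pass over token-level spans: cut the text into units, and emit any unit that contains a change as one whole delete followed by one whole insert. The sketch below does this at paragraph granularity. It is an illustration under assumptions, not LexiDiff's implementation: spans are hypothetical (op, text) tuples, a paragraph ends at each '\n', and boundaries are only looked for inside equal spans. Sentence promotion would use a sentence segmenter (ICU in LexiDiff) instead of '\n'.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Rebuild token-level spans at paragraph granularity: a paragraph that
// contains any change becomes one whole delete plus one whole insert.
// Assumption: '\n' boundaries are only detected inside "equal" spans.
List<(string Op, string Text)> Promote(List<(string Op, string Text)> spans)
{
    var output = new List<(string Op, string Text)>();
    var oldBuf = new StringBuilder();   // current paragraph, original side
    var newBuf = new StringBuilder();   // current paragraph, revised side
    bool changed = false;

    // Emit the buffered paragraph, promoted if it contained any edit.
    void Flush()
    {
        if (changed)
        {
            if (oldBuf.Length > 0) output.Add(("delete", oldBuf.ToString()));
            if (newBuf.Length > 0) output.Add(("insert", newBuf.ToString()));
        }
        else if (oldBuf.Length > 0)
        {
            output.Add(("equal", oldBuf.ToString()));
        }
        oldBuf.Clear();
        newBuf.Clear();
        changed = false;
    }

    foreach (var (op, text) in spans)
    {
        if (op != "equal")
        {
            changed = true;
            (op == "delete" ? oldBuf : newBuf).Append(text);
            continue;
        }
        // Equal text goes to both sides; each '\n' closes the current paragraph.
        int start = 0;
        for (int nl = text.IndexOf('\n'); nl >= 0; nl = text.IndexOf('\n', start))
        {
            string piece = text.Substring(start, nl + 1 - start);
            oldBuf.Append(piece);
            newBuf.Append(piece);
            Flush();
            start = nl + 1;
        }
        oldBuf.Append(text, start, text.Length - start);
        newBuf.Append(text, start, text.Length - start);
    }
    Flush();
    return output;
}

var demo = Promote(new List<(string Op, string Text)>
{
    ("equal", "Title\n\nThe "), ("delete", "old"), ("insert", "new"), ("equal", " body text."),
});
foreach (var (op, text) in demo)
    Console.WriteLine($"{op}: {text.Replace("\n", "\\n")}");
```

Here the untouched "Title" paragraph and the blank line stay equal, while the edited paragraph is reported as delete "The old body text." plus insert "The new body text.".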
Why token-aware?
Character-level diffs split anywhere in the character stream. LexiDiff:
- Segments words with ICU, so punctuation/whitespace tokens are preserved.
- Stems with Snowball, so variants like Running → Runs align on Run.
- Diffs on tokens, so we never split inside a stem/suffix.
- Guarantees perfect reconstruction: every token is either a Whole token or (Stem + Suffix), where stem + suffix == original.
This makes deltas cleaner and more meaningful for reviewers.
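The reconstruction rule can be enforced by slicing the original token rather than using the stemmer's output directly: split only when the stem is a prefix of the token, otherwise keep the whole token. The sketch below assumes a hypothetical lookup table in place of a real Snowball stemmer; its entries mirror what the English Snowball stemmer returns for these words (for example "happy" stems to "happi", which is not a prefix of "happy").

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stem table standing in for a Snowball stemmer's output.
var stems = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["getting"] = "get", ["running"] = "run", ["runs"] = "run", ["happy"] = "happi",
};

// Split a token into (stem, suffix) only when the stem is a proper,
// case-insensitive prefix of the token; otherwise keep the whole token.
// Slicing the original token preserves its casing, so stem + suffix == token.
(string Stem, string Suffix) SplitToken(string token)
{
    if (stems.TryGetValue(token, out var stem) && stem.Length < token.Length &&
        token.StartsWith(stem, StringComparison.OrdinalIgnoreCase))
        return (token.Substring(0, stem.Length), token.Substring(stem.Length));
    return (token, "");
}

foreach (var t in new[] { "getting", "Running", "runs", "happy", "of" })
{
    var (s, x) = SplitToken(t);
    Console.WriteLine($"{t}: '{s}' + '{x}'");
}
```

"happy" and "of" come back as whole tokens, which is the "some words may not split (by design)" behaviour noted under Known Limitations.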
Known Limitations
- Unified diff is line-level. Inline word/suffix highlighting is available via ToInlineHtml, not in the unified output.
- Snowball’s stemming is heuristic; some languages/words may not split (by design). We preserve the original text regardless.
- Sentence boundaries in some texts may need RBBI (ICU rule-based break iterator) tailoring; a light post-filter for abbreviations (Me., Dr., art.) is easy to add if needed.
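A post-filter of that kind might look like the sketch below: whenever a segment's last word is a known abbreviation, it is glued to the following segment, undoing the false break. The segment list is hypothetical segmenter output and the abbreviation set is taken from the limitation above; neither is part of LexiDiff's API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Abbreviations that should not end a sentence (examples from the text above).
var abbreviations = new HashSet<string>(StringComparer.Ordinal) { "Me.", "Dr.", "art." };

// Merge any segment whose last word is a known abbreviation with the
// following segment, undoing a false sentence break.
List<string> MergeAbbreviations(IReadOnlyList<string> parts)
{
    var merged = new List<string>();
    string pending = "";
    foreach (var seg in parts)
    {
        pending += seg;
        string lastWord = pending.TrimEnd().Split(' ').Last();
        if (!abbreviations.Contains(lastWord))
        {
            merged.Add(pending);
            pending = "";
        }
    }
    if (pending.Length > 0) merged.Add(pending);   // trailing unmerged text
    return merged;
}

// Hypothetical segmenter output that wrongly breaks after "Dr." and "art."
var segments = new[] { "I met Dr. ", "Smith today. ", "See art. ", "12 of the code." };
foreach (var s in MergeAbbreviations(segments))
    Console.WriteLine($"[{s}]");
```

The four raw segments collapse into two sentences: "I met Dr. Smith today. " and "See art. 12 of the code.".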
License
MIT (project code). Snowball stemmers are BSD-style; ICU4N follows ICU/Unicode licenses. Review their licenses if redistributing.
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
| .NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
| .NET Standard | netstandard2.0 is compatible. netstandard2.1 was computed. |
| .NET Framework | net461 was computed. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
| MonoAndroid | monoandroid was computed. |
| MonoMac | monomac was computed. |
| MonoTouch | monotouch was computed. |
| Tizen | tizen40 was computed. tizen60 was computed. |
| Xamarin.iOS | xamarinios was computed. |
| Xamarin.Mac | xamarinmac was computed. |
| Xamarin.TVOS | xamarintvos was computed. |
| Xamarin.WatchOS | xamarinwatchos was computed. |
Dependencies
- .NETStandard 2.0
  - No dependencies.
- net8.0
  - No dependencies.
NuGet packages (1)
Showing the top 1 NuGet packages that depend on LexiDiff.Snowball:
| Package | Downloads |
|---|---|
| LexiDiff: Token-aware diff engine with Snowball stemming, ICU tokenization, and unified diff rendering. | |
GitHub repositories
This package is not used by any popular GitHub repositories.