Soenneker.Deduplication.Bounded
4.0.21
A thread-safe, high-performance, bounded-size deduplication utility for .NET.
Installation
dotnet add package Soenneker.Deduplication.Bounded
What it does
Soenneker.Deduplication.Bounded provides a fast, thread-safe “seen set” for deduplication with a maximum size.
You call TryMarkSeen(...) with an input value:
- Returns true if this value has not been seen before (it was added)
- Returns false if it has already been seen (already exists)
Internally it hashes your input to a ulong using XXH3 (XxHash3) and stores only the hash in a bounded concurrent set. That means it’s very memory efficient and avoids storing original strings/byte arrays.
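To make the hash-only storage idea concrete, here is a minimal sketch. It is not the library's implementation: `HashOnlySeenSet` is a hypothetical type, it uses `XxHash3` from Microsoft's System.IO.Hashing package rather than the Soenneker.Hashing.XxHash dependency, and it leaves out the size bound entirely.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO.Hashing;               // NuGet: System.IO.Hashing (assumption, not this package's dependency)
using System.Runtime.InteropServices;

// Hypothetical illustration type; NOT part of Soenneker.Deduplication.Bounded.
public sealed class HashOnlySeenSet
{
    // Only the 64-bit hash is kept; the original input is never stored.
    private readonly ConcurrentDictionary<ulong, byte> _seen = new();
    private readonly long _seed;

    public HashOnlySeenSet(long seed = 0) => _seed = seed;

    public int Count => _seen.Count;

    public bool TryMarkSeen(ReadOnlySpan<char> value)
    {
        // Reinterpret the chars as raw bytes and hash them without allocating.
        ulong hash = XxHash3.HashToUInt64(MemoryMarshal.AsBytes(value), _seed);

        // TryAdd returns true only for the caller that actually inserted the hash.
        return _seen.TryAdd(hash, 0);
    }
}
```

Each entry costs a fixed 8-byte key plus the container's overhead, regardless of how long the original input was.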
Key characteristics
- Bounded size: targets MaxSize and opportunistically trims under contention (best-effort, not strict)
- Thread-safe: safe to use concurrently from many threads
- High-throughput: stores ulong hashes instead of strings
- Span-friendly: avoids allocations via ReadOnlySpan<char> and ReadOnlySpan<byte>
- Optional hashing seed: lets you rotate/partition hash space if desired
- Diagnostics-friendly: exposes an approximate Count
Quick start
using Soenneker.Deduplication.Bounded;

var dedupe = new BoundedDedupe(maxSize: 250_000);

// returns true the first time
if (dedupe.TryMarkSeen("user:123"))
{
    // process first occurrence
}

// returns false on repeats
if (!dedupe.TryMarkSeen("user:123"))
{
    // duplicate
}
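Because the set is described as thread-safe, the same instance can be shared across workers. The sketch below uses only the constructor and TryMarkSeen overload shown above; the worker count and the `user:{i}` keys are made up for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Soenneker.Deduplication.Bounded;

var dedupe = new BoundedDedupe(maxSize: 250_000);
int firstSeen = 0;

// Eight workers all submit the same 1,000 IDs against one shared instance.
Parallel.For(0, 8, _ =>
{
    for (int i = 0; i < 1_000; i++)
    {
        // Only the worker that actually adds the ID should get true.
        if (dedupe.TryMarkSeen($"user:{i}"))
            Interlocked.Increment(ref firstSeen);
    }
});

Console.WriteLine($"first occurrences: {firstSeen}");
```

If check-and-add behaves atomically as described, the counter should land on 1,000, since the workload stays well under MaxSize and no trimming is expected.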
API
TryMarkSeen
Use these for the fast “check + add” operation.
bool added = dedupe.TryMarkSeen("some string");
bool added2 = dedupe.TryMarkSeen("some string".AsSpan());
bool added3 = dedupe.TryMarkSeenUtf8(utf8Bytes);
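A short sketch of where the span and UTF-8 overloads help: the key is sliced out of a larger record without allocating a substring. The record layout is hypothetical, and `utf8Bytes` is assumed to be a `byte[]`, which converts implicitly if the parameter is a `ReadOnlySpan<byte>`.

```csharp
using System;
using System.Text;
using Soenneker.Deduplication.Bounded;

var dedupe = new BoundedDedupe(maxSize: 250_000);

// Hypothetical record layout: id,type,date
string line = "evt-0001,click,2026-03-15";

// Slice the ID out of the line; no substring is allocated.
ReadOnlySpan<char> id = line.AsSpan(0, line.IndexOf(','));
bool firstTime = dedupe.TryMarkSeen(id);

// Keys that already arrive as UTF-8 (for example from a socket buffer)
// can be passed as bytes without decoding to a string first.
byte[] utf8Bytes = Encoding.UTF8.GetBytes("evt-0002");
bool firstTimeUtf8 = dedupe.TryMarkSeenUtf8(utf8Bytes);

Console.WriteLine($"{firstTime} {firstTimeUtf8}");
```

The text above does not say whether the char-based and UTF-8 overloads hash the same logical key to the same value, so it is safest to use one form consistently for a given key space.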
Contains
Pure membership checks (no mutation).
bool exists = dedupe.Contains("some string");
bool exists2 = dedupe.Contains("some string".AsSpan());
bool exists3 = dedupe.ContainsUtf8(utf8Bytes);
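Contains is useful for read-only inspection, but a separate check followed by an add is not atomic. The sketch below contrasts the two patterns; the `order:` keys are illustrative.

```csharp
using System;
using Soenneker.Deduplication.Bounded;

var dedupe = new BoundedDedupe(maxSize: 250_000);

// Racy under concurrency: another thread can mark the key between
// the Contains call and the TryMarkSeen call, so both threads proceed.
if (!dedupe.Contains("order:41"))
{
    dedupe.TryMarkSeen("order:41");
    Console.WriteLine("order:41 handled (check-then-add)");
}

// Preferred: one call both checks and adds.
if (dedupe.TryMarkSeen("order:42"))
{
    Console.WriteLine("order:42 handled (single call)");
}

// Contains remains handy for diagnostics that must not mutate the set.
bool known = dedupe.Contains("order:42");
Console.WriteLine($"order:42 known: {known}");
```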
TryRemove
Removes an entry if present.
bool removed = dedupe.TryRemove("some string");
bool removed2 = dedupe.TryRemove("some string".AsSpan());
bool removed3 = dedupe.TryRemoveUtf8(utf8Bytes);
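One possible use of TryRemove is rolling back a mark when processing fails, so that a retry is not suppressed as a duplicate. This is a sketch of that pattern; the `Handle` helper and message IDs are hypothetical.

```csharp
using System;
using Soenneker.Deduplication.Bounded;

var dedupe = new BoundedDedupe(maxSize: 250_000);

// Hypothetical helper: process a message at most once, but allow a retry on failure.
void Handle(string messageId, Action process)
{
    if (!dedupe.TryMarkSeen(messageId))
        return; // duplicate, skip

    try
    {
        process();
    }
    catch
    {
        // Undo the mark so a later retry of the same ID is not dropped.
        dedupe.TryRemove(messageId);
        throw;
    }
}

try
{
    Handle("msg-1", () => throw new InvalidOperationException("transient failure"));
}
catch (InvalidOperationException)
{
    // swallowed for the demo
}

// The retry goes through because the failed attempt removed its mark.
bool retried = false;
Handle("msg-1", () => retried = true);
Console.WriteLine($"retried: {retried}");
```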
Properties
int max = dedupe.MaxSize;
int approx = dedupe.Count; // approximate; good for diagnostics/telemetry
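As a small illustration of the telemetry use case, the two properties can be combined into a fill ratio. Since Count is approximate and the bound is best-effort, the ratio may briefly read slightly above 100%; the sizes below are arbitrary.

```csharp
using System;
using Soenneker.Deduplication.Bounded;

var dedupe = new BoundedDedupe(maxSize: 1_000);

for (int i = 0; i < 500; i++)
    dedupe.TryMarkSeen($"id:{i}");

// Approximate fill level, suitable for a gauge metric or a log line.
double fill = (double)dedupe.Count / dedupe.MaxSize;
Console.WriteLine($"dedupe fill: {fill:P0} ({dedupe.Count}/{dedupe.MaxSize})");
```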
Configuration
var dedupe = new BoundedDedupe(
    maxSize: 250_000,
    capacityHint: 300_000,        // optional, reduces resizing
    seed: 0,                      // optional XXH3 seed
    trimBatchSize: 64,            // work chunk size when trimming
    trimStartOveragePercent: 5,   // begin trimming after +5% over MaxSize
    maxTrimWorkPerCall: 4096,     // caps trimming effort per write
    resyncAfterNoProgress: 8,     // resync count if trimming stalls
    queueOverageFactor: 4         // internal queue sizing multiplier
);
Notes on trimming / bounded behavior
This is not a strict LRU and does not guarantee exact eviction order. Under heavy contention the set may temporarily exceed MaxSize; it is then trimmed opportunistically during subsequent writes.
This design is intentional: it favors throughput and low contention over perfect eviction accuracy.
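The trade-off can be illustrated with a conceptual sketch of best-effort trimming. This is not the library's implementation: `BestEffortBoundedSet` is a hypothetical type that borrows the parameter names from the configuration above and assumes that trimming starts once the count exceeds MaxSize by the given percentage, with a cap on work per write.

```csharp
using System.Collections.Concurrent;

// Conceptual sketch only; NOT how Soenneker.Deduplication.Bounded is implemented.
public sealed class BestEffortBoundedSet
{
    private readonly ConcurrentDictionary<ulong, byte> _set = new();
    private readonly ConcurrentQueue<ulong> _order = new(); // rough insertion order
    private readonly int _maxSize;
    private readonly int _trimThreshold;
    private readonly int _maxTrimWorkPerCall;

    public BestEffortBoundedSet(int maxSize, int trimStartOveragePercent = 5, int maxTrimWorkPerCall = 4096)
    {
        _maxSize = maxSize;
        // e.g. maxSize 250,000 with 5% overage gives a threshold of 262,500
        _trimThreshold = maxSize + (int)((long)maxSize * trimStartOveragePercent / 100);
        _maxTrimWorkPerCall = maxTrimWorkPerCall;
    }

    public int Count => _set.Count;

    public bool TryAdd(ulong hash)
    {
        if (!_set.TryAdd(hash, 0))
            return false; // already present

        _order.Enqueue(hash);

        // Writers trim only after the overage threshold is crossed,
        // so the set may sit above maxSize for a while.
        if (_set.Count > _trimThreshold)
            Trim();

        return true;
    }

    private void Trim()
    {
        int work = 0;

        // Evict roughly-oldest entries, but never more than the per-call cap.
        while (_set.Count > _maxSize
               && work++ < _maxTrimWorkPerCall
               && _order.TryDequeue(out ulong oldest))
        {
            _set.TryRemove(oldest, out _);
        }
    }
}
```

No global lock orders insertions and evictions, which is why eviction order is only approximate and the size bound is a target, not a hard limit.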
Hashing & collisions
Inputs are deduped by their 64-bit XXH3 hash (ulong). Like all hashing-based dedupe approaches, there is a theoretical possibility of collisions (different inputs producing the same hash). For most dedupe/telemetry/rate-limit style workloads, a 64-bit hash is typically more than sufficient.
If collision risk is unacceptable for your use case, you should store full keys (or use a stronger scheme), at higher memory cost.
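For a rough sense of scale, the standard birthday approximation can be applied, assuming XXH3 outputs behave like uniformly random 64-bit values and inputs are not adversarially chosen. With $n$ distinct keys held at once:

```latex
P(\text{any collision among } n \text{ keys})
  \;\approx\; 1 - e^{-n(n-1)/2^{65}}
  \;\approx\; \frac{n^{2}}{2^{65}}

% Example with n = 250{,}000 (the MaxSize used in the examples above):
\frac{(2.5 \times 10^{5})^{2}}{2^{65}}
  \;=\; \frac{6.25 \times 10^{10}}{3.69 \times 10^{19}}
  \;\approx\; 1.7 \times 10^{-9}

% Chance that one new key is falsely reported as a duplicate of a stored one:
P(\text{false duplicate per lookup}) \;\approx\; \frac{n}{2^{64}}
  \;\approx\; 1.4 \times 10^{-14}
```

These figures cover accidental collisions only; an input source that can deliberately craft colliding keys is a different threat model.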
When to use this
- Deduping inbound events/messages by ID for a fixed memory budget
- “Seen recently” protection in high-volume ingestion pipelines
- De-duplicating phone numbers / emails / identifiers without storing raw values
- Fast in-memory suppression lists
When not to use this
- You need exact dedupe of raw strings (no collision tolerance)
- You need strict FIFO/LRU eviction ordering guarantees
- You need time-window expiration semantics (use a sliding window approach instead)
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net10.0
  - Soenneker.Hashing.XxHash (>= 4.0.42)
  - Soenneker.Sets.Concurrent.Bounded (>= 4.0.15)
NuGet packages (1)
Showing the top 1 NuGet packages that depend on Soenneker.Deduplication.Bounded:
| Package | Downloads |
|---|---|
| Soenneker.Deduplication.Bounded.Registry: A keyed registry of bounded dedupe instances. | |
GitHub repositories
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 4.0.21 | 38 | 3/15/2026 |
| 4.0.20 | 48 | 3/14/2026 |
| 4.0.19 | 40 | 3/13/2026 |
| 4.0.18 | 43 | 3/13/2026 |
| 4.0.17 | 33 | 3/13/2026 |
| 4.0.16 | 45 | 3/13/2026 |
| 4.0.15 | 59 | 3/12/2026 |
| 4.0.14 | 70 | 3/12/2026 |
| 4.0.13 | 98 | 3/11/2026 |
| 4.0.12 | 71 | 3/11/2026 |
| 4.0.11 | 48 | 3/11/2026 |
| 4.0.10 | 37 | 3/11/2026 |
| 4.0.9 | 63 | 3/10/2026 |
| 4.0.8 | 34 | 3/10/2026 |
| 4.0.7 | 109 | 3/10/2026 |
| 4.0.6 | 109 | 3/10/2026 |
| 4.0.5 | 76 | 3/9/2026 |
| 4.0.4 | 76 | 3/9/2026 |
| 4.0.3 | 80 | 3/9/2026 |
| 4.0.2 | 141 | 3/7/2026 |