Tokenizers.HuggingFace
1.0.0
dotnet add package Tokenizers.HuggingFace --version 1.0.0
NuGet\Install-Package Tokenizers.HuggingFace -Version 1.0.0
<PackageReference Include="Tokenizers.HuggingFace" Version="1.0.0" />
<PackageVersion Include="Tokenizers.HuggingFace" Version="1.0.0" />
<PackageReference Include="Tokenizers.HuggingFace" />
paket add Tokenizers.HuggingFace --version 1.0.0
#r "nuget: Tokenizers.HuggingFace, 1.0.0"
#:package Tokenizers.HuggingFace@1.0.0
#addin nuget:?package=Tokenizers.HuggingFace&version=1.0.0
#tool nuget:?package=Tokenizers.HuggingFace&version=1.0.0
Tokenizers.HuggingFace
.NET bindings for huggingface/tokenizers, using protobufs and a C ABI for communication.
How to install
dotnet add package Tokenizers.HuggingFace
Supported targets
- win-x64
- linux-x64
- osx-x64
- osx-arm64
- win-arm64
- linux-arm64
Usage
Classes:
- Normalization
- PreTokenization
- Tokenizer (Encode, Decode, Load From File, Train)
Examples
Sentence Similarity with sentence-transformers/all-MiniLM-L6-v2
Steps:
- Create Console app
dotnet new console --name Sentences
- Download onnx/model.onnx.
- Download tokenizer.json. Optional: remove the padding and truncation settings.
- Install onnxruntime and Tokenizers.HuggingFace
dotnet add package Microsoft.ML.OnnxRuntime
dotnet add package Tokenizers.HuggingFace
- Add the following code to Program.cs
using System.Numerics;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;
using Tokenizers.HuggingFace.Tokenizer;

var a = SentenceSimilarityModel.GetEmbeddings("Hello, world");
var b = SentenceSimilarityModel.GetEmbeddings("Hello, world, good to be here");
Console.WriteLine($"E: {string.Join(',', a)}");
Console.WriteLine($"a-b: {SentenceSimilarityModel.CosineSimilarity(a, b)}");

static class SentenceSimilarityModel
{
    // Embedding width of all-MiniLM-L6-v2.
    const int EmbeddingSize = 384;

    static readonly Tokenizer tk = Tokenizer.FromFile("./tokenizer.json");
    static readonly InferenceSession session = new InferenceSession("./model.onnx");

    // Tokenizes the text and wraps ids, type ids and attention mask as [1, sequenceLength] tensors.
    static (int, NamedOnnxValue[]) PrepareInputs(string text)
    {
        var encodings = tk.Encode(text, true, include_type_ids: true, include_attention_mask: true).Encodings[0];
        var sequenceLength = encodings.Ids.Count;
        var input_ids = new DenseTensor<long>(encodings.Ids.Select(t => (long)t).ToArray(), [1, sequenceLength]);
        var type_ids = new DenseTensor<long>(encodings.TypeIds.Select(t => (long)t).ToArray(), [1, sequenceLength]);
        var attention_mask = new DenseTensor<long>(encodings.AttentionMask.Select(t => (long)t).ToArray(), [1, sequenceLength]);
        return (sequenceLength, [
            NamedOnnxValue.CreateFromTensor("input_ids", input_ids),
            NamedOnnxValue.CreateFromTensor("token_type_ids", type_ids),
            NamedOnnxValue.CreateFromTensor("attention_mask", attention_mask)
        ]);
    }

    // Runs the model and mean-pools the per-token outputs into one sentence embedding.
    static public float[] GetEmbeddings(string text)
    {
        var (sequenceLength, inputs) = PrepareInputs(text);
        using IDisposableReadOnlyCollection<DisposableNamedOnnxValue> results = session.Run(inputs);
        // Flattened [1, sequenceLength, EmbeddingSize] output.
        var outputTensor = results.First().AsEnumerable<float>().ToArray();
        int subVector = EmbeddingSize / Vector<float>.Count;
        float[] data = new float[EmbeddingSize];
        // Sum the token vectors, one SIMD chunk at a time.
        for (int i = 0; i < sequenceLength; i++)
        {
            for (int j = 0; j < subVector; j++)
            {
                Vector<float> result = new(data, j * Vector<float>.Count);
                result += new Vector<float>(outputTensor, i * EmbeddingSize + j * Vector<float>.Count);
                result.CopyTo(data, j * Vector<float>.Count);
            }
        }
        // Divide by the token count to get the mean.
        for (int i = 0; i < subVector; i++)
        {
            Vector<float> result = new Vector<float>(data, i * Vector<float>.Count) / sequenceLength;
            result.CopyTo(data, i * Vector<float>.Count);
        }
        return data;
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    static public double CosineSimilarity(float[] a, float[] b)
    {
        int subVector = EmbeddingSize / Vector<float>.Count;
        double ab = 0, aa = 0, bb = 0;
        for (int i = 0; i < subVector; i++)
        {
            Vector<float> vecA = new(a, i * Vector<float>.Count);
            Vector<float> vecB = new(b, i * Vector<float>.Count);
            ab += Vector.Dot(vecA, vecB);
            aa += Vector.Dot(vecA, vecA);
            bb += Vector.Dot(vecB, vecB);
        }
        // aa and bb are squared norms, so take the square root of their product.
        return ab / Math.Sqrt(aa * bb);
    }
}
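The pooling and similarity steps in the example above can be cross-checked against a plain scalar version. The following is a minimal sketch, not part of Tokenizers.HuggingFace: the `EmbeddingMath` class and its method names are hypothetical, and it assumes the model output is a flat, row-major array of `tokens * dim` floats. It depends only on the base class library.

```csharp
using System;

// Hypothetical helper, not part of the package: scalar reference versions
// of mean pooling and cosine similarity for checking a SIMD implementation.
static class EmbeddingMath
{
    // Averages `tokens` rows of width `dim` stored row-major in `flat`.
    public static float[] MeanPool(float[] flat, int tokens, int dim)
    {
        if (tokens <= 0 || dim <= 0 || flat.Length < tokens * dim)
            throw new ArgumentException("flat must hold at least tokens * dim values.");

        var pooled = new float[dim];
        for (int t = 0; t < tokens; t++)
            for (int d = 0; d < dim; d++)
                pooled[d] += flat[t * dim + d];

        for (int d = 0; d < dim; d++)
            pooled[d] /= tokens;

        return pooled;
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double ab = 0, aa = 0, bb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            ab += (double)a[i] * b[i];
            aa += (double)a[i] * a[i];
            bb += (double)b[i] * b[i];
        }
        // aa and bb are squared norms, hence the square root.
        return ab / Math.Sqrt(aa * bb);
    }
}
```

A quick sanity check is that parallel vectors of different lengths, such as (1, 0) and (2, 0), score 1, and orthogonal vectors, such as (1, 0) and (0, 3), score 0. If the class is added to the console project, keeping it in its own file avoids mixing its `using` directive with the top-level statements in Program.cs.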
Releasing
If you know the target you are building your project for (one of the supported targets listed above, such as linux-x64), use:
dotnet build .\YourProject.csproj -c Release -r [target]
This way you avoid including the native libraries for every supported target.
| Product | Versions (compatible and additional computed target framework versions) |
|---|---|
| .NET | net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net6.0
  - Google.Protobuf (>= 3.30.2)
NuGet packages (3)
Showing the top 3 NuGet packages that depend on Tokenizers.HuggingFace:
| Package | Downloads |
|---|---|
| AllMpnetBaseV2Sharp: C# implementation of sentence-transformers/all-mpnet-base-v2 using ONNX Runtime and HuggingFace tokenizers. | |
| AdaptiveClassifier.NET: A .NET implementation of adaptive text classification concepts inspired by the Adaptive Classifier research team. Provides production-ready inference with dual prediction mechanisms. | |
| SentenceTransformers.Qwen3: The wrapper provides a simple and easy-to-use interface for loading the Qwen3 embeddings model and generating embeddings for input text. | |
GitHub repositories (1)
Showing the top 1 popular GitHub repositories that depend on Tokenizers.HuggingFace:
| Repository | Stars |
|---|---|
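| axzxs2001/Asp.NetCoreExperiment: All previous projects have been moved to the **OleVersion** directory for preservation. New examples will mainly target .NET 5.0: some upgrade earlier examples, and others distill past work experience for everyone's reference. | |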
| Version | Downloads | Last Updated |
|---|---|---|
| 2.21.4 | 5,731 | 9/4/2025 |
| 2.21.4-rc.0 | 229 | 8/30/2025 |
| 1.21.4 | 613 | 8/10/2025 |
| 1.21.4-rc.1 | 168 | 8/17/2025 |
| 1.0.1-experimental.1 | 225 | 6/26/2025 |
| 1.0.0 | 1,119 | 5/10/2025 |
| 0.1.0 | 277 | 5/8/2025 |