TiktokenSharp 1.0.8

There is a newer version of this package available.
See the version list below for details.

dotnet add package TiktokenSharp --version 1.0.8

NuGet\Install-Package TiktokenSharp -Version 1.0.8

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="TiktokenSharp" Version="1.0.8" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

paket add TiktokenSharp --version 1.0.8

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: TiktokenSharp, 1.0.8"

#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

// Install TiktokenSharp as a Cake Addin
#addin nuget:?package=TiktokenSharp&version=1.0.8

// Install TiktokenSharp as a Cake Tool
#tool nuget:?package=TiktokenSharp&version=1.0.8

TiktokenSharp

Due to the lack of a C# version of cl100k_base encoding (gpt-3.5-turbo), I have implemented a basic solution with encoding and decoding methods based on the official Rust implementation.

Currently, cl100k_base and p50k_base have been implemented. Other encodings will be added in future releases. If you encounter any issues or have questions, please feel free to submit them on the Issues page.

If you want to use the ChatGPT C# library that integrates this repository and implements context-based conversation, please visit ChatGPTSharp.

Getting Started

TiktokenSharp is available as a NuGet package.

using TiktokenSharp;

// Use a model name
TikToken tikToken = TikToken.EncodingForModel("gpt-3.5-turbo");
var i = tikToken.Encode("hello world"); // [15339, 1917]
var d = tikToken.Decode(i); // hello world

// Use an encoding name (variables renamed so both examples compile in one scope)
TikToken tikToken2 = TikToken.GetEncoding("cl100k_base");
var i2 = tikToken2.Encode("hello world"); // [15339, 1917]
var d2 = tikToken2.Decode(i2); // hello world

When using a new encoder for the first time, the required tiktoken files for the encoder will be downloaded from the internet. This may take some time. Once the download is successful, subsequent uses will not require downloading again. You can set TikToken.PBEFileDirectory before using the encoder to modify the storage path of the downloaded tiktoken files, or you can pre-download the files to avoid network issues causing download failures.

Why are the tiktoken files not integrated into the package? On one hand, this would make the package size larger. On the other hand, I want to stay as consistent as possible with OpenAI's official Python code.

If you are deploying to a cloud environment, such as Azure App Service, where the application cannot read from or write to the local file system freely, please package the tiktoken files (the PBEFileDirectory folder) with your published files.
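
As a minimal sketch of the setup described above, the snippet below points TikToken.PBEFileDirectory at a folder shipped with the application before any encoder is requested. The folder name "bpe" and its location under the application's base directory are assumptions made for illustration; only TikToken.PBEFileDirectory, TikToken.GetEncoding, Encode, and Decode come from this README.

```csharp
using System;
using System.IO;
using TiktokenSharp;

// Assumption: the .tiktoken files were copied into a "bpe" folder that is
// published alongside the application. The folder name is illustrative only.
string bpeDirectory = Path.Combine(AppContext.BaseDirectory, "bpe");

// Set the directory before requesting any encoder, so that the library
// looks for (or stores) the tiktoken files in this location.
TikToken.PBEFileDirectory = bpeDirectory;

// Request the encoder only after the directory has been configured.
TikToken tikToken = TikToken.GetEncoding("cl100k_base");
var tokens = tikToken.Encode("hello world");
Console.WriteLine(tikToken.Decode(tokens));
```

If the files are already present in that folder, no download should be needed at startup, which is the point of pre-packaging them for restricted hosting environments.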

Below are the file download links: p50k_base.tiktoken cl100k_base.tiktoken

Efficiency Comparison

Some users have asked for an efficiency comparison. Here, I use SharpToken as the baseline, with the cl100k_base encoder, on .NET 6.0 in Debug mode.

TiktokenSharp Version: 1.0.5
SharpToken Version: 1.0.28

CPU

using System.Diagnostics;
using SharpToken;
using TiktokenSharp;

const string kLongText = "King Lear, one of Shakespeare's darkest and most savage plays, tells the story of the foolish and Job-like Lear, who divides his kingdom, as he does his affections, according to vanity and whim. Lear’s failure as a father engulfs himself and his world in turmoil and tragedy.";

static async Task SpeedTiktokenSharp()
{
    TikToken tikToken = TikToken.GetEncoding("cl100k_base");
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();

    for (int i = 0; i < 10000; i++) 
    {
        var encoded = tikToken.Encode(kLongText);
        var decoded = tikToken.Decode(encoded);
    }

    stopwatch.Stop();
    TimeSpan timespan = stopwatch.Elapsed;
    double milliseconds = timespan.TotalMilliseconds;
    Console.WriteLine($"SpeedTiktokenSharp = {milliseconds} ms");
}

static async Task SpeedSharpToken()
{
    var encoding = GptEncoding.GetEncoding("cl100k_base");

    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();   

    for (int i = 0; i < 10000; i++) 
    {
        var encoded = encoding.Encode(kLongText);
        var decoded = encoding.Decode(encoded);
    }

    stopwatch.Stop();
    TimeSpan timespan = stopwatch.Elapsed;
    double milliseconds = timespan.TotalMilliseconds;
    Console.WriteLine($"SpeedSharpToken = {milliseconds} ms");

}

In this test, TiktokenSharp takes approximately 57% less time than SharpToken (roughly 2.3 times as fast).

SpeedTiktokenSharp = 570.1206 ms
SpeedSharpToken = 1312.8812 ms

Memory

[Images: memory usage screenshots]

TiktokenSharp uses approximately 26% less memory than SharpToken.

Update

1.0.7 20231010

Corrected the issue where some new models could not properly obtain the encoder.

1.0.6 20230625

Replace WebClient with HttpClient, add async methods.

1.0.5 20230508

Added support for .NET Standard 2.0, making TiktokenSharp usable in .NET Framework projects.

1.0.4 20230424

Add method TikToken.GetEncoding(encodingName).

1.0.3 20230321

GetEncodingSetting now supports the gpt-4 model and also allows encoding names to be passed in directly.

1.0.2 20230317

Added TikToken.PBEFileDirectory to allow a custom storage directory for BPE files. The path needs to be set before calling TikToken.EncodingForModel().

1.0.1 20230313

Added the p50k_base encoding, which supports the text-davinci-003 model.

Product	Compatible and additional computed target framework versions.
.NET	net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed.
.NET Core	netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed.
.NET Standard	netstandard2.0 is compatible. netstandard2.1 is compatible.
.NET Framework	net461 was computed. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed.
MonoAndroid	monoandroid was computed.
MonoMac	monomac was computed.
MonoTouch	monotouch was computed.
Tizen	tizen40 was computed. tizen60 was computed.
Xamarin.iOS	xamarinios was computed.
Xamarin.Mac	xamarinmac was computed.
Xamarin.TVOS	xamarintvos was computed.
Xamarin.WatchOS	xamarinwatchos was computed.

.NETStandard 2.0
- IndexRange (>= 1.0.2)
.NETStandard 2.1
- No dependencies.

NuGet packages (1)

Showing the top 1 NuGet packages that depend on TiktokenSharp:

Package	Downloads
ChatGPTSharp Supports GPT-4V, GPT-3.5 models; auto-calculates request tokens; enables continuous dialogues with conversation IDs; now includes Vision model image sending.	5.6K

GitHub repositories (2)

Showing the top 2 popular GitHub repositories that depend on TiktokenSharp:

Repository	Stars
dmitry-brazhenko/SharpToken SharpToken is a C# library for tokenizing natural language text. It's based on the tiktoken Python library and designed to be fast and accurate.	205
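MayDay-wpf/AIBotPublic AIBot PRO is a .NET 6-based AI aggregation client (aimed mainly at consumers, with some business use) that integrates many AI products (ChatGPT, Gemini, Claude, ERNIE Bot, Tongyi Qianwen, iFlytek Spark), switches between conversations seamlessly, and supports knowledge bases, plugin development, an AI workflow engine, and an open platform that exposes customized AI APIs.	184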

Version	Downloads	Last updated
1.1.5	1,107	10/8/2024
1.1.4	25,588	5/14/2024
1.1.2	433	5/14/2024
1.1.1	101	5/14/2024
1.1.0	6,102	4/8/2024
1.0.9	12,346	2/8/2024
1.0.8	13,426	12/27/2023
1.0.7	23,327	10/10/2023
1.0.6	60,914	6/25/2023
1.0.5	50,753	5/8/2023
1.0.4	1,902	4/24/2023
1.0.3	2,254	3/21/2023
1.0.2	599	3/17/2023
1.0.1	1,404	3/13/2023
1.0.0	607	3/7/2023