ThaiNLP.NET 0.1.3

.NET CLI:

dotnet add package ThaiNLP.NET --version 0.1.3

Package Manager Console in Visual Studio (uses the NuGet module's version of Install-Package):

NuGet\Install-Package ThaiNLP.NET -Version 0.1.3

PackageReference (copy this XML node into the project file of any project that supports PackageReference):

<PackageReference Include="ThaiNLP.NET" Version="0.1.3" />

Central Package Management (CPM): copy this XML node into the solution's Directory.Packages.props file to version the package:

<PackageVersion Include="ThaiNLP.NET" Version="0.1.3" />

Then reference the package without a version in the project file:

<PackageReference Include="ThaiNLP.NET" />

Paket:

paket add ThaiNLP.NET --version 0.1.3

F# Interactive and Polyglot Notebooks (copy the #r directive into the interactive tool or the script source):

#r "nuget: ThaiNLP.NET, 0.1.3"

C# file-based apps, .NET 10 preview 4 and later (place the #:package directive in the .cs file before any lines of code):

#:package ThaiNLP.NET@0.1.3

Install as a Cake Addin:

#addin nuget:?package=ThaiNLP.NET&version=0.1.3

Install as a Cake Tool:

#tool nuget:?package=ThaiNLP.NET&version=0.1.3

thainlp.net

Thai NLP in .NET

Features

Word Tokenization

  • newmm - Dictionary-based maximal matching word segmentation constrained by Thai Character Cluster (TCC) boundaries
  • API similar to PyThaiNLP for easy migration from Python

Subword Tokenization

  • TCC (Thai Character Cluster) tokenization for breaking text into character clusters

Number to Thai Word Conversion

  • NumToThaiWord - Convert numbers to Thai text representation
  • BahtText - Convert numbers to Thai currency format (Baht and Satang)

Installation

dotnet add package ThaiNLP.NET

Or via the Package Manager Console in Visual Studio:

Install-Package ThaiNLP.NET

From Source

Build the project:

dotnet build

Usage

Word Tokenization (newmm)

Basic usage:

using Thainlp;

// Simple tokenization
var tokens = WordTokenizer.Tokenize("ประเทศไทยมีอากาศดี");
// Output: ["ประเทศ", "ไทย", "มี", "อากาศ", "ดี"]

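// With explicit engine and whitespace options
// (a distinct variable name avoids redeclaring `tokens` in the same scope)
var moreTokens = WordTokenizer.WordTokenize(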
    text: "โอเคบ่พวกเรารักภาษาบ้านเกิด",
    engine: "newmm",
    keepWhitespace: true
);
// Output: ["โอเค", "บ่", "พวกเรา", "รัก", "ภาษา", "บ้านเกิด"]

Custom Dictionary

using Thainlp;
using System.Collections.Generic;

// Create custom dictionary
var customWords = new List<string> { "ชินโซ", "อาเบะ" };
var customDict = new Trie(customWords);

// Use with tokenizer
var tokens = WordTokenizer.WordTokenize(
    "ชินโซ อาเบะ เกิด 21 กันยายน",
    customDict: customDict
);
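
In PyThaiNLP, newmm asks its trie for every dictionary word that begins at the current position. Assuming ThaiNLP.NET's `Trie` serves the same purpose, the hypothetical array-based trie below illustrates that prefix lookup. It is not the library's `Trie` class, and the helper names `Add` and `PrefixesAt` are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

// Array-based trie: node i has a child map (char -> node index) and an end-of-word flag.
var children = new List<Dictionary<char, int>> { new Dictionary<char, int>() };
var isWordEnd = new List<bool> { false };

void Add(string word)
{
    int node = 0; // start at the root
    foreach (char c in word)
    {
        if (!children[node].TryGetValue(c, out int next))
        {
            next = children.Count; // allocate a new node
            children.Add(new Dictionary<char, int>());
            isWordEnd.Add(false);
            children[node][c] = next;
        }
        node = next;
    }
    isWordEnd[node] = true;
}

// Returns every dictionary word that starts at text[start], shortest first.
List<string> PrefixesAt(string text, int start)
{
    var found = new List<string>();
    int node = 0;
    for (int i = start; i < text.Length; i++)
    {
        if (!children[node].TryGetValue(text[i], out node)) break; // no longer a prefix
        if (isWordEnd[node]) found.Add(text.Substring(start, i - start + 1));
    }
    return found;
}

foreach (var w in new[] { "ชินโซ", "อาเบะ", "ชิน" }) Add(w);
var matches = PrefixesAt("ชินโซ อาเบะ", 0);
Console.WriteLine(string.Join(", ", matches)); // ชิน, ชินโซ
```

A single walk down the trie yields all candidate words at one position. A tokenizer can then choose among them instead of testing every substring separately.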

TCC (Thai Character Cluster) Tokenization

using Thainlp;

// Tokenize into character clusters
var clusters = TCC.Segment("ประเทศไทย");
// Output: ["ป", "ระ", "เท", "ศ", "ไท", "ย"]

// Get cluster positions
var positions = TCC.GetPositions("ประเทศไทย");
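
The return value of `TCC.GetPositions` is not documented here. PyThaiNLP's corresponding `tcc_pos` returns the end position of each cluster. Assuming ThaiNLP.NET follows the same convention, the positions are a running sum of cluster lengths. These are the only offsets where newmm may place a word boundary. The `EndPositions` helper is hypothetical, and it uses the segmentation ["ป", "ระ", "เท", "ศ", "ไท", "ย"] as sample input.

```csharp
using System;
using System.Collections.Generic;

// Running sum of cluster lengths gives the index just past each cluster.
static List<int> EndPositions(IEnumerable<string> clusters)
{
    var ends = new List<int>();
    int offset = 0;
    foreach (var cluster in clusters)
    {
        offset += cluster.Length; // UTF-16 code units; Thai letters are single units
        ends.Add(offset);
    }
    return ends;
}

var sampleClusters = new[] { "ป", "ระ", "เท", "ศ", "ไท", "ย" };
var endPositions = EndPositions(sampleClusters);
Console.WriteLine(string.Join(", ", endPositions)); // 1, 3, 5, 6, 8, 9
```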

Legacy Subword API

using Thainlp;

// Original TCC implementation
var clusters = Subword.tcc("ประเทศไทย");
var positions = Subword.tcc_pos("ประเทศไทย");

Number to Thai Word Conversion

using Thainlp;

// Convert number to Thai words
string text = NumToWord.NumToThaiWord(112);
// Output: หนึ่งร้อยสิบสอง

string negative = NumToWord.NumToThaiWord(-273);
// Output: ลบสองร้อยเจ็ดสิบสาม

// Convert to Thai Baht currency format
string baht = NumToWord.BahtText(5611116.50);
// Output: ห้าล้านหกแสนหนึ่งหมื่นหนึ่งพันหนึ่งร้อยสิบหกบาทห้าสิบสตางค์

string simple = NumToWord.BahtText(116);
// Output: หนึ่งร้อยสิบหกบาทถ้วน
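
The conversion follows regular Thai numeral reading. Each group of up to six digits uses the place words สิบ, ร้อย, พัน, หมื่น and แสน, and groups are joined with ล้าน. A 1 in the tens place reads สิบ and a 2 reads ยี่สิบ. A 1 in the ones place reads เอ็ด after a non-zero tens digit. The sketch below is a hypothetical C# re-implementation (`ToThaiWords`, `ToBahtText`), not the library's code. It uses `decimal` for money to avoid floating-point rounding, rounds to whole satang, and does not handle negative amounts in `ToBahtText`. Conventions differ for numbers such as 101, which this sketch reads as หนึ่งร้อยหนึ่ง.

```csharp
using System;
using System.Text;

string[] digitWords = { "ศูนย์", "หนึ่ง", "สอง", "สาม", "สี่", "ห้า", "หก", "เจ็ด", "แปด", "เก้า" };
string[] placeWords = { "", "สิบ", "ร้อย", "พัน", "หมื่น", "แสน" };

// Reads one group of 0..999,999; returns "" for 0.
string ReadGroup(long group)
{
    var sb = new StringBuilder();
    string digits = group.ToString();
    for (int i = 0; i < digits.Length; i++)
    {
        int d = digits[i] - '0';
        int place = digits.Length - 1 - i; // 0 = ones, 1 = tens, ...
        if (d == 0) continue;
        if (place == 1 && d == 1) sb.Append("สิบ");
        else if (place == 1 && d == 2) sb.Append("ยี่สิบ");
        else if (place == 0 && d == 1 && digits.Length > 1 && digits[i - 1] != '0') sb.Append("เอ็ด");
        else sb.Append(digitWords[d]).Append(placeWords[place]);
    }
    return sb.ToString();
}

string ToThaiWords(long number)
{
    if (number == long.MinValue) throw new ArgumentOutOfRangeException(nameof(number));
    if (number == 0) return digitWords[0];
    if (number < 0) return "ลบ" + ToThaiWords(-number);
    long millions = number / 1_000_000;
    long rest = number % 1_000_000;
    string head = millions > 0 ? ToThaiWords(millions) + "ล้าน" : "";
    return head + ReadGroup(rest);
}

string ToBahtText(decimal amount)
{
    if (amount < 0) throw new ArgumentOutOfRangeException(nameof(amount), "negative amounts are not handled in this sketch");
    amount = Math.Round(amount, 2, MidpointRounding.AwayFromZero);
    long baht = (long)Math.Truncate(amount);
    int satang = (int)((amount - baht) * 100);
    if (satang == 0) return ToThaiWords(baht) + "บาทถ้วน"; // whole baht
    string bahtPart = baht > 0 ? ToThaiWords(baht) + "บาท" : "";
    return bahtPart + ToThaiWords(satang) + "สตางค์";
}

Console.WriteLine(ToThaiWords(112));             // หนึ่งร้อยสิบสอง
Console.WriteLine(ToBahtText(5611116.50m));      // ห้าล้านหกแสนหนึ่งหมื่นหนึ่งพันหนึ่งร้อยสิบหกบาทห้าสิบสตางค์
```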

API Compatibility with PyThaiNLP

This library provides an API similar to PyThaiNLP:

| PyThaiNLP | thainlp.net |
|---|---|
| word_tokenize(text) | WordTokenizer.WordTokenize(text) |
| word_tokenize(text, engine="newmm") | WordTokenizer.WordTokenize(text, engine: "newmm") |
| word_tokenize(text, custom_dict=trie) | WordTokenizer.WordTokenize(text, customDict: trie) |
| word_tokenize(text, keep_whitespace=False) | WordTokenizer.WordTokenize(text, keepWhitespace: false) |
| num_to_thaiword(number) | NumToWord.NumToThaiWord(number) |
| bahttext(number) | NumToWord.BahtText(number) |

Testing

Run the test suite:

dotnet test

Creating a Release

The project is configured to automatically create GitHub releases and publish to NuGet when a version tag is pushed.

Prerequisites

  1. Create a NuGet API key at nuget.org
  2. Add the API key as a secret in your GitHub repository settings:
    • Go to Settings → Secrets and variables → Actions
    • Add a new repository secret named NUGET_API_KEY
    • Paste your NuGet API key as the value

Release Process

  1. Update the version in thainlp/Thainlp.csproj:

    <Version>0.1.0</Version>
    
  2. Commit your changes:

    git commit -am "Bump version to 0.1.0"
    git push
    
  3. Create and push a version tag:

    git tag v0.1.0
    git push origin v0.1.0
    

The GitHub Actions workflow will automatically:

  • Build the project
  • Run tests
  • Create the NuGet package
  • Create a GitHub release with the package attached
  • Publish to NuGet

Continuous Integration

Every push to any branch triggers the CI workflow, which:

  • Builds the project
  • Runs tests
  • Creates the NuGet package as an artifact (not published)

License

See LICENSE file for details.

Target frameworks

Included in the package: net8.0 (no dependencies).

Computed as compatible: the net8.0 platform variants (android, browser, ios, maccatalyst, macos, tvos, windows), plus net9.0 and net10.0 with the same platform variants.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

| Version | Downloads | Last updated |
|---|---|---|
| 0.1.3 | 99 | 1/15/2026 |
| 0.1.2 | 105 | 1/11/2026 |
| 0.1.1 | 106 | 1/11/2026 |
| 0.1.0 | 99 | 1/10/2026 |