ThaiNLP.NET
0.1.3
dotnet add package ThaiNLP.NET --version 0.1.3
NuGet\Install-Package ThaiNLP.NET -Version 0.1.3
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="ThaiNLP.NET" Version="0.1.3" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.
<PackageVersion Include="ThaiNLP.NET" Version="0.1.3" />
<PackageReference Include="ThaiNLP.NET" />
For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package.
paket add ThaiNLP.NET --version 0.1.3
The NuGet Team does not provide support for this client. Please contact its maintainers for support.
#r "nuget: ThaiNLP.NET, 0.1.3"
The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or the source code of the script to reference the package.
#:package ThaiNLP.NET@0.1.3
The #:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.
#addin nuget:?package=ThaiNLP.NET&version=0.1.3
#tool nuget:?package=ThaiNLP.NET&version=0.1.3
thainlp.net
Thai NLP in .NET
Features
Word Tokenization
- newmm - Dictionary-based maximal matching word segmentation constrained by Thai Character Cluster (TCC) boundaries
- API similar to PyThaiNLP for easy migration from Python
Subword Tokenization
- TCC (Thai Character Cluster) tokenization for breaking text into character clusters
Number to Thai Word Conversion
- NumToThaiWord - Convert numbers to Thai text representation
- BahtText - Convert numbers to Thai currency format (Baht and Satang)
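To make the number-to-word feature concrete, here is a minimal, self-contained sketch of how digits can be spelled out in Thai. It is not the library's implementation: the class name ThaiNumberSketch and the method ToThaiWords are hypothetical, the range is limited to values between -999,999 and 999,999, and it assumes the convention that a trailing 1 is read as เอ็ด whenever a higher digit is present.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper for illustration only; not part of ThaiNLP.NET.
public static class ThaiNumberSketch
{
    private static readonly string[] Digits =
        { "ศูนย์", "หนึ่ง", "สอง", "สาม", "สี่", "ห้า", "หก", "เจ็ด", "แปด", "เก้า" };

    // Place names for units, tens, hundreds, thousands, ten-thousands, hundred-thousands.
    private static readonly string[] Places = { "", "สิบ", "ร้อย", "พัน", "หมื่น", "แสน" };

    public static string ToThaiWords(int number)
    {
        // Assumption: this sketch stops below one million to stay short.
        if (number <= -1_000_000 || number >= 1_000_000)
            throw new ArgumentOutOfRangeException(nameof(number));
        if (number == 0) return Digits[0];
        if (number < 0) return "ลบ" + ToThaiWords(-number);

        string s = number.ToString(CultureInfo.InvariantCulture);
        var sb = new StringBuilder();
        for (int i = 0; i < s.Length; i++)
        {
            int digit = s[i] - '0';
            int place = s.Length - 1 - i;
            if (digit == 0) continue;                       // zeros are silent
            if (place == 1 && digit == 1) sb.Append("สิบ");   // 10 is "สิบ", not "หนึ่งสิบ"
            else if (place == 1 && digit == 2) sb.Append("ยี่สิบ"); // 20 uses "ยี่"
            else if (place == 0 && digit == 1 && s.Length > 1) sb.Append("เอ็ด"); // trailing 1
            else sb.Append(Digits[digit]).Append(Places[place]);
        }
        return sb.ToString();
    }
}
```

With these rules, ToThaiWords(112) yields หนึ่งร้อยสิบสอง and ToThaiWords(-273) yields ลบสองร้อยเจ็ดสิบสาม.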
Installation
From NuGet (Recommended)
dotnet add package ThaiNLP.NET
Or via Package Manager:
Install-Package ThaiNLP.NET
From Source
Build the project:
dotnet build
Usage
Word Tokenization (newmm)
Basic usage:
using Thainlp;
// Simple tokenization
var tokens = WordTokenizer.Tokenize("ประเทศไทยมีอากาศดี");
// Output: ["ประเทศ", "ไทย", "มี", "อากาศ", "ดี"]
// With more options
var tokensWithOptions = WordTokenizer.WordTokenize(
    text: "โอเคบ่พวกเรารักภาษาบ้านเกิด",
    engine: "newmm",
    keepWhitespace: true
);
// Output: ["โอเค", "บ่", "พวกเรา", "รัก", "ภาษา", "บ้านเกิด"]
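For intuition about dictionary-based matching, the sketch below shows a simplified greedy longest-match segmenter. It is not the newmm engine: it ignores Thai Character Cluster boundaries and does not compare alternative segmentations. The class LongestMatchSketch, its Segment method, and the maxWordLength parameter are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of ThaiNLP.NET.
public static class LongestMatchSketch
{
    // At each position, take the longest dictionary word that starts there.
    // If nothing matches, emit a single character and move on.
    public static List<string> Segment(string text, ISet<string> dictionary, int maxWordLength)
    {
        var tokens = new List<string>();
        int pos = 0;
        while (pos < text.Length)
        {
            int longest = Math.Min(maxWordLength, text.Length - pos);
            int matched = 1; // fallback: one UTF-16 code unit
            for (int len = longest; len > 1; len--)
            {
                if (dictionary.Contains(text.Substring(pos, len)))
                {
                    matched = len;
                    break;
                }
            }
            tokens.Add(text.Substring(pos, matched));
            pos += matched;
        }
        return tokens;
    }
}
```

Called with the text ประเทศไทยมีอากาศดี, a dictionary containing ประเทศ, ไทย, มี, อากาศ, and ดี, and a maxWordLength of 6, this sketch returns those five words in order. Greedy matching can pick a poor split when a long word hides a better pair of shorter ones, which is one reason engines such as newmm use more careful strategies.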
Custom Dictionary
using Thainlp;
using System.Collections.Generic;
// Create custom dictionary
var customWords = new List<string> { "ชินโซ", "อาเบะ" };
var customDict = new Trie(customWords);
// Use with tokenizer
var tokens = WordTokenizer.WordTokenize(
    "ชินโซ อาเบะ เกิด 21 กันยายน",
    customDict: customDict
);
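To illustrate why a trie suits dictionary lookup during tokenization, here is a minimal sketch of a trie that lists every stored word starting at a given position in the text. SimpleTrie and PrefixesAt are hypothetical names for this example and are not the library's Trie class, whose internals are not described in this README.

```csharp
using System.Collections.Generic;

// Hypothetical illustration of a trie; not the Trie type shipped with ThaiNLP.NET.
public sealed class SimpleTrie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord;
    }

    private readonly Node _root = new Node();

    public SimpleTrie(IEnumerable<string> words)
    {
        foreach (var word in words) Add(word);
    }

    public void Add(string word)
    {
        if (string.IsNullOrEmpty(word)) return;
        var node = _root;
        foreach (char c in word)
        {
            if (!node.Children.TryGetValue(c, out var next))
            {
                next = new Node();
                node.Children[c] = next;
            }
            node = next;
        }
        node.IsWord = true; // mark the end of a complete word
    }

    // Returns every stored word that begins at text[start], shortest first.
    public List<string> PrefixesAt(string text, int start)
    {
        var found = new List<string>();
        var node = _root;
        for (int i = start; i < text.Length; i++)
        {
            if (!node.Children.TryGetValue(text[i], out node)) break;
            if (node.IsWord) found.Add(text.Substring(start, i - start + 1));
        }
        return found;
    }
}
```

A single walk down the trie finds all candidate words at a position, so a tokenizer does not need to test each substring length separately.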
TCC (Thai Character Cluster) Tokenization
using Thainlp;
// Tokenize into character clusters
var clusters = TCC.Segment("ประเทศไทย");
// Output: ["ป", "ระ", "เท", "ศ", "ไ", "ท", "ย"]
// Get cluster positions
var positions = TCC.GetPositions("ประเทศไทย");
Legacy Subword API
using Thainlp;
// Original TCC implementation
var clusters = Subword.tcc("ประเทศไทย");
var positions = Subword.tcc_pos("ประเทศไทย");
Number to Thai Word Conversion
using Thainlp;
// Convert number to Thai words
string text = NumToWord.NumToThaiWord(112);
// Output: หนึ่งร้อยสิบสอง
string negative = NumToWord.NumToThaiWord(-273);
// Output: ลบสองร้อยเจ็ดสิบสาม
// Convert to Thai Baht currency format
string baht = NumToWord.BahtText(5611116.50);
// Output: ห้าล้านหกแสนหนึ่งหมื่นหนึ่งพันหนึ่งร้อยสิบหกบาทห้าสิบสตางค์
string simple = NumToWord.BahtText(116);
// Output: หนึ่งร้อยสิบหกบาทถ้วน
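The Baht and Satang split behind a currency string can be sketched as below. This is an assumption-laden illustration, not the library's code: BahtSplitSketch and Split are hypothetical names, the input is a decimal rather than the double literal used above (to avoid binary floating-point surprises), negative amounts are rejected, and rounding to the nearest satang is a choice made for this sketch.

```csharp
using System;

// Hypothetical helper for illustration only; not part of ThaiNLP.NET.
public static class BahtSplitSketch
{
    // Splits a non-negative amount into whole baht and satang (1 baht = 100 satang),
    // rounding to the nearest satang first.
    public static (long Baht, int Satang) Split(decimal amount)
    {
        if (amount < 0)
            throw new ArgumentOutOfRangeException(nameof(amount), "Negative amounts are out of scope for this sketch.");

        decimal rounded = Math.Round(amount, 2, MidpointRounding.AwayFromZero);
        long baht = (long)Math.Truncate(rounded);
        int satang = (int)((rounded - baht) * 100m);
        return (baht, satang);
    }
}
```

Split(5611116.50m) gives 5611116 baht and 50 satang, and Split(116m) gives 116 baht and 0 satang, the case that is written with the suffix ถ้วน in the output above.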
API Compatibility with PyThaiNLP
This library provides an API similar to PyThaiNLP:
| PyThaiNLP | thainlp.net |
|---|---|
| word_tokenize(text) | WordTokenizer.WordTokenize(text) |
| word_tokenize(text, engine="newmm") | WordTokenizer.WordTokenize(text, engine: "newmm") |
| word_tokenize(text, custom_dict=trie) | WordTokenizer.WordTokenize(text, customDict: trie) |
| word_tokenize(text, keep_whitespace=False) | WordTokenizer.WordTokenize(text, keepWhitespace: false) |
| num_to_thaiword(number) | NumToWord.NumToThaiWord(number) |
| bahttext(number) | NumToWord.BahtText(number) |
Testing
Run the test suite:
dotnet test
Creating a Release
The project is configured to automatically create GitHub releases and publish to NuGet when a version tag is pushed.
Prerequisites
- Create a NuGet API key at nuget.org
- Add the API key as a secret in your GitHub repository settings:
  - Go to Settings → Secrets and variables → Actions
  - Add a new repository secret named NUGET_API_KEY
  - Paste your NuGet API key as the value
Release Process
1. Update the version in thainlp/Thainlp.csproj:
   <Version>0.1.0</Version>
2. Commit your changes:
   git commit -am "Bump version to 0.1.0"
   git push
3. Create and push a version tag:
   git tag v0.1.0
   git push origin v0.1.0
The GitHub Actions workflow will automatically:
- Build the project
- Run tests
- Create the NuGet package
- Create a GitHub release with the package attached
- Publish to NuGet
Continuous Integration
Every push to any branch triggers the CI workflow, which:
- Builds the project
- Runs tests
- Creates the NuGet package as an artifact (not published)
License
See LICENSE file for details.
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net8.0: No dependencies.
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.