TextNormalizer 1.0.1
TextNormalizer
An open-source .NET library for standardizing English text and numbers. It automatically handles issues like number conversion, abbreviation expansion, and format unification in unstructured English content, making it suitable for scenarios such as data cleaning and NLP preprocessing.
📸 Visual Example of Results
Below are the core functionality results of the library (input text on the left, normalized output on the right), covering common text issues in real-world scenarios:
1. Number Normalization (Text → Numerals + Standard Format)
| Input Text | Normalized Output |
|---|---|
| "five twenty five" | "525" |
| "eight of these products had sales exceeding 100 million yuan, three euros and sixty five cents, on august twenty sixth twenty twenty one" | "8 of these products had sales exceeding 100000000 yuan, €3.65, on august 26th 2021" |
| "revenue of 5.8 billion yuan, average sales 549,000 yuan" | "revenue of 5800000000 yuan, average sales 549000 yuan" |
2. Text Abbreviation & Title Expansion
| Input Text | Normalized Output |
|---|---|
| "Mr. Park visited Assoc. Prof. Kim Jr." | "mister park visited associate professor kim junior" |
| "Chagee's founder said in the first quarter" | "chagee is founder said in the 1st quarter" |
3. Punctuation & Format Cleaning
| Input Text | Normalized Output |
|---|---|
| "100 million yuan,,,, [$14.1 million yuan],,," | "100000000 yuan, $14100000 yuan" |
| "sales over 700 million cups ( three years ago )" | "sales over 700000000 cups 3 years ago" |
🚀 Quick Start in 3 Steps
1. Prerequisites
Supported .NET frameworks:
- .NET Framework 4.8
- .NET Standard 2.0+
- .NET 6.0+ (compatible via .NET Standard)
2. Installation
Option 1: Install via NuGet (Recommended)
# .NET CLI
dotnet add package TextNormalizer
# Package Manager Console
Install-Package TextNormalizer
Option 2: Reference Source Code
- Clone the repository:
git clone https://github.com/manyeyes/TextNormalizer.git
- Add the TextNormalizer project to your solution and reference it.
3. Core Code Example
Complete normalization takes just three steps:
using System;
using TextNormalizer;
// 1. Create normalizer instances
var textNormalizer = new EnglishTextNormalizer();
var spellingNormalizer = new EnglishSpellingNormalizer();
// 2. Input text to be processed
string input = "Mr. Park said sales exceeded three million yuan on july fifth twenty twenty three.";
// 3. Perform normalization (text + spelling processing)
string normalizedText = textNormalizer.GetEnglishTextNormalizer(input);
normalizedText = spellingNormalizer.GetEnglishSpellingNormalizer(normalizedText);
// Output result: "mister park said sales exceeded 3000000 yuan on july 5th 2023."
Console.WriteLine(normalizedText);
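When several strings need the same treatment, the two calls above can be wrapped in a small helper. The following is a minimal sketch rather than part of the library: the local function `Normalize` is a hypothetical name, and the code assumes that a single instance of each normalizer can be reused across calls, which this README does not state explicitly. The sample inputs are taken from the tables above.

```csharp
using System;
using System.Collections.Generic;
using TextNormalizer;

// Create each normalizer once and reuse it for the whole batch
// (assumption: instances are reusable across calls).
var textNormalizer = new EnglishTextNormalizer();
var spellingNormalizer = new EnglishSpellingNormalizer();

// Hypothetical helper, not a library API: runs text normalization first,
// then spelling normalization, in the same order as the Quick Start example.
string Normalize(string input)
{
    string text = textNormalizer.GetEnglishTextNormalizer(input);
    return spellingNormalizer.GetEnglishSpellingNormalizer(text);
}

var inputs = new List<string>
{
    "five twenty five",
    "Mr. Park visited Assoc. Prof. Kim Jr.",
};

// Print each input next to its normalized form.
foreach (string input in inputs)
{
    Console.WriteLine($"{input} -> {Normalize(input)}");
}
```

Keeping the two steps in one function makes the order of operations explicit and gives a single place to add further processing later.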
🔧 Core Capabilities
| Capability Category | Specific Features | Examples |
|---|---|---|
| Number Processing | Text-based numbers → Arabic numerals | "twenty nineteen" → "2019" |
| Number Processing | Currency unit standardization | "100 million dollars" → "$100000000" |
| Number Processing | Date format unification | "august twenty sixth" → "august 26th" |
| Number Processing | Percentage/symbol conversion | "ninety percent" → "90%" |
| Text Processing | Title/abbreviation expansion | "Assoc. Prof." → "associate professor" |
| Text Processing | Possessive/contraction parsing | "He's" → "he is" |
| Text Processing | Redundant punctuation cleaning | "yuan,,,, " → "yuan " |
| Text Processing | Case unification | "By the end" → "by the end" |
| Spelling Processing | British → American spelling unification | "mobilisation" → "mobilization" |
✅ Test Validation
The project includes built-in unit tests covering all core scenarios to ensure stability:
- Number Normalization: Validates 30+ scenarios (including currency, dates, and special formats like "double zero seven" → "007")
- Text Processing: Covers common abbreviations (Mr./Prof./Jr.) and contractions (Let's/He's)
- Format Compatibility: Handles text with interfering characters like parentheses and commas
To run tests:
dotnet test TextNormalizer.Tests.csproj
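For readers who want to add their own cases, a test in this style might look like the sketch below. It assumes xUnit as the test framework, which this README does not confirm, and the class and method names are hypothetical. The expected strings are copied from the example tables above and have not been independently checked against the library.

```csharp
using TextNormalizer;
using Xunit;

// Hypothetical test class; the project's real tests may be organized differently.
public class ReadmeExamplesTests
{
    [Theory]
    // Input/expected pairs copied from the README example tables.
    [InlineData("five twenty five", "525")]
    [InlineData("Mr. Park visited Assoc. Prof. Kim Jr.",
                "mister park visited associate professor kim junior")]
    public void Normalize_MatchesReadmeExample(string input, string expected)
    {
        var textNormalizer = new EnglishTextNormalizer();
        var spellingNormalizer = new EnglishSpellingNormalizer();

        // Same two-step pipeline as the Quick Start example.
        string actual = textNormalizer.GetEnglishTextNormalizer(input);
        actual = spellingNormalizer.GetEnglishSpellingNormalizer(actual);

        Assert.Equal(expected, actual);
    }
}
```

A parameterized test keeps each README example as one data row, so a new scenario only needs one more `InlineData` line.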
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net5.0 was computed. net5.0-windows was computed. net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 is compatible. net8.0-android was computed. net8.0-android34.0 is compatible. net8.0-browser was computed. net8.0-ios was computed. net8.0-ios18.0 is compatible. net8.0-maccatalyst was computed. net8.0-maccatalyst18.0 is compatible. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net8.0-windows10.0.19041 is compatible. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
| .NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 is compatible. |
| .NET Standard | netstandard2.0 is compatible. netstandard2.1 is compatible. |
| .NET Framework | net461 is compatible. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 is compatible. net48 is compatible. net481 was computed. |
| MonoAndroid | monoandroid was computed. |
| MonoMac | monomac was computed. |
| MonoTouch | monotouch was computed. |
| Tizen | tizen40 was computed. tizen60 was computed. |
| Xamarin.iOS | xamarinios was computed. |
| Xamarin.Mac | xamarinmac was computed. |
| Xamarin.TVOS | xamarintvos was computed. |
| Xamarin.WatchOS | xamarinwatchos was computed. |
- .NETCoreApp 3.1
  - Rationals (>= 2.3.0)
  - Unidecode.NET (>= 2.1.0)
- .NETFramework 4.6.1
  - Rationals (>= 2.3.0)
  - Unidecode.NET (>= 2.1.0)
- .NETFramework 4.7.2
  - Rationals (>= 2.3.0)
  - Unidecode.NET (>= 2.1.0)
- .NETFramework 4.8
  - Rationals (>= 2.3.0)
  - Unidecode.NET (>= 2.1.0)
- .NETStandard 2.0
  - Rationals (>= 2.3.0)
  - Unidecode.NET (>= 2.1.0)
- .NETStandard 2.1
  - Rationals (>= 2.3.0)
  - Unidecode.NET (>= 2.1.0)
- net6.0
  - Rationals (>= 2.3.0)
  - Unidecode.NET (>= 2.1.0)
- net8.0
  - Rationals (>= 2.3.0)
  - Unidecode.NET (>= 2.1.0)
- net8.0-android34.0
  - Rationals (>= 2.3.0)
  - Unidecode.NET (>= 2.1.0)
- net8.0-ios18.0
  - Rationals (>= 2.3.0)
  - Unidecode.NET (>= 2.1.0)
- net8.0-maccatalyst18.0
  - Rationals (>= 2.3.0)
  - Unidecode.NET (>= 2.1.0)
- net8.0-windows10.0.19041
  - Rationals (>= 2.3.0)
  - Unidecode.NET (>= 2.1.0)