Packages returned for Tags:"tokenization"
Tokenization of raw text is a standard pre-processing step for many NLP tasks. For English, tokenization usually involves punctuation splitting and separation of some affixes like possessives. Other languages require more extensive token pre-processing, which is usually called segmentation.
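As a minimal illustration of English punctuation splitting and affix separation (a hand-rolled regex sketch, not any particular library's rules):

```python
import re

def tokenize(text):
    """Split raw English text into word tokens, separated affixes
    (e.g. possessive 's, contracted n't), and punctuation marks.
    Illustrative sketch only; real tokenizers handle many more cases."""
    pattern = r"\w+(?=n't)|n't|'\w+|\w+|[^\w\s]"
    return re.findall(pattern, text)

print(tokenize("The cat's owner doesn't care."))
# → ['The', 'cat', "'s", 'owner', 'does', "n't", 'care', '.']
```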
TextMatch is a library for searching inside texts using Lucene query expressions. It supports all types of Lucene query expressions, including boolean, wildcard, and fuzzy queries. Options are available for tweaking tokenization, such as case sensitivity and word stemming.
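For reference, the query forms mentioned above follow standard Lucene query-parser syntax (these are generic Lucene examples, not taken from TextMatch's documentation):

```text
title:search AND body:lucene    boolean operators (AND, OR, NOT, +, -)
tokeni*ation  te?t              wildcards (* = many chars, ? = one char)
tokenizaiton~                   fuzzy match by edit distance
"exact phrase"~2                proximity: terms within 2 positions
```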
CyberSource Flex API Server Side SDK
SDK for server-side integration with CyberSource Flex API
Extract tokens from a string of text for use with NLP tools or statistical analysis.
Class library for working with general language constructs.
Text tokenization based on Unicode grapheme clustering, and the XID_Start and XID_Continue binary properties.
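Python's `str.isidentifier()` checks approximately these same Unicode properties (XID_Start for the first character, XID_Continue for the rest, with `_` also admitted as a start character per PEP 3131), so an identifier-style scanner can be sketched with it. This is an illustrative sketch, not the package's actual algorithm, and it does not show grapheme clustering:

```python
def xid_tokens(text):
    """Scan out tokens whose first character has XID_Start and whose
    remaining characters have XID_Continue, using str.isidentifier()
    as the character classifier. Illustrative sketch only."""
    tokens, current = [], ""
    for ch in text:
        if current:
            # current + ch stays a valid identifier iff ch has XID_Continue
            if (current + ch).isidentifier():
                current += ch
                continue
            tokens.append(current)
            current = ""
        if ch.isidentifier():  # single char: True iff it can start an identifier
            current = ch
    if current:
        tokens.append(current)
    return tokens

print(xid_tokens("naïve_name + χ2 = 3kg"))
# note: '3' lacks XID_Start, so '3kg' yields only 'kg'
```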
Concise monadic parser combinator library with separate lexer/parser phases and support for large inputs.
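The lexer/parser split with monadic combinators can be sketched as follows (a generic illustration of the technique, not this library's API):

```python
import re

# --- lexer phase: raw text -> flat token list --------------------------
def lex(text):
    return re.findall(r"\d+|[+*()]", text)

# --- parser phase: monadic combinators over the token list -------------
# A parser is a function (tokens, pos) -> (value, new_pos), or None on failure.
def pure(v):
    return lambda toks, i: (v, i)

def bind(p, f):
    """Monadic bind: run p, then feed its value to f to get the next parser."""
    def run(toks, i):
        r = p(toks, i)
        return None if r is None else f(r[0])(toks, r[1])
    return run

def token(pred):
    def run(toks, i):
        if i < len(toks) and pred(toks[i]):
            return toks[i], i + 1
        return None
    return run

def alt(p, q):
    return lambda toks, i: p(toks, i) or q(toks, i)

# Tiny grammar: expr = term '+' expr | term ; term = number | '(' expr ')'
def expr(toks, i):
    plus = bind(term, lambda a:
           bind(token(lambda t: t == "+"), lambda _:
           bind(expr, lambda b: pure(a + b))))
    return alt(plus, term)(toks, i)

def term(toks, i):
    paren = bind(token(lambda t: t == "("), lambda _:
            bind(expr, lambda v:
            bind(token(lambda t: t == ")"), lambda _: pure(v))))
    num = bind(token(str.isdigit), lambda t: pure(int(t)))
    return alt(paren, num)(toks, i)

print(expr(lex("2+(3+4)"), 0))  # → (9, 7): value 9, all 7 tokens consumed
```

Lexing first keeps the combinators simple: the parser sees whole tokens rather than characters, which also helps with large inputs.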
JSON grammar based on Parsimonious parser lib.
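A JSON grammar in Parsimonious's PEG notation might look roughly like this (an illustrative fragment with whitespace handling omitted, not the package's actual grammar):

```text
value    = object / array / string / number / "true" / "false" / "null"
object   = "{" members? "}"
members  = pair ("," pair)*
pair     = string ":" value
array    = "[" elements? "]"
elements = value ("," value)*
string   = ~"\"[^\"]*\""
number   = ~"-?\\d+(\\.\\d+)?"
```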
Lightweight HttpModule, REST API, and managed API for *safe* and highly-optimized server-side image processing.
Works with .NET 4.5.1+, ASP.NET WebForms, MVC 1-4, WebAPI, Routing, IIS 7, 7.5, 8, and 8.1, and nearly all...