Stringier.Patterns 0.6.0

Provides SNOBOL4- and UNICON-inspired patterns and parsing.

There is a newer version of this package available. See the version history below for details.

Package Manager:
Install-Package Stringier.Patterns -Version 0.6.0

.NET CLI:
dotnet add package Stringier.Patterns --version 0.6.0

PackageReference (for projects that support it, copy this XML node into the project file):
<PackageReference Include="Stringier.Patterns" Version="0.6.0" />

Paket CLI:
paket add Stringier.Patterns --version 0.6.0

Script and Interactive (F# Interactive, C# scripting, and .NET Interactive):
#r "nuget: Stringier.Patterns, 0.6.0"

Cake:
// Install Stringier.Patterns as a Cake Addin
#addin nuget:?package=Stringier.Patterns&version=0.6.0

// Install Stringier.Patterns as a Cake Tool
#tool nuget:?package=Stringier.Patterns&version=0.6.0

Stringier.Patterns

Patterns, probably introduced with SNOBOL, and also seen with SPITBOL and UNICON, are considerably more powerful than Regular Expressions. So what do you do when you need to parse something more complicated than a Regex? Hacky Regex extensions aren't great, and still lack what some advanced alternatives can offer. Parser Combinators? Actually these are great. I'm not going to bash them at all. Pattern Matching and Parser Combinators share a huge amount of theory and implementation details. You could consider them alternative interpretations of the same concept. That being said, you'll notice a few small differences, but the advantages apply to both.

Including

using System.Text.Patterns;

Usage

In most situations, there are only three usage patterns you'll need to know.

Declaration

Pattern patternName = "Text to match";
// Comparison of a Literal

or

Pattern patternName = ("Text to match", StringComparison.CurrentCultureIgnoreCase);
// Comparisons of this Literal will use the StringComparison value

or

Pattern patternName = literalPattern1 & (literalPattern2 | literalPattern3);
// Comparison of an actual pattern

Matching

patternName.Consume("Candidate text");
//Assuming Consume captures "Candidate" this will return true and "Candidate"

Inline (Quick Match)

"Hello".Consume("Hello World!");
//Assuming "Hello" captures "Hello" (which it obviously will) this will return true and "Hello"

Concepts

Multiple return values

Pattern matching is largely based around the idea of goal-direction. The two languages you're most likely using this library from, C# and VB.NET, don't support goal-direction (if you're using F#, then FParsec is going to match that programming style better anyway). Goal-directed semantics require both a success state and the result to be returned from every function call (or just the success state for a void return).

But wait, C# can't return multiple values!

While true, this is remarkably pedantic. Whether you return an array, a struct, a class, or a tuple, you are returning multiple values as one conceptual value. All the parsing methods return Result which contains both the success state (Boolean) and the result of the operation (String). Result implicitly casts to both Boolean and String and can be used as such. This allows some conveniences without adding new methods.

So every return passes two values? Isn't that a lot of extra memory?

One, not really: a single Boolean isn't very large. Two, it doesn't actually pass a Boolean at all; an empty string is recognized as a failure. Essentially, Result is a box around String with special comparisons and implicit conversions. In other words, it combines the behavior of Parse and TryParse in one method. And, getting technical, we're not actually passing around String either. We're passing around Span<Char> for performance reasons, which refers to part of the original string and prevents copying in most situations.
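To make the idea concrete, here is a minimal sketch of a result type that carries only the captured text and treats an empty capture as failure. SketchResult and its Consume helper are hypothetical names made up for this illustration, not the library's actual Result, and the sketch uses String rather than Span<Char> to stay short.

```csharp
using System;

// SketchResult is a hypothetical stand-in, not the library's Result type.
SketchResult hit = SketchResult.Consume("Hello", "Hello World!");
SketchResult miss = SketchResult.Consume("Bye", "Hello World!");

// Implicit conversion to Boolean: usable directly as a condition.
if (hit) Console.WriteLine((string)hit); // prints "Hello"

bool missed = miss; // false, because nothing was captured
if (!missed) Console.WriteLine("no match");

readonly struct SketchResult
{
    // The only stored state; null or empty means failure.
    private readonly string captured;

    private SketchResult(string captured) => this.captured = captured;

    // Literal-only matching at the start of the candidate (assumption for this sketch).
    public static SketchResult Consume(string literal, string candidate) =>
        candidate.StartsWith(literal, StringComparison.Ordinal)
            ? new SketchResult(literal)
            : new SketchResult("");

    // Success is derived from the capture, so no Boolean field is needed.
    public static implicit operator bool(SketchResult r) => !string.IsNullOrEmpty(r.captured);

    public static implicit operator string(SketchResult r) => r.captured ?? "";
}
```

The point of the design is that one value answers both "did it work?" and "what did it capture?", which is what Parse and TryParse otherwise split across two methods.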

Literal

Pattern patternName = "Literal Pattern";

This is an exact 1:1 match pattern, and is equivalent to

"pattern" == "candidate"

Literal is meant mostly as a building block for patterns. Pattern operators expect a Literal, which is not a string, and the convenient syntax shown above applies only to Literal. Using a string inside a pattern operator might therefore require a cast, like

(Pattern)"Literal Pattern" & "Other Literal Pattern"

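This is generally only required for the very first member of an expression. C# only considers a type's overloaded operators when at least one operand already has that type, so once the first member is a Pattern, the remaining strings can be converted implicitly.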
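Why the first member needs the cast is easier to see with a toy type. The sketch below uses a hypothetical MiniPattern class (not part of this library) with an implicit conversion from string and an overloaded & operator. It only handles literals, so it shows the operator resolution and nothing else.

```csharp
using System;

// string & string has no such operator, so this line would not compile:
// var bad = "Hello" & " World";

// Casting the first operand lets the compiler find MiniPattern's operator &;
// every later string is then converted implicitly.
MiniPattern greeting = (MiniPattern)"Hello" & " World" & "!";
Console.WriteLine(greeting.Matches("Hello World! How are you?")); // True

// MiniPattern is a hypothetical type for illustration only.
sealed class MiniPattern
{
    private readonly string text;

    private MiniPattern(string text) => this.text = text;

    // Lets a plain string stand in wherever a MiniPattern is expected.
    public static implicit operator MiniPattern(string literal) => new MiniPattern(literal);

    // Sequence: both operands must match back to back (literals only in this sketch).
    public static MiniPattern operator &(MiniPattern left, MiniPattern right) =>
        new MiniPattern(left.text + right.text);

    // True when the candidate starts with the combined literal.
    public bool Matches(string candidate) =>
        candidate.StartsWith(text, StringComparison.Ordinal);
}
```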

Alternator

Pattern patternName = pattern1 | pattern2;

Alternators accept either pattern, and are equivalent to the regex (pattern1|pattern2).

Combinator

Pattern patternName = pattern1 & pattern2;

Combinators require both patterns in sequence and are equivalent to the regex (pattern1)(pattern2), with the unnecessary parentheses added for readability.
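If you want to check the regex equivalents for yourself, the following sketch does it with System.Text.RegularExpressions rather than with this library. The ^ anchor is an assumption on my part, added to mirror consuming from the front of the candidate, and the literals are arbitrary examples.

```csharp
using System;
using System.Text.RegularExpressions;

// Regex stand-ins for the two operators, anchored at the start of the input.
var alternator = new Regex(@"^(?:Hello|Goodbye)");   // pattern1 | pattern2
var combinator = new Regex(@"^(?:Hello)(?: World)"); // pattern1 & pattern2

Console.WriteLine(alternator.Match("Goodbye World!").Value); // Goodbye
Console.WriteLine(combinator.Match("Hello World!").Value);   // Hello World
Console.WriteLine(combinator.Match("Hello there").Success);  // False
```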

Optor

Pattern patternName = ~pattern;

Optors make the pattern completely optional, so success is always true, and are equivalent to the regex (pattern)?.

Repeater

Pattern patternName = pattern * 3; //repeats the pattern three times

Repeaters require the pattern to repeat the specified number of times, and can be thought of as multiplication for patterns, where combinators are addition. The above example would be equivalent to the regex pattern{3}.

Spanner

Pattern patternName = +pattern;

Spanners require the pattern to exist at least once, but will repeat until the pattern can no longer be matched, and are equivalent to the regex pattern+.

OptorSpanners

Pattern patternName = ~+pattern;

or

Pattern patternName = +~pattern;

Technically not its own type, but this does represent a Regex symbol that otherwise has no direct counterpart. It is equivalent to the regex pattern*.

I'm not sure if one of these forms is superior to the other. Conceptually they are the same though.
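The regex equivalents for the optor, repeater, spanner, and optor-spanner can be compared side by side. This is a sketch against System.Text.RegularExpressions, not against this library; the "ab" literal and the ^ anchor are assumptions chosen for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

var optor        = new Regex(@"^(?:ab)?");   // ~pattern
var repeater     = new Regex(@"^(?:ab){3}"); // pattern * 3
var spanner      = new Regex(@"^(?:ab)+");   // +pattern
var optorSpanner = new Regex(@"^(?:ab)*");   // ~+pattern

Console.WriteLine(optor.Match("xyz").Success);          // True (matches nothing)
Console.WriteLine(repeater.Match("ababab!").Value);     // ababab
Console.WriteLine(repeater.Match("abab").Success);      // False (only two repeats)
Console.WriteLine(spanner.Match("ababx").Value);        // abab
Console.WriteLine(spanner.Match("x").Success);          // False (needs at least one)
Console.WriteLine(optorSpanner.Match("x").Success);     // True (zero repeats is fine)
Console.WriteLine(optorSpanner.Match("ababx").Value);   // abab
```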

NuGet packages (8)

Showing the top 5 NuGet packages that depend on Stringier.Patterns:

Package: Description
Stringier: Meta-package for all of Stringier.
Stringier.Patterns.FSharp: Provides F# support for Stringier.Patterns.
Stringier.Patterns.MSTest: Provides extensions to MSTest for unit testing Patterns.
Stringier.Patterns.NUnit: Provides extensions to NUnit for unit testing Patterns.
Stringier.Patterns.Parser: Implements a parser for Stringier's Patterns. This is meant for internal use, as it only parses a generalized format of Stringier's Pattern expressions.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version History

Version Downloads Last updated
5.0.0-beta.5 88 5/29/2021
3.1.0 380 6/2/2020
3.0.0 449 3/27/2020
2.3.0 354 1/21/2020
2.2.0 344 1/9/2020
2.1.1 213 12/29/2019
2.1.0 295 12/23/2019
2.0.0 255 12/1/2019
1.8.0 235 10/18/2019
1.7.0 234 10/12/2019
1.6.0 231 10/9/2019
1.5.0 228 10/7/2019
1.4.3 237 10/4/2019
1.4.2 235 10/4/2019
1.4.1 235 10/3/2019
1.4.0 231 10/1/2019
1.3.0 221 9/27/2019
1.2.0 230 9/23/2019
1.1.0 231 9/21/2019
1.0.1 240 9/14/2019
1.0.0 252 9/12/2019
0.12.0 227 9/1/2019
0.11.0 219 8/28/2019
0.10.0 226 8/23/2019
0.9.0 227 8/17/2019
0.8.3 258 8/11/2019
0.8.2 253 8/11/2019
0.8.1 260 8/10/2019
0.8.0 256 8/8/2019
0.7.0 251 8/7/2019
0.6.1 259 8/3/2019
0.6.0 254 8/3/2019
0.5.1 261 8/2/2019
0.5.0 256 8/2/2019
0.4.1 250 8/2/2019
0.4.0 269 5/4/2019
0.3.1 273 5/3/2019
0.3.0 288 5/3/2019
0.2.0 266 4/29/2019
0.1.1 309 4/24/2019
0.1.0 304 4/24/2019