PercolatorMatching 1.1.0

Percolator Matching

A simple DLL containing a matching class that compares two strings and calculates their similarity score using the Ratcliff/Obershelp algorithm.

Package Manager: Install-Package PercolatorMatching -Version 1.1.0
.NET CLI: dotnet add package PercolatorMatching --version 1.1.0
PackageReference (for projects that support PackageReference, copy this XML node into the project file to reference the package): <PackageReference Include="PercolatorMatching" Version="1.1.0" />
Paket CLI: paket add PercolatorMatching --version 1.1.0

Release Notes

I originally built this after finding out that the Fuzzy Lookup and Fuzzy Grouping components of SSIS are only available in the Enterprise editions of SQL Server. I've used it to scan database tables for possible duplicate entries and write the results to another table for a user to review later, as well as in other applications.

Reference the DLL, import the namespace "Percolator.Matching", and create a new instance of "Fuzzylator".

"ThresholdPercentage" is the threshold that the score of two strings must meet for them to be deemed similar. It can be set when creating the new object or changed later. If no threshold is set, it defaults to "Zero" percent.

There are several overloads of the "IsSimilar" method to accommodate a few different scenarios:
--A score is calculated during every check. The optional out parameter lets you capture that score for later use instead of calculating the same score again.
--An optional ThresholdPercentage argument overrides the instance's threshold for that single method call.
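The threshold and overload behavior described above can be sketched roughly as follows. This is a minimal illustration, not the package's source: `SketchThreshold` and `SketchMatcher` are hypothetical names, and the sketch assumes each threshold member's integer value is its percentage and that the scorer returns a score between 0 and 1 (the real `Fuzzylator` may use a different scale or enum layout).

```csharp
using System;

// Hypothetical stand-in for ThresholdPercentage; assumes each member's value is its percentage.
public enum SketchThreshold { Zero = 0, Eighty = 80, Ninety = 90 }

// Hypothetical stand-in for Fuzzylator, showing only the threshold and overload mechanics.
public class SketchMatcher
{
    // Scoring function supplied by the caller; assumed to return a score in [0, 1].
    private readonly Func<string, string, double> _scorer;

    // Instance-wide threshold; defaults to Zero, matching the documented default.
    public SketchThreshold Threshold { get; set; }

    public SketchMatcher(Func<string, string, double> scorer,
                         SketchThreshold threshold = SketchThreshold.Zero)
    {
        _scorer = scorer;
        Threshold = threshold;
    }

    // Core overload: per-call threshold, with the computed score returned through 'out'.
    public bool IsSimilar(string a, string b, SketchThreshold threshold, out double score)
    {
        score = _scorer(a, b);
        return score * 100.0 >= (int)threshold;
    }

    // Uses the instance threshold and hands the score back to the caller.
    public bool IsSimilar(string a, string b, out double score) =>
        IsSimilar(a, b, Threshold, out score);

    // Overrides the instance threshold for this one call; the score is discarded.
    public bool IsSimilar(string a, string b, SketchThreshold threshold) =>
        IsSimilar(a, b, threshold, out _);

    // Simplest form: instance threshold, score discarded.
    public bool IsSimilar(string a, string b) =>
        IsSimilar(a, b, Threshold, out _);
}
```

Every convenience overload forwards to the single core method, so the score is computed exactly once per call whether or not the caller asks for it through `out`.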

"IsUPCSimilar" is a specialized check streamlined specifically for UPC strings. Instead of searching for longest common substrings, it simply compares the digits position by position, in order, and returns the resulting score.
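A position-by-position comparison along those lines might look like the sketch below. It is written under assumptions: `UpcScoreSketch.PositionalScore` is a hypothetical name, and dividing the number of matching positions by the longer string's length is a guess at how the score is normalized, not the package's documented formula.

```csharp
using System;

public static class UpcScoreSketch
{
    // Hypothetical positional score: the fraction of positions (measured against
    // the longer string) at which both UPC strings carry the same digit.
    public static double PositionalScore(string upc1, string upc2)
    {
        int longer = Math.Max(upc1.Length, upc2.Length);
        if (longer == 0) return 1.0; // two empty strings: treated as identical (assumption)

        int shorter = Math.Min(upc1.Length, upc2.Length);
        int same = 0;

        // Walk both strings once, comparing digits at the same index.
        for (int i = 0; i < shorter; i++)
        {
            if (upc1[i] == upc2[i]) same++;
        }

        return (double)same / longer;
    }
}
```

A single pass like this is far cheaper than the repeated longest-common-substring searches Ratcliff/Obershelp performs; the trade-off is that one inserted or dropped digit shifts every later position and pulls the score down sharply.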

"GetScore" returns the score between the two strings, using the Ratcliff/Obershelp algorithm.
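For reference, Ratcliff/Obershelp scores two strings as 2 * Km / (|S1| + |S2|), where Km counts matching characters. These are found by locating the longest common substring and then recursing on the unmatched pieces to its left and right. The following is a minimal C# sketch of that definition, not the package's implementation. `RatcliffObershelpSketch` is a hypothetical name, the score is returned on a 0 to 1 scale, and the `ignoreCase` flag simply lower-cases both inputs, which is an assumption about what the package's `GetScore(str1, str2, true)` does.

```csharp
using System;

public static class RatcliffObershelpSketch
{
    // Similarity = 2 * Km / (|a| + |b|), where Km is the number of matching characters.
    public static double Similarity(string a, string b, bool ignoreCase = false)
    {
        if (ignoreCase)
        {
            a = a.ToLowerInvariant();
            b = b.ToLowerInvariant();
        }

        int total = a.Length + b.Length;
        if (total == 0) return 1.0; // two empty strings: treated as identical (assumption)

        return 2.0 * Matches(a, b) / total;
    }

    // Recursively counts matching characters: take the longest common substring,
    // then recurse on the unmatched pieces to its left and to its right.
    private static int Matches(string a, string b)
    {
        if (a.Length == 0 || b.Length == 0) return 0;

        int bestLen = 0, bestA = 0, bestB = 0;

        // dp[i, j] = length of the common suffix of a[..i] and b[..j].
        var dp = new int[a.Length + 1, b.Length + 1];
        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                if (a[i - 1] == b[j - 1])
                {
                    dp[i, j] = dp[i - 1, j - 1] + 1;
                    if (dp[i, j] > bestLen)
                    {
                        bestLen = dp[i, j];
                        bestA = i - bestLen; // start of the match in a
                        bestB = j - bestLen; // start of the match in b
                    }
                }
            }
        }

        if (bestLen == 0) return 0;

        return bestLen
            + Matches(a.Substring(0, bestA), b.Substring(0, bestB))
            + Matches(a.Substring(bestA + bestLen), b.Substring(bestB + bestLen));
    }
}
```

With this sketch, "WIKIMEDIA" and "WIKIMANIA" share "WIKIM" and "IA", so Km = 7 and the score is 14/18, roughly 0.78.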

"GetUPCScore" likewise uses the streamlined, UPC-specific algorithm and returns the score.

Examples:

Using the similarity checks:

using System;
using Percolator.Matching;

var fuz = new Fuzzylator(ThresholdPercentage.Eighty);

string str1 = "Test String";
string str2 = "A Test String";

if (fuz.IsSimilar(str1, str2))
{
    // Do something
}

double score;
if (fuz.IsSimilar(str1, str2, out score))
{
    // Do something
    Console.WriteLine(score); // score now contains the score of the two strings
}

if (fuz.IsSimilar(str1, str2, ThresholdPercentage.Ninety))
{
    // Do something
    // This IsSimilar check uses a Ninety percent threshold for this one call only.
}

// score is already declared above, so assign to it instead of redeclaring it.
score = fuz.GetScore(str1, str2, true); // score now holds the score between str1 and str2; passing true ignores case.

Dependencies

This package has no dependencies.


Version History

Version    Downloads    Last updated
1.1.0      659          4/7/2015