GroupDocs.Search
24.9.0
dotnet add package GroupDocs.Search --version 24.9.0
NuGet\Install-Package GroupDocs.Search -Version 24.9.0
<PackageReference Include="GroupDocs.Search" Version="24.9.0" />
paket add GroupDocs.Search --version 24.9.0
#r "nuget: GroupDocs.Search, 24.9.0"
// Install GroupDocs.Search as a Cake Addin
#addin nuget:?package=GroupDocs.Search&version=24.9.0
// Install GroupDocs.Search as a Cake Tool
#tool nuget:?package=GroupDocs.Search&version=24.9.0
Advanced Document Search & Indexing .NET API
GroupDocs.Search for .NET is a comprehensive library enabling developers to build advanced search and indexing capabilities into their .NET applications. It supports a wide range of document formats and provides features such as semantic search, entity recognition, sentiment analysis, and custom entity extraction. With its flexible API and support for various search types, developers can easily implement powerful search functionalities, enhance data analysis, and gain valuable insights from their documents.
Creating an Index
- Index Directory: Specify a directory where index data will be stored.
- Memory-Based Indexing: Create the index in memory for faster searches; an in-memory index is not persisted to disk.
Adding Documents to an Index
- Individual Files: Add files one by one using their paths.
- Directory Scanning: Add all supported documents within a specified directory and its subdirectories.
- File Streams: Index documents directly from streams for flexibility.
Updating and Maintaining an Index
- Incremental Updates: Add or remove individual documents without rebuilding the entire index.
- Index Optimization: Improve search performance by optimizing the index structure periodically.
- Index Backup and Restore: Create backups of the index for safety and restore them if needed.
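The index lifecycle described above (create, add, update, optimize) can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the folder paths are hypothetical placeholders, and it assumes the GroupDocs.Search package is referenced and the folders exist. Backup and restore are not shown.

```csharp
using System;
using GroupDocs.Search;

class IndexLifecycleSketch
{
    static void Main()
    {
        // Hypothetical paths; replace them with your own folders.
        string indexFolder = @"c:\MyIndex";
        string documentsFolder = @"c:\MyDocuments";

        // Passing a folder creates a persistent index stored on disk.
        Index index = new Index(indexFolder);

        // The parameterless constructor keeps the index in memory only.
        Index memoryIndex = new Index();

        // Index the supported documents found in the folder.
        index.Add(documentsFolder);
        memoryIndex.Add(documentsFolder);

        // After documents change on disk, bring the index up to date
        // without rebuilding it from scratch.
        index.Update();

        // Merge index segments to improve later search performance.
        index.Optimize();

        Console.WriteLine("Index created, updated, and optimized.");
    }
}
```

The in-memory index is discarded when the process exits, so it suits short-lived or test scenarios rather than long-term storage.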
Basic Search Functionality
- Full-Text Search: Index and search within the body text of documents.
- Metadata Search: Search based on document metadata (author, title, keywords, etc.).
- Supported File Formats: Search across various file types (DOCX, PDF, HTML, etc.).
- Simple Query Syntax: Use basic search terms and phrases.
Advanced Search Options
- Boolean Operators: Combine search terms using AND, OR, NOT for precise queries.
- Wildcards: Use * to match any character sequence, ? to match a single character.
- Regular Expressions: Employ powerful pattern matching for complex searches.
- Fuzzy Search: Find near matches even with typos or slight variations.
- Proximity Search: Search for words within a specified distance of each other.
- Field Search: Search within specific document fields or properties.
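As a rough illustration of two of the options above, the sketch below runs a Boolean text query and a fuzzy search. It assumes an index was already built in the hypothetical folder `c:\MyIndex`; the query words and the mistake limit of 1 are arbitrary example values.

```csharp
using System;
using GroupDocs.Search;
using GroupDocs.Search.Options;
using GroupDocs.Search.Results;

class AdvancedSearchSketch
{
    static void Main()
    {
        // Hypothetical folder containing a previously built index.
        Index index = new Index(@"c:\MyIndex");

        // Boolean search: both words must occur in a document.
        SearchResult booleanResult = index.Search("invoice AND payment");
        Console.WriteLine("Boolean search, documents found: " + booleanResult.DocumentCount);

        // Fuzzy search: tolerate at most one mistake per word,
        // so a misspelled query can still match correctly spelled text.
        SearchOptions options = new SearchOptions();
        options.FuzzySearch.Enabled = true;
        options.FuzzySearch.FuzzyAlgorithm = new TableDiscreteFunction(1);

        SearchResult fuzzyResult = index.Search("recieve", options);
        Console.WriteLine("Fuzzy search, documents found: " + fuzzyResult.DocumentCount);
        Console.WriteLine("Fuzzy search, occurrences found: " + fuzzyResult.OccurrenceCount);
    }
}
```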
Filtering and Sorting Search Results
- Filter by Metadata: Narrow down results based on metadata values.
- Filter by Date Range: Limit results to documents within a specific time frame.
- Sort by Relevance: Order results based on their relevance to the search query.
- Sort by Date: Order results by document creation or modification date.
- Custom Sorting: Implement custom sorting logic based on specific criteria.
Working with Metadata
- Metadata Extraction: Automatically extract and index metadata during indexing.
- Metadata Search: Search and filter results based on extracted metadata.
- Metadata Display: Include metadata in search results for additional context.
Highlighting and Snippets
- Search Term Highlighting: Visually highlight search terms within the retrieved documents.
- Search Result Snippets: Display a short excerpt of text surrounding the search terms to provide context.
Semantic Search and Entity Recognition
- Semantic Search: Understand the meaning behind search queries and retrieve conceptually relevant results, going beyond keyword matching.
- Entity Recognition: Identify and extract named entities like people, organizations, locations, dates, and more from text.
Sentiment Analysis
- Sentiment Determination: Analyze text to determine the overall sentiment expressed, classifying it as positive, negative, or neutral.
- Sentiment Scoring: Assign sentiment scores to text, indicating the degree of positivity or negativity.
Document Classification
- Text Classification: Categorize documents into predefined classes or categories based on their content.
- Custom Classification: Train custom classification models to categorize documents according to specific criteria or taxonomies.
Custom Entity Extraction
- Custom Entity Definition: Define custom entity types and extraction rules based on specific requirements.
- Custom Entity Extraction: Extract user-defined entities from text using the defined rules and patterns.
Advanced Search API Features
- Creating Custom Analyzers: Tailor text processing during indexing and searching. Define tokenization rules, stemming algorithms, stop words, and more.
- Configuring Indexing Options: Control which file types and parts of documents are indexed. Set indexing depth and update frequency.
- Implementing Search Result Ranking: Customize the scoring algorithm to prioritize certain results. Factor in metadata, document structure, or other custom criteria.
- Semantic Search and Entity Recognition: Understand the meaning behind search queries. Identify named entities (people, organizations, locations, etc.) within text.
- Sentiment Analysis: Determine the overall sentiment (positive, negative, neutral) expressed in text.
- Document Classification: Categorize documents based on their content.
- Custom Entity Extraction: Define and extract custom entities from text.
Supported Document Formats
Document Type | Document Type Description | Searchable Data | Supported Versions | Notes |
---|---|---|---|---|
Word Processing | ||||
DOC | Microsoft Word® Document | Content and metadata | Microsoft Word® 97+ | |
DOT | Microsoft Word® Document Template | Content and metadata | Microsoft Word® 97+ | |
DOCX | Office Open XML Document | Content and metadata | ||
DOCM | Office Open XML Document [Macro-enabled] | Content and metadata | ||
DOTX | Office Open XML Document Template | Content and metadata | ||
DOTM | Office Open XML Document Template [Macro-enabled] | Content and metadata | ||
TXT | Plain text | Content and metadata | ||
ODT | Open Document Text | Content and metadata | ||
OTT | Open Document Text Template | Content and metadata | ||
RTF | Rich Text Format | Content and metadata | 1.9 | |
PDF | Portable Document Format File | Content and metadata | ||
Markup | ||||
HTML | Hypertext Markup Language File | Content and metadata | ||
XHTML | Extensible Hypertext Markup Language File | Content and metadata | ||
MHTML | MIME HTML File | Content and metadata | | Not supported by .NET Core version in Linux |
MD | Markdown | Content and metadata | ||
XML | XML File | Content and metadata | ||
Ebooks | ||||
CHM | Compiled HTML Help File | Content and metadata | 1.4 | |
EPUB | Open eBook File | Content and metadata | 2.0, 3.0, 3.1 | |
FB2 | FictionBook 2.0 File | Content and metadata | 2.0 | |
Spreadsheets | ||||
XLS | Microsoft Excel® Spreadsheet | Content and metadata | Microsoft Excel® 97+ | |
XLT | Microsoft Excel® Spreadsheet Template | Content and metadata | Microsoft Excel® 97+ | |
XLSX | Office Open XML Spreadsheet | Content and metadata | ||
XLSM | Office Open XML Spreadsheet [Macro-enabled] | Content and metadata | ||
XLSB | Office Open XML Spreadsheet [Binary] | Content and metadata | ||
XLTX | Office Open XML Spreadsheet Template | Content and metadata | ||
XLTM | Office Open XML Spreadsheet Template [Macro-enabled] | Content and metadata | ||
XLA | Microsoft Excel® 97-2003 Add-In | Content and metadata | ||
XLAM | Microsoft Excel® Open XML Add-In | Content and metadata | ||
ODS | Open Document Spreadsheet | Content and metadata | ||
OTS | Open Document Spreadsheet Template | Content and metadata | ||
CSV | Comma Separated Values | Content and metadata | ||
TSV | Tab Separated Values | Content and metadata | ||
XML | SpreadsheetML | Content and metadata | ||
Presentations | ||||
PPT | PowerPoint® Presentation | Content and metadata | Microsoft PowerPoint® 97+ | |
PPS | PowerPoint® Slideshow | Content and metadata | Microsoft PowerPoint® 97+ | |
POT | PowerPoint® Template | Content and metadata | Microsoft PowerPoint® 97+ | |
PPTX | Office Open XML Presentation | Content and metadata | ||
PPTM | Office Open XML Presentation [Macro-enabled] | Content and metadata | ||
POTX | Office Open XML Presentation Template | Content and metadata | ||
POTM | Office Open XML Presentation Template [Macro-enabled] | Content and metadata | ||
PPSX | Office Open XML Presentation Slideshow | Content and metadata | ||
PPSM | Office Open XML Presentation Slideshow [Macro-enabled] | Content and metadata | ||
ODP | Open Document Presentation | Content and metadata | ||
Emails | ||||
PST | Outlook Personal Information Store File | Content and metadata | ||
OST | Outlook Offline Data File | Content and metadata | ||
EML | E-Mail Message | Content and metadata | ||
EMLX | Apple Mail Message | Content and metadata | ||
MSG | Outlook Mail Message | Content and metadata | ||
Notes | ||||
OneNote® | OneNote® Document | Content and metadata | Local files of Microsoft OneNote® 2010-2016 | Not supported by .NET Core version in Linux |
Archives | ||||
ZIP | Zipped File | Content and metadata | ||
Audio | ||||
MP3 | MPEG-2 Audio Layer III | Metadata only | ||
WAV | Waveform Audio File Format | Metadata only | ||
Images | ||||
BMP | Bitmap Picture | Content and metadata | ||
GIF | Graphical Interchange Format File | Content and metadata | ||
JP2 | JPEG 2000 Core Image File | Content and metadata | ||
PNG | Portable Network Graphics | Content and metadata | ||
WEBP | WebP Image Format File | Content and metadata | ||
TIFF | Tagged Image File Format | Content and metadata | ||
EMF | Enhanced Windows Metafile | Content and metadata | ||
WMF | Windows Metafile | Content and metadata | ||
JPG | JPEG Image | Content and metadata | ||
PSD | Adobe Photoshop Document | Content and metadata | ||
DJVU | DjVu Image | Content and metadata | ||
Project Management | ||||
MPP | Microsoft Project File | Metadata only | ||
Torrents | ||||
TORRENT | BitTorrent File | Metadata only | ||
Diagrams | ||||
VSD | Visio® Drawing File | Metadata only | ||
VSS | Visio® Stencil File | Metadata only | ||
Medicine | ||||
DCM | DICOM Image | Metadata only | ||
DICOM | DICOM Image | Metadata only | ||
Videos | ||||
AVI | Audio Video Interleave File | Metadata only | ||
MOV | Apple QuickTime Movie | Metadata only | ||
QT | Apple QuickTime Movie | Metadata only | ||
FLV | Animate Video File | Metadata only | ||
ASF | Advanced Systems Format File | Metadata only |
Supported Search Types
- Simple word search: Searches for the exact occurrence of a word in the indexed documents.
- Boolean search: Combines multiple search terms using logical operators like AND, OR, and NOT.
- Regular expression search: Uses patterns and expressions to search for complex text structures.
- Faceted search: Filters search results based on specific categories or fields.
- Case sensitive search: Differentiates between uppercase and lowercase characters in the search query.
- Flexible fuzzy search: Finds words with similar spelling, allowing for minor typing or spelling errors.
- Synonym search: Searches for words and their synonyms to expand search results.
- Homophone search: Finds words that sound the same but have different spellings.
- Wildcard search: Uses placeholders like * or ? to match varying characters or word fragments.
- Phrase search with wildcards: Searches for a specific phrase while allowing variations with wildcards.
- Search for different word forms: Matches different grammatical forms of a word, such as plural or tense variations.
- Date range search: Filters documents based on a specific date or a range of dates.
- Numeric range search: Finds data within a specified numeric range.
- Search by chunks (pages): Searches within specific sections or pages of a document.
- Search for different object types: Searches across various data types, such as text, numbers, dates, file names, and metadata.
- Combining different types of search into one search query: Mixes multiple search types, such as combining Boolean and wildcard searches in one query.
- Alias substitution in search queries: Replaces defined aliases with their full meanings during the search.
- Spell check during search: Automatically corrects minor spelling mistakes in the search query.
- Keyboard layout correction during search: Adjusts the search query for different keyboard layouts or language settings.
- Search queries in text or flexible object form: Accepts both textual and structured object-based search queries.
- Highlighting search results: Highlights the found terms or phrases directly in the document.
- Multiple simultaneous thread-safe searches: Allows multiple searches to be run concurrently without conflicts.
- Thread-safe search during indexing, updating, or merging operations: Ensures safe searching while the index is being modified.
- Search over several indexes simultaneously: Performs searches across multiple indexes in a single operation.
- Reverse image search: Finds images based on similarity or matching image characteristics rather than text.
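Several of the search types listed above are switched on through `SearchOptions`. The following is a minimal sketch, assuming an existing index in the hypothetical folder `c:\MyIndex`; the query words and the mistake limit are illustrative values only.

```csharp
using System;
using GroupDocs.Search;
using GroupDocs.Search.Options;
using GroupDocs.Search.Results;

class SearchTypesSketch
{
    static void Main()
    {
        // Hypothetical folder containing a previously built index.
        Index index = new Index(@"c:\MyIndex");

        // Synonym search: also match synonyms of the query word.
        SearchOptions synonymOptions = new SearchOptions();
        synonymOptions.UseSynonymSearch = true;
        SearchResult synonymResult = index.Search("buy", synonymOptions);
        Console.WriteLine("Synonym search, occurrences: " + synonymResult.OccurrenceCount);

        // Spell check during search: correct up to one mistake in the query
        // and keep only the best matching corrections.
        SearchOptions spellingOptions = new SearchOptions();
        spellingOptions.SpellingCorrector.Enabled = true;
        spellingOptions.SpellingCorrector.MaxMistakeCount = 1;
        spellingOptions.SpellingCorrector.OnlyBestResults = true;
        SearchResult spellingResult = index.Search("documnet", spellingOptions);
        Console.WriteLine("Spell-checked search, occurrences: " + spellingResult.OccurrenceCount);
    }
}
```

Keeping a separate `SearchOptions` instance per search type makes it easier to see which setting produced which result.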
System Requirements
Supported Platforms/Versions | |
---|---|
Supported Operating Systems | |
Windows | Microsoft Windows 2003 Server (x64, x86), Microsoft Windows 2008 Server (x64, x86), Microsoft Windows 2012 Server (x64, x86), Microsoft Windows 2012 R2 Server (x64, x86), Microsoft Windows 2016 Server (x64, x86), Microsoft Windows 2019 Server (x64, x86), Microsoft Windows Vista (x64, x86), Microsoft Windows XP (x64, x86), Microsoft Windows 7 (x64, x86), Microsoft Windows 8, 8.1 (x64, x86), Microsoft Windows 10 (x64, x86) |
Linux | Linux (Ubuntu, OpenSUSE, CentOS, and others) |
Supported Frameworks | |
.NET Frameworks | .NET Framework 4.5, 4.5.1, 4.5.2, 4.6, 4.6.1, 4.6.2, 4.7, 4.7.1, 4.7.2, 4.8, .NET Standard 2.1, .NET Core 3.0, .NET Core 3.1, .NET 5.0, .NET 6.0 |
Development Environments | |
Visual Studio Versions | Microsoft Visual Studio 2012, 2013, 2015, 2017, 2019, 2022 |
Install via NuGet
Using Package Manager GUI
- Open your solution in Visual Studio.
- Go to Tools → NuGet Package Manager → Manage NuGet Packages for Solution.
- In the Browse tab, search for GroupDocs.Search.
- Click Install to add it to your project.
Using Package Manager Console
- Open your solution in Visual Studio.
- Go to Tools → NuGet Package Manager → Package Manager Console.
- Run the command: Install-Package GroupDocs.Search
- GroupDocs.Search will be referenced in your project.
Install from Official Website
- Download and unpack the ZIP or use the MSI installer from the official website.
- In Visual Studio, right-click References and select Add Reference.
- Browse and select GroupDocs.Search.dll, or choose from the installed components.
- Click OK to complete the reference.
Indexing Documents from URL Using GroupDocs.Search for .NET
Learn how to index a document from a URL using GroupDocs.Search for .NET. This example demonstrates lazy initialization of documents from a URL and indexing them for efficient search in .NET applications.
using System;
using System.IO;
using System.Net;
using GroupDocs.Search;
using GroupDocs.Search.Common;
using GroupDocs.Search.Options;

// Class to load a document from a URL with lazy initialization
private class DocumentLoaderFromUrl : IDocumentLoader
{
    private readonly string documentKey; // Document identifier (URL in this case)
    private readonly string url; // The URL to fetch the document from
    private readonly string extension; // The file extension of the document

    // Constructor to initialize document properties
    public DocumentLoaderFromUrl(string documentKey, string url, string extension)
    {
        this.documentKey = documentKey;
        this.url = url;
        this.extension = extension;
    }

    // Method to load the document from the URL stream
    public Document LoadDocument()
    {
        // Configure security protocols for web requests
        // (SSL 3.0 and TLS 1.0/1.1 are obsolete, so only TLS 1.2 is enabled)
        ServicePointManager.Expect100Continue = true;
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;

        // Create a web request to access the URL
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        {
            // Copy the response into memory so the document outlives the response stream
            MemoryStream memoryStream = new MemoryStream();
            stream.CopyTo(memoryStream);
            memoryStream.Position = 0;

            // Create a Document object from the memory stream
            Document document = Document.CreateFromStream(documentKey, DateTime.Now, extension, memoryStream);
            return document;
        }
    }

    // Method to close the document (empty in this case)
    public void CloseDocument()
    {
    }
}

// Define the index folder path where the index will be stored
string indexFolder = @"c:\MyIndex";

// Define the URL of the document to be indexed
string url = "http://example.com/ExampleDocument.pdf";

// Creating an index in the specified folder
Index index = new Index(indexFolder);

// Creating a document loader object to fetch the document from the URL
string documentKey = url;
IDocumentLoader documentLoader = new DocumentLoaderFromUrl(documentKey, url, ".pdf");

// Creating a lazy-initialized document object
Document document = Document.CreateLazy(DocumentSourceKind.Stream, documentKey, documentLoader);

// Prepare an array of documents for indexing
Document[] documents = new Document[] { document };

// Indexing options (default options in this case)
IndexingOptions options = new IndexingOptions();

// Add the lazy-loaded document to the index
index.Add(documents, options);
Homophone Search Using GroupDocs.Search for .NET
Learn how to perform homophone search using GroupDocs.Search for .NET. This code example demonstrates enabling homophone search to find similar-sounding words like "coal," "cole," and "kohl" in indexed documents.
// Specify the index folder path where the index will be created
string indexFolder = @"c:\MyIndex\";
// Specify the folder path containing documents to be indexed
string documentsFolder = @"c:\MyDocuments\";
// Creating an index in the specified folder
Index index = new Index(indexFolder);
// Adding documents to the index from the specified folder
index.Add(documentsFolder);
// Creating search options to enable homophone search
SearchOptions options = new SearchOptions();
options.UseHomophoneSearch = true; // Enabling homophone search
// Search for the word 'coal' in the indexed documents
// Homophone search will also find words that sound like 'coal', such as 'cole' and 'kohl'
SearchResult result = index.Search("coal", options);
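The example above stops at the search call. One way to inspect what was found is sketched below; it repeats the homophone search so that it stands on its own, and assumes the index in the hypothetical folder `c:\MyIndex\` has already been populated.

```csharp
using System;
using GroupDocs.Search;
using GroupDocs.Search.Options;
using GroupDocs.Search.Results;

class HomophoneResultsSketch
{
    static void Main()
    {
        // Hypothetical folder containing a previously built index.
        Index index = new Index(@"c:\MyIndex\");

        SearchOptions options = new SearchOptions();
        options.UseHomophoneSearch = true;

        SearchResult result = index.Search("coal", options);

        // Summary counts for the whole search.
        Console.WriteLine("Documents found: " + result.DocumentCount);
        Console.WriteLine("Total occurrences: " + result.OccurrenceCount);

        // List each matching document with its own occurrence count.
        for (int i = 0; i < result.DocumentCount; i++)
        {
            FoundDocument found = result.GetFoundDocument(i);
            Console.WriteLine(found.DocumentInfo.FilePath + ": " + found.OccurrenceCount);
        }
    }
}
```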
Perform Shard Optimization using GroupDocs.Search for .NET
Learn how to optimize shards in a search network using GroupDocs.Search for .NET. This example demonstrates how to improve search performance by minimizing the number of index segments on each shard through the optimization process.
// Inform the user that the optimization process is starting
Console.WriteLine("Optimizing shards");
// Access the Indexer class for the current search network node
Indexer indexer = node.Indexer; // Assuming 'node' is defined elsewhere in your search network
// Create optimization options
OptimizeOptions options = new OptimizeOptions();
// Perform the optimization process on all shards
indexer.Optimize(options); // This reduces the number of index segments on each shard
Tags
Aspose | GroupDocs | Advanced Document Search | Indexing API | .NET Search Library | Semantic Search API | Boolean Search | Fuzzy Search | Metadata Search | Entity Recognition API | Sentiment Analysis | Custom Entity Extraction | Document Classification API | Full-Text Search | Field Search | Regular Expressions Search | Proximity Search | Custom Search Ranking | Indexing Optimization | Distributed Search Network | Reverse Image Search | Search API | .NET Document Search | Document Indexing API | GroupDocs.Search for .NET | Text Search API | Search Results Highlighting | Document Metadata Search | Snippets Extraction | Wildcard Search | Search API for .NET
Product | Versions Compatible and additional computed target framework versions. |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
.NET Core | netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
.NET Standard | netstandard2.1 is compatible. |
.NET Framework | net45 is compatible. net451 was computed. net452 was computed. net46 was computed. net461 was computed. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
Tizen | tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
- .NETFramework 4.5
  - No dependencies.
- .NETStandard 2.1
  - Microsoft.Extensions.DependencyModel (>= 2.0.4)
  - Microsoft.Win32.Registry (>= 4.7.0)
  - SkiaSharp (>= 2.88.6)
  - SkiaSharp.NativeAssets.Linux.NoDependencies (>= 2.88.3)
  - System.CodeDom (>= 4.4.0)
  - System.Diagnostics.PerformanceCounter (>= 4.5.0)
  - System.Drawing.Common (>= 6.0.0)
  - System.Reflection.Emit (>= 4.7.0)
  - System.Reflection.Emit.ILGeneration (>= 4.7.0)
  - System.Security.Cryptography.Pkcs (>= 5.0.1)
  - System.Security.Permissions (>= 4.6.0)
  - System.Security.Principal.Windows (>= 4.7.0)
  - System.Text.Encoding.CodePages (>= 7.0.0)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.
Version | Downloads | Last updated |
---|---|---|
24.9.0 | 879 | 9/11/2024 |
24.8.0 | 1,068 | 8/22/2024 |
24.6.0 | 1,013 | 6/20/2024 |
24.5.0 | 637 | 5/14/2024 |
24.4.0 | 1,317 | 4/18/2024 |
24.3.0 | 2,811 | 3/20/2024 |
24.2.1 | 894 | 2/29/2024 |
24.2.0 | 695 | 2/28/2024 |
24.1.0 | 541 | 1/26/2024 |
23.12.0 | 2,001 | 12/4/2023 |
23.11.0 | 3,406 | 11/23/2023 |
23.10.1 | 2,001 | 10/20/2023 |
23.10.0 | 1,462 | 10/2/2023 |
23.6.0 | 4,441 | 6/15/2023 |
23.2.0 | 4,295 | 2/28/2023 |
22.11.0 | 2,348 | 11/24/2022 |
22.10.1 | 2,360 | 10/12/2022 |
22.10.0 | 1,447 | 10/7/2022 |
21.8.1 | 39,053 | 8/23/2021 |
21.8.0 | 1,550 | 8/18/2021 |
21.3.0 | 41,379 | 3/18/2021 |
21.2.0 | 26,869 | 2/18/2021 |
20.11.0 | 35,950 | 11/19/2020 |
20.8.0 | 68,924 | 8/17/2020 |
20.6.0 | 63,429 | 6/23/2020 |
20.4.0 | 66,165 | 4/15/2020 |
20.1.0 | 53,164 | 1/31/2020 |
19.10.1 | 57,659 | 11/6/2019 |
19.10.0 | 1,006 | 10/2/2019 |
19.5.1 | 844 | 7/15/2019 |
19.5.0 | 804 | 5/31/2019 |
19.3.0 | 842 | 3/6/2019 |
19.2.0 | 924 | 2/5/2019 |
18.12.0 | 1,087 | 12/11/2018 |
18.9.0 | 1,126 | 9/6/2018 |
18.8.0 | 1,303 | 8/8/2018 |
18.7.0 | 1,178 | 7/14/2018 |
18.6.0 | 1,244 | 6/14/2018 |
18.5.0 | 1,142 | 5/16/2018 |
18.4.0 | 1,295 | 4/9/2018 |
18.2.0 | 1,257 | 2/8/2018 |
18.1.0 | 1,253 | 1/9/2018 |
17.12.0 | 1,474 | 12/7/2017 |
17.11.0 | 1,241 | 11/9/2017 |
17.10.0 | 1,125 | 10/3/2017 |