Bytescout.PDFExtractor 13.1.1.4480

.NET Core 2.1 .NET Framework 2.0
Install-Package Bytescout.PDFExtractor -Version 13.1.1.4480
dotnet add package Bytescout.PDFExtractor --version 13.1.1.4480
<PackageReference Include="Bytescout.PDFExtractor" Version="13.1.1.4480" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.
paket add Bytescout.PDFExtractor --version 13.1.1.4480
The NuGet Team does not provide support for this client. Please contact its maintainers for support.
#r "nuget: Bytescout.PDFExtractor, 13.1.1.4480"
#r directive can be used in F# Interactive, C# scripting and .NET Interactive. Copy this into the interactive tool or source code of the script to reference the package.
// Install Bytescout.PDFExtractor as a Cake Addin
#addin nuget:?package=Bytescout.PDFExtractor&version=13.1.1.4480

// Install Bytescout.PDFExtractor as a Cake Tool
#tool nuget:?package=Bytescout.PDFExtractor&version=13.1.1.4480

Bytescout PDF Extractor SDK for .NET, ASP.NET, ActiveX - extract data from PDF documents

Product Versions
.NET net5.0 net5.0-windows net6.0 net6.0-android net6.0-ios net6.0-maccatalyst net6.0-macos net6.0-tvos net6.0-windows
.NET Core netcoreapp2.1 netcoreapp2.2 netcoreapp3.0 netcoreapp3.1
.NET Framework net20 net35 net40 net403 net45 net451 net452 net46 net461 net462 net463 net47 net471 net472 net48

NuGet packages (1)

Showing the top 1 NuGet packages that depend on Bytescout.PDFExtractor:

Package Downloads
BizDoc.Applications.Invoice-Scan

Invoice for BizDoc

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last updated
13.1.1.4480 59 5/25/2022
13.1.0.4386 3,233 1/25/2022
13.0.1.4281 829 11/8/2021
13.0.0.4254 339 10/4/2021
12.1.5.4183 912 7/5/2021
12.1.5.4181 269 7/5/2021
12.1.4.4171 364 6/17/2021
12.1.4.4169 202 6/17/2021
12.1.3.4167 239 6/16/2021
12.1.2.4156 323 5/28/2021
12.1.1.4149 253 5/26/2021
12.1.1.4145 246 5/26/2021
12.1.0.4136 320 5/18/2021
12.0.0.4062 843 2/8/2021
11.3.0.3983 1,070 10/26/2020
11.2.1.3959 1,677 9/1/2020
11.2.1.3929 896 7/14/2020
11.2.1.3926 477 7/9/2020
11.2.0.3919 495 6/30/2020
11.1.0.3869 2,781 4/10/2020
11.1.0.3864 591 4/4/2020
11.1.0.3849 649 3/27/2020
11.1.0.3845 633 3/19/2020
11.0.0.3834 753 3/6/2020
11.0.0.3832 514 3/4/2020
11.0.0.3830 493 3/4/2020
11.0.0.3815 675 2/21/2020
11.0.0.3805 809 2/11/2020
10.8.0.3758 1,555 12/19/2019
10.8.0.3750 558 12/17/2019
10.8.0.3744 499 12/12/2019
10.8.0.3741 480 12/10/2019
10.8.0.3736 595 12/6/2019
10.8.0.3732 581 12/4/2019
10.7.2.3710 932 11/13/2019
10.7.1.3705 521 11/11/2019
10.7.0.3697 638 11/2/2019
10.6.0.3666 1,496 10/1/2019
10.5.0.3637 1,277 9/2/2019
10.4.0.3618 979 8/15/2019
10.4.0.3613 581 8/13/2019
10.4.0.3602 680 8/7/2019
10.3.0.3566 1,190 7/2/2019
10.2.0.3548 1,585 6/13/2019
10.2.0.3534 554 6/11/2019
10.2.0.3525 595 6/7/2019
10.2.0.3514 585 5/28/2019
10.1.0.3444 1,153 4/5/2019
10.1.0.3439 590 4/4/2019
10.0.0.3429 654 3/25/2019
10.0.0.3427 556 3/25/2019
10.0.0.3424 564 3/23/2019
10.0.0.3423 578 3/23/2019
10.0.0.3422 593 3/23/2019
10.0.0.3421 633 3/21/2019
9.4.0.3398 704 3/12/2019
9.3.0.3366 1,705 2/12/2019
9.3.0.3357 677 2/4/2019
9.3.0.3354 582 1/31/2019
9.2.0.3293 1,698 11/20/2018
9.2.0.3262 919 10/24/2018
9.2.0.3259 643 10/24/2018
9.1.0.3170 1,374 7/26/2018
9.1.0.3167 894 7/18/2018
9.1.0.3165 774 7/18/2018
9.1.0.3163 844 7/18/2018
9.0.0.3095 2,001 4/23/2018
9.0.0.3087 1,110 4/13/2018
9.0.0.3080 925 4/11/2018
8.8.1.3046 1,451 2/20/2018
8.8.1.3025 1,706 1/29/2018
8.8.0.3021 958 1/23/2018
8.7.0.2981 4,266 11/8/2017
8.6.0.2917 1,889 8/2/2017
8.6.0.2912 830 8/1/2017
8.5.0.2863 1,103 6/9/2017
8.5.0.2861 996 6/8/2017
8.5.0.2856 964 6/1/2017
8.4.1.2829 5,168 4/12/2017
8.4.0.2821 948 3/29/2017
8.3.0.2809 1,343 3/13/2017
8.3.0.2806 863 3/12/2017
8.3.0.2803 876 3/6/2017
8.3.0.2801 845 3/6/2017
8.3.0.2800 851 3/6/2017
8.3.0.2798 835 3/6/2017
8.3.0.2796 850 3/6/2017
8.3.0.2794 859 3/6/2017
8.2.0.2699 1,290 1/11/2017
8.1.1.2606 1,803 10/25/2016
8.1.0.2600 918 10/21/2016
8.0.0.2542 1,210 9/1/2016
8.0.0.2541 891 9/1/2016
8.0.0.2528 946 8/23/2016
8.0.0.2523 966 8/19/2016
7.0.0.2493 27,809 6/27/2016
7.0.0.2489 843 6/27/2016
7.0.0.2480 2,067 6/10/2016
7.0.0.2474 1,337 5/26/2016
6.30.0.2421 1,084 3/24/2016
6.20.0.2354 1,102 1/20/2016
6.12.0.2239 4,099 9/22/2015
5.20.0.1871 1,628 2/5/2015
5.0.0.1626 1,634 8/14/2014
4.0.0.1487 1,143 5/31/2014
3.40.0.1349 1,323 3/11/2014
3.20.0.1092 1,307 8/5/2013
3.20.0.1075 2,320 7/12/2013
3.10.0.1051 1,301 6/29/2013
3.0.0.839 1,257 3/26/2013
2.50.0.769 1,279 2/25/2013

Bytescout PDF Extractor SDK for .NET, ASP.NET, ActiveX.

ByteScout, Inc. (c) 2008-2022.

Compatibility: .NET Framework 2.0 or later; .NET Core 2.1 or later.
Works with: .NET, ASP.NET, ActiveX, Visual Basic 6, Classic ASP, Delphi and others.

Features:

- Extracts data from PDF files in TXT, CSV, XML, XLS, XLSX, JSON formats;
- Extracts embedded images, files and attachments from PDF files;
- Splits and merges PDF files, extracts a single page or range of pages;
- Extracts data from a whole document page or a specified rectangular region;
- Extracts PDF document information (author, subject, producer, etc.);
- Detects tables;
- Searches text inside document with regex support;
- Extracts data from PDF forms;
- Reads text from scanned PDF documents using OCR (Optical Character Recognition);
- Provides an ActiveX interface for use from legacy programming languages (Visual Basic 6, Delphi) and scripting languages (VBScript, JScript and others);
- And much more...

History of changes:

13.1.0.4386 (January 24, 2022)
==============================
+ DocumentMerger: Added property 'MergedDocumentTitle' that overrides the title of the merged document.
+ XLSExtractor: Added property 'CustomColumnWidths' that specifies exact column widths in the generated Excel spreadsheet.
= JSONExtractor: The mode 'OutputStructure.Full' is renamed to 'OutputStructure.LegacyFixed' and made maximally compatible in field names with the mode 'OutputStructure.Legacy'.
+ Added support for UniKS-UCS2-H text encoding.
+ InfoExtractor: Added method 'GetFormFields()' returning information about form fields in PDF document.
= Improved COM/ActiveX interfaces for in-memory processing without file operations.
+ Extractors and SearchablePDFMaker: Added property 'OCRDisableAutoSegmentation' to solve OCR engine's segmentation issues.
= .NET Core min required version is 2.1 now (was 2.0).
- Line grouping was not affected by 'ConsiderFontSizes' and 'ConsiderFontColors' properties. Fixed now.
- Fixed disposing issue in 'SearchablePDFMaker'.
= Improved parsing of PDF documents.
= Other minor fixes and improvements.

13.0.0.4253 (October 4, 2021)
=============================
+ New column detection mode 'ColumnDetectionMode.ContentGroupsAI' that works better on tables without borders and on pages with multiple tables.
= Greatly improved tables detection in 'TableDetector2'.
= Improved filtering of shadow-like text ('ExtractShadowLikeText' option).
= Improved the 'LineGroupingMode.JoinOrphanedRows'.
= 'DocumentMerger': Improved merging of PDF forms. Now it can link fields with matching names or rename them to avoid unwanted linking. See the property 'RenameMatchingFieldsDuringMerge'.
= 'JSONExtractor' and 'XMLExtractor' now output the page size for each page.
= All extractor classes now support extraction of page ranges.
+ Added properties 'DetectUnderlineTextStyle' and 'DetectStrikeoutTextStyle' to 'CSVExtractor' and 'XLSExtractor'. They help prevent underlined text from affecting the line grouping in table cells.
= Improved background color detection for the option 'ConsiderBackgroundColors'.
+ Added property 'NormalizeText' to all extractors. It replaces Unicode spaces and hyphens in the extracted text with normal ' ' and '-' characters.
- 'Remover2': fixed handling of PDF page rotation.
- 'Remover2': making pages unsearchable is now performed only for edited pages.
+ 'XMLExtractor': Added property 'IndentedXML' to control indentation.
+ 'JSONExtractor': Added property 'IndentedJSON' to control indentation.
- 'Stamper': fixed stamping of rotated pages.
+ Added new OCR mode - 'OCRMode.AutoRepairFonts'. It automatically tries to detect PDF documents with corrupted text and forces OCR font repair for them. Works only for English texts.
+ Added property 'PageSeparator' to CSV and XLS extractors.
= 'XLSExtractor': improved negative numbers detection.
- 'TextExtractor.FindAll()' method was ignoring the case sensitivity option. Fixed now.
+ Added property 'OCRDetectLines' that helps to detect table structure in scanned documents.
+ 'JSONExtractor' and 'XMLExtractor' now output the number of pages in the result and the number of pages for which OCR was performed.
+ Added property 'OCRPageCount' to extractors. It contains the number of pages for which OCR was performed during the last extraction.
+ 'JSONExtractor': Added property 'OutputStructure' that selects the structure of the output JSON.
+ 'JSONExtractor': Added property 'OutputTransformation' that applies a JSONPath expression to the output JSON.
= Performance improvements.
= Improved parsing of PDF documents.
= Other minor fixes and improvements.
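
The 'NormalizeText' entry above describes replacing Unicode spaces and hyphens with plain ' ' and '-' characters. The idea can be illustrated with a minimal Python sketch. This is not the SDK's implementation: the function name `normalize_text` is hypothetical, and the two character sets below are assumptions, since the changelog does not list which code points the SDK actually handles.

```python
# Concept sketch only; not Bytescout code.
# Assumption: these code points stand in for "Unicode spaces and hyphens".
UNICODE_SPACES = (
    "\u00a0"  # no-break space
    "\u2000\u2001\u2002\u2003\u2004\u2005"
    "\u2006\u2007\u2008\u2009\u200a"  # en quad through hair space
    "\u202f"  # narrow no-break space
    "\u205f"  # medium mathematical space
    "\u3000"  # ideographic space
)
UNICODE_HYPHENS = (
    "\u2010"  # hyphen
    "\u2011"  # non-breaking hyphen
    "\u2012"  # figure dash
    "\u2013"  # en dash
    "\u2212"  # minus sign
)

# Build one translation table mapping each code point to its ASCII stand-in.
_TABLE = {ord(ch): " " for ch in UNICODE_SPACES}
_TABLE.update({ord(ch): "-" for ch in UNICODE_HYPHENS})


def normalize_text(text):
    """Return text with the listed spaces and hyphens replaced by ' ' and '-'."""
    return text.translate(_TABLE)
```

Normalizing this way makes downstream string comparison and regex search more predictable, because visually identical characters no longer differ by code point.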

12.1.0.4136 (May 18, 2021)
==========================
+ Added property 'TextExtractor.FuzzySearch' that enables a 'fuzzy' text search algorithm. It finds 'approximately equal' strings.
+ Added 'DocumentSplitter2' class that splits document by found text.
+ Added 'CSVExtractor.NormalizeCSV' property. It makes CSV data produced from different document pages contain the same number of columns.
+ Added property 'JSONExtractor.OutputStructure' that changes the structure of the generated JSON to one of the predefined variants for easier postprocessing.
+ Added property 'JSONExtractor.OutputTransformation' that applies a JSONPath expression to the generated JSON.
+ Added property 'OCRPageCount' to extractor classes. It contains the number of pages for which OCR was performed.
+ 'JSONExtractor' and 'XMLExtractor' now add to the generated JSON and XML result the number of processed pages and the number of pages for which OCR was performed.
+ Added property 'OCRDetectLines' to extractor classes that improves column detection in scanned documents.
+ Added property 'ConsiderBackgroundColors' to extractor classes that enables detection of background color under text objects. It may help to improve row and column detection in tables without borders but with color stripes.
+ Added properties 'DocumentMerger.GenerateBookmarks' and 'DocumentMerger.BookmarkTitles' to enable automatic generation of bookmarks pointing to the merged parts.
= Improved PDF optimization in 'DocumentSplitter'.
= 'DocumentMerger' now uses the first input document as the base for the merged document. This preserves document information properties and outlines.
= DocumentMerger: added support for profiles.
= MultimediaExtractor: added support for more media types.
- 'TextExtractor.FindAll()' method was ignoring the case sensitivity option.
- Fixed issue with junk empty temporary files generated during OCR.
= Improved parsing of PDF documents.
= Other minor fixes and improvements.
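
The 'CSVExtractor.NormalizeCSV' entry above describes making CSV data from different pages contain the same number of columns. A minimal Python sketch of that idea follows. It is an illustration under assumptions, not the SDK's code: the function name `normalize_csv` is hypothetical, and padding short rows with empty trailing cells is one plausible strategy that the changelog does not confirm.

```python
from __future__ import annotations

import csv
import io


def normalize_csv(pages, delimiter=","):
    """Pad every row of every page to the widest row found across all pages.

    `pages` is a list of CSV strings, one per document page (assumed layout).
    Returns a new list of CSV strings with a uniform column count.
    """
    # Parse each page into a list of rows.
    parsed = [list(csv.reader(io.StringIO(page), delimiter=delimiter)) for page in pages]

    # The target width is the longest row seen on any page.
    width = max((len(row) for rows in parsed for row in rows), default=0)

    normalized = []
    for rows in parsed:
        buffer = io.StringIO()
        writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
        for row in rows:
            # Append empty cells so the row reaches the common width.
            writer.writerow(row + [""] * (width - len(row)))
        normalized.append(buffer.getvalue())
    return normalized
```

A uniform column count lets the per-page outputs be concatenated and loaded as one table without ragged rows.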

12.0.0.4062 (February 8, 2021)
==============================
+ Added public 'BaseExtractor.ExtractionArea' property (in addition to 'SetExtractionArea()' method) for more intuitive use.
= Added the new property 'ColumnDetectionByTextAlignment' to extractors. It affects the detection of table columns that have no separating lines between them.
+ Added support for simplified profiles.
+ DocumentOptimizer: Added the property 'OptimizationOptions.GrayscaleImages' that converts all color images to grayscale.
+ UnsearchablePDFMaker: Added the new property 'KeepSkippedPages' that keeps pages excluded from the processing in the output document.
+ UnsearchablePDFMaker: Added the new property 'Grayscale' that converts all processed pages to grayscale.
+ Added the property 'BaseTextExtractor.TextAnalysisCorruptedTextThreshold' to fine-tune the text analysis.
= Member names in profiles are case-insensitive now.
= Improved filtering of invisible objects.
= Improved detection of bold fonts.
= Improved OCR rotation detection.
= Added missing OCR mode 'OCRMode.TextFromVectorsAndRepairedFonts'.
= RTL fonts detection is now enabled by default.
= JSON extractor now generates clean JSON (without the @ and # characters for attributes).
= Improved support for external Chinese fonts.
= Improved positioning of rotated PDF objects.
= Damaged CCITT and JBIG2 images are now skipped during rendering to avoid crashes.
= SearchablePDFMaker: improved OCR when 'DiscardExistingDocumentText' is enabled.
= 'SearchablePDFMaker.GetPageOCRCells()' now detects text color.
= OCR in all extractors now detects text color if the 'ConsiderFontColors' property is enabled.
= 'LineGroupingMode.JoinOrphanedRows' now separates rows of different color if 'ConsiderFontColors' property is enabled.
- InfoExtractor: Fixed a crash if the input document is an image.
- Fixed OCR crash on rotated text.
- 'IsOCRRecommendedForPage()' now skips text objects outside the page crop box.
= Improved parsing of PDF documents.
= Other minor fixes and improvements.

...