Bytescout.PDFExtractor 12.0.0.4062

Bytescout PDF Extractor SDK for .NET, ASP.NET, ActiveX - extract data from PDF documents

Package Manager:
Install-Package Bytescout.PDFExtractor -Version 12.0.0.4062

.NET CLI:
dotnet add package Bytescout.PDFExtractor --version 12.0.0.4062

PackageReference (for projects that support PackageReference, copy this XML node into the project file):
<PackageReference Include="Bytescout.PDFExtractor" Version="12.0.0.4062" />

Paket CLI:
paket add Bytescout.PDFExtractor --version 12.0.0.4062

Script & Interactive (the #r directive can be used in F# Interactive, C# scripting and .NET Interactive; copy it into the interactive tool or the script source):
#r "nuget: Bytescout.PDFExtractor, 12.0.0.4062"

Cake:
// Install Bytescout.PDFExtractor as a Cake Addin
#addin nuget:?package=Bytescout.PDFExtractor&version=12.0.0.4062

// Install Bytescout.PDFExtractor as a Cake Tool
#tool nuget:?package=Bytescout.PDFExtractor&version=12.0.0.4062

Release Notes

Bytescout PDF Extractor SDK for .NET, ASP.NET, ActiveX.

ByteScout, Inc. (c) 2008-2020.

Compatibility: .NET Framework 2.0 or later; .NET Core 2.0 or later.
Works with: .NET, ASP.NET, ActiveX, Visual Basic 6, Classic ASP, Delphi and others.

Features:

- Extracts data from PDF files in TXT, CSV, XML, XLS, XLSX, JSON formats;
- Extracts embedded images, files and attachments from PDF files;
- Splits and merges PDF files, extracts a single page or range of pages;
- Extracts data from a whole document page or a specified rectangular region;
- Extracts PDF document information (author, subject, producer, etc.);
- Detects tables;
- Searches text inside a document with regex support;
- Extracts data from PDF forms;
- Reads text from scanned PDF documents using OCR (Optical Character Recognition);
- Provides an ActiveX interface for use from legacy programming languages (Visual Basic 6, Delphi) and scripting languages (VBScript, JScript and others);
- And much more...

History of changes:

12.0.0.4062 (February 8, 2021)
==============================
+ Added public 'BaseExtractor.ExtractionArea' property (in addition to 'SetExtractionArea()' method) for more intuitive use.
= Added the new property 'ColumnDetectionByTextAlignment' to extractors; it affects the detection of table columns that have no separator lines between them.
+ Added support for simplified profiles.
+ DocumentOptimizer: Added the property 'OptimizationOptions.GrayscaleImages' that converts all color images to grayscale.
+ UnsearchablePDFMaker: Added the new property 'KeepSkippedPages' that keeps pages excluded from the processing in the output document.
+ UnsearchablePDFMaker: Added the new property 'Grayscale' that converts all processed pages to grayscale.
+ Added the property 'BaseTextExtractor.TextAnalysisCorruptedTextThreshold' to fine-tune the text analysis.
= Member names in profiles are case-insensitive now.
= Improved filtering of invisible objects.
= Improved detection of bold fonts.
= Improved OCR rotation detection.
= Added missing OCR mode 'OCRMode.TextFromVectorsAndRepairedFonts'.
= RTL fonts detection is now enabled by default.
= JSON extractor now generates clean JSON (without the @ and # characters for attributes).
= Improved support for external Chinese fonts.
= Improved positioning of rotated PDF objects.
= Damaged CCITT and JBIG2 images are now skipped during rendering to avoid crashes.
= SearchablePDFMaker: improved OCR when 'DiscardExistingDocumentText' is enabled.
= 'SearchablePDFMaker.GetPageOCRCells()' now detects text color.
= OCR in all extractors now detects text color if the 'ConsiderFontColors' property is enabled.
= 'LineGroupingMode.JoinOrphanedRows' now separates rows of different color if 'ConsiderFontColors' property is enabled.
- InfoExtractor: Fixed a crash if the input document is an image.
- Fixed OCR crash on rotated text.
- 'IsOCRRecommendedForPage()' now skips text objects outside the page crop box.
= Improved parsing of PDF documents.
= Other minor fixes and improvements.

11.3.0.3983 (October 26, 2020)
==============================
+ DocumentSplitter: Added support for ranges with inverted page numbers. For example, "!1" means "the last page", "!1-!3" or "!3-" means "last three pages".
+ DocumentSplitter: Added support for "*" split range that means "split every single page".
+ Added 'InfoExtractor.Metadata' property that gets XMP metadata from the document.
= Improved joining of multi-line cells in tables without borders ('LineGroupingMode.JoinOrphanedRows' mode).
= Improved detection of OCR language file versions.
= Improved .NET Core 2.0 compatibility.
= Improved unwrapping of multi-line cell text.
- Fixed issue when invisible vector drawings were causing unwanted separation of text objects.
- Fixed extraction from area when running OCR against image file (not PDF!).
= Improved parsing of PDF documents.
- Other minor fixes and improvements.
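
The DocumentSplitter range notation described in this release (inverted page numbers such as "!1", open-ended ranges such as "!3-", and the "*" range) can be illustrated with a small resolver. This is only a sketch of the notation as the notes describe it, not the SDK's own parser: the function names 'resolve_page' and 'resolve_ranges', the comma-separated list of ranges, and the rule that endpoint order does not matter are all assumptions made for illustration.

```python
def resolve_page(token, page_count):
    """Turn "5" into page 5 and "!1" into the last page (1-based)."""
    if token.startswith("!"):
        # "!n" counts from the end of the document: "!1" is the last page.
        page = page_count - int(token[1:]) + 1
    else:
        page = int(token)
    if not 1 <= page <= page_count:
        raise ValueError(f"page {token!r} is outside 1..{page_count}")
    return page


def resolve_ranges(spec, page_count):
    """Expand a range spec into a list of parts, each a list of page numbers."""
    if spec.strip() == "*":
        # "*" means every single page becomes its own part.
        return [[p] for p in range(1, page_count + 1)]
    parts = []
    for item in spec.split(","):  # comma-separated ranges are an assumption
        item = item.strip()
        if "-" in item:
            start_token, end_token = item.split("-", 1)
            # An open end such as "!3-" runs to the last page.
            start = resolve_page(start_token, page_count) if start_token else 1
            end = resolve_page(end_token, page_count) if end_token else page_count
            # Assumption: endpoint order does not matter, so "!1-!3" equals "!3-!1".
            low, high = sorted((start, end))
            parts.append(list(range(low, high + 1)))
        else:
            parts.append([resolve_page(item, page_count)])
    return parts


if __name__ == "__main__":
    print(resolve_ranges("!1", 10))     # [[10]]
    print(resolve_ranges("!1-!3", 10))  # [[8, 9, 10]]
    print(resolve_ranges("!3-", 10))    # [[8, 9, 10]]
    print(resolve_ranges("*", 3))       # [[1], [2], [3]]
```

For a 10-page document, both "!1-!3" and "!3-" resolve to pages 8 to 10 under these assumptions, matching the "last three pages" reading given in the notes.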

11.2.0.3919 (June 20, 2020)
===========================
+ 'MultimediaExtractor' now supports extraction of 3D-animation objects.
- 'TextExtractor.Find()' now keeps original font names in found object information.
= Improved column detection in 'ColumnDetectionMode.Borders' mode.
- 'SearchablePDFMaker' did not process vector-only pages. Fixed now.
= Improved regex text search in 'TextExtractor'.
+ Added 'DetectUnderlineTextStyle' and 'DetectStrikeoutTextStyle' properties to 'JSONExtractor' and 'XMLExtractor'.
+ Added 'OCRWhiteList' and 'OCRBlackList' properties to extractors.
+ Added 'Invert' OCR preprocessing filter.
+ Added 'Scale' OCR preprocessing filter.
= Improved joining of multi-line cells in tables without borders ('LineGroupingMode.JoinOrphanedRows' mode).
= Improved performance of 'ImageExtractor'.
+ Added page rectangles to 'InfoExtractor'.
= Improved 'OCRAnalyzer'.
= Improved automatic deletion of duplicated text objects during the extraction.
- Fixed extraction issues in .NET Core version.
= Improved parsing of PDF documents.
- Other minor fixes and improvements.

11.1.0.3845 (March 19, 2020)
============================
+ Added 'OCROverallConfidence' property to all extractors.
+ SearchablePDFMaker: Added 'KeepOriginalRotation' property.
- SearchablePDFMaker: fixed crash on mixed English-Arabic text recognition.
+ PDF Multitool: Added "Developer Tools" sub-menu to the context menu.
= Improved parsing of PDF documents.
- Other minor fixes and improvements.

11.0.0.3805 (February 11, 2020)
===============================
+ Added support for new revision of PDF encryption (ISO 32000-2:2017 compliance).
+ Added 'LicenseInfo' property providing detailed information about your license.
+ Added 'Grayscale' filter to OCRImagePreprocessingFilters.
= Dramatically improved column extraction for multiple tables on a page. Works only in 'ColumnDetectionMode.Borders' mode for tables with borders between columns and rows.
= Greatly improved 'ColumnDetectionMode.BorderedTables'. As in the table detection, it now uses optical recognition to detect bordered tables and their columns on scanned documents.
= Improved 'InfoExtractor' to return the encrypted and password-protected states without asking for a password or throwing an exception.
= Added document permissions information to 'InfoExtractor'.
= DocumentSplitter: added zero-padding to page numbers in generated file names.
= Improved extraction of duplicated text (shadow-like effect).
= Improved 'MultimediaExtractor'.
- Fixed text search issues on some documents.
- Fixed a bug that corrupted extracted text, occurring only during multi-threaded processing.
- Fixed crash on subsequent extractions with different OCR modes.
- Fixed .NET Core compatibility issue.
= Improved parsing of PDF documents.
- Other minor fixes and improvements.
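
The zero-padding of page numbers in DocumentSplitter file names mentioned above matters mainly for sorting: padded names list in page order, unpadded ones do not. The sketch below shows the general idea only. The name pattern "<base>.page<NNN>.pdf" and the function 'part_file_name' are hypothetical and are not taken from the SDK, whose actual naming scheme is not stated in these notes.

```python
def part_file_name(base, page, page_count):
    """Build a file name whose page number is zero-padded (hypothetical pattern)."""
    # Pad to the number of digits in the highest page number.
    width = len(str(page_count))
    return f"{base}.page{page:0{width}d}.pdf"


if __name__ == "__main__":
    padded = [part_file_name("report", p, 12) for p in range(1, 13)]
    unpadded = [f"report.page{p}.pdf" for p in range(1, 13)]

    # Padded names already sort in page order; unpadded names do not,
    # because "page10" sorts before "page2" as plain text.
    print(sorted(padded) == padded)      # True
    print(sorted(unpadded) == unpadded)  # False
```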

...

Version History

Version Downloads Last updated
12.0.0.4062 461 2/8/2021
11.3.0.3983 795 10/26/2020
11.2.1.3959 680 9/1/2020
11.2.1.3929 579 7/14/2020
11.2.1.3926 234 7/9/2020
11.2.0.3919 305 6/30/2020
11.1.0.3869 2,475 4/10/2020
11.1.0.3864 394 4/4/2020
11.1.0.3849 452 3/27/2020
11.1.0.3845 435 3/19/2020
11.0.0.3834 553 3/6/2020
11.0.0.3832 308 3/4/2020
11.0.0.3830 276 3/4/2020
11.0.0.3815 478 2/21/2020
11.0.0.3805 540 2/11/2020
10.8.0.3758 1,233 12/19/2019
10.8.0.3750 365 12/17/2019
10.8.0.3744 320 12/12/2019
10.8.0.3741 267 12/10/2019
10.8.0.3736 384 12/6/2019
10.8.0.3732 335 12/4/2019
10.7.2.3710 682 11/13/2019
10.7.1.3705 315 11/11/2019
10.7.0.3697 435 11/2/2019
10.6.0.3666 1,109 10/1/2019
10.5.0.3637 1,067 9/2/2019
10.4.0.3618 764 8/15/2019
10.4.0.3613 383 8/13/2019
10.4.0.3602 435 8/7/2019
10.3.0.3566 980 7/2/2019
10.2.0.3548 1,129 6/13/2019
10.2.0.3534 348 6/11/2019
10.2.0.3525 368 6/7/2019
10.2.0.3514 400 5/28/2019
10.1.0.3444 813 4/5/2019
10.1.0.3439 393 4/4/2019
10.0.0.3429 458 3/25/2019
10.0.0.3427 371 3/25/2019
10.0.0.3424 375 3/23/2019
10.0.0.3423 359 3/23/2019
10.0.0.3422 362 3/23/2019
10.0.0.3421 413 3/21/2019
9.4.0.3398 503 3/12/2019
9.3.0.3366 851 2/12/2019
9.3.0.3357 525 2/4/2019
9.3.0.3354 423 1/31/2019
9.2.0.3293 1,433 11/20/2018
9.2.0.3262 743 10/24/2018
9.2.0.3259 480 10/24/2018
9.1.0.3170 1,158 7/26/2018
9.1.0.3167 658 7/18/2018
9.1.0.3165 561 7/18/2018
9.1.0.3163 611 7/18/2018
9.0.0.3095 1,746 4/23/2018
9.0.0.3087 861 4/13/2018
9.0.0.3080 676 4/11/2018
8.8.1.3046 1,131 2/20/2018
8.8.1.3025 1,363 1/29/2018
8.8.0.3021 702 1/23/2018
8.7.0.2981 2,363 11/8/2017
8.6.0.2917 1,664 8/2/2017
8.6.0.2912 633 8/1/2017
8.5.0.2863 863 6/9/2017
8.5.0.2861 731 6/8/2017
8.5.0.2856 723 6/1/2017
8.4.1.2829 4,945 4/12/2017
8.4.0.2821 719 3/29/2017
8.3.0.2809 1,073 3/13/2017
8.3.0.2806 652 3/12/2017
8.3.0.2803 661 3/6/2017
8.3.0.2801 630 3/6/2017
8.3.0.2800 635 3/6/2017
8.3.0.2798 623 3/6/2017
8.3.0.2796 642 3/6/2017
8.3.0.2794 639 3/6/2017
8.2.0.2699 1,046 1/11/2017
8.1.1.2606 1,540 10/25/2016
8.1.0.2600 709 10/21/2016
8.0.0.2542 905 9/1/2016
8.0.0.2541 686 9/1/2016
8.0.0.2528 731 8/23/2016
8.0.0.2523 689 8/19/2016
7.0.0.2493 25,458 6/27/2016
7.0.0.2489 634 6/27/2016
7.0.0.2480 1,404 6/10/2016
7.0.0.2474 1,037 5/26/2016
6.30.0.2421 874 3/24/2016
6.20.0.2354 899 1/20/2016
6.12.0.2239 3,656 9/22/2015
5.20.0.1871 1,395 2/5/2015
5.0.0.1626 1,423 8/14/2014
4.0.0.1487 938 5/31/2014
3.40.0.1349 1,071 3/11/2014
3.20.0.1092 1,084 8/5/2013
3.20.0.1075 1,797 7/12/2013
3.10.0.1051 947 6/29/2013
3.0.0.839 1,034 3/26/2013
2.50.0.769 1,047 2/25/2013