Archive for the ‘PDF Library’ category

Encryption API changes in Docotic.Pdf 7.5

We have released Docotic.Pdf 7.5 on our site and on NuGet.

This release brings many changes and improvements to the library’s encryption API. There is also one more very important change: the library can now extract right-to-left and bidirectional text in the correct order.

Starting from version 7.5, the library can inspect and decrypt certificate-protected documents. It is also now possible to encrypt any PDF document with one or more certificates.

The new features required us to make a lot of changes to the existing encryption API. We added new classes for different types of encryption and decryption handlers. There is also a new, clearer way to check whether a PDF document is encrypted. To ease migration from the older API, we added the 2021 Encryption API Migration Guide.

We changed the text extraction methods in PdfDocument, PdfPage, and PdfCanvas to extract right-to-left and bidirectional text in logical order. From now on, these methods also normalize Hebrew and Arabic codepoints from the Alphabetic Presentation Forms and Arabic Presentation Forms Unicode blocks. The text extraction methods also handle column-based and tabular layouts better.
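To illustrate what “logical order” and presentation-form normalization mean, here is a minimal Python sketch. It is not how Docotic.Pdf works internally: the Glyph class and rtl_line_to_logical function are hypothetical, the sketch assumes a line consisting of a single right-to-left run, and it uses Unicode NFKC normalization as one standard way to map presentation forms back to base letters. Real bidirectional text requires the full Unicode Bidirectional Algorithm (UAX #9).

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    """Hypothetical glyph: its decoded character and horizontal page position."""
    char: str
    x: float


def rtl_line_to_logical(glyphs: list[Glyph]) -> str:
    """Return a purely right-to-left line in logical (reading) order.

    Simplification: the whole line is assumed to be one RTL run.
    """
    # In a right-to-left script, the first character read is the rightmost one.
    ordered = sorted(glyphs, key=lambda g: g.x, reverse=True)
    text = "".join(g.char for g in ordered)
    # NFKC maps presentation forms (e.g. U+FE91, U+FEFC) to base letters.
    return unicodedata.normalize("NFKC", text)


# Hebrew "shalom" as painted left to right on the page: final mem first.
shalom = [Glyph("\u05DD", 0.0), Glyph("\u05D5", 10.0),
          Glyph("\u05DC", 20.0), Glyph("\u05E9", 30.0)]
print(rtl_line_to_logical(shalom))
```

Sorting by position alone would break for mixed Hebrew and Latin text or embedded numbers, which is exactly the case the bidirectional algorithm exists to handle.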

The new version also includes several smaller features. We added new code samples, updated some existing ones, and fixed a number of bugs.

Many properties and methods are marked obsolete in the new version. For each of them, there is a new way to achieve the same result.

Read about all new features and improvements in Docotic.Pdf 7.5 in the Version History document.

We encourage you to download and try the new version. This version is also available on NuGet.

Please tell us your thoughts about the new version by e-mail or via the support form. Don’t hesitate to send us your questions, suggest features, or ask for help.


Image compression improvements in Docotic.Pdf 7.4

Docotic.Pdf 7.4 is now available on our site and on NuGet.

The new release adds the ability to recompress images with stencil and soft masks. It is also now possible to resize masked images, and you can now use the JPEG 2000 compression scheme when resizing images. The new version compresses images with Indexed or Gray color spaces more efficiently. We updated the Compress PDF document in .NET and Optimize PDF images in C# and VB.NET code samples to use the latest recommended image optimization approaches.

The new version extracts text faster. We made some important changes to improve performance in this area. Thanks to some of our customers for sending in great test files!

With Docotic.Pdf 7.4, it is possible to add a timestamp to any digital signature. It is also possible to retrieve and verify embedded timestamps from existing signatures. To illustrate the changes, we added the new Sign PDF document and embed a timestamp in C# and VB.NET code sample. We also updated the existing Read PDF signature properties in C# and VB.NET and Verify PDF signature in C# and VB.NET code samples with the new timestamping-related features.

This release contains bug fixes for processing of Lab, Indexed, and Separation color spaces, as well as fixes for text measurement, drawing, and extraction. The new version contains other important bug fixes, too.

Read about all new features and improvements in Docotic.Pdf 7.4 in the Version History document.

We encourage you to download and try the new version. This version is also available on NuGet.

Please tell us your thoughts about the new version by e-mail or via the support form. Don’t hesitate to send us your questions, suggest features, or ask for help.


Fixes for handling of disposable objects in Docotic.Pdf 7.3

We released Docotic.Pdf 7.3 on our site and on NuGet.

In this release, we fixed some parts of the library that didn’t properly dispose streams. These are quite important fixes, and therefore we recommend that everyone update to the latest version of the library.

With the new release, we are moving closer to our goal of removing the System.Drawing and GDI+ dependencies from Docotic.Pdf completely. Starting from version 7.3, the library no longer uses System.Drawing and GDI+ when resizing images, detecting which parts of text are invisible, or processing certain soft mask images. We also marked some methods, constructors, and operators that depend on System.Drawing types as obsolete. For every entity that is now obsolete, the library provides another way to achieve the same result.

Docotic.Pdf 7.3 can be used from Blazor and HoloLens projects. After some changes on our side, the corresponding tools now process the library properly.

This release also contains bug fixes for text extraction and drawing (including drawing with some tricky CJK fonts).

Read about all new features and improvements in Docotic.Pdf 7.3 in the Version History document.

We encourage you to download and try the new version. This version is also available on NuGet.

Please tell us your thoughts about the new version by e-mail or via the support form. Don’t hesitate to send us your questions, suggest features, or ask for help.


Support for more logging platforms in Docotic.Pdf 7.2

Hello,

We have released Docotic.Pdf 7.2 on our site and on NuGet.

Starting from the new release, the library can automatically detect and attach to logging frameworks. NLog, Log4Net, Serilog, and Loupe loggers are supported. If your solution already uses NLog, for example, you don’t need to do anything extra: Docotic.Pdf will output its log messages to the configured loggers. We also added two new samples, Logging with NLog and Logging with log4net, to illustrate how it works.

We continue our efforts to remove the System.Drawing and GDI+ dependencies from Docotic.Pdf completely. Starting from version 7.2, the library no longer uses System.Drawing and GDI+ when saving (extracting) images “as painted”. This also improves the quality of extracted images because there is no longer any unwanted image scaling. Previously, the images were scaled due to the difference in resolution between PDF and GDI+ (72 vs. 96 dots per inch).
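The size of that unwanted scaling follows from the ratio of the two resolutions. As a worked example (the 600-pixel width is an arbitrary illustrative value, and whether a particular image was enlarged or reduced is not stated here), the two resolutions differ by a factor of 4/3:

```latex
s = \frac{96}{72} = \frac{4}{3} \approx 1.333, \qquad
600\ \text{px} \times s = 800\ \text{px}, \qquad
600\ \text{px} \div s = 450\ \text{px}
```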

This release also contains bug fixes for processing of images and parsing of XMP metadata.

Read about all new features and improvements in Docotic.Pdf 7.2 in the Version History document.

We encourage you to download and try the new version. This version is also available on NuGet.

Please tell us your thoughts about the new version by e-mail or via the support form. Don’t hesitate to send us your questions, suggest features, or ask for help.


Docotic.Pdf 7.1 can compress certain PDFs better. And there are other improvements too.

Hi,

Docotic.Pdf 7.1 is now available on our site and on NuGet.

In this release, we added new PdfDocument.ReplaceDuplicateObjects methods. In addition to the previous ability to replace duplicate fonts, the new methods can deduplicate non-inline images, color spaces, patterns, and shading objects. These methods are useful when you are trying to reduce output file size. The new methods give good results for documents that were incrementally updated or created by merging several documents with the same objects.
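The general idea behind object deduplication can be sketched in a few lines of Python. This is a generic, hash-based illustration of the technique, not the actual implementation of ReplaceDuplicateObjects; the PdfObject class, find_duplicates, and remap are hypothetical names, and a real PDF would also require rewriting every reference and dropping the unused copies.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PdfObject:
    """Hypothetical, simplified stand-in for an indirect PDF object."""
    number: int  # object number
    kind: str    # e.g. "Image", "ColorSpace", "Pattern", "Shading"
    data: bytes  # serialized dictionary and stream bytes


def find_duplicates(objects: list[PdfObject]) -> dict[int, int]:
    """Map each duplicate object's number to the number of the copy that is kept."""
    kept: dict[tuple[str, str], PdfObject] = {}
    replacements: dict[int, int] = {}
    for obj in objects:
        # Hash the content so equal objects land on the same key cheaply.
        key = (obj.kind, hashlib.sha256(obj.data).hexdigest())
        original = kept.get(key)
        # Compare the bytes too, so a hash collision never merges different objects.
        if original is not None and original.data == obj.data:
            replacements[obj.number] = original.number
        else:
            kept.setdefault(key, obj)
    return replacements


def remap(reference: int, replacements: dict[int, int]) -> int:
    """Point a reference at the kept copy (unchanged if it was not a duplicate)."""
    return replacements.get(reference, reference)
```

Merged documents benefit most because the same logo image or color space often appears once per source document, so every copy after the first maps back to a single kept object.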

We also added new signature appearance options. Now it is possible to add an image to a signature. You can also specify the alignment of the text inside a signature, or hide all the text inside a signature if you don’t need it.

The new version can save whole PDF files or individual PDF pages as grayscale images. This usually produces smaller images. If you are interested, please take a look at the new ImageCompressionOptions.CreateGrayscaleJpeg, ImageCompressionOptions.CreateGrayscalePng, and ImageCompressionOptions.CreateGrayscaleTiff methods.
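One reason grayscale output tends to be smaller is that it stores one channel per pixel instead of three. The Python sketch below shows a common RGB-to-gray conversion using the ITU-R BT.601 luma weights; whether Docotic.Pdf uses these exact weights is an assumption, not something this post states, and to_grayscale is a hypothetical helper.

```python
def to_grayscale(rgb_pixels: list[tuple[int, int, int]]) -> bytes:
    """Convert 8-bit RGB pixels to 8-bit gray using ITU-R BT.601 luma weights."""
    return bytes(
        # The weights sum to 1.0; min() guards against float rounding above 255.
        min(255, round(0.299 * r + 0.587 * g + 0.114 * b))
        for r, g, b in rgb_pixels
    )


pixels = [(255, 255, 255), (0, 0, 0), (255, 0, 0)]
gray = to_grayscale(pixels)
print(len(pixels) * 3, "RGB bytes vs", len(gray), "gray bytes")
```

Before compression, the gray data is one third the size of the RGB data; the final file size also depends on the encoder (JPEG, PNG, or TIFF).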

There are two breaking changes in version 7.1. One affects the way the library draws glyphs with zero width, and the other concerns the background and border colors of controls.

This release also contains bug fixes for text and image extraction, drawing of documents, and other areas.

Read about all new features and improvements in Docotic.Pdf 7.1 in the Version History document.

We encourage you to download and try the new version. This version is also available on NuGet.

Please tell us your thoughts about the new version by e-mail or via the support form. Don’t hesitate to send us your questions, suggest features, or ask for help.


OCR PDF in C# and VB.NET

Text extraction is one of the most popular PDF processing tasks. You need to extract text from a PDF document if you want to:

  • index the document for full-text search
  • parse some data like names and prices
  • highlight, delete, or replace a word or phrase

You can extract text manually: open a document in any PDF viewer, then select and copy some text. This works properly for most documents. Such documents are known as “searchable PDFs”. Searchable PDF documents render text using special PDF operators and contain correct mappings of glyphs to Unicode in the font objects associated with the text.
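The glyph-to-Unicode mapping is commonly stored as a ToUnicode CMap, where bfchar sections pair a glyph code with a UTF-16BE string. The Python sketch below parses only bfchar sections (real CMaps also use bfrange, which it ignores) and shows why text with no mapping cannot be extracted reliably. The function names are hypothetical and this is not part of any library’s API.

```python
import re

# Matches "<srcCode> <dstString>" pairs, e.g. "<0003> <0020>".
_PAIR = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def parse_bfchar(cmap_text: str) -> dict[int, str]:
    """Collect glyph-code -> Unicode mappings from bfchar sections only."""
    mapping: dict[int, str] = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in _PAIR.findall(section):
            # Destination strings are UTF-16BE bytes written in hex.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode_codes(codes: list[int], mapping: dict[int, str]) -> str:
    """Map glyph codes to text; codes without a mapping become U+FFFD."""
    return "".join(mapping.get(c, "\uFFFD") for c in codes)


cmap = """
2 beginbfchar
<0003> <0020>
<0024> <0041>
endbfchar
"""
print(decode_codes([0x24, 0x03, 0x24], parse_bfchar(cmap)))
```

A document whose fonts lack such mappings, or that draws text as images or vector paths, leaves nothing to decode, which is where OCR comes in.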

Many PDF libraries can extract text from searchable PDF documents.

There are also non-searchable PDF documents. Non-searchable documents usually render text as a raster image. A typical example is a scanned PDF document. Non-searchable PDF documents may also render text as vector paths without using fonts or special PDF operators.

You need to perform optical character recognition (OCR) to extract text from non-searchable PDF documents. OCR does not guarantee correct results in 100% of cases. Results depend on the document’s quality and the recognition algorithm. Also, optical recognition is much slower than the extraction of text from searchable documents.

Let’s look at how to perform OCR and extract text from PDF documents in C# and VB.NET applications.



Extract text from PDF in C# and VB.NET

Extracting text from a PDF document is a common task for C# and VB.NET developers. You can use Docotic.Pdf library to extract text in just a few lines of code on Windows, Linux, macOS, Android, iOS, or in a cloud environment.

You will need the Docotic.Pdf library to try the sample code. Download the Docotic.Pdf binaries or use its NuGet package. Depending on your project, you can pick the version for either .NET Framework 4 or .NET Standard 2.0. To try the library without evaluation mode restrictions, you can get a free, time-limited license key.

There are different approaches to text extraction. Let’s look at some practical examples.



Docotic.Pdf 7.0 with support for digital signatures

Hello,

We have published Docotic.Pdf 7.0 on our site and on NuGet.

The main feature of this release is support for digital signatures. The library can sign new and existing PDF documents. To sign a document, use one of the PdfDocument.SignAndSave() methods. You can create signatures of different types and formats, using different digest algorithms. For the complete set of properties, take a look at the new PdfSigningOptions type.

The library can also verify existing digital signatures. It can check whether the digest (hash) is valid, whether a signature contains embedded OCSP or CRL revocation data, and whether the signing certificate is revoked. You can also access the properties of the signing and issuer certificates. All this is available via the PdfSignature.Contents property.
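Checking that a digest is valid boils down to hashing exactly the bytes the signature covers. In a PDF signature, those bytes are described by the /ByteRange array [offset1 length1 offset2 length2], which skips the gap holding the signature value itself. The Python sketch below shows this step only; it assumes SHA-256 (signatures may use other algorithms) and takes the expected digest as an input, whereas in a real signature that value comes from the embedded CMS structure, which this sketch does not parse. The function names are hypothetical.

```python
import hashlib
import hmac


def signed_bytes(pdf: bytes, byte_range: list[int]) -> bytes:
    """Concatenate the two ranges listed in a signature's /ByteRange."""
    off1, len1, off2, len2 = byte_range
    # The bytes between the two ranges hold the signature /Contents value.
    return pdf[off1:off1 + len1] + pdf[off2:off2 + len2]


def digest_matches(pdf: bytes, byte_range: list[int], expected: bytes) -> bool:
    """Compare a SHA-256 digest of the signed bytes with the expected digest."""
    actual = hashlib.sha256(signed_bytes(pdf, byte_range)).digest()
    # Constant-time comparison of the two digests.
    return hmac.compare_digest(actual, expected)
```

If even one covered byte changes after signing, the recomputed digest no longer matches, which is how tampering is detected.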

We created the Digital signatures group of samples to demonstrate all the new capabilities.

Starting from version 7.0, the library no longer uses System.Drawing.Bitmap when drawing images. This and other improvements increase the stability of ASP.NET applications that perform PDF to image conversion. The library also consumes less memory when drawing PDF documents.

This release also contains bug fixes for text and image extraction, drawing of documents, and processing of forms and annotations.

Read about all new features and improvements in Docotic.Pdf 7.0 in the Version History document.

We encourage you to download and try the new version. This version is also available on NuGet.

Please tell us your thoughts about the new version by e-mail or via the support form. Don’t hesitate to send us your questions, suggest features, or ask for help.


New PDF rendering engine in Docotic.Pdf 6.0

Hi,

We have published a new major release of Docotic.Pdf library.

Docotic.Pdf 6.0 brings a new PDF rendering engine that does not depend on the System.Drawing.Graphics class. The new engine greatly improves PDF to image conversion in ASP.NET applications, as well as in Linux and Mac OS environments. This is a major step toward removing the dependency on System.Drawing. We will continue improving in this area in future releases.

Along with the rendering engine change, we improved the PdfPage.Save() method. The method now produces 24bpp images instead of 32bpp images when the background is opaque. In most cases, this leads to smaller output files.
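The effect on uncompressed pixel data is easy to quantify. For an illustrative 1000 × 1000 pixel page image (the dimensions are an arbitrary example), dropping the alpha channel removes one byte per pixel; the saving in the final file depends on the output format’s compression, so it is usually smaller than this raw figure:

```latex
\text{raw size} = w \times h \times \frac{\text{bpp}}{8}, \qquad
1000 \times 1000 \times \tfrac{32}{8} = 4\,000\,000\ \text{bytes}, \qquad
1000 \times 1000 \times \tfrac{24}{8} = 3\,000\,000\ \text{bytes}, \qquad
1 - \tfrac{24}{32} = 25\%
```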

We marked as obsolete the methods of PdfCanvas, PdfDocumentView, and PdfPage that accept parameters of types from the System.Drawing namespace. Those methods will be removed in the next release of Docotic.Pdf. For each of the now obsolete methods, there is at least one overload. Please use the overloads instead of the obsolete methods.

One change was requested by our customers: the new release adds the PdfTextExtractionOptions.Rectangle property. The property is useful when you want to extract text from only a part of a page.
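Conceptually, extracting text from part of a page means keeping only the text whose bounds fall within a given area. The Python sketch below uses simple rectangle intersection; the Rect and TextChunk classes and the text_in_area function are hypothetical, the coordinate system (y growing downward) is an assumption, and whether Docotic.Pdf includes partially overlapping text or only fully contained text is not stated in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; y grows downward in this sketch."""
    left: float
    top: float
    right: float
    bottom: float

    def intersects(self, other: "Rect") -> bool:
        # Two rectangles overlap when they overlap on both axes.
        return (self.left < other.right and other.left < self.right
                and self.top < other.bottom and other.top < self.bottom)


@dataclass(frozen=True)
class TextChunk:
    text: str
    bounds: Rect


def text_in_area(chunks: list[TextChunk], area: Rect) -> str:
    """Join the text of chunks whose bounds overlap the given area."""
    return " ".join(c.text for c in chunks if c.bounds.intersects(area))


chunks = [TextChunk("Invoice", Rect(10, 10, 80, 25)),
          TextChunk("Total:", Rect(10, 700, 60, 715)),
          TextChunk("42.00", Rect(70, 700, 110, 715))]
print(text_in_area(chunks, Rect(0, 690, 200, 720)))
```

Choosing intersection over strict containment keeps words that straddle the area’s edge; a containment test would drop them instead.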

We changed the LicenseManager class so that it is now thread-safe. You can use it from multiple threads at the same time. It is still recommended to add all license data at the start of your application. See the remarks for the LicenseManager.AddLicenseData method.

Read about all new features and improvements in Docotic.Pdf 6.0 in the Version History document.

We encourage you to download and try the new version. This version is also available on NuGet.

Please tell us your thoughts about the new version by e-mail or via the support form. Don’t hesitate to send us your questions, suggest features, or ask for help.


FIPS compliance, new annotation properties and a lot of bug fixes in Docotic.Pdf 5.10

Hello,

We have released Docotic.Pdf 5.10 on NuGet and on our site.

In this release, we changed the library to be as FIPS-compliant as possible. In fact, this is the first release you can actually use in FIPS mode. When running on a machine with FIPS mode enabled, the library cannot use older (non-FIPS-compliant) algorithms. This means it cannot encrypt or decrypt documents with the RC4 algorithm. Other functions, such as drawing or text extraction, work just fine.

Version 5.10 brings a lot of new properties for annotation classes. We extended PdfCaretAnnotation, PdfEllipseAnnotation, PdfFreeTextAnnotation, PdfFileAttachmentAnnotation, PdfInkAnnotation, PdfLineAnnotation, PdfPolygonAnnotation, PdfPolylineAnnotation, PdfPopupAnnotation, PdfRectangleAnnotation, PdfSoundAnnotation, PdfStampAnnotation, PdfTextMarkupAnnotation, and PdfTextAnnotation. We also added one property to the base PdfWidget class.

As usual, we increased the speed of PDF drawing. We also improved support for PDFs with broken or incorrect structure and added new code samples that show how to OCR PDF documents.

This release also contains a lot of bug fixes. The fixes cover different areas like drawing, text extraction, parsing, editing of annotations and controls, and some other areas, too.

Read about all new features and improvements in Docotic.Pdf 5.10 in the Version History document.

We encourage you to download and try the new version. This version is also available on NuGet.

Please tell us your thoughts about the new version by e-mail or via the support form. Don’t hesitate to send us your questions, suggest features, or ask for help.
