Docotic.Pdf 8.6 with text processing improvements

We have a new release ready. Docotic.Pdf 8.6 and its add-ons are now available on our site and on NuGet.

The new version contains improvements related to text processing, memory consumption, and content extraction.

With Docotic.Pdf 8.6, you can programmatically detect cases when regular text extraction methods produce garbled or unexpected text. This happens with documents that contain no mappings of glyphs to Unicode characters, or that contain incorrect mappings.

We added the ability to specify a custom handler for character codes that define unmapped glyphs. You can use the handler to skip or replace those glyphs. The "Fix garbled text when extracting from PDF documents" sample code shows how to use OCR to extract text properly, even if a document contains unmapped glyphs.
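To illustrate the idea of a handler for unmapped glyphs, here is a minimal, library-independent sketch in Python. It is not the Docotic.Pdf API: the `decode_codes` function, the `UnmappedHandler` type, and the toy mapping are all hypothetical names invented for this illustration. It only shows how a decoder might hand unmapped character codes to a callback that either skips or replaces them, while also recording which codes had no mapping.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical handler type: receives a character code that has no Unicode
# mapping and returns replacement text, or None to skip the glyph.
UnmappedHandler = Callable[[int], Optional[str]]


def decode_codes(
    codes: Iterable[int],
    to_unicode: Dict[int, str],
    on_unmapped: UnmappedHandler,
) -> Tuple[str, List[int]]:
    """Decode character codes and report the codes that had no mapping."""
    parts: List[str] = []
    unmapped: List[int] = []
    for code in codes:
        if code in to_unicode:
            parts.append(to_unicode[code])
            continue
        # No glyph-to-Unicode mapping: remember the code and ask the handler.
        unmapped.append(code)
        replacement = on_unmapped(code)
        if replacement is not None:
            parts.append(replacement)
    return "".join(parts), unmapped


# Toy font mapping that lacks an entry for code 3 (an assumption for the demo).
mapping = {1: "P", 2: "D", 4: "F"}
codes = [1, 2, 3, 4]

# Handler that skips unmapped glyphs.
skipped_text, skipped_codes = decode_codes(codes, mapping, lambda code: None)

# Handler that substitutes the Unicode replacement character.
replaced_text, replaced_codes = decode_codes(codes, mapping, lambda code: "\ufffd")
```

A non-empty list of unmapped codes is the signal that regular extraction would produce garbled text. In a real workflow, that is the point where a fallback such as OCR could take over.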

Docotic.Pdf 8.6 uses less memory and time to decode LZW and Flate streams. It also imports PNG images faster and uses less memory when doing so. When drawing text with a Type3 font, the library uses less time and memory and creates fewer temporary streams. Processing a document with inline images also requires fewer temporary streams.

To help with content extraction and copying, we have added the ability to extract and copy painted Form XObjects. It is now possible to extract the blend mode along with page objects. From now on, the PdfPage.GetObjects() group of methods also returns invisible paths.

To showcase the changes, we added the "Edit PDF page content" sample. We also updated the existing "Copy text, paths and images between PDF pages" sample.

As always, we fixed bugs in the library. The new version contains fixes for text drawing and extraction, form controls, and images.

Read about all new features and improvements in Docotic.Pdf 8.6 in the Version History document.

We encourage you to download and try the new version. This version is also available on NuGet.

Please tell us your thoughts about the new version by e-mail or through the support form. Don’t hesitate to send us your questions, suggest features, or ask for help.
