Improvements and new features in Docotic.Pdf 4.1

January 14th, 2014

Hello!

We have released a new version of the Docotic.Pdf library.

Docotic.Pdf 4.1 fixes some bugs related to opening existing documents (including encrypted ones). The new version also brings a number of improvements in drawing PDF documents as well as in extracting images.

This version adds the new PdfDrawOptions.TileWidth and PdfDrawOptions.TileHeight properties, which can be used to control how much memory is used while drawing.

Drawing PDF documents using tiles can be slower than regular drawing, but it allows you to draw very big documents in high resolution without consuming obscene amounts of memory. We are expecting some speed optimizations in tiled drawing in upcoming releases of the library.
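
To get a feel for why tiling helps, here is a minimal Python sketch of the arithmetic. It is not Docotic.Pdf code and says nothing about the library's internals. It assumes an uncompressed bitmap with 4 bytes per pixel, an A4 page (595 x 842 points, 72 points per inch), a resolution of 600 dpi and 1024 x 1024 pixel tiles. Treating the tile size as a size in pixels is also an assumption made for this illustration.

```python
import math

BYTES_PER_PIXEL = 4    # assumption: 32-bit pixels, no compression
POINTS_PER_INCH = 72   # PDF default user space unit


def page_pixels(width_pt, height_pt, dpi):
    """Pixel size of a page drawn at the given resolution."""
    return (math.ceil(width_pt * dpi / POINTS_PER_INCH),
            math.ceil(height_pt * dpi / POINTS_PER_INCH))


def bitmap_bytes(width_px, height_px, bytes_per_pixel=BYTES_PER_PIXEL):
    """Memory needed to hold one uncompressed bitmap of this size."""
    return width_px * height_px * bytes_per_pixel


def tile_grid(width_px, height_px, tile_w, tile_h):
    """Number of tile columns and rows needed to cover the page."""
    return math.ceil(width_px / tile_w), math.ceil(height_px / tile_h)


# Hypothetical example values: an A4 page at 600 dpi, 1024 x 1024 tiles.
width_px, height_px = page_pixels(595, 842, 600)
full_page = bitmap_bytes(width_px, height_px)
one_tile = bitmap_bytes(1024, 1024)
columns, rows = tile_grid(width_px, height_px, 1024, 1024)

print(f"page: {width_px} x {height_px} px, {full_page / 2**20:.1f} MiB as one bitmap")
print(f"tile: {one_tile / 2**20:.1f} MiB at a time, {columns * rows} tiles to draw")
```

Under these assumptions a single full-page bitmap needs about 133 MiB, while one tile needs 4 MiB. The price is that the page has to be drawn in many passes, which is consistent with tiled drawing being slower.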

Read about all new features and improvements in Docotic.Pdf 4.1 in the Version History document.

We encourage you to download and try the new version.

Please write to us about your findings with Docotic.Pdf by e-mail or via the support form. Don’t hesitate to send us your questions, suggest features or ask for help.

Docotic.Pdf 4. Much improved rendering engine and more.

October 11th, 2013

Hello!

I am very happy to announce that we’ve released Docotic.Pdf 4 on our site.

Seven months of active development were not in vain: the new major release brings some great improvements.

Docotic.Pdf 4 comes with a new, much improved text rendering engine. With the new engine, text gets drawn at the right positions with the right font. The new version of the library produces much more accurate output than the previous version did. Let me assure you: you will see the difference.

Text rendering and text extraction often go hand in hand. With Docotic.Pdf that’s no different. The new version of the library extracts text more precisely and can give you more information about text in a PDF document. You can find out the font, the pen and brush colors and the rendering mode used to draw any chunk of text. There are new properties in the PdfTextData class for that.

The new version adds support for JPEG 2000 images. The library can add, extract and draw them. Other imaging-related areas also received some of our attention. Some bugs related to processing of JPEGs were fixed. Extraction of images (including masked ones and those in the CMYK color space) was improved.

As with every release of the library, we also fixed processing of some PDFs that do not quite meet the standard and of documents with an unusual internal structure.

Read about all new features and improvements in Docotic.Pdf 4 in the Version History document.

We encourage you to download and try the new version.

Please write to us about your findings with Docotic.Pdf by e-mail or via the support form. Don’t hesitate to send us your questions, suggest features or ask for help.

Speed improvements, new features and fixes in Docotic.Pdf 3.7

March 1st, 2013

Hello!

I am happy to announce that Docotic.Pdf 3.7 is finally released.

As in the previous release of the library, we optimized the code of Docotic.Pdf, and the library now opens documents even faster than before. The new release also adds some new features, some of which were suggested by our customers. And there are some bug fixes in the new version, too.

We added the ability to embed and extract file attachments and file annotations. Please have a look at the new group of samples called Attachments.

The library was improved to better preserve existing structures in PDF files. This means that Docotic.Pdf will try to keep the internal structure of a file as is unless the user changes it.

Read about all new features and improvements in Docotic.Pdf 3.7 in the Version History document.

We encourage you to download and try the new version.

Please write to us about your findings with Docotic.Pdf by e-mail or via the support form. Don’t hesitate to send us your questions, suggest features or ask for help.

Docotic.Pdf 3.6. Two times faster and uses much less memory.

October 20th, 2012

Hi!

We have released a new version of the Docotic.Pdf library. Most of the changes in this version are likely to benefit every customer.

We greatly optimized the code of the library, and now Docotic.Pdf takes half the time and in many cases half the memory to complete a task. We published a separate post about the results of our optimizations. You can read it here.

We added the ability to check if a PDF document is PDF/A compliant. Please have a look at the "Check if PDF document is a PDF/A compliant one" sample.

We also added a new compression option: the PdfSaveOptions.OptimizeIndirectObjects property. This option is turned on by default, and you can turn it off if you want files to be saved faster.

Another improvement is the ability to specify the resolution for images produced while drawing pages.

We also improved processing of TIFF images and extraction of text and images from PDF documents.

Read about all new features and improvements in Docotic.Pdf 3.6 in the Version History document.

We encourage you to download and try the new version.

Please write to us about your findings with Docotic.Pdf by e-mail or via the support form. Don’t hesitate to send us your questions, suggest features or ask for help.

Optimizations in Docotic.Pdf 3.6

October 20th, 2012

Hello!

Our customers told us that Docotic.Pdf does not always behave modestly. The library tends to consume large amounts of memory for large files and often spends a lot of time on some operations.

We’ve done a lot to make the new version of Docotic.Pdf faster and less memory-consuming. Now I want to share some statistics about the results of our efforts.

To see what we achieved, we took five PDF files and ran some tests on them. Here is the description of the files we took:

File name                                Page count  File size  Contents
emerging.pdf                             6           94 KB      only text
rdsolr1907.pdf                           111         2.03 MB    mostly text, some images, linearized
official_journal_10022006.pdf            705         20 MB      mostly text, some images, linearized
LargePDFFile.pdf                         4800        34 MB      mostly text, some images, linearized
OReilly.Head.First.C.Sharp.Nov.2007.pdf  765         146 MB     mostly scanned images

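For a start, we measured how much time and memory are required to just open a file. The table below contains the relative results of our measurements: each value is the change compared with the previous version, so a negative value means less time or memory.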

Open only
File name                                Time, %  Memory consumption, %
emerging.pdf                             -13      -51
rdsolr1907.pdf                           -44      -55
official_journal_10022006.pdf            -87      -95
LargePDFFile.pdf                         -91      -83
OReilly.Head.First.C.Sharp.Nov.2007.pdf  -31      -53

It’s nice to see that opening of PDF files is now about 2 times faster and takes about 3 times less memory (on average). And for larger files improvements are even more obvious.

But how does the library behave in more complex scenarios?

Next, we took the same files and measured the time and memory required to open a PDF and extract formatted text from all of its pages. Below are the results:

Open and extract all text with formatting
File name                                Time, %  Memory consumption, %
emerging.pdf                             -10      -33
rdsolr1907.pdf                           -70      -26
official_journal_10022006.pdf            -59      -39
LargePDFFile.pdf                         -66      -39
OReilly.Head.First.C.Sharp.Nov.2007.pdf  -54      -31

And again the whole process took about two times less time (on average). Memory gains are less impressive this time but still, about 30% less memory (on average) is not bad at all.

The last test is simple but represents a real-life scenario. We measured the time and memory required to open a PDF, encrypt it with 128-bit AES and then save it. Below are the results:

Open, encrypt with 128-bit AES and save
File name                                Time, %  Memory consumption, %
emerging.pdf                             -17      -69
rdsolr1907.pdf                           -42      -9
official_journal_10022006.pdf            -84      -70
LargePDFFile.pdf                         -69      -41
OReilly.Head.First.C.Sharp.Nov.2007.pdf  -19      -69

In this case the whole process took about 2 times less time and memory (on average).
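
For readers who want to check the "on average" figures quoted for the three tests, here is a small Python sketch that recomputes them from the tables. The numbers are copied from the tables above. The way of summarizing is an assumption of this sketch: it averages the percentages and only then converts the average into a "times less" factor.

```python
from statistics import mean

# Relative changes in percent, copied from the three tables
# (negative means the new version needs less time or memory).
RESULTS = {
    "open only": {
        "time": [-13, -44, -87, -91, -31],
        "memory": [-51, -55, -95, -83, -53],
    },
    "open and extract text": {
        "time": [-10, -70, -59, -66, -54],
        "memory": [-33, -26, -39, -39, -31],
    },
    "open, encrypt and save": {
        "time": [-17, -42, -84, -69, -19],
        "memory": [-69, -9, -70, -41, -69],
    },
}


def improvement_factor(change_percent):
    """Convert a relative change such as -50 (%) into a factor such as 2.0."""
    return 100 / (100 + change_percent)


def summarize(changes):
    """Return the average change in percent and the matching factor."""
    average = mean(changes)
    return average, improvement_factor(average)


for scenario, metrics in RESULTS.items():
    for metric, changes in metrics.items():
        average, factor = summarize(changes)
        print(f"{scenario}: {metric} {average:.1f}% on average, about {factor:.1f}x less")
```

Averaging the percentages first is only one way to summarize the data. Averaging the per-file factors instead would give more weight to the files with the largest reductions.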

We think that such improvements won’t go unnoticed by our customers. And we want to say that we have some thoughts about how to further improve the library. So, we continue to profile and improve Docotic.Pdf.

Please feel free to share your thoughts about recent improvements.

Docotic.Pdf 3.5. Even better than before.

July 30th, 2012

Hello!

We have released a new version of the Docotic.Pdf library. The new version brings improvements in text and image extraction as well as in other areas.

We added the ability to check whether an image is painted with a transformation and the ability to save such an image as painted, i.e. taking into account rotation, scaling and other transformations that might be applied to the image. Another new feature is the ability to save PDF pages as TIFF images.

The new version of the Docotic.Pdf library adds support for PDF form XObjects. Such objects are often used for watermarks, backgrounds and repeatable objects. XObjects can be created from scratch or from existing pages (from the current document or from another one). Using XObjects created from pages, you can impose (combine) PDF pages onto larger (or same size) sheets to make books, booklets, or special arrangements. Please have a look at the new samples that show how to create and use PDF form XObjects.

As with every release of the library, we also fixed some bugs. This version fixes bugs related to opening existing PDF documents with forms, reading document permissions and processing fonts and images.

Read about all new features and improvements in Docotic.Pdf 3.5 in the Version History document.
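
As an illustration of what imposing pages for a booklet involves, here is a minimal Python sketch that computes which two source pages go side by side on each sheet side of a folded booklet. It only works out the page order and does not use Docotic.Pdf. The function name and the choice of a simple saddle-stitched layout (two pages per sheet side, sheets nested and folded in the middle) are assumptions made for this example.

```python
def booklet_order(page_count):
    """Return (left, right) page numbers for every sheet side of a booklet.

    Pages are numbered from 1. The page count is padded to a multiple of
    four, and padding positions are reported as None (blank pages).
    """
    padded = -(-page_count // 4) * 4  # round up to a multiple of 4
    pages = [p if p <= page_count else None for p in range(1, padded + 1)]

    sides = []
    first, last = 0, padded - 1
    while first < last:
        # Front of the sheet: last remaining page on the left, first on the right.
        sides.append((pages[last], pages[first]))
        # Back of the sheet: next page on the left, previous page on the right.
        sides.append((pages[first + 1], pages[last - 1]))
        first += 2
        last -= 2
    return sides


for left, right in booklet_order(8):
    print(left, right)
```

Each pair could then be placed on one larger sheet, for example by drawing two XObjects created from the source pages next to each other.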

We encourage you to download and try the new version.

Please write to us about your findings with Docotic.Pdf by e-mail or via the support form. Don’t hesitate to send us your questions, suggest features or ask for help.

Write data from a database to PDF

May 23rd, 2012

Hello!

Our clients often ask us how to read some data from a database and then write that data to a PDF document. A common use case is to take people's names and a PDF template and create personalized documents for clients.

Let’s look at how to do this with the help of Docotic.Pdf. The demo application from this article will read names from a database and then modify a template PDF by putting the names onto the first page.

Improved extraction of text and images, PDF rasterizer and other improvements in Docotic.Pdf 3.4

April 27th, 2012

Hello!

We released a new version of the Docotic.Pdf library.

Version 3.4 adds a major new feature: a PDF rasterizer. Now the library can be used to draw and print PDF documents. And of course you can save images of document pages in PNG and JPEG formats. Take a look at the PdfPage.Save and PdfPage.Draw methods. You might find the new group of samples interesting too.

This version also features improved support for extraction of text and images. You can now extract text as a collection of words (with their bounding rectangles) and even as individual characters. We added a new "Extract text by words" sample that demonstrates how to do this.

From now on the library can be used to extract page objects. That is, you can get a collection of text and image objects to perform sophisticated analysis of what’s drawn on a page. Take a look at the "Extract text and images" sample to get an idea of what information can be retrieved.

The new version adds support for extraction of previously unsupported image types. You might also be interested in the new ability to scale and resize existing images in PDF documents. This ability is useful for optimizing existing documents.

As with any release of Docotic.Pdf, we also fixed some bugs. This version fixes bugs related to opening existing PDF documents and processing fonts and images. We also made the library use less time and memory when opening existing PDFs.

Read about all new features and improvements in Docotic.Pdf 3.4 in the Version History document.

As always, we encourage you to download and try the new version.

Please write to us about your findings with Docotic.Pdf by e-mail or via the support form. Don’t hesitate to send us your questions or ask for help.

Support for XMP Metadata, improved text extraction and other improvements in Docotic.Pdf 3.3

December 26th, 2011

Hello!

I am pleased to announce that we’ve released a new version of the Docotic.Pdf library.

Version 3.3 brings support for XMP Metadata, improved text extraction and other improvements and bug fixes.

Starting from this version, Docotic.Pdf can be used to read and modify document metadata. You can edit any XMP schema and add new schemas. The library provides a convenient way to access properties of well-known XMP schemas like Dublin Core or XMP Basic. There is also support for setting custom metadata values. Please take a look at the PdfDocument.Metadata property and the XmpMetadata class. The "Set custom metadata" and "Set XMP metadata" samples might also be useful to get an idea of how to use the new feature.

The new version of the library fixes some bugs related to text extraction and brings new properties and methods that should give you even more control over the text extraction process. Please have a look at the PdfTextData.Size and PdfTextData.Bounds properties and the PdfDocument.GetText and PdfPage.GetText methods.

With the help of the latest version of Docotic.Pdf you can easily reorder pages in PDF documents. We added the new PdfDocument.MovePage, PdfDocument.MovePages and PdfDocument.SwapPages methods, which can be used to change the order of pages within a document. We think that you might find them useful.

We also fixed some bugs related to opening existing PDF documents, linearization and processing of TIFF images.

Read about all new features and improvements in Docotic.Pdf 3.3 in the Version History document.

As always, we encourage you to download and try the new version.

Please write to us about your findings with Docotic.Pdf by e-mail or via the support form. Don’t hesitate to send us your questions or ask for help.

Improved accuracy and new text extraction mode in Docotic.Pdf 3.1

September 25th, 2011

Hi!

We released a new version of the Docotic.Pdf library.

Version 3.1 brings improved accuracy of all calculations, a new text extraction mode and support for shared scripts. And, as always, the new version contains fixes for some bugs in the library.

Read about other new features and improvements in Docotic.Pdf 3.1 in the Version History document.

As always, we encourage you to download and try the new version.

Please write to us about your findings with Docotic.Pdf by e-mail or via the support form. Don’t hesitate to send us your questions or ask for help.