Archive for the ‘PDF Library’ category

Docotic.Pdf 3.6. Two times faster and uses much less memory.

Hi!

We have released a new version of the Docotic.Pdf library. Most of the changes in this version are likely to benefit every customer.

We greatly optimized the code of the library, and Docotic.Pdf now takes half the time and, in many cases, half the memory to complete a task. We published a separate post about the results of our optimizations. You can read it here.

We added the ability to check whether a PDF document is PDF/A compliant. Please have a look at the Check if PDF document is a PDF/A compliant one sample.

We also added a new compression option, the PdfSaveOptions.OptimizeIndirectObjects property. This option is turned on by default; you can turn it off if you want files to be saved faster.

Another improvement is the ability to specify the resolution of images produced while drawing pages.

We also improved processing of TIFF images and extraction of text and images from PDF documents.

Read about all new features and improvements in Docotic.Pdf 3.6 in the Version History document.

We encourage you to download and try the new version.

Please write to us about your findings with Docotic.Pdf by e-mail or via the support form. Don’t hesitate to send us your questions, suggest features or ask for help.

Optimizations in Docotic.Pdf 3.6

Hello!

Our customers told us that Docotic.Pdf does not always behave modestly. The library tends to consume large amounts of memory for large files and often spends a lot of time on some operations.

We’ve done a lot to make the new version of Docotic.Pdf faster and less memory-consuming. Now I want to share some statistics about the results of our efforts.

To see what we achieved, we took five PDF files and ran some tests on them. Here is the description of the files we took:

| File name | Page count | File size | Contents |
| --- | --- | --- | --- |
| emerging.pdf | 6 | 94 KB | only text |
| rdsolr1907.pdf | 111 | 2.03 MB | mostly text, some images, linearized |
| official_journal_10022006.pdf | 705 | 20 MB | mostly text, some images, linearized |
| LargePDFFile.pdf | 4800 | 34 MB | mostly text, some images, linearized |
| OReilly.Head.First.C.Sharp.Nov.2007.pdf | 765 | 146 MB | mostly scanned images |

For a start, we measured how much time and memory are required just to open a file. The table below contains the relative results of our measurements (a negative value means the new version needs less):

Open only

| File name | Time, % | Memory consumption, % |
| --- | --- | --- |
| emerging.pdf | -13 | -51 |
| rdsolr1907.pdf | -44 | -55 |
| official_journal_10022006.pdf | -87 | -95 |
| LargePDFFile.pdf | -91 | -83 |
| OReilly.Head.First.C.Sharp.Nov.2007.pdf | -31 | -53 |

It’s nice to see that opening PDF files is now about 2 times faster and takes about 3 times less memory (on average). And for larger files the improvements are even more obvious.
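
The "times faster" figures follow from the percentages in the table: a change of p percent leaves 1 + p/100 of the original cost, and the factor is the reciprocal of that. Here is a minimal Python sketch of the arithmetic for the "Open only" results. It assumes the averages are plain arithmetic means of the per-file percentages (the post does not say how they were computed); the function and variable names are illustrative only.

```python
def reduction_to_factor(percent_change: float) -> float:
    """Convert a relative change in percent (e.g. -50) to a 'times less' factor (e.g. 2.0)."""
    remaining = 1.0 + percent_change / 100.0  # share of the original cost that is left
    if remaining <= 0:
        raise ValueError("change must be greater than -100%")
    return 1.0 / remaining


# "Open only" results from the table above: (time %, memory %) per file.
open_only = {
    "emerging.pdf": (-13, -51),
    "rdsolr1907.pdf": (-44, -55),
    "official_journal_10022006.pdf": (-87, -95),
    "LargePDFFile.pdf": (-91, -83),
    "OReilly.Head.First.C.Sharp.Nov.2007.pdf": (-31, -53),
}

time_changes = [t for t, _ in open_only.values()]
memory_changes = [m for _, m in open_only.values()]

# Assumption: "on average" means the arithmetic mean of the percentages.
avg_time = sum(time_changes) / len(time_changes)
avg_memory = sum(memory_changes) / len(memory_changes)

time_factor = reduction_to_factor(avg_time)
memory_factor = reduction_to_factor(avg_memory)

print(f"time: {avg_time:.1f}% -> {time_factor:.1f}x")        # time: -53.2% -> 2.1x
print(f"memory: {avg_memory:.1f}% -> {memory_factor:.1f}x")  # memory: -67.4% -> 3.1x
```

Under that assumption the numbers agree with the summary: roughly 2 times less time and 3 times less memory.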

But how does the library behave in more complex scenarios?

Next, we took the same files and measured the time and memory required to open a PDF and extract formatted text from all of its pages. Below are the results:

Open and extract all text with formatting

| File name | Time, % | Memory consumption, % |
| --- | --- | --- |
| emerging.pdf | -10 | -33 |
| rdsolr1907.pdf | -70 | -26 |
| official_journal_10022006.pdf | -59 | -39 |
| LargePDFFile.pdf | -66 | -39 |
| OReilly.Head.First.C.Sharp.Nov.2007.pdf | -54 | -31 |

Again, the whole process took about two times less time (on average). The memory gains are less impressive this time, but about 30% less memory (on average) is still not bad at all.

The last test is simple but represents a real-life scenario. We measured the time and memory required to open a PDF, encrypt it with 128-bit AES and then save it. Below are the results:

Open, encrypt with 128-bit AES and save

| File name | Time, % | Memory consumption, % |
| --- | --- | --- |
| emerging.pdf | -17 | -69 |
| rdsolr1907.pdf | -42 | -9 |
| official_journal_10022006.pdf | -84 | -70 |
| LargePDFFile.pdf | -69 | -41 |
| OReilly.Head.First.C.Sharp.Nov.2007.pdf | -19 | -69 |

In this case the whole process took about 2 times less time and memory (on average).

We think that such improvements won’t go unnoticed by our customers. We also have some ideas about how to improve the library further, so we continue to profile and improve Docotic.Pdf.

Please feel free to share your thoughts about recent improvements.

Docotic.Pdf 3.5. Even better than before.

Hello!

We have released a new version of the Docotic.Pdf library. The new version brings improvements in text and image extraction as well as in other areas.

We added the ability to check whether an image is painted transformed and the ability to save such an image as painted, i.e. taking into account rotation, scaling and other transformations that might be applied to the image. Another new feature is the ability to save PDF pages as TIFF images.

The new version of the Docotic.Pdf library adds support for PDF form XObjects. Such objects are often used for watermarks, backgrounds and repeatable objects. XObjects can be created from scratch or from existing pages (from the current or another document). Using XObjects created from pages, you can impose (combine) PDF pages onto larger (or same-size) sheets to make books, booklets or special arrangements. Please have a look at the new samples that show how to create and use PDF form XObjects.

As with every release of the library, we also fixed some bugs. This version fixes bugs related to opening of existing PDF documents with forms, reading document permissions and processing of fonts and images.

Read about all new features and improvements in Docotic.Pdf 3.5 in the Version History document.

We encourage you to download and try the new version.

Please write to us about your findings with Docotic.Pdf by e-mail or via the support form. Don’t hesitate to send us your questions, suggest features or ask for help.

Write data from a database to PDF in C# .NET

Hello!

Our clients often ask us how to read some data from a database and then write that data to a PDF document. A common use case is to take people’s names and a PDF template and create personalized documents for clients.

Let’s look at how to do this with the help of Docotic.Pdf. The demo application from this article reads names from a database and then modifies a template PDF by putting the names onto the first page.

Read more

Improved extraction of text and images, PDF rasterizer and other improvements in Docotic.Pdf 3.4

Hello!

We released a new version of the Docotic.Pdf library.

Version 3.4 adds a major new feature: a PDF rasterizer. Now the library can be used to draw and print PDF documents. And of course you can save images of document pages in PNG and JPEG formats. Take a look at the PdfPage.Save and PdfPage.Draw methods. You might find the new group of samples interesting too.

This version also features improved support for extraction of text and images. You can now extract text as a collection of words (with their bounding rectangles) and even as individual characters. We added a new Extract text by words sample that demonstrates how to do this.

From now on the library can be used to extract page objects, i.e. you can get a collection of text and image objects to perform sophisticated analysis of what’s drawn on a page. Take a look at the Extract text and images sample to get an idea of what information can be retrieved.

The new version adds support for extraction of previously unsupported image types. You might also be interested in the new ability to scale and resize existing images in PDF documents. This ability is useful for optimizing existing documents.

As with any release of Docotic.Pdf, we also fixed some bugs. This version fixes bugs related to opening of existing PDF documents and processing of fonts and images. We also made the library use less time and memory when opening existing PDFs.

Read about all new features and improvements in Docotic.Pdf 3.4 in the Version History document.

As always, we encourage you to download and try the new version.

Please write to us about your findings with Docotic.Pdf by e-mail or via the support form. Don’t hesitate to send us your questions or ask for help.

Support for XMP Metadata, improved text extraction and other improvements in Docotic.Pdf 3.3

Hello!

I am pleased to announce that we’ve released a new version of the Docotic.Pdf library.

Version 3.3 brings support for XMP Metadata, improved text extraction and other improvements and bug fixes.

Starting from this version, Docotic.Pdf can be used to read and modify document metadata. You can edit any XMP schema and add new schemas. The library provides a convenient way to access properties of well-known XMP schemas like Dublin Core or XMP Basic. There is also support for setting custom metadata values. Please take a look at the PdfDocument.Metadata property and the XmpMetadata class. The Set custom metadata and Set XMP metadata samples might also be useful to get an idea of how to use the new feature.

The new version of the library fixes some bugs related to text extraction and brings new properties and methods that should give you even more control over the text extraction process. Please have a look at the PdfTextData.Size and PdfTextData.Bounds properties and the PdfDocument.GetText and PdfPage.GetText methods.

With the help of the latest version of Docotic.Pdf you can easily reorder pages in PDF documents. We added the new PdfDocument.MovePage, PdfDocument.MovePages and PdfDocument.SwapPages methods, which can be used to change the order of pages within a document. We think that you might find them useful.

We also fixed some bugs related to opening of existing PDF documents, linearization and processing of TIFF images.

Read about all new features and improvements in Docotic.Pdf 3.3 in the Version History document.

As always, we encourage you to download and try the new version.

Please write to us about your findings with Docotic.Pdf by e-mail or via the support form. Don’t hesitate to send us your questions or ask for help.

Improved accuracy and new text extraction mode in Docotic.Pdf 3.1

Hi!

We released a new version of the Docotic.Pdf library.

Version 3.1 brings improved accuracy of all calculations, a new text extraction mode and support for shared scripts. And, as always, the new version contains fixes for some bugs in the library.

Read about other new features and improvements in Docotic.Pdf 3.1 in the Version History document.

As always, we encourage you to download and try the new version.

Please write to us about your findings with Docotic.Pdf by e-mail or via the support form. Don’t hesitate to send us your questions or ask for help.

Linearize PDF files, protect PDFs with AES encryption, optimize images in PDFs

Hello!

I am pleased to announce that we’ve just released a new major version of the Docotic.Pdf library.

Docotic.Pdf 3.0 adds support for advanced password protection. Now you can not only open but also create PDF files encrypted with the 128-bit AES algorithm. We’ve also added the PdfDocument.IsPasswordProtected method, which is useful if you need to know whether a PDF requires a password in order to be opened.

Another major new feature is the ability to linearize PDF files and check if a file is linearized. As you know, linearization (also called Fast Web View optimization) helps to produce PDF files optimized for viewing in a browser. We are sure you’ll find this new feature useful.

Starting with Docotic.Pdf 3.0 you can optimize images before adding them to PDFs and even change images in existing PDF files. Please take a look at the RecompressWith* family of methods in the PdfImage class and the new Add single image frame to PDF document sample.

The new version of the library introduces support for Save Options. Using this feature you can fine-tune how PDF documents are saved by the library. You can reduce the size of PDF files or, on the contrary, save files in a human-readable form (which may be useful for debugging purposes).

There are other improvements and bug fixes of course. As always, we encourage you to download and try the new version.

Please write to us about your findings with Docotic.Pdf by e-mail or via the support form. Don’t hesitate to send us your questions or ask for help.

Extract images from PDF with Docotic.Pdf 2.3

Hi!

We’ve just released a new version of the Docotic.Pdf library.

Version 2.3 adds support for image extraction. Now the library can be used to extract embedded images from PDF documents. Images are extracted without any modification or recompression, so you’ll get the same data as used in the document.

Another new feature is the ability to extract or copy pages from existing documents. With this feature it’s possible to split a PDF into pages or make a copy of a PDF with only those pages that are needed.

Read about other new features and improvements in Docotic.Pdf 2.3 in the Version History document.

As always, we encourage you to download and try the new version.

Please write to us about your findings with Docotic.Pdf by e-mail or via the support form. Don’t hesitate to send us your questions or ask for help.

Better PDF form handling with Docotic.Pdf 2.2

Hello!

We are happy to announce the release of a new version of Docotic.Pdf Library.

Version 2.2 adds support for Forms Data Format files (FDF files). Now you can use Docotic.Pdf Library to import form data from an FDF file or export data from your form into an FDF file for further processing. Also, the PDF library now creates better-looking PDF forms because we fixed some bugs in the code related to form creation.

Another improvement we are proud of is speed. We made some optimizations and now the library performs up to 40 percent faster.

Read about other improvements in Docotic.Pdf 2.2 in the Version History document.

As always, we encourage you to download and try the new version.

Please write to us about your findings with Docotic.Pdf by e-mail or via the support form. Don’t hesitate to send us your questions or ask for help.
