Extract text and images from PDF

Extract text from PDFs

You can extract text from PDF files using the Docotic.Pdf library.

You can extract text one page at a time or from the whole document at once.

The library can extract both plain and formatted text. You can also extract individual words, characters, or text chunks together with their coordinates.
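The options above can be sketched in C# roughly as follows. This is a minimal illustration, not an official sample: it assumes the BitMiracle.Docotic.Pdf package is referenced, that "input.pdf" (a placeholder name) exists, and that the installed version exposes GetText, GetTextWithFormatting, and GetWords as shown. Check the names against the API reference for your version.

```csharp
using System;
using BitMiracle.Docotic.Pdf;

class ExtractTextSketch
{
    static void Main()
    {
        // "input.pdf" is a placeholder path used only for illustration.
        using (var pdf = new PdfDocument("input.pdf"))
        {
            // Plain text of the whole document at once.
            string documentText = pdf.GetText();
            Console.WriteLine(documentText);

            // Plain and formatted text of a single page.
            PdfPage firstPage = pdf.Pages[0];
            string pageText = firstPage.GetText();
            string formattedText = firstPage.GetTextWithFormatting();
            Console.WriteLine(pageText);
            Console.WriteLine(formattedText);

            // Separate words with their coordinates on the page.
            // Only the bounds are printed here; the accessor for the word's
            // text may differ between library versions.
            foreach (PdfTextData word in firstPage.GetWords())
                Console.WriteLine(word.Bounds);
        }
    }
}
```

Page-level extraction is handy when you only need part of a large document, while the document-level call is the shortest route to a full text dump.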

If you need to perform a more sophisticated analysis, you can also extract text, paths, and image objects together in one collection.

Extract images

The library can extract images from PDF files either as is (the image as stored in the file) or as painted (the image as it appears on the page).

Extracted images can be saved as TIFF or JPEG files.

The library does not recompress images while extracting them, so you get images with the same quality as in the PDF.

You can also find out where on a page each image is actually drawn.
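A minimal C# sketch of image extraction might look like the code below. It is an illustration under assumptions rather than an official sample: "input.pdf" and the "image" file name prefix are placeholders, and the calls GetPaintedImages, Image.Save, and Bounds should be verified against the API reference for the version you use.

```csharp
using System;
using BitMiracle.Docotic.Pdf;

class ExtractImagesSketch
{
    static void Main()
    {
        // "input.pdf" is a placeholder path used only for illustration.
        using (var pdf = new PdfDocument("input.pdf"))
        {
            PdfPage page = pdf.Pages[0];
            int index = 0;

            // Each painted image describes one place where an image
            // is drawn on the page.
            foreach (PdfPaintedImage painted in page.GetPaintedImages())
            {
                // Save the underlying image as is. The file name is given
                // without an extension; the returned path is assumed to
                // include the extension chosen by the library.
                string savedPath = painted.Image.Save("image" + index);

                // Bounds tells where on the page the image is drawn.
                Console.WriteLine(savedPath + " drawn at " + painted.Bounds);
                index++;
            }
        }
    }
}
```

Going through the painted images of a page gives both the image data and its position in one pass, which is useful when the layout matters as much as the pictures themselves.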