Edit PDF in C#

Use Docotic.Pdf to modify your PDF documents. It is the PDF editing library for C# and VB.NET that combines powerful features with an intuitive API.

Docotic.Pdf library 9.5.17573-dev. Regression tests: 14,726 passed. Total NuGet downloads: 4,765,230.

Docotic.Pdf provides many ways to edit PDF documents. Here are some of the library's key features for PDF editing:

  • Combine multiple PDFs into one or split a single PDF into multiple files.
  • Reduce PDF file size.
  • Rearrange, delete, rotate, or extract pages.
  • Read, change, or delete PDF metadata.
  • Sign PDF with digital signatures.
  • Encrypt documents or remove passwords from a PDF.
  • Add, edit, or remove annotations and attachments.
  • Fill in AcroForms, add or remove form controls.
  • Flatten form fields and annotations to make them part of the PDF content.
  • Add, delete, or replace text within the PDF.
  • Insert, replace, and resize images.
  • Add watermarks and backgrounds.
  • Convert scanned documents into editable and searchable text.

In the next sections, I will describe the PDF editing features in more detail. The sections also contain code snippets and links to relevant resources.

Merge and split PDFs

This section is about two features with opposite goals.

Combine PDF

When you consolidate PDF files, you create a single PDF document. The merged PDF usually contains related information from multiple existing PDF files.

Here is a code snippet that shows how to combine PDF files using Docotic.Pdf.

using var pdf = new PdfDocument("first.pdf");
pdf.Append("second.pdf");
pdf.Save("merged.pdf");

The code is very simple because it shows the most basic case. We have an article that describes more complex merge cases. For example, it shows how to combine encrypted documents.

Split PDF

Splitting means extracting selected pages from the original PDF to create one or more new PDF files. This process is useful when you want to share only a part of a document.

The following code snippet shows how to create a new document from each page of a PDF.

using var pdf = new PdfDocument("compound.pdf");
for (int i = 0; i < pdf.PageCount; ++i)
{
    using PdfDocument copy = pdf.CopyPages(i, 1);

    // Helps to reduce file size when the copied pages reference
    // unused resources such as fonts, images, patterns.
    copy.RemoveUnusedResources();
    copy.Save(i + ".pdf");
}

Read about other approaches to implement a PDF splitter in the dedicated article.
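The snippet above writes one file per page. To split a document into parts of several pages each, the only extra work is computing page ranges. Here is a minimal, library-independent C# sketch of that step. The SplitRanges helper is hypothetical, not part of Docotic.Pdf, and I assume that CopyPages takes a start index and a page count, as the pdf.CopyPages(i, 1) call suggests.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Docotic.Pdf): returns (Start, Count) pairs that
// split a document with pageCount pages into parts of at most partSize pages.
static List<(int Start, int Count)> SplitRanges(int pageCount, int partSize)
{
    if (pageCount < 0)
        throw new ArgumentOutOfRangeException(nameof(pageCount));
    if (partSize <= 0)
        throw new ArgumentOutOfRangeException(nameof(partSize));

    var ranges = new List<(int Start, int Count)>();
    for (int start = 0; start < pageCount; start += partSize)
    {
        // The last part may contain fewer than partSize pages.
        ranges.Add((start, Math.Min(partSize, pageCount - start)));
    }
    return ranges;
}

// Example: split a 10-page document into parts of 3 pages.
foreach (var (first, count) in SplitRanges(10, 3))
    Console.WriteLine($"CopyPages({first}, {count})");
```

Each (Start, Count) pair would take the place of the (i, 1) arguments in the loop above.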

PDF compression

There are two main approaches to compressing a PDF. The first one applies only operations that change the form of the file but not its contents. The second approach also includes changes that may reduce detail or quality in exchange for better compression.

Lossless PDF compression

By default, the library saves PDF objects so that they occupy fewer bytes. For this, it excludes unused objects, writes objects without formatting, and uses a shorter form where possible.

To further improve compression, Docotic.Pdf also produces object streams in the output PDFs. An object stream is another way to write objects in the most compact representation. The library compresses object streams with the Flate algorithm.

You can affect the way the library saves objects using save options.

Some documents contain duplicate fonts, images, color profiles, and other objects. This is often the case for incrementally updated documents and files created by merging several documents. Deduplicating these objects with the PdfDocument.ReplaceDuplicateObjects method can drastically decrease the output size.
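Here is a conceptual sketch of why deduplication helps. It is not how Docotic.Pdf implements ReplaceDuplicateObjects (the article doesn't describe that); it only shows the general idea of finding byte-identical objects by hashing their contents and pointing each duplicate at the first copy. The FindDuplicates helper and the sample byte arrays are hypothetical stand-ins for font or image streams.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Hypothetical helper: maps the index of each duplicate buffer to the index of
// the first buffer with identical bytes.
static Dictionary<int, int> FindDuplicates(IReadOnlyList<byte[]> buffers)
{
    var replacements = new Dictionary<int, int>(); // duplicate index -> original index
    var firstByHash = new Dictionary<string, int>(); // content hash -> first index

    for (int i = 0; i < buffers.Count; i++)
    {
        string hash = Convert.ToHexString(SHA256.HashData(buffers[i]));

        // Compare the bytes as well, so a hash collision never merges different objects.
        if (firstByHash.TryGetValue(hash, out int first)
            && buffers[first].AsSpan().SequenceEqual(buffers[i]))
        {
            replacements[i] = first;
        }
        else
        {
            firstByHash.TryAdd(hash, i);
        }
    }
    return replacements;
}

var fontBytes = new byte[] { 0x00, 0x01, 0x02, 0x03 };
var objects = new List<byte[]>
{
    fontBytes,
    new byte[] { 0x42 },
    (byte[])fontBytes.Clone(), // the same bytes stored again, e.g. after a merge
};

foreach (var (duplicate, original) in FindDuplicates(objects))
    Console.WriteLine($"object {duplicate} duplicates object {original}"); // object 2 duplicates object 0
```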

Pages of a document can reference unused resources. For example, a page can still reference images that were previously visible on it but no longer are. Use the PdfDocument.RemoveUnusedResources method to remove such resources.

All these operations on PDF reduce file size without losing quality.

Lossy transformations

For files with images, one of the most effective ways to shrink PDF size is to change the compression scheme of the images. For example, applying a lossy compression like JPEG to the images will reduce the size. As a side effect, compression artifacts and loss of detail can become visible in the images.

If the images in the PDF are larger than needed, you can resize them. This can provide even better compression. Another option is to convert images to black and white (bitonal).

You can flatten form fields and annotations to save some space. It makes sense when you no longer need editable annotations and form fields in your documents.

Fonts can take a lot of space in documents. The PdfDocument.RemoveUnusedFontGlyphs method removes unused glyphs from fonts to reduce the output size. This process is also known as font subsetting. You can even remove the font bytes from the document completely (unembed the font). This might make sense for a popular font like Arial, because the viewer then has to use a copy of the font installed on the reader's system.

Other transformations directly remove information from documents. You can remove metadata, structure information, and private application data. This data is not visible, but remove it only if the users of your documents don't need it.

For more details and code examples, read the article on how to reduce PDF file size.

Reorder pages in PDF

There are many reasons to change the order of pages within a PDF. For example, you might need to group related information together or improve the readability of the document by ensuring that it flows logically.

Besides the ability to merge and split pages, Docotic.Pdf provides an extensive set of other methods to rearrange pages in PDF. I will use C# code snippets and Docotic.Pdf API to show how to organize PDF pages.

You can find complete test projects for this section's examples in the Pages and Navigation group of sample codes. I use ten-pages.pdf in the snippets. This is a trivial test document with a Page N title on each page.

Move PDF pages

The following snippet shows how to move contiguous ranges of pages. The code moves the first half of pages to the end of the document.

using var pdf = new PdfDocument("ten-pages.pdf");

pdf.MovePages(0, 5, pdf.PageCount);

pdf.Save("continuous-move.pdf");

It is possible to move arbitrary sets of pages. The following code moves the odd-numbered pages (indexes 0, 2, 4, 6, and 8) to the end of the document.

using var pdf = new PdfDocument("ten-pages.pdf");

int[] indexes = [0, 2, 4, 6, 8];
pdf.MovePages(indexes, pdf.PageCount);

pdf.Save("arbitrary-move.pdf");

To move only one page, use the PdfDocument.MovePage method.
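If you want to predict the resulting page order before touching a real file, you can simulate the moves on a list of page titles. The sketch below is plain C#, not Docotic.Pdf code, and SimulateMove is a hypothetical helper. It assumes that moved pages keep their relative order and land before the page that had the target index in the original order (at the end when the target equals the page count); check the MovePages documentation for the exact semantics.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical simulation of moving pages (not the Docotic.Pdf method).
// Assumption: moved pages keep their relative order and are placed before the
// page that had index `target` in the original order, or at the end when
// target equals the page count.
static List<string> SimulateMove(IReadOnlyList<string> pages, IEnumerable<int> indexes, int target)
{
    var moved = new SortedSet<int>(indexes);
    var remaining = Enumerable.Range(0, pages.Count).Where(i => !moved.Contains(i)).ToList();

    // Number of remaining pages that come before the target position.
    int insertAt = remaining.Count(i => i < target);

    var order = new List<int>(remaining);
    order.InsertRange(insertAt, moved);
    return order.Select(i => pages[i]).ToList();
}

var tenPages = Enumerable.Range(1, 10).Select(n => $"Page {n}").ToList();

// Mirrors pdf.MovePages(0, 5, pdf.PageCount): the first half goes to the end.
var continuous = SimulateMove(tenPages, Enumerable.Range(0, 5), tenPages.Count);

// Mirrors the indexes [0, 2, 4, 6, 8] example: odd-numbered pages go to the end.
var arbitrary = SimulateMove(tenPages, new[] { 0, 2, 4, 6, 8 }, tenPages.Count);

Console.WriteLine(string.Join(", ", continuous));
Console.WriteLine(string.Join(", ", arbitrary));
```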

Swap PDF pages

To exchange two pages, use code like in the following snippet. It swaps the first and the last pages, and then the second and the second-to-last pages.

using var pdf = new PdfDocument("ten-pages.pdf");

pdf.SwapPages(9, 0);
pdf.SwapPages(8, 1);

pdf.Save("swapped.pdf");

The PdfDocument.SwapPages method accepts the indexes of two pages that should take each other's positions. To reposition any other number of pages, use one of the page-moving methods.
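A quick simulation on a list of titles shows the order that the two SwapPages calls above should produce. SimulateSwap is a hypothetical helper that only exchanges two list items; it does not use Docotic.Pdf.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: exchanges two list items, the way SwapPages exchanges two pages.
static void SimulateSwap(List<string> list, int first, int second) =>
    (list[first], list[second]) = (list[second], list[first]);

var pages = Enumerable.Range(1, 10).Select(n => $"Page {n}").ToList();

SimulateSwap(pages, 9, 0);
SimulateSwap(pages, 8, 1);

// Page 10, Page 9, Page 3, Page 4, Page 5, Page 6, Page 7, Page 8, Page 2, Page 1
Console.WriteLine(string.Join(", ", pages));
```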

Add and insert pages

Any PdfDocument contains at least one page. When you create a new document, the library adds one page implicitly.

Here is how to insert a blank page in PDF using Docotic.Pdf API. You can insert pages at positions with indexes from 0 to PageCount inclusive.

using var pdf = new PdfDocument();

var newPage = pdf.InsertPage(0);
newPage.Canvas.DrawString("This is the new first page");

pdf.Save("two-pages.pdf");

To add a blank page to PDF, use the PdfDocument.AddPage method. The method adds a new page to the end of the document. It does the same as a pdf.InsertPage(pdf.PageCount) call.

To add or insert pages from another document, use a combination of calls as described in the Prepend PDF section. You can use the same combination of calls to add a cover page to a PDF.

Duplicate PDF pages

To duplicate pages with the library API, perform two consecutive operations. The first one is the copy pages operation. See the code example in the Split PDF section.

The second is the paste pages operation. For this operation, use the PdfDocument.Append method, and then move the appended pages to the required position.

Extract pages

The following snippet shows how to extract pages from a PDF. It extracts the first three pages from the source document.

using var pdf = new PdfDocument("ten-pages.pdf");

using (var extracted = pdf.ExtractPages(0, 3))
{
    extracted.RemoveUnusedResources();
    extracted.Save("three-pages.pdf");
}

pdf.Save("seven-pages.pdf");

The ExtractPages method removes the extracted pages from the source document. Because of this, only seven pages remain in the source document. The library does not allow extracting pages from a document that contains only one page.

We recommend removing unused resources from the document with the extracted pages.

Remove page from PDF

Check out the snippet that shows how to delete a page in a PDF document. It actually deletes two pages using different overloads of the RemovePage method. The first overload accepts a page index as its parameter. The second overload accepts a page object.

using var pdf = new PdfDocument("ten-pages.pdf");
pdf.RemovePage(0);
pdf.RemovePage(pdf.Pages[0]);
pdf.Save("without-first-two-pages.pdf");

To remove more than one page at a time, use the PdfDocument.RemovePages method. Its overloads accept arrays of either page indexes or page objects.

Digital signatures

Docotic.Pdf implements many operations for digital signatures in PDF and can help you maintain the trustworthiness and legal validity of your PDF documents. Here are some examples of what the library can do:

Sign PDF with certificate. A digital signature confirms the identity of the signer and lets readers verify that nobody altered the document after signing.

Certify signature. To add additional restrictions to a PDF, you can sign it with a certification signature. You can completely lock PDF after signing or allow a few types of changes.

Verify signature in PDF. Check the validity of a digital signature to confirm the document's signed part didn't change.

Allow multiple signatures. Contracts, agreements, and forms often require multiple parties to sign a single document. Adding multiple signatures to PDF requires the document to be saved incrementally.

Embed signature timestamp. It is possible to specify a Timestamping Authority URL and, optionally, its credentials in signing options. The library will embed the received timestamp in the signature.

Embed certificate. The library automatically embeds signing certificates in digital signatures.

The Digital Signatures page contains sample codes and more information about each operation.

Protect PDF

There are three features that you can use to ensure PDF security. Docotic.Pdf can work with them in both directions: the library can protect PDF and unlock a secured PDF.

Password protection

This feature allows you to set a password to restrict access to the PDF. Depending on the password type, the PDF will require the correct password to open or modify the document.

There are two types of passwords in PDFs:

  • Open password (user password). This type of password is required to open and view the PDF. Without the correct password, a conforming PDF viewer will not open the document.
  • Permissions password (owner password). This password is required to remove permissions from PDF. Opening a PDF with the owner password allows all actions, even if permissions restrict certain actions, such as printing, copying, or editing the PDF.

You can set both passwords for the same PDF document. Read about decrypting PDF files to learn how to remove passwords from PDF documents.

Encryption

PDF encryption and PDF passwords work together to ensure that sensitive information within the PDF is only available to legitimate users. Only users with the correct decryption key or password can view the contents.

Docotic.Pdf can encrypt PDF files using RC4 40-bit, RC4 128-bit, AES 128-bit, and AES 256-bit encryption algorithms.

Permissions and restrictions

You can set various permissions on a PDF, such as restricting printing, copying text, editing the document, and more. The permissions only affect users who open the PDF with the user password. The restrictions do not apply to those who open the PDF with the owner password.
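Under the hood, the PDF specification stores these restrictions as bit flags in the P entry of the encryption dictionary (bit 1 is the lowest bit). Docotic.Pdf sets the flags for you through its API, so you normally never deal with the raw value. The sketch below only decodes such a value to show what each flag means; the Allowed helper and the sample values are illustrative assumptions, and the flag names are shortened descriptions.

```csharp
using System;
using System.Collections.Generic;

// Permission flags of the P entry, per the PDF specification.
// Bit positions are 1-based: bit 1 is the lowest-order bit.
var permissionBits = new (int Bit, string Name)[]
{
    (3, "Print"),
    (4, "Modify contents"),
    (5, "Copy or extract text and graphics"),
    (6, "Add or modify annotations, fill in forms"),
    (9, "Fill in existing form fields"),
    (10, "Extract text and graphics for accessibility"),
    (11, "Assemble (insert, rotate, delete pages)"),
    (12, "High-quality print"),
};

// Hypothetical decoder: names of the operations that a raw P value allows.
List<string> Allowed(int p)
{
    var allowed = new List<string>();
    foreach (var (bit, name) in permissionBits)
    {
        if ((p & (1 << (bit - 1))) != 0)
            allowed.Add(name);
    }
    return allowed;
}

// -3904 clears all eight flags (the reserved bits stay set);
// adding bit 3 (value 4) gives -3900.
Console.WriteLine(string.Join("; ", Allowed(-3900))); // Print
```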

To remove permissions from a PDF, you would need to remove the PDF security password first. Read about how to do this using C# and Docotic.Pdf API.

To learn how to ensure PDF integrity in addition to PDF security, read the section about digital signatures.

Metadata in PDF

PDF metadata is information embedded within a PDF file that provides details about the document. There are two primary sources of metadata: PDF document properties and XMP metadata.

Document properties are also commonly referred to as document information dictionary, file info, metadata fields, document attributes, and file attributes.

XMP (Extensible Metadata Platform) metadata is basically an XML file embedded in a PDF. XMP uses a flexible data model that can store any set of metadata properties. This metadata uses namespaces to group related properties. Some common namespaces include XMP Core/XMP Basic and Dublin Core.

Docotic.Pdf fully supports both XMP metadata and document properties. Please note that the PDF 2.0 standard marked most of the document information dictionary properties as deprecated. The only exceptions are creation date and modification date.

You can find complete test projects for this section's examples in the Metadata group of sample codes.

Document properties

See how to edit the document properties with Docotic.Pdf.

using var pdf = new PdfDocument("file.pdf");
pdf.Info.Author = "An example code";
pdf.Info.Subject = "Showing how to access and change document metadata";
pdf.Info.Title = "Custom title goes here";
pdf.Info.Keywords = "pdf Docotic.Pdf";

pdf.Save("updated-file.pdf");

You can change the value of each property, but please note that by default the library automatically updates some properties before it saves the PDF. You can change this behavior in save options.

To remove all metadata specified in the document properties, use the PdfInfo.Clear method. If you prefer, the method can remove only custom properties.

XMP metadata

This snippet shows how to change properties of the XMP metadata in a PDF document.

using var pdf = new PdfDocument("file.pdf");

pdf.Metadata.DublinCore.Creators = new XmpArray(XmpArrayType.Ordered);
pdf.Metadata.DublinCore.Creators.Values.Add(new XmpString("me"));
pdf.Metadata.DublinCore.Creators.Values.Add(new XmpString("Docotic.Pdf"));
pdf.Metadata.DublinCore.Format = new XmpString("application/pdf");

pdf.Metadata.Pdf.Producer = new XmpString("me too!");

pdf.Save("updated-file.pdf");

The code changes properties in Dublin Core and Adobe PDF schemas. Please note that the Producer property gets overwritten because of the default save options.

You can extract raw XMP metadata using one of the XmpMetadata.Save methods. These methods produce XML with all the properties.

To remove all XMP metadata from a document, use the XmpMetadata.Unembed method.

Sync metadata

It is desirable to make sure that corresponding properties in XMP metadata and Document Info have the same values. This is especially true if you edit both sources of metadata in the same file.

Use the PdfDocument.SyncMetadata method to synchronize values in XMP Metadata and Document Info. When a property has different values in the two sources, the method overwrites one source with the value from the other. Read the documentation for the method for more details.
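For reference, here is how Document Info keys correspond to XMP properties according to the PDF and XMP specifications, plus a naive sync routine. This is a sketch, not SyncMetadata: it treats every value as a plain string (real XMP stores dc:title as a language alternative and dc:creator as an ordered array), and it arbitrarily lets Document Info win every conflict. SyncInfoIntoXmp and the sample dictionaries are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Document Info keys and their XMP counterparts (PDF and XMP specifications).
var infoToXmp = new Dictionary<string, string>
{
    ["Title"] = "dc:title",
    ["Author"] = "dc:creator",
    ["Subject"] = "dc:description",
    ["Keywords"] = "pdf:Keywords",
    ["Creator"] = "xmp:CreatorTool",
    ["Producer"] = "pdf:Producer",
    ["CreationDate"] = "xmp:CreateDate",
    ["ModDate"] = "xmp:ModifyDate",
};

// Hypothetical naive sync: copies each Info value into its XMP counterpart when
// they differ (Info wins, by assumption). Returns the XMP properties that changed.
List<string> SyncInfoIntoXmp(Dictionary<string, string> infoValues, Dictionary<string, string> xmpValues)
{
    var changed = new List<string>();
    foreach (var (infoKey, xmpKey) in infoToXmp)
    {
        if (!infoValues.TryGetValue(infoKey, out var value))
            continue;

        if (!xmpValues.TryGetValue(xmpKey, out var current) || current != value)
        {
            xmpValues[xmpKey] = value;
            changed.Add(xmpKey);
        }
    }
    return changed;
}

var info = new Dictionary<string, string>
{
    ["Title"] = "Custom title goes here",
    ["Author"] = "An example code",
};
var xmp = new Dictionary<string, string>
{
    ["dc:title"] = "Old title",
    ["dc:creator"] = "An example code",
};

Console.WriteLine(string.Join(", ", SyncInfoIntoXmp(info, xmp))); // dc:title
```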

PDF bookmarks

A PDF document can contain special shortcuts or links that help readers quickly navigate to specific sections or pages. PDF outline is another name for bookmarks.

Viewer apps usually display bookmarks like a table of contents in a book, except that bookmarks are interactive. When the reader clicks a bookmark, the viewer app jumps to the designated part of the document. You can achieve similar behaviour using link annotations.

Here is a C# code snippet that shows how to add bookmarks to PDF:

using var pdf = new PdfDocument("ten-pages.pdf");

var root = pdf.OutlineRoot;
root.AddChild("Fifth page", 4);
root.AddChild("Seventh page", pdf.Pages[6]);

pdf.PageMode = PdfPageMode.UseOutlines;
pdf.Save("simple-bookmarks.pdf");

PDF outline can have main bookmarks and sub-bookmarks, making it easier to structure large documents. Here is how to create sub-bookmarks in PDF:

using var pdf = new PdfDocument("ten-pages.pdf");

var root = pdf.OutlineRoot;
var evenPages = root.AddChild("Even pages");

evenPages.AddChild("Second page", 1);
evenPages.AddChild("Fourth page", 3);

pdf.PageMode = PdfPageMode.UseOutlines;
pdf.Save("even-pages-bookmarks.pdf");

You can apply fonts and colors to bookmark items. Check out the complete example for creating an outline with styles.

To remove a bookmark from PDF, use the RemoveChild or RemoveChildAt methods. You can remove all bookmarks by calling the RemoveAllChildren method on the root node.

File attachments

PDF attachments are external files embedded within a PDF document. People also commonly refer to these files as embedded files and file attachments. You can attach any file: an image, an audio or video file, another PDF, a Word document, an Excel spreadsheet, or anything else.

If you want to combine several PDFs into a single PDF file, check the article about merging PDF documents.

Here is the C# code that shows how to add an attachment to a PDF with the help of Docotic.Pdf API.

using var pdf = new PdfDocument();

var excelFile = pdf.CreateFileAttachment("this-year-figures.xlsx");
pdf.SharedAttachments.Add(excelFile);

pdf.Save("shared-attachment.pdf");

The above code adds the file as a shared attachment. Readers can find the attached file in the Attachments panel of their viewer.

It is also possible to add attachments to PDF pages. Such attachments are visible inside the page contents like any other annotations.

using var pdf = new PdfDocument();

var page = pdf.Pages[0];
page.Canvas.DrawString(20, 100, "Here is this year's figures document:");

var bounds = PdfRectangle.FromLTRB(155, 100, 165, 110);
var excelFile = pdf.CreateFileAttachment("this-year-figures.xlsx");
page.AddFileAnnotation(bounds, excelFile);

pdf.Save("page-attachment.pdf");

Check the Attachments group of sample codes to find complete test projects for this section's examples.

To remove attachments from a PDF, enumerate both shared attachments and page annotations and remove the items you do not need. See the example of the enumerating code below. To remove all shared attachments, you can use a pdf.SharedAttachments.Clear() call.

You also need to enumerate these collections to extract embedded files from a PDF. Here is example code:

using var pdf = new PdfDocument("file-with-attachments.pdf");

int i = 0;
// Attachments of the document as a whole (the Attachments panel)
foreach (var attachment in pdf.SharedAttachments)
{
    if (attachment?.Contents == null)
        continue;

    var fileName = attachment.Specification ?? $"attachment{i++}";
    attachment.Contents.Save(fileName);
}

// Attachments added to pages as file attachment annotations
foreach (var widget in pdf.GetWidgets())
{
    var attachment = (widget as PdfFileAttachmentAnnotation)?.File;
    if (attachment?.Contents == null)
        continue;

    var fileName = attachment.Specification ?? $"attachment{i++}";
    attachment.Contents.Save(fileName);
}

Page labels

PDF page labels are custom names or numbers assigned to pages in a PDF document. Unlike standard page numbers, page labels can include a mix of letters, numbers, and even Roman numerals. Other names for page labels are page identifiers and page names.

Here is how to add page labels to PDF using Docotic.Pdf:

using var pdf = new PdfDocument("ten-pages.pdf");

pdf.PageLabels.AddRange(0, 3, PdfPageNumberingStyle.LowercaseRoman);
pdf.PageLabels.AddRange(4, PdfPageNumberingStyle.DecimalArabic, string.Empty, 5);
pdf.PageLabels.AddRange(7, PdfPageNumberingStyle.DecimalArabic, "Appendix page ", 1);

pdf.Save("page-labels.pdf");

The first four pages will have labels i, ii, iii, and iv. The next three labels are 5, 6, and 7. For the remaining pages, labels will be Appendix page 1, Appendix page 2, and Appendix page 3.
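To double-check the labels you expect, you can compute them yourself. In a PDF, each label range starts at a page index and applies until the next range starts, with an optional prefix and a first number. The following plain C# sketch mirrors the three AddRange calls above. ComputeLabels and ToLowerRoman are hypothetical helpers, only two numbering styles are handled, and the first range is assumed to start at page index 0.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Lowercase Roman numerals for positive numbers.
static string ToLowerRoman(int number)
{
    var values = new[] { 1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1 };
    var symbols = new[] { "m", "cm", "d", "cd", "c", "xc", "l", "xl", "x", "ix", "v", "iv", "i" };
    var sb = new StringBuilder();
    for (int i = 0; i < values.Length; i++)
    {
        while (number >= values[i])
        {
            sb.Append(symbols[i]);
            number -= values[i];
        }
    }
    return sb.ToString();
}

// Hypothetical helper: each range applies from StartIndex until the next range.
// Ranges must be sorted by StartIndex, and the first one must start at 0.
static List<string> ComputeLabels(
    int pageCount, (int StartIndex, string Style, string Prefix, int FirstNumber)[] ranges)
{
    var labels = new List<string>();
    for (int page = 0; page < pageCount; page++)
    {
        // Pick the last range that starts at or before this page.
        var range = ranges[0];
        foreach (var r in ranges)
        {
            if (r.StartIndex <= page)
                range = r;
        }

        int number = range.FirstNumber + (page - range.StartIndex);
        string numeral = range.Style == "roman" ? ToLowerRoman(number) : number.ToString();
        labels.Add(range.Prefix + numeral);
    }
    return labels;
}

var exampleLabels = ComputeLabels(10, new[]
{
    (0, "roman", "", 1),
    (4, "decimal", "", 5),
    (7, "decimal", "Appendix page ", 1),
});

// i | ii | iii | iv | 5 | 6 | 7 | Appendix page 1 | Appendix page 2 | Appendix page 3
Console.WriteLine(string.Join(" | ", exampleLabels));
```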

OCR PDF

Some PDF documents contain scanned pages and require optical character recognition (OCR) before you can extract text from them. Another use case for OCR is to extract text from a PDF that uses a custom glyph-to-Unicode mapping.

We have a blog post that shows how to OCR scanned documents. The post contains a non-searchable PDF example and shows how to use Tesseract OCR, C# code and Docotic.Pdf to recognize text in image-only PDFs. You can also add an OCR text layer to scanned PDF files with the help of Docotic.Pdf.

Edit pages

This section talks about changes to existing PDF pages, like:

  • how to rotate PDF pages
  • how to change page size
  • using vector graphics on page canvas
  • adding HTML content

Read about the Layout API of the library to learn how to create PDF documents from building blocks like headers and footers, tables, images, paragraphs of text, and the like.

Check out the other sections for information about:

Rotate pages

See the C# code snippet for how to rotate only one page in PDF:

using var pdf = new PdfDocument("existing.pdf");

pdf.Pages[0].Rotation = PdfRotation.Rotate180;

pdf.Save("rotated.pdf");

The code rotates the first page by 180 degrees. You can also rotate PDF pages by 0, 90, and 270 degrees.
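Rotating by 90 or 270 degrees swaps the width and height that readers see, which matters when you place new content on a rotated page. The sketch below is plain C#; DisplayedSize is a hypothetical helper, and the 612 x 792 point US Letter size is just an example value.

```csharp
using System;

// Hypothetical helper: the size a viewer shows for a page after rotation.
// Rotation must be a multiple of 90 degrees; 90 and 270 swap width and height.
static (double Width, double Height) DisplayedSize(double width, double height, int rotationDegrees)
{
    if (rotationDegrees % 90 != 0)
        throw new ArgumentException("Rotation must be a multiple of 90 degrees.");

    // Normalize negative and large angles into the 0..359 range.
    int normalized = ((rotationDegrees % 360) + 360) % 360;
    return normalized is 90 or 270 ? (height, width) : (width, height);
}

// A US Letter portrait page (612 x 792 points) rotated by 90 degrees.
Console.WriteLine(DisplayedSize(612, 792, 90)); // (792, 612)
```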

Change page size

Docotic.Pdf provides more than one way to change the page size of a PDF. In the simplest case, you can use the Width and Height properties of a PdfPage object to specify the desired size. For an existing document, this neither resizes nor removes the page content. It just hides all the page content that is outside the rectangle of the specified size.

A similar approach is to crop pages. You can change the CropBox of a page using C# code like this:

using var pdf = new PdfDocument("existing.pdf");

var page = pdf.Pages[0];
var cropBoxBefore = page.CropBox;
page.CropBox = new PdfBox(0, cropBoxBefore.Height - 256, 256, cropBoxBefore.Height);

pdf.Save("cropped.pdf");

Changing the crop box is the way to go if you would like to save a part of the page as an image.
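The new PdfBox in the snippet above appears to take (left, bottom, right, top) values measured from the bottom-left corner of the page, which is the PDF convention. That is why the top-left 256 x 256 square starts at Height - 256. If your region comes from an image editor with a top-left origin, a conversion like the hypothetical ToPdfBox helper below does the math. It assumes that the page box itself starts at (0, 0).

```csharp
using System;

// Hypothetical helper: converts a region given from the top-left corner
// (x to the right, y downwards) into PDF (left, bottom, right, top) values,
// whose origin is the bottom-left corner of the page.
static (double Left, double Bottom, double Right, double Top) ToPdfBox(
    double pageHeight, double x, double y, double width, double height)
{
    double top = pageHeight - y;
    return (x, top - height, x + width, top);
}

// The top-left 256 x 256 square of a 792-point-high page,
// matching new PdfBox(0, height - 256, 256, height) from the snippet above.
Console.WriteLine(ToPdfBox(792, 0, 0, 256, 256)); // (0, 536, 256, 792)
```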

If the goal is to keep all the contents visible on a page of a different size, use the scaling approach. In the following code snippet, I create an XObject from a page. An XObject is like a vector image: you can draw the same object on multiple pages, scaling and rotating it as needed.

After the XObject is ready, I clear the previous page content, resize the page, and then draw the object on the resized page.

using var pdf = new PdfDocument("existing.pdf");

var page = pdf.Pages[0];
var pageXObject = pdf.CreateXObject(page);

page.Canvas.Clear();
page.Width /= 2;
page.Height /= 2;

page.Canvas.DrawXObject(pageXObject, 0, 0, page.Width, page.Height, 0);

pdf.Save("resized.pdf");
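The snippet halves both sides, so the aspect ratio stays the same and the XObject fills the whole page. If the new size has a different aspect ratio, drawing the XObject at the full page width and height would stretch it. One way to avoid that is to compute a single uniform scale factor and center the result. FitCentered is a hypothetical helper; its (X, Y, Width, Height) result is meant for the position and size arguments of DrawXObject. Because the margins are equal on both sides, the centering works whether the canvas origin is at the top or the bottom of the page.

```csharp
using System;

// Hypothetical helper: position and size for drawing content of
// contentWidth x contentHeight on a pageWidth x pageHeight page without
// distortion, centered on the page.
static (double X, double Y, double Width, double Height) FitCentered(
    double contentWidth, double contentHeight, double pageWidth, double pageHeight)
{
    // The smaller ratio guarantees that both dimensions fit.
    double scale = Math.Min(pageWidth / contentWidth, pageHeight / contentHeight);
    double width = contentWidth * scale;
    double height = contentHeight * scale;
    return ((pageWidth - width) / 2, (pageHeight - height) / 2, width, height);
}

// A US Letter page (612 x 792 points) fitted into an A5 landscape page (595 x 420).
var placement = FitCentered(612, 792, 595, 420);
Console.WriteLine(placement);
```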

Vector graphics

Docotic.Pdf library can add vector graphics like lines, curves, and shapes to PDF documents. You can construct graphics paths from graphics objects. Then you can fill or stroke the paths using colors from different color spaces.

Find example code for graphics-related features in the Graphics group of sample codes.

It is also possible to extract graphics from PDF. Start by calling the GetObjects method, and then extract information from objects of the PdfPageObjectType.Path type. Don't forget that XObjects can also contain nested paths.

using var pdf = new PdfDocument("existing.pdf");

var options = new PdfObjectExtractionOptions();
var objects = pdf.Pages[0].GetObjects(options);
foreach (var obj in objects)
{
    if (obj.Type == PdfPageObjectType.Path)
    {
        var path = (PdfPath)obj;
        Console.WriteLine($"Found path {path}");
    }
    else if (obj.Type == PdfPageObjectType.XObject)
    {
        var paintedXObject = (PdfPaintedXObject)obj;
        var nestedObjects = paintedXObject.XObject.GetObjects(options);
        // ...
    }
}

Add HTML to PDF pages

Overlaying HTML content onto a PDF document can be useful for adding dynamic elements like charts or a stock price ticker to your PDFs.

Read about how to insert HTML in PDF to get more details and download example code.

Edit PDF text

This section is about how to edit the text in a PDF, how to change text color in PDF, and how to add new text.

We have an article dedicated to how to extract text from a PDF. Check it out for more information on the topic.

Text flattening is also possible with the help of Docotic.Pdf.

Find and replace

To modify text in PDF, you would need to find the area that contains the text, then remove the text in the area. The last step is to add the new text to the same area of the document.

Searching PDFs can be tricky because internally the document can contain words in any order. The text can also be rotated. Luckily, we have sample code that shows how to search a PDF for words or phrases.

When you have the coordinates of the text to remove, it is time to edit the contents of the page that contains it. The library can enumerate and copy page objects, so you can omit some text while copying objects, which essentially removes the text. The code of the edit PDF page content example shows all the details of the process. You would need to update the ShouldRemoveText method to use the found coordinates.
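The check inside ShouldRemoveText can be as simple as a rectangle overlap test between the bounds of a text object and the area where the search found the phrase. Here is a plain C# sketch; Intersects is a hypothetical helper, and the rectangles are made-up (X, Y, Width, Height) values in one page coordinate system.

```csharp
using System;

// Hypothetical helper: true when two rectangles overlap.
// Rectangles that only touch at an edge do not count as overlapping.
static bool Intersects(
    (double X, double Y, double Width, double Height) a,
    (double X, double Y, double Width, double Height) b)
{
    return a.X < b.X + b.Width && b.X < a.X + a.Width
        && a.Y < b.Y + b.Height && b.Y < a.Y + a.Height;
}

var foundArea = (X: 100.0, Y: 200.0, Width: 80.0, Height: 12.0);
var textChunk = (X: 150.0, Y: 201.0, Width: 40.0, Height: 10.0);
var farAway = (X: 300.0, Y: 500.0, Width: 40.0, Height: 10.0);

Console.WriteLine(Intersects(foundArea, textChunk)); // True
Console.WriteLine(Intersects(foundArea, farAway));   // False
```

If partial overlaps remove too much neighboring text, requiring the text bounds to lie fully inside the found area is a stricter alternative.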

Read the next section to see how to add the new text to the document.

If you create documents with placeholder text and later replace the placeholder with some other text, then you can use text boxes instead.

The idea is to add a read-only text box without borders to the document and put the placeholder text in it. Later you can open the document, find the text box by its name and replace the placeholder with a simple call box.Text = "new text";. Flatten the text box after the replacement if you don't want any further changes.

Add new text

To add some text to documents, use DrawString and DrawText methods of a PdfCanvas object. The methods use the current canvas font. The font must contain glyphs for all characters in the text. Use the PdfFont.ContainsGlyphsForText method to check if the font meets this requirement.

var canvas = pdf.Pages[0].Canvas;
canvas.Font = pdf.AddFont("NSimSun")
    ?? throw new ArgumentException("Font not found");

canvas.DrawString(10, 50, "Olá. 你好. Hello. This is some new text");

You can add Unicode text drawn with Type1, TrueType, and OpenType fonts. The library can use fonts installed on your system, 14 built-in Type1 fonts, or load a required font from a file.

Change text color

To change the color of text in a PDF, use the same approach as for removing text. In the sample code, you would need to change at least the ReplaceColor method.

Images

Docotic.Pdf provides everything required to edit PDF images. Below are C# code snippets for the most popular operations.

The Images group of sample codes contains complete test projects for examples in this section.

Add image to PDF

The library can import images in GIF/TIFF/PNG/BMP/JPEG formats. You can also add an image from a System.Drawing.Image object.

var canvas = pdf.Pages[0].Canvas;
var image = pdf.AddImage("image.jpg")
    ?? throw new ArgumentException("Cannot add image");

canvas.DrawImage(image, 10, 50);

You can specify a rotation angle and an output size using overloads of the DrawImage method. To draw the same image on multiple pages, add the image once and use the same PdfImage object in multiple calls to the DrawImage method.

Combine images into PDF

Here is the C# code that shows how to combine multiple images into one PDF.

using var pdf = new PdfDocument();

var imagePaths = new string[] { "image.jpg", "another-image.png" };
foreach (var path in imagePaths)
{
    var image = pdf.AddImage(path)
        ?? throw new ArgumentException("Cannot add image");

    var page = pdf.AddPage();
    page.Width = image.Width;
    page.Height = image.Height;

    page.Canvas.DrawImage(image, 0, 0);
}

pdf.RemovePage(0);
pdf.Save("combined-images.pdf");

The code adds multiple images to PDF, changing each page size to match the corresponding image size. Before saving the result, the code removes the first implicitly added empty page.
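Assuming image.Width and image.Height are pixel counts, the snippet maps one pixel to one point (1/72 inch), so a high-resolution scan produces a very large page. If you want each page to match the physical size of the original, convert pixels to points using the image resolution. PixelsToPoints is a hypothetical helper, and how you obtain the DPI value is up to you; I don't cover reading it with Docotic.Pdf.

```csharp
using System;

// Hypothetical helper: physical size in points (1/72 inch) of an image
// with the given pixel dimensions and resolution in dots per inch.
static (double Width, double Height) PixelsToPoints(int widthPx, int heightPx, double dpi)
{
    if (dpi <= 0)
        throw new ArgumentOutOfRangeException(nameof(dpi));

    return (widthPx * 72.0 / dpi, heightPx * 72.0 / dpi);
}

// A 2550 x 3300 pixel scan at 300 DPI is a US Letter page (8.5 x 11 inches).
Console.WriteLine(PixelsToPoints(2550, 3300, 300)); // (612, 792)
```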

Extract PDF images

Docotic.Pdf extracts images from PDF files without compromising their quality. The library does not change the image size or compression, so you get images of the same quality as in the PDF.

using var pdf = new PdfDocument("file-with-images.pdf");
int i = 0;
foreach (PdfImage image in pdf.GetImages())
{
    var path = image.Save($"image{i++}");
    Console.WriteLine($"Saved to {path}");
}

Remove and replace images

Use the PdfPage.RemovePaintedImages method to remove all or specific images from a PDF page. You can filter images by position, size, transformation, or other parameters.

using var pdf = new PdfDocument("file-with-images.pdf");
pdf.Pages[0].RemovePaintedImages(
    image =>
    {
        return image.Size.Width > 100;
    }
);

pdf.RemoveUnusedResources();
pdf.Save("no-wide-images.pdf");

The above C# code shows how to remove images with the help of Docotic.Pdf. I recommend removing unused resources after you change or remove images.

Use the PdfImage.ReplaceWith method to replace all occurrences of the image within the PDF document.

using var pdf = new PdfDocument("file-with-images.pdf");
var firstImage = pdf.GetImages(false).FirstOrDefault()
    ?? throw new ArgumentException("No images found");

firstImage.ReplaceWith("another-image.png");

pdf.RemoveUnusedResources();
pdf.Save("replaced-image.pdf");

Change compression scheme

Docotic.Pdf provides methods for changing compression of PDF images. It is possible to repack the images using JPEG, CCITT Group 3 and 4 (fax), JPEG 2000, and zip/deflate compression algorithms.

Depending on the initial and the new compression, the change can reduce the detail or quality of the image. However, lossy conversions usually help reduce the document size.

firstImage.RecompressWithJpeg2000(25);

There are other methods to repack an image. Check the PdfImage methods with names that start with RecompressWith. You can remove any compression from an image using the Uncompress method.

Resize images

If some images in a PDF document are larger than needed, the library can resize or downscale them for you.

firstImage.Scale(0.5, PdfImageCompression.Jpeg2000, 25);

The above code makes the first image two times smaller in both directions. The library uses JPEG 2000 compression for the resulting image.

You can use one of the ResizeTo methods to specify exact values for the resulting width and height.

Resizing images usually reduces PDF file size even more than changing their compression (see the section above), but it is a lossy process.
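When you pick exact values for a ResizeTo call, you usually want to keep the aspect ratio. Here is a hypothetical helper that computes the target height for a given width; the sample sizes are made up. Note the arithmetic behind the Scale example above: halving both sides keeps a quarter of the pixels.

```csharp
using System;

// Hypothetical helper: the size for downscaling an image to targetWidth pixels
// while keeping its aspect ratio. Images that are already small enough are kept.
static (int Width, int Height) FitToWidth(int width, int height, int targetWidth)
{
    if (targetWidth >= width)
        return (width, height);

    int targetHeight = (int)Math.Round(height * (double)targetWidth / width);
    return (targetWidth, Math.Max(1, targetHeight));
}

var resized = FitToWidth(4000, 3000, 1600);
Console.WriteLine(resized); // (1600, 1200)
```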

Watermarks & backgrounds

Watermarking PDF involves these steps:

  • Create an XObject, the container for the watermark contents
  • Fill the object with text, images, and vector graphics
  • Stamp PDF pages with the object

Here is the C# code that adds the Confidential watermark to PDF:

using var pdf = new PdfDocument("existing.pdf");

var watermark = pdf.CreateXObject();

var canvas = watermark.Canvas;
canvas.FontSize = 72;
canvas.Brush.Color = new PdfRgbColor(222, 35, 35);
canvas.Brush.Opacity = 45;
canvas.Pen.Color = canvas.Brush.Color;
canvas.Pen.Opacity = canvas.Brush.Opacity;
canvas.Pen.Width = 5;

var padding = 10;
var text = "CONFIDENTIAL";
canvas.DrawString(padding, padding, text);

var textSize = canvas.MeasureText(text);
var watermarkRect = new PdfRectangle(
    padding, padding, textSize.Width, textSize.Height);
canvas.DrawRoundedRectangle(watermarkRect, new PdfSize(padding, padding));

foreach (var page in pdf.Pages)
{
    page.Canvas.DrawXObject(
        watermark,
        (page.Width - watermarkRect.Width) / 2,
        (page.Height - watermarkRect.Height) / 2);
}

pdf.Save("watermarked.pdf");

The code sets the brush and pen properties of the watermark canvas. The brush is used to paint the text. To find out the text size, the code measures the text. Then it draws a rectangle with rounded corners around the text. The pen is used to stroke the rectangle.

After the watermark content is ready, the code draws it in the center of each page.

You can create PDF backgrounds in almost the same way as watermarks. To add a background to a PDF, use the above code, but add watermark.DrawOnBackground = true; after the CreateXObject call. Please note that opaque content like images can obscure the background.
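Following that description, here is a sketch of a light-gray DRAFT background. It reuses only the calls from the watermark code. The text, color, font size, and file names are illustrative.

```csharp
using BitMiracle.Docotic.Pdf;

using var pdf = new PdfDocument("existing.pdf");

var background = pdf.CreateXObject();
// Draw the XObject behind the existing page content.
background.DrawOnBackground = true;

var canvas = background.Canvas;
canvas.FontSize = 96;
canvas.Brush.Color = new PdfRgbColor(230, 230, 230); // light gray

var text = "DRAFT";
canvas.DrawString(0, 0, text);
var textSize = canvas.MeasureText(text);

// Center the background text on every page.
foreach (var page in pdf.Pages)
{
    page.Canvas.DrawXObject(
        background,
        (page.Width - textSize.Width) / 2,
        (page.Height - textSize.Height) / 2);
}

pdf.Save("with-background.pdf");
```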

Annotations

Docotic.Pdf provides a rich API for annotations in PDF. You can create, edit, and remove annotations from PDF documents. It is also possible to flatten annotations.

To annotate text, you can use:

  • Sticky notes or text annotations. See the AddTextAnnotation method of the PdfPage class.
  • Highlights. See the AddHighlightAnnotation method.
  • Strikethroughs. See the AddStrikeoutAnnotation method.
  • Underlines. See the AddJaggedUnderlineAnnotation and AddUnderlineAnnotation methods.

Use links to jump from one page to another or to an external resource. You can use ink annotations for freehand drawing on a PDF page. There are redaction annotations for parts that are designated to be removed from the document. You can also embed audio, video, or 3D content.

Highlight text

Here is how to highlight text in PDF documents:

using var pdf = new PdfDocument();

var page = pdf.Pages[0];
var canvas = page.Canvas;
canvas.FontSize = 30;

var text = "Highlighted text.";
var position = new PdfPoint(10, 50);
canvas.DrawString(position, text);
canvas.DrawString(" Not highlighted.");

var size = canvas.MeasureText(text);
var bounds = new PdfRectangle(position, size);

var color = new PdfRgbColor(145, 209, 227);
var annotationText = "Please pay attention to this part.";
page.AddHighlightAnnotation(annotationText, bounds, color);

pdf.Save("highlighted.pdf");

To link to a specific page in a PDF, use code like this:

using var pdf = new PdfDocument();
var secondPage = pdf.AddPage();
secondPage.Canvas.DrawString(10, 50, "Welcome to the second page.");

var firstPage = pdf.Pages[0];
var canvas = firstPage.Canvas;
var linkRect = new PdfRectangle(10, 50, 100, 60);
canvas.DrawRectangle(linkRect, PdfDrawMode.Stroke);

var options = new PdfTextDrawingOptions(linkRect)
{
    HorizontalAlignment = PdfTextAlign.Center,
    VerticalAlignment = PdfVerticalAlign.Center
};
canvas.DrawText("Go to 2nd page", options);

firstPage.AddLinkToPage(linkRect, 1);

pdf.Save("linked.pdf");

In the code, the link annotation (an area with an associated action) works as an internal hyperlink. Such areas can also navigate to external resources or perform non-navigational actions.

Remove annotations

To remove annotations from PDF:

  1. Access the collection of widgets using the PdfPage.Widgets property or the PdfDocument.GetWidgets method.
  2. Check the type, properties, or otherwise decide which annotations you no longer need.
  3. Remove the annotation by using the PdfDocument.RemoveWidget method or methods of the PdfWidgetCollection object.
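The steps above might look like this when you want to remove all annotations but keep form fields. This sketch assumes that every form field is a PdfControl (a kind of widget) and that GetWidgets() takes no arguments. The file names are illustrative.

```csharp
using System.Linq;
using BitMiracle.Docotic.Pdf;

using var pdf = new PdfDocument("existing.pdf");

// Step 1: get all widgets. Copy them to a list so that removing widgets
// does not modify the collection being enumerated.
var widgets = pdf.GetWidgets().ToList();

foreach (var widget in widgets)
{
    // Step 2: keep form controls, treat everything else as an annotation to drop.
    if (widget is PdfControl)
        continue;

    // Step 3: remove the annotation from the document.
    pdf.RemoveWidget(widget);
}

pdf.Save("without-annotations.pdf");
```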

To remove attachments from a PDF, you need to remove both file attachment annotations and shared attachments.

Redact PDF

As a PDF redaction library, Docotic.Pdf offers methods to permanently remove or quickly black out sensitive information from your PDF documents.


Redact text

Here is how to black out text in a PDF without a redaction tool, using only C# and Docotic.Pdf:

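using var pdf = new PdfDocument("existing.pdf");

// Counts words across all pages
int i = 0;
foreach (var page in pdf.Pages)
{
    foreach (var word in page.GetWords())
    {
        // Cover every third word (the 1st, 4th, 7th, ...) with a black rectangle
        if (i % 3 == 0)
        {
            page.Canvas.AppendRectangle(word.Bounds);
            page.Canvas.FillPath(PdfFillMode.Winding);
        }

        i++;
    }
}

pdf.Save("blacked-out.pdf");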

The code draws a black rectangle over every third word in the document. Please note that the text behind the rectangles remains in the document, so it can still be extracted later. To permanently remove the text, use the approach from the section about replacing text.

Redact images

You can use black rectangles to cover images, too. But an easier approach is to replace the image with a black 1 by 1 pixel image. This not only visually marks the redacted image but also removes the original image data.

Check the section about removing and replacing images for code examples. I also recommend calling the PdfDocument.ReplaceDuplicateObjects method after the replacement, so that identical replacement images are stored only once.

PDF forms

Docotic.Pdf can create AcroForms (another name for PDF forms) with all kinds of interactive elements: buttons, checkboxes, drop-down lists, list boxes, radio buttons, and text fields.

It usually takes only a few lines of code to add and set up a form field. For example, you can add an editable field to a PDF by calling the PdfPage.AddTextBox method. The code samples in the Forms and Annotations group provide more information about creating and using forms.

How to fill in a PDF form

Use the PdfDocument.GetControl method to find a PDF control by its full or partial name. An alternative is to enumerate document controls using the GetControls method. In either case, you would need to cast the control to the expected field type.

using var pdf = new PdfDocument(@"example-form.pdf");

if (pdf.GetControl("txt-name") is PdfTextBox nameTextBox)
    nameTextBox.Text = "Bit Miracle team";

if (pdf.GetControl("txt-email") is PdfTextBox emailTextBox)
    emailTextBox.Text = "support@bitmiracle.com";

if (pdf.GetControl("check-agree") is PdfCheckBox agreeCheckBox)
    agreeCheckBox.Checked = true;

pdf.Save("filled-form.pdf");

The code uses this PDF form example. In the code, I set values for the two text fields and tick the checkbox.
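If you do not know the field names in advance, you can enumerate the controls with GetControls and check their types, as mentioned above. Here is a minimal sketch that prints the names and current values of text boxes and checkboxes, using only the properties from the snippet above plus Name:

```csharp
using System;
using BitMiracle.Docotic.Pdf;

using var pdf = new PdfDocument("example-form.pdf");

foreach (var control in pdf.GetControls())
{
    // Cast each control to the expected field type before reading its value.
    switch (control)
    {
        case PdfTextBox textBox:
            Console.WriteLine($"Text box '{textBox.Name}': {textBox.Text}");
            break;
        case PdfCheckBox checkBox:
            Console.WriteLine($"Checkbox '{checkBox.Name}': {checkBox.Checked}");
            break;
        default:
            Console.WriteLine($"Other control '{control.Name}' ({control.GetType().Name})");
            break;
    }
}
```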

When you have finished filling in a form, you can flatten all its fields.

Using JavaScript in forms

You can add actions to control events. The PdfControl class provides access to a pre-defined set of events. The names of the events start with "On" (e.g., OnMouseDown).

Here is an example of using JavaScript for PDF forms:

using var pdf = new PdfDocument(@"example-form.pdf");
foreach (var field in pdf.GetControls())
    field.OnChange = pdf.CreateJavaScriptAction($"app.alert('{field.Name} changed!',3)");

pdf.Save("javascript-events.pdf");

Forms Data Format

There is one more way to fill out a PDF form electronically. Use the FDF to PDF feature of the library to auto-populate a PDF form with data from a database or another source.

using var pdf = new PdfDocument(@"example-form.pdf");
pdf.ImportFdf("form-data.fdf");
pdf.Save("auto-populated.pdf");

The code uses this FDF file to fill all form fields at once.

Flatten PDF

This section is about how to flatten a PDF.


When you flatten a PDF, you convert interactive elements like forms and annotations into static content to prevent further editing. A flattened PDF can take significantly fewer bytes while looking the same.

Flatten forms and annotations

To flatten a fillable PDF, use the PdfDocument.FlattenControls method. This method draws all form fields and other controls on their parent pages and removes the source controls from the document.

When you flatten a PDF form, it makes sense to flatten annotations, too. Use the PdfDocument.FlattenWidgets method to flatten both controls and annotations at the same time.

If you only want to convert some controls and/or annotations to their visual representation, then use the PdfWidget.Flatten method. You would need to find the required control or annotation first.
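Here is a sketch of both approaches. The first part flattens every control and annotation at once. The second part flattens only the control named txt-name from the form example and keeps the other fields interactive. It assumes that FlattenWidgets takes no arguments. The file names are illustrative.

```csharp
using System.Linq;
using BitMiracle.Docotic.Pdf;

// Flatten all form fields and annotations at once.
using (var pdf = new PdfDocument("filled-form.pdf"))
{
    pdf.FlattenWidgets();
    pdf.Save("flattened.pdf");
}

// Flatten only selected controls, keeping the rest interactive.
using (var pdf = new PdfDocument("filled-form.pdf"))
{
    // Copy to a list because flattening removes the control from the document.
    var nameFields = pdf.GetControls()
        .Where(c => c.Name == "txt-name")
        .ToList();

    foreach (var control in nameFields)
        control.Flatten();

    pdf.Save("partially-flattened.pdf");
}
```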

Flatten text

You can convert PDF text to outlines with the help of Docotic.Pdf. The usual reason for this is to achieve font independence. Regardless of whether the fonts are installed, the flattened text will look the same on any device.

However, once you convert text to outlines, you can no longer edit it as text. Also, because the library converts the text to vector graphics during flattening, the file size can increase.

To flatten PDF text, you need to extract the text as vector paths and draw those paths on a new page or on the same page. There is a code sample for this.

Save options

In the code snippets above, I used the PdfDocument.Save method without additional arguments. In such cases, the library uses the default save options. We handpicked the defaults so that they work well in most cases.

Still, there are cases when you need to override the default options. To do this, create a PdfSaveOptions object, set up the options, and pass it to one of the save methods. Below, I describe such cases.

To protect PDF with a password or a certificate, create an encryption handler and set it to the EncryptionHandler property.
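Here is a sketch of password protection. The handler class name PdfStandardEncryptionHandler and its constructor, which takes the owner and user passwords, are assumptions here. The passwords and file names are placeholders.

```csharp
using BitMiracle.Docotic.Pdf;

using var pdf = new PdfDocument("existing.pdf");

// Assumption: PdfStandardEncryptionHandler(ownerPassword, userPassword).
var handler = new PdfStandardEncryptionHandler("owner-password", "user-password");

var saveOptions = new PdfSaveOptions
{
    EncryptionHandler = handler
};

pdf.Save("protected.pdf", saveOptions);
```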

When you want to sign the same PDF multiple times, turn on the incremental updates mode by setting the WriteIncrementally property to true. Do the same when you save a previously signed file with new annotations or form data.
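For example, to add form data to a previously signed document, you might save it as shown below so that the changes are appended and the originally signed bytes are left in place. The field name comes from the form example. The file names are illustrative.

```csharp
using BitMiracle.Docotic.Pdf;

using var pdf = new PdfDocument("signed-form.pdf");

if (pdf.GetControl("txt-name") is PdfTextBox nameTextBox)
    nameTextBox.Text = "Bit Miracle team";

// Append the changes after the original file contents instead of
// rewriting the whole file.
var saveOptions = new PdfSaveOptions
{
    WriteIncrementally = true
};

pdf.Save("signed-form-updated.pdf", saveOptions);
```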

Set the Linearize property to true to produce a linearized (or a Fast Web View optimized) PDF file. Viewers that recognize this optimization can display such files faster.

To prevent save-time changes to some of the metadata fields, set the UpdateProducer and UpdateModifiedDate properties to false.
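Here is a sketch that produces a Fast Web View file and keeps the existing producer and modification date metadata. It uses only the properties named above. The file names are illustrative.

```csharp
using BitMiracle.Docotic.Pdf;

using var pdf = new PdfDocument("existing.pdf");

var saveOptions = new PdfSaveOptions
{
    // Optimize the file for Fast Web View.
    Linearize = true,

    // Keep the producer and modification date metadata values as they are.
    UpdateProducer = false,
    UpdateModifiedDate = false
};

pdf.Save("web-optimized.pdf", saveOptions);
```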