Edit PDF in C#
Use Docotic.Pdf to modify your PDF documents. It is the PDF editing library for C# and VB.NET that combines powerful features with an intuitive API.
Docotic.Pdf provides many ways to edit PDF documents. Here are some of the library's key features for PDF editing:
- Combine multiple PDFs into one or split a single PDF into multiple files.
- Reduce PDF file size.
- Rearrange, delete, rotate, or extract pages.
- Read, change, or delete PDF metadata.
- Sign PDF with digital signatures.
- Encrypt documents or remove passwords from a PDF.
- Add, edit, or remove annotations and attachments.
- Fill in AcroForms, add or remove form controls.
- Flatten form fields and annotations to make them part of the PDF content.
- Add, delete, or replace text within the PDF.
- Insert, replace, and resize images.
- Add watermarks and backgrounds.
- Convert scanned documents into editable and searchable text.
In the next sections, I will describe the PDF editing features in more detail. The sections also contain code snippets and links to relevant resources.
Merge and split PDFs
This section covers two features with opposite goals.
Combine PDF
When you consolidate PDF files, you create a single PDF document. The merged PDF usually contains related information from multiple existing PDF files.
Here is a code snippet that shows how to combine PDF files using Docotic.Pdf.
using var pdf = new PdfDocument("first.pdf");
pdf.Append("second.pdf");
pdf.Save("merged.pdf");
The code is very simple because it shows the most basic case. We have an article that describes more complex merge cases. For example, it shows how to combine encrypted documents.
Split PDF
Splitting means extracting selected pages from the original PDF to create one or more new PDF files. This process is useful when you want to share only a part of a document.
The following code snippet shows how to create a new document from each page of a PDF.
using var pdf = new PdfDocument("compound.pdf");
for (int i = 0; i < pdf.PageCount; ++i)
{
using PdfDocument copy = pdf.CopyPages(i, 1);
// Helps to reduce file size when the copied pages reference
// unused resources such as fonts, images, patterns.
copy.RemoveUnusedResources();
copy.Save(i + ".pdf");
}
Read about other approaches to implement a PDF splitter in the dedicated article.
PDF compression
There are two main approaches to compressing a PDF. The first applies only operations that change the form of the file, not its contents. The second also includes changes that may reduce the detail or quality of the document in exchange for better compression.
Lossless PDF compression
By default, the library saves PDF objects so that they occupy fewer bytes. For this, it excludes unused objects, writes objects without formatting, and uses a shorter form where possible.
To further improve the compression, Docotic.Pdf also produces object streams in the output PDFs. This is another form of writing objects with the most compact representation. The object streams get compressed with the Flate algorithm.
You can affect the way the library saves objects using save options.
Some documents contain duplicate fonts, images, color profiles, and other objects. It is usually the case for incrementally updated documents and files created by merging several documents. Deduplication of these objects by using the PdfDocument.ReplaceDuplicateObjects method can drastically decrease the output size.
Pages of a document can reference unused resources. For example, images that were previously visible on the page but no longer are. Use the PdfDocument.RemoveUnusedResources method to remove such resources.
All these operations on PDF reduce file size without losing quality.
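As a minimal sketch, the two lossless calls named above could be combined like this. The file names are placeholders, and the parameterless form of ReplaceDuplicateObjects is an assumption based on how the method is described here.

```csharp
using BitMiracle.Docotic.Pdf;

// "merged.pdf" is a placeholder for a document produced by merging
// or by incremental updates.
using var pdf = new PdfDocument("merged.pdf");

// Keep a single shared copy of repeated fonts, images, and other objects.
// Assumption: the method can be called without arguments.
pdf.ReplaceDuplicateObjects();

// Drop resources that pages still reference but no longer use.
pdf.RemoveUnusedResources();

pdf.Save("merged-compact.pdf");
```

Neither call changes the visible content, so the output should render the same as the input.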
Lossy transformations
For files with images, one of the most effective ways to shrink PDF size is to change the compression scheme of the images. For example, using a lossy compression like JPEG on the images will reduce the size. As a side-effect, compression artifacts and loss of detail can be visible on the images.
If the images in the PDF are larger than needed, you can resize them. This can provide even better compression. Another option is to convert images to black and white (bitonal).
You can flatten form fields and annotations to save some space. It makes sense when you no longer need editable annotations and form fields in your documents.
Fonts can take a lot of space in documents. The PdfDocument.RemoveUnusedFontGlyphs method can remove unused glyphs from fonts to optimize output size. Font subsetting is another name for this process. You can even completely remove font bytes from the document (unembed font). This might make sense when the document contains bytes of a popular font like Arial.
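A short sketch of font subsetting, assuming RemoveUnusedFontGlyphs can be called without arguments. The file names are placeholders.

```csharp
using BitMiracle.Docotic.Pdf;

// "report.pdf" is a placeholder input file name.
using var pdf = new PdfDocument("report.pdf");

// Subset the embedded fonts: glyphs not used by the document are removed.
// Assumption: a parameterless overload exists.
pdf.RemoveUnusedFontGlyphs();

pdf.Save("report-subset-fonts.pdf");
```

Subsetting keeps the existing text intact, but text added to the document later may need glyphs that the subset fonts no longer contain.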
Other transformations directly remove information from documents. You can remove metadata, structure information, and private application data. This data is not visible, but remove it only if it is not important for users of your documents.
To learn more and see some code examples, read the article on how to reduce PDF file size.
Reorder pages in PDF
There are many reasons to change the order of pages within a PDF. For example, you might need to group related information together, or improve the readability of the document by ensuring it flows logically.
Besides the ability to merge and split pages, Docotic.Pdf provides an extensive set of other methods to rearrange pages in PDF. I will use C# code snippets and Docotic.Pdf API to show how to organize PDF pages.
You can find complete test projects for this section's examples in the Pages and Navigation group of sample codes. I use ten-pages.pdf in the snippets. This is a trivial test document with a Page N title on each page.
Move PDF pages
The following snippet shows how to move continuous ranges of pages. The code moves the first half of pages to the end of the document.
using var pdf = new PdfDocument("ten-pages.pdf");
pdf.MovePages(0, 5, pdf.PageCount);
pdf.Save("continuous-move.pdf");
It is possible to move arbitrary sets of pages. The following code moves odd pages to the end of the document.
using var pdf = new PdfDocument("ten-pages.pdf");
int[] indexes = [0, 2, 4, 6, 8];
pdf.MovePages(indexes, pdf.PageCount);
pdf.Save("arbitrary-move.pdf");
To move only one page, use the PdfDocument.MovePage method.
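Here is a hedged sketch of moving a single page. It builds a small three-page document so that it does not depend on any input file. The argument order of MovePage (page index, then target index) is an assumption modeled on the MovePages calls above.

```csharp
using BitMiracle.Docotic.Pdf;

// A new document starts with one blank page; add two more.
using var pdf = new PdfDocument();
pdf.Pages[0].Canvas.DrawString("Page 1");
pdf.AddPage().Canvas.DrawString("Page 2");
pdf.AddPage().Canvas.DrawString("Page 3");

// Assumed signature: MovePage(pageIndex, targetIndex).
// Moves the first page to the end of the document.
pdf.MovePage(0, pdf.PageCount);

pdf.Save("single-move.pdf");
```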
Swap PDF pages
To exchange two pages, use code like the following snippet.
using var pdf = new PdfDocument("ten-pages.pdf");
pdf.SwapPages(9, 0);
pdf.SwapPages(8, 1);
pdf.Save("swapped.pdf");
The PdfDocument.SwapPages method accepts indexes of the two pages that should take each other's position. To reposition any other number of pages, use one of the move pages methods.
Add and insert pages
Any PdfDocument contains at least one page. When you create a new document, the library adds one page implicitly.
Here is how to insert a blank page in PDF using Docotic.Pdf API. You can insert pages at positions with indexes from 0 to PageCount inclusive.
using var pdf = new PdfDocument();
var newPage = pdf.InsertPage(0);
newPage.Canvas.DrawString("This is the new first page");
pdf.Save("two-pages.pdf");
To add a blank page to PDF, use the PdfDocument.AddPage method. The method adds a new page to the end of the document. It does the same as a pdf.InsertPage(pdf.PageCount) call.
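For comparison with the InsertPage snippet, a minimal sketch of appending a blank page could look like this. The output file name is a placeholder.

```csharp
using BitMiracle.Docotic.Pdf;

// A new document already contains one implicitly added page.
using var pdf = new PdfDocument();

// AddPage puts the new page after all existing pages.
var lastPage = pdf.AddPage();
lastPage.Canvas.DrawString("This is the new last page");

pdf.Save("appended-page.pdf");
```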
To add or insert pages from another document, use a combination of calls as described in the Prepend PDF section. You can use the same combination of calls to add a cover page to a PDF.
Duplicate PDF pages
With the library API, you can duplicate pages in two consecutive operations. The first one is the copy pages operation. See the code example in the Split PDF section.
The second is the paste pages operation. For this operation, use the PdfDocument.Append method. Then move the appended pages into the required position.
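The copy, paste, and move steps can be sketched as follows. This is one possible arrangement rather than the library's prescribed way: it passes the copied page through a memory stream, and the Save(Stream) and Append(Stream) overloads are assumptions, because the snippets in this article only show the file name versions.

```csharp
using System.IO;
using BitMiracle.Docotic.Pdf;

using var pdf = new PdfDocument("ten-pages.pdf");

// Copy: put the third page (index 2) into a temporary document.
using PdfDocument copy = pdf.CopyPages(2, 1);

// Paste: save the copy to memory and append it to the source document.
// Assumption: stream overloads of Save and Append are available.
using var buffer = new MemoryStream();
copy.Save(buffer);
buffer.Position = 0;
pdf.Append(buffer);

// Move: the appended page is now the last one.
// Place it right after the original third page.
pdf.MovePages(pdf.PageCount - 1, 1, 3);

pdf.Save("duplicated-page.pdf");
```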
Extract pages
The following snippet shows how to extract pages from a PDF. It extracts the first three pages from the source document.
using var pdf = new PdfDocument("ten-pages.pdf");
using (var extracted = pdf.ExtractPages(0, 3))
{
extracted.RemoveUnusedResources();
extracted.Save("three-pages.pdf");
}
pdf.Save("seven-pages.pdf");
The ExtractPages method removes the extracted pages from the source document. Because of this, only seven pages remain in the source document. The library does not allow extracting pages from a document that contains only one page.
We recommend removing unused resources from the document with the extracted pages.
Remove page from PDF
Check out the snippet that shows how to delete a page in a PDF document. It actually deletes two pages using different overloads of the RemovePage method. The first overload accepts a page index as its parameter. The second overload accepts a page object.
using var pdf = new PdfDocument("ten-pages.pdf");
pdf.RemovePage(0);
pdf.RemovePage(pdf.Pages[0]);
pdf.Save("without-first-two-pages.pdf");
To remove more than one page at a time, use the PdfDocument.RemovePages method. Its overloads work with arrays of either page indexes or page objects.
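A small sketch of removing several pages in one call. It creates its own five-page document, and it assumes the index-based overload accepts an int array, as described above.

```csharp
using BitMiracle.Docotic.Pdf;

// One implicit page plus four added pages gives five pages.
using var pdf = new PdfDocument();
for (int i = 0; i < 4; ++i)
    pdf.AddPage();

// Assumed overload: RemovePages(int[] indexes).
// Removes the second and the fourth pages.
pdf.RemovePages(new[] { 1, 3 });

pdf.Save("three-pages-left.pdf");
```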
Digital signatures
Docotic.Pdf implements many operations for digital signatures in PDF and can help you maintain the trustworthiness and legal validity of your PDF documents. Here are some examples of what the library can do:
- Sign PDF with certificate. Adding digital signatures in PDF is the way to confirm the identity of the signer and ensure nobody altered the document after signing.
- Certify signature. To add additional restrictions to a PDF, you can sign it with a certification signature. You can completely lock PDF after signing or allow a few types of changes.
- Verify signature in PDF. Check the validity of a digital signature to confirm the document's signed part didn't change.
- Allow multiple signatures. Contracts, agreements, and forms often require multiple parties to sign a single document. Adding multiple signatures to PDF requires the document to be saved incrementally.
- Embed signature timestamp. It is possible to specify a Timestamping Authority URL and, optionally, its credentials in signing options. The library will embed the received timestamp in the signature.
- Embed certificate. The library automatically embeds signing certificates in digital signatures.
The Digital Signatures page contains sample codes and more information about each operation.
Protect PDF
There are three features that you can use to ensure PDF security. Docotic.Pdf can work with them in both directions: the library can protect PDF and unlock a secured PDF.
Password protection
This feature allows you to set a password to restrict access to the PDF. Depending on the password type, the PDF will require the correct password to open or modify the document.
There are two types of passwords in PDFs:
- Open password (user password). This type of password is required to open and view the PDF. Without the correct password, a conforming PDF viewer will not open the document.
- Permissions password (owner password). This password is required to remove permissions from PDF. Opening a PDF with the owner password allows all actions, even if permissions restrict certain actions, such as printing, copying, or editing the PDF.
You can set both passwords for the same PDF document. Read about decrypting PDF files to know how to remove passwords from PDF documents.
Encryption
PDF encryption and PDF passwords work together to ensure that sensitive information within the PDF is only available to legitimate users. Only users with the correct decryption key or password can view the contents.
Docotic.Pdf can encrypt PDF files using RC4 40-bit, RC4 128-bit, AES 128-bit, and AES 256-bit encryption algorithms.
Permissions and restrictions
You can set various permissions on a PDF, such as restricting printing, copying text, editing the document, and more. The permissions only affect the experience when someone opens the PDF with the user password. Restrictions do not apply to those who open the PDF with the owner password.
To remove permissions from a PDF, you would need to remove the PDF security password first. Read about how to do this using C# and Docotic.Pdf API.
To know how to ensure PDF integrity in addition to PDF security, read the section about digital signatures.
Metadata in PDF
PDF metadata is information embedded within a PDF file that provides details about the document. There are two primary sources of metadata: PDF document properties and XMP metadata.
Document properties are also commonly referred to as document information dictionary, file info, metadata fields, document attributes, and file attributes.
XMP (Extensible Metadata Platform) metadata is basically an XML file embedded in a PDF. XMP uses a flexible data model that can store any set of metadata properties. This metadata uses namespaces to group related properties. Some common namespaces include XMP Core/XMP Basic and Dublin Core.
Docotic.Pdf fully supports both XMP metadata and document properties. Please note that the PDF 2.0 standard marked most of the document information dictionary properties as deprecated. The only exceptions are creation date and modification date.
You can find complete test projects for this section's examples in the Metadata group of sample codes.
Document properties
See how to edit the document properties with Docotic.Pdf.
using var pdf = new PdfDocument("file.pdf");
pdf.Info.Author = "An example code";
pdf.Info.Subject = "Showing how to access and change document metadata";
pdf.Info.Title = "Custom title goes here";
pdf.Info.Keywords = "pdf Docotic.Pdf";
pdf.Save("updated-file.pdf");
You can change the value of each property, but please note that by default the library automatically updates some properties before it saves the PDF. You can change this in save options.
To remove all metadata specified in the document properties, use the PdfInfo.Clear method. The method can remove only custom properties, if you like.
XMP metadata
This snippet shows how to change properties of the XMP metadata in a PDF document.
using var pdf = new PdfDocument("file.pdf");
pdf.Metadata.DublinCore.Creators = new XmpArray(XmpArrayType.Ordered);
pdf.Metadata.DublinCore.Creators.Values.Add(new XmpString("me"));
pdf.Metadata.DublinCore.Creators.Values.Add(new XmpString("Docotic.Pdf"));
pdf.Metadata.DublinCore.Format = new XmpString("application/pdf");
pdf.Metadata.Pdf.Producer = new XmpString("me too!");
pdf.Save("updated-file.pdf");
The code changes properties in Dublin Core and Adobe PDF schemas. Please note that the Producer property gets overwritten because of the default save options.
You can extract raw XMP metadata using one of the XmpMetadata.Save methods. The method will produce an XML with all the properties.
To remove all XMP metadata from a document, use the XmpMetadata.Unembed method.
Sync metadata
It is desirable to make sure both XMP metadata and Document Info properties have the same values for the corresponding properties. It is especially true if you edit both sources of metadata in the same file.
Use the PdfDocument.SyncMetadata method to synchronize values in XMP Metadata and Document Info. When a property changed in both sources, the method will overwrite one source with the value from the other source. Read the documentation for the method for more detail.
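A hedged sketch of editing one source and then synchronizing. The parameterless SyncMetadata call is an assumption, and which source wins on a conflict is defined by the method's documentation, not by this example.

```csharp
using BitMiracle.Docotic.Pdf;

using var pdf = new PdfDocument("file.pdf");

// Change a value in the document properties only.
pdf.Info.Title = "Custom title goes here";

// Assumption: SyncMetadata takes no arguments.
// Brings the corresponding XMP and Document Info properties in line.
pdf.SyncMetadata();

pdf.Save("synced-file.pdf");
```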
PDF bookmarks
A PDF document can contain special shortcuts or links that help readers navigate to specific sections or pages quickly. These shortcuts are called bookmarks. PDF outline is another name for bookmarks.
Viewer apps usually display bookmarks like the table of contents in a book, but interactive. When the reader clicks on a bookmark, the viewer app jumps to the designated part of the document. A similar behaviour is possible to achieve using link annotations.
Here is a C# code snippet that shows how to add bookmarks to PDF:
using var pdf = new PdfDocument("ten-pages.pdf");
var root = pdf.OutlineRoot;
root.AddChild("Fifth page", 4);
root.AddChild("Seventh page", pdf.Pages[6]);
pdf.PageMode = PdfPageMode.UseOutlines;
pdf.Save("simple-bookmarks.pdf");
PDF outline can have main bookmarks and sub-bookmarks, making it easier to structure large documents. Here is how to create sub-bookmarks in PDF:
using var pdf = new PdfDocument("ten-pages.pdf");
var root = pdf.OutlineRoot;
var evenPages = root.AddChild("Even pages");
evenPages.AddChild("Second page", 1);
evenPages.AddChild("Fourth page", 3);
pdf.PageMode = PdfPageMode.UseOutlines;
pdf.Save("even-pages-bookmarks.pdf");
You can apply fonts and colors to bookmark items. Check out the complete example for creating an outline with styles.
To remove a bookmark from PDF, use the RemoveChild or RemoveChildAt methods. You can remove all bookmarks by calling the RemoveAllChildren method on the root node.
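As a minimal sketch, removing every bookmark could look like this. The input name reuses the output of the first bookmark snippet, and the parameterless RemoveAllChildren call is an assumption.

```csharp
using BitMiracle.Docotic.Pdf;

// "simple-bookmarks.pdf" is the file produced by the first bookmark snippet.
using var pdf = new PdfDocument("simple-bookmarks.pdf");

// Remove all bookmarks at once by clearing the children of the root node.
pdf.OutlineRoot.RemoveAllChildren();

pdf.Save("no-bookmarks.pdf");
```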
File attachments
PDF attachments are external files embedded within a PDF document. People also commonly refer to these files as embedded files and file attachments. You can attach any file: image, audio/video file, another PDF, Word document, Excel spreadsheets or anything else.
If you want to join PDFs together, creating a combined PDF file, check the article about merging PDF documents.
Here is the C# code that shows how to add an attachment to a PDF with the help of Docotic.Pdf API.
using var pdf = new PdfDocument();
var excelFile = pdf.CreateFileAttachment("this-year-figures.xlsx");
pdf.SharedAttachments.Add(excelFile);
pdf.Save("shared-attachment.pdf");
The above code added the file as a shared attachment. Readers can find the attached file in the Attachments panel of their viewer.
It is also possible to add attachments to PDF pages. Such attachments are visible inside the page contents like any other annotations.
using var pdf = new PdfDocument();
var page = pdf.Pages[0];
page.Canvas.DrawString(20, 100, "Here is this year's figures document:");
var bounds = PdfRectangle.FromLTRB(155, 100, 165, 110);
var excelFile = pdf.CreateFileAttachment("this-year-figures.xlsx");
pdf.Pages[0].AddFileAnnotation(bounds, excelFile);
pdf.Save("page-attachment.pdf");
Check the Attachments group of sample codes to find complete test projects for this section's examples.
To remove attachments from PDF, you would need to enumerate both shared attachments and page annotations and remove the items you do not need. See the example for the enumerating code below. To remove all shared attachments, you can use a pdf.SharedAttachments.Clear() call.
You would also need to enumerate collections to extract embedded files from PDF. Here is an example code:
using var pdf = new PdfDocument("file-with-attachments.pdf");
int i = 0;
foreach (var attachment in pdf.SharedAttachments)
{
    if (attachment?.Contents == null)
        continue;
    var fileName = attachment.Specification ?? $"attachment{i++}";
    attachment.Contents.Save(fileName);
}
foreach (var widget in pdf.GetWidgets())
{
    var attachment = (widget as PdfFileAttachmentAnnotation)?.File;
    if (attachment?.Contents == null)
        continue;
    var fileName = attachment.Specification ?? $"attachment{i++}";
    attachment.Contents.Save(fileName);
}
Page labels
PDF page labels are custom names or numbers assigned to pages in a PDF document. Unlike standard page numbers, page labels can include a mix of letters, numbers, and even Roman numerals. Other names for page labels are page identifiers and page names.
Here is how to add page labels to PDF using Docotic.Pdf:
using var pdf = new PdfDocument("ten-pages.pdf");
pdf.PageLabels.AddRange(0, 3, PdfPageNumberingStyle.LowercaseRoman);
pdf.PageLabels.AddRange(4, PdfPageNumberingStyle.DecimalArabic, string.Empty, 5);
pdf.PageLabels.AddRange(7, PdfPageNumberingStyle.DecimalArabic, "Appendix page ", 1);
pdf.Save("page-labels.pdf");
The first four pages will have labels i, ii, iii, and iv. The next three labels are 5, 6, and 7. For the remaining pages, labels will be Appendix page 1, Appendix page 2, and Appendix page 3.
OCR PDF
Some PDF documents contain scanned pages and require optical character recognition (OCR) before you can extract text from them. Another use case for OCR is to extract text from a PDF that uses custom glyph to Unicode mapping.
We have a blog post that shows how to OCR scanned documents. The post contains a non-searchable PDF example and shows how to use Tesseract OCR, C# code and Docotic.Pdf to recognize text in image-only PDFs. You can also add an OCR text layer to scanned PDF files with the help of Docotic.Pdf.
Edit pages
This section talks about changes to existing PDF pages, like:
- how to rotate PDF pages
- how to change page size
- using vector graphics on page canvas
- adding HTML content
Read about the Layout API of the library to learn how to create PDF documents from building blocks like header and footer, tables, images, paragraphs of text and the like.
Check out the other sections for information about:
- how to edit text in PDFs
- operations with images
- PDF watermarking
- how to annotate a PDF
- how to fill in PDF forms
Rotate pages
See the C# code snippet for how to rotate only one page in PDF:
using var pdf = new PdfDocument("existing.pdf");
pdf.Pages[0].Rotation = PdfRotation.Rotate180;
pdf.Save("rotated.pdf");
The code rotates the first page by 180 degrees. You can rotate PDF pages by 0, 90, and 270 degrees too.
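To rotate every page instead of one, a loop over the Pages collection is enough. The sketch below creates its own two-page document. PdfRotation.Rotate90 is assumed to be the enumeration value for a 90 degree rotation, by analogy with Rotate180 above.

```csharp
using BitMiracle.Docotic.Pdf;

// Build a two-page document so that the example needs no input file.
using var pdf = new PdfDocument();
pdf.AddPage();

// Apply the same rotation to each page.
foreach (var page in pdf.Pages)
    page.Rotation = PdfRotation.Rotate90;

pdf.Save("all-rotated.pdf");
```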
Change page size
Docotic.Pdf provides more than one way to change the page size of a PDF. In the simplest case, you can use the Width and Height properties of a PdfPage object to specify the desired size. For an existing document, this neither resizes nor removes the page content. It just hides all the page content that is outside the rectangle of the specified size.
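A minimal sketch of the simplest case, setting the page size directly. The values are the A4 dimensions rounded to whole points, and the file names are placeholders.

```csharp
using BitMiracle.Docotic.Pdf;

using var pdf = new PdfDocument("existing.pdf");
var page = pdf.Pages[0];

// A4 is about 595 x 842 points. The existing content is not scaled:
// anything outside the new rectangle just becomes hidden.
page.Width = 595;
page.Height = 842;

pdf.Save("a4-sized.pdf");
```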
A similar approach is to crop pages. You can change the CropBox of a page using C# code like this:
using var pdf = new PdfDocument("existing.pdf");
var page = pdf.Pages[0];
var cropBoxBefore = page.CropBox;
page.CropBox = new PdfBox(0, cropBoxBefore.Height - 256, 256, cropBoxBefore.Height);
pdf.Save("cropped.pdf");
Changing the crop box is the way to go if you would like to save a part of the page as an image.
If the goal is to keep all the contents visible on a page of a different size, then use the scaling approach. In the following code snippet, I create an XObject from a page. The XObject is like a vector image. You can draw the same object on multiple pages, scaling and rotating it as needed.
After the XObject is ready, I clear the previous page content, resize the page, and then draw the object on the resized page.
using var pdf = new PdfDocument("existing.pdf");
var page = pdf.Pages[0];
var pageXObject = pdf.CreateXObject(page);
page.Canvas.Clear();
page.Width /= 2;
page.Height /= 2;
page.Canvas.DrawXObject(pageXObject, 0, 0, page.Width, page.Height, 0);
pdf.Save("resized.pdf");
Vector graphics
Docotic.Pdf library can add vector graphics like lines, curves, and shapes to PDF documents. You can construct graphics paths from graphics objects. Then you can fill or stroke the paths using colors from different color spaces.
Find example code for graphics-related features in the Graphics group of sample codes.
It is also possible to extract graphics from PDF. Start from calling the GetObjects method and then extract information from objects of PdfPageObjectType.Path type. Don't forget that XObjects can also contain nested paths.
using var pdf = new PdfDocument("existing.pdf");
var options = new PdfObjectExtractionOptions();
var objects = pdf.Pages[0].GetObjects(options);
foreach (var obj in objects)
{
    if (obj.Type == PdfPageObjectType.Path)
    {
        var path = (PdfPath)obj;
        Console.WriteLine($"Found path {path}");
    }
    else if (obj.Type == PdfPageObjectType.XObject)
    {
        var paintedXObject = (PdfPaintedXObject)obj;
        var nestedObjects = paintedXObject.XObject.GetObjects(options);
        // ...
    }
}
Add HTML to PDF pages
Overlaying HTML content onto a PDF document can be useful for adding dynamic elements like charts or a stock price ticker to your PDFs.
Read about how to insert HTML in PDF to get more detail and download an example code.
Edit PDF text
This section is about how to edit the text in a PDF, how to change text color in PDF, and how to add new text.
We have an article dedicated to how to extract text from a PDF. Check it out for more information on the topic.
Text flattening is also possible with the help of Docotic.Pdf.
Find and replace
To modify text in PDF, you would need to find the area that contains the text, then remove the text in the area. The last step is to add the new text to the same area of the document.
Searching PDFs can be tricky because internally the document can contain words in any order. The text can also be rotated. Luckily, we have a sample code that shows how to search a PDF for words or phrases.
When you have coordinates of the text to remove, it is time to edit the containing page contents. The library provides means to enumerate and copy page objects. So it is possible to omit some text while copying objects. This will essentially remove the text. The code of the edit PDF page content example shows all the details of the process. You would need to update the ShouldRemoveText method to use the found coordinates.
Read the next section to see how to add the new text to the document.
If you create documents with a placeholder text and later replace the placeholder with some other text, then you can use text boxes instead.
The idea is to add a read-only text box without borders to the document and put the placeholder text in it. Later you can open the document, find the text box by its name and replace the placeholder with a simple call box.Text = "new text";. Flatten the text box after the replacement if you don't want any further changes.
Add new text
To add some text to documents, use DrawString and DrawText methods of a PdfCanvas object. The methods use the current canvas font. The font must contain glyphs for all characters in the text. Use the PdfFont.ContainsGlyphsForText method to check if the font meets this requirement.
var canvas = pdf.Pages[0].Canvas;
canvas.Font = pdf.AddFont("NSimSun")
?? throw new ArgumentException("Font not found");
canvas.DrawString(10, 50, "Olá. 你好. Hello. This is some new text");
You can add Unicode text drawn with Type1, TrueType, and OpenType fonts. The library can use fonts installed on your system, 14 built-in Type1 fonts, or load a required font from a file.
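The glyph check mentioned above can be sketched as follows. The example assumes that ContainsGlyphsForText accepts the string to draw and returns a bool, and that the NSimSun font is installed on the machine.

```csharp
using System;
using BitMiracle.Docotic.Pdf;

using var pdf = new PdfDocument();
var canvas = pdf.Pages[0].Canvas;
var text = "Olá. 你好. Hello. This is some new text";

var font = pdf.AddFont("NSimSun")
    ?? throw new ArgumentException("Font not found");

// Verify the font before drawing.
// Assumed signature: bool ContainsGlyphsForText(string text).
if (!font.ContainsGlyphsForText(text))
    throw new InvalidOperationException("The font lacks glyphs for the text");

canvas.Font = font;
canvas.DrawString(10, 50, text);
pdf.Save("new-text.pdf");
```

Checking first lets the code pick a fallback font or report a clear error before drawing anything.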
Change text color
To change the color of text in a PDF, use the same approach as with removing text. You would need to change at least the ReplaceColor method in the sample code.
Images
Docotic.Pdf provides everything required to edit PDF images. Below are C# code snippets for the most popular operations.
The Images group of sample codes contains complete test projects for examples in this section.
Add image to PDF
The library can import images in GIF/TIFF/PNG/BMP/JPEG formats. You can also add an image from a System.Drawing.Image object.
var canvas = pdf.Pages[0].Canvas;
var image = pdf.AddImage("image.jpg")
?? throw new ArgumentException("Cannot add image");
canvas.DrawImage(image, 10, 50);
You can specify a rotation angle and an output size using overloads of the DrawImage method. To draw the same image on multiple pages, add the image once and use the same PdfImage object in multiple calls to the DrawImage method.
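Reusing one image across pages can be sketched like this. The file names are placeholders taken from the other snippets in this article.

```csharp
using System;
using BitMiracle.Docotic.Pdf;

using var pdf = new PdfDocument("ten-pages.pdf");

// Add the image to the document only once.
var logo = pdf.AddImage("image.jpg")
    ?? throw new ArgumentException("Cannot add image");

// Draw the same PdfImage object on every page.
foreach (var page in pdf.Pages)
    page.Canvas.DrawImage(logo, 10, 50);

pdf.Save("image-on-every-page.pdf");
```

Because all pages refer to the same PdfImage object, the document should not need a separate copy of the image bytes for each page.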
Combine images into PDF
Here is the C# code that shows how to combine multiple images into one PDF.
using var pdf = new PdfDocument();
var imagePaths = new string[] { "image.jpg", "another-image.png" };
foreach (var path in imagePaths)
{
var image = pdf.AddImage(path)
?? throw new ArgumentException("Cannot add image");
var page = pdf.AddPage();
page.Width = image.Width;
page.Height = image.Height;
page.Canvas.DrawImage(image, 0, 0);
}
pdf.RemovePage(0);
pdf.Save("combined-images.pdf");
The code adds multiple images to PDF, changing each page size to match the corresponding image size. Before saving the result, the code removes the first implicitly added empty page.
Extract PDF images
We designed Docotic.Pdf for extracting images from PDF files without compromising the quality of the images. The library does not change image size or compression. You will get images of the same quality as in the PDF.
using var pdf = new PdfDocument("file-with-images.pdf");
int i = 0;
foreach (PdfImage image in pdf.GetImages())
{
var path = image.Save($"image{i++}");
Console.WriteLine($"Saved to {path}");
}
Remove and replace images
Use the PdfPage.RemovePaintedImages method to remove all or specific images from a PDF page. You can filter images by position, size, transformation, or other parameters.
using var pdf = new PdfDocument("file-with-images.pdf");
pdf.Pages[0].RemovePaintedImages(
image =>
{
return image.Size.Width > 100;
}
);
pdf.RemoveUnusedResources();
pdf.Save("no-wide-images.pdf");
The above C# code shows how to remove images with the help of Docotic.Pdf. I recommend removing unused resources after you change or remove images.
Use the PdfImage.ReplaceWith method to replace all occurrences of the image within the PDF document.
using var pdf = new PdfDocument("file-with-images.pdf");
var firstImage = pdf.GetImages(false).FirstOrDefault()
?? throw new ArgumentException("No images found");
firstImage.ReplaceWith("another-image.png");
pdf.RemoveUnusedResources();
pdf.Save("replaced-image.pdf");
Change compression scheme
Docotic.Pdf provides methods for changing compression of PDF images. It is possible to repack the images using JPEG, CCITT Group 3 and 4 (fax), JPEG 2000, and zip/deflate compression algorithms.
Depending on the initial and the new compression, the change can reduce the detail or quality of the image. But lossy conversions usually help to reduce document size.
firstImage.RecompressWithJpeg2000(25);
There are other methods to repack an image. Check the PdfImage methods with names that start with RecompressWith. You can remove any compression from an image using the Uncompress method.
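To apply the same recompression to every image in a document, one option is to loop over the GetImages result. This is a sketch that reuses the call and the argument from the one-line snippet above. Real documents may contain images for which another scheme fits better.

```csharp
using BitMiracle.Docotic.Pdf;

using var pdf = new PdfDocument("file-with-images.pdf");

// Repack each image with JPEG 2000, using the same argument
// as in the single-image snippet.
foreach (PdfImage image in pdf.GetImages())
    image.RecompressWithJpeg2000(25);

pdf.Save("recompressed-images.pdf");
```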
Resize images
If some images in a PDF document are larger than needed, the library can resize or downscale them for you.
firstImage.Scale(0.5, PdfImageCompression.Jpeg2000, 25);
The above code makes the first image two times smaller in both directions. The library uses JPEG 2000 compression for the resulting image.
You can use one of the ResizeTo methods to specify exact values for the resulting width and height.
Resizing images usually reduces PDF file size even more than changing their compression (see the section above), but it is a lossy process.
Watermarks & backgrounds
Watermarking PDF involves these steps:
- Create an XObject, the container for watermark contents
- Fill the object with text, images, and vector graphics
- Stamp PDF pages with the object
Here is the C# code that adds the Confidential watermark to PDF:
using var pdf = new PdfDocument("existing.pdf");
var watermark = pdf.CreateXObject();
var canvas = watermark.Canvas;
canvas.FontSize = 72;
canvas.Brush.Color = new PdfRgbColor(222, 35, 35);
canvas.Brush.Opacity = 45;
canvas.Pen.Color = canvas.Brush.Color;
canvas.Pen.Opacity = canvas.Brush.Opacity;
canvas.Pen.Width = 5;
var padding = 10;
var text = "CONFIDENTIAL";
canvas.DrawString(padding, padding, text);
var textSize = canvas.MeasureText(text);
var watermarkRect = new PdfRectangle(
padding, padding, textSize.Width, textSize.Height);
canvas.DrawRoundedRectangle(watermarkRect, new PdfSize(padding, padding));
foreach (var page in pdf.Pages)
{
page.Canvas.DrawXObject(
watermark,
(page.Width - watermarkRect.Width) / 2,
(page.Height - watermarkRect.Height) / 2);
}
pdf.Save("watermarked.pdf");
The code sets the brush and pen properties of the watermark canvas. The brush is used to paint the text. To find out the text size, the code measures the text. Then it draws a rectangle with rounded corners around the text. The pen is used to stroke the rectangle.
After the watermark content is ready, the code draws it in the center of each page.
PDF backgrounds are very similar to watermarks. At least you can create them in almost the same way. To add a background to PDF, do the same as in the above code, but add
watermark.DrawOnBackground = true;
after the CreateXObject call. Please note that opaque content like images can obscure the background.
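A minimal sketch of a colored page background follows. It uses only calls that appear elsewhere in this article, plus PdfDrawMode.Fill. The color, the text, and the output file name are arbitrary, and the sketch assumes that an XObject created without explicit dimensions does not clip a page-sized rectangle.

```csharp
using BitMiracle.Docotic.Pdf;

using var pdf = new PdfDocument();
var firstPage = pdf.Pages[0];
firstPage.Canvas.DrawString(10, 50, "Page content stays on top of the background.");

var background = pdf.CreateXObject();
// This flag is the difference between a background and a watermark.
background.DrawOnBackground = true;

// Fill a page-sized rectangle with a light color.
var canvas = background.Canvas;
canvas.Brush.Color = new PdfRgbColor(255, 250, 205);
canvas.DrawRectangle(
    new PdfRectangle(0, 0, firstPage.Width, firstPage.Height),
    PdfDrawMode.Fill);

foreach (var page in pdf.Pages)
    page.Canvas.DrawXObject(background, 0, 0);

pdf.Save("with-background.pdf");
```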
Annotations
Docotic.Pdf provides a rich API for annotations in PDF. You can create, edit, and remove annotations from PDF documents. It is also possible to flatten annotations.
To annotate text, you can use:
- Sticky notes or text annotations. See the AddTextAnnotation method of the PdfPage class.
- Highlights. See the AddHighlightAnnotation method.
- Strikethroughs. See the AddStrikeoutAnnotation method.
- Underlines. See the AddJaggedUnderlineAnnotation and AddUnderlineAnnotation methods.
Use links to jump from one page to another or to an external resource. You can use ink annotations for freehand drawing on a PDF page. There are redaction annotations for parts that are designated to be removed from the document. You can also embed audio, video, or 3D content.
Highlight text
Here is how to highlight text in PDF documents:
using var pdf = new PdfDocument();
var page = pdf.Pages[0];
var canvas = page.Canvas;
canvas.FontSize = 30;
var text = "Highlighted text.";
var position = new PdfPoint(10, 50);
canvas.DrawString(position, text);
canvas.DrawString(" Not highlighted.");
var size = canvas.MeasureText(text);
var bounds = new PdfRectangle(position, size);
var color = new PdfRgbColor(145, 209, 227);
var annotationText = "Please pay attention to this part.";
page.AddHighlightAnnotation(annotationText, bounds, color);
pdf.Save("highlighted.pdf");
Links
To link to a specific page in a PDF, use code like this:
using var pdf = new PdfDocument();
var secondPage = pdf.AddPage();
secondPage.Canvas.DrawString(10, 50, "Welcome to the second page.");
var firstPage = pdf.Pages[0];
var canvas = firstPage.Canvas;
var linkRect = new PdfRectangle(10, 50, 100, 60);
canvas.DrawRectangle(linkRect, PdfDrawMode.Stroke);
var options = new PdfTextDrawingOptions(linkRect)
{
    HorizontalAlignment = PdfTextAlign.Center,
    VerticalAlignment = PdfVerticalAlign.Center
};
canvas.DrawText("Go to 2nd page", options);
firstPage.AddLinkToPage(linkRect, 1);
pdf.Save("linked.pdf");
In the code, the action area annotation works as an internal hyperlink. Such areas can also navigate to external resources and perform non-navigational actions.
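For an external resource, a sketch along the following lines might work. It assumes that PdfPage offers an AddHyperlink method that accepts a rectangle and a Uri. The rectangle, the text, and the output file name are illustrative only.

```csharp
using System;
using BitMiracle.Docotic.Pdf;

using var pdf = new PdfDocument();
var page = pdf.Pages[0];

// The clickable area roughly covers the text drawn at (10, 50).
var linkRect = new PdfRectangle(10, 50, 200, 30);
page.Canvas.DrawString(10, 50, "Visit bitmiracle.com");

// Assumption: AddHyperlink(PdfRectangle, Uri) creates a link to a web address.
page.AddHyperlink(linkRect, new Uri("https://bitmiracle.com"));

pdf.Save("external-link.pdf");
```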
Remove annotations
To remove annotations from PDF:
- Access the collection of widgets using the PdfPage.Widgets property or the PdfDocument.GetWidgets method.
- Check the type, properties, or otherwise decide which annotations you no longer need.
- Remove the annotation by using the PdfDocument.RemoveWidget method or methods of the PdfWidgetCollection object.
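The steps above can be sketched as follows. The example first adds a highlight so that it has something to remove, then deletes all highlights from every page. It assumes that the widget collection can be enumerated with LINQ and that highlights are represented by a PdfHighlightAnnotation type. The file name is arbitrary.

```csharp
using System.Linq;
using BitMiracle.Docotic.Pdf;

using var pdf = new PdfDocument();
var firstPage = pdf.Pages[0];
firstPage.AddHighlightAnnotation(
    "Temporary note",
    new PdfRectangle(10, 50, 200, 30),
    new PdfRgbColor(145, 209, 227));

foreach (var page in pdf.Pages)
{
    // Copy the matches to a list first, so the widget collection
    // is not modified while it is being enumerated.
    var highlights = page.Widgets.OfType<PdfHighlightAnnotation>().ToList();
    foreach (var highlight in highlights)
        pdf.RemoveWidget(highlight);
}

pdf.Save("without-highlights.pdf");
```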
To remove attachments from PDF, you would need to remove both file annotations and shared attachments.
Redact PDF
As a PDF redaction library, Docotic.Pdf offers methods to permanently remove or quickly black out sensitive information from your PDF documents.
Redact text
Here is how to black out text in a PDF without a Redact tool, using only C# and Docotic.Pdf.
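using var pdf = new PdfDocument("existing.pdf");
int i = 0;
foreach (var page in pdf.Pages)
{
    foreach (var word in page.GetWords())
    {
        // Cover every third word with a filled rectangle.
        if (i % 3 == 0)
        {
            page.Canvas.AppendRectangle(word.Bounds);
            page.Canvas.FillPath(PdfFillMode.Winding);
        }
        i++;
    }
}
pdf.Save("redacted.pdf");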
The code draws a black rectangle over every third word in a document. Please note that the text behind the rectangles remains in the document and can still be extracted later. To permanently remove the text, use the approach from the section about replacing text.
Redact images
You can use black rectangles to cover images, too. But an easier approach would be to replace the image with a black 1 by 1 pixel image. This will not only visually highlight the redacted image, but will also remove the original image data.
Check the section about removing and replacing images for code examples. I also recommend calling the PdfDocument.ReplaceDuplicateObjects method after the replacement.
PDF forms
Docotic.Pdf can create AcroForms (this is another name for PDF forms) using all kinds of interactive elements like buttons, checkboxes, drop-down lists, list boxes, radio buttons, and text fields.
It usually takes only a few lines of code to add and set up a form field. For example, you can add editable fields to a PDF by simply calling the PdfPage.AddTextBox method. The code samples in the Forms and Annotations group provide more information about creating and using forms.
How to fill in a PDF form
Use the PdfDocument.GetControl method to find a PDF control by its full or partial name. An alternative is to enumerate document controls using the GetControls method. In either case, you would need to cast the control to the expected field type.
using var pdf = new PdfDocument(@"example-form.pdf");
if (pdf.GetControl("txt-name") is PdfTextBox nameTextBox)
    nameTextBox.Text = "Bit Miracle team";
if (pdf.GetControl("txt-email") is PdfTextBox emailTextBox)
    emailTextBox.Text = "support@bitmiracle.com";
if (pdf.GetControl("check-agree") is PdfCheckBox agreeCheckBox)
    agreeCheckBox.Checked = true;
pdf.Save("filled-form.pdf");
The code uses this PDF form example. In the code, I set values for the two text fields and tick the checkbox.
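The enumeration alternative could be sketched like this. The example only inspects the form: it prints each control's name and current value and counts the text boxes. It assumes the same example-form.pdf file. The counter and the console output are illustrative additions, not part of the library.

```csharp
using System;
using BitMiracle.Docotic.Pdf;

using var pdf = new PdfDocument(@"example-form.pdf");

var textBoxCount = 0;
foreach (var control in pdf.GetControls())
{
    // Pattern matching performs the cast to the expected field type.
    switch (control)
    {
        case PdfTextBox textBox:
            textBoxCount++;
            Console.WriteLine($"Text box '{textBox.Name}': '{textBox.Text}'");
            break;
        case PdfCheckBox checkBox:
            Console.WriteLine($"Checkbox '{checkBox.Name}': {checkBox.Checked}");
            break;
        default:
            Console.WriteLine($"Other control '{control.Name}'");
            break;
    }
}

Console.WriteLine($"Text boxes found: {textBoxCount}");
```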
When you have finished filling in a form, you can flatten all its fields.
Using JavaScript in forms
You can add actions to control events. The PdfControl class provides access to a pre-defined set of events. The names of the events start with "On" (e.g., OnMouseDown).
Here is an example of using JavaScript for PDF forms:
using var pdf = new PdfDocument(@"example-form.pdf");
foreach (var field in pdf.GetControls())
    field.OnChange = pdf.CreateJavaScriptAction($"app.alert('{field.Name} changed!',3)");
pdf.Save("javascript-events.pdf");
Forms Data Format
There is one more way to fill out a PDF electronically. Use the FDF to PDF feature of the library to auto-populate a PDF form from a database or another source.
using var pdf = new PdfDocument(@"example-form.pdf");
pdf.ImportFdf("form-data.fdf");
pdf.Save("auto-populated.pdf");
The code uses this FDF file to fill all form fields at once.
Flatten PDF
This section is about how to flatten a PDF.
When you flatten a PDF, you convert interactive elements like forms and annotations into static content to prevent further editing. A flattened PDF can take significantly fewer bytes while looking the same.
Flatten forms and annotations
To flatten a fillable PDF, use the PdfDocument.FlattenControls method. This method draws all form fields and other controls on their parent pages and removes the source controls from the document.
When you flatten a PDF form, it makes sense to flatten annotations, too. Use the PdfDocument.FlattenWidgets method to flatten both controls and annotations at the same time.
If you only want to convert some controls and/or annotations to their visual representation, then use the PdfWidget.Flatten method. You would need to find the required control or annotation first.
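Here is a short sketch that combines the selective and the document-wide approaches. It assumes a filled form saved as filled-form.pdf with a text field named txt-name, as in the form-filling example, and that Flatten can be called without arguments. In practice you would usually pick one of the two approaches.

```csharp
using BitMiracle.Docotic.Pdf;

using var pdf = new PdfDocument("filled-form.pdf");

// Selective: flatten a single control found by name.
if (pdf.GetControl("txt-name") is PdfTextBox nameTextBox)
    nameTextBox.Flatten();

// Document-wide: flatten all remaining controls and annotations.
pdf.FlattenWidgets();

pdf.Save("flattened.pdf");
```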
Flatten text
You can convert PDF text to outlines with the help of Docotic.Pdf. The usual reason for this is to achieve font independence. Regardless of whether the fonts are installed, the flattened text will look the same on any device.
However, once you convert text to outlines, you can no longer edit it as text. Also, during the flattening process, the library converts the text to vector graphics. This can increase the file size.
To flatten PDF text, you would need to extract the text as vector paths and copy it on a new or the same page. There is a sample code for this.
Save options
In the code snippets above, I used the PdfDocument.Save method without additional arguments. The library uses the default save options in such cases. We handpicked the defaults so that they work well in typical cases.
Still, there are cases when you need to override the default options. For this, create a PdfSaveOptions object, set up the options, and provide them to one of the save methods. Below, I describe those cases.
To protect a PDF with a password or a certificate, create an encryption handler and assign it to the EncryptionHandler property.
When you want to sign the same PDF multiple times, turn on the incremental update mode by setting the WriteIncrementally property to true. Do the same when you are saving a previously signed file with new annotations or form data.
Set the Linearize property to true to produce a linearized (or a Fast Web View optimized) PDF file. Viewers that recognize this optimization can display such files faster.
To prevent save-time changes to some of the metadata fields, set the UpdateProducer and UpdateModifiedDate properties to false.
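As an illustration, the options above might be combined as in the following sketch. It assumes that PdfStandardEncryptionHandler accepts an owner password and a user password, and that Save has an overload taking a PdfSaveOptions object. The passwords and file names are placeholders.

```csharp
using BitMiracle.Docotic.Pdf;

using var pdf = new PdfDocument();
pdf.Pages[0].Canvas.DrawString(10, 50, "Saved with custom options.");

// Password protection, with the metadata fields left untouched on save.
var protectedOptions = new PdfSaveOptions
{
    EncryptionHandler = new PdfStandardEncryptionHandler("owner-password", "user-password"),
    UpdateProducer = false,
    UpdateModifiedDate = false
};
pdf.Save("protected.pdf", protectedOptions);

// A separate options object for a Fast Web View optimized file.
var linearizedOptions = new PdfSaveOptions { Linearize = true };
pdf.Save("linearized.pdf", linearizedOptions);
```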