Compress PDF documents in C# and VB.NET

Compressing and optimizing PDF documents is a common requirement. Smaller PDF documents are easier to transfer over a network and cheaper to store. Reducing PDF file size is especially important for archiving.

.NET library to optimize PDF documents

Use the Docotic.Pdf library to compress PDF documents in .NET Framework and .NET Core applications. You can download the binaries of the library or use its NuGet package. To try the library without evaluation mode restrictions, you can request a free time-limited license key.

Docotic.Pdf library version: 9.3.16793. Regression tests passed: 14,582. Total NuGet downloads: 3,736,587.

Docotic.Pdf provides several optimization techniques:

  • optimize PDF objects
  • remove duplicate PDF objects (fonts, images, etc.)
  • compress images
  • subset fonts
  • remove metadata
  • remove structure information
  • remove unused resources
  • remove private application data
  • flatten form fields and annotations
  • unembed fonts

You can use all the above to get the best compression ratio for your PDF documents. Look at the Compress PDF document in .NET sample to see all these techniques in action.
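As a rough illustration, several of the lossless techniques from the list can be chained in one pass. The sketch below uses only method names that appear later in this article. The file names are placeholders, and the order of calls after deduplication is an assumption rather than a documented requirement. It also prints the file sizes so you can judge the result.

```csharp
using System;
using System.IO;
using BitMiracle.Docotic.Pdf;

public static class CompressPdfSketch
{
    public static void Main()
    {
        // Placeholder file names; replace them with your own paths.
        const string inputPath = "input.pdf";
        const string outputPath = "compressed.pdf";

        using (var pdf = new PdfDocument(inputPath))
        {
            // Deduplicate first so that later steps do not process copies of the same object.
            pdf.ReplaceDuplicateObjects();

            pdf.RemoveUnusedFontGlyphs();
            pdf.RemoveUnusedResources();
            pdf.RemovePieceInfo();

            // Optional: this makes the document untagged, so enable it only
            // when accessibility information is not needed.
            // pdf.RemoveStructureInformation();

            // The default save options already apply the lossless object optimizations.
            pdf.Save(outputPath);
        }

        long before = new FileInfo(inputPath).Length;
        long after = new FileInfo(outputPath).Length;
        Console.WriteLine($"{before} bytes -> {after} bytes ({100.0 * after / before:F1}% of original)");
    }
}
```

Image recompression, flattening, and font unembedding are left out here because they can change how the document looks or behaves. Add them only after testing on your own documents.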

Let's review these compression methods in more detail.

Optimize PDF objects

Internally, a PDF file is a collection of low-level PDF objects: dictionaries, streams, arrays, and others. When saving a PDF file, Docotic.Pdf applies the following lossless optimizations by default:

  • compresses PDF streams with Flate encoding
  • deletes unused PDF objects
  • inlines indirect PDF objects
  • writes PDF objects without formatting
  • packs PDF objects into compressed object streams

This sample shows how to optimize PDF objects in C#:

using BitMiracle.Docotic.Pdf;

using (var pdf = new PdfDocument("input.pdf"))
{
    var saveOptions = new PdfSaveOptions();

    // These options are enabled by default and applied implicitly:
    //saveOptions.Compression = PdfCompression.Flate;
    //saveOptions.RemoveUnusedObjects = true;
    //saveOptions.OptimizeIndirectObjects = true;
    //saveOptions.UseObjectStreams = true;
    //saveOptions.WriteWithoutFormatting = true;

    pdf.Save("compressed.pdf", saveOptions);
}

All these optimizations don't affect visible contents (text, images, bookmarks, and anything else) of the PDF document. They only affect how PDF objects are written and compressed in the output PDF file.

Deleting unused PDF objects is important for the other techniques discussed below. Do not set the PdfSaveOptions.RemoveUnusedObjects property to false unless you have a strong reason to keep the unused objects.

Remove duplicate objects in PDF documents

When you merge PDF documents, the resulting PDF often contains duplicate fonts and images. Replacing duplicate objects helps reduce the size of the resulting PDF file. Here is a C# sample for this operation:

using (var pdf = new PdfDocument("merged.pdf"))
{
    pdf.ReplaceDuplicateObjects();

    pdf.Save("compressed.pdf");
}

Remove duplicate objects before compressing images or unembedding fonts. Otherwise, the library does a lot of extra work optimizing copies of the same images or fonts.

The PdfDocument.ReplaceDuplicateObjects method does not replace inline images. If your document contains inline images, call the PdfCanvas.MoveInlineImagesToResources method first. It converts the inline images to regular ones, so that the ReplaceDuplicateObjects method can deduplicate the converted images, too.
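The two-step approach for inline images might look like the sketch below. It assumes that MoveInlineImagesToResources takes no arguments and that each page exposes its canvas through a Canvas property. Check both against the API reference for your library version. Inline images drawn inside XObjects are not covered by this loop.

```csharp
using BitMiracle.Docotic.Pdf;

using (var pdf = new PdfDocument("merged.pdf"))
{
    // Step 1: convert inline images on every page to regular image resources.
    // Assumption: the method is parameterless.
    foreach (PdfPage page in pdf.Pages)
        page.Canvas.MoveInlineImagesToResources();

    // Step 2: the converted images can now be deduplicated with everything else.
    pdf.ReplaceDuplicateObjects();

    pdf.Save("compressed.pdf");
}
```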

Compress images in PDF

Optimizing PDF images is usually the most effective compression method for documents with raster images.

The Docotic.Pdf library provides built-in methods to recompress PDF images using the JPEG, CCITT Group 3 and 4 (fax), JPEG 2000, and zip/deflate compression algorithms. You can also resize or downscale images to reduce PDF file size even more. Alternatively, you can optimize images yourself with a third-party tool and then replace the images in the document.

Look at the Optimize images in PDF document in C# and VB.NET sample on GitHub for an example.
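For orientation, a minimal JPEG recompression pass could look like the sketch below. The member names used here (GetImages, IsMask, Mask, Width, Height, RecompressWithJpeg) are recalled from the library's public samples and are not confirmed by this article, so treat them as assumptions and verify them against the API reference. JPEG recompression is lossy. Check the visual quality of the output before adopting it.

```csharp
using BitMiracle.Docotic.Pdf;

using (var pdf = new PdfDocument("images.pdf"))
{
    foreach (PdfImage image in pdf.GetImages())
    {
        // Assumption: masks and images with an attached mask are poor
        // candidates for lossy recompression, so skip them.
        if (image.IsMask || image.Mask != null)
            continue;

        // Very small images rarely benefit from recompression.
        if (image.Width < 8 || image.Height < 8)
            continue;

        // Assumption: a parameterless overload that uses a default quality exists.
        image.RecompressWithJpeg();
    }

    pdf.Save("compressed.pdf");
}
```

A production version would likely pick the algorithm per image, for example fax compression for black-and-white scans, as the linked sample does.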

Subset fonts

PDF documents usually embed fonts used to draw text. Embedded fonts often include information about all supported glyphs. Removing glyphs unused in a PDF document can significantly reduce the output file size.

This sample shows how to optimize PDF fonts in C#:

using (var pdf = new PdfDocument("text.pdf"))
{
    pdf.RemoveUnusedFontGlyphs();

    pdf.Save("compressed.pdf");
}

Subsetting does not affect fonts used in variable text controls, such as text boxes or combo boxes.

Remove metadata

PDF documents can contain uncompressed XMP metadata with information about the author, keywords, creation time, and so on. The metadata does not affect the visible contents of the PDF document, so removing it reduces the file size without changing how pages look.

This sample shows how to remove metadata from a PDF file in C#:

using (var pdf = new PdfDocument("metadata.pdf"))
{
    XmpMetadata xmp = pdf.Metadata;
    xmp.Unembed();
    pdf.Info.Clear(false);

    pdf.Save("compressed.pdf");
}

Remove structure information

PDF documents can include information about their logical structure. The information is used to:

  • produce Tagged PDF documents
  • make PDF documents accessible to assistive devices

Removing this information helps reduce PDF file size, but the PDF will no longer be tagged or accessible to assistive devices. This sample shows how to delete structure information from a PDF in C#:

using (var pdf = new PdfDocument("tagged.pdf"))
{
    pdf.RemoveStructureInformation();

    pdf.Save("compressed.pdf");
}

Remove unused resources from PDF

PDF pages and XObjects can reference more fonts, images, or patterns than they actually use. Use the PdfDocument.RemoveUnusedResources method to remove such unused resources from a PDF. Here is a C# sample:

using (var pdf = new PdfDocument("input.pdf"))
{
    pdf.RemoveUnusedResources();

    pdf.Save("compressed.pdf");
}

Remove private application data from PDF

PDF documents produced by Adobe software can include private application data. This data is stored in page-piece dictionaries.

This sample shows how to clean up and compress PDF in C# by removing page-piece dictionaries:

using (var pdf = new PdfDocument("input.pdf"))
{
    pdf.RemovePieceInfo();

    pdf.Save("compressed.pdf");
}

Flatten PDF form fields and annotations

You can reduce the size of a PDF document with a completed form by flattening the form fields. Flattening replaces form fields with their visual representation, so the filled-in values are preserved but can no longer be edited. This C# sample shows how to flatten all PDF form fields:

using (var pdf = new PdfDocument("form.pdf"))
{
    pdf.FlattenControls();

    pdf.Save("compressed.pdf");
}

Alternatively, you can flatten all annotations and controls using the PdfDocument.FlattenWidgets method.

Also, the PdfWidget.Flatten method allows you to flatten individual annotations or controls.
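A sketch of the document-wide variant, assuming FlattenWidgets is called without arguments in the same way as FlattenControls above:

```csharp
using BitMiracle.Docotic.Pdf;

using (var pdf = new PdfDocument("form.pdf"))
{
    // Flattens both annotations and form controls, not only the controls.
    // Assumption: the method takes no arguments.
    pdf.FlattenWidgets();

    pdf.Save("compressed.pdf");
}
```

Use FlattenControls instead when annotations such as links or comments must stay interactive.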

Unembed fonts

Embedding fonts makes sense for custom or rare fonts. Widely available fonts like Arial or Verdana, however, increase PDF file size with little benefit. You can unembed popular fonts that are available on your target platforms. Sample C# code:

using (var pdf = new PdfDocument("input.pdf"))
{
    unembedFonts(pdf);

    pdf.Save("compressed.pdf");
}

/// <summary>
/// This method unembeds an embedded font when:
/// * its name is not included in the "always keep" list
///   and it does not use the built-in encoding
/// * and it is either a TrueType font installed in the OS
///   or its name is included in the "always unembed" list.
/// </summary>
private static void unembedFonts(PdfDocument pdf)
{
    string[] alwaysUnembedList = new string[] { "MyriadPro-Regular" };
    string[] alwaysKeepList = new string[] { "ImportantFontName", "AnotherImportantFontName" };

    foreach (PdfFont font in pdf.GetFonts())
    {
        if (!font.Embedded ||
            font.EncodingName == "Built-In" ||
            Array.Exists(alwaysKeepList, name => font.Name == name))
        {
            continue;
        }

        if (font.Format == PdfFontFormat.TrueType || font.Format == PdfFontFormat.CidType2)
        {
            SystemFontLoader loader = SystemFontLoader.Instance;
            byte[] fontBytes = loader.Load(font.Name, font.Bold, font.Italic);
            if (fontBytes != null)
            {
                font.Unembed();
                continue;
            }
        }
        
        if (Array.Exists(alwaysUnembedList, name => font.Name == name))
            font.Unembed();
    }
}

Do not use this technique with PDF/A documents. A PDF/A document must embed all fonts.

Always test PDF documents with unembedded fonts on your target operating systems (Windows, Linux, macOS, iOS, Android) and in your target PDF viewers (Adobe, Foxit, etc.).

Other methods to reduce PDF size

Beyond the optimization methods described above, you can compress PDF documents even more by removing unimportant content. Docotic.Pdf allows you to delete these objects from PDF documents:

  • annotations
  • attachments
  • bookmarks
  • form fields
  • pages
  • scripts
  • transparency

You can also remove text, images, and vector graphics from PDF pages. Text flattening is also possible.

Conclusion

You can use the Docotic.Pdf library to compress PDF documents in C# and VB.NET. It provides many effective optimization techniques.

Download and try the complete Compress PDF document in C# and VB.NET sample from GitHub.

Contact us, and we will advise how to achieve the best compression ratio for your PDF documents.