Split PDF documents in C# and VB.NET

The Docotic.Pdf library allows you to divide a PDF document into a group of smaller files. You can extract individual pages or page ranges. You can also split PDF documents based on certain criteria.

Docotic.Pdf comes with paid licenses, but it is also free in certain cases. You can download the library and get an evaluation license key on the Docotic.Pdf download page.

PDF splitting basics

The PdfDocument.CopyPages methods allow you to copy pages from PdfDocument objects. This is the primary Docotic.Pdf API to split PDF documents.

Split PDF to individual pages

The following C# code saves each PDF page to a separate file:

using var pdf = new PdfDocument("source.pdf");

for (int i = 0; i < pdf.PageCount; ++i)
{
    using PdfDocument copy = pdf.CopyPages(i, 1);
    copy.RemoveUnusedResources();
    copy.Save(i + ".pdf");
}

The PdfDocument.RemoveUnusedResources method helps to reduce the size of output files. It is useful when the copied pages reference fonts, images, or patterns that they do not actually use. Read more about PDF compression in the Optimize output files section.

Split to page groups

The CopyPages method can also copy an arbitrary set of pages, not only a contiguous range. This code snippet shows how to extract the third and the first pages:

using var pdf = new PdfDocument(@"source.pdf");

using PdfDocument copy = pdf.CopyPages(new int[] { 2, 0 });
copy.RemoveUnusedResources();
copy.Save("result.pdf");

The order of page indexes is important. It defines the order of pages in the resulting document.

Try the Copy pages code sample from GitHub.
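
The CopyPages(start, count) overload shown earlier can also split a document into fixed-size groups of pages. The following is a minimal sketch under a few assumptions: the ChunkRanges helper, the class name, and the output file names are illustrative and not part of Docotic.Pdf, and the BitMiracle.Docotic.Pdf namespace is assumed from the library's NuGet package because this article does not state it.

```csharp
using System;
using System.Collections.Generic;
using BitMiracle.Docotic.Pdf; // assumed namespace of the Docotic.Pdf package

static class FixedSizeSplit
{
    // Hypothetical helper: yields (start, count) pairs that cover pageCount pages
    // in groups of at most chunkSize pages. The last group may be shorter.
    public static IEnumerable<(int Start, int Count)> ChunkRanges(int pageCount, int chunkSize)
    {
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));

        for (int start = 0; start < pageCount; start += chunkSize)
            yield return (start, Math.Min(chunkSize, pageCount - start));
    }

    // Saves every group of pages to its own file, named after the page indexes it holds.
    public static void Split(string sourcePath, int chunkSize)
    {
        using var pdf = new PdfDocument(sourcePath);
        foreach (var (start, count) in ChunkRanges(pdf.PageCount, chunkSize))
        {
            using PdfDocument copy = pdf.CopyPages(start, count);
            copy.RemoveUnusedResources();
            copy.Save($"pages_{start}_{start + count - 1}.pdf");
        }
    }
}
```

For example, FixedSizeSplit.Split("source.pdf", 10) would write ten pages per file, with the remaining pages in the last file.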

Split PDF by condition

You can split documents based on content. That is helpful if you do not know in advance which pages to extract. For example, extract pages containing specific text:

string textToFind = ".NET Standard";
using (var pdf = new PdfDocument("C# in depth.pdf"))
{
    var pageIndexes = new List<int>();
    for (int i = 0; i < pdf.Pages.Count; i++)
    {
        string pageText = pdf.Pages[i].GetText();
        if (pageText.Contains(textToFind, StringComparison.CurrentCultureIgnoreCase))
            pageIndexes.Add(i);
    }

    if (pageIndexes.Count > 0)
    {
        using var copy = pdf.CopyPages(pageIndexes.ToArray());
        copy.RemoveUnusedResources();
        copy.Save(textToFind + ".pdf");
    }
}

You can read more about text extraction in the Extract text from PDF in C# and VB.NET article.
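
The matching pages do not have to end up in a single file. As a variation on the code above, the sketch below saves each run of consecutive matching pages to its own document. The ConsecutiveRuns helper, the class name, and the file naming scheme are assumptions made for illustration; only the PdfDocument calls already used in this article are taken from the library, and the BitMiracle.Docotic.Pdf namespace is assumed from the NuGet package.

```csharp
using System;
using System.Collections.Generic;
using BitMiracle.Docotic.Pdf; // assumed namespace of the Docotic.Pdf package

static class SplitByRuns
{
    // Hypothetical helper: turns ascending page indexes into (start, count) runs
    // of consecutive pages, for example [1, 2, 3, 7] into (1, 3) and (7, 1).
    public static IEnumerable<(int Start, int Count)> ConsecutiveRuns(IReadOnlyList<int> sortedIndexes)
    {
        if (sortedIndexes.Count == 0)
            yield break;

        int start = sortedIndexes[0];
        int count = 1;
        for (int i = 1; i < sortedIndexes.Count; i++)
        {
            if (sortedIndexes[i] == start + count)
            {
                count++; // the page continues the current run
            }
            else
            {
                yield return (start, count);
                start = sortedIndexes[i];
                count = 1;
            }
        }
        yield return (start, count);
    }

    // Finds pages containing the text and saves each run of such pages separately.
    public static void SaveMatchingRuns(string sourcePath, string textToFind)
    {
        using var pdf = new PdfDocument(sourcePath);

        var pageIndexes = new List<int>();
        for (int i = 0; i < pdf.PageCount; i++)
        {
            string pageText = pdf.Pages[i].GetText();
            if (pageText.Contains(textToFind, StringComparison.CurrentCultureIgnoreCase))
                pageIndexes.Add(i);
        }

        foreach (var (start, count) in ConsecutiveRuns(pageIndexes))
        {
            using PdfDocument copy = pdf.CopyPages(start, count);
            copy.RemoveUnusedResources();
            copy.Save($"match_{start}_{start + count - 1}.pdf");
        }
    }
}
```

The helper relies on the indexes being in ascending order, which holds here because the pages are scanned from first to last.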

Advanced PDF splitting

Extract pages

The CopyPages methods do not change the associated PdfDocument object. There are also the PdfDocument.ExtractPages methods. They allow you to remove extracted pages from the document:

using var pdf = new PdfDocument(@"source.pdf");

using PdfDocument copy = pdf.ExtractPages(0, 3);
copy.Save("extracted.pdf");

pdf.Save("original.pdf");

You can try the corresponding Extract pages code sample from GitHub.

Remove and reorder pages

The CopyPages and ExtractPages methods produce a new document with selected pages. An alternative is to remove pages from the current document:

using var pdf = new PdfDocument(@"source.pdf");
pdf.RemovePages(0, 3);
pdf.Save("remaining.pdf");

You can also reorder pages after removal. Look at the related code samples on GitHub.

Optimize output files

Earlier, I used the RemoveUnusedResources method to optimize resulting files. Docotic.Pdf provides more options for PDF compression. For example, you can remove structure information or compress images. Read the Compress PDF documents in C# and VB.NET article for more information. You can also try the Compress PDF document in .NET code sample from GitHub.

PDF splitting is sometimes used to get page files smaller than some limit. In such cases, you can measure the resulting size and compress the file if necessary. Sample code:

using var pdf = new PdfDocument("source.pdf");

using PdfDocument copy = pdf.CopyPages(0, 1);
copy.RemoveUnusedResources();

using var ms = new MemoryStream();
copy.Save(ms);

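// 1 MB limit; long matches the type of MemoryStream.Length
long limit = 1024 * 1024;
if (ms.Length > limit)
{
    // compress is a placeholder for your own compression routine
    compress(copy);
}

copy.Save("result.pdf");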

Note that it might be impossible to compress a PDF file below a certain limit. The results depend on the file content and on the limit value.
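
When the goal is to keep every output file under a limit, one possible approach is to measure each page on its own and then pack consecutive pages into groups by their measured sizes. The sketch below is only an estimate-based illustration: the MeasurePage and PackBySize helpers are hypothetical, and the sum of single-page sizes only approximates the size of the combined file, because pages can share resources. A page that exceeds the limit on its own still gets a group of its own.

```csharp
using System.Collections.Generic;
using System.IO;
using BitMiracle.Docotic.Pdf; // assumed namespace of the Docotic.Pdf package

static class SizeLimitedSplit
{
    // Hypothetical helper: size in bytes of a one-page document made from the given page.
    static long MeasurePage(PdfDocument pdf, int index)
    {
        using PdfDocument copy = pdf.CopyPages(index, 1);
        copy.RemoveUnusedResources();
        using var ms = new MemoryStream();
        copy.Save(ms);
        return ms.Length;
    }

    // Hypothetical helper: greedily packs consecutive pages into (start, count) groups
    // so that the summed page sizes of a group stay within the limit where possible.
    public static IEnumerable<(int Start, int Count)> PackBySize(IReadOnlyList<long> pageSizes, long limit)
    {
        int start = 0;
        long total = 0;
        for (int i = 0; i < pageSizes.Count; i++)
        {
            // Close the current group when adding page i would exceed the limit.
            if (i > start && total + pageSizes[i] > limit)
            {
                yield return (start, i - start);
                start = i;
                total = 0;
            }
            total += pageSizes[i];
        }

        if (pageSizes.Count > start)
            yield return (start, pageSizes.Count - start);
    }

    public static void Split(string sourcePath, long limit)
    {
        using var pdf = new PdfDocument(sourcePath);

        var sizes = new List<long>();
        for (int i = 0; i < pdf.PageCount; i++)
            sizes.Add(MeasurePage(pdf, i));

        foreach (var (start, count) in PackBySize(sizes, limit))
        {
            using PdfDocument copy = pdf.CopyPages(start, count);
            copy.RemoveUnusedResources();
            copy.Save($"part_{start}_{start + count - 1}.pdf");
        }
    }
}
```

Because the grouping is based on estimates, it is still worth measuring each saved part and compressing it if it ends up over the limit.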

Extract page content

It is also possible to change page content when splitting. For example, you can scale extracted pages before using them in a PDF imposition. Try the related Create XObject from page sample project from GitHub.

Or you can remove or replace some content on extracted pages. Look at the Copy text, paths and images code sample that shows how to copy PDF page objects.

Split PDF to images

Docotic.Pdf also allows you to split a PDF document into page images. Read the Convert PDF to image in C# and VB.NET article for more details.

Split in parallel threads

You might want to parallelize PDF splitting for large documents. The PdfDocument class is not thread-safe. But it is possible to use separate PdfDocument objects in each thread:

string fileName = "source.pdf";
using var temp = new PdfDocument(fileName);
int pageCount = temp.PageCount;

Parallel.For(0, pageCount, i =>
{
    using var pdf = new PdfDocument(fileName);
    using var copy = pdf.CopyPages(i, 1);
    copy.RemoveUnusedResources();
    copy.Save($"split_{i}.pdf");
});

Note that the single-threaded code is usually faster. The multi-threaded solution involves overhead related to parsing extra PdfDocument objects. Use the single-threaded version unless tests prove that the parallel code is faster.
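
A simple way to run such a test is to time both variants on a representative document. The following is a rough sketch, not a rigorous benchmark: the Measure helper, the class name, and the output directory names are assumptions for illustration, and a single run can be skewed by disk caching and JIT warm-up.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;
using BitMiracle.Docotic.Pdf; // assumed namespace of the Docotic.Pdf package

static class SplitTiming
{
    // Hypothetical helper: runs the action once and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    static void SplitSequential(string fileName, string outputDir)
    {
        using var pdf = new PdfDocument(fileName);
        for (int i = 0; i < pdf.PageCount; ++i)
        {
            using PdfDocument copy = pdf.CopyPages(i, 1);
            copy.RemoveUnusedResources();
            copy.Save(Path.Combine(outputDir, $"split_{i}.pdf"));
        }
    }

    static void SplitParallel(string fileName, string outputDir)
    {
        int pageCount;
        using (var temp = new PdfDocument(fileName))
            pageCount = temp.PageCount;

        Parallel.For(0, pageCount, i =>
        {
            // PdfDocument is not thread-safe, so every iteration opens its own instance.
            using var pdf = new PdfDocument(fileName);
            using var copy = pdf.CopyPages(i, 1);
            copy.RemoveUnusedResources();
            copy.Save(Path.Combine(outputDir, $"split_{i}.pdf"));
        });
    }

    public static void Compare(string fileName)
    {
        Directory.CreateDirectory("sequential");
        Directory.CreateDirectory("parallel");

        TimeSpan sequential = Measure(() => SplitSequential(fileName, "sequential"));
        TimeSpan parallel = Measure(() => SplitParallel(fileName, "parallel"));

        Console.WriteLine($"Sequential: {sequential.TotalMilliseconds:F0} ms");
        Console.WriteLine($"Parallel:   {parallel.TotalMilliseconds:F0} ms");
    }
}
```

Running Compare several times on the documents you actually process gives a more reliable picture than a single measurement.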