Split PDF documents in C# and VB.NET
The Docotic.Pdf library allows you to divide a PDF document into a set of smaller files. You can extract individual pages or page ranges. You can also split PDF documents based on certain criteria.
Docotic.Pdf comes with paid licenses, but it is also free in certain cases. Get the library and a free time-limited license key on the Download C# .NET PDF library page.
PDF splitting basics
The PdfDocument.CopyPages methods allow you to copy pages from PdfDocument objects. This is the primary Docotic.Pdf API to split PDF documents.
Split PDF to individual pages
The following C# code saves each PDF page to a separate file:
using var pdf = new PdfDocument("source.pdf");
for (int i = 0; i < pdf.PageCount; ++i)
{
using PdfDocument copy = pdf.CopyPages(i, 1);
copy.RemoveUnusedResources();
copy.Save(i + ".pdf");
}
The PdfDocument.RemoveUnusedResources method helps to reduce the size of output files. It is useful when copied pages reference fonts, images, or patterns that they do not actually use. Read more about PDF compression in the Optimize output files section.
Split to page groups
The CopyPages methods can copy a continuous page range or an arbitrary set of pages. This code snippet shows how to extract the third and the first pages:
using var pdf = new PdfDocument(@"source.pdf");
using PdfDocument copy = pdf.CopyPages(new int[] { 2, 0 });
copy.RemoveUnusedResources();
copy.Save("result.pdf");
The order of page indexes is important. It defines the order of pages in the resulting document.
Try the Copy pages code sample from GitHub.
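A common variation is splitting a document into groups of a fixed number of pages. The following is a minimal sketch of that idea, not part of the library. The PageChunks class and its Split method are hypothetical names introduced for illustration. The helper only computes (Start, Count) pairs; the commented part assumes the CopyPages(startIndex, count) overload used earlier in this article.

```csharp
using System;
using System.Collections.Generic;

public static class PageChunks
{
    // Returns (Start, Count) pairs that cover pages 0..pageCount-1
    // in groups of at most chunkSize pages. The last group may be shorter.
    public static List<(int Start, int Count)> Split(int pageCount, int chunkSize)
    {
        if (pageCount < 0)
            throw new ArgumentOutOfRangeException(nameof(pageCount));
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var chunks = new List<(int Start, int Count)>();
        for (int start = 0; start < pageCount; start += chunkSize)
            chunks.Add((start, Math.Min(chunkSize, pageCount - start)));

        return chunks;
    }
}

// Possible use with Docotic.Pdf (shown as comments, not compiled here):
//
// using var pdf = new PdfDocument("source.pdf");
// foreach (var (start, count) in PageChunks.Split(pdf.PageCount, 10))
// {
//     using PdfDocument copy = pdf.CopyPages(start, count);
//     copy.RemoveUnusedResources();
//     copy.Save($"pages_from_{start}.pdf");
// }
```

Keeping the range calculation separate from the PDF calls makes it easy to check the boundaries without opening any document.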
Split PDF by condition
You can split documents based on content. That is helpful if you do not know in advance which pages to extract. For example, extract pages containing specific text:
string textToFind = ".NET Standard";
using (var pdf = new PdfDocument("C# in depth.pdf"))
{
var pageIndexes = new List<int>();
for (int i = 0; i < pdf.Pages.Count; i++)
{
string pageText = pdf.Pages[i].GetText();
if (pageText.Contains(textToFind, StringComparison.CurrentCultureIgnoreCase))
pageIndexes.Add(i);
}
if (pageIndexes.Count > 0)
{
using var copy = pdf.CopyPages(pageIndexes.ToArray());
copy.RemoveUnusedResources();
copy.Save(textToFind + ".pdf");
}
}
You can read more about text extraction in the Extract text from PDF in C# and VB.NET article.
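The same page index list can also mark where sections begin, for example pages that contain a chapter title. The sketch below is an illustration under that assumption, not a Docotic.Pdf API. The SectionRanges class and the FromStartPages method are hypothetical names. The helper turns section start indexes into (Start, Count) ranges, and pages before the first marker form a leading section of their own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SectionRanges
{
    // Converts the indexes of pages that start a section into
    // (Start, Count) ranges that cover the whole document.
    // Indexes outside 0..pageCount-1 are ignored; duplicates are merged.
    public static List<(int Start, int Count)> FromStartPages(
        IEnumerable<int> sectionStarts, int pageCount)
    {
        var ranges = new List<(int Start, int Count)>();
        if (pageCount <= 0)
            return ranges;

        var starts = new SortedSet<int>(
            sectionStarts.Where(i => i >= 0 && i < pageCount));
        starts.Add(0); // pages before the first marker become a section too

        List<int> ordered = starts.ToList();
        for (int k = 0; k < ordered.Count; k++)
        {
            // A section ends where the next one starts, or at the last page
            int end = k + 1 < ordered.Count ? ordered[k + 1] : pageCount;
            ranges.Add((ordered[k], end - ordered[k]));
        }

        return ranges;
    }
}
```

Each resulting pair could then be passed to pdf.CopyPages(start, count), in the same way as the page ranges in the earlier snippets.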
Advanced PDF splitting
Extract pages
The CopyPages methods do not change the associated PdfDocument object. There are also the PdfDocument.ExtractPages methods. They allow you to remove extracted pages from the document:
using var pdf = new PdfDocument(@"source.pdf");
using PdfDocument copy = pdf.ExtractPages(0, 3);
copy.Save("extracted.pdf");
pdf.Save("original.pdf");
You can try the corresponding Extract pages code sample from GitHub.
Remove and reorder pages
The CopyPages and ExtractPages methods produce a new document with selected pages. An alternative is to remove pages from a current document:
using var pdf = new PdfDocument(@"source.pdf");
pdf.RemovePages(0, 3);
pdf.Save("remaining.pdf");
You can also reorder pages after removal. Look at the code samples in these sections:
Optimize output files
Earlier, I used the RemoveUnusedResources method to optimize resulting files. Docotic.Pdf provides more options for PDF compression. For example, you can remove structure information or compress images. Read the Compress PDF documents in C# and VB.NET article for more information. You can also try the Compress PDF document in .NET code sample from GitHub.
PDF splitting is sometimes used to get page files smaller than some limit. In such cases, you can measure the resulting size and compress the file if necessary. Sample code:
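using var pdf = new PdfDocument("source.pdf");
using PdfDocument copy = pdf.CopyPages(0, 1);
copy.RemoveUnusedResources();
// Save to memory first to measure the resulting size
using var ms = new MemoryStream();
copy.Save(ms);
// 1 MB; a byte cannot hold this value, so use a wider integer type
const long limit = 1024 * 1024;
if (ms.Length > limit)
{
// compress is a placeholder for your own compression routine
compress(copy);
copy.Save("result.pdf");
}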
Note that it might be impossible to compress a PDF file below a certain limit. The results depend on the file content and on the limit value.
Extract page content
It is also possible to change page content when splitting. For example, you can scale extracted pages before using them in a PDF imposition. Try the related Create XObject from page sample project from GitHub.
Or you can remove or replace some content on extracted pages. Look at the Copy text, paths and images code sample that shows how to copy PDF page objects.
Split PDF to images
Docotic.Pdf also allows you to split a PDF document into page images. Read the Convert PDF to image in C# and VB.NET article for more details.
Split in parallel threads
You might want to parallelize PDF splitting for large documents. The PdfDocument class is not thread-safe. But it is possible to use separate PdfDocument objects in each thread:
string fileName = "source.pdf";
using var temp = new PdfDocument(fileName);
int pageCount = temp.PageCount;
Parallel.For(0, pageCount, i =>
{
using var pdf = new PdfDocument(fileName);
using var copy = pdf.CopyPages(i, 1);
copy.RemoveUnusedResources();
copy.Save($"split_{i}.pdf");
});
Note that the single-threaded code is usually faster. The multi-threaded solution incurs overhead from parsing extra PdfDocument objects. Use the single-threaded version unless tests prove that the parallel code is faster.
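One way to run such a test is a small timing harness built on System.Diagnostics.Stopwatch. This is only a rough sketch. The SplitBenchmark class is a hypothetical helper, and the SplitSequentially and SplitInParallel methods in the comments are assumed wrappers around the snippets from this article. For reliable numbers, consider a dedicated benchmarking tool.

```csharp
using System;
using System.Diagnostics;

public static class SplitBenchmark
{
    // Runs the action several times and returns the shortest elapsed time.
    // Taking the best run reduces noise from JIT warm-up and disk caching.
    public static TimeSpan Measure(Action action, int runs = 3)
    {
        if (action == null)
            throw new ArgumentNullException(nameof(action));
        if (runs <= 0)
            throw new ArgumentOutOfRangeException(nameof(runs));

        TimeSpan best = TimeSpan.MaxValue;
        for (int i = 0; i < runs; i++)
        {
            var stopwatch = Stopwatch.StartNew();
            action();
            stopwatch.Stop();

            if (stopwatch.Elapsed < best)
                best = stopwatch.Elapsed;
        }

        return best;
    }
}

// Possible use (SplitSequentially and SplitInParallel are hypothetical
// methods that wrap the single-threaded and Parallel.For snippets):
//
// TimeSpan sequential = SplitBenchmark.Measure(() => SplitSequentially("source.pdf"));
// TimeSpan parallel = SplitBenchmark.Measure(() => SplitInParallel("source.pdf"));
// Console.WriteLine($"Sequential: {sequential}, parallel: {parallel}");
```

Measure with documents that are typical for your workload, because the balance between parsing overhead and copying time depends on the file.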