Extract pages, split at bookmarks, or divide by content pattern. Batch split thousands of documents with bookmark and label preservation.
use pdfluent::{Sdk, split::{SplitOptions, SplitBy}};

fn main() -> pdfluent::Result<()> {
    let sdk = Sdk::new()?;
    let doc = sdk.open("annual_report.pdf")?;

    // Split at every level-1 bookmark boundary
    let opts = SplitOptions::builder()
        .by(SplitBy::BookmarkLevel(1))
        .preserve_child_bookmarks(true)
        .preserve_page_labels(true)
        .preserve_named_destinations(true)
        .build();

    let parts = doc.split(opts)?;

    for (i, part) in parts.iter().enumerate() {
        let path = format!("output/chapter_{:02}.pdf", i + 1);
        part.save(&path)?;
        println!(
            "Chapter {}: {} pages, {} bookmarks",
            i + 1,
            part.page_count(),
            part.bookmark_count()
        );
    }

    Ok(())
}

Run cargo add pdfluent@1.0.0-beta.5 to get started.
Extract any page range by specifying start and end page numbers. Ranges can overlap and multiple ranges can be extracted in a single pass. Output documents are independent, fully valid PDFs.
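The range semantics described above can be sketched without the SDK. This is a minimal, std-only illustration that assumes 1-based inclusive ranges; `PageRange` and `resolve_ranges` are hypothetical names and not part of the pdfluent API.

```rust
/// Hypothetical 1-based, inclusive page range (not a pdfluent type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct PageRange {
    start: usize,
    end: usize,
}

/// Validate each range against the page count and return, per range, the
/// zero-based page indices that would be copied into its own output.
/// Ranges are resolved independently, so overlapping ranges are allowed.
fn resolve_ranges(page_count: usize, ranges: &[PageRange]) -> Result<Vec<Vec<usize>>, String> {
    ranges
        .iter()
        .map(|r| -> Result<Vec<usize>, String> {
            if r.start == 0 || r.start > r.end || r.end > page_count {
                return Err(format!(
                    "invalid range {}-{} for a {}-page document",
                    r.start, r.end, page_count
                ));
            }
            Ok((r.start - 1..r.end).collect::<Vec<usize>>())
        })
        .collect()
}

fn main() {
    // Two overlapping ranges resolved in one pass over the request list.
    let ranges = [
        PageRange { start: 1, end: 3 },
        PageRange { start: 3, end: 5 },
    ];
    let outputs = resolve_ranges(10, &ranges).expect("ranges are valid");
    for (r, pages) in ranges.iter().zip(&outputs) {
        println!("pages {}-{} -> indices {:?}", r.start, r.end, pages);
    }
}
```

Resolving every range to its own index list is what lets page 3 appear in both outputs here without either output depending on the other.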
Split at level-1 or level-2 bookmark boundaries. Each output document starts at the bookmarked page and contains all content up to the next bookmark at the same level. Child bookmarks are preserved within each output.
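The boundary rule above can be illustrated on a flattened outline. The sketch below is std-only and assumes the outline is listed in document order; `Bookmark`, `Segment`, and `split_at_level` are hypothetical names, not pdfluent types, and pages before the first matching bookmark are simply left unassigned.

```rust
/// Hypothetical flattened outline entry (not a pdfluent type).
struct Bookmark {
    level: u8,
    page: usize, // zero-based page index the bookmark points to
    title: &'static str,
}

/// One output segment: a half-open page range plus the titles of the
/// deeper bookmarks that fall inside it.
#[derive(Debug, PartialEq)]
struct Segment {
    title: &'static str,
    pages: std::ops::Range<usize>,
    children: Vec<&'static str>,
}

/// Start a new segment at every bookmark of `level`; each segment runs up to
/// the next bookmark at the same level, or to the end of the document.
fn split_at_level(outline: &[Bookmark], level: u8, page_count: usize) -> Vec<Segment> {
    let mut segments: Vec<Segment> = Vec::new();
    for b in outline {
        if b.level == level {
            // Close the previous segment where this one begins.
            if let Some(prev) = segments.last_mut() {
                prev.pages.end = b.page;
            }
            segments.push(Segment {
                title: b.title,
                pages: b.page..page_count,
                children: Vec::new(),
            });
        } else if b.level > level {
            // Deeper bookmarks stay with the segment they appear in.
            if let Some(current) = segments.last_mut() {
                current.children.push(b.title);
            }
        }
    }
    segments
}

fn main() {
    let outline = [
        Bookmark { level: 1, page: 0, title: "Overview" },
        Bookmark { level: 2, page: 2, title: "Highlights" },
        Bookmark { level: 1, page: 5, title: "Financials" },
        Bookmark { level: 2, page: 6, title: "Income" },
        Bookmark { level: 2, page: 9, title: "Balance sheet" },
    ];
    for seg in split_at_level(&outline, 1, 12) {
        println!("{}: pages {:?}, children {:?}", seg.title, seg.pages, seg.children);
    }
}
```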
Split on detected content patterns, such as a page whose text layer matches a regex or a page that contains an image with a specific hash. Useful for splitting batched invoice PDFs where each invoice starts with a known header.
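For the invoice case, the idea reduces to finding the pages that start a new document and cutting just before each one. The following std-only sketch takes the matcher as a closure, so a plain `starts_with` check stands in for a regex; `split_on_pattern` is a hypothetical helper, not a pdfluent function, and the page texts are made-up sample data.

```rust
/// Return the half-open page ranges of each document in a batch, starting a
/// new range at every page for which `is_start` returns true. Pages before
/// the first match form their own leading range.
fn split_on_pattern<F>(page_texts: &[&str], is_start: F) -> Vec<std::ops::Range<usize>>
where
    F: Fn(&str) -> bool,
{
    // Collect the index of every page that begins a new document.
    let mut starts = Vec::new();
    for (i, &text) in page_texts.iter().enumerate() {
        if i == 0 || is_start(text) {
            starts.push(i);
        }
    }

    // Each range ends where the next one starts, or at the last page.
    let mut ranges = Vec::new();
    for (n, &start) in starts.iter().enumerate() {
        let end = starts.get(n + 1).copied().unwrap_or(page_texts.len());
        ranges.push(start..end);
    }
    ranges
}

fn main() {
    // Extracted text per page of a batched scan (illustrative data).
    let pages = [
        "INVOICE #1001\nACME Corp",
        "line items, continued",
        "INVOICE #1002\nGlobex",
        "INVOICE #1003\nInitech",
        "totals and signature",
    ];
    let ranges = split_on_pattern(&pages, |text| text.starts_with("INVOICE #"));
    println!("{:?}", ranges);
}
```

Passing the matcher as a closure keeps the cutting logic independent of how a page is recognised, so a regex match or an image-hash lookup could be substituted without changing the range computation.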
Page labels (Roman numerals, alphabetic, custom prefixes) are recalculated for each split output so that internal label sequences remain correct. A document split from pages 5-10 of an original with Roman-numeral front matter gets its own consistent label range.
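One way to picture the recalculation is as rebasing label ranges onto the new page indices. The sketch below is std-only and makes an assumption the text does not confirm: that each page keeps the label it had in the original, so a segment that starts mid-way through the front matter begins at "iii" rather than "i". `LabelRange`, `rebase_labels`, and `label_for` are hypothetical names, not pdfluent API.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Style {
    Decimal,
    LowerRoman,
}

/// A label range in the spirit of a PDF page-label entry: it applies from
/// `first_page` (zero-based) until the next range begins.
#[derive(Debug, Clone, Copy, PartialEq)]
struct LabelRange {
    first_page: usize,
    style: Style,
    start_number: u32,
}

/// Rebase label ranges (sorted by `first_page`) onto a segment covering the
/// original pages `seg_start..seg_end`.
fn rebase_labels(ranges: &[LabelRange], seg_start: usize, seg_end: usize) -> Vec<LabelRange> {
    let mut out = Vec::new();
    for (i, r) in ranges.iter().enumerate() {
        let range_end = ranges.get(i + 1).map_or(usize::MAX, |next| next.first_page);
        let from = r.first_page.max(seg_start);
        let to = range_end.min(seg_end);
        if from >= to {
            continue; // this range does not intersect the segment
        }
        out.push(LabelRange {
            first_page: from - seg_start,
            style: r.style,
            // Skip the numbers used by pages that were cut off before the segment.
            start_number: r.start_number + (from - r.first_page) as u32,
        });
    }
    out
}

fn to_lower_roman(mut n: u32) -> String {
    const TABLE: [(u32, &str); 13] = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"), (90, "xc"),
        (50, "l"), (40, "xl"), (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ];
    let mut s = String::new();
    for &(value, glyph) in TABLE.iter() {
        while n >= value {
            s.push_str(glyph);
            n -= value;
        }
    }
    s
}

/// Label of `page` under the given ranges, or None if no range covers it.
fn label_for(ranges: &[LabelRange], page: usize) -> Option<String> {
    let r = ranges.iter().rev().find(|r| r.first_page <= page)?;
    let n = r.start_number + (page - r.first_page) as u32;
    Some(match r.style {
        Style::Decimal => n.to_string(),
        Style::LowerRoman => to_lower_roman(n),
    })
}

fn main() {
    // Original: four Roman-numeral front-matter pages, then decimal body pages.
    let original = [
        LabelRange { first_page: 0, style: Style::LowerRoman, start_number: 1 },
        LabelRange { first_page: 4, style: Style::Decimal, start_number: 1 },
    ];
    // Segment covering original pages 2..7 (zero-based, end exclusive).
    let rebased = rebase_labels(&original, 2, 7);
    for page in 0..5 {
        println!("segment page {} -> {:?}", page, label_for(&rebased, page));
    }
}
```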
Named destinations that land within a split segment are preserved in that segment. Cross-segment named destination links are removed cleanly rather than pointing to missing pages.
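The keep-or-drop rule for destinations can be sketched as a filter plus a rebase. This std-only example models destinations as a name-to-page map and links as destination names; `rebase_destinations` and `prune_links` are hypothetical helpers, not pdfluent functions.

```rust
use std::collections::BTreeMap;

/// Keep the destinations whose target page lies inside `seg` and rebase them
/// to the segment's own zero-based page numbering.
fn rebase_destinations(
    dests: &BTreeMap<String, usize>,
    seg: std::ops::Range<usize>,
) -> BTreeMap<String, usize> {
    dests
        .iter()
        .filter(|(_, page)| seg.contains(*page))
        .map(|(name, page)| (name.clone(), page - seg.start))
        .collect()
}

/// Keep only the links whose destination survived the split, so that no
/// link in the output points at a page that is no longer there.
fn prune_links<'a>(links: &[&'a str], kept: &BTreeMap<String, usize>) -> Vec<&'a str> {
    links
        .iter()
        .copied()
        .filter(|name| kept.contains_key(*name))
        .collect()
}

fn main() {
    let mut dests = BTreeMap::new();
    dests.insert("intro".to_string(), 0);
    dests.insert("chapter2".to_string(), 6);
    dests.insert("appendix".to_string(), 14);

    // Segment covering original pages 5..10 (zero-based, end exclusive).
    let kept = rebase_destinations(&dests, 5..10);
    let links = prune_links(&["intro", "chapter2"], &kept);
    println!("kept: {:?}, links: {:?}", kept, links);
}
```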
Process thousands of PDFs in parallel using Rayon. Each split operation is stateless. Typical throughput is 200-400 documents per second for simple page-range splits on a 4-core machine.
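Because each split is stateless, a batch can be fanned out across workers with no shared mutable state. The sketch below uses `std::thread::scope` as a dependency-free stand-in for Rayon (with Rayon the same shape would typically be a `par_iter().map(...)` over the job list). `Job`, `plan_parts`, and `run_batch` are hypothetical, and `plan_parts` only counts output parts where a real job would open and split a file.

```rust
use std::thread;

/// Hypothetical description of one split job (not a pdfluent type).
struct Job {
    name: &'static str,
    page_count: usize,
    pages_per_part: usize, // assumed to be non-zero
}

/// Stand-in for a stateless split: the number of parts the job would produce.
fn plan_parts(job: &Job) -> usize {
    (job.page_count + job.pages_per_part - 1) / job.pages_per_part
}

/// Run all jobs across up to `workers` scoped threads and return the results
/// in the same order as the input.
fn run_batch(jobs: &[Job], workers: usize) -> Vec<usize> {
    let workers = workers.max(1);
    let chunk_size = ((jobs.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        // Spawn one thread per chunk; no state is shared between chunks.
        let handles: Vec<_> = jobs
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().map(plan_parts).collect::<Vec<usize>>()))
            .collect();
        // Join in spawn order so results line up with the input order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker panicked"))
            .collect()
    })
}

fn main() {
    let jobs = [
        Job { name: "a.pdf", page_count: 10, pages_per_part: 3 },
        Job { name: "b.pdf", page_count: 4, pages_per_part: 2 },
        Job { name: "c.pdf", page_count: 1, pages_per_part: 5 },
    ];
    let parts = run_batch(&jobs, 4);
    for (job, n) in jobs.iter().zip(&parts) {
        println!("{} -> {} parts", job.name, n);
    }
}
```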