
Split PDFs by page, bookmark, or pattern.

Extract pages, split at bookmarks, or divide by content pattern. Batch split thousands of documents with bookmark and label preservation.

Code example

rust
use pdfluent::{Sdk, split::{SplitOptions, SplitBy}};

fn main() -> pdfluent::Result<()> {
    let sdk = Sdk::new()?;
    let doc = sdk.open("annual_report.pdf")?;

    // Split at every level-1 bookmark boundary
    let opts = SplitOptions::builder()
        .by(SplitBy::BookmarkLevel(1))
        .preserve_child_bookmarks(true)
        .preserve_page_labels(true)
        .preserve_named_destinations(true)
        .build();

    let parts = doc.split(opts)?;

    for (i, part) in parts.iter().enumerate() {
        let path = format!("output/chapter_{:02}.pdf", i + 1);
        part.save(&path)?;
        println!(
            "Chapter {}: {} pages, {} bookmarks",
            i + 1,
            part.page_count(),
            part.bookmark_count()
        );
    }

    Ok(())
}

Run cargo add pdfluent@1.0.0-beta.5 to get started.

What it does

Page range extraction

Extract any page range by specifying start and end page numbers. Ranges can overlap and multiple ranges can be extracted in a single pass. Output documents are independent, fully valid PDFs.
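
As a rough illustration of the range semantics described above, the sketch below resolves 1-based, inclusive page ranges into per-output page lists and rejects ranges that fall outside the document. The function `resolve_ranges` is a hypothetical helper written against the standard library only; it is not part of the pdfluent API.

```rust
/// Hypothetical helper (not a pdfluent call): turns 1-based, inclusive
/// page ranges into one page list per output document.
/// Overlapping ranges are allowed; each range is resolved independently.
fn resolve_ranges(
    page_count: usize,
    ranges: &[(usize, usize)],
) -> Result<Vec<Vec<usize>>, String> {
    ranges
        .iter()
        .map(|&(start, end)| {
            // Reject page 0, reversed ranges, and ranges past the last page.
            if start == 0 || start > end || end > page_count {
                return Err(format!(
                    "invalid range {start}-{end} for a {page_count}-page document"
                ));
            }
            Ok((start..=end).collect::<Vec<usize>>())
        })
        .collect()
}
```

Under these assumptions, `resolve_ranges(10, &[(1, 3), (2, 4)])` yields two page lists that share pages 2 and 3, which mirrors the statement that ranges can overlap.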

Bookmark-level splitting

Split at level-1 or level-2 bookmark boundaries. Each output document starts at the bookmarked page and contains all content up to the next bookmark at the same level. Child bookmarks are preserved within each output.
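
The boundary rule above can be sketched with plain data structures. The example assumes each bookmark is reduced to a level and a 1-based target page, and that pages before the first bookmark at the chosen level are left out. `Bookmark` and `segments_at_level` are illustrative names, not pdfluent types.

```rust
/// Illustrative stand-in for an outline entry (not a pdfluent type).
#[derive(Debug, Clone, PartialEq)]
struct Bookmark {
    level: u8,
    page: usize, // 1-based page the bookmark points to
}

/// Returns inclusive (start, end) page segments, one per bookmark at `level`.
/// Each segment runs up to the page before the next bookmark at that level;
/// the last segment runs to the end of the document.
fn segments_at_level(
    bookmarks: &[Bookmark],
    level: u8,
    page_count: usize,
) -> Vec<(usize, usize)> {
    let mut starts: Vec<usize> = bookmarks
        .iter()
        .filter(|b| b.level == level)
        .map(|b| b.page)
        .collect();
    starts.sort_unstable();
    starts.dedup(); // two bookmarks on the same page open one segment

    starts
        .iter()
        .enumerate()
        .map(|(i, &start)| {
            let end = starts.get(i + 1).map_or(page_count, |&next| next - 1);
            (start, end)
        })
        .collect()
}
```

Bookmarks at deeper levels do not open new segments in this sketch; they simply fall inside whichever segment contains their target page.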

Pattern-based splitting

Split on detected content patterns, such as a page whose text layer matches a regex or a page that contains an image with a specific hash. Useful for splitting batched invoice PDFs where each invoice starts with a known header.
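
A minimal sketch of the invoice case, assuming the text of each page has already been extracted into a string. To stay within the standard library it uses a plain substring check as a stand-in for a regex match. `split_on_header` is a hypothetical helper, not a pdfluent function.

```rust
/// Hypothetical helper: groups 1-based page numbers so that every page
/// containing `header` starts a new group. Pages before the first header
/// form their own leading group.
fn split_on_header(page_texts: &[&str], header: &str) -> Vec<Vec<usize>> {
    let mut groups: Vec<Vec<usize>> = Vec::new();
    for (index, text) in page_texts.iter().enumerate() {
        let page_number = index + 1;
        if text.contains(header) || groups.is_empty() {
            // A header page (or the very first page) opens a new group.
            groups.push(vec![page_number]);
        } else {
            // Any other page continues the current group.
            groups
                .last_mut()
                .expect("groups is non-empty in this branch")
                .push(page_number);
        }
    }
    groups
}
```

Swapping the substring check for a compiled regex, or for an image-hash comparison, would change only the condition that decides whether a page opens a new group.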

Page label preservation

Page labels (Roman numerals, alphabetic, custom prefixes) are recalculated for each split output so that internal label sequences remain correct. A document split from pages 5-10 of an original with Roman-numeral front matter gets its own consistent label range.
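
One way to picture the recalculation is as a rebasing of label ranges. A PDF stores page labels as ranges, each with a first page index, a numbering style, and a starting number. The sketch below assumes the goal is for every page to keep the label it had in the original. `LabelRange`, `Style`, and `rebase_labels` are illustrative names, not pdfluent types, and label prefixes are omitted for brevity.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Style {
    Decimal,
    LowerRoman,
}

/// Illustrative label range: applies from `first_index` (0-based page index)
/// until the next range begins, counting up from `start`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct LabelRange {
    first_index: usize,
    style: Style,
    start: u32,
}

/// Rebases `ranges` (sorted by `first_index`) onto the half-open page
/// window `from..to`, so that page `from` becomes index 0 in the output.
fn rebase_labels(ranges: &[LabelRange], from: usize, to: usize) -> Vec<LabelRange> {
    let mut out = Vec::new();
    for (i, range) in ranges.iter().enumerate() {
        // A range ends where the next one begins, or never if it is the last.
        let range_end = ranges.get(i + 1).map_or(usize::MAX, |next| next.first_index);
        let lo = range.first_index.max(from);
        let hi = range_end.min(to);
        if lo < hi {
            out.push(LabelRange {
                first_index: lo - from,
                style: range.style,
                // Skip the numbers consumed by pages that were cut away.
                start: range.start + (lo - range.first_index) as u32,
            });
        }
    }
    out
}
```

For an original with six Roman-numeral front-matter pages followed by decimal numbering, extracting pages 5-10 (indices 4..10) gives a Roman range starting at 5 and a decimal range starting at 1, so the output pages read v, vi, 1, 2, 3, 4.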

Named destination handling

Named destinations that point to pages within a split segment are preserved in that segment. Links to named destinations in other segments are removed cleanly rather than left pointing to missing pages.
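
The behaviour described above can be modelled as a filter plus a page renumbering. This sketch assumes destinations are reduced to a name and a 1-based target page, and links to a source page and a destination name. Both helper functions are hypothetical and use only the standard library.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: keeps the destinations whose target page lies in the
/// inclusive segment `start..=end`, renumbered so the segment starts at page 1.
fn destinations_for_segment(
    dests: &BTreeMap<String, usize>,
    start: usize,
    end: usize,
) -> BTreeMap<String, usize> {
    dests
        .iter()
        .filter(|(_, page)| (start..=end).contains(*page))
        .map(|(name, page)| (name.clone(), *page - start + 1))
        .collect()
}

/// Hypothetical helper: drops links (source page, destination name) whose
/// destination did not survive in this segment.
fn retain_valid_links(links: &mut Vec<(usize, String)>, kept: &BTreeMap<String, usize>) {
    links.retain(|(_, name)| kept.contains_key(name));
}
```

A link to a destination in another segment is dropped by `retain_valid_links`, which corresponds to removing cross-segment links instead of leaving them dangling.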

Batch splitting

Process thousands of PDFs in parallel using Rayon. Each split operation is stateless, so documents can be processed independently across threads. Typical throughput is 200-400 documents per second for simple page-range splits on a 4-core machine.
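
Because each job is independent, the batch pattern amounts to a parallel map over the inputs. The sketch below shows that shape with scoped threads from the standard library as a stand-in for Rayon, so it carries no external dependencies. `process_in_parallel` is a hypothetical helper, and the `job` closure is a placeholder for whatever split call is applied to each document.

```rust
use std::thread;

/// Hypothetical helper: applies `job` to every input on up to `workers`
/// threads and returns the results in input order.
fn process_in_parallel<T, R, F>(inputs: &[T], workers: usize, job: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    if inputs.is_empty() {
        return Vec::new();
    }
    // Give each worker one contiguous chunk of the input slice.
    let chunk_size = inputs.len().div_ceil(workers.max(1));
    let job = &job;

    thread::scope(|scope| {
        // Spawn every worker before joining any of them.
        let handles: Vec<_> = inputs
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().map(job).collect::<Vec<R>>()))
            .collect();

        // Joining in spawn order keeps results aligned with the inputs.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

With Rayon, the same shape would typically be written as a `par_iter().map(...)` over the input paths, and work stealing would balance uneven document sizes better than the fixed chunks used here.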

Deployment options

Server-side (Linux/macOS/Windows)
AWS Lambda
Docker
Kubernetes
Cloudflare Workers (WASM)

Frequently asked questions