Solutions

Text extraction. Done right.

PDF text extraction with 97.5% pass rate. Preserve layout, reading order, and font metadata.

Code example

rust

use pdfluent::PdfDocument;

fn main() -> pdfluent::Result<()> {
    let doc = PdfDocument::open("document.pdf")?;

    // Plain text
    println!("{}", doc.extract_text()?);

    // Positioned text blocks: text + bounding box + page
    for block in doc.text_with_layout()? {
        println!("p{} {:?} {}", block.page, block.bbox, block.text);
    }
    Ok(())
}

Run cargo add pdfluent@1.0.0-beta.17 to get started.

What it does

Full text extraction

Extract text with character-level accuracy. Preserve reading order, paragraph boundaries, and column layout for complex multi-column documents.

97.5% pass rate

Text extraction passes 97.5% of test cases from the PDF 1.7 specification corpus. Failures are reported with exact coordinates for debugging.

Layout preservation

Multi-column detection, table structure recognition, and list formatting. Extract text that makes sense in context, not just raw character streams.

Font and style metadata

Get font names, sizes, colors, and styles for each text run. Useful for document analysis, redaction detection, and content classification.

Coordinates and bounding boxes

Every text block includes x, y, width, height coordinates. Map extracted text back to the original PDF for highlighting or annotation.

Page structure tree

Get the complete page structure: paragraphs, headings, tables, lists, and inline elements. Structured output for indexing or transformation.

Deployment options

Server-side (Rust binary)Docker containerAWS LambdaAzure FunctionsWebAssembly (browser)

Frequently asked questions

Related how-to guides

Extract text from a PDF in Rust Extract text by page in Rust Extract text with position and bounding box in Rust

View pricing