Text extraction. Done right.

PDF text extraction with a 97.5% pass rate. Preserve layout, reading order, and font metadata.

Code example

rust
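// Note: `ReadingOrder` is an assumed name for the reading-order enum;
// check the crate documentation for the exact item to import.
use pdfluent::{Sdk, extract::{ReadingOrder, TextBlock, TextOptions}};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let sdk = Sdk::init_with_license("license.json")?;
    let doc = sdk.open("contract.pdf")?;

    // Ask for spatial reading order plus font, coordinate, and table data.
    let opts = TextOptions::builder()
        .reading_order(ReadingOrder::Spatial)
        .include_font_info(true)
        .include_coordinates(true)
        .preserve_tables(true)
        .build();

    let result = doc.text(opts)?;

    for page in result.pages() {
        println!("\n=== Page {} ===", page.number());
        for block in page.blocks() {
            match block {
                // Paragraphs carry the font metadata requested above.
                TextBlock::Paragraph(p) => {
                    println!("  [{}pt {}] {}", p.font_size(), p.font_name(), p.text());
                }
                // Tables are printed one pipe-delimited row per line.
                TextBlock::Table(t) => {
                    for row in t.rows() {
                        println!("  | {} |", row.cells().join(" | "));
                    }
                }
                TextBlock::Heading(h) => {
                    println!("  ## [{}] {}", h.level(), h.text());
                }
                // Any other block kinds, if the enum has them, are skipped.
                _ => {}
            }
        }
    }

    Ok(())
}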

Run cargo add pdfluent@1.0.0-beta.5 to get started.

What it does

Full text extraction

Extract text with character-level accuracy. Preserve reading order, paragraph boundaries, and column layout for complex multi-column documents.

97.5% pass rate

Text extraction passes 97.5% of test cases from the PDF 1.7 specification corpus. Failures are reported with exact coordinates for debugging.

Layout preservation

Multi-column detection, table structure recognition, and list formatting. Extract text that makes sense in context, not just raw character streams.
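
To make the idea of multi-column detection concrete, here is a minimal sketch of one common heuristic: sort blocks by their left edge and start a new column wherever a horizontal gap appears. The `Block` type, the `detect_columns` and `reading_order` functions, and the `min_gap` threshold are all hypothetical names for illustration. This is not PDFluent's API or its actual algorithm, and it assumes y grows downward from the top of the page.

```rust
/// Hypothetical text block: left edge, top edge (y grows downward), and width.
#[derive(Debug, Clone, PartialEq)]
struct Block {
    x: f32,
    y: f32,
    width: f32,
    text: String,
}

/// Groups blocks into columns. A new column starts when a block's left edge
/// lies more than `min_gap` to the right of everything seen so far.
fn detect_columns(blocks: &[Block], min_gap: f32) -> Vec<Vec<Block>> {
    let mut sorted = blocks.to_vec();
    sorted.sort_by(|a, b| a.x.total_cmp(&b.x));

    let mut columns: Vec<Vec<Block>> = Vec::new();
    let mut right_edge = f32::NEG_INFINITY;
    for block in sorted {
        if columns.is_empty() || block.x - right_edge > min_gap {
            columns.push(Vec::new());
        }
        right_edge = right_edge.max(block.x + block.width);
        columns.last_mut().unwrap().push(block);
    }

    // Within each column, read top to bottom.
    for column in &mut columns {
        column.sort_by(|a, b| a.y.total_cmp(&b.y));
    }
    columns
}

/// Flattens the columns into reading order: leftmost column first.
fn reading_order(blocks: &[Block], min_gap: f32) -> Vec<String> {
    detect_columns(blocks, min_gap)
        .into_iter()
        .flatten()
        .map(|block| block.text)
        .collect()
}
```

With a two-column page, calling `reading_order(&blocks, 20.0)` would return the whole left column before the right one, rather than interleaving lines by vertical position. Real documents need more than this (full-width headings, tables, rotated text), so treat it as a starting point only.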

Font and style metadata

Get font names, sizes, colors, and styles for each text run. Useful for document analysis, redaction detection, and content classification.
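
As an illustration of content classification from font metadata, the sketch below guesses which runs are headings by comparing each run's size with the dominant body size. The `TextRun` and `Role` types, both functions, and the `ratio` threshold are hypothetical and defined locally; they stand in for whatever the SDK returns and are not part of PDFluent.

```rust
use std::collections::BTreeMap;

/// Hypothetical text run: a piece of text and the font size it was set in.
#[derive(Debug, Clone)]
struct TextRun {
    text: String,
    font_size: f32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    Heading,
    Body,
}

/// Estimates the body font size as the size that covers the most characters.
/// Sizes are bucketed to one decimal place so they can be used as map keys.
fn body_font_size(runs: &[TextRun]) -> Option<f32> {
    let mut chars_per_size: BTreeMap<i32, usize> = BTreeMap::new();
    for run in runs {
        let key = (run.font_size * 10.0).round() as i32;
        *chars_per_size.entry(key).or_insert(0) += run.text.chars().count();
    }
    chars_per_size
        .into_iter()
        .max_by_key(|&(_, count)| count)
        .map(|(key, _)| key as f32 / 10.0)
}

/// Labels a run as a heading when its size is at least `ratio` times the
/// estimated body size; everything else is treated as body text.
fn classify(runs: &[TextRun], ratio: f32) -> Vec<Role> {
    let body = match body_font_size(runs) {
        Some(size) => size,
        None => return Vec::new(),
    };
    runs.iter()
        .map(|run| {
            if run.font_size >= body * ratio {
                Role::Heading
            } else {
                Role::Body
            }
        })
        .collect()
}
```

A ratio around 1.2 is a reasonable first guess, but it is only a heuristic: documents that mark headings with weight or color instead of size would need those attributes as well.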

Coordinates and bounding boxes

Every text block includes x, y, width, and height coordinates. Map extracted text back to the original PDF for highlighting or annotation.
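
Drawing a highlight over a rendered page usually means converting between coordinate systems. The sketch below assumes the bounding box is given in default PDF user space (points at 72 per inch, origin at the bottom-left, y pointing up) and that the target is a raster image with its origin at the top-left. Whether PDFluent reports coordinates in exactly this form is an assumption; `Rect` and `to_pixels` are hypothetical names.

```rust
/// Hypothetical bounding box: (x, y) is the lower-left corner in PDF points.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect {
    x: f32,
    y: f32,
    width: f32,
    height: f32,
}

/// Converts a box from PDF points (origin bottom-left, y up) to pixels in a
/// page image rendered at `dpi` (origin top-left, y down). In the result,
/// (x, y) is the top-left corner of the box.
fn to_pixels(bbox: Rect, page_height_pt: f32, dpi: f32) -> Rect {
    // One PDF point is 1/72 inch, so this is pixels per point.
    let scale = dpi / 72.0;
    Rect {
        x: bbox.x * scale,
        // Flip the vertical axis: measure from the top of the page instead.
        y: (page_height_pt - bbox.y - bbox.height) * scale,
        width: bbox.width * scale,
        height: bbox.height * scale,
    }
}
```

For example, on a US Letter page (792 points tall) rendered at 144 DPI, a box at x = 72, y = 700 that is 144 wide and 20 tall maps to a pixel rectangle at (144, 144) that is 288 wide and 40 tall. Pages with a rotation or a non-zero crop box origin need an extra transform that this sketch leaves out.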

Page structure tree

Get the complete page structure: paragraphs, headings, tables, lists, and inline elements. Structured output for indexing or transformation.
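
One way to use structured output for transformation is to render it as Markdown. The sketch below does this for a small, locally defined `Node` enum that loosely mirrors the block kinds listed above. The enum and `to_markdown` are hypothetical stand-ins, not PDFluent types, and the table rendering assumes the first row is the header.

```rust
/// Hypothetical structure node, defined here only for illustration.
enum Node {
    Heading { level: usize, text: String },
    Paragraph(String),
    List(Vec<String>),
    Table(Vec<Vec<String>>),
}

/// Renders nodes as Markdown, separating blocks with a blank line.
fn to_markdown(nodes: &[Node]) -> String {
    let mut out: Vec<String> = Vec::new();
    for node in nodes {
        let rendered = match node {
            Node::Heading { level, text } => {
                format!("{} {}", "#".repeat(*level), text)
            }
            Node::Paragraph(text) => text.clone(),
            Node::List(items) => items
                .iter()
                .map(|item| format!("- {}", item))
                .collect::<Vec<_>>()
                .join("\n"),
            Node::Table(rows) => {
                let mut lines = Vec::new();
                for (i, row) in rows.iter().enumerate() {
                    lines.push(format!("| {} |", row.join(" | ")));
                    if i == 0 {
                        // Separator row that marks the first row as the header.
                        lines.push(format!("|{}", " --- |".repeat(row.len())));
                    }
                }
                lines.join("\n")
            }
        };
        out.push(rendered);
    }
    out.join("\n\n")
}
```

The same traversal could just as easily emit JSON records for a search index; only the `match` arms would change.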

Deployment options

Server-side (Rust binary), Docker container, AWS Lambda, Azure Functions, WebAssembly (browser)

Frequently asked questions