What coordinate system does PDFluent use?

PDF points with the origin at the bottom-left corner of the page. One point is 1/72 of an inch. The y axis increases upward, which is the opposite of most screen coordinate systems.
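As a quick illustration of the unit, the sketch below converts points to inches and to pixels at a given resolution. The function names are made up for this example and are not part of PDFluent.

```rust
/// One PDF point is 1/72 of an inch.
const POINTS_PER_INCH: f64 = 72.0;

/// Convert a length in PDF points to inches.
fn points_to_inches(pts: f64) -> f64 {
    pts / POINTS_PER_INCH
}

/// Convert a length in PDF points to pixels at the given DPI.
fn points_to_pixels(pts: f64, dpi: f64) -> f64 {
    pts / POINTS_PER_INCH * dpi
}
```

For example, a US Letter page is 612 points wide, which is 8.5 inches.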

Can I get line-level bounding boxes instead of word-level?

Yes. Use text_with_layout(), which returns TextLine values. Each line has a text field and a bounding box covering the entire line.

Does this work for rotated text?

Yes. PDFluent normalises the text matrix for each glyph before computing bounding boxes. Rotated text returns bounding boxes in the normalised coordinate space.

How do I search across all pages?

Iterate over doc.pages(), call extract_words() on each, and track the page index alongside each hit. There is no cross-page search built in, but building one with a simple loop is straightforward.
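A minimal sketch of such a loop is shown here. To keep it self-contained it does not call PDFluent: the Word and Hit structs are hypothetical stand-ins for whatever extract_words() returns, and the pages are passed in as plain vectors. The field names are assumptions.

```rust
/// Hypothetical stand-in for one extracted word (field names are assumptions).
#[derive(Debug, Clone, PartialEq)]
struct Word {
    text: String,
    x: f64,
    y: f64,
    width: f64,
    height: f64,
}

/// A search hit: the zero-based page index plus the matching word.
#[derive(Debug, PartialEq)]
struct Hit {
    page_index: usize,
    word: Word,
}

/// Case-insensitive substring search over the words of every page.
fn search_pages(pages: &[Vec<Word>], needle: &str) -> Vec<Hit> {
    let needle = needle.to_lowercase();
    let mut hits = Vec::new();
    for (page_index, words) in pages.iter().enumerate() {
        for word in words {
            if word.text.to_lowercase().contains(&needle) {
                hits.push(Hit { page_index, word: word.clone() });
            }
        }
    }
    hits
}
```

This matches within single words only. A phrase that spans several words would need the words on a page joined before matching.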

Extract text with bounding box positions in Rust

Get each word or character with its x, y, width, and height on the page. Useful for building search, redaction, or document analysis tools.

rust

use pdfluent::PdfDocument;

fn main() -> pdfluent::Result<()> {
    let doc = PdfDocument::open("file.pdf")?;
    for block in doc.text_with_layout()? {
        println!("p{} {:?} {}", block.page, block.bbox, block.text);
    }
    Ok(())
}

Install: cargo add pdfluent@1.0.0-beta.17

Step by step

Open the document

Load the PDF from a file path with PdfDocument::open. The call returns a Result, so propagate failures with ?.

rust

use pdfluent::prelude::*;

let doc = PdfDocument::open("document.pdf")?;

Call text_with_layout

text_with_layout() returns a Vec<TextBlock> covering the whole document. Each TextBlock carries the text, its 1-based page number, and bounding-box coordinates in PDF points (bottom-left origin).

rust

let blocks = doc.text_with_layout()?;
println!("{} text blocks", blocks.len());

Access per-block fields

Read block.page, block.x, block.y, block.width, block.height, block.text.

rust

for block in doc.text_with_layout()? {
    if block.page == 1 {
        println!("[{:.1},{:.1}] {:?}", block.x, block.y, block.text);
    }
}
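For redaction or analysis, a common next step is to pick out the blocks that fall inside a region of a page. The sketch below shows one way to do that with a rectangle overlap test. The Block struct is a hypothetical stand-in that mirrors the field names listed above rather than the real PDFluent type, and it assumes x and y mark the bottom-left corner of the box. Region and both functions are invented for this example.

```rust
/// Hypothetical stand-in for a layout block; field names follow the guide.
#[derive(Debug, Clone, PartialEq)]
struct Block {
    page: u32,
    x: f64,
    y: f64,
    width: f64,
    height: f64,
    text: String,
}

/// Axis-aligned rectangle in PDF points, bottom-left origin.
#[derive(Debug, Clone, Copy)]
struct Region {
    x: f64,
    y: f64,
    width: f64,
    height: f64,
}

/// True when the block's box and the region share some area.
fn overlaps(block: &Block, region: &Region) -> bool {
    block.x < region.x + region.width
        && region.x < block.x + block.width
        && block.y < region.y + region.height
        && region.y < block.y + block.height
}

/// Collect the blocks on `page` (1-based) that overlap `region`.
fn blocks_in_region<'a>(blocks: &'a [Block], page: u32, region: &Region) -> Vec<&'a Block> {
    blocks
        .iter()
        .filter(|b| b.page == page && overlaps(b, region))
        .collect()
}
```

Boxes that only touch along an edge do not count as overlapping, because the comparisons are strict.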

Notes and tips

Coordinates use the PDF coordinate system: origin at the bottom-left, y increases upward.
For screen rendering where y starts at the top, compute screen_y = page_height_pts - (block.y + block.height).
Word grouping is heuristic. Very close characters that share a text run are merged into one word entry.
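The y-flip in the notes above can be wrapped in a small helper. This is a sketch under assumptions: ScreenRect and to_screen_rect are names invented for the example, the input box is given by its bottom-left corner in PDF points, and scale is the number of pixels per point chosen by the caller.

```rust
/// Rectangle in screen space: origin at the top-left, y increases downward.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ScreenRect {
    left: f64,
    top: f64,
    width: f64,
    height: f64,
}

/// Convert a PDF-space box (bottom-left origin) to a screen-space box.
fn to_screen_rect(
    page_height_pts: f64,
    x: f64,
    y: f64,
    width: f64,
    height: f64,
    scale: f64,
) -> ScreenRect {
    ScreenRect {
        left: x * scale,
        // Flip the y axis: the top edge of the box in PDF space is y + height.
        top: (page_height_pts - (y + height)) * scale,
        width: width * scale,
        height: height * scale,
    }
}
```

Width and height are unaffected by the flip. Only the vertical position changes.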

Why PDFluent for this

Pure Rust

No JVM, no runtime, no DLL dependencies. Ships as a single native binary or WASM module.

Memory safe

Rust's ownership model prevents buffer overflows and use-after-free. No segfaults in PDF parsing.

Runs anywhere

Same code runs server-side, in Docker, on AWS Lambda, on Cloudflare Workers, or in the browser via WASM.

Frequently asked questions
