Read all text content from a PDF document. PDFluent preserves reading order and handles multi-column layouts, right-to-left scripts, and CID fonts.

use pdfluent::prelude::*;

fn main() -> Result<()> {
    // Open the document once; text is then extracted one page at a time.
    let doc = PdfDocument::open("document.pdf")?;
    for page in doc.pages() {
        let text = page.text()?;
        println!("--- Page {} ---", page.number());
        println!("{}", text);
    }
    Ok(())
}

Load the PDF. Text extraction works page by page, so memory usage stays low even for large documents.

use pdfluent::prelude::*;

let doc = PdfDocument::open("contract.pdf")?;

Access a page by its 1-based index and call text(). The method returns a plain String with words separated by spaces and paragraphs separated by newlines.

let page = doc.page(1)?;
let text = page.text()?;
println!("{}", text);

Iterate over doc.pages() to process every page. Each call to text() is independent.
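
// Join the text of all pages, separated by a blank line.
// unwrap_or_default() turns a failed page into an empty string instead of an error.
let full_text: String = doc
    .pages()
    .map(|p| p.text().unwrap_or_default())
    .collect::<Vec<_>>()
    .join("\n\n");

Use doc.text_with_layout() to get a Vec<TextBlock> at the document level. Each block carries the text, the page number, and the bounding box in PDF points (bottom-left origin).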

for block in doc.text_with_layout()? {
    println!(
        "[page {}] [{:.1},{:.1}] {:?}",
        block.page, block.x, block.y, block.text,
    );
}

No JVM, no runtime, no DLL dependencies. Ships as a single native binary or WASM module.
Rust's ownership model and bounds checking prevent buffer overflows and use-after-free in safe code, the usual sources of segfaults in PDF parsers.
Same code runs server-side, in Docker, on AWS Lambda, on Cloudflare Workers, or in the browser via WASM.
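
The layout blocks returned by doc.text_with_layout() report positions in PDF points with a bottom-left origin, while most image and screen coordinate systems put the origin at the top left. The sketch below shows one way to convert between the two using only the standard library. The function name to_top_left_pixels, the 792-point page height (US Letter), and the 144 DPI are illustrative assumptions and are not part of PDFluent's API.

```rust
/// Convert a position given in PDF points (origin at the bottom-left of the page)
/// to pixel coordinates with the origin at the top-left.
/// One PDF point is 1/72 inch, so the scale factor is dpi / 72.
fn to_top_left_pixels(x_pt: f64, y_pt: f64, page_height_pt: f64, dpi: f64) -> (f64, f64) {
    let scale = dpi / 72.0;
    let x_px = x_pt * scale;
    // Flip the vertical axis: measure from the top edge instead of the bottom edge.
    let y_px = (page_height_pt - y_pt) * scale;
    (x_px, y_px)
}

fn main() {
    // Hypothetical values: a position 72 pt from the left and 720 pt from the
    // bottom of a US Letter page (612 x 792 pt), rendered at 144 DPI.
    let (x, y) = to_top_left_pixels(72.0, 720.0, 792.0, 144.0);
    println!("x = {:.1} px, y = {:.1} px", x, y);
}
```

In practice the page height would come from the document itself rather than a constant, since pages in one PDF can differ in size.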