Migrate from Apache PDFBox to PDFluent

A step-by-step guide for replacing Apache PDFBox with PDFluent. Covers dependency setup, document loading, text extraction, form filling, and saving.

Install PDFluent with: cargo add pdfluent@1.0.0-beta.5

Migration steps

1. Replace the dependency

Remove PDFBox from pom.xml or build.gradle and add pdfluent to Cargo.toml.

Apache PDFBox (before)
<!-- pom.xml -->
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>3.0.1</version>
</dependency>
PDFluent (after)
# Cargo.toml
[dependencies]
pdfluent = "1.0.0-beta.5"

2. Open a document

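PDFBox 3.x loads documents with Loader.loadPDF(), which accepts a File or a byte array (PDFBox 2.x used PDDocument.load(), which was removed in 3.0). PDFluent uses PdfDocument::open, which returns a Result.

Apache PDFBox (before)
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import java.io.File;

// PDFBox 3.x (the version declared in step 1) replaces PDDocument.load() with Loader.loadPDF()
PDDocument doc = Loader.loadPDF(new File("contract.pdf"));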
PDFluent (after)
use pdfluent::PdfDocument;

let doc = PdfDocument::open("contract.pdf")?;
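
Because the open call returns a Result, the calling function has to return a Result itself (or handle the error on the spot), where the Java version would declare a checked IOException. The following is a minimal sketch of that pattern using only the standard library: std::fs::read stands in for PdfDocument::open, and file_size is a hypothetical helper, not part of PDFluent.

```rust
use std::error::Error;
use std::fs;
use std::path::Path;

/// Hypothetical helper: reads a file and reports its size in bytes.
/// `fs::read` stands in for `PdfDocument::open` so the sketch runs
/// without PDFluent; the error-handling shape is the point here.
fn file_size(path: &Path) -> Result<usize, Box<dyn Error>> {
    // `?` returns early with the error instead of throwing an exception.
    let bytes = fs::read(path)?;
    Ok(bytes.len())
}
```

Any function that calls PdfDocument::open with `?` needs a compatible Result return type in the same way; Box<dyn Error> is one common choice when the concrete error type does not matter to the caller.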

3. Extract text

PDFBox requires a PDFTextStripper instance and, by default, returns the text of the whole document as a single string; limiting it to one page means setting the start and end page. PDFluent extracts text per page.

Apache PDFBox (before)
import org.apache.pdfbox.text.PDFTextStripper;

PDFTextStripper stripper = new PDFTextStripper();
stripper.setStartPage(1);
stripper.setEndPage(1);
String text = stripper.getText(doc);
PDFluent (after)
// First page (0-indexed, so PDFTextStripper's page 1 is page 0 here)
let text = doc.page(0)?.text()?;

// All pages
for i in 0..doc.page_count() {
    let text = doc.page(i)?.text()?;
    println!("{}", text);
}
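
If existing code relies on PDFTextStripper's single whole-document string, the per-page results can be joined back together. The sketch below shows one way to do that, assuming a newline separator is acceptable. It is written against a closure rather than the PDFluent types so that it runs on its own; collect_text is a hypothetical helper, and the closure would wrap the per-page call from the loop above.

```rust
/// Hypothetical helper: concatenates per-page text into one string,
/// separated by newlines, stopping at the first page that fails.
/// `page_text` stands in for the per-page extraction call.
fn collect_text<F, E>(page_count: usize, mut page_text: F) -> Result<String, E>
where
    F: FnMut(usize) -> Result<String, E>,
{
    let mut out = String::new();
    for i in 0..page_count {
        if i > 0 {
            out.push('\n'); // separator between pages, not after the last one
        }
        // `?` propagates the first extraction error to the caller.
        out.push_str(&page_text(i)?);
    }
    Ok(out)
}
```

Note that PDFTextStripper's own page and line separators may differ from a plain newline, so output compared byte-for-byte against the old Java code may not match.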

4. Fill AcroForm fields

PDFBox accesses fields through PDDocumentCatalog and PDAcroForm. PDFluent uses a direct acroform() handle.

Apache PDFBox (before)
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;

PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
acroForm.getField("first_name").setValue("Jane");
acroForm.getField("last_name").setValue("Smith");
acroForm.flatten();
PDFluent (after)
let mut form = doc.acroform()?;
form.set_field("first_name", "Jane")?;
form.set_field("last_name", "Smith")?;
form.flatten()?;

5. Save and close

PDFBox requires an explicit close() call (or a try-with-resources block) to release file handles. PDFluent releases the document when it goes out of scope; call save() before that point to write your changes.

Apache PDFBox (before)
doc.save("output.pdf");
doc.close(); // must call close() to release file handles
PDFluent (after)
doc.save("output.pdf")?;
// doc drops automatically at end of scope
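
To make the scope-based cleanup concrete, here is a small standard-library sketch of Rust's Drop mechanism. TrackedDoc is a hypothetical stand-in for a document type and says nothing about how PDFluent implements cleanup internally; it only shows when a drop handler runs relative to the enclosing scope.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a document that owns a resource.
struct TrackedDoc {
    released: Rc<Cell<bool>>,
}

impl Drop for TrackedDoc {
    fn drop(&mut self) {
        // Runs automatically when the value goes out of scope.
        self.released.set(true);
    }
}

/// Returns (released while still in scope, released after the scope ends).
fn drop_demo() -> (bool, bool) {
    let released = Rc::new(Cell::new(false));
    let during;
    {
        let _doc = TrackedDoc { released: Rc::clone(&released) };
        during = released.get(); // still alive here, so not yet released
    } // `_doc` is dropped at this closing brace
    (during, released.get())
}
```

The inner block plays the role of the function body that owns the document: nothing equivalent to close() is called, yet the cleanup has run by the time the block ends.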

Things to watch out for

  • PDFBox page numbers are 1-indexed in PDFTextStripper but 0-indexed in PDPageTree. PDFluent always uses 0-indexed page numbers.
  • PDFBox does not support XFA forms. If your PDFs use XFA, PDFluent handles them natively.
  • A PDFBox PDDocument must be closed explicitly or resource leaks occur. PDFluent documents are released automatically through Rust ownership.
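
The indexing difference is an easy source of off-by-one bugs when page numbers come from existing configuration or user input written for PDFTextStripper. One way to handle it is to convert at the boundary, as in this sketch; to_zero_based is a hypothetical helper, not a PDFluent function.

```rust
/// Hypothetical helper: converts a 1-based page number (as used by
/// PDFTextStripper's setStartPage/setEndPage) to a 0-based page index.
/// Returns None for 0, which is not a valid 1-based page number.
fn to_zero_based(page_number: usize) -> Option<usize> {
    page_number.checked_sub(1)
}
```

Returning an Option forces the caller to decide what an invalid page number of 0 should mean, rather than letting the subtraction underflow.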

Frequently asked questions