
PdfBox to unit test pdf files

If your application generates a PDF file, there is an easy way to unit test its content using PDFBox.

I tend to prefer iText for generating PDFs, but PDFBox is easy enough to use for verifying documents.

First, you need to import pdfBox in your pom.xml:

<dependency>
   <groupId>org.pdfbox</groupId>
   <artifactId>com.springsource.org.pdfbox</artifactId>
   <version>0.7.3</version>
   <scope>test</scope>
</dependency>

Here is a method to extract the whole text from a pdf:

private static String extractPdfText(byte[] pdfData) throws IOException {
   PDDocument pdfDocument = PDDocument.load(new ByteArrayInputStream(pdfData));
   try {
      return new PDFTextStripper().getText(pdfDocument);
   } finally {
      pdfDocument.close();
   }
}

This is useful to verify that the PDF you’ve generated contains a given piece of text (I use FEST Assert for assertions):

assertThat(extractPdfText(pdfData)).contains("a text").contains("another text");
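
One thing to watch for: extracted text follows the page layout, so a phrase that wraps onto a new line in the PDF may come back with a line break in the middle and fail a plain contains check. Here is a minimal sketch of a helper that collapses whitespace before asserting. The class name PdfTextNormalizer, the method normalize and the sample string are my own hypothetical names for illustration, not part of PDFBox, and I am assuming whitespace differences are the only mismatch:

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: collapses whitespace in extracted text.
final class PdfTextNormalizer {

    // Any run of whitespace, including line breaks and non-breaking spaces.
    private static final Pattern WHITESPACE = Pattern.compile("[\\s\\u00A0]+");

    private PdfTextNormalizer() {
    }

    // Replaces every whitespace run with a single space and trims both ends.
    static String normalize(String extractedText) {
        return WHITESPACE.matcher(extractedText).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // Made-up sample standing in for the output of a text extraction.
        String extracted = "Invoice  number:\r\n12345\n\nTotal\u00A0due";
        System.out.println(normalize(extracted)); // prints "Invoice number: 12345 Total due"
    }
}
```

The assertion then becomes assertThat(PdfTextNormalizer.normalize(extractPdfText(pdfData))).contains("a text"). The trade-off is that the test no longer notices layout changes, which is usually what I want when I only care about the content.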

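Here is another piece of code (more obscure) to check that a PDF file is signed. It only looks for a signature field in the document's form; it does not validate the signature itself: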

private static boolean isSigned(byte[] pdfData) throws IOException {
  PDDocument pdfDocument = PDDocument.load(new ByteArrayInputStream(pdfData));
  try {
    COSDictionary trailer = pdfDocument.getDocument().getTrailer();
    COSDictionary root = (COSDictionary) trailer.getDictionaryObject(COSName.ROOT);
    COSDictionary acroForm = (COSDictionary) root.getDictionaryObject(COSName.getPDFName("AcroForm"));
    if (null != acroForm) {
      COSArray fields = (COSArray) acroForm.getDictionaryObject(COSName.getPDFName("Fields"));
      // A form without a Fields array has no signature field
      if (null != fields) {
        for (int i = 0; i < fields.size(); i++) {
          COSDictionary field = (COSDictionary) fields.getObject(i);
          // FT is the field type; "Sig" marks a signature field
          String type = field.getNameAsString("FT");
          if ("Sig".equals(type)) {
            return true;
          }
        }
      }
    }
  } finally {
    pdfDocument.close();
  }

  return false;
}