Using PDFBox to unit test PDF files
If your application generates PDF files, PDFBox gives you an easy way to unit test their content.
I tend to prefer iText for generating PDFs, but PDFBox is easy enough to use for verifying documents.
First, add PDFBox as a test dependency in your pom.xml:
<dependency>
    <groupId>org.pdfbox</groupId>
    <artifactId>com.springsource.org.pdfbox</artifactId>
    <version>0.7.3</version>
    <scope>test</scope>
</dependency>
Here is a method that extracts all the text from a PDF:
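// Imports for PDFBox 0.7.3 (later Apache releases use org.apache.pdfbox.* packages).
import java.io.ByteArrayInputStream;
import java.io.IOException;
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.util.PDFTextStripper;

private static String extractPdfText(byte[] pdfData) throws IOException {
    PDDocument pdfDocument = PDDocument.load(new ByteArrayInputStream(pdfData));
    try {
        // Extract the text of all pages as a single string.
        return new PDFTextStripper().getText(pdfDocument);
    } finally {
        // Always release the document's resources, even if extraction fails.
        pdfDocument.close();
    }
}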
This is useful for verifying that a generated PDF contains a given piece of text (I use FEST Assert for the assertions):
assertThat(extractPdfText(pdfData)).contains("a text").contains("another text");
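One caveat when asserting on extracted text: the stripper typically emits a line break wherever the PDF layout wraps, so a phrase that spans two lines may not match literally. Here is a minimal sketch of a helper that collapses whitespace before asserting, assuming exact spacing does not matter for your checks. PdfTextNormalizer and normalize are hypothetical names, not part of PDFBox or FEST:

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox or FEST Assert.
final class PdfTextNormalizer {
    // Matches any run of whitespace, including the line breaks found in extracted text.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private PdfTextNormalizer() {
    }

    /** Collapses each whitespace run into a single space and trims both ends. */
    static String normalize(String extractedText) {
        return WHITESPACE.matcher(extractedText).replaceAll(" ").trim();
    }
}
```

With it, the assertion becomes assertThat(PdfTextNormalizer.normalize(extractPdfText(pdfData))).contains("a text"), at the cost of no longer detecting spacing differences.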
Here is another, more obscure piece of code that checks whether a PDF file is signed, by looking for a signature field in its form:
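// Additional imports (PDFBox 0.7.3 package names).
import org.pdfbox.cos.COSArray;
import org.pdfbox.cos.COSDictionary;
import org.pdfbox.cos.COSName;

private static boolean isSigned(byte[] pdfData) throws IOException {
    PDDocument pdfDocument = PDDocument.load(new ByteArrayInputStream(pdfData));
    try {
        // Walk from the trailer to the document catalog, then to the form dictionary.
        COSDictionary trailer = pdfDocument.getDocument().getTrailer();
        COSDictionary root = (COSDictionary) trailer.getDictionaryObject(COSName.ROOT);
        COSDictionary acroForm = (COSDictionary) root.getDictionaryObject(COSName.getPDFName("AcroForm"));
        if (null != acroForm) {
            COSArray fields = (COSArray) acroForm.getDictionaryObject(COSName.getPDFName("Fields"));
            // A form without a Fields entry cannot contain a signature field.
            if (null != fields) {
                for (int i = 0; i < fields.size(); i++) {
                    COSDictionary field = (COSDictionary) fields.getObject(i);
                    // "FT" is the field type; "Sig" marks a signature field.
                    String type = field.getNameAsString("FT");
                    if ("Sig".equals(type)) {
                        return true;
                    }
                }
            }
        }
    } finally {
        pdfDocument.close();
    }
    return false;
}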
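The traversal in isSigned follows the PDF object structure: the trailer, then Root (the document catalog), then AcroForm, then its Fields array, looking for an entry whose FT (field type) is Sig. As an illustration only, here is the same walk over plain java.util maps and lists. SignatureFieldLookup and hasSignatureField are hypothetical names, and the maps merely stand in for PDFBox's COSDictionary; this is not the PDFBox API:

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of the lookup; plain maps stand in for COSDictionary.
final class SignatureFieldLookup {
    private SignatureFieldLookup() {
    }

    /** Follows trailer -> Root -> AcroForm -> Fields and looks for FT == "Sig". */
    static boolean hasSignatureField(Map<String, Object> trailer) {
        Map<String, Object> root = asMap(trailer.get("Root"));
        Map<String, Object> acroForm = root == null ? null : asMap(root.get("AcroForm"));
        Object fields = acroForm == null ? null : acroForm.get("Fields");
        if (!(fields instanceof List)) {
            return false; // no form, or a form without a field list
        }
        for (Object field : (List<?>) fields) {
            Map<String, Object> dictionary = asMap(field);
            if (dictionary != null && "Sig".equals(dictionary.get("FT"))) {
                return true;
            }
        }
        return false;
    }

    // Returns the value as a map, or null when it is missing or of another type.
    @SuppressWarnings("unchecked")
    private static Map<String, Object> asMap(Object value) {
        return value instanceof Map ? (Map<String, Object>) value : null;
    }
}
```

Like isSigned, this sketch only inspects the top-level entries of Fields. Form fields can be nested under a Kids entry, so a signature field placed deeper in the hierarchy would need an extra recursive step.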