Test Automation for PDF Files

For years, the automated verification of PDFs was incredibly challenging, if not impossible. Because of this, teams would automate their UI tests but would skip the part where they verify that their PDF artifacts were accurate. This then became the boring, mundane, error-prone task left for the testers to repeat release after release.

Since then, visual validation tools such as Applitools Eyes have hit the scene, making automated regression testing of an application’s look and feel possible. A common question I receive is “Does Applitools work for PDFs?” I knew the answer was yes, but I decided to give it a try myself to see exactly how it works.

There are actually two ways to invoke PDF test automation. The one described on the tool’s tutorial page shows how to execute the PDF validation via the command line. However, as an automation engineer, I wondered if it was possible to do this from my existing automation framework. Since the magic happens via a Java CLI command, I was pretty sure it would also work from my code, but I wanted to try it out just to be sure. It worked!

I’ll detail both approaches.

Command Line Interface (CLI)

Applitools provides an executable, ImageTester.jar, which verifies stand-alone images as well as PDF files.

It’s pretty straightforward to use. You put your PDF files in a directory and run a command from your terminal to have this tool verify all files within that directory. Alternatively, you can specify an individual PDF filename and it will verify only that file.

The -k argument is your API key, which you can obtain by opening a free account. The -f argument is the path to the directory or file that you want verified.

I moved the ImageTester.jar into a directory and also added another directory there called Invoice_PDFs where I stored this PDF file. I then ran the command and voila, the test was executed!
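
For reference, the invocation described above has the shape "java -jar ImageTester.jar -k <API key> -f <path>". The short Java sketch below only assembles and prints that command so the pieces are easy to see. The class name ImageTesterCommand and the YOUR_API_KEY placeholder are hypothetical, used here for illustration only.

```java
import java.util.List;

// Hypothetical helper: assembles the ImageTester command line described in the post.
class ImageTesterCommand {

    // jarPath, apiKey and target are supplied by the caller; nothing is executed here.
    static List<String> build(String jarPath, String apiKey, String target) {
        return List.of(
                "java", "-jar", jarPath,
                "-k", apiKey,   // Applitools API key
                "-f", target);  // a directory of PDFs, or a single PDF file
    }

    public static void main(String[] args) {
        // Prints the command as it would be typed in a terminal.
        List<String> command = build("ImageTester.jar", "YOUR_API_KEY", "Invoice_PDFs");
        System.out.println(String.join(" ", command));
        // prints: java -jar ImageTester.jar -k YOUR_API_KEY -f Invoice_PDFs
    }
}
```

Keeping the arguments as a list rather than one string means paths containing spaces survive intact if the list is later handed to a ProcessBuilder.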


The first time I ran this, a baseline was saved. Every time it was run after that, the PDF was compared against the baseline. If anything changed on the PDFs, we’d get an error message on the console and a link to review the differences in the Applitools dashboard.



The CLI approach is cool, but I got to thinking about how I would want to use this as an automation engineer. It would be in the midst of an automated scenario where I’ve taken action on the UI, am downloading the resulting PDF, and now want to verify it.

So, I wrote an automated test for Invoice Simple that uses the UI to create a new invoice, then downloads a PDF of that invoice and then uses the ImageTester to verify the PDF.

After writing all the UI code, I needed to add the following as well:

  1. Code to move the PDF file from my computer’s default download directory to the directory where I store the PDF files that I want verified.

  2. Code to execute the ImageTester.jar command. I wrote a utility method so that I could reuse it from any test.
    Then from my test, I call this method and assert on the result.

  3. There is a date on the PDF file indicating when the file was generated, and that date changes each day that this test runs. Fortunately, Applitools has a way to ignore certain regions of the PDF file. After my initial test ran and the baseline was captured, I was able to go to the dashboard and specify that the date area should be ignored.

    I really like how flexible this tool is. There’s a host of other arguments you can use as well.
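
Steps 1 and 2 above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the repository: the class and method names are hypothetical, and the failureMarker argument stands in for whatever text ImageTester prints on a mismatch, which you should confirm from a real run before relying on it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical utility class; names are illustrative only.
class PdfValidation {

    // Step 1: move the downloaded PDF into the directory that gets verified.
    static Path moveToVerificationDir(Path downloaded, Path verificationDir) throws IOException {
        Files.createDirectories(verificationDir);
        Path target = verificationDir.resolve(downloaded.getFileName());
        return Files.move(downloaded, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Runs a command, echoes its console output, and returns true when the
    // output does NOT contain the failure marker.
    static boolean runAndCheck(List<String> command, String failureMarker)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true) // merge stderr into stdout
                .start();
        // Read everything before waiting so a full output buffer cannot block the child.
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        process.waitFor();
        System.out.println(output);
        return !output.contains(failureMarker);
    }

    // Step 2: run ImageTester against one PDF (or a directory of PDFs).
    // ASSUMPTION: failureMarker is the text the tool prints when a difference is found.
    static boolean validatePdf(Path jar, String apiKey, Path pdf, String failureMarker)
            throws IOException, InterruptedException {
        List<String> command = List.of(
                "java", "-jar", jar.toString(),
                "-k", apiKey,
                "-f", pdf.toString());
        return runAndCheck(command, failureMarker);
    }
}
```

A test would then call moveToVerificationDir on the freshly downloaded file, pass the returned path to validatePdf, and assert that the result is true.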

This all worked like a charm and was much simpler than I anticipated. You can find all of my code for this automated test for PDF files on my GitHub.

Angie Jones
  • Jim Hazen

    You’re right, Angie, a PDF file has been one of the toughest things to “test” with automation (and even by human means). There have been tools (with a CLI) in the past that compare two files and give Pass/Fail error messages. But they are prone to false negatives/positives, as you point out, due to date/timestamps and other “sensitive data” that cause flakiness. Nice to see that Applitools Eyes can do a “reverse mask” of the region to remove it from the comparison. An old technique that has found new life once again. Also, there is the risk of potential differences at the pixel level. Fortunately, some fuzzing logic can be tuned to allow for useful partial matches within tolerance ranges. As we both know, using image comparison can be tricky, but with today’s technology (in comparison to 25 years ago) there can be uses for it that have benefit.

    January 27, 2019 at 1:17 pm
    • Angie Jones

      Yeah, fortunately Applitools doesn’t use pixel-to-pixel comparison, so those false negatives aren’t a prevalent issue. Image comparison has certainly come a long way. 🙂

      January 27, 2019 at 8:16 pm
  • Puneet Bisht

    Thanks, Angie, for the post. For sure, PDF automation was the most critical/challenging/ROI automation for our organization. In addition to verification of the PDF content, there were a few other areas in the ecosystem of PDF automation that need best practices to make the tool efficient (if using open source frameworks):
    1) How to generate the new PDF efficiently (we use Karate; previously we used our own in-house Java API framework)
    2) for PDF Compare (free and serves our need)
    3) Reporting – we use Cucumber report and attach the pdf-Diff along with the credentials to replicate the issue
    4) Managing the baselines

    January 28, 2019 at 7:56 am
    • Angie Jones

      I just read another post today about using Rest-Assured for downloading files. Pretty cool

      I’m glad you found a PDF comparison tool that’s working for you. I am wary of the ones that do pixel-to-pixel comparison, as it can lead to false negatives. Fortunately, Applitools doesn’t use pixel-to-pixel. They have a more mature AI-driven algorithm that is more reliable and also allows for more flexibility, such as ignoring areas or colors, or even comparing the layout of the form and not necessarily the data, for when you have PDFs that may vary in content but should be the same in structure.

      January 28, 2019 at 11:54 am
    • Maik Toepfer

      Thanks for pointing me to pdfcompare. It follows a rather traditional, non-cloud-based approach, but it looks like a very good 80% solution.

      December 27, 2019 at 2:09 pm
  • Andy

    Gave this a whirl, as why not…looks like a great little tool. Sadly the corporate firewall didn’t like the direct connection of the CLI. I think I will need to give it another go from a scripted solution so that I can enforce a connection through a proxy.

    February 19, 2019 at 6:42 am
  • Musaffir

    Thanks for the post, Angie. I work for a company which makes payroll software, so we produce a lot of PDFs from the app, like payslips, year-end reports, etc. So far we haven’t considered PDF automation, mainly due to all the challenges you have described. I explored Applitools Eyes and I found real value in it. We just have to make a budget and purchase an Applitools Eyes licence to get all these benefits 🙂

    April 11, 2019 at 8:02 pm
  • Hiromi

    This is a fantastic tool. Our company has an OCR and image processing product, and I was looking for a way to automate the validation. Thank you, Angie, for the post!

    July 26, 2019 at 7:15 am
  • Sahil Thareja

    Nicely presented! I have one question: we need to have a paid Applitools membership in order to perform this validation, am I right?

    May 24, 2020 at 1:30 am
    • Angie Jones

      There’s a free account as well

      May 24, 2020 at 8:59 am
  • Ramkumar Gour

    Hi Angie, thanks for this blog. This tool looks promising. One question: do we always need to set the baseline, which means we need to trigger the java command with args two times? Also, you said it does some validations the first time as well. What does it test when it doesn’t have anything to compare to?

    January 30, 2023 at 5:59 am
