Top

Using Apache POI to Read Excel Files

Using Apache POI to Read Excel Files

Here is a recipe for programmatically reading data from an Excel spreadsheet using Apache POI. Note that in this example, I’m using the spreadsheet to store test data. However, this is NOT an endorsement to manage your test data this way. This is just one practical example of why you may want to programmatically read from a spreadsheet.

 

 

Recipe to Programmatically Read from Excel Spreadsheets

Ingredients

  • Apache POI
  • Excel spreadsheet

Instructions

  1. Open spreadsheet
  2. Switch to worksheet you want to read from
  3. Read header row to identify columns
  4. Get data from a specific cell
— Angie Jones 👩🏾‍🍳

 

Live Stream Video

 

 

To demonstrate this recipe in the context of test automation, we’ll use Excel to store test data, read the data from the spreadsheet using Apache POI, then supply the data to Selenium WebDriver for scenario execution.

 

Ingredients

 

1 Apache POI

 

We’ll need both the poi and poi-ooxml dependencies from mvnrepository. This is the library that enables us to read from the Excel spreadsheet.

 

2 Excel Spreadsheet

 

Inside of the Excel spreadsheet, I’ll make three worksheets: the first and last being blank, and the second one containing test data for this scenario. The purpose of having multiple worksheets is simply to demonstrate how to switch to different sheets within our code.

 

 

Instructions

 

1 Open Spreadsheet

 

In my  web scenario, I want to find a specific transaction.

To do so, I need to enter the account ID and the transaction ID. That data lives in our spreadsheet, so let’s open the file. To do so, we create a File object and pass it to WorkbookFactory.create(). The create() method throws an exception so we need to use a try/catch block.

Side note: Here I’m using the try with resources feature in Java. This is a nice, streamlined way to work with AutoCloseable resources. Learn more in my chapter on Exceptions in my free Java course.

 

2 Switch to Worksheet

 

Now that we’re in the worksheet, we need to go to the tab that we want to read from. For this scenario, we’re working with the user john, so we go to the “john” tab which is the second worksheet. To go to a specific worksheet, call the getSheet() method.

 

3 Read Header Row

 

I strongly recommend using the first row of your worksheet to name each of the columns. This will allow you to read the data more reliably, even if the order of the columns switch later on. Notice here on row 1, we have labeled each column. So as opposed to randomly trying to read cell A2 with the assumption that it’s an account ID, we can instead go to the Account column and then read the data.

 

To keep track of the columns, I’m going to read them in by calling getRow(0) to go to the first row [note: indices start at 0], and getting each cell in row 1 and storing the cell contents into a Map.  On line 7 of the code below, I get the data in the cell by calling getStringCellValue(). Note that this works because the data in the header row are all Strings. We’ll see how to handle other data types in a bit.

 

4 Get Data from Cell

 

Ok, remember, our goal here is to get the account ID and the transaction ID to pass into this web form. So, we need to identify which row we want to read from, and then get the data in the Account ID and Transaction ID columns.

Here I indicate that I’d like to read from the second row (index 1), and I specify the column I’d like to read from.

 

The getCellDataAsString() method here demonstrates how to read different data types from the spreadsheet. This is my least favorite thing about Apache POI because I have to account for all of the various data types, when I don’t really care about.

 

I’d like everything to be a String for my web automation purposes. So, this method uses the appropriate POI method to read the data but then turns it into a String.

Note: getNumericCellValue() on line 13, returns a double even if the number in the spreadsheet is an integer. This is to be able to handle any type of numeric value. But this poses a problem for us as the account ID is being returned as 12345.0. So, in the code, I cast it to an integer to remove the decimal place. Only do this if you’re sure the cell doesn’t really contain decimal places.

 

After entering the account ID and transaction ID into the web form using Selenium, we’re taken to this page which shows us the transaction details.

We can use the same approach to verify the data on the UI with what’s in the spreadsheet.

 

 

Bonus

I’d like to be able to reuse these POI methods for other tests without having to recreate this logic every time. So, I’m going to store the methods in a utility class.

SpreadsheetUtil.java

 

Test

 

See Code on Github

Angie Jones
6 Comments
  • Nevena
    Hi Angie,
    Quote: “The create() method throws an exception so we need to use a try/catch block.”This sentence made me curios – is there something like a general rule when we should include catching exceptions in our page object methods?
    May 2, 2020 at 9:10 am Reply
    • Angie Jones

      I don’t recommend catching exceptions in page object classes. let the test handle it.

      May 2, 2020 at 10:13 am Reply
      • Nevena
        Good to know, thanks!
        Any specific rule about when catching exceptions should be included in test methods?
        May 2, 2020 at 8:03 pm Reply
  • Hemant Varhekar
    Hi Angie,
    Thanks for article

    I am getting an error Switch stamens. It’s throwing switch statement is not supported in java version 12 and above. Not sure whether this is an issue with java version 13.02 which I have. But when I switched to old “Switch ” block it works

    May 3, 2020 at 8:08 am Reply
    • Angie Jones

      You have to select the option to enable previews (or use Java 14)

      May 6, 2020 at 6:16 pm Reply
  • Daniel
    how about the case of reading a new row for each test case that requires to read data from the excel
    July 2, 2020 at 6:27 am Reply

Post a Comment