Power Query Challenge 6

Are you ready for Power Query Challenge 6?  In this challenge we'll look at splitting nested data sets into a nice table in First Normal Form.

Let's take a look at Power Query Challenge 6:

Where did the challenge come from?

The inspiration for this challenge came from last week's Self Service BI Boot Camp that we hosted in Vancouver, BC (you should have been there, it was awesome).  At the outset, we talked about how to identify if your data is in First Normal Form (the format that is ideal for a PivotTable), and I showed this data set:

Data table with nested data sets

Data table with multiple data points per cell

Notice how the invoice has multiple ItemID, multiple Quantity and multiple Price fields per cell?  That's not good at all.

What we Really Need - Data in First Normal Form

As I explained in the Boot Camp, it is essential that the data source be in First Normal Form in order for a PivotTable to consume it.  What does that mean?  It means that it needs to look like this:

Data in 1NF

Data shown in First Normal Form (1NF)

Notice that in this case the data is atomic - it only has one data point per cell.  In addition, there is now one complete record per row.  The InvoiceID shows on every row now, and each data point has been correctly split up and applied to them.

So what's the thrust of the Power Query Challenge 6?

Well, as it turns out this is a bit tricky.  There are a few issues at play here:

  • The data is separated by line feeds within the cells
  • There are a different number of line feeds in each row
  • At least the number of line feeds is consistent for each cell in the entire row though!

So the challenge is this: break the table apart so that the data is in First Normal Form like the second image.

You can download the data source file (and post your solutions) in our forum here.  I'll give you my version tomorrow.