Solutions for Power Query Challenge 6

This morning I logged in to check the solutions for Power Query Challenge 6 that were submitted and... Wow!  There were a bunch, and some cool variety there.  So now it's time to show you all what I came up with here.

What was Power Query Challenge 6?

The full description and source data can be found in yesterday's post, but in short it was to convert this:

Data table with multiple data points per cell

To this:

Data shown in First Normal Form (1NF)

So how do we do it?

Two Solutions for Power Query Challenge 6

Wait, two solutions?  Why?

As it turns out, I put together two for this one. My first attempt was cooked up to solve the issue on short notice.  Then I built another that seems a bit more elegant.  So I'll show them both, although quickly.

Solution 1 - Splitting Individual Columns and Merging Back Together

The first of my solutions for Power Query Challenge 6 is actually quite similar to what Ali posted in the solution thread.  It basically follows this method:

• Step 1:
• Create a Data query that connects to the source data, and load it as connection only
• Step 2:
• Create 3 new queries for ItemID, Quantity and Price which
• Reference the data query
• Keep the InvoiceID column and the other relevant column
• Split the relevant column by delimiter, ensuring it splits to rows (not columns as it defaults to)
• Step 3:
• Reference one of the Step 2 tables, and merge the other two tables to it, based on matching the Index column in each

So when complete the query chain looks like this:

And returns the table we're after:

The only real trick to this one is that - when you are building the Price query - the Price column will pick the decimal as the delimiter, so you have to force it to a line feed.  So building the Price query would go through the following steps:

• Right click the Data query --> Reference
• Select the InvoiceID and Price columns --> Right click --> Remove Other Columns
• Right click the Price column --> Split column --> By Delimiter
• Clear the decimal from the Custom area
• Click the arrow to open the Advanced area
• Change the selection to split to Rows
• Check "Split using special characters"
• Choose to insert a Line Feed character
• Click OK
• Set the Data Types

Resulting in this:

The ItemID and Quantity queries follow the same steps, except that Power Query now correctly identifies the Line Feed as the character to split on.

Solution 2 - Group and Split

While the first solution to Power Query Challenge 6 worked, it left me less than satisfied as it took a bunch of queries.  While effective, it didn't feel elegant.  So I cooked up another solution that uses Grouping.  It's actually quite similar to the first solution that Bill Szysz posted.

The basic method is as follows:

• Connect to the data
• Right click the InvoiceID column --> UnPivot Other Columns
• Right click the Value column --> Split Column --> By Delimiter --> OK

Following the steps above gets us to this state:

To unwind this, we group it:

• Go to Transform --> Group By
• Group By InvoiceID, Attribute
• Aggregate a "Data" column using the All Rows operation

At this point, we need to number these rows, so I just busted out the pattern to do that from our Power Query Recipe cards (recipe 50.125).

• Go to Add Column --> Custom
• Column Name:  Custom
• Formula:  =Table.AddIndexColumn( [Data], "Row", 1, 1)
• Right click the Custom column --> Remove Other Columns
• Expand all fields from the Custom column

Leaving us with this data:

The next step is to Pivot it:

• Select the Attribute column --> Transform --> Pivot Column
• Choose Value for the Values
• Change the Aggregation to "Don't Aggregate"
• Click OK
• Select the "Row" column and Remove it.  (Yes, it was needed to unpivot this correctly, but now adds no value.)
• Set the data types
• Load it to the desired destination

At this point, the query (again) looks perfect:

Now, I must admit, this felt far more professional and left me feeling good about it.

Which Solution to Power Query Challenge 6 is Better?

Naturally, solution 2 is better.  It takes less queries, and looks way cooler.  Right?  Not so fast...

The real question is in the performance.  And for this one I thought I'd test it.  But I needed more data.  I expanded the set to 11,000 rows and then used a tool we're working on to time the refreshes.  Privacy was left on, and all times shown are in seconds:

• Solution 1:  1.43, 1.48, 1.11, 1.27  Avg ~1.32 seconds
• Solution 2:  2.77, 2.65, 2.63, 2.68  Avg ~2.68 seconds

I'll be honest, this surprised me.  So I went back and added the GroupKind.Local parameter into the Grouped Rows step, like this (as that often speeds things up):

Table.Group(#"Changed Type1", {"InvoiceID", "Attribute"}, {{"Data", each _, type table [InvoiceID=number, Attribute=text, Value=number]}}, GroupKind.Local)

The revised timing for Solution 2 now gave me this:

• Solution 2A:  2.54, 2.49, 2.56, 2.61.  Avg ~2.55 seconds

So while the local grouping did have a small impact, the message became pretty clear here.  Splitting this into smaller chunks was actually way more efficient than building a more elegant "all in one" solution!

My solution (including 5,000 rows of the data), can be found in the solution thread here.

Power Query Challenge 6

Are you ready for Power Query Challenge 6?  In this challenge we'll look at splitting nested data sets into a nice table in First Normal Form.

Let's take a look at Power Query Challenge 6:

Where did the challenge come from?

The inspiration for this challenge came from last week's Self Service BI Boot Camp that we hosted in Vancouver, BC (you should have been there, it was awesome).  At the outset, we talked about how to identify if your data is in First Normal Form (the format that is ideal for a PivotTable), and I showed this data set:

Data table with multiple data points per cell

Notice how the invoice has multiple ItemID, multiple Quantity and multiple Price fields per cell?  That's not good at all.

What we Really Need - Data in First Normal Form

As I explained in the Boot Camp, it is essential that the data source be in First Normal Form in order for a PivotTable to consume it.  What does that mean?  It means that it needs to look like this:

Data shown in First Normal Form (1NF)

Notice that in this case the data is atomic - it only has one data point per cell.  In addition, there is now one complete record per row.  The InvoiceID shows on every row now, and each data point has been correctly split up and applied to them.

So what's the thrust of the Power Query Challenge 6?

Well, as it turns out this is a bit tricky.  There are a few issues at play here:

• The data is separated by line feeds within the cells
• There are a different number of line feeds in each row
• At least the number of line feeds is consistent for each cell in the entire row though!

So the challenge is this: break the table apart so that the data is in First Normal Form like the second image.

You can download the data source file (and post your solutions) in our forum here.  I'll give you my version tomorrow.