Yesterday, Nick (one of our forum users) posted a Phone Words challenge in our forum, which looks like a great candidate for Power Query Challenge #7. I haven't had time to tackle it myself, but if you're up for a Friday challenge, why not give it a go?
Here's the Challenge (word for word):
Take any Phone # and convert it to all possible Letters aka Phone Words. Phone # can be input any format (strip out all other characters)
PS, if anyone else has challenges they'd like to post, I'm more than happy to set them up in this series. Obviously it's been a bit quiet here lately, for reasons that we hope to announce soon!
This morning I logged in to check the solutions for Power Query Challenge 6 that were submitted and... Wow! There were a bunch, and some cool variety there. So now it's time to show you all what I came up with here.
What was Power Query Challenge 6?
The full description and source data can be found in yesterday's post, but in short it was to convert this:
Data table with multiple data points per cell
To this:
Data shown in First Normal Form (1NF)
So how do we do it?
Two Solutions for Power Query Challenge 6
Wait, two solutions? Why?
As it turns out, I put together two for this one. My first attempt was cooked up to solve the issue on short notice. Then I built another that seems a bit more elegant. So I'll show them both, although quickly.
Solution 1 - Splitting Individual Columns and Merging Back Together
The first of my solutions for Power Query Challenge 6 is actually quite similar to what Ali posted in the solution thread. It basically follows this method:
Step 1:
Create a Data query that connects to the source data, and load it as connection only
Step 2:
Create 3 new queries for ItemID, Quantity and Price which
Reference the data query
Keep the InvoiceID column and the other relevant column
Split the relevant column by delimiter, ensuring it splits to rows (not columns as it defaults to)
Add an Index column
Step 3:
Reference one of the Step 2 tables, and merge the other two tables to it, based on matching the Index column in each
So when complete the query chain looks like this:
And returns the table we're after:
The only real trick to this one is that, when you are building the Price query, the Split Column dialog will pick the decimal point as the delimiter, so you have to force it to use a line feed instead. So building the Price query would go through the following steps:
Right click the Data query --> Reference
Select the InvoiceID and Price columns --> Right click --> Remove Other Columns
Right click the Price column --> Split column --> By Delimiter
Clear the decimal from the Custom area
Click the arrow to open the Advanced area
Change the selection to split to Rows
Check "Split using special characters"
Choose to insert a Line Feed character
Click OK
Set the Data Types
Go to Add Column --> Add Index Column
Resulting in this:
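If you'd rather see the result in code, here's a rough sketch of what the finished Price query might look like in M. It's not the exact code the UI generates (the recorded version uses a Splitter function rather than Text.Split), and the query and column names are assumed from the description above:

let
    Source = Data,
    RemovedOtherColumns = Table.SelectColumns(Source, {"InvoiceID", "Price"}),
    // split the Price column on the line feed character and expand the lists to rows
    SplitToRows = Table.ExpandListColumn(
        Table.TransformColumns(RemovedOtherColumns, {{"Price", each Text.Split(_, "#(lf)")}}),
        "Price"),
    // set the data types (adjust to suit your actual data)
    ChangedType = Table.TransformColumnTypes(SplitToRows, {{"Price", type number}}),
    // add the Index used to line the three queries back up in Step 3
    AddedIndex = Table.AddIndexColumn(ChangedType, "Index", 0, 1)
in
    AddedIndex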
The ItemID and Quantity queries follow the same steps, except that Power Query now correctly identifies the Line Feed as the character to split on.
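And to round out Solution 1, here's a rough sketch of what the Step 3 merge query could look like, assuming the three staging queries are named ItemID, Quantity and Price and each carries the Index column added above:

let
    // start from the ItemID staging query
    Source = ItemID,
    // merge Quantity on the Index column and expand just the Quantity field
    MergedQuantity = Table.NestedJoin(Source, {"Index"}, Quantity, {"Index"}, "QtyTable", JoinKind.LeftOuter),
    ExpandedQuantity = Table.ExpandTableColumn(MergedQuantity, "QtyTable", {"Quantity"}),
    // merge Price the same way
    MergedPrice = Table.NestedJoin(ExpandedQuantity, {"Index"}, Price, {"Index"}, "PriceTable", JoinKind.LeftOuter),
    ExpandedPrice = Table.ExpandTableColumn(MergedPrice, "PriceTable", {"Price"}),
    // the Index has done its job, so remove it
    RemovedIndex = Table.RemoveColumns(ExpandedPrice, {"Index"})
in
    RemovedIndex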
Solution 2 - Group and Split
While the first solution to Power Query Challenge 6 worked, it left me less than satisfied, as it took a bunch of queries. It was effective, but it didn't feel elegant. So I cooked up another solution that uses Grouping. It's actually quite similar to the first solution that Bill Szysz posted.
The basic method is as follows:
Connect to the data
Right click the InvoiceID column --> UnPivot Other Columns
Right click the Value column --> Split Column --> By Delimiter --> OK
Following the steps above gets us to this state:
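In M terms, those three steps boil down to something like the sketch below. Again, this isn't the exact UI-generated code (the split step in particular is wordier when recorded), and it assumes the staging query is called Data:

let
    Source = Data,
    // unpivot everything except InvoiceID into Attribute/Value pairs
    Unpivoted = Table.UnpivotOtherColumns(Source, {"InvoiceID"}, "Attribute", "Value"),
    // split each Value on the line feed and expand the resulting lists to rows
    SplitToRows = Table.ExpandListColumn(
        Table.TransformColumns(Unpivoted, {{"Value", each Text.Split(_, "#(lf)")}}),
        "Value")
in
    SplitToRows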
To unwind this, we group it:
Go to Transform --> Group By
Group By InvoiceID, Attribute
Aggregate a "Data" column using the All Rows operation
At this point, we need to number these rows, so I just busted out the pattern to do that from our Power Query Recipe cards (recipe 50.125).
Right click the Custom column --> Remove Other Columns
Expand all fields from the Custom column
Leaving us with this data:
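For reference, the grouping and numbering piece looks something like this in M, continuing from the SplitToRows step in the earlier sketch (the Custom and Row names follow the recipe's pattern; treat the rest as an approximation of the recorded steps):

    Grouped = Table.Group(SplitToRows, {"InvoiceID", "Attribute"}, {{"Data", each _, type table}}),
    // number the rows inside each grouped table
    AddedCustom = Table.AddColumn(Grouped, "Custom", each Table.AddIndexColumn([Data], "Row", 1, 1)),
    RemovedOtherColumns = Table.SelectColumns(AddedCustom, {"Custom"}),
    // expand everything back out, now with a Row number per InvoiceID/Attribute pair
    Expanded = Table.ExpandTableColumn(RemovedOtherColumns, "Custom", {"InvoiceID", "Attribute", "Value", "Row"})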
The next step is to Pivot it:
Select the Attribute column --> Transform --> Pivot Column
Choose Value for the Values
Expand the Advanced arrow
Change the Aggregation to "Don't Aggregate"
Click OK
Select the "Row" column and Remove it. (Yes, it was needed to unpivot this correctly, but now adds no value.)
Set the data types
Load it to the desired destination
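Those pivot and cleanup steps translate to M along these lines, continuing the sketch above (the column names and data types are assumptions based on the sample data, so adjust to suit):

    // pivot the Attribute column, taking values from Value with no aggregation function
    Pivoted = Table.Pivot(Expanded, List.Distinct(Expanded[Attribute]), "Attribute", "Value"),
    // the Row column was only needed to keep each record unique through the pivot, so drop it now
    RemovedRow = Table.RemoveColumns(Pivoted, {"Row"}),
    // set the remaining data types to suit your columns
    ChangedTypes = Table.TransformColumnTypes(RemovedRow, {{"Price", type number}})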
At this point, the query (again) looks perfect:
Now, I must admit, this felt far more professional and left me feeling good about it.
Which Solution to Power Query Challenge 6 is Better?
Naturally, solution 2 is better. It takes fewer queries, and looks way cooler. Right? Not so fast...
The real question is performance, and for this one I thought I'd test it. But I needed more data, so I expanded the set to 11,000 rows and then used a tool we're working on to time the refreshes. Privacy was left on, and all times shown are in seconds:
I'll be honest, this surprised me. So I went back and added the GroupKind.Local parameter into the Grouped Rows step, like this (as that often speeds things up):
Table.Group(#"Changed Type1", {"InvoiceID", "Attribute"}, {{"Data", each _, type table [InvoiceID=number, Attribute=text, Value=number]}}, GroupKind.Local)
The revised timing for Solution 2 now gave me this:
So while the local grouping did have a small impact, the message became pretty clear here. Splitting this into smaller chunks was actually way more efficient than building a more elegant "all in one" solution!
Are you ready for Power Query Challenge 6? In this challenge we'll look at splitting nested data sets into a nice table in First Normal Form.
Let's take a look at Power Query Challenge 6:
Where did the challenge come from?
The inspiration for this challenge came from last week's Self Service BI Boot Camp that we hosted in Vancouver, BC (you should have been there, it was awesome). At the outset, we talked about how to identify if your data is in First Normal Form (the format that is ideal for a PivotTable), and I showed this data set:
Data table with multiple data points per cell
Notice how the invoice has multiple ItemID, multiple Quantity and multiple Price fields per cell? That's not good at all.
What we Really Need - Data in First Normal Form
As I explained in the Boot Camp, it is essential that the data source be in First Normal Form in order for a PivotTable to consume it. What does that mean? It means that it needs to look like this:
Data shown in First Normal Form (1NF)
Notice that in this case the data is atomic - it only has one data point per cell. In addition, there is now one complete record per row. The InvoiceID shows on every row, and each data point has been correctly split up and applied to the right record.
So what's the thrust of the Power Query Challenge 6?
Well, as it turns out this is a bit tricky. There are a few issues at play here:
The data is separated by line feeds within the cells
There are a different number of line feeds in each row
At least the number of line feeds is consistent across every cell within a given row!
So the challenge is this: break the table apart so that the data is in First Normal Form like the second image.
You can download the data source file (and post your solutions) in our forum here. I'll give you my version tomorrow.
Yesterday, I posted a new Power Query Challenge, and in this post I'm going to show my solution to challenge 4. You can pick up this solution as well as solutions created by the community in this thread of the Excelguru Forum. And as a quick note - the very first answer posted there is much slicker than what I've written up here... but hopefully some tricks here will still help you up your Power Query game. 😉
Background on the Solution to Challenge 4
The original issue was to create a header row from different rows in the data. You can read the full reason for this in the original blog post, but simply put, I needed to convert this:
To this:
On a dynamic basis so that I could easily repoint the data set to this one:
And return this:
The major wrinkles in creating a Solution to Challenge 4
The biggest issues I had to deal with were these:
I couldn't just promote row 1 to headers and rename the first column. Why? Because Power Query would have hard coded "admin" in the first set, which would have triggered an error when pointed to Sales
I couldn't just rename the annual columns. Why? Because the years change, so Column2 is 2015 in one data set and 2016 in the other
After promoting the header row, I couldn't declare data types. Why? Because the column names get hard coded, meaning that the code would trigger an error when it couldn't find 2015 or 2018 in the data sets.
Fixing these on a static basis is easy; it's making the whole thing dynamic that is the issue.
As described in the original post, I edited the MakeMyHeaders query in order to do the work. I then:
Demoted Headers (Home -> Use First Row as Headers -> Use Headers as First Row)
I then right clicked the "Changed Type" step in the Applied Steps area and renamed it to "AllData" (with no space)
This basically gives me an easy-to-come-back-to point of reference for later. And since I did not include a space, it's super easy to type. (If I left a space in there, it would be #"All Data" instead.)
Next, I needed to create my header row which involved:
Keeping the top 2 rows only (Home -> Keep Top Rows -> 2)
Right clicking and renaming the step "HeaderBase"
I then replaced the department name (Admin) with "Name". The trick was to make this dynamic, which involved a couple of steps.
Right click Column1 and do a replacement as follows:
Then, in the formula bar, remove the " characters around both HeaderBase[Column1]{1} and HeaderBase[Column1]{0}
So what was that all about? It's replacing the value in the 2nd row - row {1} - with the value from the first row - row {0}. And it's completely dynamic! (For reference, the reason I reduced this to only 2 rows was that if I tried this when all rows were showing, the departments on the data rows would be lost.)
I then went and removed the Top 1 row, leaving me with this:
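In M, the HeaderBase piece and the dynamic replacement look roughly like this. (This is a sketch based on the steps above: after demoting the headers the first column is called Column1, and AllData is the renamed Changed Type step.)

    // keep just the top 2 rows to build the header from
    HeaderBase = Table.FirstN(AllData, 2),
    // swap the department name in row {1} for the "Name" header held in row {0}
    ReplacedValue = Table.ReplaceValue(HeaderBase, HeaderBase[Column1]{1}, HeaderBase[Column1]{0}, Replacer.ReplaceText, {"Column1"}),
    // drop the original header row, leaving only the newly built header row
    #"Removed Top Rows" = Table.Skip(ReplacedValue, 1)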
So far so good. Now I just needed to add back the original data. To do that:
I went to Home -> Append and appended the MakeMyHeaders query (yes, I appended it to itself)
I then modified the formula from:
= Table.Combine({#"Removed Top Rows", #"Removed Top Rows"})
To
= Table.Combine({#"Removed Top Rows", AllData})
Which left this:
The final cleanup took a few more steps:
Promote First Row to Headers
Delete the automatically created Changed Type step (so we don't lock down the years in the code)
Remove Top Rows -> Top 2 Rows (to get rid of the rows I used to create the header)
And we're done!
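Pulling it all together, the finished MakeMyHeaders query ends up looking something like this. Keep in mind this is a compressed sketch: the step names won't match the UI-generated code exactly, and in the actual file AllData is the renamed Changed Type step that follows the header demotion:

let
    Source = Admin,
    // demote the headers; in the real query, the Changed Type step after this is the one renamed AllData
    AllData = Table.DemoteHeaders(Source),
    // keep the top 2 rows to build the header row from
    HeaderBase = Table.FirstN(AllData, 2),
    // replace the department name in row {1} with the "Name" value from row {0}
    ReplacedValue = Table.ReplaceValue(HeaderBase, HeaderBase[Column1]{1}, HeaderBase[Column1]{0}, Replacer.ReplaceText, {"Column1"}),
    #"Removed Top Rows" = Table.Skip(ReplacedValue, 1),
    // append the full data set back under the new header row
    Appended = Table.Combine({#"Removed Top Rows", AllData}),
    // promote the new header row, then remove the two original header rows still in the data
    PromotedHeaders = Table.PromoteHeaders(Appended, [PromoteAllScalars = true]),
    RemovedTopRows = Table.Skip(PromotedHeaders, 2)
in
    RemovedTopRows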
Proving that the Solution to Challenge 4 works
Naturally, loading this query to a table will show that it works. But to really prove it out:
Edit the MakeMyHeaders query
Select the Source step
Change the formula to =Sales
Go to Home -> Close & Load
You'll see that it updates nicely with the updated headers
Why the solution to Challenge 4 even matters
If you ever decide to combine Excel worksheets, and want to hold on to the worksheet name as well as the data, you'll need this technique!
It's time for Power Query Challenge 4! This one is a tricky little challenge with creating a header row - but from different rows in the data set.
The real world scenario driving Power Query Challenge 4
Have you ever tried to combine Excel files in a folder, and wanted to preserve the worksheet name along with the contents? If you have, you'll end up looking at data that follows this kind of pattern:
Notice that in step 1 we have the sheet name and a table with the contents. And step 2 shows what happens when we expand all columns from the table. So what's the issue?
This is the crux of Power Query Challenge 4… we need a header row that looks like this:
Easy right? Not so fast!
The value in the Name column will change for each file in the folder. In addition, the data in the columns may also have different names. So you can't hard code anything here…
Sample data for Power Query Challenge 4
Let's be honest, this isn't simple or it wouldn't be a challenge, but we're going to try and keep it simpler by focussing on just the header row issue. (We're going to skip the whole combine files stuff, and just use some pre-formatted Excel tables that exhibit the problem.)
To build and test your solution I'm providing a file with two different data tables (Admin and Sales) and a query called "MakeMyHeaders" that just refers to the Admin data set right now:
The data in the Admin table looks like this:
And Sales looks like this:
Solving Power Query Challenge 4
To solve this challenge, you should work in the MakeMyHeaders query, and convert the data so that it outputs the data shown here:
And, if you did it correctly, you should then be able to edit the MakeMyHeaders query, select the Source step and change the formula to =Sales. After loading, you should get the output shown here (without any errors):
Simple right?
Posting Your Solution
As mentioned in The Future of Power Query Challenges, we are no longer accepting submissions by email. And don't post your solution here either. (Comments are still welcome!) Instead, we are collecting answers on the Power Query Challenge 4 thread in our forum. Post your solution there, provide a short description of the approach you took, and have a look at others' submissions there.
My solution will come out as a blog post tomorrow.