Un-pivoting With Subcategories in Power Query

My last few posts have been relatively technical, so this time I figured I’d look at something practical.  I can’t believe it’s been almost a year since I blogged about Un-Pivoting data in Power Query, so it’s about time we looked at that again… but this time with a twist.  This time we’ll look at un-pivoting with subcategories in Power Query.

Background

The data we’re going to start with looks like a typical financial report.  Whether a restaurant or a shoe store, in the manager’s office you’re liable to come up with a report that looks something along the lines of this:

SNAGHTMLf92afc

Now for the challenge… someone decides they need an alternate view of this data.  So how do you quickly un-pivot this into a format that you can use for other things?

If you want to follow along, you can download the file from my OneDrive.

Issue 1 – Getting the Data into Power Query

The first issue we come across is that, while Power Query can consume data from inside an Excel file, it MUST be formatted as a table.  But this hardly looks like it’s conducive to a table format with all those blank rows and such.  But what the heck… let’s apply a table to it anyway, and see what happens.

  • Click anywhere
  • Choose Power Query –> From Table
  • Adjust the range to cover all of the data (A4:H17)
  • Uncheck the box that indicates your table has headers

When you’re done, the box should look as follows:

image

And when you say OK, you should be taken into Power Query.

If you take a quick peek back at Excel, you can see that the table has indeed been applied, and that there are generic column headers above each column:

SNAGHTML100f05e

This is also reflected in Power Query:

image

Data Cleanup

Before we get into the trick of how to deal with subcategory columns, let’s clean up some of the garbage here.  Ideally what we’d like to get here is a nice pure table that we can easily un-pivot, just like we did in the prior article.

Cleanup Step 1:

Looking at the first column, we’ve got a bunch of null values in there, as well as some section headers.  What we really need is those section headers repeated on the lines below them.  So let’s make that happen.

  • Select Column1
  • Go to Transform –> Fill –> Down

You’ll see that the section headers are filled into any of the null areas. As soon as they encounter data however, they stop. (I’ve drawn a box around the Revenue lines below – notice how they fill until they reach Total Revenues, which then fill until they reach Expenses, and so on.)

image

Cleanup Step 2:

Now, we don’t really need any of the rows that are showing null values in Column3 through Column 8.  Let’s filter Column3 to remove those.  Click the drop down on Column3 and uncheck (null).  The result looks like this:

image

So… why didn’t I filter the null values out of Column2?  After all, there are blank data rows, and with a PivotTable we can recreate the subtotals…  The answer is that I’m not ready to lose the first two rows yet.  I need those in order to un-pivot my data. 😉

Issue 2 – Un-Pivoting the Data

If things were lined up perfectly, we could just select Column3 through Column 8 and un-pivot it now.  Unfortunately, if we do we’ll get some really wonky results.  (Go ahead and try it if you like. Remember to click the x to the left of the “Unpivoted Columns” step once you confirm you’ve made a disaster of it!)

image

Preparing to Un-Pivot

Okay, so what do we need to do… well, the first thing we need to do is fill the first row (containing April and May) across the columns.  Here’s the rub though… there is no Fill—>Across feature.  So how do we do it?

Transposing the Table

To an accountant, transposition is an evil word that means you made a mistake and flipped two digits around.  It’s nasty and something we never look forward to.  But to Power Query it’s simply awesome.  Check this out…

  • Go to Transform—>Transpose

This instantly flips the columns to rows and rows to columns!

SNAGHTML1147243

And would you look at that… April and May are in Column1 and below them… null values!  We know what to do with those now!

  • Select Column1
  • Go to Transform –> Fill –> Down

Is this enough though?  Nope, sorry.  You might be tempted to “un-transform” and then un-pivot it, but you’d still end up with garbage.  We still need to do a bit more.

Concatenating the Category and Subcategory

This is the trick to un-pivoting with subcategories: you need to concatenate them first, then un-pivot, then split them up.  So let’s get to it.  Using the tip from Merging Columns with Power Query, let’s join up Column1 and Column2.

  • Select Column1
  • Hold down CTRL or SHFT and select Column2
  • Click Transform –> Merge Columns
  • Choose the Colon for the separator (or whichever you prefer)

Note:  If you don’t have the Merge Columns feature, you’re using an old version of Power Query. Either download the latest version, or you’ll need to manually join the columns by:

  • Adding a new column
  • Using the formula =[Column1]&”:”&[Column2]
  • Delete Column1 and Column2
  • Move the new column to the first position

Once you’ve got it done, the output should look as follows:

SNAGHTML120c673

Un-transposing the Table

Awesome… we’ve got concatenated headers now.  We just need to flip the table back right side up and we’re almost ready to un-pivot it:

  • Go to Transform –> Transpose

Final Preparations

The very last thing we need to do before we un-pivot our table is provide some decent headers.  This will ensure that the data will make sense when it is un-pivoted.  To that end:

  • Go to Transform –> Use First Row as Headers
  • Rename the first column to “Class”
  • Rename the second column to “Category”
  • Filter “Category” to remove the null values

Our table now looks nice and clean:

image

And we’re ready!

Un-Pivot It!

We now follow the steps of a regular un-pivot operation, with only a minor extra step:

  • Select the Class column
  • Hold down CTRL or SHFT
  • Select the Category column
  • Go to Transform –> Unpivot Columns –> Unpivot Other Columns
  • Rename the “Value” column to “Amount”

And now the extra step:

  • Select the “Attribute” column
  • Go to Transform –> Split Column –> By Delimiter –> Colon
  • Rename the Attribute.1 column to “Month”
  • Rename the Attribute.2 column to “Measure”

The results:

SNAGHTML12d5ffa

That’s pretty much it.  The last thing I’d do is change the Query name from Table1 to something more intelligible… maybe Data or something… then load it to the worksheet.

From a Static Report to a Data Source

Now that we’ve got our report reformatted into a data source, we can click anywhere in the table and pivot it to our heart’s content!

A Quick Recap

To be fair, this post has been pretty long, but only because I included a LOT of pictures and detailed instructions.  Once you’ve got the process nailed down, it doesn’t take all that long at all.  Remember, the big key to this is:

  • Suck your report into a table (without headers)
  • Fill any rows you can
  • Transpose the data
  • Concatenate your category and subcategory fields together
  • Un-transpose it
  • Un-pivot it
  • Split the category and subcategory back into their pieces

Enjoy!  🙂

Power Query’s Round Function

The other day I asked one of my co-workers how many ways he knew of to round a number.  His answer was one… if it ends in .4 it rounds down and if it ends in .5 it rounds up.  My guess is that most people would answer along similar lines, but that's NOT how Power Query's Round function actually works...

Interestingly though, there are a bunch of different ways to round, depending on your needs, and Excel has a bunch of functions to support them: ROUND, ROUNDUP, ROUNDDOWN, FLOOR, CEILING, EVEN, ODD, TRUNC, INT and MROUND.

Power Query also has a bunch of rounding formulas as well but, given that the function names can be somewhat different in Power Query (as we first saw here), I thought it might be interesting to see how to convert each of those functions from Excel to Power Query’s structure.

Background Setup

To start with, I created a very simple structure: a two column table with some random values in the “Value” column and then rounded them to 2 decimals using the formula =ROUND([@Value],2)  The output, after feeding it through Power Query, looks like this:

SNAGHTML20a5291b

The blue table on the left is the Excel table, and the green table on the right is the Power Query output.  (There is a completed example file available here example file available here.)

Creating the Round function

I love the ROUND function in Excel.  I use it constantly – rounding everything that uses multiplication or division – and pretty much have it burned into muscle memory.  So to me this was a logical place to start with Power Query.  Naturally, the syntax is just a bit different from Excel though:

Syntax
Excel =ROUND(number,digits)
Power Query =Number.Round(value, digits, roundingMode)

Hmm… we know that the Power Query function will be case sensitive.  In addition, it has an extra parameter.  The valid options are:

  • RoundingMode.Up
  • RoundingMode.Down
  • RoundingMode.AwayFromZero
  • RoundingMode.TowardZero
  • RoundingMode.ToEven

Let’s see what we can do with this.

I open up my query, select Add Column, and put in the formula as shown below:

image

Pretty easy, just Number.Round([Value],2).  In fact, it’s so similar to Excel it’s awesome!

So I click OK, save the query, and have a look at my results.  And that’s when I notice something isn’t quite right.  I’ve added some conditional formatting to the table below so you can see it easily:

image

This is the default?

Notice all the numbers that don’t match?  Can you spot the pattern?  It’s the oddest damn thing I’ve ever seen, to be honest, and I’ve never heard of anyone rounding in this way.

The default “RoundingMode” for Power Query is “Round to Even”.  What that means is that if there is a tie in the numbers it will round up or down to the closest even number.  So in the case of 1.645, it will round down to 1.64 as that is closer than 1.66.  (1.64 is .05 away from 1.645, where 1.66 is 0.15 away from 1.645).

I find this deeply disturbing.  I personally think that every user would expect Excel and Power Query’s default rounding methods to line up exactly, and this doesn’t.  How serious is this?  I’m not sure.  I think I’ll let someone from the scientific community ponder that.

Using RoundingMode.Up

Since the default plainly doesn’t work for us, it looks like it’s time to figure out which of the additional parameters we need.  Let’s try adding RoundingMode.Up to see if that will fix it.

I open Power Query again, and added a new custom column with the following formula:

=Number.Round([Value],2,RoundingMode.Up)

And the results are as follows:

image

Um… uh oh.  It seems to work above 0, but below is another matter.  That –5.245 is rounding down, not up! (Yes, from a technical perspective I am aware you can argue the words I used, but you get the idea.)

Using RoundingMode.Down

Now I’d be surprised if this came up with numbers consistent with the Excel formula, but let’s just check it for good measure.  The formula is:

=Number.Round([Value],2,RoundingMode.Down)

And the results:

image

So now numbers greater than 0 get rounded down, where numbers less than 0 are being rounded up (away from zero).

Let’s try another:

Using RoundingMode.AwayFromZero

Here’s our next option:

=Number.Round([Value],2,RoundingMode.AwayFromZero)

And these results are pleasing!

image

Look at that… we finally found the one that works!

Using RoundingMode.TowardZero

We’ve only got one other option we haven’t explored, so we might as well use it too, just for the sake of completeness:

=Number.Round([Value],2,RoundingMode.TowardZero)

For some reason, I’m incapable of typing TowardZero the first time I type this.  I always type TowardsZero and end up with an error!  At any rate, the results:

image

Thoughts

As a tool that is built for the Excel audience, I am having some real difficultly accepting the default parameter for this function in Power Query.  I HOPE that this is a bug, and not a design choice, although the documentation would suggest it is the latter.  If that’s the case, I think it’s a HUGE mistake.

Excel’s ROUND formula defaults to round away from zero.  Power Pivot’s DAX ROUND formula defaults to round away from zero.  VBA’s Application.Round function defaults to round away from zero. (As pointed out by Rory Archibald on Twitter, VBA's Round function - without the application. prefix - does use banker's rounding though.)

In my impression, if the Power Query formula holds the same name (at least after the Number. portion) it should return the same results as the Excel function.  In fact, I would venture to say that virtually every Excel pro would expect this.

My bigger concern would be that, with one of Power Query’s big selling features being it’s ability to re-shape and process large volumes of data, how quickly will a user realize that the Rounding function they thought they had is NOT working the way they expected?  Not good news at all.

I’m curious to hear your impressions.  Please leave a comment!

Want to see if for yourself?

Download the example file with all the formulas already in place.

Merging Columns with Power Query

The August update for Power Query was finally made available on Sept 1, and it has some pretty cool stuff in it.  In this week’s segment I thought we’d cover off one of the features that I’m most excited about as an Excel Pro: Merging columns with Power Query.

The old way

It’s been possible to merge two or more columns together in the past, but you had to write a formula to do it.  Honestly, it wasn’t a huge deal, but it still took a bit of know-how and work.  Assume, for example, we had this:

SNAGHTML702fae4

And our goal is to concatenate the Account and Dept columns together with a hyphen between them.  Here’s what you had to do:

  • Insert a New Column (the steps for this varied depending on the version of Power Query you are running.  Currently it is Add Column –> Add Custom Column)
  • When the prompt pops up you had to provide a formula like shown below:

image

Okay, so not a huge deal.  Just =[Column1] & “-“ & [Column2]

But you still had to write it.  I’ve lost count of how many people to whom I’ve taught the simple & shortcut for Excel formulas, but it’s enough to say that it’s probably not intuitive.

So it worked, but could it become easier?  We now know the answer is Hell Yeah!

The new way

This time we’ll do it differently…

  • Select the Account column
  • Hold down CTRL (or SHFT) and select the Dept column
  • On the Add Column tab, click Merge Columns

image

  • Choose your separator.  The default is –None- (meaning it will just mash them together), but other pre-defined options include Comma, Colon, Equals Sign, Semicolon, Space, Tab
  • What I want (a minus sign) isn’t there, so I’m going to choose --Custom--

image

  • Now I’ll enter a - (minus) sign and click OK

image

And that’s it!  My output comes together nicely:

SNAGHTML71194af

Now, to be fair, I still have to rename the column.  I do wish this interface had a way to name the column in advance (like exists when you create a custom column.)  Hopefully the PQ team will retrofit us with that ability at some point in the future.

In the mean time, we can either right click the column header and rename it there, or we can edit the column directly in the formula bar.  Just change the highlighted part shown below:

image

Like this:

image

So honestly, it’s not that much more efficient, why do I think this is cool?  Well, it’s not that much more efficient with 2 columns.  But try 4.  Or when you just need to put 4 columns back together with no spaces in between.  Then it starts to make life much easier.

Multi Condition Logic In Power Query

In my last post, we looked at creating an IF statement using Power Query.  This time we’re going to go a bit deeper and look at multi condition logic in Power Query, exploring a scenario where we need to choose between several options.

The base scenario

In the last post I talked about my billing code setup.  As a reminder, it’s a 10 digit code that is constructed as follows:

  • Char 1 – Alpha – Indicates the division (G = Golf, F = Fitness, M = Marina)
  • Char 2 – Alpha – Indicates the billing type (D = Dues, S = Pass, A = Annual Moorage, P = Periodic Moorage)
  • Char 3-4 – Numeric – Indicates the number of months of coverage for the product (1-12)
  • Char 5-6 – Numeric - Indicates the start month (and subsequent anniversary) for the customer’s product
  • Char 7-8 – Variable – Slip length (in feet) for a boat in the case of marina customers, or SG, CP or CS for golf (indicating single, couple primary or couple spouse)
  • Char 9 – Text – A variety of single letter codes indicating specific things we want to know. (Outlined below)
  • Char 10 – Text – Indicates the payment method (F = Financed, P = Paid up front, C = Comp/Honorary)

(Note that the sample data only includes records for Marina data)

Sample file

If you’d like to follow along, download the sample file here.

Multi condition logic – Using IF

So, building on my previous two posts (using text functions and creating IF statements), we could easily break the first character into pieces by nesting 2 IF tests together:

=if Text.Start([BillingCode],1)="G" then "Golf" else "two options left"

=if Text.Start([BillingCode],1)="F" then "Fitness" else "Marina"

into

if Text.Start([BillingCode],1)="G" then "Golf" else if Text.Start([BillingCode],1)="F" then "Fitness" else "Marina"

Not too hard really.  In fact, we can even build each IF statement separately, then just copy the second to replace the “two options left” part without making any other changes at all.  No parentheses or anything needed.

More than 3 options

But what if you have a whole bunch of options that you need to work with?  Let’s look at the 9th character in our billing code.  I haven’t given the details yet for that one, but here are the options:

E = Employee, S = Yacht Club, N = Non-Taxable, R = Restricted, I = Inactive, L = Social, M = Medical, U = Regular

Wow.  That’s a whole lot of possibilities, and would make for one monster nested IF statement.  That wouldn’t be a lot of fun to write, nor maintain.  So how would we deal with it?

In Excel proper, we would probably separate these options into a separate table, then use VLOOKUP to pull the appropriate value into the table.  So we just need a Power Query VLOOKUP function… except there isn’t one.

We actually have a couple of different methods to deal with this.  We could either:

  1. Split the first character into one column, create an Excel table with the first letter in column 1 and the appropriate match in column 2, then merge the two using Power Query’s merge function. (Maybe I’ll write a future post on it.)
  2. Build our own CHOOSE function inside Power Query (or SWITCH if you prefer Power Pivot’s DAX version.)  This is way more fun, so let’s do that.  🙂

Building a CHOOSE function

This actually isn’t too hard once you know the basic structure.  It basically goes like this:

function_name = (input) => let
values = {
{result_1, return_value_1},
{input, “Undefined”}
},
Result = List.First(List.Select(values, each _{0}=input)){1}
in
Result,

The key parts to recognize here are:

  • We can change the “function_name” part to be whatever we want/need,
  • result_1 is the first of the possibilities that we may pass TO the function
  • return_value_1 is the value that we’d like to return if the first value is result_1
  • if we need more values, we just insert another comma after the value_1 section and put in a value_2 section
  • we can keep adding as many values as we need.
  • the “Undefined” value will return the text “Undefined” if the value you pass isn’t in your provided list of options (it’s the Else portion of the CHOOSE statement)

Using this structure, we could write a CHOOSE function for our scenario as follows:

fnChoose_CustCode = (input) => let
values = {
{"E", "Employee"},
{"S", "SCYC"},
{"N", "Non-Taxable"},
{"R", "Restricted"},
{"I", "Inactive"},
{"L", "Social"},
{"M", "Medical"},
{"U", "Regular"},
{input, "Undefined"}
},
Result = List.First(List.Select(values, each _{0}=input)){1}
in
Result,

Notice that I changed a couple of things:

  1. I gave the function a name so that I can recognize it, and also so that I can create more than one function with different names.  This one is fnChoose_CustCode.
  2. I created a list of all the options I needed.

Implementing the CHOOSE function

Okay, so now we have this, how do we use it?  Again, we’ve got two options.  I’ll focus on the other option at some other time, but for this scenario I want to build this right into an existing query.  So here’s how I do it.

First I created a brand new query that just pulls my table into Power Query, resulting in the following:

SNAGHTML76c03ee

Let’s go and inspect the code that pulls this in.  We need to click View –> Advanced Editor.  That will bring up the following code:

let
Source = Excel.CurrentWorkbook(){[Name="Customers"]}[Content]
in
Source

Not too complicated (yet).  Let’s paste in our code just before the Source = line:

let

fnChoose_CustCode = (input) => let
values = {
{"E", "Employee"},
{"S", "SCYC"},
{"N", "Non-Taxable"},
{"R", "Restricted"},
{"I", "Inactive"},
{"L", "Social"},
{"M", "Medical"},blah
{"U", "Regular"},
{input, "Undefined"}
},
Result = List.First(List.Select(values, each _{0}=input)){1}
in
Result,

Source = Excel.CurrentWorkbook(){[Name="Customers"]}[Content]
in
Source

Perfect.  And yet it doesn’t exactly look like much.  In fact, beyond adding a new line in the Steps section of the Editor, we don’t see any changes:

image

So what good did that do then?

As it turns out, we’ve only pasted in our function to make it available to the Power Query engine.  We haven’t actually told Power Query to do anything with it.  So why don’t we do that now?

Using our new CHOOSE function

You’re going to be amazed how easy this is…

First we’re going to add a new column (Add Column –> Add Custom Column).  When the dialog pops up, we’ll create a formula to return the letter we want to pass to the function:

=Text.Range([BillingCode],8,1)

And that gives us the following result (assuming we provided the column name of Status):

SNAGHTML29d7316

Cool stuff.  Now, let’s amp this up and use our function.  We’ll click the gear next to the Status step and wrap the existing formula with our function call.  (Don’t forget the extra  parenthesis needed at the end):

=fnChoose_CustCode(Text.Range([BillingCode],8,1))

Which gives us the following:

SNAGHTML2a06cbf

The end effect

You’ll find that all the sample codes in the data work just fine, and that nothing comes back as undefined.  If you’d like to see how the query reacts to different items, go back to the Customers table and try changing the second to last letter to something else.  When you refresh the table, you’ll find that it will evaluate the new character and return the appropriate result.

Caveat

It should be noted that the function as written above is case sensitive, meaning that a code of MP010450uP would return “Undefined”.  This is expected in my case, as valid codes are made up of upper case letters.

If I wanted to accept either case I would need to modify my Text.Range function to force it to upper case.  This would result in a function call that reads as follows:

=fnChoose_CustCode(Text.Upper(Text.Range([BillingCode],8,1)))

Which would work, as you can see here:

SNAGHTML2be7f6a