Tame Power Query Workbook Privacy Settings

I recently built a cool Excel solution at work that uses Power Query to reach out and grab some weather data, then upload it into database.  We use that data in our dashboards, as weather is pretty important to the golf industry in which I work.  But then I went to deploy the file, and needed to find a way to tame the Power Query Workbook Privacy settings.

The Issue

What happens is, every time a user runs a Power Query that was last saved by another user, they are prompted to set the Workbook’s Privacy level.  This is maddening, as we have two staff that use this workbook, where one covers for the other while they’re away.  Naturally, long lapses of time can occur in between… just long enough to forget what to do when you’re prompted by this frustrating message:

image

So while I can (and have) set the privacy level for the web data they are going to retrieve (see my tool here), I have no way to permanently set the Workbook’s Privacy level.  Worse is that, if the user clicks Cancel, or chooses the wrong privacy level, (even trying to protect the output table structure using Chris Webb’s technique here,) fails.  The table generates an error, and all business logic in the workbook is blown apart.  The only recourse is to exit the file without saving and try again.

Naturally, this concerns them, and they call to make sure they do the right thing.  That’s awesome, and I wouldn’t change that at all.  But the deeper underlying problem is that Power Query’s Workbook security is engineered to require developer interaction.  And that is bad news!

How to Tame Power Query Workbook Privacy Settings

Unfortunately I can’t use VBA to set this in the workbook (or at least, I haven’t tried, anyway), but I can work a little trick to at least warn my users when they are about to be prompted, and remind them which privacy level they need to select.  Here’s what I need to do in order to make that work:

Step 1: Create a range to hold the last user

  • Create a named range somewhere in the workbook called “rngLastUser”  (I put mine on the “Control Panel” worksheet and hid it.)
  • Drop the following code in a Standard VBA Module:

Public Sub UpdateLastUser()
With This
Workbook.Worksheets(“Control Panel”)
.Range(“rngLastUser”) = Application.UserName
End With
End Sub

Step 2: Create a macro to update your Power Queries

Public Sub GetWeather()
With ThisWorkbook
‘Check if privacy level will need to be set
If .Worksheets(“Control Panel”).Range(“rngLastUser”) <> Application.UserName Then
MsgBox “You are about to be prompted about Privacy Levels” & vbNewLine & _
“for the Current Workbook. When the message pops up, ” & vbNewLine & _
“you’ll see an option to ‘Select’ to the right side of the Current Workbook.” & vbNewLine & _
vbNewLine & _
“Please ensure you choose PUBLIC from the list.”, vbOKOnly + vbInformation, _
“Hold on a second…”
End If

‘Refresh the Power Query table
.Worksheets(“Weather”).ListObjects(“WeatherHistory”).QueryTable.Refresh BackgroundQuery:=True
End With
Call UpdateLastUser
End Sub

Step 3: Link the macro to a button for your users

  • Link my GetWeather() routine to a button
  • And we’re good!

What Did We Do?

So basically, what I did here was this:

  • Every time the user clicks the button…
  • Excel checks the contents of rngLastUser to see if the username is the same as the current user
    • If it is, it just goes on to refresh the table
    • If it’s not, it kicks up the following message:

image

    • After the user clicks OK (if necessary), then it prompts the user to set the security level. Yes, they can still get it wrong, but at least they have a chance now!
    • Once the security level is set, the macro goes on to refresh the table
  • After the table is refreshed, Excel updates the rngLastUser cell to the name of the current user.

And that’s it.  We now have a system that will prompt our users with the correct answer, so that they don’t have to come back and ask us every time.

Thoughts On The Security Model

Ideally it would be nice to not have to do this, and there is – in fact – a way.  Microsoft’s answer is “Yes, just enable Fast Combine.”  That’s great and all, but then it ignores all privacy levels and runs your query.  What if, as a developer, I need a way to ensure that what I’ve built stays Public, Private or Organizational?  What it if actually matters?

To me, Fast Combine is akin to setting Macro Security to Low in Excel 2003 and earlier.  Basically, we’re saying “I don’t like nagging messages, so let’s run around in a war zone with no bullet proof vest.”  Sure, you might come out alive, but why should you take that risk?

In my opinion, the Power Query security model needs work.  Even if we could assign a digital certificate to the query to prove it (and the privacy levels) had not been modified, that would be great.  I’m not sure exactly what it is we need, but we need something better than what we have today.

Refresh Power Query With VBA

When I’ve finished building a solution in Excel, I like to give it a little polish, and really make it easy for my users to update it.  The last thing I want to do is to send them right clicking and refreshing everywhere, so I program a button to refresh Power Query with VBA.

The interesting part about the above statement is that Power Query doesn’t have any VBA object model, so what kind of black magic trick do we need to leverage to pull that off?  As it turns out, it’s very simple… almost too simple in fact.

A Simple Query

Let’s just grab the sample data file from my post on pulling Excel named ranges into Power Query.  Once we’ve done that:

  • Click in the blue table
  • Go to Power Query –> From Table
  • Let’s sort Animal ascending (just so we know something happened)
  • Next save and Exit the query

At this point, we should get a new “Sheet2” worksheet, with our table on it:

SNAGHTML34cf68

 The Required VBA Code

Next, we need to build our VBA for refreshing the table.  Rather than record and tweak a macro, I’m just going to give you the code that will update all Query Tables in the entire workbook in one shot.  But to use it, you need to know the secret handshake:

  • Press Alt + F11

This will open the Visual Basic editor for you.  If you don’t see a folder tree at the left, then press CTRL+R to make it show up.

  • Find your project in the list (It should be called “”VBA Project (Selecting Data.xlsx)”
  • Right click that name and choose “Insert Module”
  • In the window that pops up, paste in the following code:

Public Sub UpdatePowerQueries()
‘ Macro to update my Power Query script(s)

Dim cn As WorkbookConnection

For Each cn In ThisWorkbook.Connections
If Left(cn, 13) = “Power Query -” Then cn.Refresh
Next cn
End Sub

Now, I’ll admit that I find this a little looser than I generally like.  By default, all Power Query scripts create a new connection with the name “Power Query –“ the name of your query.  I’d prefer to check the type of query, but this will work.

Speaking of working, let’s prove it…  But first, close the Visual Basic Editor.

Proving The Refresh Works

The easiest way to do this is to go back to the table on Sheet 1 and add a new row to the table.  I’m going to do that first, then I’m going to:

  • Press Alt + F8
  • Choose “UpdatePowerQueries”
  • Click Run
  • Go back to Sheet2 to verify it’s updated

If all goes well,  you should now have another row of data in your table, as I do:

image

Adding Polish

Let’s face it, that’s probably harder than going to Data –> Refresh All.  The goal here was to make it easier for my users.  So let’s do that now.

  • Return to Sheet 1
  • Go to the Developer Tab (if you don’t see it, right click the ribbon, choose “Customize Ribbon” and check the box next to the Developer tab to expose it)
  • Click Insert and select the button in the top left

image

  • Left click and drag a button onto your worksheet

When you let go, you’ll be prompted to assign a macro.

  • Choose “UpdatePowerQueries” and click OK
  • While the dots are still on the corners, click in the text
  • Backspace it out and replace it with something helpful like “Update Queries” (if you click elsewhere, you’ll need to right click the button to get the selection handles back.)
  • Click in the worksheet to de-select the button

SNAGHTML57873a

That’s it.  Test it again by adding some more data to the table then clicking the button.

Ramifications

I write a lot of VBA to make my users lives easier, and generally use this kind of technique as part of a bigger goal.  But regardless, it can still be useful as a stand alone routine if you want to avoid having to train users on how to do things through the ribbon.

Create Dynamic Table Headers With Power Query

In my last post, I looked at how to pull a named range into Power Query and turn it into a table of data.  Today we’re going to look at how to create dynamic table headers with Power Query, but using a slightly different, slightly more complicated way to do the same task.  Why?  Well, the reality is that sometimes the simple method won’t work for us.

Background

In this scenario, we’re going to use the same file I used in my last post, we’re just going to build the output differently.  The key that I’m after here is that I want to pull the data range into Power Query, but I want to use a table.  The issue, however, as I mentioned in my last post, is:

If I set up the table with headers in row 3, it would convert the dates to hard numbers, thereby blowing apart the ability to easily update the dynamic column headers next year.  That would defeat the purpose of making the input sheet dynamic in the first place.

I’ve really struggled with this feature in Tables, where it converts your headers to hard values.  So many of the tabular setups I create use dynamic headers, it’s actually more rare that they don’t.  So how do we work around this?  How do we create dynamic table headers in Excel?

Setting Up The Table For Success

It’s actually a LOT easier than you might think.  I use this very simple trick with both PivotTables and regular Tables, allowing me to take advantage of their power, but still control my headers and make them dynamic.  Here’s how:

Step 1: Add Static Headers Manually

  • Insert 2 rows below the current dynamic date headers in Row 3
  • Put some static headers in Row 5 that are generic but descriptive.  (In this case, CYM# means “Current Year, Month #”)

SNAGHTML1a520e81 The important part here is to make sure these static headers are each unique so that you can “unwind” them later with Power Query.

Step 2: Create The Table

Next, we need to create the table.  It’s going to cover A5:N24, and will therefore inherit the CYM column headers.  Since they are static values, it won’t make any changes to them, and my dynamic dates are still showing up top.

Step 3: Build a Translation Table

Huh?  A what?  Bear with me, as this will come clear a bit later.  Here’s how we do it:

  • Enter the following in B4:  =B3
  • Copy this across to cover B4:N4
  • Make sure B4:M4 is selected and go to Home—>Editing—>Find & Select –> Replace
  • Using the dialog, make the following two replacements:
    • Replace = with =$
    • Replace 3 with $3

This has the effect of making the formulas all absolute, which is important for our next step.

  • Select B4:N5 –> Right Click –> Copy
  • Select a cell down below your data range somewhere (I used A34)
  • Right click the cell and choose Paste Special
  • In the Paste Special options, choose the following:
    • Paste:  All
    • Check the Transpose box

Very cool, we’ve now got the beginnings of a table that is linked to the formulas in row 3.  We just need to finish it.

  • Add the column header of “Date” in A33
  • Add the column header of “Period” in B33
  • Format the data range from A33:B46 as a table
  • Rename the table to “DateTranslation”

SNAGHTML1a6575f9

Step 4: Final Header Cleanup

Row 4 has now served it’s purpose for us, so you can delete that, and then hide (the new) row 4.  The end result is that we have a header on our table that looks like it’s part of the table, but isn’t really.  The benefits here are that we can preserve the dynamic nature of it, and we have a wider variety of formatting options. SNAGHTML1a67304b We also have a completely separate date translation table too…

Setting Up The Power Query Scripts

So now we need to get the data into Power Query.  Let’s do that.

Step 1: Import and Reformat the Rounds Table

To do this we will:

  • Click somewhere in the main table
  • Power Query –> From Table
  • Remove the “TOTAL” column
  • Filter the first column to remove text that begins with “Total” and values that equal null
  • Filter the second column to remove null values
  • Right click the first column and Un-Pivot Other Columns
  • Rename the “Month” column to “Round Type”
  • Rename the query to “Budget”

At this point, you should have the following: SNAGHTML1afa967c This is great, but what about the Attribute column.  This is the whole big pain here about using a table, in that we don’t have dates.  Yes we could have hard coded them, but then it would be very painful to update our query when our year changes.  So while we have something flexible (in a way) here, it isn’t really all that readable.  How can we change that? Save and close the query, and let’s deal with this.

Step 2: Create Another Power Query

Let’s add the other table we built:

  • Click inside the DateTranslation table we created
  • Go to Power Query –> From Table
  • Click Close & Load –> Load To…
  • Click Only Create Connection –> Load

This will create a new Power Query that basically just reads your original table on demand.  It won’t refresh unless it’s called from another source.  Now we’re set to do something interesting.

Step 3: Re-Open The Budget Power Query

If the Workbook Queries pane isn’t open on the right, then go to Power Query –> Workbook Queries to show it.

  • Right click the Budget query and click Edit
  • On the Power Query Home tab, click Merge Queries (in the Combine group)
  • Select the Attribute column
  • From the drop down box, choose Date Translation
  • Select the Period column
  • Make sure “Only Include Matching Rows” is checked and click OK

image At this point you’ll get a new column of data in your Query.  Click the icon in the top right of the column to expand it: image Excellent… now we only need to pick the column we need, which is Date.  So uncheck the Period column and click OK. Finally we can remove the Attribute column and rename the “NewColumn.Date” column to Date and we’ve got a pretty clean query: SNAGHTML1b05acce At this point we could call it a day, as we’ve pretty much accomplished the original goal.  I can update the Year cell in B1 and my Table’s “Headers” will update.  In addition, my Power Query will show the correct values for the dates as well.  Pretty cool, as I could now link this into Power Pivot and set a relationship against a calendar table without having to worry about how it would be updated.

Going One Level Deeper

One thing I don’t like about this setup is the need for the extra query.  That just seems messy, and I’d prefer to see just one query in my workbook.  The issue though, is that I’m pulling data from two tables.  What’s cool though, is that with a little editing of the M code, I can fix that. Here’s the M for the query I’ve built, with a key line highlighted (twice): image As you can see, the “Merge” line is coloured yellow, but the name of the Query being merged is in orange.  Well guess what, we don’t need to reach to an external query here, we can reach to another named step in M.  Try this: Immediately after the “Let” line, enter the following:

TranslationTable = Excel.CurrentWorkbook(){[Name="DateTranslation"]}[Content],

Now, modify the “Merge” line to update the name of the table from “DateTranslation” to “TranslationTable”.  (The reason we’re doing this is that the original query still exists, so we can’t just name the first step “DateTranslation”, as it will conflict. Once we’ve made our modifications, the script will look as follows: image When you click “Done”, the query will reload and you’ll see an extra step in the “Applied Steps” box on the right.  What you won’t see though, are any changes, as the data comes out the same.  Very cool, as we are now referencing both tables in a single step.  To prove this out, save this query, drop back to Excel and delete the “DateTranslation” query.  It will still work! (The completed file can be downloaded here.)

Ending Thoughts

I really like this technique.  It let’s me dynamically change the column names, yet still use those to link them into my data model tables.  But even more I like the ability that, with a minor edit to the M code, I can keep my workbook from being littered with extra queries.  :)

Using Non-Contiguous Data Ranges in Power Query

One of the methods we use when building business solutions is creating specific input sheets for our models, as it separates our data from our business logic and ultimately our reporting layers.  This strategy is a key piece of building stable models, and is one of the fundamental things I teach when I’m leading modelling courses.  (For more information on having me at your site, click here.)  Since many data entry points consist of non-continguous ranges in Excel, it make sense to look at how using non-contiguous data ranges in Power Query can help us in our modelling.

To that end, in today’s blog post, I’m going to look at a technique to take a standard data entry worksheet and turn it into a data source, which can then be linked in to the model (or just used as the basis of reports.)  This is ideal, as we then only have one place to update our data for our solution.

Background:

What we’re going to start with is this:

image

A few key things you might want to know are:

  • You can download the sample file from this link if you want to follow along.
  • This is hypothetical budget data for golf rounds.
  • All the data entry cells are light green (I tell my users “Green means go” and they are now conditioned to stay away from anything that isn’t that light green colour.
  • The year in cell B1, when changed, will update all of the headers on the input sheet to the current year

Now, what I’d like to do is un-pivot this data into a useable table.  If I could do that, then I’d be able to use it in any of many methods, such as PivotTable, PivotCharts, charts or VLOOKUP solutions.

The initial setup

Your first temptation might be to layer a table over the data.  I’ve got a few reasons why I don’t want to do that:

  1. If I did layer a table over my input form it would lose the intuitive “enter data here” setup that I worked so hard to create.  Users would end up with a banded table that didn’t indicate where data should go.  That’s not ideal.
  2. If I set up the table with headers in row 3, it would convert the dates to hard numbers, thereby blowing apart the ability to easily update the dynamic column headers next year.  That would defeat the purpose of making the input sheet dynamic in the first place.
  3. If I set up the table with headers above that, I’d end up with a bunch of ugly Column1, Column2 headers.  I suppose I could hide them, but again, my table would blow apart my formatting as mentioned in 1.

So no… that’s not what I’m after.  So now what…?

There’s actually a few different ways to handle this.  I’ll look at some others in later blog posts, but for now let’s piggyback on the technique about using named ranges that I covered in my last post.

Creating the Named Range

The first thing I’m going to do is create a named range (not a Table) to cover the entire range that holds my budget data: A3:N22.  That includes the headers, all blank rows ,and all data right up to the totals on the right and bottom.  The reason for this is that, should I end up adding any new categories later by inserting rows or columns, it’s fairly likely that I’ll be doing it before the totals, so they’ll get picked up.  (It’s not quite the auto-expansion feature of a table, but it’s the best I’m going to get with a named range.)

So I created my named range and give it a sensible name like rngBudgetData.

Pulling the data into Power Query

Now let’s look at the easiest way to get our named range into Power Query.

If we click a single cell in our data range, then told Power Query to get the data from a Table, we’d be given a range for that expanded to cover only the contiguous block of data.  Because we’ve got blank rows in our data, that wouldn’t really work for us.  For example, assume we click B3, then tell Power Query to pull data From Table.  We’d be given the range A12:N15, as shown below.

image

Alternately, we could create a blank query, then type =Excel.CurrentWorkbook() in the formula bar, and choose our table, as outlined in the last blog post:

image

Okay, so that WOULD work.  As it turns out though, there is still yet another way to do this:

  • From the Name drop down in Excel, select the rngBudgetData named range

SNAGHTML3169d992

  • A3:N22 will now be selected, so go to Power Query –> From Table

And look at that, we’re now in Power Query using our named range!

The secret to this is, in order for Power Query to pull in the named range, the entire named range must be selected.  If that happens, Excel will grab it as the Power Query source.  If you’re using named ranges that cover contiguous blocks of data, and you click somewhere in that block of data, then Power Query very well may grab your entire named range.  So long as the auto selection boundaries line up with your named range you should be good.  But in the case of my data, those blank rows blow that apart, which is why I selected it from the name drop-down first.

(It is worth noting that if a named range and a table’s boundaries match exactly, the Table will be used as the Power Query source, not the named range.)

Manipulating the data in Power Query – Issue 1

Upon pulling the data into Power Query, we end up with a table like this:

SNAGHTML2d2a36c

Naturally, the first thing we want to do is promote the first row to headers, so that we can get to un-pivoting the data.  But look what happens when we do:

SNAGHTML2d3d005

What the heck?  The first column was renamed to Month (from the first row of data), but the remaining didn’t change! Not only that, but we lost our date time stamp.  That’s not good.

So it appears that Power Query can’t promote a date/time into a column header.  We’d better remove that step and try another approach.

Removing Irrelevant Data – Cut 1

Before I get into dealing with the harder stuff, I’m going to do a quick bit of cleanup here.  I’ve got some bank rows of data in my table, so I’m going to knock those off first.  To do that I:

  • Filter Column1 and uncheck (null) values

I’ve also got some header rows, which I can identify by the null values in Column2-Column14.  I can knock those off as follows:

  • Filter Column2 and uncheck (null) values

Easy enough so far.  I also don’t need the Total column, so I’ll scroll all the way to the right and:

  • Right click Column14
  • Choose Remove

Now I need to get rid of all the rows that contain totals (as I can always re-create those with a Pivot Table:

  • Filter Column1
  • Text Filters –> Does Not Contain –> “Total “ –> OK

Now, you’ll notice that I used Total with a space.  This is just to make sure that I don’t accidentally remove a row that contains a word which has total in it – like “totally”.  It’s always safer to make your text pattern as exact as possible.

After doing all of the above, I’m left with this:

SNAGHTML2e492f9

So the data is all clean, but I still can’t promote my headers so I can un-pivot this data.  Now what?

Busting Out Transpose

Transpose is quickly becoming one of my favourite functions.  Let’s hit it now:

SNAGHTML2e6791e

Hmm… originally I was going to convert my dates to text, Transpose it back, promote the text to headers, then un-pivot it.  But I don’t think I even need to do that at all.  Try this:

  • Transform –> Use First Row as Headers
  • Select the Month column
  • Transform –> Unpivot Columns –> Unpivot Other Columns

Bingo!

SNAGHTML2e8de09

Now just for the final cleanup:

  • Select the Month column –> Transform –> Data Type –> Date  (You need to do this, or Excel will return date serial numbers to the table, not dates formatted as dates.)
  • Rename the Attribute column to “Round Type”
  • Rename the Value column to “Amount”

Finally we can choose to Close and Load the Power Query and it will turn it into a nice Excel table.

Some Thoughts

As I was developing this blog post I found about 10 different ways through this process to end up at the same goal.  This is actually one of the things I enjoy about Power Query is the creativity that you can employ coming out to the same end result.

One feature I do hope to see eventually is the ability to filter for text that “starts with” or “does not start with”.  That would allow me to make my text searches even tighter than the “contains” portion I used here.  In one version I actually wrote an if function to do that:

=if Text.Start([Column1],5)=”Total’ then “remove” else “keep”

The loss of the months when originally promoting the header row shocked me a bit.  I would have expected this to be automatically converted to text and those values used.  Alas that’s not the case, so we have to do a bit more work.  Certainly not the end of the world, but not what I’d expect to see.

At the end of the day though, it’s nice to know that there is a way to get useful non-Table data into Power Query and turn it into something useful.  :)

Pulling Excel Named Ranges into Power Query

The comments of my last post collected a tip that I thought it was worth exploring on pulling Excel named ranges into Power Query.

I made the claim that Excel MUST have the data in an official Excel table.  As LoganEatsWorld pointed out, that’s actually not true any more.  If you’d like to give this a go, you can download this workbook to follow along.

What’s in the file?

The file is very basic.  It simply contains one table, and one named range of data:

image

The table is highlighted in the blue table style, and bears the name “Stats”.  The named range is surrounded by the black outline, and is called “Breeds”.

Connecting to Excel Data:

The reason I never found this is that my method was always to go to the Power Query tab and click –> From Table.  That will work great to get the data out of a table, but it won’t work for the named range.  So let’s try this a different way…

  • Go to Power Query –> From Other Sources –> Blank Query
  • Click in the formula bar and type the following:

= Excel.CurrentWorkbook()

(Yes, it’s case sensitive… I’m starting to reconcile myself to the fact that it’s Power Query so I’m just going to have to get over it.)

What ends up happening is a bit of magic:

image

Interesting… we have two tables listed!  The first is our official table, the second is our named range.  Cool!

Let’s click in the blank space to the right of the green “Table” text in the Breeds row:

image

The preview pops up and, sure enough, that’s our named range data:

image

Working With The Data

All right, let’s click the green Table text and break open that named range:

image

One notable difference here (in fact really the only one), is that Power Query doesn’t automatically recognize the header row.  This is due to the fact that an Excel table actually has a named header row to promote, where a named range does not.  No big deal though, as we can easily deal with that:

  • Go to Transform –> Use First Row As Headers

At this point, we could save the table to the worksheet or data model, as we need.

Observations

So this is cool.  It’s awesome that we can get to named ranges, as I have a LOT of workbooks that use these, and there are occasions where I don’t want to convert them to official Excel tables.  Despite the fact that we can, however, you pretty much need a secret decoder ring to find it, and that’s not so good.

It would sure be nice if there was a more discoverable way to pull in a named range… but where?

Suggested Accessibility Option 1

When I look at Power Query’s “Get External Data” function, it seems logical to me that it should end up somewhere in that area.  Looking at the group:

image

I kind of like the ability that comes with the “From Table” feature which works from the table you’re in (if you are), and lets you create a table if you’re not inside a table when you click that button.  But I wonder if it would be better served as a SplitButton/menu/submenu structure that offered the following options:

  • From Table
    • Current Table
    • Other Table
      • List of other tables in the workbook
    • Named Range
      • List of named ranges in the workbook
    • Create New Table
    • Create New Named Range

Actually, there is another change I would make to that group, and that’s to move the “From Blank Query” out of the “From Other Sources”, and give it it’s own button.  (I create a LOT of queries from scratch now, and it’s just extra clicks in my way to do so.)

Suggested Accessibility Option 2

I’m not sure this is so much of an alternate as something additional I’d like to see, actually.  An “additional sources” button on the Home tab would be awesome.  If that had the ability to pull up all the existing tables or named ranges in the workbook, and add them to the Power Query script as a “Source2=…”

I think the implications of this would be two fold:

  1. It would allow you to add a data source after creating a blank query, and/or
  2. It would allow you to add additional data sources into the same query.

The latter is certainly something I do semi frequently, as I don’t want to have multiple Power Queries created that are then merged together.  I’ll add both sources manually in the same query then merge them.

At any rate, just some thoughts.  If you have any on the subject please feel free to leave them in the comments.  :)

Un-pivoting With Subcategories in Power Query

My last few posts have been relatively technical, so this time I figured I’d look at something practical.  I can’t believe it’s been almost a year since I blogged about Un-Pivoting data in Power Query, so it’s about time we looked at that again… but this time with a twist.  This time we’ll look at un-pivoting with subcategories in Power Query.

Background

The data we’re going to start with looks like a typical financial report.  Whether a restaurant or a shoe store, in the manager’s office you’re liable to come up with a report that looks something along the lines of this:

SNAGHTMLf92afc

Now for the challenge… someone decides they need an alternate view of this data.  So how do you quickly un-pivot this into a format that you can use for other things?

If you want to follow along, you can download the file from my OneDrive.

Issue 1 – Getting the Data into Power Query

The first issue we come across is that, while Power Query can consume data from inside an Excel file, it MUST be formatted as a table.  But this hardly looks like it’s conducive to a table format with all those blank rows and such.  But what the heck… let’s apply a table to it anyway, and see what happens.

  • Click anywhere
  • Choose Power Query –> From Table
  • Adjust the range to cover all of the data (A4:H17)
  • Uncheck the box that indicates your table has headers

When you’re done, the box should look as follows:

image

And when you say OK, you should be taken into Power Query.

If you take a quick peek back at Excel, you can see that the table has indeed been applied, and that there are generic column headers above each column:

SNAGHTML100f05e

This is also reflected in Power Query:

image

Data Cleanup

Before we get into the trick of how to deal with subcategory columns, let’s clean up some of the garbage here.  Ideally what we’d like to get here is a nice pure table that we can easily un-pivot, just like we did in the prior article.

Cleanup Step 1:

Looking at the first column, we’ve got a bunch of null values in there, as well as some section headers.  What we really need is those section headers repeated on the lines below them.  So let’s make that happen.

  • Select Column1
  • Go to Transform –> Fill –> Down

You’ll see that the section headers are filled into any of the null areas. As soon as they encounter data however, they stop. (I’ve drawn a box around the Revenue lines below – notice how they fill until they reach Total Revenues, which then fill until they reach Expenses, and so on.)

image

Cleanup Step 2:

Now, we don’t really need any of the rows that are showing null values in Column3 through Column 8.  Let’s filter Column3 to remove those.  Click the drop down on Column3 and uncheck (null).  The result looks like this:

image

So… why didn’t I filter the null values out of Column2?  After all, there are blank data rows, and with a PivotTable we can recreate the subtotals…  The answer is that I’m not ready to lose the first two rows yet.  I need those in order to un-pivot my data. ;)

Issue 2 – Un-Pivoting the Data

If things were lined up perfectly, we could just select Column3 through Column 8 and un-pivot it now.  Unfortunately, if we do we’ll get some really wonky results.  (Go ahead and try it if you like. Remember to click the x to the left of the “Unpivoted Columns” step once you confirm you’ve made a disaster of it!)

image

Preparing to Un-Pivot

Okay, so what do we need to do… well, the first thing we need to do is fill the first row (containing April and May) across the columns.  Here’s the rub though… there is no Fill—>Across feature.  So how do we do it?

Transposing the Table

To an accountant, transposition is an evil word that means you made a mistake and flipped two digits around.  It’s nasty and something we never look forward to.  But to Power Query it’s simply awesome.  Check this out…

  • Go to Transform—>Transpose

This instantly flips the columns to rows and rows to columns!

SNAGHTML1147243

And would you look at that… April and May are in Column1 and below them… null values!  We know what to do with those now!

  • Select Column1
  • Go to Transform –> Fill –> Down

Is this enough though?  Nope, sorry.  You might be tempted to “un-transform” and then un-pivot it, but you’d still end up with garbage.  We still need to do a bit more.

Concatenating the Category and Subcategory

This is the trick to un-pivoting with subcategories: you need to concatenate them first, then un-pivot, then split them up.  So let’s get to it.  Using the tip from Merging Columns with Power Query, let’s join up Column1 and Column2.

  • Select Column1
  • Hold down CTRL or SHFT and select Column2
  • Click Transform –> Merge Columns
  • Choose the Colon for the separator (or whichever you prefer)

Note:  If you don’t have the Merge Columns feature, you’re using an old version of Power Query. Either download the latest version, or you’ll need to manually join the columns by:

  • Adding a new column
  • Using the formula =[Column1]&”:”&[Column2]
  • Delete Column1 and Column2
  • Move the new column to the first position

Once you’ve got it done, the output should look as follows:

SNAGHTML120c673

Un-transposing the Table

Awesome… we’ve got concatenated headers now.  We just need to flip the table back right side up and we’re almost ready to un-pivot it:

  • Go to Transform –> Transpose

Final Preparations

The very last thing we need to do before we un-pivot our table is provide some decent headers.  This will ensure that the data will make sense when it is un-pivoted.  To that end:

  • Go to Transform –> Use First Row as Headers
  • Rename the first column to “Class”
  • Rename the second column to “Category”
  • Filter “Category” to remove the null values

Our table now looks nice and clean:

image

And we’re ready!

Un-Pivot It!

We now follow the steps of a regular un-pivot operation, with only a minor extra step:

  • Select the Class column
  • Hold down CTRL or SHFT
  • Select the Category column
  • Go to Transform –> Unpivot Columns –> Unpivot Other Columns
  • Rename the “Value” column to “Amount”

And now the extra step:

  • Select the “Attribute” column
  • Go to Transform –> Split Column –> By Delimiter –> Colon
  • Rename the Attribute.1 column to “Month”
  • Rename the Attribute.2 column to “Measure”

The results:

SNAGHTML12d5ffa

That’s pretty much it.  The last thing I’d do is change the Query name from Table1 to something more intelligible… maybe Data or something… then load it to the worksheet.

From a Static Report to a Data Source

Now that we’ve got our report reformatted into a data source, we can click anywhere in the table and pivot it to our heart’s content!

A Quick Recap

To be fair, this post has been pretty long, but only because I included a LOT of pictures and detailed instructions.  Once you’ve got the process nailed down, it doesn’t take all that long at all.  Remember, the big key to this is:

  • Suck your report into a table (without headers)
  • Fill any rows you can
  • Transpose the data
  • Concatenate your category and subcategory fields together
  • Un-transpose it
  • Un-pivot it
  • Split the category and subcategory back into their pieces

Enjoy!  :)

Power Query’s Round Function

The other day I asked one of my co-workers how many ways he knew of to round a number.  His answer was one… if it ends in .4 it rounds down and if it ends in .5 it rounds up.  My guess is that most people would answer along similar lines, but that’s NOT how Power Query’s Round function actually works…

Interestingly though, there are a bunch of different ways to round, depending on your needs, and Excel has a bunch of functions to support them: ROUND, ROUNDUP, ROUNDDOWN, FLOOR, CEILING, EVEN, ODD, TRUNC, INT and MROUND.

Power Query also has a bunch of rounding formulas as well but, given that the function names can be somewhat different in Power Query (as we first saw here), I thought it might be interesting to see how to convert each of those functions from Excel to Power Query’s structure.

Background Setup

To start with, I created a very simple structure: a two column table with some random values in the “Value” column and then rounded them to 2 decimals using the formula =ROUND([@Value],2)  The output, after feeding it through Power Query, looks like this:

SNAGHTML20a5291b

The blue table on the left is the Excel table, and the green table on the right is the Power Query output.  (There is a completed example file available here example file available here.)

Creating the Round function

I love the ROUND function in Excel.  I use it constantly – rounding everything that uses multiplication or division – and pretty much have it burned into muscle memory.  So to me this was a logical place to start with Power Query.  Naturally, the syntax is just a bit different from Excel though:

Syntax
Excel =ROUND(number,digits)
Power Query =Number.Round(value, digits, roundingMode)

Hmm… we know that the Power Query function will be case sensitive.  In addition, it has an extra parameter.  The valid options are:

  • RoundingMode.Up
  • RoundingMode.Down
  • RoundingMode.AwayFromZero
  • RoundingMode.TowardZero
  • RoundingMode.ToEven

Let’s see what we can do with this.

I open up my query, select Add Column, and put in the formula as shown below:

image

Pretty easy, just Number.Round([Value],2).  In fact, it’s so similar to Excel it’s awesome!

So I click OK, save the query, and have a look at my results.  And that’s when I notice something isn’t quite right.  I’ve added some conditional formatting to the table below so you can see it easily:

image

This is the default?

Notice all the numbers that don’t match?  Can you spot the pattern?  It’s the oddest damn thing I’ve ever seen, to be honest, and I’ve never heard of anyone rounding in this way.

The default “RoundingMode” for Power Query is “Round to Even”.  What that means is that if there is a tie in the numbers it will round up or down to the closest even number.  So in the case of 1.645, it will round down to 1.64 as that is closer than 1.66.  (1.64 is .05 away from 1.645, where 1.66 is 0.15 away from 1.645).

I find this deeply disturbing.  I personally think that every user would expect Excel and Power Query’s default rounding methods to line up exactly, and this doesn’t.  How serious is this?  I’m not sure.  I think I’ll let someone from the scientific community ponder that.

Using RoundingMode.Up

Since the default plainly doesn’t work for us, it looks like it’s time to figure out which of the additional parameters we need.  Let’s try adding RoundingMode.Up to see if that will fix it.

I open Power Query again, and added a new custom column with the following formula:

=Number.Round([Value],2,RoundingMode.Up)

And the results are as follows:

image

Um… uh oh.  It seems to work above 0, but below is another matter.  That –5.245 is rounding down, not up! (Yes, from a technical perspective I am aware you can argue the words I used, but you get the idea.)

Using RoundingMode.Down

Now I’d be surprised if this came up with numbers consistent with the Excel formula, but let’s just check it for good measure.  The formula is:

=Number.Round([Value],2,RoundingMode.Down)

And the results:

image

So now numbers greater than 0 get rounded down, where numbers less than 0 are being rounded up (away from zero).

Let’s try another:

Using RoundingMode.AwayFromZero

Here’s our next option:

=Number.Round([Value],2,RoundingMode.AwayFromZero)

And these results are pleasing!

image

Look at that… we finally found the one that works!

Using RoundingMode.TowardZero

We’ve only got one other option we haven’t explored, so we might as well use it too, just for the sake of completeness:

=Number.Round([Value],2,RoundingMode.TowardZero)

For some reason, I’m incapable of typing TowardZero the first time I type this.  I always type TowardsZero and end up with an error!  At any rate, the results:

image

Thoughts

As a tool that is built for the Excel audience, I am having some real difficultly accepting the default parameter for this function in Power Query.  I HOPE that this is a bug, and not a design choice, although the documentation would suggest it is the latter.  If that’s the case, I think it’s a HUGE mistake.

Excel’s ROUND formula defaults to round away from zero.  Power Pivot’s DAX ROUND formula defaults to round away from zero.  VBA’s Application.Round function defaults to round away from zero. (As pointed out by Rory Archibald on Twitter, VBA’s Round function – without the application. prefix – does use banker’s rounding though.)

In my impression, if the Power Query formula holds the same name (at least after the Number. portion) it should return the same results as the Excel function.  In fact, I would venture to say that virtually every Excel pro would expect this.

My bigger concern would be that, with one of Power Query’s big selling features being it’s ability to re-shape and process large volumes of data, how quickly will a user realize that the Rounding function they thought they had is NOT working the way they expected?  Not good news at all.

I’m curious to hear your impressions.  Please leave a comment!

Want to see if for yourself?

Download the example file with all the formulas already in place.

Merging Columns with Power Query

The August update for Power Query was finally made available on Sept 1, and it has some pretty cool stuff in it.  In this week’s segment I thought we’d cover off one of the features that I’m most excited about as an Excel Pro: Merging columns with Power Query.

The old way

It’s been possible to merge two or more columns together in the past, but you had to write a formula to do it.  Honestly, it wasn’t a huge deal, but it still took a bit of know-how and work.  Assume, for example, we had this:

SNAGHTML702fae4

And our goal is to concatenate the Account and Dept columns together with a hyphen between them.  Here’s what you had to do:

  • Insert a New Column (the steps for this varied depending on the version of Power Query you are running.  Currently it is Add Column –> Add Custom Column)
  • When the prompt pops up you had to provide a formula like shown below:

image

Okay, so not a huge deal.  Just =[Column1] & “-“ & [Column2]

But you still had to write it.  I’ve lost count of how many people to whom I’ve taught the simple & shortcut for Excel formulas, but it’s enough to say that it’s probably not intuitive.

So it worked, but could it become easier?  We now know the answer is Hell Yeah!

The new way

This time we’ll do it differently…

  • Select the Account column
  • Hold down CTRL (or SHFT) and select the Dept column
  • On the Add Column tab, click Merge Columns

image

  • Choose your separator.  The default is –None- (meaning it will just mash them together), but other pre-defined options include Comma, Colon, Equals Sign, Semicolon, Space, Tab
  • What I want (a minus sign) isn’t there, so I’m going to choose –Custom–

image

  • Now I’ll enter a – (minus) sign and click OK

image

And that’s it!  My output comes together nicely:

SNAGHTML71194af

Now, to be fair, I still have to rename the column.  I do wish this interface had a way to name the column in advance (like exists when you create a custom column.)  Hopefully the PQ team will retrofit us with that ability at some point in the future.

In the mean time, we can either right click the column header and rename it there, or we can edit the column directly in the formula bar.  Just change the highlighted part shown below:

image

Like this:

image

So honestly, it’s not that much more efficient, why do I think this is cool?  Well, it’s not that much more efficient with 2 columns.  But try 4.  Or when you just need to put 4 columns back together with no spaces in between.  Then it starts to make life much easier.

Multi Condition Logic In Power Query

In my last post, we looked at creating an IF statement using Power Query.  This time we’re going to go a bit deeper and look at multi condition logic in Power Query, exploring a scenario where we need to choose between several options.

The base scenario

In the last post I talked about my billing code setup.  As a reminder, it’s a 10 digit code that is constructed as follows:

  • Char 1 – Alpha – Indicates the division (G = Golf, F = Fitness, M = Marina)
  • Char 2 – Alpha – Indicates the billing type (D = Dues, S = Pass, A = Annual Moorage, P = Periodic Moorage)
  • Char 3-4 – Numeric – Indicates the number of months of coverage for the product (1-12)
  • Char 5-6 – Numeric – Indicates the start month (and subsequent anniversary) for the customer’s product
  • Char 7-8 – Variable – Slip length (in feet) for a boat in the case of marina customers, or SG, CP or CS for golf (indicating single, couple primary or couple spouse)
  • Char 9 – Text – A variety of single letter codes indicating specific things we want to know. (Outlined below)
  • Char 10 – Text – Indicates the payment method (F = Financed, P = Paid up front, C = Comp/Honorary)

(Note that the sample data only includes records for Marina data)

Sample file

If you’d like to follow along, download the sample file here.

Multi condition logic – Using IF

So, building on my previous two posts (using text functions and creating IF statements), we could easily break the first character into pieces by nesting 2 IF tests together:

=if Text.Start([BillingCode],1)=”G” then “Golf” else “two options left”

=if Text.Start([BillingCode],1)=”F” then “Fitness” else “Marina”

into

if Text.Start([BillingCode],1)=”G” then “Golf” else if Text.Start([BillingCode],1)=”F” then “Fitness” else “Marina”

Not too hard really.  In fact, we can even build each IF statement separately, then just copy the second to replace the “two options left” part without making any other changes at all.  No parentheses or anything needed.

More than 3 options

But what if you have a whole bunch of options that you need to work with?  Let’s look at the 9th character in our billing code.  I haven’t given the details yet for that one, but here are the options:

E = Employee, S = Yacht Club, N = Non-Taxable, R = Restricted, I = Inactive, L = Social, M = Medical, U = Regular

Wow.  That’s a whole lot of possibilities, and would make for one monster nested IF statement.  That wouldn’t be a lot of fun to write, nor maintain.  So how would we deal with it?

In Excel proper, we would probably separate these options into a separate table, then use VLOOKUP to pull the appropriate value into the table.  So we just need a Power Query VLOOKUP function… except there isn’t one.

We actually have a couple of different methods to deal with this.  We could either:

  1. Split the first character into one column, create an Excel table with the first letter in column 1 and the appropriate match in column 2, then merge the two using Power Query’s merge function. (Maybe I’ll write a future post on it.)
  2. Build our own CHOOSE function inside Power Query (or SWITCH if you prefer Power Pivot’s DAX version.)  This is way more fun, so let’s do that.  :)

Building a CHOOSE function

This actually isn’t too hard once you know the basic structure.  It basically goes like this:

function_name = (input) => let
values = {
{result_1, return_value_1},
{input, “Undefined”}
},
Result = List.First(List.Select(values, each _{0}=input)){1}
in
Result,

The key parts to recognize here are:

  • We can change the “function_name” part to be whatever we want/need,
  • result_1 is the first of the possibilities that we may pass TO the function
  • return_value_1 is the value that we’d like to return if the first value is result_1
  • if we need more values, we just insert another comma after the value_1 section and put in a value_2 section
  • we can keep adding as many values as we need.
  • the “Undefined” value will return the text “Undefined” if the value you pass isn’t in your provided list of options (it’s the Else portion of the CHOOSE statement)

Using this structure, we could write a CHOOSE function for our scenario as follows:

fnChoose_CustCode = (input) => let
values = {
{“E”, “Employee”},
{“S”, “SCYC”},
{“N”, “Non-Taxable”},
{“R”, “Restricted”},
{“I”, “Inactive”},
{“L”, “Social”},
{“M”, “Medical”},blah
{“U”, “Regular”},
{input, “Undefined”}
},
Result = List.First(List.Select(values, each _{0}=input)){1}
in
Result,

Notice that I changed a couple of things:

  1. I gave the function a name so that I can recognize it, and also so that I can create more than one function with different names.  This one is fnChoose_CustCode.
  2. I created a list of all the options I needed.

Implementing the CHOOSE function

Okay, so now we have this, how do we use it?  Again, we’ve got two options.  I’ll focus on the other option at some other time, but for this scenario I want to build this right into an existing query.  So here’s how I do it.

First I created a brand new query that just pulls my table into Power Query, resulting in the following:

SNAGHTML76c03ee

Let’s go and inspect the code that pulls this in.  We need to click View –> Advanced Editor.  That will bring up the following code:

let
Source = Excel.CurrentWorkbook(){[Name="Customers"]}[Content]
in
Source

Not too complicated (yet).  Let’s paste in our code just before the Source = line:

let

fnChoose_CustCode = (input) => let
values = {
{“E”, “Employee”},
{“S”, “SCYC”},
{“N”, “Non-Taxable”},
{“R”, “Restricted”},
{“I”, “Inactive”},
{“L”, “Social”},
{“M”, “Medical”},blah
{“U”, “Regular”},
{input, “Undefined”}
},
Result = List.First(List.Select(values, each _{0}=input)){1}
in
Result,

Source = Excel.CurrentWorkbook(){[Name="Customers"]}[Content]
in
Source

Perfect.  And yet it doesn’t exactly look like much.  In fact, beyond adding a new line in the Steps section of the Editor, we don’t see any changes:

image

So what good did that do then?

As it turns out, we’ve only pasted in our function to make it available to the Power Query engine.  We haven’t actually told Power Query to do anything with it.  So why don’t we do that now?

Using our new CHOOSE function

You’re going to be amazed how easy this is…

First we’re going to add a new column (Add Column –> Add Custom Column).  When the dialog pops up, we’ll create a formula to return the letter we want to pass to the function:

=Text.Range([BillingCode],8,1)

And that gives us the following result (assuming we provided the column name of Status):

SNAGHTML29d7316

Cool stuff.  Now, let’s amp this up and use our function.  We’ll click the gear next to the Status step and wrap the existing formula with our function call.  (Don’t forget the extra  parenthesis needed at the end):

=fnChoose_CustCode(Text.Range([BillingCode],8,1))

Which gives us the following:

SNAGHTML2a06cbf

The end effect

You’ll find that all the sample codes in the data work just fine, and that nothing comes back as undefined.  If you’d like to see how the query reacts to different items, go back to the Customers table and try changing the second to last letter to something else.  When you refresh the table, you’ll find that it will evaluate the new character and return the appropriate result.

Caveat

It should be noted that the function as written above is case sensitive, meaning that a code of MP010450uP would return “Undefined”.  This is expected in my case, as valid codes are made up of upper case letters.

If I wanted to accept either case I would need to modify my Text.Range function to force it to upper case.  This would result in a function call that reads as follows:

=fnChoose_CustCode(Text.Upper(Text.Range([BillingCode],8,1)))

Which would work, as you can see here:

SNAGHTML2be7f6a

The IF Function In Power Query

In my last post I talked about useful text functions, and how they differed between Excel and Power Query.  Today we’re going to look at another compare/contrast scenario, but this time it’s going to be the IF function in Power Query.

Critical background

The only important thing we need to remember here is that all functions in Power Query, whether text, logic or anything else, are case sensitive.  That may strike you as weird in this one, but we need to remember that “if” is not the same as “IF”, and that Power Query will gag on the latter.

The base scenario

For this example I’m going to work with a table of data that holds a customer number, a boat type and a billing code schema.  While the data has been scrambled, this represents a real structure that we use in my day job.

There’s no real mystery to the first two items, but the billing code schema we designed holds a ton of info.  It’s always 10 characters long, and breaks down like this:

  • Char 1 – Alpha – Indicates the division (G = Golf, F = Fitness, M = Marina)
  • Char 2 – Alpha – Indicates the billing type (D = Dues, P = Pass, A = Annual Moorage, P = Periodic Moorage)
  • Char 3-4 – Numeric – Indicates the number of months of coverage for the product (1-12)
  • Char 5-6 – Numeric – Indicates the start month (and subsequent anniversary) for the customer’s product
  • Char 7-8 – Variable – Slip length (in feet) for a boat in the case of marina customers, or SG, CP or CS for golf (indicating single, couple primary or couple spouse)
  • Char 9 – Text – A variety of single letter codes indicating specific things we want to know. (Will factor in to a future post.)
  • Char 10 – Text – Indicates the payment method (F = Financed, P = Paid up front, C = Comp/Honorary)

So a table of customer data could look like this:

SNAGHTML153bc957

Turning data into more useful data

So great, we’ve got this awesome billing code schema, but it doesn’t really tell me anything when I look at it, as it’s too complicated to read.  I really need to break this into separate pieces, and make useful and readable columns out of it.  So that’s what I’m going to start doing now.

The first step is, of course, to click in the table and go to Power Query –> From Table.

My goal here is to make a column that says “Annual” if the second character is an “A”, or “Periodic” if the second character is a “P”.  To start, I’m just going to reach back to last week’s article and make sure I can identify which character I’m looking at.  So first I’ll click “Add Custom Column”.

I’ll call my new column “Seasonality”, and use a formula to extract just the 2nd character:

SNAGHTML16328557

And with that in place we can now focus in on the important data here:

SNAGHTML163368c2

Writing IF functions in Power Query

Assuming the data was in a table that started in row 2 of the worksheet, either of the following formulas would work to convert “A” to “Annual” or “P” to “Periodic”:

=IF([@Seasonality]=”A”,”Annual”,”Periodic”)
=IF(D2=”A”,”Annual”,”Periodic”)

Easy enough, right?  But look at how the signatures differ from Excel to Power Query:

Excel =IF(test, value_if_true, value_if_false)
Power Query =if test then value_if_true else value_if_false

Notice that there are no parenthesis or commas in the Power Query version, but you need to actually type out the “then” and “else” portions.  So to create the same thing in Power Query, we’d need a new column that uses the formula:

=if [Seasonality]=”A” then “Annual” else “Periodic”

Or, as is my preference, we modify the Seasonality column we already built, wrapping the text extraction with the IF function as follows:

=if Text.Range([BillingCode],1,1)=”A” then “Annual” else “Periodic”

Once we modify the original formula, our table now correctly shows the different values all the way down:

SNAGHTML1642027d[5]

Observations

Once again, I find this a bit of a departure from regular Excel formulas.  Although it’s not hard to make the transition once you understand it, it would still be nice if the language could leverage the skill set we’ve worked so hard to master.  You could argue that the verboseness of the Power Query IF function is easier to read, but it’s still inconsistent with the formulas we know and love.

I still feel it would be nice if we could have an alternate pointer into the same function so that I could type this in Power Query too:

=IF([Seasonality]=”A”,”Annual”,”Periodic”)

I think that would just make it so much easier to get off the ground running for Excel pros.

I’ll also point out that the error message Power Query gives you when you create an IF function or formula is not exactly helpful:

image

Most Excel pros aren’t going to understand what “Token Eof expected.” means, and I really have to question how it is telling me anything that I need to do to fix the formula.  Hopefully, in future versions of Power Query we get a more helpful message that says something like “It looks like you typed an upper case formula name.  Can I fix that for you?” (Maybe that will come with Intellisense and auto-complete…)

Taking this further

Next blog post we’ll look at how to take this a bit further… extending our conditional logic to look up a corresponding value in a list, avoiding having to nest several IF functions within each other.