Combine Multiple Worksheets Using Power Query

In last week’s post we looked at how to combine multiple files together using Power Query.  This week we’re going to stay within the same workbook, and combine multiple worksheets using Power Query.

Background Scenario

Let’s consider a case where the user has been creating a transactional history in an Excel file.  It is all structured as per the image below, but resides across multiple worksheets; one for each month:

image

As you can see, they’ve carefully named each sheet with the month and year.  But unfortunately, they haven’t formatted any of the data using Excel tables.

Now the file lands in our hands (you can download a copy here if you’d like to follow along,) and we’d like to turn this into one consolidated table so that we can do some analysis on it.

Accessing Worksheets in Power Query

Naturally we’re going to reach to Power Query to do this, but how do we get started?  We could just go and format the data on each worksheet as a table, but what if there were hundreds?  That would take way too much work!

But so far we’ve only seen how to pull Tables, Named Ranges or files into Power Query.  How do we get at the worksheets?

Basically, we’re going to start with two lines of code:

  • Go to Power Query –> From Other Sources –> Blank Query
  • View –> Advanced Editor

You’ll now see the following blank query:

let
Source = “”
in
Source

What we need to do is replace the second line (Source = “”) with the following two lines of code:

FullFilePath = “D:\Temp\Combine Worksheets.xlsx”,
Source = Excel.Workbook(File.Contents(FullFilePath))

Of course, you’ll want to update the path to the full file path for where the file is saved on your system.

Once you click Done, you should see the following:

image

Cool!  We’ve got a list of all the worksheets in the file!

Consolidating the Worksheets

The next step is to prep the fields we want to preserve as we combine the worksheets.  Obviously the Name and Item columns are redundant, so let’s do a bit of cleanup here.

  • Remove the Kind column
  • Select the Name column –> Transform –> Data Type –> Date
  • Select the Name column –> Transform –> Date –> Month –> End of Month
  • Rename the Name column to “Date”

At this point, the query should look like so:

image

Next we’ll click the little double headed arrow to the top right of the data column to expand our records, and commit to expanding all the columns offered:

SNAGHTML52dcfe5

Hmm… well that’s a bit irritating.  It looks like we’re going to need to promote the top row to headers, but that means we’re going to overwrite the Date column header in column 1.  Oh well, nothing to be done about it now, so:

  • Transform –> Use First Row As Headers –> Use First Row As Headers
  • Rename Column1 (the header won’t accept 1/31/2008 as a column name) to “Date” again
  • Rename the Jan 2008 column (far right) to “Original Worksheet”

Final Cleanup

We’re almost done, but let’s just do a bit of final cleanup here.  As we set the data types correctly, let’s also make sure that we remove any errors that might come up from invalid data types.

  • Select the Date column
  • Home –> Remove Errors
  • Set Account and Dept to Text
  • Set Amount to Decimal Number
  • Select the Amount column
  • Home –> Remove Errors
  • Set Original Worksheet to Text

Rename the query to “Consolidated”, and load it to a worksheet.

Something Odd

Before you do anything else, Save the File.

To be fair, our query has enough safe guards in it that we don’t actually have to do this, but I always like to play it safe.  Let’s review the completed query…

Edit the Consolidated query, and step into the Source line step.  Check out that preview pane:

image

Interesting… two more objects!  This makes sense, as we created a new table and worksheet when we retrieved this into a worksheet.  We need to filter those out.

Getting rid of the table is easy:

  • Select the drop down arrow on the Kind column
  • Uncheck “Table”, then confirm when asked if you’d like to insert a step

Select the next couple of steps as well, and take a look at the output as you do.

Aha!  When you hit the “ChangedType” step, something useful happens… we generate an error:

image

Let’s remove that error from the Name column.

  • Select the Name column –> Home –> Remove Errors

And we’re done.  We’ve managed to successfully combine all the data worksheets in our file into one big table!

Some Thoughts

This method creates a bit of a loop in that I’m essentially having to reach outside Excel to open a copy of the workbook to pull the sheet listing in.  And it causes issues for us, since Power Query only reads from the last save point of the external file we’re connecting to (in this case this very workbook.)  I’d way rather have an Excel.CurrentWorkbook() style method to read from inside the file, but unfortunately that method won’t let you read your worksheets.

It would also be super handy to have an Excel.CurrentWorkbookPath() method.  Hard coding the path here is a real challenge if you move the file.  I’ve asked Microsoft for this, but if you think it is a good idea as well, please leave a comment on the post.  (They’ll only count one vote from me, but they’ll count yours if you leave it here!)

Merge Multiple Files With Properties

This post illustrates a cool technique that I learned at the MVP summit last week, allowing us to use Power Query to merge multiple files with properties from the file in the output.  A specific example of where this is useful is where you have several files with transactional data, saved with the month as the file name, but no date records in the file itself.  What we’d want to do is merge all the contents, but also inject the filename into the records as well.

The funny thing about this technique is that it’s eluded me for a long time, mainly because I way over thought the methods needed to pull it off.  Once you know how, it’s actually ridiculously simple, and gives us some huge flexibility to do other things.  Let’s take a look at how it works.

If you’d like to download the files I’m using for this example, you can get them here.  You’ll find that there are 3 files in total: Jan 2008.csv, Feb 2008.csv and Mar 2008.csv.

Step 1: Create your query for one file

The first thing we need to do is connect to the Jan 2008.csv file and pull in it’s contents.  So let’s do that:

  • Power Query –> From File –> From CSV
  • Browse to the Jan 2008.csv file and import it
  • Rename the “Sum of Amount” column to “Amount”

Perfect, we now have a basic query:

SNAGHTML63ab03

Notice here how the January file has no dates in it?  That’s not good, so that’s what we’re trying to fix.

Step 2: Turn the query into a function

At this point we need to do a little code modification.  Let’s go into the Advanced editor:

  • View –> Advanced Editor

We need to do two things to the M code here:

  • Insert a line at the top that reads:  (filepath) =>
  • Replace the file path in the Source step (including the quotes) with:  filepath

At that point our M will read as follows:

image

We can now:

  • Click Done
  • Rename our Query to something like fnGetFileContents
  • Save and close the query

Power Query doesn’t do a lot for us yet, just giving us this in the Workbook Queries pane:

image

Step 3: List all files in the folder

Now we’re ready to make something happen.  Let’s create a new query…

  • Power Query –> From File –> From Folder
  • Browse to the folder that holds the CSV files
  • Remove all columns except the Name and Folder Path

Wait, what?  I removed the column with the binary content?  The column that holds the details of the files?  You bet I did!  You should now have a nice, concise list of files like this:

image

Next, we need to add a column to pull in our content, via our function.  So go to:

  • Add Column –> Add Custom Column
  • Enter the following formula:  =fnGetFileContents([Folder Path]&[Name])

Remember it is case sensitive, but when you get it right, some magic happens.  We get a bunch of “Table” objects in our new column… and those Table objects hold the contents of the files!

I know you’re eager to expand them, but let’s finish prepping the rest of the data first.

Step 4: Prep the rest of the data you want on the table rows

Ultimately, what I want to do here is convert the file name into the last day of the month.  In addition, I don’t need the Folder Path any more. So let’s take care of business here:

  • Remove the “Folder Path” column
  • Select the “Name” column –> Transform –> Replace Values –> “.csv” with nothing
  • Select the “Name” column –> Transform –> Date Type –> Date
  • Select the “Name” column –> Transform –> Date –> Month –> End of Month

And we’ve now got a pretty table with our dates all ready to go:

image

Step 5: Expand the table

The cool thing here is that, when we expand the table, each row of the table will inherit the appropriate value in the first column.  (So all rows of the table in row 1 will inherit 2/29/2008 as their date.)

  • Click the little icon to the top right of the Custom column
  • Click OK (leaving the default of expanding all columns)
  • Rename each of the resulting columns to remove the Custom. prefix

And that’s it!  You can save it and close it, and it’s good to go.

A little bit of thanks

I want to throw a shout out to Miguel Llopis and Faisal Mohamood from the Power Query team for demonstrating this technique for the MVP’s at the summit.  I’ve been chasing this for months, and for some reason tried to make it way more complicated than it needs to be.

What’s the next step?

The next logical step is to extend this to working with Excel files, consolidating multiple Excel files together; something we can’t do through the UI right now.  Watch this space, as that technique is coming soon!

Name Columns During a Merge

The timing for the release of the October Power Query update couldn’t really have been much better for me, especially since it’s got a cool new feature in that you can now name columns during a merge.  Why is this timing so cool?

The reason is that I’m at the MVP Summit in Redmond this week, an event where the MVP’s and Microsoft product teams get together to discuss the things that are/aren’t working in the products, and discuss potential new features.  The tie in here is that I already blogged about the new method for Merging Columns With Power Query that came into effect with the August update and, in that blog post, I said:

I do wish this interface had a way to name the column in advance (like exists when you create a custom column.)  Hopefully the PQ team will retrofit us with that ability at some point in the future.

The October Power Query update (released Oct 27, 2014), includes the ability to name columns during a merge, rather than making you do this in two steps.  How cool is that?

image

While I’m under no illusions that this change was made based on my feedback alone, I do know that the Power Query team reads my blog, so it played a part.  This is one of the big reasons I go to the summit every year, to share ideas that make the product more intuitive/usable, so it’s cool to see one of those changes make it into the product a few days before I arrive.

By the time this post is published on Wednesday morning, we’ll already be into our final day of the 2014 Summit.  I’m going in pretty jazzed though, as I know that the product teams are listening and reacting to our feedback.  :)

Tame Power Query Workbook Privacy Settings

I recently built a cool Excel solution at work that uses Power Query to reach out and grab some weather data, then upload it into database.  We use that data in our dashboards, as weather is pretty important to the golf industry in which I work.  But then I went to deploy the file, and needed to find a way to tame the Power Query Workbook Privacy settings.

The Issue

What happens is, every time a user runs a Power Query that was last saved by another user, they are prompted to set the Workbook’s Privacy level.  This is maddening, as we have two staff that use this workbook, where one covers for the other while they’re away.  Naturally, long lapses of time can occur in between… just long enough to forget what to do when you’re prompted by this frustrating message:

image

So while I can (and have) set the privacy level for the web data they are going to retrieve (see my tool here), I have no way to permanently set the Workbook’s Privacy level.  Worse is that, if the user clicks Cancel, or chooses the wrong privacy level, (even trying to protect the output table structure using Chris Webb’s technique here,) fails.  The table generates an error, and all business logic in the workbook is blown apart.  The only recourse is to exit the file without saving and try again.

Naturally, this concerns them, and they call to make sure they do the right thing.  That’s awesome, and I wouldn’t change that at all.  But the deeper underlying problem is that Power Query’s Workbook security is engineered to require developer interaction.  And that is bad news!

How to Tame Power Query Workbook Privacy Settings

Unfortunately I can’t use VBA to set this in the workbook (or at least, I haven’t tried, anyway), but I can work a little trick to at least warn my users when they are about to be prompted, and remind them which privacy level they need to select.  Here’s what I need to do in order to make that work:

Step 1: Create a range to hold the last user

  • Create a named range somewhere in the workbook called “rngLastUser”  (I put mine on the “Control Panel” worksheet and hid it.)
  • Drop the following code in a Standard VBA Module:

Public Sub UpdateLastUser()
With This
Workbook.Worksheets(“Control Panel”)
.Range(“rngLastUser”) = Application.UserName
End With
End Sub

Step 2: Create a macro to update your Power Queries

Public Sub GetWeather()
With ThisWorkbook
‘Check if privacy level will need to be set
If .Worksheets(“Control Panel”).Range(“rngLastUser”) <> Application.UserName Then
MsgBox “You are about to be prompted about Privacy Levels” & vbNewLine & _
“for the Current Workbook. When the message pops up, ” & vbNewLine & _
“you’ll see an option to ‘Select’ to the right side of the Current Workbook.” & vbNewLine & _
vbNewLine & _
“Please ensure you choose PUBLIC from the list.”, vbOKOnly + vbInformation, _
“Hold on a second…”
End If

‘Refresh the Power Query table
.Worksheets(“Weather”).ListObjects(“WeatherHistory”).QueryTable.Refresh BackgroundQuery:=True
End With
Call UpdateLastUser
End Sub

Step 3: Link the macro to a button for your users

  • Link my GetWeather() routine to a button
  • And we’re good!

What Did We Do?

So basically, what I did here was this:

  • Every time the user clicks the button…
  • Excel checks the contents of rngLastUser to see if the username is the same as the current user
    • If it is, it just goes on to refresh the table
    • If it’s not, it kicks up the following message:

image

    • After the user clicks OK (if necessary), then it prompts the user to set the security level. Yes, they can still get it wrong, but at least they have a chance now!
    • Once the security level is set, the macro goes on to refresh the table
  • After the table is refreshed, Excel updates the rngLastUser cell to the name of the current user.

And that’s it.  We now have a system that will prompt our users with the correct answer, so that they don’t have to come back and ask us every time.

Thoughts On The Security Model

Ideally it would be nice to not have to do this, and there is – in fact – a way.  Microsoft’s answer is “Yes, just enable Fast Combine.”  That’s great and all, but then it ignores all privacy levels and runs your query.  What if, as a developer, I need a way to ensure that what I’ve built stays Public, Private or Organizational?  What it if actually matters?

To me, Fast Combine is akin to setting Macro Security to Low in Excel 2003 and earlier.  Basically, we’re saying “I don’t like nagging messages, so let’s run around in a war zone with no bullet proof vest.”  Sure, you might come out alive, but why should you take that risk?

In my opinion, the Power Query security model needs work.  Even if we could assign a digital certificate to the query to prove it (and the privacy levels) had not been modified, that would be great.  I’m not sure exactly what it is we need, but we need something better than what we have today.

Create Dynamic Table Headers With Power Query

In my last post, I looked at how to pull a named range into Power Query and turn it into a table of data.  Today we’re going to look at how to create dynamic table headers with Power Query, but using a slightly different, slightly more complicated way to do the same task.  Why?  Well, the reality is that sometimes the simple method won’t work for us.

Background

In this scenario, we’re going to use the same file I used in my last post, we’re just going to build the output differently.  The key that I’m after here is that I want to pull the data range into Power Query, but I want to use a table.  The issue, however, as I mentioned in my last post, is:

If I set up the table with headers in row 3, it would convert the dates to hard numbers, thereby blowing apart the ability to easily update the dynamic column headers next year.  That would defeat the purpose of making the input sheet dynamic in the first place.

I’ve really struggled with this feature in Tables, where it converts your headers to hard values.  So many of the tabular setups I create use dynamic headers, it’s actually more rare that they don’t.  So how do we work around this?  How do we create dynamic table headers in Excel?

Setting Up The Table For Success

It’s actually a LOT easier than you might think.  I use this very simple trick with both PivotTables and regular Tables, allowing me to take advantage of their power, but still control my headers and make them dynamic.  Here’s how:

Step 1: Add Static Headers Manually

  • Insert 2 rows below the current dynamic date headers in Row 3
  • Put some static headers in Row 5 that are generic but descriptive.  (In this case, CYM# means “Current Year, Month #”)

SNAGHTML1a520e81 The important part here is to make sure these static headers are each unique so that you can “unwind” them later with Power Query.

Step 2: Create The Table

Next, we need to create the table.  It’s going to cover A5:N24, and will therefore inherit the CYM column headers.  Since they are static values, it won’t make any changes to them, and my dynamic dates are still showing up top.

Step 3: Build a Translation Table

Huh?  A what?  Bear with me, as this will come clear a bit later.  Here’s how we do it:

  • Enter the following in B4:  =B3
  • Copy this across to cover B4:N4
  • Make sure B4:M4 is selected and go to Home—>Editing—>Find & Select –> Replace
  • Using the dialog, make the following two replacements:
    • Replace = with =$
    • Replace 3 with $3

This has the effect of making the formulas all absolute, which is important for our next step.

  • Select B4:N5 –> Right Click –> Copy
  • Select a cell down below your data range somewhere (I used A34)
  • Right click the cell and choose Paste Special
  • In the Paste Special options, choose the following:
    • Paste:  All
    • Check the Transpose box

Very cool, we’ve now got the beginnings of a table that is linked to the formulas in row 3.  We just need to finish it.

  • Add the column header of “Date” in A33
  • Add the column header of “Period” in B33
  • Format the data range from A33:B46 as a table
  • Rename the table to “DateTranslation”

SNAGHTML1a6575f9

Step 4: Final Header Cleanup

Row 4 has now served it’s purpose for us, so you can delete that, and then hide (the new) row 4.  The end result is that we have a header on our table that looks like it’s part of the table, but isn’t really.  The benefits here are that we can preserve the dynamic nature of it, and we have a wider variety of formatting options. SNAGHTML1a67304b We also have a completely separate date translation table too…

Setting Up The Power Query Scripts

So now we need to get the data into Power Query.  Let’s do that.

Step 1: Import and Reformat the Rounds Table

To do this we will:

  • Click somewhere in the main table
  • Power Query –> From Table
  • Remove the “TOTAL” column
  • Filter the first column to remove text that begins with “Total” and values that equal null
  • Filter the second column to remove null values
  • Right click the first column and Un-Pivot Other Columns
  • Rename the “Month” column to “Round Type”
  • Rename the query to “Budget”

At this point, you should have the following: SNAGHTML1afa967c This is great, but what about the Attribute column.  This is the whole big pain here about using a table, in that we don’t have dates.  Yes we could have hard coded them, but then it would be very painful to update our query when our year changes.  So while we have something flexible (in a way) here, it isn’t really all that readable.  How can we change that? Save and close the query, and let’s deal with this.

Step 2: Create Another Power Query

Let’s add the other table we built:

  • Click inside the DateTranslation table we created
  • Go to Power Query –> From Table
  • Click Close & Load –> Load To…
  • Click Only Create Connection –> Load

This will create a new Power Query that basically just reads your original table on demand.  It won’t refresh unless it’s called from another source.  Now we’re set to do something interesting.

Step 3: Re-Open The Budget Power Query

If the Workbook Queries pane isn’t open on the right, then go to Power Query –> Workbook Queries to show it.

  • Right click the Budget query and click Edit
  • On the Power Query Home tab, click Merge Queries (in the Combine group)
  • Select the Attribute column
  • From the drop down box, choose Date Translation
  • Select the Period column
  • Make sure “Only Include Matching Rows” is checked and click OK

image At this point you’ll get a new column of data in your Query.  Click the icon in the top right of the column to expand it: image Excellent… now we only need to pick the column we need, which is Date.  So uncheck the Period column and click OK. Finally we can remove the Attribute column and rename the “NewColumn.Date” column to Date and we’ve got a pretty clean query: SNAGHTML1b05acce At this point we could call it a day, as we’ve pretty much accomplished the original goal.  I can update the Year cell in B1 and my Table’s “Headers” will update.  In addition, my Power Query will show the correct values for the dates as well.  Pretty cool, as I could now link this into Power Pivot and set a relationship against a calendar table without having to worry about how it would be updated.

Going One Level Deeper

One thing I don’t like about this setup is the need for the extra query.  That just seems messy, and I’d prefer to see just one query in my workbook.  The issue though, is that I’m pulling data from two tables.  What’s cool though, is that with a little editing of the M code, I can fix that. Here’s the M for the query I’ve built, with a key line highlighted (twice): image As you can see, the “Merge” line is coloured yellow, but the name of the Query being merged is in orange.  Well guess what, we don’t need to reach to an external query here, we can reach to another named step in M.  Try this: Immediately after the “Let” line, enter the following:

TranslationTable = Excel.CurrentWorkbook(){[Name=”DateTranslation”]}[Content],

Now, modify the “Merge” line to update the name of the table from “DateTranslation” to “TranslationTable”.  (The reason we’re doing this is that the original query still exists, so we can’t just name the first step “DateTranslation”, as it will conflict. Once we’ve made our modifications, the script will look as follows: image When you click “Done”, the query will reload and you’ll see an extra step in the “Applied Steps” box on the right.  What you won’t see though, are any changes, as the data comes out the same.  Very cool, as we are now referencing both tables in a single step.  To prove this out, save this query, drop back to Excel and delete the “DateTranslation” query.  It will still work! (The completed file can be downloaded here.)

Ending Thoughts

I really like this technique.  It let’s me dynamically change the column names, yet still use those to link them into my data model tables.  But even more I like the ability that, with a minor edit to the M code, I can keep my workbook from being littered with extra queries.  :)

Using Non-Contiguous Data Ranges in Power Query

One of the methods we use when building business solutions is creating specific input sheets for our models, as it separates our data from our business logic and ultimately our reporting layers.  This strategy is a key piece of building stable models, and is one of the fundamental things I teach when I’m leading modelling courses.  (For more information on having me at your site, click here.)  Since many data entry points consist of non-continguous ranges in Excel, it make sense to look at how using non-contiguous data ranges in Power Query can help us in our modelling.

To that end, in today’s blog post, I’m going to look at a technique to take a standard data entry worksheet and turn it into a data source, which can then be linked in to the model (or just used as the basis of reports.)  This is ideal, as we then only have one place to update our data for our solution.

Background:

What we’re going to start with is this:

image

A few key things you might want to know are:

  • You can download the sample file from this link if you want to follow along.
  • This is hypothetical budget data for golf rounds.
  • All the data entry cells are light green (I tell my users “Green means go” and they are now conditioned to stay away from anything that isn’t that light green colour.
  • The year in cell B1, when changed, will update all of the headers on the input sheet to the current year

Now, what I’d like to do is un-pivot this data into a useable table.  If I could do that, then I’d be able to use it in any of many methods, such as PivotTable, PivotCharts, charts or VLOOKUP solutions.

The initial setup

Your first temptation might be to layer a table over the data.  I’ve got a few reasons why I don’t want to do that:

  1. If I did layer a table over my input form it would lose the intuitive “enter data here” setup that I worked so hard to create.  Users would end up with a banded table that didn’t indicate where data should go.  That’s not ideal.
  2. If I set up the table with headers in row 3, it would convert the dates to hard numbers, thereby blowing apart the ability to easily update the dynamic column headers next year.  That would defeat the purpose of making the input sheet dynamic in the first place.
  3. If I set up the table with headers above that, I’d end up with a bunch of ugly Column1, Column2 headers.  I suppose I could hide them, but again, my table would blow apart my formatting as mentioned in 1.

So no… that’s not what I’m after.  So now what…?

There’s actually a few different ways to handle this.  I’ll look at some others in later blog posts, but for now let’s piggyback on the technique about using named ranges that I covered in my last post.

Creating the Named Range

The first thing I’m going to do is create a named range (not a Table) to cover the entire range that holds my budget data: A3:N22.  That includes the headers, all blank rows ,and all data right up to the totals on the right and bottom.  The reason for this is that, should I end up adding any new categories later by inserting rows or columns, it’s fairly likely that I’ll be doing it before the totals, so they’ll get picked up.  (It’s not quite the auto-expansion feature of a table, but it’s the best I’m going to get with a named range.)

So I created my named range and give it a sensible name like rngBudgetData.

Pulling the data into Power Query

Now let’s look at the easiest way to get our named range into Power Query.

If we click a single cell in our data range, then told Power Query to get the data from a Table, we’d be given a range for that expanded to cover only the contiguous block of data.  Because we’ve got blank rows in our data, that wouldn’t really work for us.  For example, assume we click B3, then tell Power Query to pull data From Table.  We’d be given the range A12:N15, as shown below.

image

Alternately, we could create a blank query, then type =Excel.CurrentWorkbook() in the formula bar, and choose our table, as outlined in the last blog post:

image

Okay, so that WOULD work.  As it turns out though, there is still yet another way to do this:

  • From the Name drop down in Excel, select the rngBudgetData named range

SNAGHTML3169d992

  • A3:N22 will now be selected, so go to Power Query –> From Table

And look at that, we’re now in Power Query using our named range!

The secret to this is, in order for Power Query to pull in the named range, the entire named range must be selected.  If that happens, Excel will grab it as the Power Query source.  If you’re using named ranges that cover contiguous blocks of data, and you click somewhere in that block of data, then Power Query very well may grab your entire named range.  So long as the auto selection boundaries line up with your named range you should be good.  But in the case of my data, those blank rows blow that apart, which is why I selected it from the name drop-down first.

(It is worth noting that if a named range and a table’s boundaries match exactly, the Table will be used as the Power Query source, not the named range.)

Manipulating the data in Power Query – Issue 1

Upon pulling the data into Power Query, we end up with a table like this:

SNAGHTML2d2a36c

Naturally, the first thing we want to do is promote the first row to headers, so that we can get to un-pivoting the data.  But look what happens when we do:

SNAGHTML2d3d005

What the heck?  The first column was renamed to Month (from the first row of data), but the remaining didn’t change! Not only that, but we lost our date time stamp.  That’s not good.

So it appears that Power Query can’t promote a date/time into a column header.  We’d better remove that step and try another approach.

Removing Irrelevant Data – Cut 1

Before I get into dealing with the harder stuff, I’m going to do a quick bit of cleanup here.  I’ve got some bank rows of data in my table, so I’m going to knock those off first.  To do that I:

  • Filter Column1 and uncheck (null) values

I’ve also got some header rows, which I can identify by the null values in Column2-Column14.  I can knock those off as follows:

  • Filter Column2 and uncheck (null) values

Easy enough so far.  I also don’t need the Total column, so I’ll scroll all the way to the right and:

  • Right click Column14
  • Choose Remove

Now I need to get rid of all the rows that contain totals (as I can always re-create those with a Pivot Table:

  • Filter Column1
  • Text Filters –> Does Not Contain –> “Total “ –> OK

Now, you’ll notice that I used Total with a space.  This is just to make sure that I don’t accidentally remove a row that contains a word which has total in it – like “totally”.  It’s always safer to make your text pattern as exact as possible.

After doing all of the above, I’m left with this:

SNAGHTML2e492f9

So the data is all clean, but I still can’t promote my headers so I can un-pivot this data.  Now what?

Busting Out Transpose

Transpose is quickly becoming one of my favourite functions.  Let’s hit it now:

SNAGHTML2e6791e

Hmm… originally I was going to convert my dates to text, Transpose it back, promote the text to headers, then un-pivot it.  But I don’t think I even need to do that at all.  Try this:

  • Transform –> Use First Row as Headers
  • Select the Month column
  • Transform –> Unpivot Columns –> Unpivot Other Columns

Bingo!

SNAGHTML2e8de09

Now just for the final cleanup:

  • Select the Month column –> Transform –> Data Type –> Date  (You need to do this, or Excel will return date serial numbers to the table, not dates formatted as dates.)
  • Rename the Attribute column to “Round Type”
  • Rename the Value column to “Amount”

Finally we can choose to Close and Load the Power Query and it will turn it into a nice Excel table.

Some Thoughts

As I was developing this blog post I found about 10 different ways through this process to end up at the same goal.  This is actually one of the things I enjoy about Power Query is the creativity that you can employ coming out to the same end result.

One feature I do hope to see eventually is the ability to filter for text that “starts with” or “does not start with”.  That would allow me to make my text searches even tighter than the “contains” portion I used here.  In one version I actually wrote an if function to do that:

=if Text.Start([Column1],5)=”Total’ then “remove” else “keep”

The loss of the months when originally promoting the header row shocked me a bit.  I would have expected this to be automatically converted to text and those values used.  Alas that’s not the case, so we have to do a bit more work.  Certainly not the end of the world, but not what I’d expect to see.

At the end of the day though, it’s nice to know that there is a way to get useful non-Table data into Power Query and turn it into something useful.  :)

Pulling Excel Named Ranges into Power Query

The comments of my last post collected a tip that I thought it was worth exploring on pulling Excel named ranges into Power Query.

I made the claim that Excel MUST have the data in an official Excel table.  As LoganEatsWorld pointed out, that’s actually not true any more.  If you’d like to give this a go, you can download this workbook to follow along.

What’s in the file?

The file is very basic.  It simply contains one table, and one named range of data:

image

The table is highlighted in the blue table style, and bears the name “Stats”.  The named range is surrounded by the black outline, and is called “Breeds”.

Connecting to Excel Data:

The reason I never found this is that my method was always to go to the Power Query tab and click –> From Table.  That will work great to get the data out of a table, but it won’t work for the named range.  So let’s try this a different way…

  • Go to Power Query –> From Other Sources –> Blank Query
  • Click in the formula bar and type the following:

= Excel.CurrentWorkbook()

(Yes, it’s case sensitive… I’m starting to reconcile myself to the fact that it’s Power Query so I’m just going to have to get over it.)

What ends up happening is a bit of magic:

image

Interesting… we have two tables listed!  The first is our official table, the second is our named range.  Cool!

Let’s click in the blank space to the right of the green “Table” text in the Breeds row:

image

The preview pops up and, sure enough, that’s our named range data:

image

Working With The Data

All right, let’s click the green Table text and break open that named range:

image

One notable difference here (in fact really the only one), is that Power Query doesn’t automatically recognize the header row.  This is due to the fact that an Excel table actually has a named header row to promote, where a named range does not.  No big deal though, as we can easily deal with that:

  • Go to Transform –> Use First Row As Headers

At this point, we could save the table to the worksheet or data model, as we need.

Observations

So this is cool.  It’s awesome that we can get to named ranges, as I have a LOT of workbooks that use these, and there are occasions where I don’t want to convert them to official Excel tables.  Despite the fact that we can, however, you pretty much need a secret decoder ring to find it, and that’s not so good.

It would sure be nice if there was a more discoverable way to pull in a named range… but where?

Suggested Accessibility Option 1

When I look at Power Query’s “Get External Data” function, it seems logical to me that it should end up somewhere in that area.  Looking at the group:

image

I kind of like the ability that comes with the “From Table” feature which works from the table you’re in (if you are), and lets you create a table if you’re not inside a table when you click that button.  But I wonder if it would be better served as a SplitButton/menu/submenu structure that offered the following options:

  • From Table
    • Current Table
    • Other Table
      • List of other tables in the workbook
    • Named Range
      • List of named ranges in the workbook
    • Create New Table
    • Create New Named Range

Actually, there is another change I would make to that group, and that’s to move the “From Blank Query” out of the “From Other Sources”, and give it it’s own button.  (I create a LOT of queries from scratch now, and it’s just extra clicks in my way to do so.)

Suggested Accessibility Option 2

I’m not sure this is so much of an alternate as something additional I’d like to see, actually.  An “additional sources” button on the Home tab would be awesome.  If that had the ability to pull up all the existing tables or named ranges in the workbook, and add them to the Power Query script as a “Source2=…”

I think the implications of this would be two fold:

  1. It would allow you to add a data source after creating a blank query, and/or
  2. It would allow you to add additional data sources into the same query.

The latter is certainly something I do semi frequently, as I don’t want to have multiple Power Queries created that are then merged together.  I’ll add both sources manually in the same query then merge them.

At any rate, just some thoughts.  If you have any on the subject please feel free to leave them in the comments.  :)

Un-pivoting With Subcategories in Power Query

My last few posts have been relatively technical, so this time I figured I’d look at something practical.  I can’t believe it’s been almost a year since I blogged about Un-Pivoting data in Power Query, so it’s about time we looked at that again… but this time with a twist.  This time we’ll look at un-pivoting with subcategories in Power Query.

Background

The data we’re going to start with looks like a typical financial report.  Whether a restaurant or a shoe store, in the manager’s office you’re liable to come up with a report that looks something along the lines of this:

SNAGHTMLf92afc

Now for the challenge… someone decides they need an alternate view of this data.  So how do you quickly un-pivot this into a format that you can use for other things?

If you want to follow along, you can download the file from my OneDrive.

Issue 1 – Getting the Data into Power Query

The first issue we come across is that, while Power Query can consume data from inside an Excel file, it MUST be formatted as a table.  But this hardly looks like it’s conducive to a table format with all those blank rows and such.  But what the heck… let’s apply a table to it anyway, and see what happens.

  • Click anywhere
  • Choose Power Query –> From Table
  • Adjust the range to cover all of the data (A4:H17)
  • Uncheck the box that indicates your table has headers

When you’re done, the box should look as follows:

image

And when you say OK, you should be taken into Power Query.

If you take a quick peek back at Excel, you can see that the table has indeed been applied, and that there are generic column headers above each column:

SNAGHTML100f05e

This is also reflected in Power Query:

image

Data Cleanup

Before we get into the trick of how to deal with subcategory columns, let’s clean up some of the garbage here.  Ideally what we’d like to get here is a nice pure table that we can easily un-pivot, just like we did in the prior article.

Cleanup Step 1:

Looking at the first column, we’ve got a bunch of null values in there, as well as some section headers.  What we really need is those section headers repeated on the lines below them.  So let’s make that happen.

  • Select Column1
  • Go to Transform –> Fill –> Down

You’ll see that the section headers are filled into any of the null areas. As soon as they encounter data however, they stop. (I’ve drawn a box around the Revenue lines below – notice how they fill until they reach Total Revenues, which then fill until they reach Expenses, and so on.)

image

Cleanup Step 2:

Now, we don’t really need any of the rows that are showing null values in Column3 through Column 8.  Let’s filter Column3 to remove those.  Click the drop down on Column3 and uncheck (null).  The result looks like this:

image

So… why didn’t I filter the null values out of Column2?  After all, there are blank data rows, and with a PivotTable we can recreate the subtotals…  The answer is that I’m not ready to lose the first two rows yet.  I need those in order to un-pivot my data. ;)

Issue 2 – Un-Pivoting the Data

If things were lined up perfectly, we could just select Column3 through Column 8 and un-pivot it now.  Unfortunately, if we do we’ll get some really wonky results.  (Go ahead and try it if you like. Remember to click the x to the left of the “Unpivoted Columns” step once you confirm you’ve made a disaster of it!)

image

Preparing to Un-Pivot

Okay, so what do we need to do… well, the first thing we need to do is fill the first row (containing April and May) across the columns.  Here’s the rub though… there is no Fill—>Across feature.  So how do we do it?

Transposing the Table

To an accountant, transposition is an evil word that means you made a mistake and flipped two digits around.  It’s nasty and something we never look forward to.  But to Power Query it’s simply awesome.  Check this out…

  • Go to Transform—>Transpose

This instantly flips the columns to rows and rows to columns!

SNAGHTML1147243

And would you look at that… April and May are in Column1 and below them… null values!  We know what to do with those now!

  • Select Column1
  • Go to Transform –> Fill –> Down

Is this enough though?  Nope, sorry.  You might be tempted to “un-transform” and then un-pivot it, but you’d still end up with garbage.  We still need to do a bit more.

Concatenating the Category and Subcategory

This is the trick to un-pivoting with subcategories: you need to concatenate them first, then un-pivot, then split them up.  So let’s get to it.  Using the tip from Merging Columns with Power Query, let’s join up Column1 and Column2.

  • Select Column1
  • Hold down CTRL or SHFT and select Column2
  • Click Transform –> Merge Columns
  • Choose the Colon for the separator (or whichever you prefer)

Note:  If you don’t have the Merge Columns feature, you’re using an old version of Power Query. Either download the latest version, or you’ll need to manually join the columns by:

  • Adding a new column
  • Using the formula =[Column1]&”:”&[Column2]
  • Delete Column1 and Column2
  • Move the new column to the first position

Once you’ve got it done, the output should look as follows:

SNAGHTML120c673

Un-transposing the Table

Awesome… we’ve got concatenated headers now.  We just need to flip the table back right side up and we’re almost ready to un-pivot it:

  • Go to Transform –> Transpose

Final Preparations

The very last thing we need to do before we un-pivot our table is provide some decent headers.  This will ensure that the data will make sense when it is un-pivoted.  To that end:

  • Go to Transform –> Use First Row as Headers
  • Rename the first column to “Class”
  • Rename the second column to “Category”
  • Filter “Category” to remove the null values

Our table now looks nice and clean:

image

And we’re ready!

Un-Pivot It!

We now follow the steps of a regular un-pivot operation, with only a minor extra step:

  • Select the Class column
  • Hold down CTRL or SHFT
  • Select the Category column
  • Go to Transform –> Unpivot Columns –> Unpivot Other Columns
  • Rename the “Value” column to “Amount”

And now the extra step:

  • Select the “Attribute” column
  • Go to Transform –> Split Column –> By Delimiter –> Colon
  • Rename the Attribute.1 column to “Month”
  • Rename the Attribute.2 column to “Measure”

The results:

SNAGHTML12d5ffa

That’s pretty much it.  The last thing I’d do is change the Query name from Table1 to something more intelligible… maybe Data or something… then load it to the worksheet.

From a Static Report to a Data Source

Now that we’ve got our report reformatted into a data source, we can click anywhere in the table and pivot it to our heart’s content!

A Quick Recap

To be fair, this post has been pretty long, but only because I included a LOT of pictures and detailed instructions.  Once you’ve got the process nailed down, it doesn’t take all that long at all.  Remember, the big key to this is:

  • Suck your report into a table (without headers)
  • Fill any rows you can
  • Transpose the data
  • Concatenate your category and subcategory fields together
  • Un-transpose it
  • Un-pivot it
  • Split the category and subcategory back into their pieces

Enjoy!  :)

Power Query’s Round Function

The other day I asked one of my co-workers how many ways he knew of to round a number.  His answer was one… if it ends in .4 it rounds down and if it ends in .5 it rounds up.  My guess is that most people would answer along similar lines, but that’s NOT how Power Query’s Round function actually works…

Interestingly though, there are a bunch of different ways to round, depending on your needs, and Excel has a bunch of functions to support them: ROUND, ROUNDUP, ROUNDDOWN, FLOOR, CEILING, EVEN, ODD, TRUNC, INT and MROUND.

Power Query also has a bunch of rounding formulas as well but, given that the function names can be somewhat different in Power Query (as we first saw here), I thought it might be interesting to see how to convert each of those functions from Excel to Power Query’s structure.

Background Setup

To start with, I created a very simple structure: a two column table with some random values in the “Value” column and then rounded them to 2 decimals using the formula =ROUND([@Value],2)  The output, after feeding it through Power Query, looks like this:

SNAGHTML20a5291b

The blue table on the left is the Excel table, and the green table on the right is the Power Query output.  (There is a completed example file available here example file available here.)

Creating the Round function

I love the ROUND function in Excel.  I use it constantly – rounding everything that uses multiplication or division – and pretty much have it burned into muscle memory.  So to me this was a logical place to start with Power Query.  Naturally, the syntax is just a bit different from Excel though:

Syntax
Excel =ROUND(number,digits)
Power Query =Number.Round(value, digits, roundingMode)

Hmm… we know that the Power Query function will be case sensitive.  In addition, it has an extra parameter.  The valid options are:

  • RoundingMode.Up
  • RoundingMode.Down
  • RoundingMode.AwayFromZero
  • RoundingMode.TowardZero
  • RoundingMode.ToEven

Let’s see what we can do with this.

I open up my query, select Add Column, and put in the formula as shown below:

image

Pretty easy, just Number.Round([Value],2).  In fact, it’s so similar to Excel it’s awesome!

So I click OK, save the query, and have a look at my results.  And that’s when I notice something isn’t quite right.  I’ve added some conditional formatting to the table below so you can see it easily:

image

This is the default?

Notice all the numbers that don’t match?  Can you spot the pattern?  It’s the oddest damn thing I’ve ever seen, to be honest, and I’ve never heard of anyone rounding in this way.

The default “RoundingMode” for Power Query is “Round to Even”.  What that means is that if there is a tie in the numbers it will round up or down to the closest even number.  So in the case of 1.645, it will round down to 1.64 as that is closer than 1.66.  (1.64 is .05 away from 1.645, where 1.66 is 0.15 away from 1.645).

I find this deeply disturbing.  I personally think that every user would expect Excel and Power Query’s default rounding methods to line up exactly, and this doesn’t.  How serious is this?  I’m not sure.  I think I’ll let someone from the scientific community ponder that.

Using RoundingMode.Up

Since the default plainly doesn’t work for us, it looks like it’s time to figure out which of the additional parameters we need.  Let’s try adding RoundingMode.Up to see if that will fix it.

I open Power Query again, and added a new custom column with the following formula:

=Number.Round([Value],2,RoundingMode.Up)

And the results are as follows:

image

Um… uh oh.  It seems to work above 0, but below is another matter.  That –5.245 is rounding down, not up! (Yes, from a technical perspective I am aware you can argue the words I used, but you get the idea.)

Using RoundingMode.Down

Now I’d be surprised if this came up with numbers consistent with the Excel formula, but let’s just check it for good measure.  The formula is:

=Number.Round([Value],2,RoundingMode.Down)

And the results:

image

So now numbers greater than 0 get rounded down, where numbers less than 0 are being rounded up (away from zero).

Let’s try another:

Using RoundingMode.AwayFromZero

Here’s our next option:

=Number.Round([Value],2,RoundingMode.AwayFromZero)

And these results are pleasing!

image

Look at that… we finally found the one that works!

Using RoundingMode.TowardZero

We’ve only got one other option we haven’t explored, so we might as well use it too, just for the sake of completeness:

=Number.Round([Value],2,RoundingMode.TowardZero)

For some reason, I’m incapable of typing TowardZero the first time I type this.  I always type TowardsZero and end up with an error!  At any rate, the results:

image

Thoughts

As a tool that is built for the Excel audience, I am having some real difficultly accepting the default parameter for this function in Power Query.  I HOPE that this is a bug, and not a design choice, although the documentation would suggest it is the latter.  If that’s the case, I think it’s a HUGE mistake.

Excel’s ROUND formula defaults to round away from zero.  Power Pivot’s DAX ROUND formula defaults to round away from zero.  VBA’s Application.Round function defaults to round away from zero. (As pointed out by Rory Archibald on Twitter, VBA’s Round function – without the application. prefix – does use banker’s rounding though.)

In my impression, if the Power Query formula holds the same name (at least after the Number. portion) it should return the same results as the Excel function.  In fact, I would venture to say that virtually every Excel pro would expect this.

My bigger concern would be that, with one of Power Query’s big selling features being it’s ability to re-shape and process large volumes of data, how quickly will a user realize that the Rounding function they thought they had is NOT working the way they expected?  Not good news at all.

I’m curious to hear your impressions.  Please leave a comment!

Want to see if for yourself?

Download the example file with all the formulas already in place.

Multi Condition Logic In Power Query

In my last post, we looked at creating an IF statement using Power Query.  This time we’re going to go a bit deeper and look at multi condition logic in Power Query, exploring a scenario where we need to choose between several options.

The base scenario

In the last post I talked about my billing code setup.  As a reminder, it’s a 10 digit code that is constructed as follows:

  • Char 1 – Alpha – Indicates the division (G = Golf, F = Fitness, M = Marina)
  • Char 2 – Alpha – Indicates the billing type (D = Dues, S = Pass, A = Annual Moorage, P = Periodic Moorage)
  • Char 3-4 – Numeric – Indicates the number of months of coverage for the product (1-12)
  • Char 5-6 – Numeric – Indicates the start month (and subsequent anniversary) for the customer’s product
  • Char 7-8 – Variable – Slip length (in feet) for a boat in the case of marina customers, or SG, CP or CS for golf (indicating single, couple primary or couple spouse)
  • Char 9 – Text – A variety of single letter codes indicating specific things we want to know. (Outlined below)
  • Char 10 – Text – Indicates the payment method (F = Financed, P = Paid up front, C = Comp/Honorary)

(Note that the sample data only includes records for Marina data)

Sample file

If you’d like to follow along, download the sample file here.

Multi condition logic – Using IF

So, building on my previous two posts (using text functions and creating IF statements), we could easily break the first character into pieces by nesting 2 IF tests together:

=if Text.Start([BillingCode],1)=”G” then “Golf” else “two options left”

=if Text.Start([BillingCode],1)=”F” then “Fitness” else “Marina”

into

if Text.Start([BillingCode],1)=”G” then “Golf” else if Text.Start([BillingCode],1)=”F” then “Fitness” else “Marina”

Not too hard really.  In fact, we can even build each IF statement separately, then just copy the second to replace the “two options left” part without making any other changes at all.  No parentheses or anything needed.

More than 3 options

But what if you have a whole bunch of options that you need to work with?  Let’s look at the 9th character in our billing code.  I haven’t given the details yet for that one, but here are the options:

E = Employee, S = Yacht Club, N = Non-Taxable, R = Restricted, I = Inactive, L = Social, M = Medical, U = Regular

Wow.  That’s a whole lot of possibilities, and would make for one monster nested IF statement.  That wouldn’t be a lot of fun to write, nor maintain.  So how would we deal with it?

In Excel proper, we would probably separate these options into a separate table, then use VLOOKUP to pull the appropriate value into the table.  So we just need a Power Query VLOOKUP function… except there isn’t one.

We actually have a couple of different methods to deal with this.  We could either:

  1. Split the first character into one column, create an Excel table with the first letter in column 1 and the appropriate match in column 2, then merge the two using Power Query’s merge function. (Maybe I’ll write a future post on it.)
  2. Build our own CHOOSE function inside Power Query (or SWITCH if you prefer Power Pivot’s DAX version.)  This is way more fun, so let’s do that.  :)

Building a CHOOSE function

This actually isn’t too hard once you know the basic structure.  It basically goes like this:

function_name = (input) => let
values = {
{result_1, return_value_1},
{input, “Undefined”}
},
Result = List.First(List.Select(values, each _{0}=input)){1}
in
Result,

The key parts to recognize here are:

  • We can change the “function_name” part to be whatever we want/need,
  • result_1 is the first of the possibilities that we may pass TO the function
  • return_value_1 is the value that we’d like to return if the first value is result_1
  • if we need more values, we just insert another comma after the value_1 section and put in a value_2 section
  • we can keep adding as many values as we need.
  • the “Undefined” value will return the text “Undefined” if the value you pass isn’t in your provided list of options (it’s the Else portion of the CHOOSE statement)

Using this structure, we could write a CHOOSE function for our scenario as follows:

fnChoose_CustCode = (input) => let
values = {
{“E”, “Employee”},
{“S”, “SCYC”},
{“N”, “Non-Taxable”},
{“R”, “Restricted”},
{“I”, “Inactive”},
{“L”, “Social”},
{“M”, “Medical”},blah
{“U”, “Regular”},
{input, “Undefined”}
},
Result = List.First(List.Select(values, each _{0}=input)){1}
in
Result,

Notice that I changed a couple of things:

  1. I gave the function a name so that I can recognize it, and also so that I can create more than one function with different names.  This one is fnChoose_CustCode.
  2. I created a list of all the options I needed.

Implementing the CHOOSE function

Okay, so now we have this, how do we use it?  Again, we’ve got two options.  I’ll focus on the other option at some other time, but for this scenario I want to build this right into an existing query.  So here’s how I do it.

First I created a brand new query that just pulls my table into Power Query, resulting in the following:

SNAGHTML76c03ee

Let’s go and inspect the code that pulls this in.  We need to click View –> Advanced Editor.  That will bring up the following code:

let
Source = Excel.CurrentWorkbook(){[Name=”Customers”]}[Content]
in
Source

Not too complicated (yet).  Let’s paste in our code just before the Source = line:

let

fnChoose_CustCode = (input) => let
values = {
{“E”, “Employee”},
{“S”, “SCYC”},
{“N”, “Non-Taxable”},
{“R”, “Restricted”},
{“I”, “Inactive”},
{“L”, “Social”},
{“M”, “Medical”},blah
{“U”, “Regular”},
{input, “Undefined”}
},
Result = List.First(List.Select(values, each _{0}=input)){1}
in
Result,

Source = Excel.CurrentWorkbook(){[Name=”Customers”]}[Content]
in
Source

Perfect.  And yet it doesn’t exactly look like much.  In fact, beyond adding a new line in the Steps section of the Editor, we don’t see any changes:

image

So what good did that do then?

As it turns out, we’ve only pasted in our function to make it available to the Power Query engine.  We haven’t actually told Power Query to do anything with it.  So why don’t we do that now?

Using our new CHOOSE function

You’re going to be amazed how easy this is…

First we’re going to add a new column (Add Column –> Add Custom Column).  When the dialog pops up, we’ll create a formula to return the letter we want to pass to the function:

=Text.Range([BillingCode],8,1)

And that gives us the following result (assuming we provided the column name of Status):

SNAGHTML29d7316

Cool stuff.  Now, let’s amp this up and use our function.  We’ll click the gear next to the Status step and wrap the existing formula with our function call.  (Don’t forget the extra  parenthesis needed at the end):

=fnChoose_CustCode(Text.Range([BillingCode],8,1))

Which gives us the following:

SNAGHTML2a06cbf

The end effect

You’ll find that all the sample codes in the data work just fine, and that nothing comes back as undefined.  If you’d like to see how the query reacts to different items, go back to the Customers table and try changing the second to last letter to something else.  When you refresh the table, you’ll find that it will evaluate the new character and return the appropriate result.

Caveat

It should be noted that the function as written above is case sensitive, meaning that a code of MP010450uP would return “Undefined”.  This is expected in my case, as valid codes are made up of upper case letters.

If I wanted to accept either case I would need to modify my Text.Range function to force it to upper case.  This would result in a function call that reads as follows:

=fnChoose_CustCode(Text.Upper(Text.Range([BillingCode],8,1)))

Which would work, as you can see here:

SNAGHTML2be7f6a