Load Power Query directly to Power Pivot in Excel 2010

One of the cool features in Excel 2013’s Power Query is being able to load to the Data Model (PowerPivot) directly.  But Excel 2010 doesn’t appear to have this feature.  Interestingly, you can still load Power Query directly to Power Pivot in Excel 2010, it just takes a bit of a careful workaround.

Let’s look at the required steps

Step 1: Create Your Connection

First, I’m going to load in the content of a text file.  So I:

  • Go to Power Query –> From File –> From Text
  • I browsed to the file I needed, and imported it into Power Query
  • I do whatever cleanup is needed and name the query Sales
  • Next, we go to the Home tab –> Close and Load –> Close and Load To…

And here’s the important part:

  • Choose “Only Create Connection” –> Load

And I’ve now got a basic connection to my sales table without landing it in a worksheet:

image

Step 2: Grab the Connection String

Now, here comes the secret.  We need to get the connection string that Excel uses to connect to the Power Query.  Here’s how:

  • Go to the Data tab –> Connections

In there, you’ll see the name of your new connection:

image

  • Select your Query and click Properties
  • Click the Definition tab

Now you’ll be looking at something like this:

image

Notice that this query is actually an OLE DB Query that is simply “SELECT * FROM [Sales]”  That seems easy to work with.  But the key for us is the connection string shown (#2 in the image above).

  • Select the ENTIRE connection string
  • Press CTRL + C to copy it
  • Click Cancel

Note:  Make sure you start at “Provider=” and highlight all the way to the end.  (It’s much longer than what you see in that little box.)

Load Power Query directly to Power Pivot

Finally, we’re going to pull this into Power Pivot.  To do this:

  • Go to the Power Pivot tab –> PowerPivot Window
  • From Other Sources –> Others (OLEDB/ODBC) –> Next

image

  • Name your table
  • Paste your Connection String in the box

image

  • Click Next –> Next –> Finish –> Close

And voila!  We have our Power Query linked directly into Power Pivot in Excel 2010!

image

Just remember… if you do this, NEVER modify this table in Power Pivot.  Always go back to modify the table in the Power Query stage.  Failure to do so could set the table into a non-refreshable state.

Native Database Query Security Prompts

In a comment on my Parameter Tables post, Talha asks why we get prompted with a Native Database Query message each time the value of the parameter used in a SQL query changes:

I had a question. The value I grab from my parameter table, I store it in a variable and use it in my SQL query. However, every time I change the parameter, I get the following prompt:

------------------------------------------------

NATIVE DATABASE QUERY

You are about to run a native database query. Native queries can make changes to the database. Would you like to run the following query?

------------------------------------------------

Is there any way I can make this go away?

To answer this, it helps to understand why this message comes up.

Replicating The Issue

When you connect to a SQL server database in Power Query, there are three parts to the initial dialog:

image

Server and Database are pretty straight forward, but the SQL Statement is marked as optional, and is actually hidden under the arrow at the left.  Now, for anyone who has written SQL in the past, you open up that little window, pop in your SQL and click OK.  And you’ll get prompted with a message about trusting a Native Database Query.

On the other hand… if you DON’T fill in a SQL statement, you’ll be taken to a list of all tables and views in the database that you have access to, and will be able to bring those in to start working with them.  Do that, and you’ll NEVER SEE THIS PROMPT.

Why Prompt About a Native Database Query?

A native database query is basically SQL code that you provide to pass to the database.  Power Query treats this as potentially dangerous, so gives you a warning when you do this.  And every time you change the underlying SQL query (pass one parameter to it to change it in some way), the Query is seen as “new” and again, potentially dangerous.

I tend to agree that this is a bit overkill.  I’m not a SQL expert, by any means, but if the query is just a SELECT query, then I’m not sure why we need to be warned about it.  You’d think that would be easy enough for the PQ team to parse out, looking for keywords like ALTER, DROP, UPDATE, DELETE and such, but regardless, that’s the way it is.

Avoiding the Error

Now that we know what is doing it, how do we avoid it?  The answer is actually more simple, but harder to swallow than you might like. Don’t use the Native Database Query functionality!

I really struggled with this at first.  I saw Power Query as a “SQL Sever Management Studio Light” for Excel pros.  It’s not.  It’s totally different. What the PQ team has built is a way for non-power users to be able to query, clean, restrict and reshape their data.  And if you do it their way, then you won’t get this irritating prompt.

In the case of grabbing data from SQL Server, you’d simply connect to the server, grab the table/view you want, then start setting your filters in Power Query to restrict down the data you want.  The UI makes this easy, and it’s a lot easier to tweak the data to get it right than forcing an Excel person to go back and tweak their SQL.

And here’s the cool part… once you’ve done this, Power Query takes advantage of a technology called “Query Folding”, which folds up all the reshaping commands you’ve issued.  It passes those to the SQL server as a single command to bring back your data.  In layman’s terms (although not perfectly technically correct), you can look at this as if Power Query is allowing you a graphical way to write your SQL for you behind the scenes, then pass it to the server.  And because Power Query knows it won’t do anything dangerous, it doesn’t prompt you about potential problems.

What About Performance?

A few months ago I asked someone at Microsoft this question… the thrust of my question was if Power Query would be more or less efficient than passing an optimal SQL query.  His reply was that no, if someone was a SQL Ninja, they could certainly come up with something that performed better in some situations.  Having said that, for the majority of Excel pros, Power Query will do a better job, as we don’t tend to be SQL optimization experts.

Further to this, the very worst thing you can do for performance is provide a base query then try to do further filtering.  Look at these three cases:

Case 1 Use Power Query to:
Connect to table Transactions
Filter to Dept <> 150
Case 2 Native Database Query:
Select * From Transactions Where Dept <> 150
Case 3 Native Database Query:
Select * From Transactions
Use Power Query to filter:
Dept <> 150

Case 1 is the preferred method when dealing with Power Query.  It will roll both steps up into a single statement, and retrieve the requested set from the server.

Case 2 returns the same data as Case 1, but prompts you with a Native Database Query on each computer you run the solution on.  (You can’t even save the connection with the workbook, it’s a user specific trust flag.)

Case 3 is the worst performing.  The reason is that the initial table has to be brought to Power Query in it’s entirety, and THEN you can start filtering.  Why?  Because Power Query cannot query fold it’s commands into the SQL you provided.

Recommendation

Even though connecting to a table, then re-creating all the filtering steps seems less graceful at first, this is the method I’d recommend you take.  In fact, I’ve actually made it a habit to remove any Native Database Queries from project I wrote in the past, replacing them with the recommended method.

To me this offers two benefits:

  1. No security prompting.  May seem minor, but when you put a Power Query error in front of non-tech users, they tend to freak out.  Why put them through that?
  2. It makes the query easier to follow/tweak when you need to debug it later.  And let’s face it, sooner or later you’ll need to do this.

Use Dynamic Ranges in Power Query

This was a great question that came up in my M is for Data Monkey session at the Amsterdam Excel summit:  Can we use dynamic ranges in Power Query?

If I recall correctly, this was asked by Gašper Kamenšek, after I demonstrated how to use the =Excel.CurrentWorkbook() method to consolidate worksheet tables.  So we gave it a shot to find out.

Setup:

Nick Hodge jumped up and built us a quick little dynamic named range solution that looked like this:

SNAGHTML21f653dd

And defined a named range call “Animals”, as follows:

=OFFSET(Sheet1!$A$4,0,0,COUNTA(Sheet1!$A$4:$A$12),2)

The next step was to pull it into Power Query, but I didn’t have a Table or normal Named Range to pull from, so I went to Power Query –> From Other Sources –> Blank Query.

The Normal Piece

As I’d shown earlier, I typed the following in the formula bar in the Power Query editor:

=Excel.CurrentWorkbook()

And at this point I was pretty chuffed as it looked pretty straight forward:

image

This is exactly what I was hoping I would see.  Fantastic!  So I did a little normal cleanup:

  • Expanded the content
  • Moved the first row to a header
  • Deleted the “Animals” column that was carried down the table

I then proudly announced that “Yes!  It works!”, clicked Close and Load to land it in a table… and it failed:

image

The Data Fun House of Mirrors

That was a bit of a shock.  It worked in the Query window, but when loading it failed.  There’s something not adding up here.  So I jumped back in to edit the query:

I was immediately presented with this:

image

Doesn’t make sense… so I started to step through the process. I clicked on the Source step:

image

Aha!

Use Dynamic Ranges in Power Query

What I forgot was that I’d used =Excel.CurrentWorkbook().  That  function returns a table which contains all tables, connections and named ranges in the workbook.  And when we create a new Power Query, it adds an output table and a connection.  So we basically got a circular reference.

No problem.  We can easily deal with this.  All we need to do is filter down to the Animals table that we need.  (That way we don’t have to worry about any new tables being created.)

So I added that filter immediately after the Source step.  And boom, everything works again because it is restricted to just the table I originally had.  And now, when I commit it the the worksheet, it loads:

SNAGHTML229c0b05

Proof Positive

Perfect.  It loaded, now lets just go add some data and make sure it’s working properly at update.  I added a new record to the table:

SNAGHTML22839637

And voila!

SNAGHTML22a09947

So it looks like we can use dynamic ranges in Power Query without any issues.

Alternate Method

When I was writing this up, I couldn’t replicate the error at first.  The method I used was just slightly different.  Here’s what I did instead:

  • Pulled in the table
  • Clicked the green Table to drill into it
  • Promoted headers
  • Landed it

It worked perfectly at update.  Why?  Because I drilled into a specific table, avoiding the issue at update.  And in all honesty, it’s probably a better method. 😉

Looking for Power Query samples

How would you like to make your data famous?  Or at least – how would you like a solution built for your own data using Power Query… for free?

I have a couple of Power Query related things I’m working on right now, like these:

Now, if you’ve ever been to one of my courses, you’ll know that one of the most important things to me is to use data that is modelled on real world examples.  That’s right, no canned AdventureWorks samples.  I like data that real users face in the real world.

To that end, I’m looking for Power Query samples.  Specifically, for sample files of data to use as examples for my Power Query blog posts, courses and other educational products.  And while I’ve got lots of great data sets of my own, I’m particularly interested in stuff that I haven’t seen or demonstrated on the blog before.

So if you’ve got a nasty file that’s painful to import, or you have a few files that you’re having trouble combining, or you just can’t figure out a starting point with Power Query, I’d like to have a crack at your scenario.  It’s super easy… you email me the file, I’ll take a look at it, and cook up a solution.  And any new or interesting techniques that are needed become the subject of a future blog post, article, or even sample in the courses I teach.

I hope it goes without saying, but for legal reasons I’m going to say it anyway… any files you send me can and will be used in public.  In order to use them in my materials, by sending them to my you are, by default, assigning me copyright to those specific source files.  Because of that, it is VERY important that the data you include is not sensitive.  To be fair, I’m not interested in your data specifics, as much as the form and shape of it, how it starts out, and what the end product should become.

“Okay, that’s great for you, Ken, but what do we get out of sending you our files?”

How about a working solution?  Does that sound reasonable?  You send me your data, and I send you back something that uses Power Query to complete the goal?

If you’re interested (and who wouldn’t be), here’s what you need to include:

  • A sanitized copy of your source data.  (i.e. Do a mass find/replace on employee names/numbers, phone numbers, addresses and such.  If you have particularly troubling data that is interesting, I may be willing to even help here.)
  • A copy of what you want the output to look like.  (I’ll build to that end goal.)
  • A summary of any “rules” you use along the way to determine if something needs to change one way or another.  (I’d rather not spend my time guessing about your end goals.)
  • Your direction as to whether you’d like your/your company name included or withheld should I decide to publish a post using your data.

In return for that, I will send you a copy of the solution once I’ve crafted it.  This may happen days/weeks before any solution is posted publicly, or it may come in the form of an email to let you know that the post is live, at my discretion.

Please be aware that I also reserve the right to NOT use your submission at my discretion.  This could happen if my inbox gets overloaded with requests, if I’ve already dealt with your scenario on the blog, if your data would be better handled in another fashion, if I get too busy with paying projects, if the data is just too difficult to work with, or any other reason that I haven’t listed. (I’ll do my best, but this is free work, so it’s totally at my schedule and discretion.)

If that sounds interesting to you, my email is ken at excelguru dot ca.  I’m going to limit this to the first 10 data sets that land in my inbox, so if you want to try and take advantage of it, act quick!

Create Running Totals in Power Query

I was presented with an interesting problem last week; how to create running totals in Power Query.  There doesn’t appear to be a built in method to do this, so I thought it would make an interesting use case study.

Background

Let’s take a look at the following table, which can be found in the sample file found here (if you’d like to follow along.)

SNAGHTML1ec4a255

Now, we could certainly use a Pivot Table to create running totals, but consider the scenario where we want to preserve the original list.  Pivot Tables aggregate data and apply running totals to the aggregation, so that might not suit our needs.  (In the case of the table above, it would aggregate both records for 1/2/2015 into a single line.)

We could also set up a formula on our table that summed the row about plus the row to the left, for example.  Of course that means you MUST land it in the worksheet, which may not be ideal either.

This solution is entirely possible to solve with Power Query, but with no one click feature in the user interface, how do we pull it off?

Creating the Staging Query

The first thing I did is create a query to pull in my data source and land it in a “Staging” table inside Power Query, as per this post.  To do that I:

  • Selected a cell in the table
  • Went to Power Query –> From Table
  • Selected the Date column –> Transform –> Data Type –> Date
  • Went to Add Column –> Add Index Column –> From 0

At this point the query looks like this:

image

Next I finalized the staging query.  I:

  • Changed the table name to “Staging-Sales”
  • Went to File –> Close and Load to… –> Only Create Connection
  • Right clicked the query in the Workbook Queries pane –> Move to Group –> Staging

Building a Running Total function - Groundwork

With the staging query in place, it was time to tackle the real problem; the method to create running totals in Power Query.  That work requires a bit of down and dirty M work, and here’s how I approached this to build up my end solution:

  • Right click the “Sales” staging query and choose Edit

This will land you back in the Power Query editor.  On the left side of the window, expand the vertical “Queries” window by click on the little arrow until you see this:

image

With this window expanded, we can begin creating our function:

  • Right click “Sales” –> Reference
  • Right click “Sales (2)” –> Move to Group –> Create Group –> “Functions”
  • Change the name from “Sales (2)” to “fnRunningTotal”

Things should now look as follows:

image

Perfect.  We now have the query set up the way we need to take our next steps.  Right now it’s just pointing to our Sales query and pulling the data in, but that will change.

From a logic point of view, here’s what I want to happen:

  1. I want to take a row from the Sales table
  2. I want to send a copy of the table to my function
  3. I want my function to remove all rows where the index is higher than the submitted row
  4. I want to sum the remaining values
  5. I want to return that total back to the calling row

So let’s start working through that process… we’ll come back to steps 1 and 2 in a bit… let’s pretend those have already happened, and focus on the rest.

  • Select the Sale and Index columns –> Home –> Remove Columns –> Remove Other Columns
  • Click the Index column filter –> Number Filters –> Less than or equal to
  • We’ll pick an arbitrary starting point like 2 for right nowimage
  • Rename the “Sale” column to “Temp”

Now we’re going to take the final step with this data before we convert it into a function.  We are going to group it, so go to:

  • Transform –> Group By

Now, we need to tweak the default a bit to get it right.  The very first thing we need to do is remove the “Group By” level that groups by Temp, as we want to group all remaining rows together.  To do that, you click the – button beside Temp.

Next we need to set our data correctly.  We are going to set it to:

  • New column name:  RunningTotal
  • Operation:  Sum
  • Column:  Temp

And at this point, our output should look like this:

image

Building a Running Total function – Finalization

With this in place, we now need to turn this into a real function.  And to do that, we need to step into the M code itself.  So go to View –> Advanced Editor to do that.

Now, at the very top of the M code (before the let line), we are going to insert the following code:

(tbl as table, sumcolumn as text, rowindex as number) =>

This code will allow us to pass in the table, which column we want to sum, and which row we are working with when we call the function.

Next we need to plumb these variable into our code.  To do that we will:

  • Remove the “Source = Sales,” line of the M query
  • Replace every instance of Source in query (there is only one) with the name of our table: tbl

image

  • Replace every instance of “Sale” in the query with the name of the variable which holds our column name; sumcolumn.

image

Note that we do not need to wrap sumcolumn in quotes, as Power Query knows that it is text based on the parameter definition at the head of the function.

  • In the #”Filtered Rows” step, replace the hard coded 2 with the name of the variable which holds our index number; rowindex.

image

  • Place a comma at the end of the #”Grouped Rows” line (to make way for the next step)
  • Replace the last two rows with the following to return the single value back to the caller:

   RunningTotal = Record.Field(#"Grouped Rows"{0},"RunningTotal")
in
RunningTotal

When complete the function looks as follows:

image

And now click Done.  At this point, your query should flip to appear as a function.  This is evident in three ways:

  1. The icon in the left pane changes to an fx logo
  2. The table disappears and is replaced with the function parameters and an invoke box
  3. The Applied Steps collapse to a single line (I wish this didn’t happen)

Commit it by choosing File –> Close and Load

Create Running Totals in Power Query

So now… did it work?  Let’s find out.

  • Right click the Sales query in the Workbook Queries pane and choose Edit
  • Expand the queries pane on the left
  • Right click Sales –> Reference
  • Right click Sales (2) –> Move to Group –> Other Queries
  • Rename Sales (2) to “Landing”
  • Go to Add Column –> Add Custom Column
  • Call the column “Running Total”

And now we need to enter our formula.  The actual formula itself will be:

=fnRunningTotal(Source,"Sale",[Index])

Here’s how it breaks down:

  • fnRunningTotal is the name of the function
  • Source is the name of the previous step in the Applied Steps section.  This allows us to pass the entire table that is generated up to that point
  • Sale” is the name of the column we want to summarize.  The quotes here indicated that we are going after the column’s name, not the data in that column’s row.
  • [Index] is surrounded in square brackets as we want to pass the value for that row of the index column.  That is the row that is then used to filter down the data.

And the result, if you followed along exactly, is this:

image

So at this point, we can remove the Index column and call it done.  You’ll find a complete version here if you’d like to review it.

Final Thoughts

This looks like a very long and cumbersome route, but overall it’s actually not that complicated once you understand it.  The key points that I want to highlight here are:

  • We started with a simple Staging query
  • We used that as a reference and stepped through the process of filtering down the data for one record
  • We converted the new query to a function by:
    • Defining a line at the beginning to receive 3 variables
    • Integrated the variables into the code in place of hard values
    • Added an ending to feed back a single value
  • We used our function

The first line of the query is essentially just () => and any variables we need go between the brackets.  And while I declared there types, you don’t actually even have to do that.

The last line of code before the in is the trickiest part.  And here’s a secret… I just come back to a blog post and copy this part:

RunningTotal = Record.Field(#"Grouped Rows"{0},"RunningTotal")

The secret is understanding that the #"Grouped Rows" portion is just the name of the previous step in the M code, and the final “RunningTotal” is the name of the column you want to return.  That lets me retrofit it pretty easily into any other function.

So… would it be faster to just flip it into a pivot and add running totals there?  Absolutely.  Would it be faster to just land the data in a table and create a running total formula down the table?  Absolutely.  But if those aren’t options, then Power Query may be a viable solution for you.  :)

Prompt for a Folder at Refresh

I got a comment on my blog the other day from David, asking if we could make Power Query prompt for a folder at refresh, allowing us to choose which folder to should be used to consolidate files.  That sounds like a pretty cool idea, and my first thought was “I think we can totally do that!”

I headed over to Chris Webb’s blog, as he’s done some cool things with Power Query prompts. Ultimately though, I got stuck on one thing… we have to go into the Query to invoke it.  My (albeit limited) tests left me unable to have that prompt when we refresh the query.

Not to be outdone by a lacking feature though, I cooked up a solution using some Power Query and VBA techniques that I’ve shared before.  This post wraps them up into a neat little package.

Background

The idea I’m going for here is to be able to click a button which:

  • Prompts the user to select a folder,
  • Feeds that folder into Power Query, and
  • Refreshes the Power Query output table.

To set it up, I used 3 techniques that I’ve shared before:

Initial Setup

To begin with, I created a blank workbook, and immediately added a Power Query parameter table as outline in Building a Parameter Table for Power Query.  I followed the steps there exactly, so I’m not going to detail that here.  The only things of note were:

  • I named the worksheet “Parameters”, and
  • I used the following for my “File Path” value:  D:\Consolidate\Begin

image

Next, I created a query to consolidate all the files in a folder.  Basically I ran through the following steps:

  • Power Query –> From File –> From Folder
  • Chose my folder:  D:\Consolidate\Begin
  • Did a bit of cleanup on the file
  • Implemented the “fnGetParameter” function call in place of the folder (as described in the aforementioned blog post)

The end result was some M code that looked like this:

let
Source = Folder.Files(fnGetParameter("File Path")),
#"Combined Binaries" = Binary.Combine(Source[Content]),
#"Imported CSV" = Csv.Document(#"Combined Binaries",null,",",null,1252),
#"First Row as Header" = Table.PromoteHeaders(#"Imported CSV"),
#"Changed Type" = Table.TransformColumnTypes(#"First Row as Header",{{"TranDate", type date}, {"Account", Int64.Type}, {"Dept", Int64.Type}, {"Sum of Amount", type number}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Sum of Amount", "Amount"}}),
#"Changed Type1" = Table.TransformColumnTypes(#"Renamed Columns",{{"Amount", type number}}),
#"Removed Errors" = Table.RemoveRowsWithErrors(#"Changed Type1", {"TranDate"})
in
#"Removed Errors"

And returned results like this:

SNAGHTML250112a0

I then loaded the Power Query to the worksheet, and called the worksheet “Data”.

So far so good?  Nothing really unusual yet.

Laying the Folder Change groundwork

With the basic functionality in place, I wanted to now give the user the ability to change the folder.

Rather than write the Browse For Folder routine from scratch, I headed over the VBAExpress.com and grabbed the one I submitted to the KB… um… a long time ago.  :)

Once I had that code copied I:

  • Opened the VBE
  • Browsed into my workbook
  • Right clicked the project and added a new Module
  • Pasted in all the code (shown below):

Function BrowseForFolder(Optional OpenAt As Variant) As Variant
'Function purpose:  To Browser for a user selected folder.
'If the "OpenAt" path is provided, open the browser at that directory
'NOTE:  If invalid, it will open at the Desktop level
Dim ShellApp As Object
'Create a file browser window at the default folder
Set ShellApp = CreateObject("Shell.Application"). _
BrowseForFolder(0, "Please choose a folder", 0, OpenAt)
'Set the folder to that selected.  (On error in case cancelled)
On Error Resume Next
BrowseForFolder = ShellApp.self.Path
On Error GoTo 0
'Destroy the Shell Application
Set ShellApp = Nothing
'Check for invalid or non-entries and send to the Invalid error
'handler if found
'Valid selections can begin L: (where L is a letter) or
'\\ (as in \\servername\sharename.  All others are invalid
Select Case Mid(BrowseForFolder, 2, 1)
Case Is = ":"
If Left(BrowseForFolder, 1) = ":" Then GoTo Invalid
Case Is = "\"
If Not Left(BrowseForFolder, 1) = "\" Then GoTo Invalid
Case Else
GoTo Invalid
End Select
Exit Function
Invalid:
'If it was determined that the selection was invalid, set to False
BrowseForFolder = False
End Function

With that in place, I just needed to link that function into a routine that can use it…

Rolling the Final Routine

I write a lot of VBA, so this was pretty quick to knock up for me.  The code itself looks like this:

Sub UpdatePQ()
Dim vFolder As Variant
Dim wsParameters As Worksheet
Dim wsData As Worksheet
'Set the worksheets
Set wsParameters = Worksheets("Parameters")
Set wsData = Worksheets("Data")
'Update the folder to import
vFolder = BrowseForFolder
If CStr(vFolder) = "False" Then
'No folder chosen, exit the routine
Exit Sub
End If
'Update parameter table with folder
wsParameters.Range("B2").Value = vFolder
wsData.Range("A1").ListObject.QueryTable.Refresh BackgroundQuery:=False
End Sub

In short, this routine basically:

  • Sets references to the two worksheets
  • Prompts the user for the folder.  The output here could be a folder path or “False” which means the user cancelled.  If it’s “False” we bail out
  • If the path was provided, it gets placed in B2 of the Parameter worksheet (which is the “Value” field for the “File Path” variable.  (So if you didn’t create your table starting in A1, then you’d need to update this.)
  • Finally, we refresh the Power Query output table which I landed in cell A1 of the Data worksheet.

Final Setup Step

The very last step is to link the UpdatePQ macro to a button.  This is fairly straight forward as well, and the steps are illustrated well in the Refresh Power Query with VBA post.

Make Power Query Prompt for a Folder at Refresh

And now, the only thing we need to do is click a button…

image

Choose a folder…

SNAGHTML2525f706

And the query will refresh the data in the landing page.

Try it out

You can download the completed workbook, as well as two folders with the source data, from my Skydrive. Click here to do so.

Caveats

I hope it goes without saying that the data structure of the files in the selected folders must be the same.  If they’re not, the query load will fail, of course.  But so long as you’re just trying to deal with different months/years, this should work nicely.

Master Your Excel Data: Live Courses

I’m pleased to announce that I’m just opened registration for 3 days of Excelguru live training courses in May 2015.

Master Your Excel Data

This course will focus on using tables, PivotTables and Power Query to master your Excel data.  I’m super stoked to be teaching this, as I think Power Query is just one of the hottest tools out there right now.  Based on it’s incredible ability to set data into a useful format it only makes sense to blend it with a bit of PivotTables to show how we can take nasty data and turn it into something beautiful.

It will be a full day, hands-on experience, and I’ll be running it at the following dates/times:

  • Victoria, BC – May 11, 2015 (9:00-4:30)
  • Kelowna, BC – May 14, 2015 (9:00-4:30)

To learn more and register, click here:  http://www.excelguru.ca/content.php?291-Live-Course-Master-Your-Excel-Data

Financial Model Design

This course will focus on the steps required to build a solid financial model.  From planning to architecture and implementation, we’ll look at both theory and the tools to help it become a reality.  More than that, we’ll build a solution that is not only easy to audit, but also focus on making it stand the test of time.

It will also be a full day, hands-on experience, and I’ll be running it at the following dates/times:

  • Victoria, BC – May 12, 2015 (9:00-4:30)

To learn more and register, click here:  http://www.excelguru.ca/content.php?292-Live-Course-Financial-Model-Design

Not Coming to Your Area?

For reference, if either of these (or any other course) is something you’d like to see in your area, I’d love to hear about it. Drop us a note and let us know, as we’ll be trying to reach out further and offer more courses as the year progresses.

Power Query Errors: Please Rebuild This Data Combination

I got sent this today from a friend.  He was a bit frustrated, as he got a message from Power Query that read “Formula.Firewall: Query 'QueryName' (step 'StepName') references other queries or steps and so may not directly access a data source. Please rebuild this data combination.”

What Does This Mean?

This message is a bit confusing at first.  For anyone who has worked with Power Query for a bit, you know that it’s perfectly acceptable to merge two queries together.  So why when you do some merges does Power Query throw this error?

image

The answer is that you cannot combine an external data source with another query.

Looking At Some Code

Let’s have a quick look at the code that my friend sent:

image

Notice the issue here?  The Merge step near the end references a query called DimShipper.  No issue there.  But it tries to merge that to an external data source which is called in the first line.

This is actually a bit irritating.  But let’s break this down a bit further.  Looking at the first line, it is made up as follows:

Source=Excel.Workbook(File.Contents(Filename())),

The Filename() portion is calling a function to return the desired filename from a workbook table, (based somewhat on this approach.)  We already know that works, but we also know that this is definitely pulling data from an external workbook (even it the file path is for this current workbook!)  And to be fair, this would be the same if we were pulling from a web page, database or whatever.  As long as it’s an external source being combined with another query, we’re going to get smacked.

So it’s suddenly starting to become a bit clearer what Power Query is complaining about (even if we think it’s frustrating that it does complain about it!)

Let’s “rebuild this data combination”

Right.  Who cares why, we care how to deal with this.  No problem.  It’s actually fairly straightforward.  Here’s how I dealt with it:

  • Right click the Query in Excel’s Query window
  • Choose Edit

I’m now in the Power Query window.  On the left side, I found the Queries pane and clicked the arrow to expand it:

image

Duplicate the Existing Query

The query in question here is “Purchase”…

  • Right click “Purchase” and choose Duplicate
  • Immediately rename the query to PurchaseList
  • Go to View –> Advanced Editor
  • Selected everything from the comma on the second line down to the last row of the query:

image

  • Press Delete
  • Change the final line of “Merge” to “Purchase_Sheet”

Perfect… so my new PurchaseList query looks like this:

image

  • Click Done

This query will now load.  Even though it is pointing to an external data source, it’s not being combined with anything else.  So it’s essentially just a data stage.

Modify the Existing Query

Now we need to go back and modify our existing query.  So in that left pane we’ll again select the Purchase query.

  • Go to View –> Advanced Editor
  • Select the first two lines and delete them
  • Put a new line after the let statement that reads as follows

Source = PurchaseList,

NOTE:  Don’t forget that comma!

And what we end up with is as follows:

image

So What’s The Difference?

All we’ve really done is strip the “external” data source and put it in it’s own query.  Yet that is enough for Power Query to be happy.  The new Purchase query above now has two references that it is comfortable with, and it works.  (And that’s probably the main thing.)

Designing To Avoid This Issue

I make it a practice to land all of my data sources into specific “Staging Tables”, which are set to load as connections only.  From there I build my “finalization” tables before pushing them into my Data Model (or worksheet).  You can visualize it like this:

SNAGHTML409dfd44

The key takeaways here are:

  • I always go from data source into a Staging Query (and do a bit of reshaping here)
  • My staging queries are always set to load to “Connection Only” so that the aren’t executed until called upon
  • All combining of data from disparate sources is done in queries that reference other queries, not external data sources

I’ve found this approach works quite well, and always avoids the “rebuild this data combination” error.

Consolidating Worksheet Print Areas With Power Query

I’ve posted on consolidating worksheets before, but in a task I got asked to do last week, I took a slightly different approach.  This approach allows consolidating worksheet print areas to grab the data that I need.

Background

In the previous post, I used some custom M code to reach outside the current workbook, and attach to the cached copy of the same workbook to read the data.  It works, but it’s a bit ridiculous that you need to.  One of the problems with that approach is that new sheets don’t show up in your solution until you save the workbook and refresh the query.  Not a big issue if you safeguard against it, but it’s a gotcha you want to be aware of.

In the situation I was working through, I wanted to consolidate some of the worksheets in the workbook, and also wanted to “future proof” it for new sheets that would be created later.  But I also wanted to make sure that all sheets would be pulled into the solution, even if they’d been newly created and the workbook hadn’t been saved.

So my data (which you can download here) looks like this:

image

Yes, that’s right.  It’s pivoted, even though it’s not in a Pivot Table.  And I’ve got one sheet per month.  Naturally we’d like to turn this into a proper data source.

My Approach

I decided to go seriously hardcore, and just went right into a blank query:

  • Power Query –> From Other Sources –> Blank Query

Not much to see yet (it is a blank query, after all) but I typed the following into the formula bar (which you can activate from the “View” tab if it’s not there.)

  • =Excel.CurrentWorkbook()
  • Enter

SNAGHTML1be24864

And check that out… I set some print areas!

What is in Excel.CurrentWorkbook()?

The Excel.CurrentWorkbook() command (and yes it is case sensitive) will give you a list of all tables, named ranges and power queries that are stored in your workbook.  And, as it happens, I can take advantage of this.  Because the print range is a named range, I can employ that to pick up my data.

Now, to be fair, I TOTALLY wish that Excel.CurrentWorkbook() also included all the raw worksheets.  That would be awesome.  You can get them by going to Excel.Workbook(File.Contents(full_file_path)) but that means it grabs them from the last saved version of the file in the workbook path.  Not ideal if your data is “live”.

At any rate, I’m good now.  So long as my users copy the previously set up sheet each month, and rename it to the current month, it will inherit the print area.  I now have something I can work with.

Consolidating Worksheet Print Areas

So let’s make use of this…

Something we need to recognize is that – as soon as we load this query to the worksheet – we are going to get a query and a table showing up in this list.  So let’s protect against that.  Click the Filter area on the “Name” column, and filter to records that contain “Print_Area”.

image

Now I don’t actually need the worksheet name for anything here, so I can

  • Select the Name column –> Right Click –> Remove
  • Click the Expand button (at the top of the the Content column)
  • Un-check the option to use original column name as prefix and click OK

And I’m left with this:

SNAGHTML1c49ca60

So now it’s just time for cleanup.  Let’s do that:

  • Go to Add Column –> Add Custom Column
  • Call the column Date and enter the following formula:
    • if Text.Start([Column1],5) = "Sales" then [Column3] else null
  • Select the Date column –> Transform –> Fill Down
  • Go to Home –> Remove Rows –> Remove Top Rows –> 2
  • Go to Transform –> Use First Row as Headers
  • Rename:
    • “Column1” to “Major Group”
    • “Column2” to “SubClass”
    • “Column7” to “Date”
  • Select the “Date” column –> Transform –> Data Type –> Date
  • Select “Major Group” column
    • Go to Transform –> Fill Down
    • Filter to only rows that “does not contain” “Sales”
  • Select “SubClass”
    • Filter to remove “null” records
  • Remove the “Total” column
  • Select the “Major Group”, “SubClass” and “Date” columns
  • Go to Transform –> Unpivot Columns –> Unpivot Other Columns
  • Rename
    • “Attribute” to “Category”
    • “Value” to “Units Sold”
  • Rename the query to “Sales” and load it to the worksheet

Create a Simple Pivot

Now that we have our data, we can create a simple Pivot Table from it:

image

Which is great, but what about next month?  Let’s see what happens…

  • Right click the “Feb Sales” worksheet –> Move or Copy
  • Check the “Create a copy” checkbox and click OK
  • Change C1 to 3/31/2015

Yuck… we’re simulating that we can’t trust our users to even rename the worksheet properly, but the do get the data right.  Now let’s see how our query reacts…

Go back to the Power Query worksheet, refresh the query, and the PivotTable if necessary…

image

Awesome!

What’s Safe?

This worked because we only have the print area defined for our input sheets.  If someone added a print area on other worksheets we’d have to add additional/adjust our logic to deal with that, as we’d get data in our query that possibly shouldn’t be there.

If someone adds something outside the print area, then you have an issue.  In those cases a table may be a better option to start with.  And naturally tables also show up in the Excel.CurrentWorkbook() call.  Having said that, it’s harder to control the default names there, so that could be an issue for you.

If someone modifies the shape of a range, that could also be a problem.  Say a new column gets added or something.

At the end of the day you need to think through an anticipate the scope of what your users are likely to do, and what shape the data will likely stay in (or not) so that you can plan accordingly.  In the scope of the solution that I built, I’m fairly certain it will work without a challenge.

Combine Multiple Excel Workbooks in Power Query

I got a comment on a previous post today, which made me realize I’d promised this but never posted it.  So let’s look at how to combine multiple workbooks together in Power Query, providing they have the same format.  Yes, it’s been covered before, (even linked to in the comments of the previous posts,) but I’m going to put my own flavour on it.  By the time we’re done, you’ll see how similar it is to working with non-Excel files.

Background

For our example we’re going to assume that we have four (or more) Excel files which you can download here.  I’ve stored in a folder called “Data”, which is a subfolder of the “Combine Workbooks” folder (more on why I store them in a subfolder a little later.)

Each file has a similar structure, which looks like this:

SNAGHTML1984743a

Notice that no one has bothered to set up a table style or anything, they are just raw worksheets of data.  Having said that, they are consistent in the fact that

  • The data starts in row 5
  • Each files is set up across same number of columns
  • The column headers and data types are consistent across files

(Just as a quick note, if they DID have tables set up, that would be okay too.  I’m just demoing that it isn’t necessary.)

The End Goal

The end goal we’re after is fairly common.  We basically want to grab the data from each file, strip out the first 4 rows, and append the tables together (without repeating header info.)  This will essentially create a nice data source that we can use in PivotTables, charts and other tools.

The Process

My preferred tool to combine multiple workbooks into one data source – where it used to be VBA – is most definitely Power Query today.  And here’s the steps we need to put together to make it work.

  1. Import a single workbook
  2. Convert it to a function
  3. Import all file contents (using our function)
  4. Combine all the data

Step 2 does involve a VERY minor manipulation of M code today, but as you’ll see it’s quite easy.

Let’s Combine Multiple Excel Workbooks

Step 1: Import a single workbook

To begin we’ll go to the Power Query menu and choose:

  • From File –> From Excel –> Sales-July2014.xlsx
  • Select Sheet1 and choose to Edit it

You’ll now see your query showing in the Query Editor:

SNAGHTML199ca279

We’ll need to do a bit of cleanup here to get the data just the way we need it:

  • Home –> Remove Rows –> Remove Top Rows –> 4 –> OK
  • Transform –> Use First Row As Headers
  • Select InventoryID and SalesPersonID –> Transform –> Data Type –> Whole Number
  • Select Cost, Price and Commission –> Transform –> Data Type –> Decimal Number
  • Select Date –> Transform –> Data Type –> Date
  • Select Date –> Home –> Remove Errors

That last one might be a bit odd, but I like to do that to my date columns.  The reason is that this protects me when I stack another table and it has headers.  I know that converting text to a date format will throw an error, so I’m guaranteed that any subsequent header rows will be nuked out.

At this point we’ve got a nice tidy import that would look pretty good if we decided to land it in a workbook.

Next up…

Step 2: Convert it to a function

Converting our nice query to a function is actually SUPER easy.  To begin, while still in the Power Query editor, we need to go to the View tab and click Advanced Editor.

When we do, we’ll see code similar to this, with the highlighted portion being the most important part:

SNAGHTML19b495d2

Okay, now follow carefully:  Right before the let statement at the very beginning, type:

(filepath)=>

The ()=> indicate to Power Query that this is a function, not a regular query.  And the “filepath” is the name of a parameter that we want to pass to the function.

The second part is that we want to replace the entire hard coded file path highlighted in yellow in the image above – including the quotes – with the name of our variable.  When we do, the lead three lines should look like this:

image

That’s all the code editing you need to do.  Let’s finalize this.  Click Done.  At which point you’ll see this:

image

No too inspiring or exciting really, but it IS what you want.  Final thing to do here is give the function a better name than Sheet1.  I’m going to use “fnGetContents”.  Choose your name, replace Sheet1 in the name box, then click File –> Close and Load.

If the query shows up in your Queries Pane saying “Load is Disabled”, don’t freak out.  That’s just a really poorly worded message that is indicating it is only a connection.  It will still work.  :)

You’ll also notice that, even though we imported this file, NOTHING has landed in any worksheet.  Huh?

Oddly enough, that’s as designed…

Step 3: Import all file contents (using our function)

All right, now it’s show time.  Let’s make some magic happen.  Let’s go get all of the files we need:

  • Go to the Power Query tab –> From File –> From Folder
  • Browse and select the folder that holds the data files
  • Click OK

You’ll now end up in the Query Editor and be staring at this:

SNAGHTML19d697a1

Now, normally we’d be tempted to click the double arrows beside the Content header to combine them all… but that doesn’t work here.  (It just returns the first file when we’re working with Excel files.)  So this is where we need to resort to our function.

  • Go to Add Column –> Add Custom Column
  • Enter the following:  fnGetContents([Folder Path]&[Name])
  • Click Enter
  • Right click the new “Custom” column –> Remove Other Columns

Almost done!

Step 4: Combine all the data

You should now be seeing:

image

Click the little double headed arrow, turn off the “Use original column name as prefix”, and click OK.

You should now have a nice table of data.  Give it a name, click Close and Load and you’re done!

Final Thoughts

Working with Excel files is pretty easy once you realize how easy creating that function really is.

One caveat though… If you store your “consolidating” workbook in the same folder as your data files, your fnGetContents() function will throw an error.  The reason for this is that your consolidation workbook will also get pulled in to the query, as well as the temp file version (prefixed with a ~).  You’ll need to filter out both files.