I got a comment on a previous post today, which made me realize I’d promised this but never posted it. So let’s look at how to combine multiple workbooks together in Power Query, providing they have the same format. Yes, it’s been covered before, (even linked to in the comments of the previous posts,) but I’m going to put my own flavour on it. By the time we’re done, you’ll see how similar it is to working with non-Excel files.
For our example we’re going to assume that we have four (or more) Excel files which you can download here. I’ve stored in a folder called “Data”, which is a subfolder of the “Combine Workbooks” folder (more on why I store them in a subfolder a little later.)
Each file has a similar structure, which looks like this:
Notice that no one has bothered to set up a table style or anything, they are just raw worksheets of data. Having said that, they are consistent in the fact that
- The data starts in row 5
- Each files is set up across same number of columns
- The column headers and data types are consistent across files
(Just as a quick note, if they DID have tables set up, that would be okay too. I’m just demoing that it isn’t necessary.)
The End Goal
The end goal we’re after is fairly common. We basically want to grab the data from each file, strip out the first 4 rows, and append the tables together (without repeating header info.) This will essentially create a nice data source that we can use in PivotTables, charts and other tools.
My preferred tool to combine multiple workbooks into one data source – where it used to be VBA – is most definitely Power Query today. And here’s the steps we need to put together to make it work.
- Import a single workbook
- Convert it to a function
- Import all file contents (using our function)
- Combine all the data
Step 2 does involve a VERY minor manipulation of M code today, but as you’ll see it’s quite easy.
Let’s Combine Multiple Excel Workbooks
Step 1: Import a single workbook
To begin we’ll go to the Power Query menu and choose:
- From File –> From Excel –> Sales-July2014.xlsx
- Select Sheet1 and choose to Edit it
You’ll now see your query showing in the Query Editor:
We’ll need to do a bit of cleanup here to get the data just the way we need it:
- Home –> Remove Rows –> Remove Top Rows –> 4 –> OK
- Transform –> Use First Row As Headers
- Select InventoryID and SalesPersonID –> Transform –> Data Type –> Whole Number
- Select Cost, Price and Commission –> Transform –> Data Type –> Decimal Number
- Select Date –> Transform –> Data Type –> Date
- Select Date –> Home –> Remove Errors
That last one might be a bit odd, but I like to do that to my date columns. The reason is that this protects me when I stack another table and it has headers. I know that converting text to a date format will throw an error, so I’m guaranteed that any subsequent header rows will be nuked out.
At this point we’ve got a nice tidy import that would look pretty good if we decided to land it in a workbook.
Step 2: Convert it to a function
Converting our nice query to a function is actually SUPER easy. To begin, while still in the Power Query editor, we need to go to the View tab and click Advanced Editor.
When we do, we’ll see code similar to this, with the highlighted portion being the most important part:
Okay, now follow carefully: Right before the let statement at the very beginning, type:
The ()=> indicate to Power Query that this is a function, not a regular query. And the “filepath” is the name of a parameter that we want to pass to the function.
The second part is that we want to replace the entire hard coded file path highlighted in yellow in the image above – including the quotes – with the name of our variable. When we do, the lead three lines should look like this:
That’s all the code editing you need to do. Let’s finalize this. Click Done. At which point you’ll see this:
No too inspiring or exciting really, but it IS what you want. Final thing to do here is give the function a better name than Sheet1. I’m going to use “fnGetContents”. Choose your name, replace Sheet1 in the name box, then click File –> Close and Load.
If the query shows up in your Queries Pane saying “Load is Disabled”, don’t freak out. That’s just a really poorly worded message that is indicating it is only a connection. It will still work. 🙂
You’ll also notice that, even though we imported this file, NOTHING has landed in any worksheet. Huh?
Oddly enough, that’s as designed…
Step 3: Import all file contents (using our function)
All right, now it’s show time. Let’s make some magic happen. Let’s go get all of the files we need:
- Go to the Power Query tab –> From File –> From Folder
- Browse and select the folder that holds the data files
- Click OK
You’ll now end up in the Query Editor and be staring at this:
Now, normally we’d be tempted to click the double arrows beside the Content header to combine them all… but that doesn’t work here. (It just returns the first file when we’re working with Excel files.) So this is where we need to resort to our function.
- Go to Add Column –> Add Custom Column
- Enter the following: fnGetContents([Folder Path]&[Name])
- Click Enter
- Right click the new “Custom” column –> Remove Other Columns
Step 4: Combine all the data
You should now be seeing:
Click the little double headed arrow, turn off the “Use original column name as prefix”, and click OK.
You should now have a nice table of data. Give it a name, click Close and Load and you’re done!
Working with Excel files is pretty easy once you realize how easy creating that function really is.
One caveat though… If you store your “consolidating” workbook in the same folder as your data files, your fnGetContents() function will throw an error. The reason for this is that your consolidation workbook will also get pulled in to the query, as well as the temp file version (prefixed with a ~). You’ll need to filter out both files.