If you've wanted to use Power Query to combine Excel files with a single click - like you could for TXT and CSV files - the feature is finally here*. The Combine Binaries feature has been re-written (as I discussed yesterday), and it now allows for easy combination of Excel files.
* This new feature in the O365 Fast Insider preview today and in Power BI Desktop's November update. I'm expecting to see this show up in the January Power Query update for Excel 2010/2013.
Just a quick caveat, I'm going to cover the items specific to the Combine Excel Files experience here, and not dive deep into the methods of how to modify and adjust the created queries. For that reason, if you haven't read yesterday's post, I highly recommend doing that first.
The Classic Combine Excel Files Experience
For anyone who has tried to combine Excel files in the past, you'll be familiar with this. You create a new query to pull From File --> From Folder and navigate to the folder full of Excel files. Then you hopefully click the Combine Binaries button:
And you get this:
Obviously that's not so helpful. The answer to deal with this was to go to Add Column --> Add Custom Column and use the formula =Excel.Workbook([Content]) to convert the data into tables. Then some filtering and more user interface driven work was required in order to get your data. The experience was a bit painful, and certainly not for beginner users.
The New Combine Excel Files Experience
To start with, I'm going to take two Excel files with sales transactions in them and combine them easily using the new combine Excel Files experience. Here's the file characteristics:
- Each file has a small set of data (2-3 rows)
- The data is stored on Sheet 1 in a table
- Both files are stored in the same folder
To get the data, I'm going to do the following:
- Get Data using a new query "From File --> From Folder"
- Browse to and select the folder
- Click Edit at the preview window
As you can see, we've got a couple of Excel files here:
So I'll click the Combine Binaries button (at the top right of the Content column.)
And where this would have triggered an error in the past, it now kicks out a preview:
And what happens next depends on what you select (which is why they are numbered above.)
Combine Excel Files - Method 1
For the first kick at this, I'll select the Sample Binary Parameter 1 folder icon (indicated by the number 1 in the Combine Binaries preview.
Nothing will ever show in the preview window, but upon selecting the folder I can click OK, which will result in this:
As I showed in yesterday's post on the new Combine Binaries Experience, we now get a bunch of new queries and a few steps along the way. The end result of this query, however, is a listing of all the worksheets, tables and ranges in each workbook. This is the view that will allow you to go back and granularly pick out what you need and transform it. In other words, this is kind of the detailed hard core view which was the equivalent of writing the custom columns that we used to have to do.
Because this is so similar to the classic method, I'm not going to do much more with this. The real point was to expose that selecting the folder in the preview window will bring you to this setup.
Combine Excel Files - Method 2
Method 2 revolves around selecting the table in the preview window; in this case the Sales table. When we select that branch in the preview window we WILL get a preview of the data:
And when we click OK, we actually get the data combined nicely:
As discussed in the previous post, if we wanted to modify and/or change:
- The Source columns (Source.Name or others): We modify the Removed Other Columns1 step in this query.
- The data before it is imported and combined: We modify the Transform Sample on the left side.
Now this section is MUCH easier than what we used to have to do!
Combine Excel Files - Method 3
But what if your data is not in an official Excel Table? What if it's just data in a worksheet range? Well, then you select the worksheet icon instead:
And the results are virtually identical to the previous method:
Why does that work? It works because the Transform Sample is smart enough to automatically promote the first row to headers, so it doesn't actually need the table. On the other hand, if that data wasn't in the first row, you may need to go back to the Transform Sample and tweak it to do what you need (like remove top rows, promote rows to headers, delete columns, filter, etc.)
Caveats When Trying to Combine Excel Files
This experience will work well for many things, but as always there are some caveats.
Single Object Only
The default experience here is to combine based on the object you select. In other words, if you select Sheet 1, it will combine Sheet 1 from each file. It won't combine all sheets in the file based on the Sheet 1 code. If you want to do that, you need to go back to Method 1 above, filter to the objects you want, and deal with them using classic import methods. (Unless you try to get real techy and build the function then repurpose it to use in that table - something I have not done yet.)
Preview Based On First Item In the List
The preview and import templates are based on the first file in the list. You can see how that affects things when I throw a new file into my structure that has different worksheet and table names:
While the two Sales workbooks have Sheet1 in them, this one doesn't, making it impossible to use this function to combine the first worksheet in each file. (The template would be based on Dec and would return errors for the other two files.)
If the order is important, you'll need to sort the file list first to get the correct file to the top before you trigger the Combine Binaries function.
For the record, I have sent an email to the Power Query team suggesting that it would be nice to get the option to pick the file here which the template should be based upon. That would make this much easier to work through, I think.
Inconsistent Columns Are Bad News
Let's say that you have two files with tables in them who have different column names (or quantities). The transformations generated will actually deal with this correctly, resulting in a column of tables which have different headers. All good so far, but when the main query gets to the last step, it expands the column of tables based on the headers for the table in the first row only. This is actually a standard thing, so not a surprise, I just want to make sure you don't think this is a holy grail that will solve the differing column issue. (We have more work to do in that case.)
At the end of the day, I have to say that this is a pretty welcome addition. I'm still not a fan of the names of the generated queries, and I would add something to change the template file, but I think this is going to make it a LOT easier for people to import and transform Excel files than it has been in the past.