There was a really cool new feature added in the latest Power Query update: The ability to split Power Queries. This is something that has always been possible by editing the M code manually, but this makes it super simple.
Where this is useful
Where can this be super helpful is when you’ve built a nice query to reshape your data. It runs fine for a bit, then you realize that you need to create a larger data model out of the data. For example, assume we have this data:
And we run it through these steps:
To come up with this:
All of this is fairly easy as Power Query goes, but now the model needs to grow. In order to expand it, we also want to create a table of unique Inventory Items and a table of unique Sales people. Basically we want to de-aggregate the data that should have come in from separate tables in the first place.
Methods to Split Power Queries
Like always, there are a variety of ways to do this. You could create new queries to load the data from the original table, then cut it down to the columns needed in each case. But that causes and extra load requirement.
You could manually cut the code up to the step required, create a new blank query, then reference the new query from the previous. But that takes some knowhow and tinkering that many people won’t be comfortable with.
Starting in Power Query version 2.26 (released today), we have a MUCH easier way. Let’s assume that we want to split this query right after the Changed Type step, so that we can create an Items table and a Saleperson table in addition to the Transactions query that we already have.
How to Split Power Queries – the easy way
To start, we need to understand the steps and what they give us. We can step through each step of the query, and find the step that gives us the jumping off point we need. In my case, this is the Changed Type step. We then right click the step AFTER Change Type, and choose Extract Previous:
You’ll be prompted to enter a name (I’ll use “Base Load”), and upon clicking OK, you’ll see a couple of things happen:
- A Base Load query is created
- The Queries Navigator opens on the left, showing you’ve now got multiple queries
- The Transactions query (the one I’ve been working on) gets shorter
- The Transactions query’s Source Step gets updated to #”Base Load”
You can see these changes here:
So the Transactions query still contains all the subsequent steps, but the Source step changed, and the Changed Type step is now in the Base Load query:
The biggest piece of this whole thing, however, is that the Base Load query is still pointing to the raw source table, but the Transactions query now loads from Base Load, NOT the original data source. So it’s following the staging/loading approach I talk about in this blog post.
Now, how can we use this…?
Making Use of the Split Power Queries
So far, very little is different to the overall goal, except that we load in two stages. Let’s change that by creating a new query that references the Base Load query:
- Right click the Base Load query in the Query Navigator (at left)
- Choose Reference
- Change the query name to Salespeople
- Right click the Sold By column –> Remove Other Columns
- Select the Sold By column –> Home –> Remove Duplicates
- Right click the Sold By column –> Rename –> SalesPerson
And we’ve now got a nice query that shows our unique list of sales people:
Now let’s build the SalesItems table:
- Right click the Base Load query in the Query Navigator
- Choose Reference
- Change the query name to SalesItems
- Right click the Inventory Item column –> Remove Other Columns
- Select the Inventory Item column –> Home –> Remove Duplicates
- Right click the Inventory Item column –> Rename –> SalesItem
And this table is done now as well:
Loading the Split Power Queries to the Data Model
The final step is to load these to the Data Model. We’ve actually created three new queries in this session, but we don’t get the liberty of choosing a new destination for one of them. Instead, we get to choose a single loading style that will be applied to ALL of them. (If in doubt, I’d suggest that you load queries as Connection Only first, then change them after if you need to pick different destinations. This will save you waiting while Power Query litters your workbook with extra worksheets and loads the data to them.)
For our purposes here, I’ll load them directly to the Data Model:
- Home –> Close & Load To…
- Select Only Create Connection
- Select Add to the Data Model
- Click Load
The only side effect here is that that the Base Load query was also loaded to the data model, and I don’t need that. So I’ll now quickly change that.
- Go to the Workbook Queries pane –> right click Base Load –> Load To…
- Uncheck “Add this data to the Data Model” –> Load
And I’ve now got my tables where I need them so that I can relate them and build my solution.
This is a fantastic feature, and I was actually going to blog on how to do this the old way, until they surprised me with this update. I’m a huge fan of the staging/loading approach, and this will certainly make it easier to retrofit a query after it’s already been built.