I got sent this today from a friend. He was a bit frustrated, as he got a message from Power Query that read “Formula.Firewall: Query 'QueryName' (step 'StepName') references other queries or steps and so may not directly access a data source. Please rebuild this data combination.”
What Does This Mean?
This message is a bit confusing at first. For anyone who has worked with Power Query for a bit, you know that it’s perfectly acceptable to merge two queries together. So why when you do some merges does Power Query throw this error?
The answer is that you cannot combine an external data source with another query.
Looking At Some Code
Let’s have a quick look at the code that my friend sent:
Notice the issue here? The Merge step near the end references a query called DimShipper. No issue there. But it tries to merge that to an external data source which is called in the first line.
This is actually a bit irritating. But let’s break this down a bit further. Looking at the first line, it is made up as follows:
The Filename() portion is calling a function to return the desired filename from a workbook table, (based somewhat on this approach.) We already know that works, but we also know that this is definitely pulling data from an external workbook (even it the file path is for this current workbook!) And to be fair, this would be the same if we were pulling from a web page, database or whatever. As long as it’s an external source being combined with another query, we’re going to get smacked.
So it’s suddenly starting to become a bit clearer what Power Query is complaining about (even if we think it’s frustrating that it does complain about it!)
Let’s “rebuild this data combination”
Right. Who cares why, we care how to deal with this. No problem. It’s actually fairly straightforward. Here’s how I dealt with it:
- Right click the Query in Excel’s Query window
- Choose Edit
I’m now in the Power Query window. On the left side, I found the Queries pane and clicked the arrow to expand it:
Duplicate the Existing Query
The query in question here is “Purchase”…
- Right click “Purchase” and choose Duplicate
- Immediately rename the query to PurchaseList
- Go to View –> Advanced Editor
- Selected everything from the comma on the second line down to the last row of the query:
- Press Delete
- Change the final line of “Merge” to “Purchase_Sheet”
Perfect… so my new PurchaseList query looks like this:
- Click Done
This query will now load. Even though it is pointing to an external data source, it’s not being combined with anything else. So it’s essentially just a data stage.
Modify the Existing Query
Now we need to go back and modify our existing query. So in that left pane we’ll again select the Purchase query.
- Go to View –> Advanced Editor
- Select the first two lines and delete them
- Put a new line after the let statement that reads as follows
Source = PurchaseList,
NOTE: Don’t forget that comma!
And what we end up with is as follows:
So What’s The Difference?
All we’ve really done is strip the “external” data source and put it in it’s own query. Yet that is enough for Power Query to be happy. The new Purchase query above now has two references that it is comfortable with, and it works. (And that’s probably the main thing.)
Designing To Avoid This Issue
I make it a practice to land all of my data sources into specific “Staging Tables”, which are set to load as connections only. From there I build my “finalization” tables before pushing them into my Data Model (or worksheet). You can visualize it like this:
The key takeaways here are:
- I always go from data source into a Staging Query (and do a bit of reshaping here)
- My staging queries are always set to load to “Connection Only” so that the aren’t executed until called upon
- All combining of data from disparate sources is done in queries that reference other queries, not external data sources
I’ve found this approach works quite well, and always avoids the “rebuild this data combination” error.