Power Query Errors: Please Rebuild This Data Combination

I got sent this today from a friend.  He was a bit frustrated, as he got a message from Power Query that read “Formula.Firewall: Query 'QueryName' (step 'StepName') references other queries or steps and so may not directly access a data source. Please rebuild this data combination.”

What Does This Mean?

This message is a bit confusing at first.  For anyone who has worked with Power Query for a bit, you know that it’s perfectly acceptable to merge two queries together.  So why when you do some merges does Power Query throw this error?

image

The answer is that you cannot combine an external data source with another query.

Looking At Some Code

Let’s have a quick look at the code that my friend sent:

image

Notice the issue here?  The Merge step near the end references a query called DimShipper.  No issue there.  But it tries to merge that to an external data source which is called in the first line.

This is actually a bit irritating.  But let’s break this down a bit further.  Looking at the first line, it is made up as follows:

Source=Excel.Workbook(File.Contents(Filename())),

The Filename() portion is calling a function to return the desired filename from a workbook table, (based somewhat on this approach.)  We already know that works, but we also know that this is definitely pulling data from an external workbook (even it the file path is for this current workbook!)  And to be fair, this would be the same if we were pulling from a web page, database or whatever.  As long as it’s an external source being combined with another query, we’re going to get smacked.

So it’s suddenly starting to become a bit clearer what Power Query is complaining about (even if we think it’s frustrating that it does complain about it!)

Let’s “rebuild this data combination”

Right.  Who cares why, we care how to deal with this.  No problem.  It’s actually fairly straightforward.  Here’s how I dealt with it:

  • Right click the Query in Excel’s Query window
  • Choose Edit

I’m now in the Power Query window.  On the left side, I found the Queries pane and clicked the arrow to expand it:

image

Duplicate the Existing Query

The query in question here is “Purchase”…

  • Right click “Purchase” and choose Duplicate
  • Immediately rename the query to PurchaseList
  • Go to View –> Advanced Editor
  • Selected everything from the comma on the second line down to the last row of the query:

image

  • Press Delete
  • Change the final line of “Merge” to “Purchase_Sheet”

Perfect… so my new PurchaseList query looks like this:

image

  • Click Done

This query will now load.  Even though it is pointing to an external data source, it’s not being combined with anything else.  So it’s essentially just a data stage.

Modify the Existing Query

Now we need to go back and modify our existing query.  So in that left pane we’ll again select the Purchase query.

  • Go to View –> Advanced Editor
  • Select the first two lines and delete them
  • Put a new line after the let statement that reads as follows

Source = PurchaseList,

NOTE:  Don’t forget that comma!

And what we end up with is as follows:

image

So What’s The Difference?

All we’ve really done is strip the “external” data source and put it in it’s own query.  Yet that is enough for Power Query to be happy.  The new Purchase query above now has two references that it is comfortable with, and it works.  (And that’s probably the main thing.)

Designing To Avoid This Issue

I make it a practice to land all of my data sources into specific “Staging Tables”, which are set to load as connections only.  From there I build my “finalization” tables before pushing them into my Data Model (or worksheet).  You can visualize it like this:

SNAGHTML409dfd44

The key takeaways here are:

  • I always go from data source into a Staging Query (and do a bit of reshaping here)
  • My staging queries are always set to load to “Connection Only” so that the aren’t executed until called upon
  • All combining of data from disparate sources is done in queries that reference other queries, not external data sources

I’ve found this approach works quite well, and always avoids the “rebuild this data combination” error.

49 thoughts on “Power Query Errors: Please Rebuild This Data Combination

  1. That's super Ken.

    I have a scenario where I can only connect to source data when I am on-site. I extract basic Customer Data (Code & Name) and load it to a worksheet. This is enhanced with additional classifications that are not available in the basic app. The Customer table loaded to the model is a merge of the worksheet that came from the external source and the Additions table.

    Of course I could not load the Customers to the model without being on-site. Creating a "Staging Query" from the worksheet that was loaded with the external data did the trick.

    Many thanks.

  2. Thanks for the explanation - was not clear how I needed to solve that problem from the error description.

    One question - in your other posts you mention query folding - to which I think you mean that powerquery is building the sql command and submitting it back to the data source to actually run the query. Seems ideal.

    So I am wondering if your approach of using connection only staging queries still offers the benefits of query folding, or is the query now run by power query rather than the data source? Hope that makes sense

  3. Hi Charlie,

    Your understanding of Query Folding is correct, yes. Power Query rolls up it's own SQL statement and sends it to the database.

    With regards to the staging query, they will still offer the benefits of Query Folding, as they are pulling from the database. Any queries that refer to the staging queries, however, will not. So the majority of your filtering should be done in the staging query, before you get into serious reshaping at the next stage.

  4. Thanks and btw - is there a way to enable email notification when someone replies to my post - other blogs have that feature and it makes it easier rather than having to frequently check the post for replies

  5. Hi Charlie,

    Not yet. Great feedback though, and I'll see if I can add that to the blog feature set when I have a bit of time. 🙂

  6. let

    FullPath=RawDataFolderPath{0}[RawDataFolderPath]&"\STEP.xls",
    Source = Excel.Workbook(File.Contents(FullPath))

    in
    Source

    I get this same error on the query above. The RawDataFolderPath query pulls a table that includes the text of the file path (excluding the file name itself) and then I append the name of the file. The first line correctly creates the full path as text, but throws an error when I try to actually access the file at that path.

    Any suggestions? The RawDataFolderPath query is essentially a "staging query" as described in this post, right?

    Thanks!

  7. Ken, I am trying to utilize a parameter table in my spreadsheet to pass variables to an SQL statement that is pulling from Teradata. The Teradata query works great. The Parameter table works. But when I put them together by using ..."&Text.From(fnGetParameter("Period")&"... I get the rebuild this data combination error.

    Any suggestions on how to get around this?

    Thanks in advance.

  8. Two options here.

    You can try declaring a variable at the top of your query to load the value from fnGetParameter like this:
    myPeriod=fnGetParameter("Period"),
    Then using that variable in place of the function call in the middle of your document like this:
    ..."&Text.From(myPeriod&"...

    That may help. If not, you may have to break the queries down and load them into staging queries first. Just be warned here... as soon as your first line of non-ui M code enters the equation, query folding stops. That's a big downside if you're filtering tables.

  9. Hi Ken,

    I have an interesting issue whereby I attempt to run two queries sequentially; the first a query generates a data table in excel that contains formulas within additional columns, and the second query takes this table and performs a table.Group operation. Attempting the second query throws this error...

    In de-bugging I have striped the second query down so that it only replicates the table but it didn't work!

    Regrettably the excel formulas cannot be replicated efficiently in the first query step (running total on 35,000 + rows) as per your blog article 'Create Running Totals in Power Query'.

    So do you know if it is ever possible to reference excel tables that originate themselves from query tables?

  10. Hey Joseph,

    Yeah... that's problematic, for sure. Honestly, the way I've got around this in the past is actually to use a bit of VBA to refresh query 1, copy the results to another table (disconnected from PQ), then pull that table back into PQ to do further operations. It's a bit of a pain, but it allows the job to be done.

  11. Cheers Ken, I've done that and it works OK now. I've actually never been able to reference a connection only table, it always throws this error... with or without an external data source. Is this still a bug?

  12. Ah yes sorry I just tried with a separate connection only query and it does work...but it still gives the error when dealing with more complex query tables and merge operations, the query that throws the error, however, couldn't be simpler...

    let
    Source = RANGE_CORRECTION_APPEND,
    #"Table Join" = Table.Join(Source,{"Index"},MIP_TEST_DATA_APPEND, "Index"})
    in
    #"Table Join"

    Where RANGE_CORRECTION_APPEND and MIP_TEST_DATA_APPEND are connection only queries. Referencing either table without the Table.Join operation works fine. The particular error is slightly different and makes zero sense to me!...

    "Query 'MIP_TEST_DATA_APPEND' (step 'Added Index') used 'MultipleUnclassified/Trusted' data when last evaluated, but now is attempting to use 'MultipleUnclassified/Trusted' data."

  13. Sorry I got my errors confused the above post was intended for the ''MultipleUnclassified/Trusted error'' blog post discussion. I'll re-post it there.

  14. I have found today that you can also replicate this issue if you are referencing `PurchaseList` using `Expression.Evaluate("PurchaseList",#shared)`.

    http://stackoverflow.com/questions/35980096/call-a-table-using-a-dynamic-identifier

    This has really killed some ideas I've been having about referencing queries dynamically based on conditionals (ie having staging use fixtures (seed data) in development and then SQL Server data in production).

    Sad face.

  15. What does "PurchaseList" do? If it connects directly to a data source, then that doesn't surprise me as you are essentially loading the M code from somewhere and running it. If PurchaseList just transforms data that is loaded from another source (but doesn't connect to the data source) then I'd be surprised you're getting this issue.

  16. Thank you for your article.
    I encounter the similar error in the following scenario.
    Query1 imports data from a csv file (Source = Csv.Document(File.Contents("C:\...) ).
    Query2 first references to Query1 and then gets the value of a named range from the CurrentWorkbook.
    This produces the same error (...references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.).

    Note: if I do all steps in Query1 it works fine. However I can't because I have intermediate queries that do some manipulations and need to be separate (merge and append).

    Thank you for any assistance

  17. Hi there,

    So the issue is not the first query, it's the second. Make Query2 pull in from the named range then load to connection only. Then create a new query that references BOTH Query 1 and Query 2. That will allow you to avoid the error.

  18. Hi,
    Thank you for your fast reply, and apologies for my late reaction - it's only now I'm back on this.
    I'm trying to figure it out, but I don't see how to reference to two queries.
    Note that I'm new with PQ.
    It's obvious how to reference to one query (Source = ...), but how can a query reference to an additional query (Source2 = ...)?
    Thank you.

  19. Just stumbled across here and recognized that there are some new features out meanwhile that make this task a bit easier:

    1) If you can accept the security aspects coming with it, you can simply enable "fast combine" and then wouldn't need to rebuild anything. ("Check: "Ignore the Privacy Levels...." under Settings -> Options -> Privacy)

    2) Splitting queries is made easier than never before: In the query editor check your step "Removed Column" -> rightclick your mouse and choose "Extract Previous". This will separate the query automatically as you need it.

  20. Hi Imke,

    Re #1, I don't believe the "Fast Combine" setting exists any more. "Fast Data Load" does, but it does something different now. (Doesn't load to a temp memory space first.) You can still turn off privacy, but that's now done under Privacy --> Ignore (which I think is what you're saying.)

    Re #2, absolutely true. 🙂

  21. Hi Imke/Ken,

    Firstly, thanks Ken - have just re-read this post and finally get it in relation to why you recommend the staging tables.

    I've scooted past the issue before in PowerBI Desktop using Imke's method ("Check: "Ignore the Privacy Levels...." under Settings -> Options -> Privacy), but now I am trying to schedule a refresh on PowerBI.com using the personal gateway and am getting a failure:

    Data source error: [Unable to combine data] Section1/Employees/AutoRemovedColumns1 references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.

    Am I right in thinking a) this is essentially the same issue that you've outlined in your post and b) that even though I've supressed the warnings in PowerBI Desktop - these can't be supressed on PowerBI.com - so I will need to follow your staging table method to get the refresh working?

  22. very helpful post. thanks!

    managed to solve my problem following the steps outlined in your tutorial 🙂

    I couldn't use the Purchase_Sheet reference though (in last example). Had to put in the source name instead. I wasn't removing columns there though - I was defining the column type. Don't know if that makes a huge difference but I found that strange.

    Anyway, my query works so I'm happy.

  23. Hey Brian,

    I'm going to say "I suspect so". I haven't actually tested this myself with PowerBI.com yet. I'd actually be curious to hear your results!

  24. Thank you so much for this! I don't know how I would have solved this problem without this post. Using this in combination with the new parameters feature of power bi desktop made my day.

  25. Hey Ken - you should change your formula fnGetParameter in your book - Datamonkey too ( for the next print 😉 ), because the error pops up too, when you use the paramtable directly in the Function fnGetParameter. (i used it to filter a calendarweek in a query, which comes after the staging querys) Now i referenced it to a staging-query of the parameter -table and all works perfectly.

    (ParameterName as text) =>
    let
    ParamSource = Parameter,
    ParamRow = Table.SelectRows(ParamSource, each ([ParaName] = ParameterName)),
    Value=
    if Table.IsEmpty(ParamRow)=true
    then null
    else Record.Field(ParamRow{0},"Value")
    in
    Value

    Bye Bye - Sven

  26. Cheers Ken, solved my problem straight away.. Why I can't be trusted to set these permissions myself I don't know..

  27. Thanks Sven. Yes, I've ended up playing with that a bit to try and figure out the best way to navigate the formula firewall with the parameter function. Such a pain!

  28. My load queries are not stopping at the staging queries they point to, but rather kicking off all related queries. Any help on what may be causing this?

  29. Hey Josh,

    Are the staging queries set up as "Connection Only" queries, or do they load to a destination (worksheet or data model)? The other thing to check would be if those staging queries are referencing other queries. I assume you're just initiating the refresh on a single Load query?

  30. Hi,

    I'm confused as to why I get the error message in this context. I think I get the concepts here but any advice gratefully accepted
    ____________________________________________________________
    Formula.Firewall: Query 'Data Filter' (step 'Filtered Rows') references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.
    ____________________________________________________________

    let
    SolutionYear = fnGetParameter("Selected Year"),
    SolutionMonth = fnGetParameter("Selected Month"),
    Source = Data,
    #"Filtered Rows" = Table.SelectRows(Source, each ([Year] = SolutionYear) and ([Month] = SolutionMonth))
    in
    #"Filtered Rows"

    The Parameter model is working using other parameters like filepaths...
    The Source 'Data' is a merge of 4 different data sources (csv files)

  31. Hi Ken,
    I have been using your blogs and your "M is for (Data) Monkey" book for my data projects - thanks for sharing your knowledge and expertise. I have noticed that your writings are very easy to understand/follow and implement compared to most other authors. Keep up the great work!

    I am trying to write a function to get the current quarter into a PowerQuery step instead of too many table merging. Here is my first attempt to get the current quarter (based on today's date) from the enterprise date database using a function:

    let fCurQtr = ()=>
    SELECT QTR_NAME FROM [SHARED].[ENTERPRISE_DATE]
    WHERE FULL_DATE=CAST(GETDATE() AS DATE)
    in fCurQtr

    I got the error that you mentioned in this post when I used this function. So I created a staging query (CurQtr) as you suggested to pull the current quarter and then I reference it in the function as follows

    let fCurQtr = ()=>
    CurQtr
    in fCurQtr

    Now, I don't get the error but when I use the function fCurQtr() it doesn't output the quarter directly but gives me a table to expand. Is there any way to avoid the expand table step and directly get the current quarter?

    Any ideas or suggestion would be greatly appreciated!

    Thanks, Anil

  32. Hi Ken,

    I changed the function as follows and it worked, please let me know if there is a better way to do it or appreciate comments if any (basically I am directly referencing the value in the table which always have one row)

    let fCurQtr = ()=>
    CurQtr{0}[QTR_NAME]
    in fCurQtr

    Thanks, Anil

  33. Hey Anil,

    Glad you got solution. The only question I'm really curious on here is why you're using a custom function to generate a quarter. Why not just add a custom column and use the Date.QuarterOfYear() function to generate that. It would avoid the firewall, and would also be calculated in one step.

  34. I get this error although I am not using an external data source.
    I am just trying to pass a dynamic source name to a query
    I have created a simple parameter table with the names of the tables I want to pass as source, like this:
    SourceTable
    Table1
    Table2
    Table3
    Then I have built a simple connection only query called Parameter using above table as source.

    Now I am trying to build my actual query using this:
    Source = Excel.CurrentWorkbook(){[Name=Parameter[SourceTable]{0}]}[Content]

    Although all my tables and queries are in the same workbook I get the Formula Firewall error

    Any help would be very much appreciated

    Thanks, Panagiotis

    PS: I am using Excel 2016 64 bit

  35. Hi Ken,

    I posted a question yesterday but I don't see it now.
    Yesterday evening the question was still here with a status of "awaiting moderation"
    What did I do wrong?

    thanks
    Panagiotis

  36. Ken -

    I'm trying to combine your parameter tip with a merge query and I'm not getting this to work despite trying multiple combinations. I have a parameter that determines how many days of history will be loaded.

    fnGetWorkBookParameter
    (PName as text) =>

    let
    ParamSource = Excel.CurrentWorkbook(){[Name="tblParameter"]}[Content],
    ParamRow=Table.SelectRows(ParamSource, each ([ParamName]=PName)),
    ParamValue = if Table.IsEmpty(ParamRow) = true then null else Record.Field(ParamRow{0}, "Value")
    in
    ParamValue

    This correctly returns the value in the table (7 in this case). My original query that generated the error was as follows:

    Order History
    let
    Source = Table.NestedJoin(Orders,{"Business Unit", "Par Location ID", "Compartmnt", "Item ID", "PO/MSR No", "Line No"},Receipts,{"Business Unit", "Par Location ID", "Compartment", "Item ID", "PO No.", "PO Line Number"},"NewColumn",JoinKind.LeftOuter),
    ChooseReceivedQty = Table.ExpandTableColumn(Source, "NewColumn", {"Received Qty"}, {"Received Qty"}),
    /*
    For smaller extracts, take only the dates according to the incremental value
    */
    DateInc = fnGetWorkbookParameter("Days History") * -1,
    //
    LatestDatesOnly = Table.SelectRows(ChooseReceivedQty, each [Demand Date] > Date.AddDays(Date.From(DateTime.FixedLocalNow()), DateInc)),
    AddColRecvVariance = Table.AddColumn(LatestDatesOnly, "Recv Variance", each if [Received Qty] = null then 0-[#"PO/MSR Qty"] else [Received Qty]-[#"PO/MSR Qty"]),
    RecvVarTypeInt = Table.TransformColumnTypes(AddColRecvVariance,{{"Recv Variance", Int64.Type}})
    in
    RecvVarTypeInt

    So I split these out into GetHistoryParameter
    let
    Source = Table.NestedJoin(Orders,{"Business Unit", "Par Location ID", "Compartmnt", "Item ID", "PO/MSR No", "Line No"},Receipts,{"Business Unit", "Par Location ID", "Compartment", "Item ID", "PO No.", "PO Line Number"},"NewColumn",JoinKind.LeftOuter),
    ChooseReceivedQty = Table.ExpandTableColumn(Source, "NewColumn", {"Received Qty"}, {"Received Qty"})
    in
    ChooseReceivedQty

    and Order History
    let
    Source = GetHistoryParameter,
    //
    DatesInc = fnGetWorkbookParameter("Days History") * -1,
    LatestDatesOnly = Table.SelectRows(Source, each [Demand Date] > Date.AddDays(Date.From(DateTime.FixedLocalNow()), DatesInc)),
    AddColRecvVariance = Table.AddColumn(LatestDatesOnly, "Recv Variance", each if [Received Qty] = null then 0-[#"PO/MSR Qty"] else [Received Qty]-[#"PO/MSR Qty"]),
    RecvVarTypeInt = Table.TransformColumnTypes(AddColRecvVariance,{{"Recv Variance", Int64.Type}})
    in
    RecvVarTypeInt

    But the error persists.

  37. Actually what if you can only make a second query after you know the list of items in the first query and also you are only able to make a single request each time?

    My idea was to have a list of items and for each of them run a custom function that would make a second query for that value...

    This is giving that firewall problem. I cannot run the second query globally because it really needs the input parameter that is coming from first query.

  38. Yes, so that is definitely an issue. You can turn off the firewall but turning off the Privacy settings, or you can use VBA to do a staged refresh where you land the data in a table, then suck it back into Power Query after. (The latter is kind of a pain though.)

If you have a comment or question about the blog post content, please feel free to post it here. If you need help adapting this solution to your own needs, please post in our free help forum.

Your email address will not be published. Required fields are marked *