How to Reference other Power Query queries

One of the things I really like to do with Power Query is shape data into optimized tables. In order to accomplish that goal, I’ve begun using Power Query to source data over Power Pivot’s built in methods. But in order to build things the way I want, I need an easy way to reference other power query queries.

Why would I go to the effort of feeding through Power Query first? I’m no SQL ninja, and I find Power Query allows me to easily re-shape data in ways that would be hard with my SQL knowledge. I can leverage this new tool to optimize my tables and build Power Pivot solutions that require less tricky and funky DAX measures to compensate for less than ideal data structure. (I’d rather have easy to understand relationships and simple DAX measures!)

Methodology

My methodology generally goes something like this:

  • Load a base table into a Power Query. I then set it to only create a connection. Let’s call this my Base Connection.
  • Next I’ll create as many queries as I need to re-shape the data in the Base Connection into the forms I need, then load those into the data model.

It’s that second part that is key. I need to be able to reference other Power Query queries (namely my Base Connection) so that I could prune/trim/re-shape the data.

Reference other Power Query queries - The Old Way

Until recently, I would create my Base Connection, then I’d do the following to create the new query to reference that one.

  • Go to the Power Query tab
  • Show the Workbook Queries pane
  • Right click the Base Connection query and choose Reference

The problem was this… my intention was to reference and customize my query. Instead, it immediately loads it into a worksheet. I have to wait for that to finish before I can edit the new query and customize it the way I want.

Reference other Power Query queries - The New Way

I learned a new method last week from one of the Power Query team members which is much better (thanks Miguel!). I included it in my last post, but I thought this was worth calling out on its own.

Instead of following the method above, this time we will:

  • Go to the Power Query tab
  • Show the Workbook Queries pane
  • Right click the Base Connection query and Edit

Now we’re taken into the Power Query window. On the left side we can see a collapsed “Queries” pane. When you expand that, you get a list of all Power Queries in the workbook.

image_thumb13[1]

  • Right click the Base Connection query and choose “Reference”

We now have a new query in the editor that we can edit, without loading it into a worksheet first. Faster, and more in line with my goals.

The other thing I like about this method is that it immediately gives me access to that queries pane. Why is that important? Because I can drill through the other queries and get at their M code without having to close the window and go back to Excel first. So if I have some funky M code I need to re-use, it makes it way easier to review it and copy it.

20 thoughts on “How to Reference other Power Query queries

  1. Pingback: Excel Roundup 20150126 « Contextures Blog

  2. Hello Ken

    Thank you for your post. With these new method what I see is that every reference reads to the original source and not the final layout of my base query, and thats make that the time for data refresh grown in an exponential mode for every reference that I have. Is that correct?

    Is there another way to use the outcome of a queryas source for another query?

    Regards

  3. Hi Ivan,

    I'm not sure I fully understand here... Assume that we have query 1 and add a query called "2" that references query 1. Are you asking why, when you refresh Query 2, Query 1 refreshes? Are you looking to make Query 1 static, so that it doesn't update unless you specifically refresh it?

    If that's the case, then no... when you use one query to reference another it creates a dependency chain. So when you refresh the query, it will reach back to the original data source and refresh that. If that is a database or file, it will reach out to make sure the data is current. If it's another query, it will refresh that query to make sure it is current, which will then reach back to it's sources.

    I haven't found that structuring queries to read from each other adds any significant time to the refresh process, but then to be fair I haven't set up any tests to measure it either.

    Did that answer your question, or did I misinterpret?

  4. It would be great to use this ability to reference all Power Queries in a workbook to do search in all of them easily. Even better would be a Find and Replace feature, to update variable names etc.

    Has anybody done this yet? Would love to use this on workbook that 10 + Power Queries with 20+ steps each.

    Thanks!

  5. I'm not sure, to be honest. I think I've seen something, but iirc it was more about using a crash report to scan the details.

  6. I have a data sheet where the generated date shows up in a single cell up on the top and then the data table itself follows.

    I created a query to just pick up the generated data but now I want to use that date within a formula of a new column in the 2nd query (the one that pulls/transforms the data table itself). How can I do that?

    Or how can I add the Generated data as a new column within the data table itself? That way I can use this generated date column and create the new column referencing this.

  7. Ken, I agree with the aproach of using PQ to shape the data to help simplify the DAX relationships and measures. Can this be done in separate workbooks? In other words, can I import query-generated tables or connections, or data model tables from one workbook into another? I have many, many data tables that requiring a great deal of shaping. I'd like to import the results of all the shaping to another workbook for later processing either in PQ or DAX.

  8. Hey Bob,

    You can't connect to the data model in another workbook, but you can connect to Excel tables in another workbook. So if you wanted to use one workbook to run your main queries, then land those in tables, you could pull those into your solution.

    To be fair though, reaching against that workbook as a source won't force it to refresh. That may be the goal here though, meaning you only need to refresh from the original source when absolutely needed.

  9. Hi Ken, thanks. I was in your first PQ course ; and your and Miguel's book has been invaluable. My challenge is that I have a workbook with many tables each having well over 1m rows. Your assumption is correct; I don't need to refresh often.

  10. Yeah, that would be a pain, for sure. I guess my thought is that I'd probably try and build a solution that pulled the data in and cleaned/processed it with Power Query before uploading it into a staging database. It would take some VBA knowledge, but if built a prototype in the past that did exactly this. I have logged the "processed files" to a database table to ensure that they would not be refreshed again in future.

    Kind of a pain to build, but probably the better solution for you, as you can then refresh from the database as needed.

  11. Ken, thanks. What I did was in Workbook A I used DAX as query language (EVALUATE and SUMMARIZE) to extract from the data model into an Excel table (goal is to not violate the 1M rows limit for each Excel table). Then, in Workbook B I used PQ import and apend to reassemble, and then load into another data model.

  12. Hi Ken, great post. Thanks.

    I am trying restrict the rows in a Projects table (8000 rows) to only those ProjectNumbers that are present on Timesheets (100k rows) within the 12-month moving time period defined for my Timesheets query.

    I create a new ProjectList query which references the ProjectNumber collumn in the Timesheets query and removes duplicates to create a distinct list of the ProjectNumber values present.

    let
    Source = Timesheet,
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Project Number"}),
    #"Removed Duplicates" = Table.Distinct(#"Removed Other Columns")
    in
    #"Removed Duplicates"

    My query is on refresh, this ProjectList query reloads the entire 100k rows of the Timesheets query from SQL. Is there any way to reference the Timesheets query in such a way that it does not go back to SQL? Would Table.Buffer help 'somewhere'?

    Kind regards,
    Brian

  13. Hi Brian,

    Unfortunately the second query can't take advantage of any query folding against the SQL database, only the first query in the chain can. If I read this right, you've done this:
    Query 1: Connects to the raw table
    Query 2: References Query 1 and filters the data

    Basically, this means that no steps from query 2 will be rolled up and sent to the database, causing you to load all 100k rows

    Now if you set it up to do this:
    Query 1: Connect to the database, remove columns, remove duplicates
    Query 2: References Query 1 for further operations

    The the entire contents of Query 1 could be passed back to the SQL database in order to process it way more efficiently, returning just the 8 k rows.

  14. Hi Ken,

    The problem I have with this approach is that if I do some filtering of the data, then the referenced query gets affected too. This is inconvenient because sometimes I want to handle that query as a total different one, but having the flexibility of only "refreshing" one time, one data source. For example, I can bring the data as a connection and make another query as a reference. Everything fine here. Let's say I do a remove duplicate on the country column and obtain 10 countries. However, if I apply some filter on the first (original) query, and the countries in that scope are limited, then my other query will be affected and now I would see less countries.

    The other way of doing it is by duplicating the query, but the problem is that then the file path is hard coded and then I might need to first remember to update the path and then it takes manual effort to do this.

    Any idea?

    Thanks

    Fernando

  15. Hi Ken,

    Well I guess I don't NEED to modify the base query... I can just create more references and modify those without touching the original base query.

    Disregard!

    Thanks anyways

    Fernando

Leave a Reply

Your email address will not be published. Required fields are marked *