Refer to other steps in Power Query

Last week I posted a technique to show how to calculate a rolling 12 months in Power Query.  One of the techniques used was to refer to other steps during the construction of that query.  Shortly after publishing that, a user asked a question on a non-related post that can make use of the same technique.  Because of this I thought I should focus on that specific technique this week, and where it can add more value.

The Question

I have a data sheet where the generated date shows up in a single cell up on the top and then the data table itself follows.

I created a query to just pick up the generated data but now I want to use that date within a formula of a new column in the 2nd query (the one that pulls/transforms the data table itself). How can I do that?

Now, the asker is working between two queries.  I’m actually not going to do that at all, rather focusing on getting things working in a single query.

The Mock-up

I haven’t seen the asker’s original data, but I mocked up a sample which I believe should be somewhat representative of what was described:

image

As you can see, we’ve got a single cell with the data in A3, and a table below it.  While I’ve done this in Excel, this could easily be pulled in from a text file, web page, or some other medium. The key that I want to focus on here is how to get that date lined up with the rest of the rows in the table.

Options, Options, and more Options

There’s actually a ton of ways to do this.  Just some include:

  • Naming the date range, using the fnGetParameter function to pull it in, and pass it into the query that way.
  • Pull the data into Power Query, duplicate the first column, format it as a date, replace errors with null, fill down, and cull out the rest of the garbage rows
  • Add a custom column that refers directly the the 3rd field of the first column
  • And many more

But in order to pull this of today, I’m going to refer to other steps in the Applied Steps section of the query.  This is a method you can use to determine a variable through the user interface without resorting directly to M code.

Building the Output

Loading the data

To pull the data in, I’ll set up a named range, as this doesn’t exactly look like a table.  To do that I:

  • Selected A1:C8
  • Replaced the A1 in the Name box (just to the left of the formula bar) with the name “Data”

Which landed me the following in Power Query:

image

Filter down to just the date cell

This part is relatively easy, we just need to:

  • Right click Column1 –> Remove other columns
  • Right click Column1 –> Change Type –> Date
  • Go to Home –> Remove Errors
  • Filter Column1 –> Uncheck the null values

And we’re now down to just our date:

image

You’ll also notice, on the right side, the Applied Steps window shows this step as “Filtered Rows”.  I’m going to right click that and rename it to “ReportDate” (with no space).

Refer to Prior Steps

With this in place, we can now essentially revert to our original query.  To do that, we:

  • Go to the Formula Bar and click the fx logo to get a new query line:

image

Notice that it refers to the previous step.  No big deal, change that back to “=Source” (the original step of our query.  When you do, your “Custom1” step will look like this:

image

Perfect.  Let’s add a custom column.

  • Go to Add Column –> Add Custom Column
  • Set up the custom column as follows:
    • Name:  Date
    • Formula:  =ReportDate

Your “ReportDate” step gets added as a table:

image

  • Click the expand arrow to the top right of the date column header and expand it (without keeping the column prefix)

image

And now it’s just basic cleanup to get things in the right place:

  • Go to Home –> Remove Rows –> Remove Top Rows –> 4
  • Go to Transform –> Use First Row as Headers
  • Right click Column4 –> Rename –> Date

And you’re done:

image

So… could you build one query to get the date, then try to pass it to a query with your data table in it?  Sure, but why?  Much better to do it all in one place.

It’s Faster with M

Before Bill S jumps in and shows us that it’s faster when we manipulate the M code directly, I figure I’ll cover that too.  Bill is absolutely correct when he comments on my posts showing why we should learn M.  Here’s the (summarized) route to do the same thing using M code:

  • Load the initial table into Power Query
  • Go to Home –> Remove Rows –> Remove Top Rows –> 4
  • Go to Transform –> Use First Row as Headers
  • Add a custom column
    • Name:   Date
    • Formula:  =Date.From(Source[Column1]{2})

You’re done.  🙂

Why?  The trick is all in the formula.  Let’s build it up gradually.

We start by referring to the Source step.

  • =Source

This would return a table to the column (as you saw earlier).  We then modify the formula and append [Column1] to this so that we have:

  • =Source[Column1]

This returns the list of all of the values in Column1 from the Source step.  (Never mind that we moved past that step – it will still refer to it as if it was still around.)  Next we append the index of the data point we want.  Remembering that Power Query is base 0, that means that we need #2 to get to the 3rd data point:

  • =Source[Column1]{2}

Now, if you went with this as your formula you’d find that it actually returns a DateTime value.  So the last step is to wrap it in a formula to extract just the date:

  • =Date.From(Source[Column1]{2})

Final Thoughts

So now you’ve seen two ways to pull this off… one via the user interface, one more efficient by writing a small bit of M code.  Different options for different comfort levels.

What I love about Power Query is that you don’t NEED to master M to use it effectively.  But if you DO master M, then it sure can make your queries more efficient and elegant.

Also, I should mention this… if the user really DID want to merge these two queries together, it is as easy as adding a new step (by clicking that fx button), then putting in the name of the other query.  That will bring the data over, and then it’s simply a matter of following the rest of this post to stitch them together.

11 thoughts on “Refer to other steps in Power Query

  1. This is another values article Ken.
    Clear and concise.
    Many TX.

    I hope I may ask an unrelated question here... (correct me if this is not allowed)

    The other day I was asked if Power Query could delete all top rows up to a found value. I could not find a solution and its been a burning question till now.
    For example: If I import a csv file containing columnar info, but the headings for the list are in different rows for each import. I know that the first heading in column A is called "ID Number", but each import has this heading in a different row.
    How do I determine an applied step to delete all rows above "ID Number". I cannot use the delete top rows as its not always 5 rows, some import the headings start in row 10, others in row 3...but the label I am looking for is always "ID Number".

    TX.

  2. Hi Rudi 🙂
    Try this below. That is enough, I think.

    let
    Source = Csv.Document(File.Contents("C:\CSV\One.csv"),[Delimiter=",",Encoding=1252]),
    AddIndex = Table.AddIndexColumn(Source, "Indeks", 0, 1),
    Skip = Table.Skip(Source, Table.SelectRows(AddIndex, each ([Column1] = "ID Number")){0}[Indeks]),
    PromoteHeaders = Table.PromoteHeaders(Skip)
    in
    PromoteHeaders

    Regards

  3. Bill beat me to it. My thoughts about the process were essentially this:
    -Add an index column
    -Filter to show only rows with "ID Number"
    -Extract the Index value
    -Refer to the first step
    -Filter to the index value you extracted

    Bill's code essentially does that, just a little more M focused than UI focused. Maybe I'll do up a full post on that next week for those who don't follow the blog comments. 🙂

  4. Hi,

    After my post I continued experimenting and I actually managed to get there. 🙂

    I will certainly look into the advice provided as I am completely open to learn more...but here is the M code I used to get the job done.
    I located the rows containing ID Number
    I located the row containing Grand Total
    (I actually wanted to extract all rows BETWEEN this range of rows)
    Then modified the Table.Range function with the two variables.\
    It is working beautifully... 🙂

    Cheers Ken and Bill
    Its great to have instant solutions because of your expertise.

    M Code:
    =======
    let
    Source = Csv.Document(File.Contents("C:\Test.csv"),[Delimiter=",",Encoding=1252]),
    FilterIDRow = Table.SelectRows(Source, each ([Column1] = "ID Number")),
    IDRow = Table.PositionOf(Source, FilterIDRow {0}),
    FilterGrandTotal = Table.SelectRows(Source, each [Column1] = "Grand Total"),
    GrandTotalRow = Table.PositionOf(Source, FilterGrandTotal {0}),
    #"Kept Range of Rows" = Table.Range(Source,(IDRow+1),(GrandTotalRow-IDRow-2)),
    #"Promoted Headers" = Table.PromoteHeaders(#"Kept Range of Rows")
    in
    #"Promoted Headers"

  5. Hi Ken,

    Since posting the question above, I have learnt a lot about referring to previous steps. I have studied up Bills reply too and its great to play around with these functions...learning to nest them to get two or more steps resolved in a single step. There is only one thing still "bugging" me...

    ... You list these steps in your reply above:
    -Add an index column
    -Filter to show only rows with "ID Number"
    -Extract the Index value
    -Refer to the first step
    -Filter to the index value you extracted

    As simple as it my sound, how do you: Extract the Index Value??? What function does one use for that? In my solution I use Table.PositionOf() to get (store) the filtered row. Bill uses Table.Skip() to jump directly to the filtered row. I just cannot figure out how to Extract the Index Number (and possibly store it for later use).

    Please could you clarify.
    Many TX.

  6. Hi Rudi,

    Assuming that you did things in that order, the step where you filtered to ID Number should be called "Filtered Rows". From there, you want to extract the value that sits in the Index column, in row 0 (since it's the only row.)

    So we start by referring to the step, then the column, then the row number we want:
    =#"Filtered Rows"[Index]{0}

    What I would do once you've got that is actually refer to the Source step again, set up a filter to remove the top rows, then edit it to put in the formula above like this:

    = Table.Skip(Custom1,#"Filtered Rows"[Index]{0})

    Does that make sense?

  7. 100% sense...

    Actually, now that I see your solution, it seems I fell into the trap of complicating things, which I tend to do when learning something new.

    Many TX Ken; and I am scrutinizing the recent article you posted on this topic.
    BIG cheers!!!

  8. Ken wrote: "... it’s faster when we manipulate the M code .."

    I (think) I read where M optimizes the code before it executes. In other words, although written line by line, M does not execute line by line. So combining lines may not be any faster. Do I have this correct?

    Thanks and great posts!
    Dave

  9. I reached out to one of the PM's at Microsoft on this, just to be sure. His reply:

    "True. Combining lines won’t improve performance. The order of the operations can sometimes affect performance, but these should be edge cases… the M Engine will optimize the query and reorder things as needed."

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.