Extract Data Based on the Previous Row

This is a cool example of how to Extract Data Based on the Previous Row, which came up as a viewer's question inside our Power Query Academy.  Let's look at how we solved it…

What Kind of Data Needs This Treatment?

Here's a picture of the user's raw data after a little bit of cleanup:

SNAGHTML775ec10

So What's the Problem?

The challenge here is all about the two data points highlighted as A and B.  They are categories, and need to be extracted from this data… but how?  There is no common pattern between rows that we can look at to say "this is a category, but this is not."

But there is data on the row above.  Everywhere there is a "------" in the Quotes column, the next row has our Category in the Source column:

SNAGHTML778e971

But how do we get at it?  Power Query doesn't have any easy-to-use facility to refer to the prior row, so what do we do?

How to Extract Data Based on the Previous Row

There are actually a couple of ways to do this.  One is to write a formula that refers to the previous row, the other is to do this via a creative use of merging tables.  The previous row method is covered in M is For Data Monkey (page 185), so this time I'm going to focus on the latter.

Extract Data Based on the Previous Row - Setup

The first thing I did here is add two new columns to the data table above: an Index column starting from 0 and another Index column starting from 1.  To do this:

  • Go to Add Column --> Index Column -->  From 0
  • Go to Add Column --> Index Column -->  From 1

Pretty easy, and gives you this:

SNAGHTML77d9ea5

And at that point we call it a day and create this as a connection only query.  Here's what I did there:

  • Named the query: "Prelim" (but you can call it anything)
  • Went to Home --> Close & Load To… --> Only Create Connection

This gives me a query that I can call again when needed, without loading it to a worksheet or the Data Model.

Extract Data Based on the Previous Row - Completion

Next, I created a new query by right clicking the Prelim query in the Query Pane and choosing Merge.

I then did something really weird… I chose to Merge the query against itself… yes, seriously.  (I've always told people that it seems weird you can do this, but one day you'll need to… and today is the day!)

I configured the Join as follows:

  • Use the Prelim query for both the top and bottom tables
  • Chose to use the Index column on the top as the join key
  • Chose Index.1 on the bottom

Like this:

image

Then, once in the Power Query editor, I expanded just the Quotes column (without the column prefix), and removed the Index and Index.1 columns.  This left me in a pretty good place:

SNAGHTML786d430

With a pattern to exploit, this is now a simple matter of:

  • Creating a Conditional Column (Add Column --> Conditional Column) called "Category" with the following logic:
    • if [Quotes.1] = "-------" then [Source] else null
  • Right click the Category column --> Fill Up
  • Filter the [Quotes.1] column to remove all "------" values
  • Remove the [Quotes.1] column
  • Filter the [Quotes] column to remove all null and "------" values

And honestly, that's pretty much it.  To be fair, I also reordered the column and set the data types before the picture below was taken, but you get the idea.  At the end of the day, the data is totally useable for a PivotTable now!

image

  Interested in Mastering Power Query and the M Language?

Come check out our Academy.  We've got over 13 hours of amazing material that will take you from no skills to mastery and get you hours back in your life.

4 thoughts on “Extract Data Based on the Previous Row

  1. Ah, that's brilliant! When it's a simple pattern like this (which so many of them are) this is a much easier way of handling the problem. And I can already see all kinds of possibilities involving iterations of the index...

  2. I had a similar requirement in the webscraping file I sent you the other day, and I did the PowerQuery equivalent of a COALESCE to merge those two columns together.
    The equivalent for your data set would be:
    if [source] null then [source] else [quotes].

    Would that help?

  3. Hey Jeff, did your comment drop the "not equal" characters between Source and null?

    No, that's not going to do much. It will essentially re-create [Source] as there are no nulls in that column (blank is not null). Even if the test was against blanks though, you'd end up with nulls in their place.

    I suppose that would allow you to test if [Source]=null then [quote] else null. That would in turn allow you to fill down and then make comparisons. So it's possible that you could work around this particular issue.

Leave a Reply

Your email address will not be published. Required fields are marked *