This is a cool example of how to Extract Data Based on the Previous Row, which came up as a viewer's question inside our Power Query Academy. Let's look at how we solved it…
What Kind of Data Needs This Treatment?
Here's a picture of the user's raw data after a little bit of cleanup:
So What's the Problem?
The challenge here is all about the two data points highlighted as A and B. They are categories, and need to be extracted from this data… but how? There is no common pattern between rows that we can look at to say "this is a category, but this is not."
But there is data on the row above. Everywhere there is a "------" in the Quotes column, the next row has our Category in the Source column:
But how do we get at it? Power Query doesn't have any easy-to-use facility to refer to the prior row, so what do we do?
How to Extract Data Based on the Previous Row
There are actually a couple of ways to do this. One is to write a formula that refers to the previous row, the other is to do this via a creative use of merging tables. The previous row method is covered in M is For Data Monkey (page 185), so this time I'm going to focus on the latter.
Extract Data Based on the Previous Row - Setup
The first thing I did here is add two new columns to the data table above: an Index column starting from 0 and another Index column starting from 1. To do this:
- Go to Add Column --> Index Column --> From 0
- Go to Add Column --> Index Column --> From 1
Pretty easy, and gives you this:
And at that point we call it a day and create this as a connection only query. Here's what I did there:
- Named the query: "Prelim" (but you can call it anything)
- Went to Home --> Close & Load To… --> Only Create Connection
This gives me a query that I can call again when needed, without loading it to a worksheet or the Data Model.
Extract Data Based on the Previous Row - Completion
Next, I created a new query by right clicking the Prelim query in the Query Pane and choosing Merge.
I then did something really weird… I chose to Merge the query against itself… yes, seriously. (I've always told people that it seems weird you can do this, but one day you'll need to… and today is the day!)
I configured the Join as follows:
- Use the Prelim query for both the top and bottom tables
- Chose to use the Index column on the top as the join key
- Chose Index.1 on the bottom
Then, once in the Power Query editor, I expanded just the Quotes column (without the column prefix), and removed the Index and Index.1 columns. This left me in a pretty good place:
With a pattern to exploit, this is now a simple matter of:
- Creating a Conditional Column (Add Column --> Conditional Column) called "Category" with the following logic:
- if [Quotes.1] = "-------" then [Source] else null
- Right click the Category column --> Fill Up
- Filter the [Quotes.1] column to remove all "------" values
- Remove the [Quotes.1] column
- Filter the [Quotes] column to remove all null and "------" values
And honestly, that's pretty much it. To be fair, I also reordered the column and set the data types before the picture below was taken, but you get the idea. At the end of the day, the data is totally useable for a PivotTable now!
Interested in Mastering Power Query and the M Language?
Come check out our Academy. We've got over 13 hours of amazing material that will take you from no skills to mastery and get you hours back in your life.