Number rows by group using Power Query

After one of my previous sorting posts, Devin asked if we can number rows by group.  Actually, that's a paraphrase… what he really asked was:

Any thoughts on how to produce something like a ROW_NUMBER function with PARTITION from T-SQL? Similar to this: https://docs.microsoft.com/en-us/sql/t-sql/functions/row-number-transact-sql?view=sql-server-2017#d-using-rownumber-with-partition

I've never used the PARTITION function in SQL, so I checked in with him just to confirm what he was after.  And here it is: take the first three columns of the table below, and add a final column with index numbers sorted by and restarting with each group:

Number rows by group example

And of course there is.  For this one, though, we actually do need to write a little code.

Preparing to Number rows by Group

Now here's the interesting part.  The source data looks exactly the same as what you see above, the only difference being that in the output we also added a number rows by group.  So how?

Well, we start by grouping the data.  Assuming we start by pulling in a table like the above with only the first 3 columns:

  • Sort the data based on the Sales column (in descending order)
  • Group the data by Group
  • Add a aggregation column for "All Rows"

Like this:

Group By dialog

Which yields this:

Table after aggregation

We are now ready to add the numbering.

Now to Number rows by Group

In order to number rows by group, what we really want to do is add an Index column, but we need to add it for each of the individual tables in our grouped segments.  If we try to add it here, we'll get 2 new values, showing 1 for Alcohol and 2 for Food.  That's not exactly what we need.  But if we expand the column, then the index will not reset when it hits Food, and the numbering won't be right either.

The secret here is to add an Index column to each of the individual tables in the Data column, then expand the columns.  The numbering will then be correct.

To do this, I added a custom column using the following code:

=Table.AddIndexColumn([Data], "Index", 1, 1)

Where did I get this code?  I actually added an Index column to the whole table.  That added the following code in the formula bar:

=Table.AddIndexColumn(#"Grouped Rows", "Index", 1, 1)

I copied that code, and deleted the step. Then I added my custom column, pasted the code, and replaced #"Grouped Rows" (the name of the previous step) with the column I wanted to use.

The result was this:

Table after Index column added

The final steps to clean this up:

  • Remove the Data column
  • Expand all columns from the Partitioned table except the Group (since we already have it)

Which leaves us with our table in place including the new column which does number rows by Group as originally planned.

If you want to play with this one, the example file can be found here.

I would also be remiss if I didn't mention that we have a great video in our Power Query Academy that covers this kind of operation (among others).  It's called "Advanced Row Context" (in our M deep dive section) where Miguel shows all kinds of cool stuff that you can do by adding new columns to Grouped Rows.

9 thoughts on “Number rows by group using Power Query

  1. Holy cow! That's really amazing, going into my quick reference list for sure!

  2. Good stuff, Ken. I was using your solution and trying to build a custom function with it, but I often struggle with getting the right structure for them. Any wisdom there?

  3. Maybe that should be the next blog challenge, Alex. 😉

    Curious, how do you see the function working? Pass it a table and tell it which master column you need to group by? How would you control the sub-sorting... make the user do that first, or as part of the function? (The more you want the function to do, the more parameters and checks that will need to be inside it, and the more complicated it becomes.)

  4. @Matthew, the interesting part about this one to me is that I've never been asked this until Devin raised it. Within 7 days I was then asked 3 more times by completely different people!

  5. @Ken, I'll add to the weirdness, literally last week I had a scenario where this would have been extremely helpful, but it was so far beyond what I'd thought of that it never even occurred to me to look for something on it.

    This week, I got another request for a completely different analysis where this is probably going to be crucial.

    Man, I love Power Query. DAX is cool too, but PQ makes me happy.

  6. @ Ken you can add the custom column in line
    let
    Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
    mSort = Table.Sort(Source,{{"Group", Order.Ascending}, {"Sales", Order.Descending}}),
    mGroup = Table.Group(mSort, {"Group"}, {{"GRP", each Table.AddIndexColumn(_, "Index", 1, 1), type table}}),
    mRemCols = Table.RemoveColumns(mGroup,{"Group"}),
    mXpandGrp = Table.ExpandTableColumn(mRemCols, "GRP", {"Group", "Category", "Sales", "Index"}, {"Group", "Category", "Sales", "Index"})
    in
    mXpandGrp

  7. I have been using the following for this - I call it a "clustered index". The english from the code is not great, but the code works fantastically. The below code needs to be added into a blank query in the advanced editor:

    let
    Suoyin = (table as table, text as text) as table =>
    let AddedSortIndex = Table.AddIndexColumn(table, text, 1, 1),
    RemovedColumns = Table.ReorderColumns(AddedSortIndex,List.Combine({{text},List.RemoveLastN(Table.ColumnNames(AddedSortIndex))}))
    in RemovedColumns,
    Clsuoyin = (table as table, list as list, text as text) as table =>
    let AddedSortIndex = Table.AddIndexColumn(table, "Orindex", 1, 1),
    GroupedRows = Table.Group(AddedSortIndex, list, {"FilteredTable", each Suoyin(_,text)}),
    Colnames = List.Combine({{text},Table.ColumnNames(AddedSortIndex)}),
    ExpandedTable = Table.ExpandTableColumn(Table.SelectColumns(GroupedRows,"FilteredTable"),"FilteredTable",Colnames,Colnames),
    SortRemove = Table.RemoveColumns(Table.Sort(ExpandedTable,{"Orindex", text}),{"Orindex"})
    in SortRemove
    in
    Clsuoyin

    Then add the below line to your query:

    let
    Source = Excel.CurrentWorkbook(){[Name="ClusteredIndex"]}[Content],
    addsuoyin = Clsuoyin(Source, {"Name","Factory"}, "Index")
    in
    addsuoyin

    taken from:

    https://social.technet.microsoft.com/Forums/sharepoint/ar-SA/d9a30cd0-9dbe-48ca-8260-8fabc585bceb/power-query-insert-index-by-subcategory-ie-clustered-index?forum=powerquery

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.