Can we refresh an index column in PQ?

Charley

Member
Joined
Jul 29, 2017
Messages
64
Reaction score
0
Points
6
Excel Version(s)
365
I need to recreate/rebuild/refresh an Index column several times. The only way I've found to do this is to add a new Index column each time, and then, just before closing, move the final Index into place, delete the others, and then rename it.

It would be fewer steps if I could just refresh an existing Index column with new values.

Is that possible?

Thanks.

Charley
 
You might remove the Index column each time before adding a new one.

You can also create a custom function to refresh the index. A simple example, I called it RefreshIndex:

Code:
(Table as table) as table =>
let
    RemovedIndex = Table.RemoveColumns(Table,{"Index"}),
    AddedIndex = Table.AddIndexColumn(RemovedIndex, "Index", 0, 1),
    Reordered = Table.ReorderColumns(AddedIndex,Table.ColumnNames(Table))
in
    Reordered

Example of how this can be used:

Code:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Text", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
    #"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each {1..5}),
    #"Expanded Custom" = Table.ExpandListColumn(#"Added Custom", "Custom"),
    #"Refreshed Index" = RefreshIndex(#"Expanded Custom")
in
    #"Refreshed Index"
 
Marcel,

I hadn't thought about using a custom function as you illustrate. Thanks for your code!

The advantage to your approach is that the code is cleaner and more modular.

But one potential disadvantage is execution time. Your code executes more lines than my approach. However, I'm new to PQ, and I don't yet have a feel for the steps that take a relevant amount of time and those that don't.

Would you (or anyone else) care to comment about that?

Thanks!

Charley
 
There is a big difference between the code as we see it, and the code that is actually executed in the background: queries are evaluated, resulting in code that is actually executed.
I regard this as the M(agic) of Power Query and accepted as fact that you just don't know what is really happening in the background.

As an example, in case of Direct Query with Query Folding (that is: if you have a database with a database management system as data source), an entire query definition is translated into 1 SQL-statement that is fired to the database and executed by the database management system, which will only return the resulting data.
These are the fastest queries.

For the execution time, it doesn't matter if you have many query steps, or if you combine the code of multiple steps into 1 step.

Some recommendations if direct query/query folding is not applicable:

1. Referring to a table row, using an index (e.g. Table1{Index}) is time consuming.
If you want to refer to data on the previous row, it is better to add 2 Index columns - 1 starting with 1 and the other with 0 - and merge the table with itself on these index columns.
This way, you will get the data on the same row, which will be much faster.

2. Before any looping (using List.Accumulate, List.Generate or recursive functions), buffer the table using Table.Buffer.
With recursive functions involving table transformations, you should also buffer the table at each iteration.

3. If you want to perform approximate lookups, it is much faster to transform the lookup table in a buffered list of lists and use that to perform the approximate lookups.
You can find an interesting discussion on TechNet.

4. It seems that DAX tends to be faster than Power Query.

These are the examples I can think of, but I'm sure there will be other recommendations.
 
Thanks, Marcel! That helps a lot.

Charley
 
Back
Top