VLOOKUP (as per Ken P)

DickyMoo

New member
Joined
Mar 7, 2016
Messages
27
Reaction score
0
Points
0
Location
London
Hi all,

I am doing a VLOOKUP in Power Query, Data and Lookup tables attached, code below.

(taken from this post by Ken P: http://www.excelguru.ca/blog/2015/01/28/creating-a-vlookup-function-in-power-query/)

The data table is around 10,000 rows, and that's what I end up with in my query. However, when loading it shows around 1.3m rows loading and takes an age to load.
It uses a fuzzy match, as the lookup table works off code ranges.

Does anyone have an idea why this might be?

Thanks
Rich



Code:
let VLOOKUP_AC = (lookup_value as any, table_array as table, col_index_number as number, optional approximate_match as logical ) as any =>
  let
    /*Provide optional match if user didn't */
    matchtype = 
        if approximate_match = null
        then true
        else approximate_match,


     /*Get name of return column */
    Cols = Table.ColumnNames(table_array),
    ColTable = Table.FromList(Cols, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    ColName_match = Record.Field(ColTable{0},"Column1"),
    ColName_return = Record.Field(ColTable{col_index_number - 1},"Column1"),


    /*Find closest match */
    SortData = Table.Sort(table_array,{{ColName_match, Order.Descending}}),
        RenameLookupCol = Table.RenameColumns(SortData,{{ColName_match, "Lookup"}}),
    RemoveExcess = Table.SelectRows(RenameLookupCol, each [Lookup] <= lookup_value),
    ClosestMatch=
        if Table.IsEmpty(RemoveExcess)=true 
        then "#N/A" 
        else Record.Field(RemoveExcess{0},"Lookup"),


    /*What should be returned in case of approximate match? */
    ClosestReturn=
        if Table.IsEmpty(RemoveExcess)=true 
        then "#N/A" 
        else Record.Field(RemoveExcess{0},ColName_return),


    /*Modify result if we need an exact match */
    Return = 
        if matchtype=true
        then ClosestReturn
        else
            if lookup_value = ClosestMatch
            then ClosestReturn
            else "#N/A"
  in Return
in VLOOKUP_AC
 

Attachments

  • PQ Lookup.xlsx
    125.8 KB · Views: 18
So by "fuzzy", you mean approximate, yes? (Closest without going over?) Just clarifying here as fuzzy is actually something different, and this doesn't actually do fuzzy matching.

At any rate, I've pulled together the way I'd use this. Basically the deal is this:
  • Create a staging query to connect to each table (a query that reads the data, but loads to Connection Only)
  • Add the pqVLOOKUP command

The trick is in the next part. What I did was created a new query that references the Chart of Accounts table, then modified the Source step to add the Table.Buffer command:
[noparse]=Table.Buffer(Source)[noparse]

This effectively reads the Chart of Accounts into memory. This "cached" table is then passed to the pqVLOOKUP function, meaning that Power Query doesn't have to re-evaluate what the table is for each pqVLOOKUP call.

In my tests I was able to run 2 calls and get back all the data in about 5 seconds. Sample is attached.
 

Attachments

  • PQ Lookup.xlsx
    291.7 KB · Views: 44
Thanks Ken. Yes you're right, I meant approximate, not fuzzy.

I'll have a look in the morning when I get back in the office and let you know the results. I'm doing this in Power BI, does it work in the same way?

Would you say this approach is the best way of doing this?
 
Yes, it should work the same in Power BI. The secret is in the Table.Buffer() to cache the results into a static table, then passing that table into the vlookup, rather than passing a "dynamic" table into the vlookup. (Those terms aren't perfect, but without using Table.Buffer, power query will try to re-calculate the table each time it is passed into the function.)
 
Thank you so much Ken! Last night the query took nearly 30 mins to run, this morning it's done in about 10 seconds.
 
That's a pretty good improvement!

Just don't get tempted to add Table.Buffer to all your queries... it does add overhead to read it to memory. But as a rule of thumb, I do usually try to buffer a table before I pass it into a function. :)
 
PQ Merge is NOT SAME AS vlookup exact!!??

PQ is not equivalent to vlookup with exact match.
vlookup only returns the first matching row from the 2nd table.
PQ merge (NestedJoin LeftOuter) returns all matching rows.

So I am confused why Ken says here at excelguru and in the M is for DATA MONKEY book that it is equivalent.
I have not found an equivalent in PQ. Because the Removed Duplicates rows Step does not observe the prior Step sort!

How do I make the Removed Duplicates rows Step observe the prior Step sort?
 
You need to buffer the table.

Between the sort step and the remove duplicates step, add a step: Buffer = Table.Buffer(#"Previous Step Name")
 
Wow, that is huge. This is the first I've seen anywhere for a Power Query VLOOKUP (exact match).
It works!
Ken, please add Table.Buffer to your book! (I do not see it in the Index)

THANK YOU!
 
PQ is not equivalent to vlookup with exact match.
vlookup only returns the first matching row from the 2nd table.
PQ merge (NestedJoin LeftOuter) returns all matching rows.

True enough, Kevin. That comment was made under the assumption that data in the lookup table did not have duplicates (I never use VLOOKUP against a table with duplicates out of principal.) As Rudi mentions, buffering the table should preserve the step when before you merge. (The same is true of sorting, actually.)

We are going to be adding Table.Buffer() to the book when we do another version.
 
Out of principal, I have a choice:
1. Make a separate copy of the table, sort, remove duplicates, vlookup.
2. Sort, vlookup.

It just doesn't make sense to me to do two extra steps AND have an extra table to maintain.
If my thinking is correct, WE NEED A TRUE VLOOKUP FUNCTION IN POWER QUERY AND ALSO IN DAX!
 
Wait, can we go back to what you're trying to do here? Why do you need a separate copy of the table? You're pulling it in to Power Query, correct?

Pull in the "lookup" table, sort, remove duplicates and stop. Give it a name like "Lookup" and load it as Connection only.
Pull in the table that you want as a source, and give it a name like "Seek" and load it as Connection only.

Now
-Create a new blank query
-In the formula bar type =Table.Buffer(Lookup)
-Click the fx button to add a new step
-In the formula bar type =Seek
-Add a new column for the pqVlookup function that was referred to earlier in this thread, using Lookup as the table name

Load this query wherever you need it.

What's not working?
 
Back
Top