Power Query - Address search with (expanded) fuzzy logic

pinarello

Member
Joined
Jun 21, 2019
Messages
214
Reaction score
4
Points
18
Location
Germany
Excel Version(s)
Office 365
A few years ago, just as a hobby, I developed a fuzzy search in our SAP system that, in my opinion, also works well. Then I had the idea to implement it in Excel. But since my VBA knowledge is rather modest, I didn't try to realize this idea.

But since I have been working with Power Query for some months now, I had the idea to realize the fuzzy address search with it, also because "Fuzzy" is offered as an option when joining 2 tables.

So I created a simple address table and a search mask and started.
First I wanted to match the search term "First name" (Connection Search-terms) against the addresses using Fuzzy-Join and load the result as a connection. This also works. But if I now want to connect this result (Search_F_name (3)) with the "Last name" from the connection "Search-terms" with a fuzzy join, then I run into an error, which I do not understand. This error is contained in the query "Search_L_name (3)".

Therefore I have changed my strategy and ask all search terms against the address table and show at the end the ones where the number of hits matches the number of search terms. I have created the following queries: Search_F_name, Search_L_name, Search_Street, Search_No, Search_Zip, Search_City and Search_result. The result is shown on the left side of the spreadsheet "Search".

Many different spellings of the name "Meier", but show the limits of the integrated fuzzy logic. That's why I wanted to improve it and created the table "Replacements", which is currently only maintained for "Meier". In the queries "Search-terms (2) and "Addresses (2)" I wanted to prepare, in additional columns, first name, last name, street and city for better search results.
Here I have the problem that the new column First_name, which is created from First name (without subtraction), is deleted as soon as I generate the column Last_name from Last name. So the question is, how can I perform multiple text replacements in a query?

But to see if and how the extended fuzzy logic affects the last name, I have only used it for the last name. The result is shown in the "Search" worksheet on the right.

Here are my questions again:
1. how can the fuzzy factor be set dynamically?
2. why can't the result of a fuzzy query be used as input for the next fuzzy query?
3. How can I apply replacements read from a table to multiple columns in a query?
 

Attachments

  • Excel PQ - Address search.xlsm
    59 KB · Views: 17
So the fuzzy factor can be set dynamically by creating a query and drilling down to just the factor, then replacing the factor in the fuzzy M code with that query name.

With regards to #2, it's a bug. The team knows about it, but there's no ETA on a fix yet. For right now, buffer the results of the table with Table.Buffer() before you go into your next fuzzy join.
 
So the fuzzy factor can be set dynamically by creating a query and drilling down to just the factor, then replacing the factor in the fuzzy M code with that query name.

That sounds good. But my PQ knowledge is not yet sufficient to implement this answer.


With regards to #2, it's a bug. The team knows about it, but there's no ETA on a fix yet. For right now, buffer the results of the table with Table.Buffer() before you go into your next fuzzy join.

At what point, I need to apply the Table.buffer for it to work.

And do you have any answer to my third question?
 
Hi,

In the attached workbook, I've modified the first step of the Search_L_Name query to pull in P_Fuzzy dynamically. Basically just sub that name in place of the value. (The only trick is that if you called your query "P Fuzzy", you'd have to refer to it a #"P Fuzzy", but the underscore is fine.

I've also added two steps to Search_L_Name(3) at the beginning to buffer your tables before then merging them. This takes some hacking of the code, but does allow it to proceed. After that I had to remove a couple of your columns to remove conflicts, but hopefully you can follow and check that part.

As for part 3, I wasn't really sure what you meant by that, and figured that it might answer itself by taking care of these two.

Hope this helps,
 

Attachments

  • Excel PQ - Address search.xlsm
    60 KB · Views: 19
Hello, Ken,

first of all many thanks for this great support!!!

I imagine that I had defined the dynmaic fuzzy factor in the same way. But I couldn't tell that it worked. I was also irritated that it was only visible in the extended editor, but not in the function editor. But now I can see that it works.

Also, with my current knowledge, I would never have had the idea to use the table buffer and the join as you did it in the query "Search_L_name (3)". But I can confirm that it works.

So I should be able to do these adjustments for the following searches as well.

To my 3rd question, which is about preparing the search columns for better fuzzy results, I found a solution in the meantime.

Instead of making replacements in the columns First name, Last name, Street and City and creating a new column each time, I now first create a copy of these columns and then do a single replacement.

For this I use the following code:

let

TextToChange = Excel.CurrentWorkbook(){[Name="Addresses"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(TextToChange,{{"First name", type text}, {"Last name", type text}, {"Street", type text}, {"No", type text}, {"Zip", type text}, {"City", type text}}),

Replaced_Value = Table.ReplaceValue(#"Changed Type",null," ",Replacer.ReplaceValue,{"First name", "Last name", "Street", "No", "Zip", "City"}),
#"Duplicated Column" = Table.DuplicateColumn(Replaced_Value, "First name", "First name - Kopie"),
#"Duplicated Column1" = Table.DuplicateColumn(#"Duplicated Column", "Last name", "Last name - Kopie"),
#"Duplicated Column2" = Table.DuplicateColumn(#"Duplicated Column1", "Street", "Street - Kopie"),
#"Duplicated Column3" = Table.DuplicateColumn(#"Duplicated Column2", "City", "City - Kopie"),
#"Capitalized Each Word" = Table.TransformColumns(#"Duplicated Column3",{{"First name - Kopie", Text.Proper, type text}, {"Last name - Kopie", Text.Proper, type text}, {"Street - Kopie", Text.Proper, type text}, {"City - Kopie", Text.Proper, type text}}),

Replacements = List.Buffer(Table.ToRows(Excel.CurrentWorkbook(){[Name="Replacements"]}[Content])),
ReplacementFunction = each List.Accumulate(Replacements, _, (t, r) => Text.Replace((t), Text.From(r{0}), (r{1}) )),

Output = Table.TransformColumns(#"Capitalized Each Word",

List.Transform(List.FindText(Table.ColumnNames(#"Capitalized Each Word"), "- Kopie"), each {_} & {ReplacementFunction})),

#"Changed Type1" = Table.TransformColumnTypes(Output,{{"First name - Kopie", type text}, {"Last name - Kopie", type text}, {"Street - Kopie", type text}, {"City - Kopie", type text}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type1",{{"First name - Kopie", "First_name"}, {"Last name - Kopie", "Last_name"}, {"Street - Kopie", "Street_x"}, {"City - Kopie", "City_x"}})

in
#"Renamed Columns"

When I have inserted all the changes into my queries and the whole thing works the way I want it to, I will upload the finished workbook here again.
 
Back
Top