Results 1 to 5 of 5

Thread: Power Query - Address search with (expanded) fuzzy logic

  1. #1
    Seeker pinarello's Avatar
    Join Date
    Jun 2019
    Location
    Germany
    Posts
    16
    Articles
    0
    Excel Version
    Office 365

    Power Query - Address search with (expanded) fuzzy logic



    Register for a FREE account, and/
    or Log in to avoid these ads!

    A few years ago, just as a hobby, I developed a fuzzy search in our SAP system that, in my opinion, also works well. Then I had the idea to implement it in Excel. But since my VBA knowledge is rather modest, I didn't try to realize this idea.

    But since I have been working with Power Query for some months now, I had the idea to realize the fuzzy address search with it, also because "Fuzzy" is offered as an option when joining 2 tables.

    So I created a simple address table and a search mask and started.
    First I wanted to match the search term "First name" (Connection Search-terms) against the addresses using Fuzzy-Join and load the result as a connection. This also works. But if I now want to connect this result (Search_F_name (3)) with the "Last name" from the connection "Search-terms" with a fuzzy join, then I run into an error, which I do not understand. This error is contained in the query "Search_L_name (3)".

    Therefore I have changed my strategy and ask all search terms against the address table and show at the end the ones where the number of hits matches the number of search terms. I have created the following queries: Search_F_name, Search_L_name, Search_Street, Search_No, Search_Zip, Search_City and Search_result. The result is shown on the left side of the spreadsheet "Search".

    Many different spellings of the name "Meier", but show the limits of the integrated fuzzy logic. That's why I wanted to improve it and created the table "Replacements", which is currently only maintained for "Meier". In the queries "Search-terms (2) and "Addresses (2)" I wanted to prepare, in additional columns, first name, last name, street and city for better search results.
    Here I have the problem that the new column First_name, which is created from First name (without subtraction), is deleted as soon as I generate the column Last_name from Last name. So the question is, how can I perform multiple text replacements in a query?

    But to see if and how the extended fuzzy logic affects the last name, I have only used it for the last name. The result is shown in the "Search" worksheet on the right.

    Here are my questions again:
    1. how can the fuzzy factor be set dynamically?
    2. why can't the result of a fuzzy query be used as input for the next fuzzy query?
    3. How can I apply replacements read from a table to multiple columns in a query?
    Attached Files Attached Files

  2. #2
    Administrator Ken Puls's Avatar
    Join Date
    Mar 2011
    Location
    Nanaimo, BC, Canada
    Posts
    2,344
    Articles
    57
    Blog Entries
    14
    Excel Version
    Excel Office 365 Insider
    So the fuzzy factor can be set dynamically by creating a query and drilling down to just the factor, then replacing the factor in the fuzzy M code with that query name.

    With regards to #2, it's a bug. The team knows about it, but there's no ETA on a fix yet. For right now, buffer the results of the table with Table.Buffer() before you go into your next fuzzy join.
    Ken Puls, FCPA, FCMA, MS MVP

    Learn to Master Your Data at the Power Query Academy (the world's most comprehensive online Power Query training), with my book M is for Data Monkey, or our new Power Query Recipe cards!

    Main Site: http://www.excelguru.ca -||- Blog: http://www.excelguru.ca/blog -||- Forums: http://www.excelguru.ca/forums
    Check out the Excelguru Facebook Fan Page -||- Follow Me on Twitter

    If you've been given VBA code (a macro) for your solution, but don't know where to put it, CLICK HERE.

  3. #3
    Seeker pinarello's Avatar
    Join Date
    Jun 2019
    Location
    Germany
    Posts
    16
    Articles
    0
    Excel Version
    Office 365
    Quote Originally Posted by Ken Puls View Post
    So the fuzzy factor can be set dynamically by creating a query and drilling down to just the factor, then replacing the factor in the fuzzy M code with that query name.
    That sounds good. But my PQ knowledge is not yet sufficient to implement this answer.


    Quote Originally Posted by Ken Puls View Post
    With regards to #2, it's a bug. The team knows about it, but there's no ETA on a fix yet. For right now, buffer the results of the table with Table.Buffer() before you go into your next fuzzy join.
    At what point, I need to apply the Table.buffer for it to work.

    And do you have any answer to my third question?

  4. #4
    Administrator Ken Puls's Avatar
    Join Date
    Mar 2011
    Location
    Nanaimo, BC, Canada
    Posts
    2,344
    Articles
    57
    Blog Entries
    14
    Excel Version
    Excel Office 365 Insider
    Hi,

    In the attached workbook, I've modified the first step of the Search_L_Name query to pull in P_Fuzzy dynamically. Basically just sub that name in place of the value. (The only trick is that if you called your query "P Fuzzy", you'd have to refer to it a #"P Fuzzy", but the underscore is fine.

    I've also added two steps to Search_L_Name(3) at the beginning to buffer your tables before then merging them. This takes some hacking of the code, but does allow it to proceed. After that I had to remove a couple of your columns to remove conflicts, but hopefully you can follow and check that part.

    As for part 3, I wasn't really sure what you meant by that, and figured that it might answer itself by taking care of these two.

    Hope this helps,
    Attached Files Attached Files
    Ken Puls, FCPA, FCMA, MS MVP

    Learn to Master Your Data at the Power Query Academy (the world's most comprehensive online Power Query training), with my book M is for Data Monkey, or our new Power Query Recipe cards!

    Main Site: http://www.excelguru.ca -||- Blog: http://www.excelguru.ca/blog -||- Forums: http://www.excelguru.ca/forums
    Check out the Excelguru Facebook Fan Page -||- Follow Me on Twitter

    If you've been given VBA code (a macro) for your solution, but don't know where to put it, CLICK HERE.

  5. #5
    Seeker pinarello's Avatar
    Join Date
    Jun 2019
    Location
    Germany
    Posts
    16
    Articles
    0
    Excel Version
    Office 365
    Hello, Ken,

    first of all many thanks for this great support!!!

    I imagine that I had defined the dynmaic fuzzy factor in the same way. But I couldn't tell that it worked. I was also irritated that it was only visible in the extended editor, but not in the function editor. But now I can see that it works.

    Also, with my current knowledge, I would never have had the idea to use the table buffer and the join as you did it in the query "Search_L_name (3)". But I can confirm that it works.

    So I should be able to do these adjustments for the following searches as well.

    To my 3rd question, which is about preparing the search columns for better fuzzy results, I found a solution in the meantime.

    Instead of making replacements in the columns First name, Last name, Street and City and creating a new column each time, I now first create a copy of these columns and then do a single replacement.

    For this I use the following code:

    let

    TextToChange = Excel.CurrentWorkbook(){[Name="Addresses"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(TextToChange,{{"First name", type text}, {"Last name", type text}, {"Street", type text}, {"No", type text}, {"Zip", type text}, {"City", type text}}),

    Replaced_Value = Table.ReplaceValue(#"Changed Type",null," ",Replacer.ReplaceValue,{"First name", "Last name", "Street", "No", "Zip", "City"}),
    #"Duplicated Column" = Table.DuplicateColumn(Replaced_Value, "First name", "First name - Kopie"),
    #"Duplicated Column1" = Table.DuplicateColumn(#"Duplicated Column", "Last name", "Last name - Kopie"),
    #"Duplicated Column2" = Table.DuplicateColumn(#"Duplicated Column1", "Street", "Street - Kopie"),
    #"Duplicated Column3" = Table.DuplicateColumn(#"Duplicated Column2", "City", "City - Kopie"),
    #"Capitalized Each Word" = Table.TransformColumns(#"Duplicated Column3",{{"First name - Kopie", Text.Proper, type text}, {"Last name - Kopie", Text.Proper, type text}, {"Street - Kopie", Text.Proper, type text}, {"City - Kopie", Text.Proper, type text}}),

    Replacements = List.Buffer(Table.ToRows(Excel.CurrentWorkbook(){[Name="Replacements"]}[Content])),
    ReplacementFunction = each List.Accumulate(Replacements, _, (t, r) => Text.Replace((t), Text.From(r{0}), (r{1}) )),

    Output = Table.TransformColumns(#"Capitalized Each Word",

    List.Transform(List.FindText(Table.ColumnNames(#"Capitalized Each Word"), "- Kopie"), each {_} & {ReplacementFunction})),

    #"Changed Type1" = Table.TransformColumnTypes(Output,{{"First name - Kopie", type text}, {"Last name - Kopie", type text}, {"Street - Kopie", type text}, {"City - Kopie", type text}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type1",{{"First name - Kopie", "First_name"}, {"Last name - Kopie", "Last_name"}, {"Street - Kopie", "Street_x"}, {"City - Kopie", "City_x"}})

    in
    #"Renamed Columns"

    When I have inserted all the changes into my queries and the whole thing works the way I want it to, I will upload the finished workbook here again.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •