Results 1 to 4 of 4

Thread: Problem with "Remove Duplicates" in "Get & Transform" (Excel 2016)

  1. #1

    Question Problem with "Remove Duplicates" in "Get & Transform" (Excel 2016)



    Register for a FREE account, and/
    or Log in to avoid these ads!

    The problem is with "Remove Duplicates" in "Get & Transform" in Excel 2016. It also persists with with "Power Query" in Excel 2013.

    The data source is a csv file data.csv
    The processed Excel file is Duplicates_not_removed.xlsx

    Using Excel 2016, I've used "Get & Transform" to carry out the following steps:

    . Loaded data from file data.csv. There are four text columns - Col_A to Col_D
    . Changed all column types to "Text"
    . Added a custom column containing the length of data in Col_D using Text.Transform()
    . Filtered the new column to keep only text with length 2
    . Removed the newly added column
    . Merged columns Col_A to Col_D to get a single column Merged_for_primary
    . Selected the new single column and clicked on "Remove Duplicates"
    . "Close and Load" data to Sheet1
    . Using "Conditional Formatting" in Excel, I've highlighted duplicate items in the single column in Sheet1
    . As can be seen from the attached Excel file Duplicates_not_removed.xlsx, the column contains duplicate items despite running "Remove Duplicates" from "Get & Transform"

    I've checked with "Power Query" from Excel 2013 and the problem persists there also.

    "Remove Duplicates" from "Get & Transform" or "Power Query" did not do its job but Excel can easily detect duplicates. What seems to be the problem?

    I'd be grateful for any help.
    Attached Files Attached Files

  2. #2
    Sorry for the mistakes in above post. The function used to detect length of strings in Col_D is Text.Length() and not Text.Transform() as mistakenly written above.

    The values shown as duplicates are 2203-00-001-Ao and 2203-00-001-AO. The difference is that the last characters of the two text values are in small case in one and upper case in other. Still, they are treated as duplicates and when loaded to data model, it refuses to create relationships using that column stating that there are duplicates in that column.

    The workaround could be converting the column to upper case and then removing duplicates but still it would have been nicer to not treat those strings as duplicates EVERYWHERE.

  3. #3
    Administrator Ken Puls's Avatar
    Join Date
    Mar 2011
    Location
    Nanaimo, BC, Canada
    Posts
    2,401
    Articles
    45
    Blog Entries
    14
    Excel Version
    Excel Office 365 Insider
    So this is interesting, but honestly, if I put this to either the Power Query or Power Pivot team, they will tell me that the tools are working "by design". The issue here is that - since both were designed by different teams - their designs are different.

    Excel (and Power Pivot) is very loose on comparisons and generally ignores case sensitivity. Power Query on the other hand, is VERY sensitive to this.

    I'll be honest, your workaround is the solution, it's just that you need to know to do it.

    I do hear you though, and get how it isn't what you'd expect as an Excel user. But I think it's something that we're just going to have to live with as I don't see them changing it. (Honestly, I don't think they CAN change it without breaking a huge amount of existing solutions today.) What would be nice is if they added an option to ignore case sensitivity. I'd suggest that the best way to fix this is to log suggestions at http://ideas.powerbi.com and http://excel.uservoice.com
    Ken Puls, FCPA, FCMA, MS MVP

    Learn to Master Your Data at the Power Query Academy (the world's most comprehensive online Power Query training), with my book M is for Data Monkey, or our new Power Query Recipe cards!

    Main Site: http://www.excelguru.ca -||- Blog: http://www.excelguru.ca/blog -||- Forums: http://www.excelguru.ca/forums
    Check out the Excelguru Facebook Fan Page -||- Follow Me on Twitter

    If you've been given VBA code (a macro) for your solution, but don't know where to put it, CLICK HERE.

  4. #4
    Power Query can do most of what can be done with the 'dplyr' and 'tidyr' packages of R. And it provides visual methods to do it, so it's really useful and much easier to learn. The problem is with the inconsistency with Power Pivot and Excel functions.

    In Power Query, the functions doing the same tasks as Excel are named differently and they are case sensitive. PQ uses Base 0 method instead of Base 1 method of Excel.

    But to me, the strangest thing is that Excel and Power Pivot comparisons regarding duplicate cell values are not case-sensitive whereas PQ comparisons are! We need to remember to convert the key field to upper or lower case uniformly and then remove duplicates before exporting it as a data model into Power Pivot. If we forget to convert case, Power Pivot won't allow relationships stating that there are duplicates.
    Last edited by joysan7; 2016-12-20 at 04:40 AM.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •