Clean WhiteSpace in PowerQuery

The other day as I was working through a model, I once again tripped upon the fact that Power Query’s Text.Trim function doesn’t clean whitespace inside the text string, only at the ends.  For those who are used to Excel’s TRIM function, this is a frustrating inconsistency.

Just to circle on it, here’s the difference:

Source Function Result
Excel =TRIM(“  trim   me  “) “trim me”
Power Query =Text.Trim(“  trim   me  “) “trim   me“

Typically, I’ve just gone through the cycle of replacing a double space with a single space a few times on the same column to deal with this issue.  The issue, of course, is that you need to do this twice if there are 4 spaces, but add more spaces, and you have to do this more times.  Doesn’t seem like a really robust solution.

At any rate, this time I emailed one of my friends on the Power Query team and suggested that they should implement a function to make this a bit easier.

My Suggestion for a Clean Whitespace Function

The gist of my suggestion was to create a new function that would not only trim the whitespace internally, but would also allow you to specify which character you want to clear out.  This way it would work nicely to clean whitespace in the shape of spaces (the usual culprit in my world), but would also allow you to substitute in other characters if needed.  (Maybe you need to replace all instances of repeating 0’s with a single 0.)

It got referred to another friend on the team, (who wishes to remain nameless,) and he responded with some pretty cool code.  I’ve taken that code, broken it down and modified it a bit, and the end result is a slightly different version that can work the same as Excel’s TRIM() function, but adds an optional parameter to make it even more robust.  For lack of a better name, I’m going to call it “PowerTrim”.  (Just trying to do my part to keep the Power in Power Query!) 😉

Here’s the function:

(text as text, optional char_to_trim as text) =>
let
char = if char_to_trim = null then " " else char_to_trim,
split = Text.Split(text, char),
removeblanks = List.Select(split, each _ <> ""),
result=Text.Combine(removeblanks, char)
in
result

And to implement it, you’d take the following steps:

  • Copy the code above
  • Create a new query –> From Other Sources –> Blank Query
  • Change the query name to PowerTrim
  • Go into the Advanced Editor
  • Select all the text and replace it with the code above –> Done

Like this:

image

How it Works

We’d call this from a custom column, feeding in a column of text, and specifying the character (or even string of characters) we’d like to trim.  The function then works through the following process:

  • It checks to see if the char_to_trim was provided, and uses a space if not
  • It splits the text by that character, resulting in a list:

image

(This list shows the word “bookkeeper” split by “e”)

It then:

  • Filters out any blank rows
  • Combines the remaining items using the original character to split by

(The original version was actually all rolled up in one line, but I find it easier to debug, step through, examine and play with when it’s separated.)

Demo

Here’s some examples of the function in action. I started with a raw table from Excel.  (Create a new query –> From Table)

image

And added a Custom column by going to Add Column –> Add Custom Column

  • Name:  Trim_null
  • Formula:  =PowerTrim([Text])

image

Notice that in the first row it trimmed the leading, trailing and internal spaces.  Just like Excel!  (Remember that if you used Power Query’s default Text.Trim() function, you would return “trim   me”, not “trim me”.)

Now, let’s add another and try with an alternate character… like 0.  Again, we go to Add Column –> Add Custom Column:

  • Name:  Trim_0
  • Formula:  =PowerTrim([Text],”0”)

image

In this case the extraneous zeroes are trimmed out of row 3, leaving only a single one.  Cool stuff.  Now what about the “e”. Let’s see how that one goes.

Once more to Add Column –> Add Custom Column:

  • Name:  Trim_0
  • Formula:  =PowerTrim([Text],”e”)

image

The first time I looked at this, I thought there was an issue with the function.  But then I remembered in this case we are removing all leading and trailing e’s, as well as replacing any duplicate e’s with a single e.  You can see that this is indeed what happened in both rows 2 and 4.

Final Thoughts

I wish there was a way to get this to easily role into the Text functions category, so that I could call it something like Text.PowerTrim() or even replace the Text.Trim() function with my own.  Unfortunately a query name can’t contain the period character, which kind of sucks.  I guess it’s to to protect you from accidentally overwriting a function, but I’d like the ability to do it intentionally.

14 thoughts on “Clean WhiteSpace in PowerQuery

  1. Pingback: Transform table column using own function in Power Query – Ivan Bond's blog

  2. useful function. Thanks! Do you know if you can make the replacement inside the same column, without adding a new one.

  3. Unfortunately there isn't a one click transformation to do this on an existing column, no. Would be nice to have though, so maybe submit it as an idea at http://excel.uservoice.com. Until we see it in the product though, the custom function is going to be the best route.

  4. Ken...
    I just thought I'd drop a note of appreciation for your blog. I end up here a lot when searching to solutions for problems in power query and find the information you post invaluable. This function was fantastic for cleaning up a text file that had irregular columns.
    Cheers,
    Mike

  5. Evening Ken. I'd just lifted the above and was about to deploy when I stumbled across the following by accident
    If you click on a column and hit the filter arrow you will an option, 5th one down entitled 'Remove Empty'
    I'm in Power BI desktop and can't say I've ever seen this option before
    Presumably it would get rid of rows where there is all white space, i.e. empty?
    Or it could be something entirely different?

  6. Anthony, I always thought that was truly empty, as in it didn't even say null or have any spaces or other characters. But could be wrong there too... I've never actually used that one.

  7. There are at least two types of spaces - ordinary and non-breaking space.
    For example "asdf zxcv" here are three spaces between asdf and zxcv. I.e. asdf[space][nbsp][space]zxcv. Not sure your website's parser displays it properly. So if you apply your function to this string, nothing changes

    This (a bit modified) function fixes this problem
    ====
    (text as text, optional char_to_trim as text) =>
    let
    char = if char_to_trim = null then " " else char_to_trim,
    nbsp = if char=" " then Character.FromNumber(160) else "",
    split = Text.SplitAny(text, char & nbsp),
    removeblanks = List.Select(split, each _ ""),
    result=Text.Combine(removeblanks, char)
    in
    result
    ====
    You can also add some more spaces (or not spaces) to nbsp variable.

    Read more: Whitespace character - https://en.wikipedia.org/wiki/Whitespace_character

  8. Thank you! This is exactly what I needed to clean up some merged columns. It has saved me a ton of time

If you have a comment or question about the blog post content, please feel free to post it here. If you need help adapting this solution to your own needs, please post in our free help forum.

Your email address will not be published. Required fields are marked *