Keep Only Numbers in Power Query

My last blog post was interesting in that I got a few emails about it.  Both Imke Feldman and Bill Szysz sent me better methods, and a blog commenter asked for a slightly different version.  For this post, I’m going to adapt Imke’s technique to show how we can Keep Only Numbers from a string of text (removing all other characters.)

Other Posts on the Subject

Each of these posts will be a targeted to a specific scenario with it’s own idiosyncrasies.  Which you need depends on your specific situation, of course

  • My original post to split off measurements leaving only the numbers (this will only work if there are no numbers in the measurement.)
  • The method in this post (which will remove all numbers – or text – in the input)
  • Bill Szysz’s method to split off measurements (coming in a future post, but better than my original as it doesn’t break when measurements also include numbers)

In this Post:

In this post, we are going to keep only numbers in our data set.  Actually, we’ll also keep spaces and decimals the first time around, but we could easily modify the function to clear those too.  So for our first go, we’ll convert the data in the left column below, to show as displayed in the right column:

image

Of course, I started by just pulling the data into Power Query via the From Table command.

How to Keep Only Numbers

Looking at this from a logic point of view, what we want to accomplish is to remove any character that is not a number.  Ideally, we would like to use a function like this in a custom column in order to do so:

=Text.Remove(text as nullable text, removeChars as any)

The first parameter should be pretty easy, we could just feed in the [Quantity] column, but how would we provide all the characters to the last parameter?

Here’s the cool part… removeChars is an “any” datatype… that means we’re not restricted to a single character, we can actually provide a list.  So all we need to do is find a way to create a list of the characters to remove.

This is where Imke’s email to me was really helpful.  She had a step similar to the following in her code:

CharsToRemove = List.Transform({33..45,47,58..126}, each Character.FromNumber(_))

So what does this do?  It actually creates a list of non-contiguous numbers (33-45, 47, 58-126), then transforms each value in the list into it’s alphanumeric equivalent.  A partial set of the results is shown here:

SNAGHTML82d28c1

For reference, character 32 is a space, 46 is a period, and 49-57 are the values from 0 through 9 – facts that you can discover by changing the values inside the lists.

In order to use this, I just popped into the Advanced Editor, and pasted the line above right between the “let” and “Source=…” lines.  (Don’t forget to add a comma at the end.)  And with a nice list of values contained the the CharsToRemove step, we can now create the custom column from the Source step:

  • Add Columns –> Add Custom Column
    • Name:  Result
    • Formula:  =Text.Remove([Quantity],CharsToRemove)

And it loads up nicely:

image

Now, keep in mind here that the purposed of this is to strip all characters except the numbers.  In the case of things like m2 and m3 in this data set, we’re left with a the final value, but that is exactly what the query is designed to do.

The final M code for this solution is:

let
CharsToRemove = List.Transform({33..45,47,58..126}, each Character.FromNumber(_)),
Source = Excel.CurrentWorkbook(){[Name="RawData"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Quantity", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Result", each Text.Remove([Quantity],CharsToRemove))
in
#"Added Custom"

Keeping Only Numbers

What if we wanted to also remove any spaces and decimals?  Easy enough, just add those values to the original list in the CharsToRemove step as follows:

CharsToRemove = List.Transform({32,46,33..45,47,58..126}, each Character.FromNumber(_))

And the result:

image

Removing Numbers Only

Now let’s keep the text and remove the numeric characters from 0-9 only.  To do this we modify the original list values again:

CharsToRemove = List.Transform({48..57}, each Character.FromNumber(_))

 

 

And the result:

image

End Result

This is pretty neat.  Once we recongnize which character code represents each character, we can build a list of those characters to remove, and take care of them all in one shot.  To put it all together, here is a look at the different views all shown in one table:

image

You can also download the completed file here.

3 thoughts on “Keep Only Numbers in Power Query

  1. Cool approach to the problem. A tiny comment about this. If we know what we want to remove then we should use a list contains only these characters ( because of efficiency). Shorter list - better performance. So for example, if we want to remove letters only, we can use construction like this below
    CharsToRemove = {"a".."z"}&Text.ToList("ąćęłńśóźż")
    and then the last step looks like this
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Result", each Text.Remove(Text.Lower([Quantity]), CharsToRemove)).

    This is more efficient than using similar construction
    CharsToRemove = {"a".."z","A".."Z"}&Text.ToList("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ")
    and
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Result", each Text.Remove([Quantity], CharsToRemove))
    But like I said on the beginning, Imke's approach you've shown in this post was very cool 🙂
    Thanks 🙂

Leave a Reply

Your email address will not be published. Required fields are marked *