Separate Values and Text – Part 2

In this week’s post we’re going to circle back to the original post on how to Separate Values and Text in Power Query from November 4, 2015.  That post attracted a couple of suggestions from Imke Feldman and Bill Szysz with better methods.

Other Posts on the Subject

All three of the last post are closely related and, as a reminder, which you need depends on your specific situation, of course.  Here are the three iterations:

The Issue at Hand

So why do we need to examine this again?  Well, the reality is that the solution I built worked perfectly for the data set that I used.  Bill, however, mocked up some different data which looked like this:

SNAGHTML46ae4721

Now, my friend Scott would tell you that the user (I’ll paraphrase this) “should get a stern lesson on proper data entry practices”, but if the data is already in the system… it’s too late and we have to deal with it.

If you tried my method, you’ll find that it fails, as shown below:

SNAGHTML46b0cc07

Basically, any measure that has a number in it, or commas or spaces mid number… they’re all killers to my routine.  So Bill emailed me to show me how he would approach the situation.

Bill’s Method

I’ve broken the steps back a bit from Bill’s original submission, and built it in a way that I think most users will approach this as you’ll see.  (Bill’s original submission was a bit more perfect, but I show how I would have arrived there trying to build what he ended up with.)

If you’d like to follow along, the source workbook can be downloaded here.

Step 1: Pull in the Data

Of course, to start with, we need the data…

  • Create a new query –> From Table
  • Right click the Quantity column –> Transform –> lowercase

This last step is actually quite important.  The reason is that we now want to split the data apart at the first instance of a character between a and z.  Since Power Query is case sensitive, forcing the text to lowercase means that we won’t miss splitting based on a character in the A to Z set.  It also means that we give Power Query less processing to do, since it only has to look for 26 characters, not 52 (both lower and upper case.)

Step 2: Separate Values and Text

Now that we know what we want to do, let’s do it.  Let’s split the text by the first alpha character:

  • Go to Add Column –> Add Custom Column
    • New Column Name –> Value
    • Custom Column Formula:

=Text.SplitAny([Quantity],"abcdefghijklmnopqrstuvwxyz")

This formula is quite interesting, as it will split by an of the characters between the quotes.  Since we forced the text to lowercase, it will react to any letter of the alphabet from a-z or A-Z.  But there is one small issue… it returns a list, not the text:

image

Since we’re only interested in the first item in this list at the moment (everything that precedes the first letter), we can modify the formula to drill in to just the first element.  To do that:

  • Click the gear icon beside the Added Custom step in the Applied Steps window
  • Modify the formula to read as follows:

=Text.SplitAny([Quantity],"abcdefghijklmnopqrstuvwxyz"){0}

SNAGHTML46bed402

Remembering that Power Query counts from a base of zero, and that the number between the curly braces allows us to drill into a specific item in the list, we then get back a list which includes only the first element, as follows:

image

With this done, we can extract the remaining values from the right using some text functions.  (You can learn more about these in my post on 5 Very Useful Text Formulas – Power Query edition, or by reading Chapter 17 of M is for Data Monkey)

  • Go to Add Column –> Add Custom Column
    • New Column Name –> Measure
    • Custom Column Formula:

=Text.End(Text.From([Quantity]), Text.Length([Quantity])-Text.Length([Value]))

At this point, we can identify an issue in the way we stepped through the process.  Can you see it?

image

In the original data set, the L (for litres) was capitalized.  In our output, it’s not.  If you don’t care about this, then skip step 3, but if you think this is important… we need to modify our steps a bit.

Step 3:  Fix the Lower Case Steps

We caused the issue shown above by converting the Quantity column to lower case.  Because that column sticks around, we really need it to retain it’s original format so that we can split the measure and retain the correct case for the characters.  But ideally, we’d like to do this without modifying that original formula as follows:

=Text.SplitAny([Quantity],"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"){0}

So how?  If we need it converted in order to split with a smaller list, what do we need to do?

The answer is to nest the lowercase step into our first Added Custom step.  Let’s go modify the first Added Custom step:

  • Click the gear icon beside the Added Custom step
  • Modify the formula to read as follows:

=Text.SplitAny(Text.Lower([Quantity]),"abcdefghijklmnopqrstuvwxyz")

 

Now, let’s remove the Lowercased Text step and see if it still works (don’t forget to select the Added Custom1 step after you remove the Lowercased Text step:

image

Note:  If you’re expecting this from the beginning, there is obviously no need to convert to lowercase, cause the error then fix it.  You could skip the pain and just wrap the column in a Text.Lower() function to begin with.  The reason I showed this route is that I find I frequently iterate this way when building my own data cleanup scenarios.

Step 4: Final Cleanup

The only thing left to do is convert the Value column to numbers.  You’ll get an error if you try it though, as there are still spaces in the middle of some numbers (the commas are okay, but the spaces aren’t.)

  • Right click the Value column –> Replace Values
    • Value to Find –> a single space
    • Replace With –> leave this area blank
  • Right click the Value column –> Change Type –> Decimal Number

image

And finally:

  • Rename your query
  • Go to Home --> Close & Load

With the results working nicely:

SNAGHTML46dac7b5

4 thoughts on “Separate Values and Text – Part 2

  1. Ken/Bill,

    Nice work.

    If I had a choice, I'd use this option. It's easy to follow, covers more situations and utilises PQ's in-built functions beautifully.

    Keep up the great work!

    Regards, Ed.

  2. Pingback: Excel Roundup 20151207 « Contextures Blog

  3. Nice idea.I am learning with each new post.
    If for any reason the list of caracters needs to be generated rather than typed or hardcoded, the following patterns could help:
    = List.Accumulate({"A".."Z","a".."z"},"",(state, current) => state & current)
    or with ASCII codes:
    = List.Accumulate({32..45,49..64},"", (state, current) => state & Character.FromNumber(current))

Leave a Reply

Your email address will not be published. Required fields are marked *