Keep Only Numbers in Power Query

My last blog post was interesting in that I got a few emails about it.  Both Imke Feldman and Bill Szysz sent me better methods, and a blog commenter asked for a slightly different version.  For this post, I’m going to adapt Imke’s technique to show how we can Keep Only Numbers from a string of text (removing all other characters.)

Other Posts on the Subject

Each of these posts will be a targeted to a specific scenario with it’s own idiosyncrasies.  Which you need depends on your specific situation, of course

  • My original post to split off measurements leaving only the numbers (this will only work if there are no numbers in the measurement.)
  • The method in this post (which will remove all numbers – or text – in the input)
  • Bill Szysz’s method to split off measurements (coming in a future post, but better than my original as it doesn’t break when measurements also include numbers)

In this Post:

In this post, we are going to keep only numbers in our data set.  Actually, we’ll also keep spaces and decimals the first time around, but we could easily modify the function to clear those too.  So for our first go, we’ll convert the data in the left column below, to show as displayed in the right column:

image

Of course, I started by just pulling the data into Power Query via the From Table command.

How to Keep Only Numbers

Looking at this from a logic point of view, what we want to accomplish is to remove any character that is not a number.  Ideally, we would like to use a function like this in a custom column in order to do so:

=Text.Remove(text as nullable text, removeChars as any)

The first parameter should be pretty easy, we could just feed in the [Quantity] column, but how would we provide all the characters to the last parameter?

Here’s the cool part… removeChars is an “any” datatype… that means we’re not restricted to a single character, we can actually provide a list.  So all we need to do is find a way to create a list of the characters to remove.

This is where Imke’s email to me was really helpful.  She had a step similar to the following in her code:

CharsToRemove = List.Transform({33..45,47,58..126}, each Character.FromNumber(_))

So what does this do?  It actually creates a list of non-contiguous numbers (33-45, 47, 58-126), then transforms each value in the list into it’s alphanumeric equivalent.  A partial set of the results is shown here:

SNAGHTML82d28c1

For reference, character 32 is a space, 46 is a period, and 49-57 are the values from 0 through 9 – facts that you can discover by changing the values inside the lists.

In order to use this, I just popped into the Advanced Editor, and pasted the line above right between the “let” and “Source=…” lines.  (Don’t forget to add a comma at the end.)  And with a nice list of values contained the the CharsToRemove step, we can now create the custom column from the Source step:

  • Add Columns –> Add Custom Column
    • Name:  Result
    • Formula:  =Text.Remove([Quantity],CharsToRemove)

And it loads up nicely:

image

Now, keep in mind here that the purposed of this is to strip all characters except the numbers.  In the case of things like m2 and m3 in this data set, we’re left with a the final value, but that is exactly what the query is designed to do.

The final M code for this solution is:

let
CharsToRemove = List.Transform({33..45,47,58..126}, each Character.FromNumber(_)),
Source = Excel.CurrentWorkbook(){[Name="RawData"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Quantity", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Result", each Text.Remove([Quantity],CharsToRemove))
in
#"Added Custom"

Keeping Only Numbers

What if we wanted to also remove any spaces and decimals?  Easy enough, just add those values to the original list in the CharsToRemove step as follows:

CharsToRemove = List.Transform({32,46,33..45,47,58..126}, each Character.FromNumber(_))

And the result:

image

Removing Numbers Only

Now let’s keep the text and remove the numeric characters from 0-9 only.  To do this we modify the original list values again:

CharsToRemove = List.Transform({48..57}, each Character.FromNumber(_))

 

 

And the result:

image

End Result

This is pretty neat.  Once we recongnize which character code represents each character, we can build a list of those characters to remove, and take care of them all in one shot.  To put it all together, here is a look at the different views all shown in one table:

image

You can also download the completed file here.

Last PowerQuery.Training Class of 2015!

There will be a regular blog post coming later this week, but we wanted to just throw out a quick heads up that we are currently accepting registrations for the last PowerQuery.Training class of 2015.

Registrations are open now for the class which begins on November 24, 2015.  This will be your last chance of 2015 to get an in depth training class on the best damn tool to hit Excel in 20 years.  (Sorry Power Pivot, but Power Query is going to reach more people overall.)

For more details on why you need to take this amazing live online workshop, check out the details here.

To register, you can follow this link (and click the Register button on the bottom right of the page.)

And don’t forget that when you register you get a free digital copy of our amazing new M is for Data Monkey book too.

Hope to see you there!

Separate Values and Text in Power Query

I recently received a comment on one of my blog posts asking how to separate values and text, especially when there is no common delimiter such as a space separating them.  This is a bit of an interesting one, as there is no obvious function to make this happen.

Background

The scenario the user has here is a list of values with their unit of measure, similar to this:

image

This issue here is that we don’t really have anything to easily split this up, as there isn’t really a good pattern.  Sometimes there are spaces after the values, sometimes not.  The letters change, and there is non consistency to how many characters the values represent.  So how would you approach this?

You can download the sample workbook here.

My Route

I think that a solution for this type of problem is going to be specific to the data in use.  Looking at the sample data, I figured that I can probably bank on all the numbers being at the beginning of the string, and that I probably won’t see something like square meters expressed as m2.  Of course, if that assumption wasn’t correct, I’d have to come up with another method.

At any rate, the angle I approached this was to build a custom function to remove the leading numeric values.  That should leave me with the text values, which I could then replace in the original string.  Let’s take a look.

Removing Numbers

As we recommend in M is for Data Monkey, the way to build a custom function is to start with a regular query that will let us step through each piece you need to do.

So focussing on doing this through the user interface, here’s how I started this solution.

  • Create new Power Query –> From Other Sources –> Blank Query
  • In the formula bar, I typed in 1.07Kg  (no quotes, just that text and pressed Enter
  • I then right clicked the text in the Power Query window, and choose to convert it to a list

image

Of course, you can’t do a ton with Lists in the user interface, so I converted it to a table:

  • List Tools –> Transform –> To Table –> OK

To be fair, I could have started by creating a record or a list from scratch (as we show you how to do in M is for Data Monkey,) but I didn’t really need to here in order to get up and running quickly.  Regardless, I’m now sitting in a nice place where I have the entire UI exposed to do what I need (which was my original goal.)

SNAGHTMLaf3bbeb

At this point, things become pretty easy:

  • Right click Column1 –> Replace Values –> Replace 0 with nothing
  • Repeat for 1 through 9 and the decimal character

This removed all numbers and decimals, leaving me with just text.  But because I know some of the values had spaces in them as well, I should deal with that:

  • Right click Column1 –> Transform –> Trim

image

The final thing I did was to drill into the data point there, as I don’t really want to return a table when I convert this into a function.  To do that I needed to:

  • Click the fx on the left of the formula bar
  • Append the following to the text in the formula bar:  [Column1]{0}

image

Notice that we now have just the data point, not the Column1 header.

Converting the Query to a Function

Now, we’ve got a neat little function that will let me take a data point, sanitize it, and turn it into data point with no leading values.  But how can I repurpose that to use it for every record?  The answer is to turn this query into a custom function, as we describe in  Chapter 22 of M is for Data Monkey.  Here’s how we do it:

  • Go to View –> Advanced Editor
  • Right before the “let” line, add the following:

(Source) =>

  • Go and place two / characters in front of the current Source line in order to comment it out (otherwise it would overwrite the function input)

//Source = “1.07Kg”,

  • Click Done
  • Rename the query to fxRemoveNumbers

That’s it.  We’ve converted it to a function.  Now you can go to Home –> Close & Load to save it and it’s ready for use.  The interesting part here is that creating the logic is the hard part, converting it to a function is deadly easy.

Separate Values and Text

So now let’s use our new function to separate values and text.  Here’s how I did this:

  • Select any cell in the table –> create a new query –> From Table
  • Go to Add Column –> Add Custom column
    • New column name:  Measure
    • Column formula:  fxRemoveNumbers([Quantity])

And we’ve got a nice new column with just the textual values.

image

Not bad, now we just need to figure out a way to replace the matching text in the Quantity column with nothing…  After checking MSDN’s Power Query formula guide, I found a formula called Text.Replace() that seems it should do just that:

  • Go to Add Column –> Add Custom column
    • New column name:  Value
    • Column formula:  =Text.Replace([Quantity],[Measure],"")

 

 

 

To summarize here, we’re going to look at what is in the Quantity column and replace any instance of the text in the Measure column with the value between the two sets of quotes (i.e. nothing.)  The results are shown below:

image

Now it’s just a simple matter of doing some cleanup:

  • Right click the Value column –> Change Type –> Decimal Number
  • Right click the Quantity column –> Remove

image

And there you go.  It’s finished.  We simply need to go to Home –> Close & Load to commit it, and then refresh it any time we need it.

M is for Data Monkey

The book is now available and is packed with good information that will help you solve this issue as well as many others.  Learn more about the book here.

Suggestion to Improve the Pivot Table Experience

This is a special post to to discuss a suggestion to improve the Pivot Table experience, especially for Power Pivot users.

This week I’m at the 2015 MVP Summit in Redmond, WA.  It’s a trip I’m lucky enough to make every year, and certainly one of the annual events that I look forward to the most.  It’s a chance to reunite with my friends in the global community of Excel experts, as  well as make some new friends there too.  In addition, we get the opportunity to meet with the Microsoft Excel engineers, give our feedback, and talk about the things that are/aren’t working in the program.

Of course, this doesn’t mean that they can or will implement the suggestions we have.  Excel is a massive program, and every feature change can cause bigger issues elsewhere.  But they do listen, and they do want this product to be the best it can be.  Like every company, they have to work out what they can afford to do, and where the best investments are for their limit of resources.

In the spirit of the summit, I thought I’d share one of the ideas I have that I think would be really beneficial to Power Pivot users.  Maybe it makes the radar, maybe it doesn’t, but I think it would be a really useful change.  I’m fairly certain it could also be implemented without causing any issues with other features in the product as well.

The Issue

For those working with Power Pivot, you know the power of DAX.  This leads to creating many different DAX measures, each of which are landed in the columns of the Pivot Table.  This is awesome, but it brings up a challenge with the usability of the Pivot Table field list:

image

Back when we just dropped singular fields into the Values area, things weren’t so bad.  I generally only ran with a few fields, and I didn’t feel super constrained by the size of the window.  Yes, I overran the limit on occasion, but it wasn’t a big deal.

With Power Pivot, things have changed.  I have so much more flexibility to write the DAX measures I need, which leads to many more columns being defined.  If you think about things like forecasting an annual cash flow statement, I’ll write at least 13 different measures (one for each month), plus a total.  And that’s just one scenario.  For a regular financial statement the same thing… Actual, Budget, Variances, Year to date Actuals, Year to date Budgets, and so on.  Again, it’s not uncommon to see a statement with over 12 columns.

This proliferation of measures leads us to the issue… the Values are of the Pivot Table field list is too small today.  It only holds 3-4 visible columns at a time.  Trying to move a measure into the right place is a real pain, especially if you add a new measure to the bottom, and you have to drag it up.  I’m sure you’ve had massive “overscroll” problems where the thing seems to speed up to mach 5 JUST as you are trying to move it up that one last row…

The Slightly Better View

The Pivot Table field list has an alternate view called “Field Section and Areas Section Side-By-Side”.

SNAGHTMLec87838

 

This is a bit better, as we can at least see more fields in the area on the left.  But that’s only helpful for scrolling and finding the fields we need, not placing them on the Pivot:

image

You see?  I’ve still only got three rows showing (four when my Excel is maximized on screen.)

But here’s the thing…

When I’m building my Pivot, I rarely end up putting anything in the Filters area, as I tend to use Slicers.  I might have a few fields in there that I don’t want users messing with (I hide the top rows of the Pivot Table), but generally I’m looking at between zero and two fields in there.

And when I build my Rows and Columns, I tend to drag them on the Pivot and call it a day.  I could use more space on occasion when I’m layering on my Row fields, but Columns are usually sufficient.  Especially now that I’m writing DAX formulas.  The measure gets dragged in to the Values area, and doesn’t need anything in the Columns area at all.  It’s partly for this reason that the small size of the Values area is killing me.  The old logic for how the Pivot was build has essentially changed, with the description moving from the Columns area to the Values area.

What that means is that I’ve got a ton of wasted whitespace in my Filters and Columns area.  So why not reclaim that whitespace?

Suggestion to Improve the Pivot Table Experience

So here’s my suggestion to improve the Pivot Table experience: modify the “Field Section and Areas Section Side-By-Side” view as follows (excuse the rough mockup…)

SNAGHTMLed7c5f1

The key changes here are really about the arrows to the right of the Filters, Rows, Columns and Values areas.  These are the same arrows as used in the Field List on the left, where the white arrow pointing to the right shows the area collapsed, and the black arrow shows the area expanded.

To be clear, the proportions aren’t correct here, but my thought is that the expanded areas consume an equal share of the remaining whitespace.  So if all four areas are expanded, they each get a 25% share of the remaining space, as it what we see in the current implementation.

But collapse one field (let’s say Filters), and each remaining area expands, as it now gets a 33% share of the remaining space.  Collapse two (as I’ve shown above), and the remaining two get 50% each.  Collapse three, and all remaining whitespace goes to the final area:

image

This would be fantastic, as it would let me build my Pivot much more easily.  I’d be able to see what I’m working with, especially on Pivot Tables with higher levels of Row or Values fields.

I didn’t scope this in, but it would also probably be a good idea to append a number in parenthesis to each area as well, indicating how many fields exist in each area.  So in this case: image

Naturally, when you’re first building a Pivot, it should open with all areas expanded to 25% of the share… but bonus points if there is a way to save the default view for a configured Pivot.  The reason that I say this is that my guess is that 75% of the time when I’m modifying a Pivot it’s the Values area I’m doing, 20% is Rows, 4% is Columns and the remaining 1% of the time I’m modifying Filters. Respecting that others have different uses though, the ability to choose which fields are expanded/collapsed by default on an already existing pivot would be incredible.

At any rate, that’s my idea.  Here’s hoping a program manager on the Excel team thinks there’s merit to it and starts to look at the feasibility.  Feel free to share your thoughts on the subject below.  🙂

If you like this idea...

Please throw it some votes at Excel UserVoice.  The more votes it gets there, the more likely it will be implemented!