Years ago I published an article on my site called Five Very Useful Functions For Working With Text. This post is alternative version, but with a twist. I bring you 5 Very Useful Text Formulas - Power Query Edition.
The article is geared to explaining five functions specific to working with Text in Excel, and are a set of the most under-utilized functions in Excel (in my opinion). It even lets you try them out live in the web page without even having to open Excel at all.
Now, over the past few months I’ve been working with Power Query, and one of the things that’s been driving me a little crazy is that the formula names in Power Query and not the same as they are in Excel. I personally think this is a bit of a mistake, and that the formula names in Power Query should have been a bit more congruent with standard Excel formulas (Power Pivot’s DAX functions are similarly named to Excel, so why not Power Query’s M language?)
Some critical background
Before you bang your head against the wall, there are two things that are really different between Excel formulas and Power Query formulas:
While Excel formulas are not case sensitive, Power Query formulas are. If the Power Query formula signature says “Text.Start” then “TEXT.START” or “text.start” will NOT work for you.
Base 1 vs Base 0
Excel formulas are what we refer to as “Base 1”. This means that you count starting at 1. Power Query, on the other hand starts counting at 0, not 1. The implications of this are that it is very easy to write your formula referring to a number that is out by 1. To see the effects of this, check the section below on the alternate for the MID function equivalent.
Excel –> Power Query Formula Equivalents
Let’s take a look at how the five functions I illustrated in that original example differ from Excel to Power Query…
To get the left x characters, we basically replace LEFT with Text.Start:
Easy enough once you recognize it, although I would have preferred that the formula name was consistent.
To get the right x characters we have a similar situation. The function name needs to change from RIGHT to Text.End:
|Power Query||=Text.End(text, num_chars)||=Text.End(“Excel”,2)||el|
Okay, so we’re getting the hang of this now… Just change the function name and the rest work the same, right? Um, no.
This one gets a bit weird. First we replace MID with Text.Range. Okay, no problem there. But look at the results when we pass the same parameters:
|Power Query||=Text.Range(text, start,num_chars)||=Text.Range(“Excel”,2,2)||ce|
They differ a little, don’t they? The issue comes down to that base 0 vs base 1 thing I mentioned above. Where Excel’s formula language counts the word with E being character 1, Power Query considers that character 0. So in this case, when we tell Power Query to start returning text at character 2, it pulls back c (E is 0, x is 1, c is 2). Interestingly though, the last parameter needs to be 2 to pull back 2 characters.
Getting the length of a text string in Power Query is actually a bit more intuitive than Excel’s native function, only because the function name isn’t trimmed off. Text.Length is what we need instead of LEN.
Notice that the result for this does return five characters, as you’d expect. So this plainly works as a 1 based result in both Excel and Power Query.
And finally we come to the FIND function. This one is again a bit confusing. We’ve got 3 things to consider here:
- The function name changes from FIND to Text.PositionOf
- The parameters for the text we want to find and the text we want to search in get flipped around!
- The result is 0 based, not 1 based
|Power Query||=Text.PositionOf(text, find_text)||=Text.PositionOf(“Excel”,”xc”)||1|
So in the case of FIND, we put the “xc” first, and “Excel” second. But in the Power Query version, it’s completely opposite. And look at that result… in Excel the x is treated as the 2nd character. In Power Query it is too, but because it starts counting at 0 we get a 1.
I find that even after using Power Query for a while now, I still have to look up the formula names from the Power Query formula categories page, both to find the formula name, and also the syntax. Part of this is due to the fact that there is no auto-complete/syntax help in the Power Query engine (I’m sure it will come eventually), but part is also that my instinct is to type the Excel function name first. And then, when I do get it right I’m constantly getting tripped up by the base 0 base 1 conversion.
As this is a tool aimed at Excel users, I am a bit disappointed in the formula naming convention. I could get used to pre-pending Text. to all of my functions, but I really wish the rest was similar to Excel. Maybe one day the PQ team can give us duplicate handles into the same back end function so that we can write stuff like this IN ADDITION to what already exists (don’t deprecate, just give us alternate routes):
Or how about:
That would be good too, especially if the signature could be tweaked to require a one based parameter for the start character. That would be consistent with what the Excel pro would expect due to their experience with the MID function.
And how about this:
How much easier would that be to learn if the function not only accepted parameters in the same order as the Excel function, but returned a 1 based result rather than a zero based result (again, similar to the way the current FIND function returns.)
Despite what I’m suggesting here, it does need to be recognized that there is a way to do the job, which is critical. I just think it could be better designed in order to make it easier for the seasoned Excel pro to learn because they’d be able to port their existing knowledge without having to learn a totally new syntax.