Power Query’s Extract Text Feature

This will be a short post, as today we are leading our second sold out Power Query workshop at http://powerquery.training/course.  I wanted to make sure I still got something out for my readers today though.  This time I’m looking at a feature that was added in the August Power Query Update: Extract Text.

You can find these commands on both the Transform and the Add Column tabs, with the former just converting your selected column, and the latter creating a new column of results while preserving the original column.

SNAGHTML196640f7

Here’s what’s interesting to me about these functions:

They replicate the LEN(), LEFT() and RIGHT() functions, saving you having to build them manually as I discuss here.  This is handy, and pretty seamless.

Here, you’ve got a table of words, and the second column is generated using Excel’s LEFT() function.  The final column was generated by:

  • Selecting the Word column –> Add Column –> Extract –> First Characters –> 4

image

Nice and consistent with Excel’s LEFT() function.

Likewise, Last Characters replicates the RIGHT() function by going to:

  • Selecting the Word column –> Add Column –> Extract –> Last Characters –> 4

image

And Length replicates the LEN() function:

  • Selecting the Word column –> Add Column –> Extract –> Length

image

The Range function is a user interface implementation of what should be equivalent to the MID() function.  In this case, however, it’s still has the following issues for Excel pros:

  • It is Base 0, meaning that you want to start at the 3rd character of the text string, you need to specify that you want to start at character 2 (Power Query starts counting at 0, not 1)
  • If you provide a value for the “number of characters to return” that is larger than the total number of characters – the starting character, you’ll get an error.  (Unlike the MID function)

So when you try to use Range in place of MID as follows:

  • Selecting the Word column –> Add Column –> Extract –> Range
  • Starting Number: 5
  • Number of Characters: 4

You get this:

image

Ugh.  And correcting to subtract one from the starting index, you get this:

image

Better, but still errors.

Honestly, I was hoping the user interface implementation would solve those issues building the more complicated code shown in my blog post on the subject.

So, at the end of the day, it’s awesome, but still doesn’t offer full “Excel parity”.  And if you want that, you’ll need to learn to work with formulas in Power Query.

The good news?  We teach how to do that in our Power Query workshop.  In addition, we’ve just announced a new registration intake.  If you’re interested in learning how to master Power Query, check it out at http://powerquery.training/course.

Update: Refresh Power Queries With VBA

Some time back I posted a routine to refresh Power Queries with VBA, allowing you to refresh all Power Queries in a workbook.

Things Changing…

The challenge with that specific macro is that it refers to the connection name, which may not always start with Power Query.  (As we move closer to Excel 2016, we are now aware that these queries will NOT have names starting with Power Query.)

I have seen an approach where someone modified the code I provided earlier to include a variable, allowing you to easily change the prefixed text from Power Query to something else.  While that works, it’s still not ideal, as names can be volatile.  It makes more sense to try and find a method that is cannot be broken by a simple name change.

Refresh Power Queries With VBA

The macro below is a more targeted approach that checks the source of the query.  If it is a connection that is built against Power Query engine (Microsoft.Mashup), then it will refresh the connection, otherwise it will ignore it.

Public Sub UpdatePowerQueriesOnly()
Dim lTest As Long, cn As WorkbookConnection
On Error Resume Next
For Each cn In ThisWorkbook.Connections
lTest = InStr(1, cn.OLEDBConnection.Connection, "Provider=Microsoft.Mashup.OleDb.1")
If Err.Number <> 0 Then
Err.Clear
Exit For
End If
If lTest > 0 Then cn.Refresh
Next cn
End Sub

This macro works with Excel 2010, 2013 and 2016, and should survive longer than the original version I posted.

Remove All Rows Up To A Specific Value

A couple of weeks ago, Rudi asked how you would go about setting up a query to remove all rows up to a specific value.  Specifically, his question was this:

The other day I was asked if Power Query could delete all top rows up to a found value. I could not find a solution and its been a burning question till now.
For example: If I import a csv file containing columnar info, but the headings for the list are in different rows for each import. I know that the first heading in column A is called "ID Number", but each import has this heading in a different row.
How do I determine an applied step to delete all rows above "ID Number". I cannot use the delete top rows as its not always 5 rows, some import the headings start in row 10, others in row 3...but the label I am looking for is always "ID Number".

While the question was answered in the initial post, I still though it would be interesting to do a full post on this for others who might need to create similar functionality.

The Data Set

I didn’t have Rudi’s exact data, so I knocked up a little sample with an ID Number and Amount column starting in row 3, which you can download here.

The components I was after here was the ID Number header and an extra column of some kind.  In addition, I wanted to have some garbage data above, as I didn’t want to give the impression we can just filter out blank rows.

Now, I assume this data is loaded from an external file, but it doesn’t really matter, I’ll just load this from a range, as it’s just a data source.  The key is that the header row is not the first row.  So I defined a new range to cover the data

  • Select A1:B7 –> Name Box –> Data

I can then select the name from the Name box to select my data:

SNAGHTML1549a4a4

I then loaded it into Power Query by creating a new query –> From Table

Determine the Header Row

This is the first job.  In order to remove any rows above the header row, we need to know which row the header resides.  To do that, we need to add an index column, and filter to the specific header that we’re after.

  • Add Column –> Index Column –> From 0
  • Filter Column1 –> only select ID Number

This results in a single row of data, and conveniently has the Index number of the ID Number row listed.  In this case it’s 2.

Call up the Original Table

We’ll park that value for a moment, and call up the original table.  To do that, we click on the fx button beside the formula bar to create a new step, then replace the formula with =Source.

image

Remove All Rows Up To A Specific Value

Now comes the tricky part.

We cant’ remove all rows up to a specific value immediately.  We need to insert a step that removes the top 2 rows first, then modify it.  So let’s do that:

Home –> Remove Rows –> Remove Top Rows –> 2

This gives us the following table and formula:

image

The key is to understand this formula first.

  • The Table.Skip() function removes (actually skips importing) the first x rows
  • Custom1 is the name of our previous step
  • 2 is the number of rows

So what we really need to get is the number of rows.  We can extract that from the Filtered Rows step like this:

#”Filtered Rows”[Index]{0}

Where:

  • #”Filtered Rows” is the name of the step in the Applied Steps window
  • [Index] is the column we want to look at in that step
  • {0} indicates the first row of that step (since Power Query starts counting at 0

So let’s modify the function to:

= Table.Skip(Custom1,#"Filtered Rows"[Index]{0})

As you can see, it works nicely:

SNAGHTML1559cc91

In fact, we can go even better than this.  Why are we referring to Custom1 at all?  Doesn’t that just refer to Source anyway?  Let’s also replace Custom1 with Source, and remove the Custom1 step from our query:

SNAGHTML155cd378

Cleanup

We can now promote our header rows, remove the unnecessary Changed Type step and set our data types correctly:

SNAGHTML155f21ae

Testing

If you try inserting new rows at the top of the data range, then refreshing the completed query… it just works!  The output table will never end up with extra rows at the top, as we’re filtering for to start at the ID Number row.

SNAGHTML156196d4

Using M

If you learn to write your own M code from scratch, you can start combining lines as well, skipping the next to explicitly filter the primary table.  Bill S provided code in answer to Rudi’s question (adapted to this workbook), which is shown below:

let
Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
AddIndex = Table.AddIndexColumn(Source, "Index", 0, 1),
Skip = Table.Skip(Source, Table.SelectRows(AddIndex, each ([Column1] = "ID Number")){0}[Index]),
PromoteHeaders = Table.PromoteHeaders(Skip)
in
PromoteHeaders

It essentially replicates the same steps as the query I built via the user interface, only he combined the Filtered Rows step into the Row Removal step.  To bonus here is that the code is a bit shorter.  The drawback is that it might not be quite so “de-buggable” to someone who isn’t as experienced with M.

Multiplying NULL values in Power Query

I was working through a scenario today and came up against something unexpected when multiplying NULL values in Power Query.

Background

I have a fairly simple table of transactions that looks like this:

image

And wanted to turn it into this:

image

Seems simple enough, but I ran into an odd problem.

Steps

Getting started was pretty easy:

  • Pull the data into Power Query using From Table
  • Remove the final column
  • Select the Bank Fee column –> Transform –> Standard –> Multiply –> –1

So far everything is good:

image

Then I tried to do the same to the Discount column.

Multiplying NULL values

At this point, something odd happened.  I did the same thing:

  • Select the Discount column –> Transform –> Standard –> Multiply –> –1

But instead of getting a NULL or 0 for John’s record, it gave me –1.  Huh?

image

This is honestly the last result I expected.  How can a NULL (or empty) cell be equivalent to 1?  I think I’d rather have an error than this.

Regardless, easy enough to fix, I just inserted a step before the multiplication step to replace null with 0:

image

Good stuff, so now just finish it off:

  • Right click the Customer column –> UnPivot Other columns

And all looks good…

image

… until I load it into the Excel table:

image

Seriously?  Negative zero?

To be honest, if I’m feeding a PivotTable with this anyway, I really don’t need the discount record for John.  To fix this I just went back to the Power Query and inserted another step right before the Unpivot portion when replaced 0 with null.  The result is now really what I was originally after:

image

End Thoughts

I can’t help but think that this behaviour has changed, as I actually tripped upon it refreshing a previous solution.  The killer is that the data has changed, but I’m pretty sure the old data set had these issues in it too, and it wasn’t a problem.

Regardless, I”m a little curious as to your opinions.  Is this expected behaviour?  Bug or feature?