Adding Try Results is Trying

I was playing around with a scenario this morning where I was adding try results together in order to count how many columns were filled with information.  What I needed to do kind of surprised me a little bit.

The Goal

There's a hundred ways to do this, but I was trying to write a formula that returns the total count of options someone has subscribed to based on the following data:

image

To return this:

Adding try results - Attempt 1

So I figured I'd try to add the position of the first characters (which should always be one) together.  If there is a null value, it won't have a character, so will need the try statement to provide a 0 instead.  I cooked up the following formula to return a 1 or an error:

= Text.PositionOf([Golf Option 1],Text.Middle([Golf Option 1],1,1))

And then, to replace the error with a 0, modified it to this:

= try Text.PositionOf([Golf Option 1],Text.Middle([Golf Option 1],1,1)) otherwise 0
+
try Text.PositionOf([Golf Option 2],Text.Middle([Golf Option 2],1,1)) otherwise 0

But when I tried to commit it, I got this feedback:

image

Adding try results - Attempt 2

Now I've seen this kind of weirdness before, so I knew what do do here.  You wrap the final try clause in parenthesis like this:

= try Text.PositionOf([Golf Option 1],Text.Middle([Golf Option 1],1,1)) otherwise 0
+

(
try Text.PositionOf([Golf Option 2],Text.Middle([Golf Option 2],1,1)) otherwise 0)

At least now the formula compiles.  But the results weren't exactly what I expected…

image

So why am I getting 1 where there should plainly be a result of 2 for the highlighted records?

Adding try results - The fix

Just on a whim, I decided to wrap BOTH try clauses in parenthesis, like this:

= (try Text.PositionOf([Golf Option 1],Text.Middle([Golf Option 1],1,1)) otherwise 0)
+
(try Text.PositionOf([Golf Option 2],Text.Middle([Golf Option 2],1,1)) otherwise 0)

And the results are what I need:

image

So why?

I thought this was pretty weird, but looking back at it in retrospect, it is following the correct order of operations.  The original formula I wrote was "otherwise 0 + …".  So in truth, the entire second try statement was only getting evaluated if no Golf Option 1 was present.

I guess writing formulas is hard in any language!

The Each Keyword in Power Query

This post is a guest article from Marcus Croucher, a finance professional based in Auckland, New Zealand. Marcus instantly fell in love with Power Query after seeing how it can easily transform data in ways Excel finds difficult, and how it can automate repetitive workflows.

I have been using Power Query in a professional capacity for a number of years, but have never fully understood exactly how the each keyword works in Power Query. I did some research around two years ago, but the documentation at the time was quite sparse (and still isn't great) and I did not have enough knowledge about wider programming to fully understand it.

In the meantime, I was looking for a way to use Power Query-like technology on OSX. I ended up learning a lot of Python, which has several libraries that have similar functionality to Power Query (albeit without the amazing graphical interface of Power Query.) Two notable examples are pandas (very popular data analysis library), and petl (a more light-weight and easy to use data processing toolkit).

This general programming knowledge gave me the background to understand some of the underlying concepts behind each, which I will now proceed to attempt to convey to you, the general intermediate to advanced Excel user, so that you can understand and wield the each keyword with confidence. I will use Python as a parallel and will link to some Python articles that expand on the underlying concepts I am trying to explain.

A Deep Dive into How the Each Keyword Works

In this article, I assume that you are familiar with Power Query and how to use it on a practical level. I also assume that you have some (limited) experience with the advanced editor and have seen the underlying M code, and understand the underlying data structures and how to access them.

  • Table
  • List (columns) – notated as {item one, item two, etc.} and accessed by [column header]
  • Record (rows) – notated as [category1: data1, category2: data2, etc] and accessed by {row number}

I suggest going ahead and pasting the code snippets into a blank query in Power Query to see for yourself what is really going on.

When Might You Use the Each Keyword?

The each keyword in Power Query is used by quite a few of the built-in operations. For example, filtering a table based on a certain column features each here:

let

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

filter_to_score_5 = Table.SelectRows(sample_table,

each ([Score] = 5))

in

filter_to_score_5

Figure 1

Using the Each keyword to filter a table based on a certain column

The purpose is quite clear: we want to keep each row where the value in the Score column is equal to 5. However, how does this work? [Score] looks like it is referring to a full column – how can we do a full column equal to a single value? Can I access values from the row above or below the row we want to keep/discard?

Another example of the each keyword in Power Query is creating custom columns. Here, I create a custom column which is the combination of the index and the name columns:

let

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

add_ordered_name_column = Table.AddColumn(

sample_table, "ordered_name",
each Number.ToText([Index]) & " " & [Person])

in

add_ordered_name_column

Figure 2

Using the Each keyword to create a custom column which is the combination of the index and the name columns.

Similar questions apply. It is easy to follow the logic, but how it works and the limitations behind it are somewhat of a mystery.

So What is the Each Keyword in Power Query?

The current documentation has this to say about the keyword:

Each Keyword

The each keyword is used to easily create simple functions. “each ...” is syntactic sugar for a function signature that takes the _ parameter “(_) => ...”

 

Each is useful when combined with the lookup operator, which is applied by default to _ For example, each [CustomerID] is the same as each [CustomerID], which is the same as () => _[CustomerID]

Still not very clear (unless you have a background in functional programming languages). It would be really nice to get a full understanding, as the each keyword in Power Query is used in a number of places (filtering, custom columns, etc.) and understanding it would give us an understanding of what we can and can't do with it.

Turns out, you need to understand three things to understand each:

  1. Functions as first class objects
  2. "_" as a temporary variable (and the associated shortcuts within M language)
  3. Anonymous functions (each)

Functions as First Class Objects

If you are used to using Excel, you are used to functions (e.g., =SUM() ) being a bit magical. We cannot touch or modify them, and they are supplied fully formed by Excel. You can create custom functions with VBA, but these are far and few between, and still seem like the lessor cousin to the in-built functions.

In Power Query, functions can be thought of just another "object" – or just another type of data. This means they can be:

  • Assigned to a variable and/or renamed.
    • Just like we can do something like variable = 5, in Power Query, so we can do something like variable = function.
  • Fairly easily created
  • Used as a parameter to another function (!)

To understand this, we need to distinguish between calling a function (using it in our code) and referring to it (to name it or to use it within another function). Similar to other programming languages, to call a function we use the parentheses at the end like:

function()

If we want to refer to a function we just omit the parentheses like:

function

Let's demonstrate the renaming/reassigning of functions. First I take one of the supplied functions which takes a list (i.e., column) and calculates the sum. Next, I build a sample table, and then take a sum of one of the columns using the function that I had defined at the beginning.

let

sum_column = List.Sum,

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

sum = sum_column(sample_table[Score])

in

sum

Figure 3

Calling a function within another function.

It works!

Let's create a basic function

In Power Query, the syntax to create a function is:

(variable) => body of function

The body of the function is like any other query that returns a value, but instead of being delivered to the main Power Query interface it is delivered to whatever called it.

We also want to assign a name to the function, so let's expand the above template. Here, we create a simple function that takes a number and multiplies it by two.

multiply_by_two = (number) =>

let

output = number * 2

in

output

Looking good, but now we want to use this snippet in a full query. Let's build a (trivial) query that uses this function to multiply a variable by two and output the result:

let

multiply_by_two = (number) =>

let

output = number * 2

in

output,

x = 5,
y = multiply_by_two(x)

in

y

Figure 4

Building a simple query that uses the function we created.

Functions as inputs to other functions

Now that we have explored how functions are the same as any other data type or variable, and we have demonstrated how we can create our own functions, let's look at functions that take other functions as inputs. One example from the official documentation is the filtering function, Table.SelectRows.

About

Returns a table containing only the rows that match a condition.

 

Table.SelectRows(table as table, condition as function) as table

So the function expects a table (makes sense), and a function! How does this work? According to the documentation, the condition is "the condition to match".

It's not very well documented, but it turns out that this function expects a function to be supplied. It then applies that function to each row (record) of the table, and expects a true or false response from the function. It then uses this response to decide whether to keep or discard the row.

To recap, a record is a data type representing a row. We can access the items from a record by supplying the column name as follows: record[column name].

Let's create a function which we can then supply to Table.SelectRows on our sample data. Note – this is our case study example which I will develop throughout this article.

let

filterer_score_two_plus = (record) =>

let

value = record[Score],
result = value >= 2

in

result,

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

filtered = Table.SelectRows(

sample_table,
filterer_score_two_plus)

in

filtered

Figure 5

Creating the sample_table function.

Figure 6

Using the Table.SelectRows function to filter the table created by the sample_table function.

What have I done here? First I have created a function which takes a record, extracts the value in the Score column and returns true if it is greater or equal to two. I then construct my sample table and apply the Table.SelectRows function on it, supplying my recently constructed function as the second input. As we can see, the output as expected provides a table with all rows with scores greater or equal to two.

Now, only if there was a quicker and easier way to create such functions, as it looks like we might have to build these one-use functions quite a lot...

"_" as a Temporary Variable

The use of _ as a throw-away variable is common across several programming languages, Python included (see point 4 here). Usually, _ is used to name things that are not going to be used again and so it is not worth using up another name.

Here I write a query creating a table, assigning it to a variable called _. Power Query has no problem whatsoever using _ in this way.

let

_ = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"})

in

_

For Python, using _ is just a convention, but it appears that Power Query has expanded the functionality here. Let's say we just want the column of names from the above table. Usually we can do this by selecting the column by name using [column_name] as the selector.

let

_ = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"})

in

_[Person]

It turns out, we can omit the _ in this statement as Power Query will infer that if we just put [Person], the table we are referring to is the one called _. The example below works just as well as the one above:

let

_ = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"})

in

[Person]

Figure 7

If we omit the _ variable in this statement and just use [Person], Power Query infers that the table we are referring to is the one called _.

I wouldn't recommend this as general practice, as it is not well documented or understood behaviour and explicit is usually better than implicit. However, it does provide nice-looking code when used with the each keyword.

Note that this technique only works for column selections [column_name] rather than row selections {row_number}, as Power Query will interpret {row_number} as a new list.

Applying the _ Variable

With this concept in place, let's revise our filtering query defined above:

let

filterer_score_two_plus = (_) =>

let

result = [Score] >= 2

in

result,

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

 

filtered = Table.SelectRows(sample_table,

filterer_score_two_plus)

in

filtered

The shortest we can actually get this is pretty close to our final stage. Let's put the function definition right into the Table.SelectRows function, and get rid of the let and the in (only really needed if there are multiple steps in the calculation):

let

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

 

filtered = Table.SelectRows(sample_table,

(_) => [Score] >= 2)

in

filtered

This is already pretty tight and clean. However, the (_) => is pretty scary if you're not familiar with function definitions, and the function definition symbol "=>" is fairly similar to our greater than symbol ">=".

Anonymous Functions

As we saw above, we end up creating one-off functions to supply to other functions quite frequently in Power Query. It seems silly to go through all of the syntax of creating and naming a function if it won't be used again. There is a concept called anonymous functions, which are functions that are defined but not named. They are used as soon as they are created. In Python, these are known as lambda functions.

We can actually use the each keyword in Power Query to define the function. (Yes, we are finally at the each keyword itself!) Each just minimizes the syntax for creating functions, by providing a default input variable name, "_", and removing the need for the => or anything else. So:

(_) =>

let

result = [Score] >= 2

in

result

 

can become:

each [Score] >= 2

You can still name this if you like (filterer = each [Score] >= 2), but using the each keyword in Power Query is much more useful if we use it inline. So we come to our final query, which should look fairly familiar to intermediate Power Query users:

let

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

filtered = Table.SelectRows(sample_table,

each [Score] >= 2)

in

filtered

 

So What have We Discovered?

Now we have demystified the magic of the each keyword in Power Query, but what have we discovered?

  • Each itself doesn't actually do that much, slightly simplifying our function definitions.
  • Once you understand the concept of supplying functions as inputs to functions, everything becomes a lot clearer.
  • The missing piece of the puzzle comes from an understanding of the special _ variable name, which enables us to take a shortcut when selecting columns from tables or records (we can use [column name] instead of _[column name]).
  • The use of each hinges on the behaviour of the underlying function. Because we know that Table.SelectRows calls the supplied function on each record of a table and expects a true/false response, we can construct a function that works as we expect it to. Unfortunately, this is not very well documented.

How can we use this information? I can think of a few different ways:

  • If we have a complicated add column or filtering step to do, we can separate out the underlying logic into a separate function which can have multiple steps. This removes the complexity from the main body of the code and abstracts it away. It can be easier to read filter_to_current_year_red_cars rather than try to interpret all of the various equivalence statements as you read through the code.
  • Now that we understand the context that is delivered (only the current record/row), we can construct functions that can do more interesting things (these usually require an index column). For example, we can filter a table based on a second table, or add a new column that subtracts the current row from the previous row.

Here's an example that:

  1. Abstracts the logic to a helper function, and
  2. Adds a column based on the difference for each row from the previous row.

let

row_difference = each

if [raw_index] = 0
then [Score]
else [Score] - add_index[Score]{[raw_index] - 1},

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

add_index = Table.AddIndexColumn(sample_table,

"raw_index", 0, 1),

add_difference_column = Table.AddColumn(

add_index, "difference",

row_difference)

in

add_difference_column

Figure 8

Adding a column based on the difference for each row from the previous row

So does this help your understanding of the each keyword in Power Query? Has this given you some new ideas on how to structure your queries? Please let me know if you have any questions!