Power Pivot is coming to all Office SKUs

Yes, you read that right… Power Pivot is coming to all Office SKUs and the rollout has already started for those on subscription versions of Office.

This is something I've been championing (along with many others) since Power Pivot was initially rolled in to Excel 2013.  The whole need for Pro Plus licensing, which was even initially only open to enterprise licensing, was a huge mistake in my opinion.

Today, the Excel team updated the Excel Uservoice request to make Power Pivot available in all SKU's of Excel to say "we're doing it".  Finally!  So yes, if you're running the Home and Student, or Business Premium plans, you'll finally have access to Power Pivot!

What is Power Pivot is coming to all Office SKUs?

How long will it take for Power Pivot to show up in your version of Excel?  That depends upon what you purchased…

Excel 2013/2016 non-subscription users

Unfortunately, you need to upgrade to a subscription version of Office or to Office 2019 (whenever it comes out).  They're not going to back port it to those versions as far as I'm aware.

Excel Subscription users

You have two options:

  1. Wait till it shows up.  Microsoft has said that they are already rolling this out to people on the April Current Channel (build 9330 and higher).  So depending on where you are in the cycle, it will just show up one day.
  2. Install an Insider Preview in order to jump the queue.  Keep in mind that this isn't for everyone as this is Beta software, but if you're interested you'll also get access to newer features like the new data types, Insights and more as well.

How do you know your Excel SKU, version and what channel you are on?

(Please note that Channel is only applicable to Subscription users)

Go to File --> About

Power Pivot is coming to all Office SKUs

How do you get on the Insider Channel?

It depends on the SKU of Office you purchased.

Consumer Office Versions (Office 365 Home, Personal, and University)

All the relevant info on opting in or out of the program can be found on Microsoft's website here.

Commercial Subscribers (Office 365 Business, Enterprise, or Education)

You'll need to go to this page.

Results: Top X with Ties Challenge

Last week I posted a Top X with Ties Challenge to see how our readers would approach this problem using Power Query in Excel or Power BI.  I'm really glad I did this, and am thinking that I need to do more of these kinds of posts in future.

Response for the Top X with Ties Challenge

I had come up with two different approaches to the Top X with Ties Challenge on my own, but it was interesting to see the different solutions posted by you all.  In total, we had 10 different files submitted from 9 people, some with multiple approaches to the problem.  I want to thank all of you for participating in this, and today's post will look at the different approaches which have been summarized in the attached workbook.  (Yes, the techniques will work in Power BI, I just use Excel to demo as it's easier to keep the data contained in the same file.)

In order to keep this to a manageable level, I had to group the solutions as well and to do that I focussed on the meat of the overall solution.  This meant that I left some of the slick tricks that some of you included if they didn't really impact the overall solution.

Just a quick summary of those:

  • Maxim rolled his solution into a custom function so that he could easily invoke it again.  One trick that he had in there was to dynamically create a dynamic "Top X" column header for the grouped column.  It was really cool, but it did kick me when I made my input dynamic and changed to 10 items, as my Changed Type step was trying to set the data type on the Top 5 column.  Winking smile
  • Sam used a slick trick to replace the values in the Item column with the new grouped names.  Also very cool, but to be consistent with the other solutions I wanted to keep the original column as well as add the grouped column.
  • Daniil's submission actually used a different ranking method than what the rest of the solutions (including mine) used.  I've also modified his to be consistent with the rest, although his solution has generated what will become a future blog post too.
  • Many of the submissions did things inline in a single query, referencing prior steps.  For the sake of transparency and explanation, I've broken most of these down into separate components so it's easier to follow.

One thing I did incorporate here was that I added some extra ties a bit lower down, suggested by Kevin.  This turned out to be super important, as it enabled me to validate that all the approaches were truly in sync (and fix them when they weren't). With the ability to change the "Top X" from a worksheet cell, we can now dynamically compare the results and see that they are the same.  So the revised table now looks like this:

image

Apart from those little nuances, everything (with some small mods) fell neatly into the buckets.  And all in all, I've got 5 different solutions to showcase that return the same result.

Laying the groundwork

Before jumping in to the techniques used to solve the Top X with Ties Challenge, I just want to lay out the groundwork I used to illustrate them all.

Groundwork - Dynamic portion

I wanted to be able to pull the Top X from a worksheet cell so that it was easy to update.  To do this I:

  • Created a named range called "rngKeep"
  • Entered the value of 5
  • Selected the cell and pulled it in to Power Query
  • Renamed the query "TopX"
  • Right clicked the whitespace beside my value --> Drill Down
  • Loaded the query as a Connection (not to table)

SNAGHTML1e541326

And with that, we've now got the framework to dynamically update our Top X via a worksheet cell.

Groundwork - Staging Query

The next piece I needed was a simple base query to use to demo all the different techniques.  To do this I:

  • Selected a cell in the data table (which I had renamed to "Sales")
  • Pulled the data into Power Query
  • Change the Data Types to Text and Whole Number
  • Sorted the rows in Descending Order based on the Sales column
  • Renamed the query to SourceTable
  • Loaded the query as a Connection

And that's it.  We're now ready to explore the solutions that came in. Smile

Top X with Ties Challenge - Solution 1 - Using Merges

We had a total of 3 submissions that used this method (including mine).  This solution (as illustrated) requires two queries (although they could be merged into one if you wanted to):

Creating the TopXItems query

To do this you just need to:

  • Reference the SourceTable
  • Keep the top 5 rows, then replace the 5 in the formula bar with "TopX" that it can update dynamically
  • Load it as a Connection only query (called TopXItems) for later use.

image

Creating the Grouping (via Merge) query

  • Reference the SourceTable again
  • Merge the Sales columns against the same column of the TopXItems query using a Left Outer join
  • Expand the Item field from the new column

SNAGHTML1e79c7bc

  • Add a new Conditional Column that replicates the following logic:
    • if [Item.1] = null then "Other" else [Item]
  • Remove the [Item.1] column
  • Select the Item and Sales columns (together) --> Remove Duplicates
  • Load the data to the destination

Relatively easy and it works well.  Just a quick note here about the duplicates part is that due to the join type you could end up duplicating values, so the removal is needed at the end.

Top X with Ties Challenge - Solution 2 - Using Grouping

Next up, we've got grouping as a method to work this out - a technique that was featured in 6 of the submitted examples in one form or another.  Funny enough, I didn't solve my issue this way, although I think it has to be a front runner for solving this issue.

To solve the issue using this method:

  • Add an Index column from 1
  • Go to Transform --> Group --> Advanced
  • Group by the Sales column
  • Create a column called Rank to aggregate the Min from Index
  • Create a column called Original to group All Rows

image

  • Expand the [Item] field from the Data column
  • Add a Custom Column to create the Group column using the following formula:
    • if [Rank] <= TopX then [Item] else "Other"

SNAGHTML1edaa73c

  • Remove the Rank column
  • Reorder the columns (if desired)
  • Load to the final destination

What I love about this is that it's super easy, 100% user interface driven and actually provides a lot of ability to play with the ranking.

I'll follow up on that last comment in a future post, including why I put the Index column where I did, and why it was part of the grouping (as I know some of you did this after grouping by Sales.)  We'll devote an entire post to those nuances as they are important and I've got 3 more methods to cover.

Top X with Ties Challenge - Solution 3 - Using Chris Webb's Rankfx Function

This one was an interesting one, as it uses a technique that Chris Webb first wrote about back in 2014 in his Power Query book.  (Note, this was not submitted by Chris!)

Here's how this query is built up:

  • Reference the SourceTable query
  • Click the fx in the Formula bar to get a new step and replace =Source with:
    • = (SalesValue) => Table.RowCount(Table.SelectRows(Source, each [Sales]>SalesValue)) + 1
  • Right click and rename the Custom1 to Rankfx
  • Click the fx in the formula bar again, and =Rankfx with =Source
  • Add a Custom Column using the following formula:
    • =Rankfx([Sales])

SNAGHTML1e87f9f4

  • Remove the Rank Column
  • Load to the final destination

I haven't checked with Chris, but I suspect that his technique precedes the existence of any UI to do grouping.  (I know it precedes the UI for merging, as that wasn't in place when we wrote M is for Data Monkey.)  Still, it works nicely to get the job done.

Top X with Ties Challenge - Solution 4 - Using List.Contains

While I wasn't the only one who drove to a List function to solve this issue, I was the only one that tried to do this by using a List.Contains function. My logic here was that list functions should be fast, and if I can check each line to see if it's included in the List, then I should be golden.

Creating the TopXList List

The first thing I needed was my list to examine.  This was pretty easy to accomplish:

  • Reference the SourceTable query
  • Right click the Sales column header --> Drill Down
  • List Tools --> Transform --> Keep Items --> Keep Top Items --> 5
  • Replace 5 in the formula bar with TopX

image

  • Rename to TopXList
  • Load as a Connection

Note that because the original column is called Sales, the list seems to inherit a name of Sales1.  I assume this is to disambiguate it in the code.

Grouping via List.Contains

Armed with my list, I could now figure out the grouping in a new query.  To do this:

  • Reference the SourceTable
  • Add a Custom Column called Group using the following formula:
    • if List.Contains(TopXList,[Sales]) then [Item] else "Other"

SNAGHTML1e950ea5

  • Load the query to the final destination

I have to be honest, it surprised me how easy this was to do.  Yes, I did need to do a quick bit of reading about the List.Contains function (since we don't have any Intellisense in Power Query yet), but overall, this was pretty slick to build.  Using list functions, this should be very quick to execute and - if it needs more speed - I could always buffer the list as well.  (With data this small it's irrelevant.)

The only thing I don't like about this one is that you do have to go and get jiggy with formulas.  And if you're not comfortable reading MSDN documents (which tends to be code centric and poorly illustrated for Excel people), then you could end up getting turned off by this.

Top X with Ties Challenge - Solution 5 - Using List Drilldown

The final solution I'm showcasing is the only other list function that was submitted.  This was submitted by Daniil and approaches the job a bit differently than I did.  It's still amazingly simple at the end though.  Broken down it has two parts:

  1. Identify the value representing the TopN value
  2. Run logic to change the description for items greater than the TopN value

Identifying the TopN item from the list

To get started, we can create the TopN as follows:

  • Reference the SourceTable query
  • Right click the Sales column --> Drill Down
  • Right click the row representing our max row --> Drill Down

image

  • Replace the value (which will be one less than the row number pictured above) with "TopX-1"

image

  • Rename the query to TopN
  • Load as a Connection only

The key to remember here is that Power Query always counts from 0, which is why we need to subtract 1 from out TopX value when we are drilling in to the table to retrieve that item.

Adding a Group based on the TopN

With the TopN identified, we can now run some conditional logic against our original table to group our data appropriately:

  • Reference the SourceTable
  • Add a Custom Column called Group with the following formula
    • =if [Sales] >= SaleN then [Item] else "Other"
  • Load the table to the destination

And that's it.  The only really tricky part in this is that you need to shift that TopX value by one when you're modifying the drill down.  But apart from that, this is just as easy to build as any other of the solutions.

Final thoughts on the Top X with Ties Challenge

Again, the first thing I want to say here is how much I appreciate everyone sending in their solutions to the Top X with Ties Challenge.  It's always awesome to see how others approach things, and you've got no idea what cool things you'd made me start thinking about.  Smile

The second thing, and it's kind of tied to the first, is how cool it is that we've got multiple approaches to do the same thing. I'm always fond of saying that there are at least 3 ways to do everything in Excel, and what's more is that I know what we've got here isn't an exhaustive list.  How cool is that?

Feel free to download the illustrated solution that I've described above.  Above the output table for each method is the first name for the people who submitted their solution (so you can see how I grouped you in on this).  And if you want to test out the refresh, here's what you need to do:

  • Change the value in cell F1

Visualizing the Top X with Ties Challenge

  • Go to Data --> Refresh All
  • Go to Data --> Refresh All (yes, a 2nd time… so it updates the PivotTable & PivotChart)

image

No matter which query results you're looking at, you'll see that they are identical.

Which will perform fastest?  Honestly, I didn't test this with more than 300,000 rows of data.  When I tested, there was no real difference in speed between the solutions.

Which method should we use?  Great question, and I don't have an answer.  The List function methods should be fast, but the grouping is VERY versatile and is going to be my recommended method going forward.  I know this is cruel, but in a couple of weeks I'm going to put up a post as to why that is.  Stay tuned for that one!

Let me know your thoughts and comments.  Did anyone else pick up any new techniques here?  Any challenges you'd like to see posted?

Come Master Your Data in Amsterdam

I can't believe that it's less than a month until I'll be at the Amsterdam Excel Summit.  We've got a full day of conference sessions on June 7, but the really cool part is that I get to teach Master Your Data with Power Query in Amsterdam for the first time.  I've taught this course in half a dozen countries around the world (so far), and saved people from thousands of hours of manual effort.  And now it's available to you in the EU!

What is Master Your Data?

Master Your Data with Power Query is my full day course on how to really get started with Power Query in Excel. Why is it so important? It's because it will help you  to get hours of your life back.  You spend 80-90% of your life PREPARING data for analysis.  This tool is all about getting those hours back so that you can analyze the data and add value to your company.

Power Query is a FREE add-in (from Microsoft) for Excel 2010/2013, and is so important that it is now built in to Excel 2016 and higher.

What is covered in Master Your Data?

We start with a quick PivotTable review (because that's the whole reason we need good data), then dive in to using Power Query with real world data.  We look at importing data from CSV, text and Excel files, and even entire folders full of files all at once.  And when we're done, we just update it for the next month by pressing the Refresh All button.  If you do any copying and pasting of data on a monthly basis, you're wasting your time today.

Why do you care?

The items above alone will save you a ton of time. Here's a couple of examples:

  • One of my clients saved 8 hours per week of manual labour using these techniques - that's 52 days per year she could work on other things!
  • At the Australia conference we were able to automate a workflow in 30 seconds, saving the attendee 6 hours per week - hours that she had been putting in on the weekend because she couldn't get it done at work.  30 seconds to get 6 hours of family time back!

Yes, there is a cost to your class.  And yes, you'll need to pay for your travel, meals and hotel.  And yes, that can look like a lot of money - let's say that all told it's 2,000 Euros...  Divide that by your hourly rate though, and see how many hours you need in order to break even on the training.

Let's be conservative and say it will save you 2 hours per week… how many weeks until you break even?  What if it were to save you 4 hours per week?  Or - like my client - 8 hours per week?  This is probably the most impactful thing you can do for your Excel career today.

But remember that the raw cost is only part of the equation here… what is your time really worth?  What opportunities are you forgoing because you're wrapped up in data preparation and not analysis?  What could you do for your company with another 2 or 4 hours per week?  That's where the real value comes in.

Is there more in Master Your Data?

Of course there is!

I'll show you 7 different ways to merge data tables (with no VLOOKUPS), which will allow you to identify the items you need to focus on reconciling (not focussing on the items that don't.)  I'll show you how to perform an approximate match in Power Query, as well as how many-to-many joins can be created and why you'll use them.

You'll learn how to convert data from one format to another - all refreshable with a button.  Pivoted data like this:

image

Stacked data like this:

image

And even subcategorized data like this:

image

Whether you need to feed a PivotTable, a chart or a Power Pivot Data Model, the steps shown in this course show you how to create refreshable transformations in seconds that you can update with a click next month.  How cool is that?

I'll show you how to deal with conditional logic patterns that can be built through user interface buttons like this:

image

As well as some that require a little more work and error handling, like this:

SNAGHTML395ab66

And - of course - more.  There's always stuff I include that is not in the outline.

Oh… and did I mention that this training is 100% transferable to Power BI as well?  Even if you're not in that space today, you're learning a skill that is 100% transferrable to other programs in the future.  It's double the bang for your buck!

Trust me when I say this… This toolset is the future of your data analysis systems - and the future is available to you today.

What do you get in Master Your Data?

You get:

  • A full day with one of the world's leaders in the Power Query technology (who also happens to be one of the most passionate presenters you'll ever see)
  • Copies of my slides to keep - many of which are data transformation recipes that you'll refer to again and again
  • Copies of both the before and after files
  • The chance to ask me questions about YOUR data as well
  • Hours of your life back (this is not a training expense - it's an INVESTMENT in your future!)

More details on Master Your Data

The full details on this class can be found here.  And you can register here.  I hope to see you join me so that we can super charge your data preparation process and free up your time for work that actually adds value!

Challenge – Top x with Ties

Yesterday I was working on a data set, and needed to work out the Top x with Ties in Power Query.  This posed to be a bit more challenging than I first thought it would be.

Why did I want Top x With Ties?

Let's take a look at the source data here, on a simplified data set.  That data looks (in part) like this:

SNAGHTML3e20433

I needed to chart it, and wanted it to look like this:

image

Now, appreciating you can't see all the data, the deal is this… I wanted to add a column to the data set that shows the item name for the top 5 items, and shows "Other" for all the rest.  This allows me to group things to show how the major sellers compare to the rest of the business.  Make sense so far?

The trick here is that I ended up with two item (Pint Winter Ale and Caesar) that have exactly the same value, which meant that I needed 6 categories in this case.  But Power Query doesn't have a native function to keep the Top x with Ties, it only has Keep Top Rows.

So how would you generate the Top x with Ties for this scenario?

Basically, what I need in order to serve up the chart correctly is this:

SNAGHTML3eff6f0

So here's the criteria… we want it to:

  • Automatically provide the Top x with Ties
  • Be easy to update to change the value of x (in case we want top 10)

Now you might think "Hey that's easy... I'll just sort it, add an Index column and then I can use a conditional column to choose anything over a certain number."  That's perfect if you're not worried about ties, but in this case we want the Top x WITH Ties.  So now what?  Can you filter to keep the Top x rows?  Nope, because again, that only keeps those rows, and not the ties.

Now I have two solutions for this already, but I'm curious how you'd approach this challenge.

Just to make this more fun…

Do NOT post your answer below.  (We don't want to spoil it for anyone.)

Instead, email your workbook to Rebekah at (you know) Excelguru dot ca.  We'll collect them and share the best ones (along with mine) next week.

You can download the source data here.  Let's see what you've got!  Smile

Wellington, NZ – Urgent Call For Help

As you’ve hopefully heard by now, we’re coming back for our annual Excel conference in Australia and New Zealand, with dates planned for Wellington on Apr 19 & 20. It’s hard to believe that it’s only a month away, but it’s time to make the difficult call about go/no-go decisions.  And while all the Australia dates are all locked in and going ahead, the reality is that Wellington is on the brink…

Unlock Excel Logo

We hope to be in Wellington on April 19 & 20, 2018

We need to hear from you

I want to be clear here, we REALLY want to come to Wellington.  The challenge we have is that we need to know people are going to show up to support the event.  It costs a lot to book venues, flights and the like, and right now we don’t have enough people to make this work.

And we have until 2PM New Zealand time TODAY (March 16, 2018) to decide if we have enough interest to make this viable.

Call to Action

We don’t need your completed registration just yet.  What we do need is enough people that INTEND to register for the Wellington edition.  If you can give us that, then we’ll take it on faith that you will register.  We get it – convincing the boss takes time.  We want to give you that time, so here’s how you can help us…

Fill out this survey if you want to attend the Wellington edition of the Unlock Excel conference.  All we need is a couple of pieces of information:

  • Do you intend to come?
  • What is the % chance you’ll be able to register?
  • Your name and email

We promise that we won’t add you to any email distribution lists, we just need to assure that you’re real.  The reality here is that we need to cover our costs.

Our Target?

Right now, we need an additional 30 people at a minimum to express their interest in attending the Wellington edition.  If we can get that, we’ll go ahead.  If we can’t…

What can you do?

There are two things you can do:

  1. Fill out the survey to indicate your interest
  2. Share this with anyone in the Wellington area who may be interested

We appreciate anything you can do to help us spread the word and make this conference a go.

Wait – Tell me more!

Sorry, you HAVEN’T heard of the Unlock Excel conference?

The conference is going to be awesome!  From charting and dashboarding, to VBA, to Power Query, Power Pivot and more, we will be exposing you to the fact that Excel isn’t just Excel any more.  With revolutionary changes to the product starting in Excel 2010, there is a VERY good chance that you’re still working the hard way, and we want to fix that for you.  We even have 1 on 1 sessions and group Q&A sessions where you can pose your questions directly to us.

Unlock Excel Speakers

Don't miss out on the chance to learn from an international group of Microsoft MVPs.

And if it’s professional development you’re after, it counts for 17.75 hours.

You can find out more about the Wellington event on the CPA Australia website: https://www.cpaaustralia.com.au/training-and-events/conferences/unlock-excel/wellington

Just don’t forget to fill out our survey. Or even better, register if you’re currently in a position to do so!

Return a Specific Day of the Next Month

In a comment on a previous post, a reader asked how you return a specific day of the next month from any given date.  In other words, I've got a date of March 5 and I want to use Power Query to return April 10 in Excel (or Power BI).  How do you do it?

The Excel User's First Guess

So my first thought was to jump straight into the Power Query Formula reference guide to review the date functions.  Surely there must be something in there to manipulate dates and such, right?

Here's a quick list of the the functions I knew I'd need:

  • Date.Year()
  • Date.Month()
  • Date.AddMonths()

So those are awesome for ripping dates apart and shifting them, but what I really needed at the end was a way to put things back together.  I needed an equivalent of Excel's =DATE(year,month,day) function.  I couldn't find one.

Return a Specific Day of the Next Month

After poking around with this for a while, it suddenly occurred to me that I was doing this all wrong.  To return a specific day of the next month, I just needed to provide the "literal" #date() and I was good to go.

Let's take a simple table like this:

image

I pulled it into Power Query, went to Add Column --> Custom Column, and added the following formula:

=#date(
Date.Year(Date.AddMonths([Dates],1)),
Date.Month(Date.AddMonths([Dates],1)),
10
)

And at that point it works beautifully:

image

Basically, the #date() literal works just like Excel's DATE() function, you just case it differently and put a # tag in front of it:

#date(year,month,day)

It's a weird one, for sure

Returning a specific day of the next month is one of those odd cases where you have to use one of Power Query's literals to create the date you want, rather than employing a function to convert values as you're used to in Excel.   The good news though?  Miguel does an amazing deep dive into the M coding language in our Power Query Academy, including explaining what literals, tokens, keywords and more are all about.

If you want to understand this in depth, check out our course:

image

PS:  Sign up for our free trial first, to make sure you like our style!  And when you're convinced… you won't find better Power Query training anywhere.  Smile

Unlock Excel Conference

Unlock Excel Returns to Australia and New Zealand

We're really excited that Ken is going to be heading back "down under" this April for CPA Australia's Unlock Excel conference. Spending two days in each of four different cities, Unlock Excel features sessions from a renowned group of Microsoft Most Valuable Professionals (MVPs). Based on how well-received the conference was last year, you don't want to miss out on this year's event!

Unlock Excel

The Unlock Excel conference will be coming to Melbourne, Sydney, Brisbane, and Wellington in April 2018.

Who is Unlock Excel For?

Unlock Excel is geared towards people who want to discover fresh and exciting ways to unlock the full potential of their data. While advancing your Excel skills, you will also learn how to become more efficient, explore new tools and applications, and streamline your day-to-day processes.

Who are the MVPs?

Microsoft chooses their MVPs annually based on their high level of public community contributions, but it's more than just quantity of materials that they share for free... they also have to be technical experts in their field.  One of the hallmarks of an MVP is their  continued dedication to discovering the best ways to use Excel and other Microsoft products, and another is their passion for sharing those techniques with the world.

Ken and the other presenters at Unlock Excel are passionate educators with a deep knowledge of Excel. Because of their experience bringing together diverse platforms, products, and solutions, they will share how they tackle real-world problems.

What Will I Learn?

The sessions at Unlock Excel will feature a variety of topics including charting, financial modelling, VBA, and the Power BI suite of tools. By attending, you will pick up valuable time-saving tips and tricks to take your current knowledge to the next level. Additionally, you will learn to manage your own complex data sets, thereby uncovering unique insights. As a result, you will add value to discussions and help drive effective decision making for your business.

When and Where Can I Attend?

This year's Unlock Excel conference will be visiting the following cities:

Take advantage of early bird pricing available only until March 13, 2018. Visit the CPA Australia site for more information and to register.

Extract Data Based on the Previous Row

This is a cool example of how to Extract Data Based on the Previous Row, which came up as a viewer's question inside our Power Query Academy.  Let's look at how we solved it…

What Kind of Data Needs This Treatment?

Here's a picture of the user's raw data after a little bit of cleanup:

SNAGHTML775ec10

So What's the Problem?

The challenge here is all about the two data points highlighted as A and B.  They are categories, and need to be extracted from this data… but how?  There is no common pattern between rows that we can look at to say "this is a category, but this is not."

But there is data on the row above.  Everywhere there is a "------" in the Quotes column, the next row has our Category in the Source column:

SNAGHTML778e971

But how do we get at it?  Power Query doesn't have any easy-to-use facility to refer to the prior row, so what do we do?

How to Extract Data Based on the Previous Row

There are actually a couple of ways to do this.  One is to write a formula that refers to the previous row, the other is to do this via a creative use of merging tables.  The previous row method is covered in M is For Data Monkey (page 185), so this time I'm going to focus on the latter.

Extract Data Based on the Previous Row - Setup

The first thing I did here is add two new columns to the data table above: an Index column starting from 0 and another Index column starting from 1.  To do this:

  • Go to Add Column --> Index Column -->  From 0
  • Go to Add Column --> Index Column -->  From 1

Pretty easy, and gives you this:

SNAGHTML77d9ea5

And at that point we call it a day and create this as a connection only query.  Here's what I did there:

  • Named the query: "Prelim" (but you can call it anything)
  • Went to Home --> Close & Load To… --> Only Create Connection

This gives me a query that I can call again when needed, without loading it to a worksheet or the Data Model.

Extract Data Based on the Previous Row - Completion

Next, I created a new query by right clicking the Prelim query in the Query Pane and choosing Merge.

I then did something really weird… I chose to Merge the query against itself… yes, seriously.  (I've always told people that it seems weird you can do this, but one day you'll need to… and today is the day!)

I configured the Join as follows:

  • Use the Prelim query for both the top and bottom tables
  • Chose to use the Index column on the top as the join key
  • Chose Index.1 on the bottom

Like this:

image

Then, once in the Power Query editor, I expanded just the Quotes column (without the column prefix), and removed the Index and Index.1 columns.  This left me in a pretty good place:

SNAGHTML786d430

With a pattern to exploit, this is now a simple matter of:

  • Creating a Conditional Column (Add Column --> Conditional Column) called "Category" with the following logic:
    • if [Quotes.1] = "-------" then [Source] else null
  • Right click the Category column --> Fill Up
  • Filter the [Quotes.1] column to remove all "------" values
  • Remove the [Quotes.1] column
  • Filter the [Quotes] column to remove all null and "------" values

And honestly, that's pretty much it.  To be fair, I also reordered the column and set the data types before the picture below was taken, but you get the idea.  At the end of the day, the data is totally useable for a PivotTable now!

image

  Interested in Mastering Power Query and the M Language?

Come check out our Academy.  We've got over 13 hours of amazing material that will take you from no skills to mastery and get you hours back in your life.

Adding Try Results is Trying

I was playing around with a scenario this morning where I was adding try results together in order to count how many columns were filled with information.  What I needed to do kind of surprised me a little bit.

The Goal

There's a hundred ways to do this, but I was trying to write a formula that returns the total count of options someone has subscribed to based on the following data:

image

To return this:

Adding try results - Attempt 1

So I figured I'd try to add the position of the first characters (which should always be one) together.  If there is a null value, it won't have a character, so will need the try statement to provide a 0 instead.  I cooked up the following formula to return a 1 or an error:

= Text.PositionOf([Golf Option 1],Text.Middle([Golf Option 1],1,1))

And then, to replace the error with a 0, modified it to this:

= try Text.PositionOf([Golf Option 1],Text.Middle([Golf Option 1],1,1)) otherwise 0
+
try Text.PositionOf([Golf Option 2],Text.Middle([Golf Option 2],1,1)) otherwise 0

But when I tried to commit it, I got this feedback:

image

Adding try results - Attempt 2

Now I've seen this kind of weirdness before, so I knew what do do here.  You wrap the final try clause in parenthesis like this:

= try Text.PositionOf([Golf Option 1],Text.Middle([Golf Option 1],1,1)) otherwise 0
+

(
try Text.PositionOf([Golf Option 2],Text.Middle([Golf Option 2],1,1)) otherwise 0)

At least now the formula compiles.  But the results weren't exactly what I expected…

image

So why am I getting 1 where there should plainly be a result of 2 for the highlighted records?

Adding try results - The fix

Just on a whim, I decided to wrap BOTH try clauses in parenthesis, like this:

= (try Text.PositionOf([Golf Option 1],Text.Middle([Golf Option 1],1,1)) otherwise 0)
+
(try Text.PositionOf([Golf Option 2],Text.Middle([Golf Option 2],1,1)) otherwise 0)

And the results are what I need:

image

So why?

I thought this was pretty weird, but looking back at it in retrospect, it is following the correct order of operations.  The original formula I wrote was "otherwise 0 + …".  So in truth, the entire second try statement was only getting evaluated if no Golf Option 1 was present.

I guess writing formulas is hard in any language!

The Each Keyword in Power Query

This post is a guest article from Marcus Croucher, a finance professional based in Auckland, New Zealand. Marcus instantly fell in love with Power Query after seeing how it can easily transform data in ways Excel finds difficult, and how it can automate repetitive workflows.

I have been using Power Query in a professional capacity for a number of years, but have never fully understood exactly how the each keyword works in Power Query. I did some research around two years ago, but the documentation at the time was quite sparse (and still isn't great) and I did not have enough knowledge about wider programming to fully understand it.

In the meantime, I was looking for a way to use Power Query-like technology on OSX. I ended up learning a lot of Python, which has several libraries that have similar functionality to Power Query (albeit without the amazing graphical interface of Power Query.) Two notable examples are pandas (very popular data analysis library), and petl (a more light-weight and easy to use data processing toolkit).

This general programming knowledge gave me the background to understand some of the underlying concepts behind each, which I will now proceed to attempt to convey to you, the general intermediate to advanced Excel user, so that you can understand and wield the each keyword with confidence. I will use Python as a parallel and will link to some Python articles that expand on the underlying concepts I am trying to explain.

A Deep Dive into How the Each Keyword Works

In this article, I assume that you are familiar with Power Query and how to use it on a practical level. I also assume that you have some (limited) experience with the advanced editor and have seen the underlying M code, and understand the underlying data structures and how to access them.

  • Table
  • List (columns) – notated as {item one, item two, etc.} and accessed by [column header]
  • Record (rows) – notated as [category1: data1, category2: data2, etc] and accessed by {row number}

I suggest going ahead and pasting the code snippets into a blank query in Power Query to see for yourself what is really going on.

When Might You Use the Each Keyword?

The each keyword in Power Query is used by quite a few of the built-in operations. For example, filtering a table based on a certain column features each here:

let

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

filter_to_score_5 = Table.SelectRows(sample_table,

each ([Score] = 5))

in

filter_to_score_5

Figure 1

Using the Each keyword to filter a table based on a certain column

The purpose is quite clear: we want to keep each row where the value in the Score column is equal to 5. However, how does this work? [Score] looks like it is referring to a full column – how can we do a full column equal to a single value? Can I access values from the row above or below the row we want to keep/discard?

Another example of the each keyword in Power Query is creating custom columns. Here, I create a custom column which is the combination of the index and the name columns:

let

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

add_ordered_name_column = Table.AddColumn(

sample_table, "ordered_name",
each Number.ToText([Index]) & " " & [Person])

in

add_ordered_name_column

Figure 2

Using the Each keyword to create a custom column which is the combination of the index and the name columns.

Similar questions apply. It is easy to follow the logic, but how it works and the limitations behind it are somewhat of a mystery.

So What is the Each Keyword in Power Query?

The current documentation has this to say about the keyword:

Each Keyword

The each keyword is used to easily create simple functions. “each ...” is syntactic sugar for a function signature that takes the _ parameter “(_) => ...”

 

Each is useful when combined with the lookup operator, which is applied by default to _ For example, each [CustomerID] is the same as each [CustomerID], which is the same as () => _[CustomerID]

Still not very clear (unless you have a background in functional programming languages). It would be really nice to get a full understanding, as the each keyword in Power Query is used in a number of places (filtering, custom columns, etc.) and understanding it would give us an understanding of what we can and can't do with it.

Turns out, you need to understand three things to understand each:

  1. Functions as first class objects
  2. "_" as a temporary variable (and the associated shortcuts within M language)
  3. Anonymous functions (each)

Functions as First Class Objects

If you are used to using Excel, you are used to functions (e.g., =SUM() ) being a bit magical. We cannot touch or modify them, and they are supplied fully formed by Excel. You can create custom functions with VBA, but these are far and few between, and still seem like the lessor cousin to the in-built functions.

In Power Query, functions can be thought of just another "object" – or just another type of data. This means they can be:

  • Assigned to a variable and/or renamed.
    • Just like we can do something like variable = 5, in Power Query, so we can do something like variable = function.
  • Fairly easily created
  • Used as a parameter to another function (!)

To understand this, we need to distinguish between calling a function (using it in our code) and referring to it (to name it or to use it within another function). Similar to other programming languages, to call a function we use the parentheses at the end like:

function()

If we want to refer to a function we just omit the parentheses like:

function

Let's demonstrate the renaming/reassigning of functions. First I take one of the supplied functions which takes a list (i.e., column) and calculates the sum. Next, I build a sample table, and then take a sum of one of the columns using the function that I had defined at the beginning.

let

sum_column = List.Sum,

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

sum = sum_column(sample_table[Score])

in

sum

Figure 3

Calling a function within another function.

It works!

Let's create a basic function

In Power Query, the syntax to create a function is:

(variable) => body of function

The body of the function is like any other query that returns a value, but instead of being delivered to the main Power Query interface it is delivered to whatever called it.

We also want to assign a name to the function, so let's expand the above template. Here, we create a simple function that takes a number and multiplies it by two.

multiply_by_two = (number) =>

let

output = number * 2

in

output

Looking good, but now we want to use this snippet in a full query. Let's build a (trivial) query that uses this function to multiply a variable by two and output the result:

let

multiply_by_two = (number) =>

let

output = number * 2

in

output,

x = 5,
y = multiply_by_two(x)

in

y

Figure 4

Building a simple query that uses the function we created.

Functions as inputs to other functions

Now that we have explored how functions are the same as any other data type or variable, and we have demonstrated how we can create our own functions, let's look at functions that take other functions as inputs. One example from the official documentation is the filtering function, Table.SelectRows.

About

Returns a table containing only the rows that match a condition.

 

Table.SelectRows(table as table, condition as function) as table

So the function expects a table (makes sense), and a function! How does this work? According to the documentation, the condition is "the condition to match".

It's not very well documented, but it turns out that this function expects a function to be supplied. It then applies that function to each row (record) of the table, and expects a true or false response from the function. It then uses this response to decide whether to keep or discard the row.

To recap, a record is a data type representing a row. We can access the items from a record by supplying the column name as follows: record[column name].

Let's create a function which we can then supply to Table.SelectRows on our sample data. Note – this is our case study example which I will develop throughout this article.

let

filterer_score_two_plus = (record) =>

let

value = record[Score],
result = value >= 2

in

result,

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

filtered = Table.SelectRows(

sample_table,
filterer_score_two_plus)

in

filtered

Figure 5

Creating the sample_table function.

Figure 6

Using the Table.SelectRows function to filter the table created by the sample_table function.

What have I done here? First I have created a function which takes a record, extracts the value in the Score column and returns true if it is greater or equal to two. I then construct my sample table and apply the Table.SelectRows function on it, supplying my recently constructed function as the second input. As we can see, the output as expected provides a table with all rows with scores greater or equal to two.

Now, only if there was a quicker and easier way to create such functions, as it looks like we might have to build these one-use functions quite a lot...

"_" as a Temporary Variable

The use of _ as a throw-away variable is common across several programming languages, Python included (see point 4 here). Usually, _ is used to name things that are not going to be used again and so it is not worth using up another name.

Here I write a query creating a table, assigning it to a variable called _. Power Query has no problem whatsoever using _ in this way.

let

_ = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"})

in

_

For Python, using _ is just a convention, but it appears that Power Query has expanded the functionality here. Let's say we just want the column of names from the above table. Usually we can do this by selecting the column by name using [column_name] as the selector.

let

_ = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"})

in

_[Person]

It turns out, we can omit the _ in this statement as Power Query will infer that if we just put [Person], the table we are referring to is the one called _. The example below works just as well as the one above:

let

_ = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"})

in

[Person]

Figure 7

If we omit the _ variable in this statement and just use [Person], Power Query infers that the table we are referring to is the one called _.

I wouldn't recommend this as general practice, as it is not well documented or understood behaviour and explicit is usually better than implicit. However, it does provide nice-looking code when used with the each keyword.

Note that this technique only works for column selections [column_name] rather than row selections {row_number}, as Power Query will interpret {row_number} as a new list.

Applying the _ Variable

With this concept in place, let's revise our filtering query defined above:

let

filterer_score_two_plus = (_) =>

let

result = [Score] >= 2

in

result,

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

 

filtered = Table.SelectRows(sample_table,

filterer_score_two_plus)

in

filtered

The shortest we can actually get this is pretty close to our final stage. Let's put the function definition right into the Table.SelectRows function, and get rid of the let and the in (only really needed if there are multiple steps in the calculation):

let

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

 

filtered = Table.SelectRows(sample_table,

(_) => [Score] >= 2)

in

filtered

This is already pretty tight and clean. However, the (_) => is pretty scary if you're not familiar with function definitions, and the function definition symbol "=>" is fairly similar to our greater than symbol ">=".

Anonymous Functions

As we saw above, we end up creating one-off functions to supply to other functions quite frequently in Power Query. It seems silly to go through all of the syntax of creating and naming a function if it won't be used again. There is a concept called anonymous functions, which are functions that are defined but not named. They are used as soon as they are created. In Python, these are known as lambda functions.

We can actually use the each keyword in Power Query to define the function. (Yes, we are finally at the each keyword itself!) Each just minimizes the syntax for creating functions, by providing a default input variable name, "_", and removing the need for the => or anything else. So:

(_) =>

let

result = [Score] >= 2

in

result

 

can become:

each [Score] >= 2

You can still name this if you like (filterer = each [Score] >= 2), but using the each keyword in Power Query is much more useful if we use it inline. So we come to our final query, which should look fairly familiar to intermediate Power Query users:

let

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

filtered = Table.SelectRows(sample_table,

each [Score] >= 2)

in

filtered

 

So What have We Discovered?

Now we have demystified the magic of the each keyword in Power Query, but what have we discovered?

  • Each itself doesn't actually do that much, slightly simplifying our function definitions.
  • Once you understand the concept of supplying functions as inputs to functions, everything becomes a lot clearer.
  • The missing piece of the puzzle comes from an understanding of the special _ variable name, which enables us to take a shortcut when selecting columns from tables or records (we can use [column name] instead of _[column name]).
  • The use of each hinges on the behaviour of the underlying function. Because we know that Table.SelectRows calls the supplied function on each record of a table and expects a true/false response, we can construct a function that works as we expect it to. Unfortunately, this is not very well documented.

How can we use this information? I can think of a few different ways:

  • If we have a complicated add column or filtering step to do, we can separate out the underlying logic into a separate function which can have multiple steps. This removes the complexity from the main body of the code and abstracts it away. It can be easier to read filter_to_current_year_red_cars rather than try to interpret all of the various equivalence statements as you read through the code.
  • Now that we understand the context that is delivered (only the current record/row), we can construct functions that can do more interesting things (these usually require an index column). For example, we can filter a table based on a second table, or add a new column that subtracts the current row from the previous row.

Here's an example that:

  1. Abstracts the logic to a helper function, and
  2. Adds a column based on the difference for each row from the previous row.

let

row_difference = each

if [raw_index] = 0
then [Score]
else [Score] - add_index[Score]{[raw_index] - 1},

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

add_index = Table.AddIndexColumn(sample_table,

"raw_index", 0, 1),

add_difference_column = Table.AddColumn(

add_index, "difference",

row_difference)

in

add_difference_column

Figure 8

Adding a column based on the difference for each row from the previous row

So does this help your understanding of the each keyword in Power Query? Has this given you some new ideas on how to structure your queries? Please let me know if you have any questions!