Power Pivot eBook Coming Soon

It's been a long time coming, but we are putting the finishing touches on the third installment of our free 'DIY BI' series. Consequently, we are excited to announce that the Power Pivot eBook will be officially released on Tuesday, July 3, 2018!

Power Pivot eBook

This brand new book will feature five of Ken's top tips, tricks, and techniques for Power Pivot, including:

  • Hiding fields from a user
  • Hiding zeros in a measure
  • Using DAX variables
  • Retrieving a value from an Excel slicer
  • Comparing data using one field on multiple slicers

Power Pivot eBook

 

About the 'DIY BI' Series

This free eBook series is available to anyone who signs up for the monthly(ish) Excelguru email newsletter. The series includes four books, one edition each for Excel, Power Query, Power Pivot, and Power BI. Each book contains five of our favourite tips, tricks, and techniques which Ken developed over years of research and real-world experience.

DIYBI eBook Series

We first launched this series in the spring of 2017 with the Excel Edition, and the Power Query edition followed later that summer. You can read some more about why Ken decided to create this series in his initial blog post about it.

The Excelguru Newsletter

The monthly Excelguru email newsletter features the latest updates for Excel and Power BI, as well as upcoming training sessions and events, new products, and other information that might be of interest to the Excel and Power BI community.

Don't Miss Out, Get Your Free Copy of the Series

If you're not already a newsletter subscriber, you can sign up here. We will send you the Excel Edition right away, and the Power Query Edition a few days later. All of our current and new subscribers will receive the Power Pivot edition once it is released on July 3, 2018. Be sure to keep an eye on your inbox for the new book.

We will be continuing to work on the fourth and final book, the Power BI Edition, over the coming months so stay tuned for details!

Number rows by group using Power Query

After one of my previous sorting posts, Devin asked if we can number rows by group.  Actually, that's a paraphrase… what he really asked was:

Any thoughts on how to produce something like a ROW_NUMBER function with PARTITION from T-SQL? Similar to this: https://docs.microsoft.com/en-us/sql/t-sql/functions/row-number-transact-sql?view=sql-server-2017#d-using-rownumber-with-partition

I've never used the PARTITION function in SQL, so I checked in with him just to confirm what he was after.  And here it is: take the first three columns of the table below, and add a final column with index numbers sorted by and restarting with each group:

Number rows by group example

And of course there is.  For this one, though, we actually do need to write a little code.

Preparing to Number rows by Group

Now here's the interesting part.  The source data looks exactly the same as what you see above, the only difference being that in the output we also added a number rows by group.  So how?

Well, we start by grouping the data.  Assuming we start by pulling in a table like the above with only the first 3 columns:

  • Sort the data based on the Sales column (in descending order)
  • Group the data by Group
  • Add a aggregation column for "All Rows"

Like this:

Group By dialog

Which yields this:

Table after aggregation

We are now ready to add the numbering.

Now to Number rows by Group

In order to number rows by group, what we really want to do is add an Index column, but we need to add it for each of the individual tables in our grouped segments.  If we try to add it here, we'll get 2 new values, showing 1 for Alcohol and 2 for Food.  That's not exactly what we need.  But if we expand the column, then the index will not reset when it hits Food, and the numbering won't be right either.

The secret here is to add an Index column to each of the individual tables in the Data column, then expand the columns.  The numbering will then be correct.

To do this, I added a custom column using the following code:

=Table.AddIndexColumn([Data], "Index", 1, 1)

Where did I get this code?  I actually added an Index column to the whole table.  That added the following code in the formula bar:

=Table.AddIndexColumn(#"Grouped Rows", "Index", 1, 1)

I copied that code, and deleted the step. Then I added my custom column, pasted the code, and replaced #"Grouped Rows" (the name of the previous step) with the column I wanted to use.

The result was this:

Table after Index column added

The final steps to clean this up:

  • Remove the Data column
  • Expand all columns from the Partitioned table except the Group (since we already have it)

Which leaves us with our table in place including the new column which does number rows by Group as originally planned.

If you want to play with this one, the example file can be found here.

I would also be remiss if I didn't mention that we have a great video in our Power Query Academy that covers this kind of operation (among others).  It's called "Advanced Row Context" (in our M deep dive section) where Miguel shows all kinds of cool stuff that you can do by adding new columns to Grouped Rows.

Trick to Protect Excel Tables

Slobodan emailed me to describe a trick to protect Excel tables that he is using to drive data validation lists.  The data validation lists are sourced from tables loaded via Power Query, and leverage a little hack to hide them from prying users eyes.  I thought it would be cool if he shared it with everyone, so asked him to write up a little blog post on it, and here it is!

Take it away Slobodan…

Hello everybody,

Recently, my team and I had faced a problem with refreshing PQ tables that we managed to solve with a simple trick (no VBA coding), and shared it with Ken who asked me to share it with community. Thank you Ken for this opportunity! Glad to make some kind of contribution, to all of you PQ users.

Solution Background

We created calculation model for our sales people (Full cost calculation).  Inside this Excel file, they have a lots of drop down lists from which they can choose customer, partner etc. The idea is to make these dropdown lists dynamic.  In other words, whenever a new customer is created in SAP, they should be able to select this customer in Excel using a dropdown list. This is where Power Query comes to the rescue.

We have scheduled daily export of all our customers from SAP to a file on a network drive, and use this file as the data source for a local PQ table in the workbook. We then use our Power Query table “Customers” as the source for dropdown lists in calculation model.

The Challenge

How to make it fully automated? We have two goals here:

  1. We want Power Query to be scheduled for automatic refresh on a daily basis
  2. At the same time, we would like to protect Excel tables sourced via Power Query from careless users

For the first point, we have Power Update - a tool which allows you to schedule daily refresh.

Note from Ken: I haven't seen Slobodan's model, so there may be a need to use Power Update to do what he's doing.  If you only need your Power Queries to update each time the Excel workbook is opened, however, you could force an update by changing the table's connection properties to force an update upon open.

Second issue, in order to protect Power Query table, we need to hide these sheets and protect the workbook.  The end result is that our Customers table is hidden and cannot be unhidden and everything looks promising.

clip_image002

Of course, Excel protects the whole workbook structure using this method, which causes Power Update to fail. In fact, query refreshes also fail if we try to refresh data manually.

clip_image001

So the obvious solution doesn't work.  I spent time Googling for solution to this but could not find one 🙂

Our Solution

I am not VBA guy, but I remembered one tip from Mynda Treacy’s dashboard course which I applied here.

Step 1

  • Hide the worksheet and open the Visual Basic Editor (press Alt+F11)

Step 2

  • In the Project Explorer Window (Ctrl + R if it's not showing) select the sheet which  contains the Power Query table

clip_image003

Step 3

  • In the Properties Window (press F4 to display this), set the Visible property to "2 - xlSheetVeryHidden"

clip_image004

Step 4

  • Go to Tools --> VBAProject Properties --> Protection
  • Check the box next to "Lock Project for Viewing"
  • Set a password so only you can access it
  • Close the Visual Basic Editor

The Effect

Our sheet containing the Customers table is hidden, and there is no possibility to unhide it.  It doesn't even show up in the menu!.

clip_image005

At this point the only way to unhide the worksheet is to go into the Visual Basic Editor, and reset the worksheet's Visible property - but you protected the VBA project with a password so no one can get in there.

The great thing is that refreshing the Power Query tables will work, because you didn’t actually lock the workbook structure.

Caveat

This solution is intended to protect data from regular excel users, who can easily mess up your workbook.  Do be aware that users with VBA skills will be able to break the password, or extract the hidden sheet contents.

Hopefully someone finds this useful 🙂

Take care!

Your Voice Matters – Power Query Quality

Some time ago we embarked on a bit of a crusade to get Microsoft to fix a specific issue with Power Query related to performance.  I posted about it in detail on the Power Pivot Pro blog, and have been encouraging people to vote on this at every possible turn.  Conferences, classes, my free e-Book series, even the Microsoft Data Insights Summit - none of them were immune to hearing me drum up support to get this fixed.

Your voice matters

You voted, and Microsoft listened.  I'm super excited to let you know that they have architected a fix based on your votes, and it's finally out of testing!

Even though the idea is still marked as "started" on UserVoice, (I got this directly from a reliable source,) it is starting to roll out to the Office Insider channels on Office 365.  I'm told that so long as you're on version 1801, build 9001 or higher, you're good to go and have the fix in place.  You can locate your version and build information via File --> Account:

image

Thank you for voting!

While Microsoft does listen to me, my voice only goes so far.  Your voice matters a huge amount in this process, as it validates that it is bigger than just one person.  I want to thank everyone who voted for this fix to raise it's importance along the way!

Can't wait for the fix?

You can get on the Insider channel, and that will get you the fix.  There are two methods to do this:

For consumer licenses of Office 365, you can find out more here.

For commercial users, you have to download and configure an installer, which you can do here.

Will this fix all your Power Query refresh speeds?

Absolutely not.  This deals with one specific technical issue that we identified which was re-doing work multiple times.  There is still much improvement that needs to be done, but at least we've got a start here to bring Excel back to parity with Power BI Desktop.

In the mean time, if your queries are going slow, you might want to consider our Power Query Academy.  We have a module on Query Optimization which teaches how to use Buffer functions, provides a strategy to reduce lag during development and also shows a few settings that can be tweaked to make your code run faster.

Ranking Method Choices in Power Query

My recent post on showing the Top X with Ties inspired a discussion on ranking methods.  Where I was looking to rank using what I now know as a standard competition rank, Daniil chose to use a dense ranking method instead.  Oddly, as an accountant, I've never really been exposed to how many different ways there are to rank things - and I'd certainly never heard the terms skip and dense before. (At least not tied to ranking!)

Naturally, after a few emails with Daniil and a bit of a read over at Wikipedia on 6 different common ranking methods, I had to see if I could reproduce them in Power Query.

What are the 6 different ranking methods?

Let's look at a visual example first.   These were all created in Excel using standard formulae:

image

The first thing I had to do was figure out what each ranking method actually does.  So here's a quick summary according to Wikipedia's article on the subject:

  • Ordinal Ranking - This ranking method uses sequential number for each row of data, without concern for ties
  • Standard Competition Ranking - Also know as a form of a Skip ranking, this method gives ties the same rank, but the following value(s) are skipped.  In this case, our values go 1,2,3,4,4, 6.  (5 is skipped as the 5th item is tied with the 4th)
  • Modified Competition Ranking - This is similar to the Standard Competition ranking method, but the skipped values come before the ties.  In this case, we would get 1,2,3, 5, 5, 6.  (As 4 and 5 are tied, they both get ranked at the lower rank.)
  • Dense Rank - In this ranking method, ties are given the same value, but the next value is not skipped.  In this case we have 1, 2, 3, 4, 4, 5.
  • Fractional Rank - Now this one is just weird to me, and I'd love to know if anyone has actually used this ranking method in the real world.  In this algorithm, ties are ranked as the mean of the tied ordinal rank.  Very strange to me, but it won't stop me from building it!

So know that we know what they all are, let's build them in Power Query so that we can perform them in both Power BI and Excel.

Groundwork for demonstrating the ranking methods

If you download the sample workbook, you'll see that it has the full table shown above.  To make this easy, I set up a staging table called SalesData as via the following steps:

  • Select a cell in the Excel table --> Data --> From Table/Range
  • Select the Item and Sales columns --> right click --> Remove Other Columns
  • Load it as a connection only

This gave me a simple table with only the product names and values as shown here:

image

As you can see, the values column has already been sorted in descending order, something that is key to ranking our ties.

One thing I should just mention now is that - for every ranking method - we will actually start every new query by:

  • Referencing the SalesData query
  • Renaming the new query to represent the ranking method being demonstrated

That means that I'm just going to give the steps each time based on the view above, since that's what we should get from the referencing step.

Ranking Method 1: Ordinal Rank

This ranking method is super easy to create:

  • Sort the Sales column in descending order
  • Sort the Item column in ascending order (to rank ties alphabetically)
  • Go to Add Column --> Index Column --> From 1
  • Rename the Index column to Rank
  • Reorder the columns if desired

Yes, that's it.  It simply adds a row number to the way you sorted your data, as shown  here:

Ordinal Ranking Method in Power Query

Ranking Method 2: Standard Competition Rank

This ranking method involves using a little grouping to get the values correct:

  • Sort the Sales column in descending order
  • Add an Index column from 1
  • Go to Transform --> Group
    • Group by the Sales column
    • Create the following columns:
      • Rank which uses the Min operation on the Index column
      • Data which uses the All Rows operation
  • Expand the Item column
  • Reorder the columns if desired

The result correctly shows that the Dark Lager and Winter Ale - 4th and 5th in the list, but tied at 557, each earn a rank of 4, and the Member Pale Ale (6th in the list) comes in with a rank of 6.  There is no item ranked 5th, since their rank was improved to be in a 4th place tie.

Standard Competition Ranking Method in Power Query

Ranking Method 3: Modified Competition Rank

To create ranking following the Modified Competition ranking method, we need to:

  • Sort the Sales column in descending order
  • Add an Index column from 1
  • Go to Transform --> Group
    • Group by the Sales column
    • Create the following columns:
      • Rank which uses the Max operation on the Index column
      • Data which uses the All Rows operation
  • Expand the Item column
  • Reorder the columns if desired

The only real difference between this ranking method and the standard competition rank is that we create the Rank column using the Max of the Index column instead of the Min used in the previous method.

The result correctly shows that the Dark Lager and Winter Ale - 4th and 5th in the list, but tied at 557, now earn a rank of 5 (not 4 like the standard rank).  There is no item ranked 4th, since their rank was dropped to reflect a 5th place tie.

Modified Competition Ranking Method in Power Query

Ranking Method 4: Dense Rank

The dense ranking method requires a change to the order of the steps from what we did in the standard competition ranking method.  Namely the Group By command must come before the addition of the Index column:

  • Sort the Sales column in descending order
  • Go to Transform --> Group
    • Group by the Sales column
    • Create the following columns:
      • Rank which uses the Max operation on the Index column
      • Data which uses the All Rows operation
  • Add an Index column from 1
  • Expand the Item column
  • Reorder the columns if desired

This method will yield the results found here:

Dense Ranking Method in Power Query

The result correctly shows that the Dark Lager and Winter Ale - 4th and 5th in the list, but tied at 557, ranked in 4th place - just the same as the Standard Competition rank.  But where it differs can be seen in the ranking of the Member Pale Ale.  6th in the list, it is ranked 5th, as there are no gaps left after the ties.

Ranking Method 5: Fractional Rank

As I mentioned at the outset, I find this to be one of the strangest methods of ranking.  Like the others though, it's actually really easy to create when you know how. (And certainly more straight forward than using an Excel formula to calculate it!)

  • Sort the Sales column in descending order
  • Add an Index column from 1
  • Go to Transform --> Group
    • Group by the Sales column
    • Create the following columns:
      • Rank which uses the Average operation on the Index column
      • Data which uses the All Rows operation
  • Expand the Item column
  • Reorder the columns if desired

One thing I will say… it's certainly makes it obvious that there are other ties in the table.  Maybe that's the point of it?

Fractional Ranking Method in Power Query

Final Thoughts

I was actually surprised to see how easy it is to change the ranking methods with just some minor modifications to the order of steps and/or the aggregation chosen when applying the grouping method.  It certainly gives us some robust choices!

And while we can certainly create each ranking method using Excel formulas (each is demonstrated in the sample file if you're curious), this is even more awesome.  Now we don't need to load data and land it in the grid.  We can go straight to Power Pivot or Power BI should be need to.

If you'd like to download a file with each of the methods illustrated, just click here.

Creating Dynamic Parameters in Power Query

A couple of years ago, the Power Query team added Parameters as a proper object, but I kept on Creating Dynamic Parameters in Excel Power Query the same way as I always had.  The reason for this is two-fold: the first is because I was used to it, the second was because the built-in Parameters are quite static.  Sure, you can set up a list and change them at run time, but you have to enter the Power Query editor to do that.  And is that really something you want your boss doing?

So why do we care about creating dynamic parameters, anyway?

Let's take a look my last technical blog post to understand this.  In that post, I pulled a Top 5 value from an Excel cell, and used that to drive how I grouped my items.  It works great, and is truly dynamic.  It puts control of the grouping in Excel, allowing a friendly user interface for the end user to work with.  They simply change a cell value, hit refresh, and all is good.

The challenge here is not from the end user's perspective, it's from the developer's.  One of the instructions I gave in the post last week was to:

  • Create a Custom Column using the following formula:
    • if [Rank] <= TopX then [Item] else "Other"

Why a Custom Column?  Why not just use the Conditional Column dialog?  The answer is simple… TopX in this case was a query that returned a value, but it was not a proper Power Query Parameter.  Does it work the same in code?  Pretty much yes, but you can't see it in the Conditional Column dialog as you're building the query.

Even worse, if you want to make any modifications to the logic, you have to do it in either the formula bar or the Advanced Editor, as the gear icon returns the conditional column builder, but can't resolve the query:

Conditional Column Dialog

Wouldn't it be nice if we could create dynamic parameters that actually show up as valid Parameters to Power Query?  That's the goal of this post.

Groundwork - Creating the dynamic parameters in Excel

There are two different ways we can do this:

 1. Fetching dynamic parameters using a Named Range

This is the super easy method.  To do this:

  • Enter your parameter value in a worksheet cell
  • Go to the Name Manager and define a name for the cell (I called mine rngKeep)
  • Select the cell and pull the data into Power Query
  • Right click the value in the table's cell --> Drill Down
  • Rename the query
  • Load it as a Connection only

For this example, I renamed my query to XL_TopX_NamedCell.  It's a long name, I know, but you'll see why in a bit.

2. Fetching dynamic parameters from a Parameter Table using the fnGetParameter function

I've detailed this technique on the blog before, so if you'd like to review this technique, you can find a detailed post on that here.  The quick summary:

  • Create a two column table called Parameters, with Parameter and Value columns

Parameters Table

  • Copy in the fnGetParameter function (from the other post)
  • Call the function as needed

Just to check the value, I then:

  • Created a blank query
  • Entered the following formula in the formula bar
    • =fnGetParameter("Keep top")
  • Named this query XL_TopX_fnGetParameter
  • Loaded it as a connection only

Query to check the dynamic parameter value

So what makes a parameter a "Real" parameter?

At this point, I decided to create a new parameter and look at what happens.  To do this, go in to the Power Query editor and…

  • Go to Home --> Manage Parameters --> New Parameter
  • Give the Parameter a name (I used Test)
  • Set a Current Value of 0
  • Click OK

Next, right click the Parameter in the Queries pane on the left and go to the Advanced Editor.  You should see code that looks like this:

0 meta [IsParameterQuery=true, Type="Any", IsParameterQueryRequired=true]

So this is interesting… 0 is the value, and the rest is just a meta tag to tell Power Query that this is a real parameter…  This got me wondering… am I stuck with this value, or can I feed it a Power Query code and actually create a dynamic parameter that updates at run time?

Converting a Query to a dynamic Parameter - Take 1

The first thing I did here was copy everything after the 0, then exited this query.  I then:

  • Jumped over to the XL_TopX_NamedCell query
  • Entered the Advanced Editor
  • Pasted the copied line of code at the end
  • Clicked OK

And it didn't work.  Not to give up, I jumped back into the Advanced Editor and wrapped the original query in parenthesis like this:

Wrapping the original query in the Advanced Editor

And this time, something did change:

Dynamic parameter appears in the query list

There are 3 things worth noting here:

  1. It has the parameter icon (Yay!)
  2. It doesn't show a current value but shows an exclamation icon
  3. It shows the value of (…) in the name - meaning it doesn't know what the value is

I wasn't too worried about this last one though.  Dynamic named ranges show the same way in Excel, so would this work to create dynamic parameters?

Conditional Column Dialog

It sure does!  Not only does it show up in any parameter drop down, but the value gets read correctly and allows me to make my comparisons.  How cool is that?  I've actually got a dynamic parameter now!

Converting a Query to a dynamic Parameter - Take 2

Now, as cool as this was, there is something that bothered me about it.  When you tag the meta data at the end of a functional query and turn it into a parameter, you lose the applied steps.  If anything goes wrong, it makes it hard to debug.  (Reminds me of the classic custom function setup.)

To solve this, I decided to remove all the meta tags and parenthesis from the XL_TopX_NamedCell query, returning it to what is was before I turned it into a parameter.  I then created a new blank query called TopX_NamedCell and edited the code in the Advanced Editor to read as follows:

XL_TopX_NamedCell meta [IsParameterQuery=true, Type="Any", IsParameterQueryRequired=true]

Why?  Because I now have the query that pulls in the original data.  When I click on it, I can see the values and debugging steps to get there:

Checking the dynamic parameter value

And I also have a Parameter, which pulls from this value and can be used in my drop downs:

Conditional Column Dialog

Extending dynamic Parameters to leverage the fnGetParameter function

If you've used the fnGetParameter function before, it only makes sense that you'd want to know if we can leverage this function to pull values and return real Parameters.  And indeed you can.

 Parameters that pull from fnGetParameter

Here's the quick and dirty way to create dynamic Parameters by calling the fnGetParameter function directly:

  • Create a new blank query
  • Name your new Parameter  (I called mine TopX_DirectFromFunction)
  • Go into the Advanced Editor
  • Paste in the following code:

fnGetParameter("<Variable Name>") meta [IsParameterQuery=true, Type="Any", IsParameterQueryRequired=true]

  • Replace <Variable Name> with the name of the variable you want from the Excel Parameter table.  In the example this would be fnGetParameter("Keep top")
  • Click OK

Yes, it's just that easy.  You've now got a fully functional and dynamic Parameter… at least, you do if you replaced the variable name correctly with one that exists in the Parameter table!

NOTE:  I recommend that you rename your query before you edit the M code since you lose the applied steps window during the process.  You can still rename a parameter, but you'll need to right click it in the queries pane on the left and choose Rename to do so.

Making dynamic parameters that pull from fnGetParameter auditable

There's only one problem with the above approach. How do you test the value is resolving correctly before you try to use it?  Or how do you look to see what is actually happening when your downstream queries return an error?

For this reason, I actually recommend that you don't use the fnGetParameter query in a real Parameter as outlined in the previous section.  What I recommend you do is create an intermediary query which leverages fnGetParameter to pull the value from the Excel table, then reference that query from the Parameter query.  So in short:

Create an intermediary query

This is also fairly easy to set up.  The full process would be:

    • Copy in the fnGetParameter function
    • Set up the Parameters table in Excel and populate it with data
    • Create a new blank query to retrieve the parameter value
      • Name it
      • Enter the following in the formula bar:
        • =fnGetParameter("<variable name>")
        • replace <variable name> with the name of the parameter you wish to retrieve
      • Load as Connection only
    • Create a new blank query to be the real Parameter
      • Name the parameter as you'd like to see it in drop down lists
      • Go into the Advanced Editor and enter the following
        • QueryName meta [IsParameterQuery=true, Type="Any", IsParameterQueryRequired=true]
        • Replace QueryName with the name of the query you created above
      • NOTE: Parameters will automatically load as Connection Only queries
    • Use the new Parameter in other queries

See it in action…

The attached sample file contains three different variables based on the methods above, any of which can be used to drive the Conditional Columns step in the Grouped query:

Dynamic Parameters listed in the Conditional Column dialog

And if you're curious, they are related as shown below.  The TopX_NamedCell parameter is driving the version I saved, but as per the above, you can change that out easily.  (Naturally, in this case they all serve up the same value though!)

Query Dependencies View

Some Observations

As I was playing around with this, I noticed a couple of things that are worth bringing up here.

Yes, these work in the Power BI service!

To test this out, I cooked up a small sample that used a dynamic parameter using the methods outlined above to read the most recent year's data from a SharePoint folder.  I then published it to the Power BI service, added a new file to the server and refreshed the data in Power BI online.  Worked like a charm.

For the record, I haven't tested, but don't anticipate that this will work well with Power BI templates, as they will most likely clear the parameters and prompt you for values.  Any data points you wish to be preserved should be left as queries.

The Convert to Parameter function

Assume you created a new query, then typed a value into the formula bar (not a formula, but it could be numeric or text).  This would return a single (scalar) value that is static.  You'd then be able to right click the query in the Queries pane and choose Convert to Parameter.  Unfortunately, if your query returns anything that is dynamic or has multiple data points, this option is greyed out.  That's too bad, as this would be a really cool thing to be able to do.

Avoid the Add/Manage Parameter UI

Unfortunately, adding even a single dynamically-driven parameter renders the Manage Parameter dialog useless to you.  The reason is that as soon as you try to say OK to any parameter in that list (whether modifying or creating a new one), it appears to try to validate the current value of each of the listed parameters:

Add/Manage Parameter UI

This is unfortunate, as it means that you'd need to kick over to a blank query to create any new Parameters or debug any existing ones.

UPDATE:  Thanks to Andrew in the comments, I know that you can uncheck the Required value when creating your parameter.  If you do that the M code upon the initial creation comes up as:

0 meta [IsParameterQuery=true, Type="Any", IsParameterQueryRequired=false]

If the required setting is false, then the manage queries dialog can still be used without forcing an update!

 The Parameter meta tag

The only part of the Parameter meta tag that is actually required is the following:

meta [IsParameterQuery=true]

Having said that, I got mixed results doing this.  Sometimes the Parameters were not presented in my drop down list.  Editing those queries and restoring the full meta tag to the end resolved that immediately.  I.e.:

meta [IsParameterQuery=true, Type="Any", IsParameterQueryRequired=true]

Changing a Single File Query to a From Folder Query

This post on changing a single file query to a From Folder query is a guest article by John MacDougall, a Microsoft Excel MVP from Canada and owner of HowToExcel.org.

Recently, I had to import a large-ish CSV file at around 1.5 million rows into Excel using Power Query.

I used a From Text/CSV query then proceeded to transform the data and merge a couple other tables with it. I did this all in the one query then loaded it into Power Pivot to analyze. It worked great and Power Query/Pivot handled it easily.

Then I had to look at a longer time period.

I tried to export a new set of data with a longer timeline. The system I was exporting from had trouble before with a much smaller time period, and sure enough, it wasn't working with the new larger time period.

This meant I had to export a couple smaller files and glue them together. I needed a From Folder query instead of the single CSV query.

My original query had a lot of transformation steps in it after importing, so I didn't want to just throw it away and start over.

In this post, we'll look at how we can convert a From Text/CSV query into a From Folder query.

You can download the before and after workbooks along with the CSV files used in this post here.

The Original Query

Ok, this isn't my original query or data and the transformations are spurious and just for example.

Original file import and transformations

Here's a From Text/CSV query. Excel will automatically create the first three steps in the query which define the source of the data, promote the first row of data to headers and then changes the data types.

The remaining steps are the transformations I've added to the query. These are the transformations I want to preserve and don't want to have to go through creating again by starting from scratch with a new From Folder query.

I've loaded this query into a table in the workbook and named it ReportData.

In this example, there aren't a lot of added steps. But you can imagine with multiple merges, added custom columns and other transformations it could be a pain to reproduce within a new query.

Create a New From Folder Query

We are going to need to create a new From Folder query.

Creating a From Folder query

Go to the Data tab and press the Get Data button then choose From File and then From Folder.

Inserting the folder path

In the Folder dialog box we can either copy and paste the path of the folder which contains the set of reports or browse to it.

Choose Combine & Load To

The next window will show a preview of the files in the folder and we can press the Combine button and then choose the Combine & Load To option.

The Combine Files window will appear and we can go with the default options and press the Ok button.

Load data as a Connection Only

Now select Only Create Connection in the Import Data dialog box.

Power Query creates a new query named after the folder. In our example here, mine is called Reports.

Edit the M Code of our Previous Query

Now we are going to edit the M code of our original query which was called ReportData. In the Queries & Connections window pane, right click on the ReportData query and select Edit to open up the query editor.

We need to make it reference our newly created folder query as its source.

Open the Advanced Editor

Go to the Home tab and press the Advanced Editor button to open up the M code editor.

Original M code to be modified

We now need to edit a couple of lines of the code highlighted in yellow above. These are the lines of code from the original single file CSV import.

  1. The first line of code is the source of the data. It points to a single CSV file with a folder path and file name. We need to delete this part and replace it with a reference to our new folder query. The new source is now going to be the Reports folder query so we can replace this with #"Reports".
  2. The second line is another step automatically added to our original query which promotes the first row of data to column headers. This step now happens in our new folder query, so we don't need this and can delete it.
  3. The third line changes data types and is not needed we can also delete this step.
  4. The fourth line is the first step in our data transformation, but it references the previous step which we deleted. We need to update it to reference the Source step.

Finishing and Loading the Query

The revised M code should now look like this:

Revised M code

We have updated the source to reference our new folder query. We have removed the promote headers and change data type steps. The remaining step's references have been updated to reference the previous step.

Now we can press the Done button and then Close & Load the query. We should see our table update to include all the data from the files in our folder.

Conclusions

It's not a terribly complicated process to change the source of a query and update it from a single file import to a folder import. But why not avoid it in the first place?

Making mistakes allows for the opportunity to learn new thing and in this instance, I learned two things about importing text, CSV or Excel files into Power Query:

  1. Import a clean untouched version of the data as a connection only, then reference it in a new query for any transformation steps.
  2. Use a From Folder query instead of a From Text/CSV or From Workbook query. This way you can easily add more files to the folder for import at a later time.

Results: Top X with Ties Challenge

Last week I posted a Top X with Ties Challenge to see how our readers would approach this problem using Power Query in Excel or Power BI.  I'm really glad I did this, and am thinking that I need to do more of these kinds of posts in future.

Response for the Top X with Ties Challenge

I had come up with two different approaches to the Top X with Ties Challenge on my own, but it was interesting to see the different solutions posted by you all.  In total, we had 10 different files submitted from 9 people, some with multiple approaches to the problem.  I want to thank all of you for participating in this, and today's post will look at the different approaches which have been summarized in the attached workbook.  (Yes, the techniques will work in Power BI, I just use Excel to demo as it's easier to keep the data contained in the same file.)

In order to keep this to a manageable level, I had to group the solutions as well and to do that I focussed on the meat of the overall solution.  This meant that I left some of the slick tricks that some of you included if they didn't really impact the overall solution.

Just a quick summary of those:

  • Maxim rolled his solution into a custom function so that he could easily invoke it again.  One trick that he had in there was to dynamically create a dynamic "Top X" column header for the grouped column.  It was really cool, but it did kick me when I made my input dynamic and changed to 10 items, as my Changed Type step was trying to set the data type on the Top 5 column.  Winking smile
  • Sam used a slick trick to replace the values in the Item column with the new grouped names.  Also very cool, but to be consistent with the other solutions I wanted to keep the original column as well as add the grouped column.
  • Daniil's submission actually used a different ranking method than what the rest of the solutions (including mine) used.  I've also modified his to be consistent with the rest, although his solution has generated what will become a future blog post too.
  • Many of the submissions did things inline in a single query, referencing prior steps.  For the sake of transparency and explanation, I've broken most of these down into separate components so it's easier to follow.

One thing I did incorporate here was that I added some extra ties a bit lower down, suggested by Kevin.  This turned out to be super important, as it enabled me to validate that all the approaches were truly in sync (and fix them when they weren't). With the ability to change the "Top X" from a worksheet cell, we can now dynamically compare the results and see that they are the same.  So the revised table now looks like this:

image

Apart from those little nuances, everything (with some small mods) fell neatly into the buckets.  And all in all, I've got 5 different solutions to showcase that return the same result.

Laying the groundwork

Before jumping in to the techniques used to solve the Top X with Ties Challenge, I just want to lay out the groundwork I used to illustrate them all.

Groundwork - Dynamic portion

I wanted to be able to pull the Top X from a worksheet cell so that it was easy to update.  To do this I:

  • Created a named range called "rngKeep"
  • Entered the value of 5
  • Selected the cell and pulled it in to Power Query
  • Renamed the query "TopX"
  • Right clicked the whitespace beside my value --> Drill Down
  • Loaded the query as a Connection (not to table)

SNAGHTML1e541326

And with that, we've now got the framework to dynamically update our Top X via a worksheet cell.

Groundwork - Staging Query

The next piece I needed was a simple base query to use to demo all the different techniques.  To do this I:

  • Selected a cell in the data table (which I had renamed to "Sales")
  • Pulled the data into Power Query
  • Change the Data Types to Text and Whole Number
  • Sorted the rows in Descending Order based on the Sales column
  • Renamed the query to SourceTable
  • Loaded the query as a Connection

And that's it.  We're now ready to explore the solutions that came in. Smile

Top X with Ties Challenge - Solution 1 - Using Merges

We had a total of 3 submissions that used this method (including mine).  This solution (as illustrated) requires two queries (although they could be merged into one if you wanted to):

Creating the TopXItems query

To do this you just need to:

  • Reference the SourceTable
  • Keep the top 5 rows, then replace the 5 in the formula bar with "TopX" that it can update dynamically
  • Load it as a Connection only query (called TopXItems) for later use.

image

Creating the Grouping (via Merge) query

  • Reference the SourceTable again
  • Merge the Sales columns against the same column of the TopXItems query using a Left Outer join
  • Expand the Item field from the new column

SNAGHTML1e79c7bc

  • Add a new Conditional Column that replicates the following logic:
    • if [Item.1] = null then "Other" else [Item]
  • Remove the [Item.1] column
  • Select the Item and Sales columns (together) --> Remove Duplicates
  • Load the data to the destination

Relatively easy and it works well.  Just a quick note here about the duplicates part is that due to the join type you could end up duplicating values, so the removal is needed at the end.

Top X with Ties Challenge - Solution 2 - Using Grouping

Next up, we've got grouping as a method to work this out - a technique that was featured in 6 of the submitted examples in one form or another.  Funny enough, I didn't solve my issue this way, although I think it has to be a front runner for solving this issue.

To solve the issue using this method:

  • Add an Index column from 1
  • Go to Transform --> Group --> Advanced
  • Group by the Sales column
  • Create a column called Rank to aggregate the Min from Index
  • Create a column called Original to group All Rows

image

  • Expand the [Item] field from the Data column
  • Add a Custom Column to create the Group column using the following formula:
    • if [Rank] <= TopX then [Item] else "Other"

SNAGHTML1edaa73c

  • Remove the Rank column
  • Reorder the columns (if desired)
  • Load to the final destination

What I love about this is that it's super easy, 100% user interface driven and actually provides a lot of ability to play with the ranking.

I'll follow up on that last comment in a future post, including why I put the Index column where I did, and why it was part of the grouping (as I know some of you did this after grouping by Sales.)  We'll devote an entire post to those nuances as they are important and I've got 3 more methods to cover.

Top X with Ties Challenge - Solution 3 - Using Chris Webb's Rankfx Function

This one was an interesting one, as it uses a technique that Chris Webb first wrote about back in 2014 in his Power Query book.  (Note, this was not submitted by Chris!)

Here's how this query is built up:

  • Reference the SourceTable query
  • Click the fx in the Formula bar to get a new step and replace =Source with:
    • = (SalesValue) => Table.RowCount(Table.SelectRows(Source, each [Sales]>SalesValue)) + 1
  • Right click and rename the Custom1 to Rankfx
  • Click the fx in the formula bar again, and =Rankfx with =Source
  • Add a Custom Column using the following formula:
    • =Rankfx([Sales])

SNAGHTML1e87f9f4

  • Remove the Rank Column
  • Load to the final destination

I haven't checked with Chris, but I suspect that his technique precedes the existence of any UI to do grouping.  (I know it precedes the UI for merging, as that wasn't in place when we wrote M is for Data Monkey.)  Still, it works nicely to get the job done.

Top X with Ties Challenge - Solution 4 - Using List.Contains

While I wasn't the only one who drove to a List function to solve this issue, I was the only one that tried to do this by using a List.Contains function. My logic here was that list functions should be fast, and if I can check each line to see if it's included in the List, then I should be golden.

Creating the TopXList List

The first thing I needed was my list to examine.  This was pretty easy to accomplish:

  • Reference the SourceTable query
  • Right click the Sales column header --> Drill Down
  • List Tools --> Transform --> Keep Items --> Keep Top Items --> 5
  • Replace 5 in the formula bar with TopX

image

  • Rename to TopXList
  • Load as a Connection

Note that because the original column is called Sales, the list seems to inherit a name of Sales1.  I assume this is to disambiguate it in the code.

Grouping via List.Contains

Armed with my list, I could now figure out the grouping in a new query.  To do this:

  • Reference the SourceTable
  • Add a Custom Column called Group using the following formula:
    • if List.Contains(TopXList,[Sales]) then [Item] else "Other"

SNAGHTML1e950ea5

  • Load the query to the final destination

I have to be honest, it surprised me how easy this was to do.  Yes, I did need to do a quick bit of reading about the List.Contains function (since we don't have any Intellisense in Power Query yet), but overall, this was pretty slick to build.  Using list functions, this should be very quick to execute and - if it needs more speed - I could always buffer the list as well.  (With data this small it's irrelevant.)

The only thing I don't like about this one is that you do have to go and get jiggy with formulas.  And if you're not comfortable reading MSDN documents (which tends to be code centric and poorly illustrated for Excel people), then you could end up getting turned off by this.

Top X with Ties Challenge - Solution 5 - Using List Drilldown

The final solution I'm showcasing is the only other list function that was submitted.  This was submitted by Daniil and approaches the job a bit differently than I did.  It's still amazingly simple at the end though.  Broken down it has two parts:

  1. Identify the value representing the TopN value
  2. Run logic to change the description for items greater than the TopN value

Identifying the TopN item from the list

To get started, we can create the TopN as follows:

  • Reference the SourceTable query
  • Right click the Sales column --> Drill Down
  • Right click the row representing our max row --> Drill Down

image

  • Replace the value (which will be one less than the row number pictured above) with "TopX-1"

image

  • Rename the query to TopN
  • Load as a Connection only

The key to remember here is that Power Query always counts from 0, which is why we need to subtract 1 from out TopX value when we are drilling in to the table to retrieve that item.

Adding a Group based on the TopN

With the TopN identified, we can now run some conditional logic against our original table to group our data appropriately:

  • Reference the SourceTable
  • Add a Custom Column called Group with the following formula
    • =if [Sales] >= SaleN then [Item] else "Other"
  • Load the table to the destination

And that's it.  The only really tricky part in this is that you need to shift that TopX value by one when you're modifying the drill down.  But apart from that, this is just as easy to build as any other of the solutions.

Final thoughts on the Top X with Ties Challenge

Again, the first thing I want to say here is how much I appreciate everyone sending in their solutions to the Top X with Ties Challenge.  It's always awesome to see how others approach things, and you've got no idea what cool things you'd made me start thinking about.  Smile

The second thing, and it's kind of tied to the first, is how cool it is that we've got multiple approaches to do the same thing. I'm always fond of saying that there are at least 3 ways to do everything in Excel, and what's more is that I know what we've got here isn't an exhaustive list.  How cool is that?

Feel free to download the illustrated solution that I've described above.  Above the output table for each method is the first name for the people who submitted their solution (so you can see how I grouped you in on this).  And if you want to test out the refresh, here's what you need to do:

  • Change the value in cell F1

Visualizing the Top X with Ties Challenge

  • Go to Data --> Refresh All
  • Go to Data --> Refresh All (yes, a 2nd time… so it updates the PivotTable & PivotChart)

image

No matter which query results you're looking at, you'll see that they are identical.

Which will perform fastest?  Honestly, I didn't test this with more than 300,000 rows of data.  When I tested, there was no real difference in speed between the solutions.

Which method should we use?  Great question, and I don't have an answer.  The List function methods should be fast, but the grouping is VERY versatile and is going to be my recommended method going forward.  I know this is cruel, but in a couple of weeks I'm going to put up a post as to why that is.  Stay tuned for that one!

Let me know your thoughts and comments.  Did anyone else pick up any new techniques here?  Any challenges you'd like to see posted?

Come Master Your Data in Amsterdam

I can't believe that it's less than a month until I'll be at the Amsterdam Excel Summit.  We've got a full day of conference sessions on June 7, but the really cool part is that I get to teach Master Your Data with Power Query in Amsterdam for the first time.  I've taught this course in half a dozen countries around the world (so far), and saved people from thousands of hours of manual effort.  And now it's available to you in the EU!

What is Master Your Data?

Master Your Data with Power Query is my full day course on how to really get started with Power Query in Excel. Why is it so important? It's because it will help you  to get hours of your life back.  You spend 80-90% of your life PREPARING data for analysis.  This tool is all about getting those hours back so that you can analyze the data and add value to your company.

Power Query is a FREE add-in (from Microsoft) for Excel 2010/2013, and is so important that it is now built in to Excel 2016 and higher.

What is covered in Master Your Data?

We start with a quick PivotTable review (because that's the whole reason we need good data), then dive in to using Power Query with real world data.  We look at importing data from CSV, text and Excel files, and even entire folders full of files all at once.  And when we're done, we just update it for the next month by pressing the Refresh All button.  If you do any copying and pasting of data on a monthly basis, you're wasting your time today.

Why do you care?

The items above alone will save you a ton of time. Here's a couple of examples:

  • One of my clients saved 8 hours per week of manual labour using these techniques - that's 52 days per year she could work on other things!
  • At the Australia conference we were able to automate a workflow in 30 seconds, saving the attendee 6 hours per week - hours that she had been putting in on the weekend because she couldn't get it done at work.  30 seconds to get 6 hours of family time back!

Yes, there is a cost to your class.  And yes, you'll need to pay for your travel, meals and hotel.  And yes, that can look like a lot of money - let's say that all told it's 2,000 Euros...  Divide that by your hourly rate though, and see how many hours you need in order to break even on the training.

Let's be conservative and say it will save you 2 hours per week… how many weeks until you break even?  What if it were to save you 4 hours per week?  Or - like my client - 8 hours per week?  This is probably the most impactful thing you can do for your Excel career today.

But remember that the raw cost is only part of the equation here… what is your time really worth?  What opportunities are you forgoing because you're wrapped up in data preparation and not analysis?  What could you do for your company with another 2 or 4 hours per week?  That's where the real value comes in.

Is there more in Master Your Data?

Of course there is!

I'll show you 7 different ways to merge data tables (with no VLOOKUPS), which will allow you to identify the items you need to focus on reconciling (not focussing on the items that don't.)  I'll show you how to perform an approximate match in Power Query, as well as how many-to-many joins can be created and why you'll use them.

You'll learn how to convert data from one format to another - all refreshable with a button.  Pivoted data like this:

image

Stacked data like this:

image

And even subcategorized data like this:

image

Whether you need to feed a PivotTable, a chart or a Power Pivot Data Model, the steps shown in this course show you how to create refreshable transformations in seconds that you can update with a click next month.  How cool is that?

I'll show you how to deal with conditional logic patterns that can be built through user interface buttons like this:

image

As well as some that require a little more work and error handling, like this:

SNAGHTML395ab66

And - of course - more.  There's always stuff I include that is not in the outline.

Oh… and did I mention that this training is 100% transferable to Power BI as well?  Even if you're not in that space today, you're learning a skill that is 100% transferrable to other programs in the future.  It's double the bang for your buck!

Trust me when I say this… This toolset is the future of your data analysis systems - and the future is available to you today.

What do you get in Master Your Data?

You get:

  • A full day with one of the world's leaders in the Power Query technology (who also happens to be one of the most passionate presenters you'll ever see)
  • Copies of my slides to keep - many of which are data transformation recipes that you'll refer to again and again
  • Copies of both the before and after files
  • The chance to ask me questions about YOUR data as well
  • Hours of your life back (this is not a training expense - it's an INVESTMENT in your future!)

More details on Master Your Data

The full details on this class can be found here.  And you can register here.  I hope to see you join me so that we can super charge your data preparation process and free up your time for work that actually adds value!

Challenge – Top x with Ties

Please note: the submission period for the Top x with Ties Challenge is now closed. You can check out the follow-up post where the results are summarized here.

Yesterday I was working on a data set, and needed to work out the Top x with Ties in Power Query.  This posed to be a bit more challenging than I first thought it would be.

Why did I want Top x With Ties?

Let's take a look at the source data here, on a simplified data set.  That data looks (in part) like this:

SNAGHTML3e20433

I needed to chart it, and wanted it to look like this:

image

Now, appreciating you can't see all the data, the deal is this… I wanted to add a column to the data set that shows the item name for the top 5 items, and shows "Other" for all the rest.  This allows me to group things to show how the major sellers compare to the rest of the business.  Make sense so far?

The trick here is that I ended up with two item (Pint Winter Ale and Caesar) that have exactly the same value, which meant that I needed 6 categories in this case.  But Power Query doesn't have a native function to keep the Top x with Ties, it only has Keep Top Rows.

So how would you generate the Top x with Ties for this scenario?

Basically, what I need in order to serve up the chart correctly is this:

SNAGHTML3eff6f0

So here's the criteria… we want it to:

  • Automatically provide the Top x with Ties
  • Be easy to update to change the value of x (in case we want top 10)

Now you might think "Hey that's easy... I'll just sort it, add an Index column and then I can use a conditional column to choose anything over a certain number."  That's perfect if you're not worried about ties, but in this case we want the Top x WITH Ties.  So now what?  Can you filter to keep the Top x rows?  Nope, because again, that only keeps those rows, and not the ties.

Now I have two solutions for this already, but I'm curious how you'd approach this challenge.

Just to make this more fun…

Do NOT post your answer below.  (We don't want to spoil it for anyone.)

Instead, email your workbook to Rebekah at (you know) Excelguru dot ca.  We'll collect them and share the best ones (along with mine) next week.

You can download the source data here.  Let's see what you've got!  Smile