Monkey Tools is Here

We are super excited to announce that we’ve (at last) released the first version of our Monkey Tools software!  Ken has been working on this software on and off for the better part of 8 years now.  But after showing it to a friend in Wellington last year, we decided it was finally time to get serious.  We hired a full-time developer last summer and are finally ready to go live with the initial release!

What is Monkey Tools?

Monkey Tools is an Excel add-in (supported in Excel 2016 and higher) which provides tools for you - as a business intelligence author/reviewer - to:

  • Build models more rapidly
  • Follow recommended practices
  • Document your work
  • Audit files that you receive

It is targeted primarily at modelers and analysts who work in Excel, but also push their models into Power BI.  (Our philosophy at Excelguru is to model in Excel first, then export to Power BI for reporting, sharing and security control.)

Oh, and super important… it installs on your system without requiring admin rights on your PC.  How cool is that?

What does Monkey Tools actually do?

Well… lots!  We’ve collected all the cool features under some themed buttons including:

  • QueryMonkey (for inserting new queries)
  • DestinationSleuth (to provide information on query load destinations)
  • QuerySleuth (helping understand your actual queries)
  • TimeSleuth (to benchmark query load times)
  • PivotSleuth (helping you diagnose Pivot Table field issues)
  • DAXSleuth (tools especially for working with DAX measures)
  • ModelSleuth (reporting on the properties of your queries and data model)

Cute names, right?  The Monkey builds things, and the Sleuths investigate things.  Here’s a high-level view of what they each contain.

QueryMonkey

QueryMonkey gives you the ability to insert key queries like:

  • The famous “fnGetParameter” query and table (from Chapter 24 of M is for Data Monkey)
  • A “From Folder” setup that works with local and/or SharePoint hosted files
  • Dynamic calendar tables based on your data (for custom calendars, it even provides the option to insert the "periodicity" columns for Rob Collie's GFITW DAX pattern!)

The QueryMonkey provides a Dynamic Calendar generator

DestinationSleuth

Today, this is simply a viewer to visually indicate the load destinations of your tables (better than just “Connection Only” or “x Rows Loaded”).

The DestinationSleuth user form displays four different load destination types

QuerySleuth

This is a single form, packed with information and features such as:

  • A dependency/precedent tree view layout
  • Full colour display based on load destination
  • Colourful and indented M code
  • The ability to modify the M code and write it back to the editor WITHOUT LOCKING YOUR EXCEL User Interface!

The QuerySleuth shows a query dependency tree as well as indented and colourful M code

TimeSleuth

This feature allows you to time query execution in Excel, and even chart comparisons between them with or without privacy settings enabled.  If you’ve ever wondered which query is slowing down your workbook, or wanted to time test two different approaches, you may find this helpful!

A chart generated by Monkey Tools TimeSleuth user form

PivotSleuth

Have you ever seen that irritating “relationships may be needed” error when building a Power Pivot based Pivot Table, and wondered why?  Pivot Sleuth can tell you…

  • See the real, fully qualified names of the fields used in your Pivot Tables
  • Highlight potential or current issues in Pivot Table configurations
  • Debug cross filtering issues, “relationships may be needed” errors and errors where grand totals are returned for all rows on the Pivot Table

Debugging PivotTable errors with the PivotSleuth

DAXSleuth

We believe that measure dependencies are just as important as query dependencies, and this is the reason we built the DAXSleuth.  This form:

  • Displays a dependency/precedent treeview of your DAX measures
  • Provides a full colour display of Implicit and Explicit measures (with or without children), as well as Calculated Columns
  • Shows your DAX measures with colour highlighting in an indented format
  • Allows you to Indent, Un-Indent, Flatten, Duplicate and even Update measures without leaving the DAXSleuth
  • Exposes all locations a DAX Measure has been used (Pivot Tables, Pivot Charts, OLAP Formulae and Named Ranges), and even allows you to select those objects right from the DAX Sleuth!

Monkey Tools DAXSleuth user form in action

ModelSleuth

Have you ever had to provide documentation for your model?  Or picked up a model from someone else and had to review it?  The ModelSleuth provides reports and utilities such as:

  • A full model summary report showing key statistics about your tables, relationships, columns, measures and queries. (Trial and Free licenses are limited to every other record in this report.)
  • A model memory usage report, complete with how much memory is recoverable (for Excel based data models).
  • An unused columns report (for Excel based data models).
  • A DMV Explorer (for those who want to build their own reports).

Showing the impact of unused columns on memory via Monkey Tools ModelSleuth feature

Monkey Tools Supported File Types

The Monkey Tools add-in is compatible with Excel 2016 or higher, and can read from:

  • Excel files
  • Power BI Desktop files
  • Backup files (that you can export from the Monkey Tools software)

Will Monkey Tools get updates?

Oh yes, we have plans for many more features!

Our intended model is to deliver features (and bug fixes) as we develop them.  That means that there could be periods with no updates as we work on something big, or periods with multiple updates delivered in a single week.  We know that some people love frequent updates and some people don’t, so we let you control how often you get them:

Monkey Tools allows you to control update frequency

The key thing to recognize here is that we are not holding new features for a vNext. They’ll be delivered when they’re ready.

Can I try Monkey Tools before I buy it?

Ken did not become or remain a Microsoft MVP without contributing a large portion of tools and help to the community for free, and that won’t change.  Having said that, we’re paying a developer to work on this product full time and need to recoup those costs.  For that reason, we will always have both a Free version, as well as a Pro version.

Naturally, we want you to look at it, as we're convinced you'll like it.  And that's why we have a two-week trial that provides access to almost the full feature set.  Once your trial expires, your license will automatically revert to a free license.  You’ll still get fixes and new features, they’ll just render in free mode (without colour, without field advice, etc.).  We do believe that you’ll still find the tool useful, just maybe not as useful without a Pro license.

Ready to learn more about pricing options and download the free trial?  Click here!

Unpivot Stacked Sets with Inconsistent Rows

I'm currently hanging out in New Zealand, with a friend who has generously let me stay at his place instead of a hotel.  What I didn't know when he offered a bed though, was that the cost of admission was a solution for a gnarly Power Query issue he was facing: How to unpivot stacked sets with inconsistent rows.

The data Jeff provided me with looked similar to this:

3 sets of data with products on rows and dates on columns

If it were only the first two tables that we were facing, this wouldn't be too difficult.  We cover unpivoting stacked data sets in both our Power Query Academy and our Power Query Recipes, whether they are single or multi column.  But the killer here is the third table... it has more rows than the first two.  So the question becomes how do we unpivot stacked sets with inconsistent rows?

Preparing the Data

Obviously the first thing we need to do is to get the data into Power Query and remove the irrelevant rows and columns.  The steps I went through were these:

  • Pull the data into Power Query
  • Filter Column1 to remove null values
  • Remove the Total column

Once done, my data looked like this:

Stacked Pivoted Data Sets with 3, 3 and 4 rows

So now the data is nice and clean, but how do you unpivot it?

Separating the Data

The first trick is to find a way to separate this data into individual tables that represent each of the stacked data sets.  The good news here is that there is an indicator that we are looking at a new table.  Every time we see "Products" in Column 1, we know that's a header row and a new table will begin.  So we'll use this to separate the data into blocks, starting a new block each time we see that term.  To do this:

  • Go to Add Column --> Index Column --> From 1
  • Go to Add Column --> Conditional Column and configure it as follows:
    • Name:  Set
    • Formula:  if [Column1] = "Products" then [Index] else null

Shown below is the image view of the Conditional Column, as well as the results that it will create:

Building a Set column returning the Index if Column1 equals Products, or null

As you can see, we've pulled out the number from the Index column if - and only if - the value in the first column is "Products".  The reason we want the null is that we can then:

  • Right click the [Set] column --> Fill Down
  • Select the [Index] column --> press DEL
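
If you want to sanity-check this stage against the underlying M, here is a rough sketch of what the query looks like at this point.  The table name ("StackedData") and the step names are placeholders for whatever your own query uses:

// Sketch only - "StackedData" and the step names are placeholders
let
    Source       = Excel.CurrentWorkbook(){[Name="StackedData"]}[Content],
    AddedIndex   = Table.AddIndexColumn(Source, "Index", 1, 1),
    AddedSet     = Table.AddColumn(AddedIndex, "Set",
                       each if [Column1] = "Products" then [Index] else null),
    FilledDown   = Table.FillDown(AddedSet, {"Set"}),
    RemovedIndex = Table.RemoveColumns(FilledDown, {"Index"})
in
    RemovedIndex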

You're now left with a nice table where the Set column shows a unique value for each data group:

Stacked Data with a "Set" column showing a unique value for each set

Grouping Into Tables

With an indicator for each group of data, we can now leverage this to separate the data into the individual data sets.  The method to do this is Grouping.

Select the Set column and then:

  • Go to Transform --> Group By and configure as follows:
    • Group by: Set
    • New Column Name: Stage1
    • Operation: All Rows

Grouping the Set column and adding an aggregation called Stage1 for All Rows

The data will then be grouped by the values in the Set column, and show the original data that was used to generate those groups.  Clicking in the whitespace beside the Table keyword will show each of these rows that were used in the grouping for that data point:

Results of the Grouped table, shown by clicking in the whitespace next to a group

Cleaning up the Grouped Tables

The challenge we have here is that we want to unpivot the data, but we've got some extra data here that will pollute the set: the values in the "Set" column which were added to allow the grouping.  We need to remove that.  To do so:

  • Go to Add Column --> Custom Column and configure it as follows:
    • Name:  Stage2
    • Formula: =Table.RemoveColumns( [Stage1], "Set" )

Compare the results to that of the Stage1 column:

The Stage2 data table looks like the Stage1 data table, except the Set column has been removed

Before we can unpivot data, we need to promote that first row to headers... but we need to do it inside each nested table.  No problem, we'll just break out another custom column:

  • Go to Add Column --> Custom Column and configure it as follows:
    • Name:  Stage3
    • Formula: =Table.PromoteHeaders( [Stage2], [PromoteAllScalars=true] )

Wait... what?  How do you figure that out?  I cheated.  I grabbed another table, promoted headers, then looked in the formula bar to figure out the syntax I needed.  The function name and table name were pretty obvious but unfortunately - even with intellisense - that final PromoteAllScalars part doesn't auto-complete.  Even worse, if you don't include it, it essentially just eats the top row.  Once I had it correct, the results were exactly what I needed:

The Stage3 table now shows the headers promoted

As you can see in the image below, the Stage3 table contains columns that have headers, as we wanted.  The third table (carrying the identifier of Set 9) shows four rows, while the other tables show three.  So the data is now separated into tables, but they still have an inconsistent number of rows.

The Set1 group has 3 rows, and Set9 has 4 rows

Unpivot the Data

We have done everything we need to do in order to Unpivot Stacked Sets with Inconsistent Rows.  We now just need to unpivot the data.  So let's do it:

  • Go to Add Column --> Custom Column and configure it as follows:
    • Name:  Stage4
    • Formula: =Table.UnpivotOtherColumns( [Stage3], {"Products"}, "Date", "Units" )

An indented version of the formula, as well as the results it produces, is shown here:

Displaying the Unpivot formula and the results for Set1

How do you learn to write this?  Click on one of the tables to drill in to it, unpivot the single table, copy the code from the formula bar, then delete the temporary steps to back up.  You may need to do some tweaking, of course, but at least you can easily get the syntax that way.

Now that we have this, we can finish extracting the data:

  • Right click the Stage4 column --> Remove Other Columns
  • Click the Expand icon at the top of the Stage4 column
  • Set the data types
  • Load it to your destination
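
If you'd like to see the whole Stage1 to Stage4 flow in M, here is a hedged sketch written as a function.  It takes the cleaned, Set-numbered table from earlier in the post and returns the unpivoted output; the data types at the end are examples only, so adjust them to suit your data:

// Sketch: a function that takes the cleaned, Set-numbered table and unpivots it
(filledDown as table) as table =>
let
    Grouped    = Table.Group(filledDown, {"Set"}, {{"Stage1", each _, type table}}),
    Stage2     = Table.AddColumn(Grouped, "Stage2", each Table.RemoveColumns([Stage1], "Set")),
    Stage3     = Table.AddColumn(Stage2, "Stage3",
                     each Table.PromoteHeaders([Stage2], [PromoteAllScalars = true])),
    Stage4     = Table.AddColumn(Stage3, "Stage4",
                     each Table.UnpivotOtherColumns([Stage3], {"Products"}, "Date", "Units")),
    KeptStage4 = Table.SelectColumns(Stage4, {"Stage4"}),
    Expanded   = Table.ExpandTableColumn(KeptStage4, "Stage4", {"Products", "Date", "Units"}),
    Typed      = Table.TransformColumnTypes(Expanded,
                     {{"Products", type text}, {"Date", type date}, {"Units", Int64.Type}})
in
    Typed

Paste it into a blank query and invoke it against the output of the earlier steps if you want to compare it with your own clicked-together version.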

Sample File

If you'd like to download the sample file, you can do so here.

 

Unfill in Power Query

Recently I received a question on how to Unfill in Power Query.  In other words, we want the opposite of the Fill feature, which fills data into blank cells (cells that contain a null value.)  If we see repeating values, we’d like to keep only the first, then replace all subsequent duplicate values with the null keyword.

Now I’ll be honest that I’d typically never do this.  I’d load the values into a table, then use a PivotTable to show the data the way I want to see it:

A table with repeating values, and a pivottable that suppresses repeating values

But having said this, if you need to have your data look like this…

A table of Animals, Colour and Amount that shows blanks under each repeating animal

… well then why not?

Unfill data with Power Query – Step 1

The first thing we need to do is run our recipe for numbering grouped rows.  (You can find this in our Power Query Recipe Cards, or in our Power Query Academy videos.)

Namely, it looks like this:

  • Sort the data by Animal to order it
  • Group the data by Animal
    • Add a single aggregation called “Data” for All Rows
  • Go to Add Column -> Custom Column and use the following formula
    • =Table.AddIndexColumn([Data],"Row",1,1)
  • Right click the “Custom” column -> Remove Other Columns
  • Expand all columns from the Custom Column
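
For reference, the same recipe expressed in M looks roughly like this.  It's a sketch only: "Animals" is a placeholder table name, and the column list in the expand step assumes the Animal / Colour / Amount layout shown above:

// Sketch - numbering grouped rows; table and column names are placeholders
let
    Source     = Excel.CurrentWorkbook(){[Name="Animals"]}[Content],
    Sorted     = Table.Sort(Source, {{"Animal", Order.Ascending}}),
    Grouped    = Table.Group(Sorted, {"Animal"}, {{"Data", each _, type table}}),
    Numbered   = Table.AddColumn(Grouped, "Custom",
                     each Table.AddIndexColumn([Data], "Row", 1, 1)),
    KeptCustom = Table.SelectColumns(Numbered, {"Custom"}),
    Expanded   = Table.ExpandTableColumn(KeptCustom, "Custom",
                     {"Animal", "Colour", "Amount", "Row"})
in
    Expanded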

You’ve now got your rows numbered:

A Power Query showing Animal, Amount, Colour and a Row Number where each row with the same animal has a unique value starting from one

Unfill data with Power Query – Step 2

Once you’re in this state, it actually becomes pretty easy:

  • Go to Add Column -> Custom Column and use the following formula
    • = if [Row] = 1 then [Animal] else null
  • Remove the [Animal] and [Row] columns
  • Re-order the columns as desired
  • Rename [Custom] to Animal
  • Set the data types
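
And here is a matching M sketch for this second stage.  The first line is just a placeholder for the output of the step 1 query above, and the types are illustrative:

// Sketch - the "unfill" stage; NumberedRows is a placeholder for the step 1 output
let
    Source    = NumberedRows,
    Unfilled  = Table.AddColumn(Source, "Custom",
                    each if [Row] = 1 then [Animal] else null),
    Removed   = Table.RemoveColumns(Unfilled, {"Animal", "Row"}),
    Reordered = Table.ReorderColumns(Removed, {"Custom", "Colour", "Amount"}),
    Renamed   = Table.RenameColumns(Reordered, {{"Custom", "Animal"}}),
    Typed     = Table.TransformColumnTypes(Renamed,
                    {{"Animal", type text}, {"Colour", type text}, {"Amount", Int64.Type}})
in
    Typed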

Once done, you’ll notice that we have unfilled the data nicely.

A Power Query showing Animal, Colour and Amount, but only the first instance of an Animal is shown in the Animal column with duplicates showing as null

Final Thoughts

As I mentioned at the outset, this isn’t something I ever anticipate myself doing.  But if you do have a good business use case, I’d be curious to know what it is.  (I assume the asker did – although it came from a comment on an old blog post, so I haven’t been able to ask.)  Please share in the comments.

Using Script Lab to Create a PivotTable with JavaScript

"Microsoft Excel is not a spreadsheet application anymore, it’s a medium to store, present, and analyze data." - Puneet Gogia

The way Excel has evolved in the past few years, it’s almost impossible to work without it in today's world.

Apart from all the stuff you can learn to do in Excel, the thing which I love most is that I can code in it.

Once you know how to use VBA, you can create macros and automate all the activities which you do manually otherwise.

Today, I want to talk about something else. Something powerful! Let’s talk about using JavaScript (TypeScript) in Excel.

Before we get into this make sure to subscribe to Excelguru's Newsletter to get tips like this in your mailbox.

Script Lab

There’s a whole bunch of APIs which you can use and code with in your Office applications.

But, with Script Lab you can write and test your code directly in your applications. It’s an add-in which you can install from Microsoft’s App Store.

The Script Lab Ribbon

In Script Lab, there are three different sections where you can write JavaScript, HTML, and CSS code.

The Coding Window

Create a PivotTable in Excel using JavaScript

The PivotTable is one of the most useful things which you can use in Excel to summarize and analyze your data.

Even though I like using VBA to create a PivotTable, writing code in JavaScript instead is real fun. In this post I’d like to share with you how to do this.

I’ve split it into a 5-steps process and you can download the files from here to follow along:

  1. Insert Worksheets and Data
  2. Enter a Blank PivotTable
  3. Insert a Row
  4. Insert a Column
  5. Add Values to the Pivot Table

1. Insert Worksheets and Data

The very first thing which you need to create a PivotTable is the source data. So, first of all, you need to write the code which adds source data to the worksheet.

Below is the code:

async function setup() {
  await Excel.run(async (context) => {
    context.workbook.worksheets.getItemOrNullObject("Data Sheet").delete();
    const dataSheet = context.workbook.worksheets.add("Data Sheet");
    context.workbook.worksheets.getItemOrNullObject("Pivot Table").delete();
    const pivotSheet = context.workbook.worksheets.add("Pivot Table");
    const data = [
      ["Farm", "Type", "Classification", "Crates Sold at Farm", "Crates Sold Wholesale"],
      ["A Farms", "Lime", "Organic", 300, 2000],
      ["A Farms", "Lemon", "Organic", 250, 1800],
      ["A Farms", "Orange", "Organic", 200, 2200],
      ["B Farms", "Lime", "Conventional", 80, 1000],
      ["B Farms", "Lemon", "Conventional", 75, 1230],
      ["B Farms", "Orange", "Conventional", 25, 800],
      ["B Farms", "Orange", "Organic", 20, 500],
      ["B Farms", "Lemon", "Organic", 10, 770],
      ["B Farms", "Kiwi", "Conventional", 30, 300],
      ["B Farms", "Lime", "Organic", 50, 400],
      ["C Farms", "Apple", "Organic", 275, 220],
      ["C Farms", "Kiwi", "Organic", 200, 120],
      ["D Farms", "Apple", "Conventional", 100, 3000],
      ["D Farms", "Apple", "Organic", 80, 2800],
      ["E Farms", "Lime", "Conventional", 160, 2700],
      ["E Farms", "Orange", "Conventional", 180, 2000],
      ["E Farms", "Apple", "Conventional", 245, 2200],
      ["E Farms", "Kiwi", "Conventional", 200, 1500],
      ["F Farms", "Kiwi", "Organic", 100, 150],
      ["F Farms", "Lemon", "Conventional", 150, 270]
    ];
    const range = dataSheet.getRange("A1:E21");
    range.values = data;
    range.format.autofitColumns();
    pivotSheet.activate();
    await context.sync();
  });
}

In this code, you have a function called “setup” which does the following things:

  • It will first check if the “Data Sheet” and “Pivot Table” worksheets already exist; if they do, it will delete them and insert new ones.
  • Next, in the “Data Sheet” worksheet it will add a set of data that will be the source data for the PivotTable.

When you run this code you’ll have something like this in your workbook:

Sample Data Set

2. Enter a Blank PivotTable

When you normally build a PivotTable in Excel, it first inserts a blank PivotTable and then you have to add a row, a column, and values manually.

So now you need a code that inserts a blank PivotTable in the “Pivot Table” worksheet. (You have already written the code in the previous section to insert the “Pivot Table” worksheet.)

Below is the code for this:

async function createWithNames() {
  await Excel.run(async (context) => {
    const rangeToAnalyze = context.workbook.worksheets.getItem("Data Sheet").getRange("A1:E21");
    const rangeToPlacePivot = context.workbook.worksheets.getItem("Pivot Table").getRange("A2");
    context.workbook.worksheets.getItem("Pivot Table").pivotTables.add("Farm Sales", rangeToAnalyze, rangeToPlacePivot);
    await context.sync();
  });
}

Now in this part of the code, we have a function called “createWithNames” which works as follows:

  • First of all, it takes range A1:E21 from the “Data Sheet” worksheet where you have the source data.
  • After that, it specifies the cell A2 in the “Pivot Table” worksheet for inserting the blank PivotTable.
  • Finally, it inserts a blank PivotTable.

After running this part of code you’ll have something like this in your “Pivot Table” worksheet:

Blank PivotTable

3. Insert Row Label

Once you insert a blank PivotTable, the next thing is to insert rows in it. Below is the code which can do this for you:

async function addRow() {
  await Excel.run(async (context) => {
    const pivotTable = context.workbook.worksheets.getActiveWorksheet().pivotTables.getItem("Farm Sales");
    // check if the PivotTable already has rows
    const farmRow = pivotTable.rowHierarchies.getItemOrNullObject("Farm");
    const typeRow = pivotTable.rowHierarchies.getItemOrNullObject("Type");
    const classificationRow = pivotTable.rowHierarchies.getItemOrNullObject("Classification");
    pivotTable.rowHierarchies.load();
    await context.sync();
    if (typeRow.isNullObject) {
      pivotTable.rowHierarchies.add(pivotTable.hierarchies.getItem("Type"));
    } else if (farmRow.isNullObject) {
      pivotTable.rowHierarchies.add(pivotTable.hierarchies.getItem("Farm"));
    } else if (classificationRow.isNullObject) {
      pivotTable.rowHierarchies.add(pivotTable.hierarchies.getItem("Classification"));
    }
    await context.sync();
  });
}

Now in this part of the code, we have a function called “addRow” which works as follows:

  • First of all, it checks if there are any rows already in the PivotTable (the first three columns of the source data would be matched one by one).
  • After that (as you are considering the first three columns), it will add them as rows one by one.

As we are using an HTML button here, you can use that button to add all three rows one by one.  At this point, you’ll have a PivotTable with rows like this:

PivotTable with rows added

4. Insert Column Labels

The next thing is to insert columns, and below is the code which you need to use:

async function toggleColumn() {
  await Excel.run(async (context) => {
    const pivotTable = context.workbook.worksheets.getActiveWorksheet().pivotTables.getItem("Farm Sales");
    // check if the PivotTable already has a column
    const column = pivotTable.columnHierarchies.getItemOrNullObject("Classification");
    column.load("id");
    await context.sync();
    if (column.isNullObject) {
      // adding the Classification column to the column hierarchy automatically removes it from the row hierarchy
      pivotTable.columnHierarchies.add(pivotTable.hierarchies.getItem("Classification"));
    } else {
      pivotTable.columnHierarchies.remove(column);
    }
    await context.sync();
  });
}

In this code, you have referred only to the “Classification” column and it works in the following manner:

  • First of all, it checks if the column “Classification” is already there or not.
  • If the column is not already there, then it adds it to the column label.

As we are using an HTML button here, you can add and remove the column label with this code.

At this point, you’ll have a PivotTable with rows and columns like this:

PivotTable with rows and columns added

5. Add Data Values

The key thing in a PivotTable is the values, and with the below code you can add these in:

async function addValues() {
  await Excel.run(async (context) => {
    const pivotTable = context.workbook.worksheets.getActiveWorksheet().pivotTables.getItem("Farm Sales");
    pivotTable.dataHierarchies.add(pivotTable.hierarchies.getItem("Crates Sold at Farm"));
    await context.sync();
  });
}

/** Default helper for invoking an action and handling errors. */
async function tryCatch(callback) {
  try {
    await callback();
  } catch (error) {
    // Note: In a production add-in, you'd want to notify the user through your add-in's UI.
    console.error(error);
  }
}

In the source data, you have two different columns where you have values:

  • Crates Sold at Farm
  • Crates Sold Wholesale

But here in this code, only one column is used as values: "Crates Sold at Farm".  However, you can change it or even use both of the columns.

And your completed PivotTable is here:

Completed PivotTable

How to Add this Code into Script Lab

As I have already mentioned, you can also use HTML and CSS in the Script Lab to create a side-pane where you can run and test your code.

Note: In the code stored on GitHub, I have added a few extra lines to handle errors; here is the download link.

Make sure to install the Script Lab add-in and then follow the steps below to enter this code into Script Lab:

  • First of all, click on the Code button on the Script Lab tab to open the Code Editor.

Click on Code in the Script Lab tab to open the Code Editor

  • After that, click on the Menu button and select New Snippet.

Adding a New Snippet in the Script Lab Code Editor

  • Next, you need to add JavaScript, HTML, and CSS into the Script (1), HTML (2), and CSS (3) tabs respectively.

Tabs for adding your JavaScript, HTML, and CSS codes in the Code Editor

  • From here, you need to run the code. For this, you need to click on the Run button from the Script Lab tab.

Clicking the Run button in the Script Lab tab will run the code

Once you click on the Run button, the side pane appears, just like below:

The side pane shows the code being run

Because you have built your code in five parts, you will have five different buttons that run those functions and create a PivotTable.

You just need to click on the button to execute each step.

Isn’t it COOL?

Make sure to share this post with your friends, I am sure they will appreciate it.

Quick Thing: Apart from learning about and using JavaScript in Excel, there's one more thing which I'm curious to learn these days, and that's Power Query's M language. Don't forget to check out M is for Data Monkey.


About the Author: Puneet Gogia

Puneet is the Co-Founder at ExcelChamps and has been using Excel since his college days. He's helped thousands of people to understand the power of spreadsheets and learn Excel and VBA. You can find him online, tweeting about Excel, on a running track, or sometimes hiking up a mountain.

Column Quality and Distribution Features are Finally Here

If you've worked with Power Query in Power BI, you've seen the Column Quality and Distribution features already.  But I'm super excited to say that I've finally received them in the Insider build of Excel!

What are the Column Quality and Distribution Features??

If you're not familiar with these, let's break them down a bit:

Data Quality Indicators in Column Headers

In its most simplistic form, you'll see that you now get an indicator in the header of each column as shown here:

Column quality in Power Query Headers

Notice that red bar at the top of column C?  That's happening because there is an error in the column!  This is SUPER awesome, as we've been teaching for years that you have to scroll down looking for errors each time you change from text to a numeric data type.  This should really help, as the column header will make it obvious that something went wrong.  All green?  You should be good to go.  See red?  You've got potential problems to deal with.  (To be fair, you may still get errors after loading, as it is working with a data preview, but still, this will be a BIG help.)

The nice thing about this is that it is always on.  Nothing needs to be set up to enable this by default.  (And so far as I know, you can't turn it off - but then why would you want to?)

The Column Quality Feature

This one is enabled by going to the View tab and choosing "Column Quality", resulting in a view that looks like this:

Power Query Column Preview Settings

Which adds 3 new lines to the top of our columns:

Column Quality in Power Query

Now, I don't plan on leaving this on all the time. However, the cool thing here is that I can very quickly identify that 20% of my data rows in column C are errors.  And while I don't have any empty rows, it would show me that at a glance if I did.

The Column Distribution Feature

The idea behind the Column Distribution feature is to give you a quick glance at the distribution of values in your columns.  Is there a single value that shows up more than others, or is the data uniformly distributed?  This can help you quickly identify if the data has any consistency to it or not:

Column Distribution in Power Query

Notice that in Column a, we have 3 distinct values (b, d, e), but only one value that is truly unique (b only occurs once in the entire data set).  Meanwhile, in column b, each value occurs in the data set once and once only, with no values appearing more than this.  On a data set this small, this isn't going to be super insightful. But say you are pulling in data that spans your monthly summaries for the last year. If suddenly you see 13 values where you are expecting 12... that's a good indicator you might want to look a little deeper.

Something else that is worth noting is that you don't have to keep the Column Quality on to get the Column Distribution chart.  These features can be used together or individually.

The less obvious Column Profile Feature

The final checkbox in the Data Preview menu is the Column Profile feature.  This one is very cool as it only activates when you select the entire column.  And when you do so, it takes over the preview window providing most of the information from the previous two settings.

Column Profile in Power Query

We've got even more stats (although I wish it also showed the % of values like in Column Quality), as well as the value distribution chart.

What I love about this is that it can be called up on-demand by selecting the column, but doesn't get in my way when I don't need it.

My Thoughts on this Feature

I've been waiting for this to hit Excel for over a year now, and am SUPER happy to see it finally show up.  Personally, I'll be running with Column Profile (the last option) selected, but not Column Quality or Column Distribution. The reason is pretty simple... I want to minimize the amount of info between me and my data.  The Column Profile feature will give me what I need in an on-demand fashion when I see the data colours change at the top of my column. 🙂

How do I Get These Features?

This feature is so hot and fresh that it showed up for Office Insiders this morning (v1910 build 12112.x).  So you need to be on at least that version to get these features.  Hopefully it will start rolling through the regular update channels next month!

 

Power Query Challenge 7 – Phone Words

Yesterday, Nick (one of our forum users) posted a Phone Words challenge in our forum, which looks like a great candidate for Power Query Challenge #7.  I haven't had time to tackle it myself, but if you're up for a Friday challenge, why not give it a go?

1-800-Get-Data converts to 1-800-438-3282

Here's the Challenge (word for word):

Take any Phone # and convert it to all possible Letters aka Phone Words.
Phone # can be input any format (strip out all other characters)

Example:

536-7857 = Ken Puls, etc...

Here's the thread where you can post your solution.

Thanks Nick!

PS, if anyone else has challenges they'd like to post, I'm more than happy to set them up in this series.  Obviously it's been a bit quiet here lately, for reasons that we hope to announce soon!

Do Data Types Matter in Power Query?

One of the things that I find a bit misunderstood in Power Query is whether or not Data Types matter. I mean, I’m sure everyone agrees to some extent that they do, but do you know just how much they matter, and why?

Over the weekend, I received an email from one of our loyal students which read, in part:

I am going under the assumption that in Power BI nothing is free. Steps, calculated columns, measures and so on could be cheap but they are never free. It is with this premise in mind that I pose the following theory.

Users should sparingly use Change Type in Power Query. It is fine to Change Type when converting Date/Time to Date, Decimal Number to Whole Number, and others that actually change the value. It is a waste of resources to Change Type from Decimal or Whole Number to Currency. Even if you want the column to look like Currency, you can apply that format in the Data Model view and save one Power Query step.

On the face, this theory seems somewhat reasonable. After all, adding additional steps is bound to add some overhead to the Power Query process in most cases. And let’s be honest, in small models, it may make no difference to you at all. But when things get bigger…

Data Types vs Formats

To understand why Data Types matter, we need to get something very clear right off the bat: Data Types and Formats are not the same thing. Data Types dictate what kind of data you have and determine how much memory is allocated to store a value. Formatting, on the other hand, tells you how you want the values to appear. To see this in practice, have a look at the following values in Power Query, where the Data Type has been set to Currency:

Column set to Currency Data Type

Notice that the only place you see a $ sign is in the header. And see how the decimal numbers do not line up? I can tell you from many years of teaching accountants that this drives them bonkers. That 1.7 needs to be 1.70! But you don’t do this here, you do that in the Excel worksheet, Power Pivot model or Power BI visual.

The key to remember here:

  • In Power Query, you define Data TYPES
  • In Power Pivot, Excel or Power BI, you define Data FORMATTING

Excel’s Data Types

In the classic days of Excel, we only had four data types that we had to be worried about. Those were:

  • Text,
  • Numbers,
  • Blanks, and
  • Errors

(Although represented by textual patterns like #REF!, #N/A, #NAME?, and the like, they actually count as a different data type.) Today it gets a bit more complicated with Stock and Geography data types, as I blogged about here, but anything else was just a number that was formatted with a number formatting string. Examples of these include:

  • Date: format it as a number, and you’ll see the number of days since Jan 1, 1900
  • Time: a little trickier, but this is just a fraction of a day where .25 is 6:00 AM, .5 is 12:00 PM and .75 is 6:00 PM. And if all you have is the time, and you format it as a DateTime, you get that time on Jan 1, 1900.
  • True/False (Boolean): while this shows in the cell as TRUE/FALSE, which looks like text, if you multiply them by 1 you’ll see that TRUE equates to 1, where FALSE equates to 0.

But that was Excel, where data types and number formats were the same thing. That’s not Power Query.

Power Query’s Data Types

Unlike Excel, which has a pretty short list of data types, Power Query seemingly has a ton:

  • Numeric Data Types:
    • Decimal Number
    • Currency (Fixed Decimal)
    • Whole Number
    • Percentage
  • DateTime Data Types:
    • Date/Time
    • Date
    • Time
    • Date/Time/Zone
    • Duration
  • True/False (Boolean)
  • Text
  • Others (Binary, Tables, Records, Lists, and more)
  • Any (the dangerous “undefined” type which allows the application to determine the correct data type)

The key to recognize is that each of these data types is DISTINCT, meaning that each of these is different in some way from every other. (While we won’t get into it in this post, unlike Excel which implicitly converts data from one type to another, Power Query requires explicit type conversion via use of functions like Number.From() or Date.ToText(), which can be frustrating at times!)
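
To illustrate what "explicit" means here, a few standalone conversions (these are standard library calls, not anything specific to this article's model):

// Explicit conversions in M - Power Query will not coerce these for you
let
    TextToNumber = Number.From("31.25"),                           // text -> number
    DateToText   = Date.ToText(#date(2019, 11, 1), "yyyy-MM-dd"),  // date -> text
    NumberToText = Text.From(42) & " units"                        // number -> text before concatenating
in
    [TextToNumber = TextToNumber, DateToText = DateToText, NumberToText = NumberToText]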

For the purpose of this post, however, I want to focus on the first three numeric types: Decimal Number, Currency and Whole Number, and ask the question: Does the Data Type matter?

Illustration Background

Let’s have a look at an example. For the illustration, I set up a Power Query chain that looks like this:

Viewing our Query Chain

The Data table had 100,000 randomly generated [Sales] records that vary between 1.27317262341058 and 100000.017761279, and randomly generated [Customers] that vary between 1 and 1000.

The only change I made in the Whole, Currency and Decimal types was to set the Data Type for the [Sales] column accordingly. In the Customers table, I removed the [Sales] column and duplicates from the Customers column.

I then created 3 quick measures to sum the column up, and dropped them on a Pivot:

Summing up our 3 measures

I don’t think it would surprise anyone here that the [Whole Sum] is slightly different than the [Currency Sum] and [Decimal Sum]. After all, the numbers were rounded at the source before being added up. And so far, we can see that the [Currency Sum] and [Decimal Sum] look pretty much the same. At least until we expand them a bit:

Decimal differences between Currency and Decimal Data Types

The only thing that should be a surprise here is that currency only holds up to four decimals, not two as most people expect. This actually makes sense when you start thinking about foreign exchange transactions, and how they are always carried to four decimal places.

But is that it? Is a Data Type just about rounding? As it turns out the answer to that is both yes and no.

Testing the Model Memory

The next step here was to test the model memory and see how Power Pivot is storing the data. To that end, here’s a little table that shows exactly that:

Table illustrating how Power Pivot is storing the data

Before we get into this, I want to call out something important here. The Data Type that is showing has been read from Power Pivot. Notice that Whole Number shows as such, consistent with Power Query. But the Currency and Decimal tables both show Decimal. As it turns out, Power Pivot doesn’t make a distinction between these two data types. However, the distinction between these two Data Types matters to you anyway, as I’ll explain.

So, what does that all mean?

In the Whole Number table, I had rounded off all the decimals. This left 63,815 unique values. And because Power Pivot compresses based on unique values, it deals with this column very well, resulting in a total of 260.54 KB to store these values.

In the Currency table, I effectively rounded all the values off to four decimal places. This left 99,996 unique values in the table (only 4 values were repeats). Despite this, Power Pivot did a good job of compressing the values in memory, resulting in 390.75 KB to store them.

Then we get to the Decimal column. There are only 4 more unique values than in the Currency column, but the memory takes a colossal 5,234.47 KB to store the values, vs the 390.75 KB of Currency. What the heck is going on?

The answer lies in the fact that Power Pivot has to carry all of those decimal places, and once it does, it can flip to storing data using Hash Encoded memory. That’s a deep topic for another time but suffice it to say that this is a bad thing, as Value encoding is much more efficient. (Hash is generally used for Text, and Values for… you guessed it… values!)

Interestingly, if you round the Decimal Number to 5 decimals you end up with 99,999 unique values in the column and a very minor change to the memory used. But if you round it to 4 decimals, the memory of the Decimals column compresses the same as Currency, and the memory need drops to the same 390.75 KB.

Table illustrating memory requirements of the different Data Types

Why Model Memory Matters

And this is the final part of the equation to me. Not all time is created equal. My users will accept a 2-minute refresh of the data model. They might say it takes time, but +/-30 seconds in a refresh isn’t anything that they’ll get overly concerned with. They’ll click Refresh, go grab a coffee, then come back to work with the file.

But if they click a slicer and it takes 10 seconds to redraw the Pivot Table or Power BI visuals? Watch out! They’ll claim the model is too slow, ineffective, wrong and useless. I’d MUCH rather push resource consumption into the initial refresh in order to build a memory-efficient model that performs well when being used for analysis.

Wait, what happened to the Data Types?

To be fair, Power Pivot’s compression mechanism is more about unique values and the length of precision than it is about Data Types. But it’s up to you to choose the correct Data Type to future-proof your model and make sure that the compression algorithms can be applied.

But due to the way Excel has always worked, the way the Data Types are named, and the fact that most modellers don’t have a clear understanding of Formatting vs Data Types… users are more likely to pick Decimal over Currency. I mean, why would I ever format my units as Currency? (In Power BI this is called a Fixed Decimal, although it still shows with the currency symbol.)

We need to recognize that lagging decimals really do happen in business. Let’s look at that Units column for a second. Naturally we never sell a partial unit… or do we? I’ll give you 3 examples of things I’ve seen in the Food and Beverage industry that forever changed my opinion on this:

  1. We had a couple of customers who insisted that they be able to purchase half a muffin. I’m not even kidding here. The killer for me was that it only showed up every 10,000 rows of transactions or so, meaning that the column often got set to Whole Number incorrectly.
  2. The salesperson agrees to provide those 300 units for $5,000. No big deal, except that they monkey with the price field in your database to make it work, and suddenly you have a sales price of $16.666667. Fortunately, this one usually gets rounded via setting it to a Currency Data Type, as that’s logical. But what if you set it to decimal or left it undefined?
  3. Things go the other way and the customer agrees to a package price of $5,000 for something that usually carries a price of 17.00 per unit. It gets rung into the system with a reverse engineered quantity of 294.1176470588235 to get it exactly right.

The last is the real kicker as the memory suddenly flips from Value to Hash, the RAM needed to refresh the data expands exponentially and brings your model to its knees. And now every slicer click has gone from 1 second to update your visuals to the better part of 10 seconds. And your audience is screaming that the model is “unusable”.

My Recommended Practice

The final step in every query I build which loads to a destination (an Excel Table or the Data Model) is to correctly define my Data Types. This does two things for me:

  1. It ensures that I never accidentally load a column with an undefined Data Type. (In the case of Dates, they load to Excel as numbers, and to the Data Model as text!)
  2. It ensures that I’m cutting down to the minimum number of decimals I’ll ever need for my analysis.
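
As an illustration of that final step, it usually ends up as a single Table.TransformColumnTypes call along these lines.  The column names below are invented for the example; only the pattern matters:

// Sketch - the last step of a query, explicitly typing every column
let
    PreviousStep = ...,   // the "..." is just a placeholder for your real previous step
    Typed = Table.TransformColumnTypes(PreviousStep,
                {{"InvoiceDate", type date},       // never leave dates untyped
                 {"Customers",   Int64.Type},      // whole number
                 {"Sales",       Currency.Type}})  // fixed decimal (currency): four decimals max
in
    Typed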

Does it take more processing time? Maybe marginally. But does it future-proof my solution to keep it performant? Definitely. And it ensures the time to do so happens in the refresh, not in when the model is being used for analysis.

Some Final Thoughts

The issues you saw here with memory also affect DateTimes in a big way, as they are quite similar to decimal numbers, where time is represented as the fraction of a day.

The article touches on some things from a much bigger topic: how to optimize a Power Pivot/Power BI data model.  I actually teach a full day course on this topic for CPABC, where we cover how the Vertipaq engine compresses its data, why you want to avoid calculated columns and fall in love with measures.  We make extensive use of Power Query in this course to reshape tables (applying correct data types along the way) so that they can be consumed by the data model in a more efficient manner.  We dissect a poorly built model, test its memory and then rebuild it in stages seeing how our changes made an impact.

And if you’re looking for tools to help with this process… stay tuned to this blog. I’m hoping to have an announcement on that front within the next couple of months.

Solutions for Power Query Challenge 6

This morning I logged in to check the solutions for Power Query Challenge 6 that were submitted and... Wow!  There were a bunch, and some cool variety there.  So now it's time to show you all what I came up with here.

What was Power Query Challenge 6?

The full description and source data can be found in yesterday's post, but in short it was to convert this:

Data table with nested data sets

Data table with multiple data points per cell

To this:

Data in 1NF

Data shown in First Normal Form (1NF)

So how do we do it?

Two Solutions for Power Query Challenge 6

Wait, two solutions?  Why?

As it turns out, I put together two for this one. My first attempt was cooked up to solve the issue on short notice.  Then I built another that seems a bit more elegant.  So I'll show them both, although quickly.

Solution 1 - Splitting Individual Columns and Merging Back Together

The first of my solutions for Power Query Challenge 6 is actually quite similar to what Ali posted in the solution thread.  It basically follows this method:

  • Step 1:
    • Create a Data query that connects to the source data, and load it as connection only
  • Step 2:
    • Create 3 new queries for ItemID, Quantity and Price which
      • Reference the data query
      • Keep the InvoiceID column and the other relevant column
      • Split the relevant column by delimiter, ensuring it splits to rows (not columns as it defaults to)
      • Add an Index column
  • Step 3:
    • Reference one of the Step 2 tables, and merge the other two tables to it, based on matching the Index column in each

So when complete the query chain looks like this:

And returns the table we're after:

The only real trick to this one is that - when you are building the Price query - the Price column will pick the decimal as the delimiter, so you have to force it to a line feed.  So building the Price query would go through the following steps:

  • Right click the Data query --> Reference
  • Select the InvoiceID and Price columns --> Right click --> Remove Other Columns
  • Right click the Price column --> Split column --> By Delimiter
    • Clear the decimal from the Custom area
    • Click the arrow to open the Advanced area
    • Change the selection to split to Rows
    • Check "Split using special characters"
    • Choose to insert a Line Feed character
    • Click OK
  • Set the Data Types
  • Go to Add Column --> Add Index Columns
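
If you peek at the Advanced Editor after those clicks, the Price query comes out roughly like this.  It's a sketch: "Data" is the staging query from Step 1, and the data types near the end are just examples:

// Sketch of the Price staging query for Solution 1
let
    Source      = Data,
    KeptColumns = Table.SelectColumns(Source, {"InvoiceID", "Price"}),
    SplitToRows = Table.ExpandListColumn(
                      Table.TransformColumns(KeptColumns,
                          {{"Price", Splitter.SplitTextByDelimiter("#(lf)", QuoteStyle.Csv)}}),
                      "Price"),
    Typed       = Table.TransformColumnTypes(SplitToRows,
                      {{"InvoiceID", Int64.Type}, {"Price", Currency.Type}}),
    AddedIndex  = Table.AddIndexColumn(Typed, "Index", 1, 1)
in
    AddedIndex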

Resulting in this:

The ItemID and Quantity queries follow the same steps, except that Power Query now correctly identifies the Line Feed as the character to split on.

Solution 2 - Group and Split

While the first solution to Power Query Challenge 6 worked, it left me less than satisfied as it took a bunch of queries.  While effective, it didn't feel elegant.  So I cooked up another solution that uses Grouping.  It's actually quite similar to the first solution that Bill Szysz posted.

The basic method is as follows:

  • Connect to the data
  • Right click the InvoiceID column --> UnPivot Other Columns
  • Right click the Value column --> Split Column --> By Delimiter --> OK

Following the steps above gets us to this state:

To unwind this, we group it:

  • Go to Transform --> Group By
    • Group By InvoiceID, Attribute
    • Aggregate a "Data" column using the All Rows operation

Grouping using the All Rows feature

At this point, we need to number these rows, so I just busted out the pattern to do that from our Power Query Recipe cards (recipe 50.125).

  • Go to Add Column --> Custom
    • Column Name:  Custom
    • Formula:  =Table.AddIndexColumn( [Data], "Row", 1, 1)
  • Right click the Custom column --> Remove Other Columns
  • Expand all fields from the Custom column

Leaving us with this data:

Data Grouped with Numbered Rows

The next step is to Pivot it:

  • Select the Attribute column --> Transform --> Pivot Column
    • Choose Value for the Values
    • Expand the Advanced arrow
    • Change the Aggregation to "Don't Aggregate"
    • Click OK
  • Select the "Row" column and Remove it.  (Yes, it was needed to pivot this correctly, but now adds no value.)
  • Set the data types
  • Load it to the desired destination
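
For reference, here's roughly what the whole of Solution 2 looks like in M.  It's a sketch: the source table name ("Invoices") and the data types at the end are illustrative, and your generated step names will differ:

// Sketch of Solution 2 end to end
let
    Source      = Excel.CurrentWorkbook(){[Name="Invoices"]}[Content],
    Unpivoted   = Table.UnpivotOtherColumns(Source, {"InvoiceID"}, "Attribute", "Value"),
    SplitToRows = Table.ExpandListColumn(
                      Table.TransformColumns(Unpivoted,
                          {{"Value", Splitter.SplitTextByDelimiter("#(lf)", QuoteStyle.Csv)}}),
                      "Value"),
    Grouped     = Table.Group(SplitToRows, {"InvoiceID", "Attribute"},
                      {{"Data", each _, type table}}),
    Numbered    = Table.AddColumn(Grouped, "Custom",
                      each Table.AddIndexColumn([Data], "Row", 1, 1)),
    KeptCustom  = Table.SelectColumns(Numbered, {"Custom"}),
    Expanded    = Table.ExpandTableColumn(KeptCustom, "Custom",
                      {"InvoiceID", "Attribute", "Value", "Row"}),
    Pivoted     = Table.Pivot(Expanded, List.Distinct(Expanded[Attribute]), "Attribute", "Value"),
    RemovedRow  = Table.RemoveColumns(Pivoted, {"Row"}),
    Typed       = Table.TransformColumnTypes(RemovedRow,
                      {{"InvoiceID", Int64.Type}, {"ItemID", Int64.Type},
                       {"Quantity", Int64.Type}, {"Price", Currency.Type}})
in
    Typed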

At this point, the query (again) looks perfect:

The desired output

Now, I must admit, this felt far more professional and left me feeling good about it.

Which Solution to Power Query Challenge 6 is Better?

Naturally, solution 2 is better.  It takes fewer queries, and looks way cooler.  Right?  Not so fast...

The real question is in the performance.  And for this one I thought I'd test it.  But I needed more data.  I expanded the set to 11,000 rows and then used a tool we're working on to time the refreshes.  Privacy was left on, and all times shown are in seconds:

  • Solution 1:  1.43, 1.48, 1.11, 1.27  Avg ~1.32 seconds
  • Solution 2:  2.77, 2.65, 2.63, 2.68  Avg ~2.68 seconds

I'll be honest, this surprised me.  So I went back and added the GroupKind.Local parameter into the Grouped Rows step, like this (as that often speeds things up):

Table.Group(#"Changed Type1", {"InvoiceID", "Attribute"}, {{"Data", each _, type table [InvoiceID=number, Attribute=text, Value=number]}}, GroupKind.Local)

The revised timing for Solution 2 now gave me this:

  • Solution 2A:  2.54, 2.49, 2.56, 2.61.  Avg ~2.55 seconds

So while the local grouping did have a small impact, the message became pretty clear here.  Splitting this into smaller chunks was actually way more efficient than building a more elegant "all in one" solution!

My solution (including 5,000 rows of the data), can be found in the solution thread here.

Power Query Challenge 6

Are you ready for Power Query Challenge 6?  In this challenge we'll look at splitting nested data sets into a nice table in First Normal Form.

Let's take a look at Power Query Challenge 6:

Where did the challenge come from?

The inspiration for this challenge came from last week's Self Service BI Boot Camp that we hosted in Vancouver, BC (you should have been there, it was awesome).  At the outset, we talked about how to identify if your data is in First Normal Form (the format that is ideal for a PivotTable), and I showed this data set:

Data table with nested data sets

Data table with multiple data points per cell

Notice how the invoice has multiple ItemID, multiple Quantity and multiple Price fields per cell?  That's not good at all.

What we Really Need - Data in First Normal Form

As I explained in the Boot Camp, it is essential that the data source be in First Normal Form in order for a PivotTable to consume it.  What does that mean?  It means that it needs to look like this:

Data in 1NF

Data shown in First Normal Form (1NF)

Notice that in this case the data is atomic - it only has one data point per cell.  In addition, there is now one complete record per row.  The InvoiceID shows on every row now, and each data point has been correctly split up and applied to them.

So what's the thrust of the Power Query Challenge 6?

Well, as it turns out this is a bit tricky.  There are a few issues at play here:

  • The data is separated by line feeds within the cells
  • There are a different number of line feeds in each row
  • At least the number of line feeds is consistent for each cell in the entire row though!

So the challenge is this: break the table apart so that the data is in First Normal Form like the second image.

You can download the data source file (and post your solutions) in our forum here.  I'll give you my version tomorrow.

One Solution to Power Query Challenge 5

Hopefully you’ve had a chance to try and build a solution to Power Query Challenge 5.  In this post, I’ll show you how I approached it.

Challenge Goals

The challenge in this case was to allocate a fixed amount for a series of account IDs based on the data in this list:

Can you make a series from one to the other?

Can you make a series from one to the other?

Creating this…

Desired Power Query Output

Here is the data we need to create

And as a reminder, the rules were:

  • The first 7 digits won’t change for the series
  • If there is no “To” value, then we only need the single value
  • If the final character is text, it will never exceed Z. (i.e. we’ll never go from 10A to 11C)

And you can click here to get the original data file (as well as see solutions posted by myself and others).

I would also like to quickly restate that another of my main goals was to do as much using the UI as possible, and make certain that the students in my workshop could maintain this solution.  During the three day workshop (similar to my upcoming self service BI bootcamp), we did a full day of Power Query work, but mainly around reshaping data through the user interface.  The deepest we got into M code was creating a list using a structure like {1..10}.  So moving to crazy conditional logic and List.Generate solutions was out of the scope of where I wanted to take this for them. 😉

One solution to Power Query Challenge 5

I broke my solution down into four distinct queries as follows:

Listing of all queries used

Let’s look at each of those steps.

The Base Query

The point of this query was basically to carve up the “From” and “To” portions into the components that I would need in order to get my solution.  With the preface remaining the same for every ID in the list, it was all about separating the last 3 digits to go from this table:

Can you make a series from one to the other?

To this one:

Adding From and To columns

The steps I used were as follows:

  • Select the [From] column --> Add Column --> Extract --> First Characters --> 7
  • Name the new column “Preface”
  • Select the [From] column --> Add Column --> Extract --> Last Characters --> 3
  • Name the new column “From3”
  • Add a new conditional column called “To3” with the following formula:
    • =if [To] = null then [From3] else Text.End([To],3)
  • Remove the [From] and [To] columns
  • Set the data types for each column (notice [From3] and [To3] are text, not numbers)
  • Load the query as a Connection Only query
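
In M terms, the Base query boils down to something like the sketch below.  The source table name ("AccountSeries") is a placeholder, and the conditional column is written exactly as configured above:

// Sketch of the Base query - table name is a placeholder
let
    Source       = Excel.CurrentWorkbook(){[Name="AccountSeries"]}[Content],
    AddedPreface = Table.AddColumn(Source, "Preface", each Text.Start([From], 7), type text),
    AddedFrom3   = Table.AddColumn(AddedPreface, "From3", each Text.End([From], 3), type text),
    AddedTo3     = Table.AddColumn(AddedFrom3, "To3",
                       each if [To] = null then [From3] else Text.End([To], 3), type text),
    RemovedCols  = Table.RemoveColumns(AddedTo3, {"From", "To"})
in
    RemovedCols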

The Numeric Query

Next up, I needed to generate the series for the numeric series.  This was easy, as it works similar to the Power Query Calendar table recipe.  (Recipe card 60.105.x which you can find in our Power Query Recipe Cards)

The steps I used do this were:

  • Reference the Base query
  • Change the type of the [From3] column to Whole Number
  • Filter the [From3] column to remove errors
  • Added a new Custom Column with the following formula:
    • ={ [From3]..[To3] }
  • Removed the [From3] and [To3] columns
  • Expanded the new [Custom] column
  • Merged the [Preface] and [Custom] columns into a new “ID” column
  • Loaded this query as Connection only
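
Here's the same Numeric query as an M sketch.  One hedge: the clicked steps above only change the type of [From3], so in the sketch I've wrapped [To3] in Number.From() to make sure both ends of the {from..to} list are numbers; if your suffixes are zero-padded you'd also want to pad the result back out with Text.PadStart:

// Sketch of the Numeric query - assumes the staging query is named Base
let
    Source      = Base,
    TypedFrom   = Table.TransformColumnTypes(Source, {{"From3", Int64.Type}}),
    NumericOnly = Table.RemoveRowsWithErrors(TypedFrom, {"From3"}),
    AddedList   = Table.AddColumn(NumericOnly, "Custom",
                      each {[From3]..Number.From([To3])}),
    RemovedCols = Table.RemoveColumns(AddedList, {"From3", "To3"}),
    Expanded    = Table.ExpandListColumn(RemovedCols, "Custom"),
    MergedID    = Table.CombineColumns(
                      Table.TransformColumnTypes(Expanded, {{"Custom", type text}}),
                      {"Preface", "Custom"},
                      Combiner.CombineTextByDelimiter(""), "ID")
in
    MergedID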

This left me with the query output shown here:

Creating the Numeric Series

And the cool thing is that using this technique, the final series – which only had a “From” ID and no “To” ID – just creates the single item.

The Alpha Query

Next up was the alpha query.  Once again, I referenced the Base query, meaning that I started from here:

Adding From and To columns

The steps I followed here were:

  • Duplicate the [From3] column
  • Change the data type on the [From3 – Copy] column to Whole Number
  • Filtered the [From3 – Copy] column to KEEP errors
  • Removed the [From3 – Copy] column
  • Split the [From3] column by Number of Characters, splitting at the left-most 2 characters
  • Split the [To3] column by Number of Characters, splitting at the left-most 2 characters
  • Added a new Custom Column with the following formula
    • = { [From3.2]..[To3.2] }

So at this point, I’m looking at this:

Creating a text based list

Continuing on, I…

  • Removed columns [From 3.2], [To 3.1] and [To 3.2]
  • Expanded the list from the [Custom] column
  • Merged the [Preface] , [From3.1] and [Custom] columns into a new “ID” column
  • Loaded this query as Connection only
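
And the Alpha query, sketched in M along the same lines (again assuming the staging query is named Base; the split positions and column names mirror the steps above):

// Sketch of the Alpha query
let
    Source      = Base,
    Duplicated  = Table.DuplicateColumn(Source, "From3", "From3 - Copy"),
    TypedCopy   = Table.TransformColumnTypes(Duplicated, {{"From3 - Copy", Int64.Type}}),
    AlphaOnly   = Table.SelectRowsWithErrors(TypedCopy, {"From3 - Copy"}),
    RemovedCopy = Table.RemoveColumns(AlphaOnly, {"From3 - Copy"}),
    SplitFrom   = Table.SplitColumn(RemovedCopy, "From3",
                      Splitter.SplitTextByPositions({0, 2}), {"From3.1", "From3.2"}),
    SplitTo     = Table.SplitColumn(SplitFrom, "To3",
                      Splitter.SplitTextByPositions({0, 2}), {"To3.1", "To3.2"}),
    AddedList   = Table.AddColumn(SplitTo, "Custom", each {[From3.2]..[To3.2]}),
    RemovedCols = Table.RemoveColumns(AddedList, {"From3.2", "To3.1", "To3.2"}),
    Expanded    = Table.ExpandListColumn(RemovedCols, "Custom"),
    MergedID    = Table.CombineColumns(Expanded, {"Preface", "From3.1", "Custom"},
                      Combiner.CombineTextByDelimiter(""), "ID")
in
    MergedID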

And these steps left me with a nice list of the alphanumeric series as follows:

Creating the Alphanumeric Series

The IDs Query

At this point, things became super simple.  There’s no real magic to this query, as the data has already been prepared and normalized in the Numeric and Alpha queries.  So all that needs to be done here is:

  • Reference the Numeric query
  • Append the Alpha query
  • Set the data types (just to be sure)
  • Load the query to the intended destination
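
In M, that final query is literally just an append and a type step, something like this (assuming the two staging queries are simply named Numeric and Alpha):

// Sketch of the IDs query
let
    Appended = Table.Combine({Numeric, Alpha}),
    Typed    = Table.TransformColumnTypes(Appended, {{"ID", type text}})
in
    Typed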

And that’s it.

That’s One Solution to Power Query Challenge 5

Of course, there are countless other ways to solve this.  My goal in this case was specifically to avoid a custom function, as I didn’t want to obfuscate the code for a new Power Query user.  Could I have done this all in one query with use of more complicated Custom Columns?  Sure.  But again, would it be maintainable by people newly introduced to Power Query?  Hopefully intellisense in the Custom Column Dialog will change my answer, but right now I’d say that’s unlikely.

One of the things I love about Power Query is the ability to break this down into separate queries, then put them back together.  The reality is that this series of data was mixed, needing slightly different treatment in each case.  This setup allowed me to clearly focus on the goal, getting it done without noise from the other type of data in the set, then just appending them afterwards.  I call that a win as it’s easily auditable by someone closer to the beginner side of the Power Query learning curve.