Introducing the Measure Monkey

You know the drill… extract, transform and load your data, relate your tables, then create basic DAX measures.  All work that needs to be done before you can really get started on analyzing your data.  Today we’ve unleashed the Measure Monkey to help speed up that process a bit for you.  (You can think of the Measure Monkey as Quick Measures for Excel.)

If you follow Monkey Tools already, you’ll know that our goal is to help you build better models faster.  We already include helpful functions such as:

  • the ability to inject a query that can automatically switch between local folders and SharePoint folders
  • manage your queries via our Query Sleuth
  • build calendar tables on the fly against your data
  • and so much more...

But while we’ve had a nice tool to trace DAX query chains, we haven’t included a lot of DAX functionality to date.  That is changing today.  And oh… before we dive into it, I want to be clear that this feature will be available to ALL users of Monkey Tools.  Yes, even those of you using a Free license!

The Sample Model

Before we dive into this, let’s take a look at a sample data model:

Notice that everything is nicely created and linked (by the way - we created that calendar in a few seconds with Monkey Tools’ Calendar Monkey…) but that there are no DAX Measures on our Sales and Budget tables.  Date and Category are both foreign keys that link each of the those tables to the Calendar and Categories tables.  However, we really want explicit measures to sum both the Sales[Amount] and Budget[Amount] columns.

Of course, these measures are easy to write, but what if your model is a bit more complicated and there are ton of them to do?

Creating Explicit Measures in Bulk with the Measure Monkey

As of version 1.0.7599.31348, you’ll find a new Measure Monkey menu on the Monkey Tools ribbon for this exact purpose:

Step 1A: Which Tables Host The Columns To Aggregate?

When you launch the new feature, you’ll be taken to a screen that looks like this:

This screen in intended to allow you to tell the Measure Monkey which tables hold the columns you need to aggregate.  Our aim in this screen is to give you the highest possible chance of just clicking "Next". That being said, we realize that this may not work for everyone, so we also allow some flexibility here.

In the top left, we pre-select the tables which we believe have the highest chance of needing aggregation: your fact tables.  (Those tables with only ‘many’ sides of relationships attached to them.)  But if we get this wrong for you, you simply need to check the other boxes to include basic aggregations for other tables.  (Ideally, you shouldn’t be aggregating dimensions, but there are – of course – exceptions to every rule.)  You’ll get immediate feedback in the box in the bottom left, as we show all the tables that will be included based on your checkbox selections.

Step 1B:  Tell the Measure Monkey Where to Store Your New Measures

In the top right, we also allow you to tell us where you want to store the measures.  If you have created a specific “Measures” table, we’ll provide that by default.  If you haven’t, we’ll offer to store the measures on the Host Table.  (In other words, all measures created to aggregate columns from the Budget table will be stored on that table, whereas columns from Sales will be stored on Sales.)

Forgot to set up a new Measures table before doing this?  No worries, click the + to add a measures table on the fly, give it a name, and we’ll create it for you:

There are a couple of Advanced options as well, but we believe most people will want to leave these set based on their defaults.  So let’s click Next, to go to page 2…

Step 2: Choose Your Aggregations

This page contains a ton of info, but again we’re trying to provide you the biggest chance of clicking “Create” right away. Unfortunately, this is something that we can’t do in the image above…

The reason our Create button is disabled is that we have two measures offered with the name “Sum of Amount”.  The blue one is the first instance, and any subsequent measures with the same name will highlight in red.  So let’s fix those, and choose a default data type format:

It’s all good to go now, except that I want to add a “Transactions” measure that counts the rows of the Sales table.  So I’m going to click the “Add another aggregation” button in the Sales table. Then I choose the name of the table from the drop down list:

That will give me a new row with a “Count Rows of Sales” measure, which I can quickly rename to “Transactions” before clicking “Create”.

During this process, the Measure Monkey will create your measures for you. Plus, if you created a Measures table, it gives you some advice on how to make it an “official” measures table.  You can see the results in my data model here:

That was Easy…

The demo above was obviously a fairly simple model. Yet it cuts my explicit measure creation time down to less than a minute to create these two measures.  Now consider the time savings when you get a bit more complicated:

So how do you get the Measure Monkey menu?

This update to Monkey Tools is available in Monkey Tools 1.0.7599.31348 or higher.  And it's will be a “forever free” feature, so you’ll be able to use it on either a Free or Pro license!

To try our free trial, head over to the Monkey Tools product page to download your copy.

If you already have Monkey Tools installed, it will automatically update within a couple of weeks. Alternatively, you can request the update now by going to Monkey Tools -> Options -> Check For Update Now…

Solutions for Power Query Challenge 6

This morning I logged in to check the solutions for Power Query Challenge 6 that were submitted and... Wow!  There were a bunch, and some cool variety there.  So now it's time to show you all what I came up with here.

What was Power Query Challenge 6?

The full description and source data can be found in yesterday's post, but in short it was to convert this:

Data table with nested data sets

Data table with multiple data points per cell

To this:

Data in 1NF

Data shown in First Normal Form (1NF)

So how do we do it?

Two Solutions for Power Query Challenge 6

Wait, two solutions?  Why?

As it turns out, I put together two for this one. My first attempt was cooked up to solve the issue on short notice.  Then I built another that seems a bit more elegant.  So I'll show them both, although quickly.

Solution 1 - Splitting Individual Columns and Merging Back Together

The first of my solutions for Power Query Challenge 6 is actually quite similar to what Ali posted in the solution thread.  It basically follows this method:

  • Step 1:
    • Create a Data query that connects to the source data, and load it as connection only
  • Step 2:
    • Create 3 new queries for ItemID, Quantity and Price which
      • Reference the data query
      • Keep the InvoiceID column and the other relevant column
      • Split the relevant column by delimiter, ensuring it splits to rows (not columns as it defaults to)
      • Add an Index column
  • Step 3:
    • Reference one of the Step 2 tables, and merge the other two tables to it, based on matching the Index column in each

So when complete the query chain looks like this:

And returns the table we're after:

The only real trick to this one is that - when you are building the Price query - the Price column will pick the decimal as the delimiter, so you have to force it to a line feed.  So building the Price query would go through the following steps:

  • Right click the Data query --> Reference
  • Select the InvoiceID and Price columns --> Right click --> Remove Other Columns
  • Right click the Price column --> Split column --> By Delimiter
    • Clear the decimal from the Custom area
    • Click the arrow to open the Advanced area
    • Change the selection to split to Rows
    • Check "Split using special characters"
    • Choose to insert a Line Feed character
    • Click OK
  • Set the Data Types
  • Go to Add Column --> Add Index Columns

Resulting in this:

The ItemID and Quantity queries follow the same steps, except that Power Query now correctly identifies the Line Feed as the character to split on.

Solution 2 - Group and Split

While the first solution to Power Query Challenge 6 worked, it left me less than satisfied as it took a bunch of queries.  While effective, it didn't feel elegant.  So I cooked up another solution that uses Grouping.  It's actually quite similar to the first solution that Bill Szysz posted.

The basic method is as follows:

  • Connect to the data
  • Right click the InvoiceID column --> UnPivot Other Columns
  • Right click the Value column --> Split Column --> By Delimiter --> OK

Following the steps above gets us to this state:

To unwind this, we group it:

  • Go to Transform --> Group By
    • Group By InvoiceID, Attribute
    • Aggregate a "Data" column using the All Rows operation

Grouping using the All Rows feature

At this point, we need to number these rows, so I just busted out the pattern to do that from our Power Query Recipe cards (recipe 50.125).

  • Go to Add Column --> Custom
    • Column Name:  Custom
    • Formula:  =Table.AddIndexColumn( [Data], "Row", 1, 1)
  • Right click the Custom column --> Remove Other Columns
  • Expand all fields from the Custom column

Leaving us with this data:

Data Grouped with Numbered Rows

The next step is to Pivot it:

  • Select the Attribute column --> Transform --> Pivot Column
    • Choose Value for the Values
    • Expand the Advanced arrow
    • Change the Aggregation to "Don't Aggregate"
    • Click OK
  • Select the "Row" column and Remove it.  (Yes, it was needed to unpivot this correctly, but now adds no value.)
  • Set the data types
  • Load it to the desired destination

At this point, the query (again) looks perfect:

The desired output

Now, I must admit, this felt far more professional and left me feeling good about it.

Which Solution to Power Query Challenge 6 is Better?

Naturally, solution 2 is better.  It takes less queries, and looks way cooler.  Right?  Not so fast...

The real question is in the performance.  And for this one I thought I'd test it.  But I needed more data.  I expanded the set to 11,000 rows and then used a tool we're working on to time the refreshes.  Privacy was left on, and all times shown are in seconds:

  • Solution 1:  1.43, 1.48, 1.11, 1.27  Avg ~1.32 seconds
  • Solution 2:  2.77, 2.65, 2.63, 2.68  Avg ~2.68 seconds

I'll be honest, this surprised me.  So I went back and added the GroupKind.Local parameter into the Grouped Rows step, like this (as that often speeds things up):

Table.Group(#"Changed Type1", {"InvoiceID", "Attribute"}, {{"Data", each _, type table [InvoiceID=number, Attribute=text, Value=number]}}, GroupKind.Local)

The revised timing for Solution 2 now gave me this:

  • Solution 2A:  2.54, 2.49, 2.56, 2.61.  Avg ~2.55 seconds

So while the local grouping did have a small impact, the message became pretty clear here.  Splitting this into smaller chunks was actually way more efficient than building a more elegant "all in one" solution!

My solution (including 5,000 rows of the data), can be found in the solution thread here.

Power Query Challenge 6

Are you ready for Power Query Challenge 6?  In this challenge we'll look at splitting nested data sets into a nice table in First Normal Form.

Let's take a look at Power Query Challenge 6:

Where did the challenge come from?

The inspiration for this challenge came from last week's Self Service BI Boot Camp that we hosted in Vancouver, BC (you should have been there, it was awesome).  At the outset, we talked about how to identify if your data is in First Normal Form (the format that is ideal for a PivotTable), and I showed this data set:

Data table with nested data sets

Data table with multiple data points per cell

Notice how the invoice has multiple ItemID, multiple Quantity and multiple Price fields per cell?  That's not good at all.

What we Really Need - Data in First Normal Form

As I explained in the Boot Camp, it is essential that the data source be in First Normal Form in order for a PivotTable to consume it.  What does that mean?  It means that it needs to look like this:

Data in 1NF

Data shown in First Normal Form (1NF)

Notice that in this case the data is atomic - it only has one data point per cell.  In addition, there is now one complete record per row.  The InvoiceID shows on every row now, and each data point has been correctly split up and applied to them.

So what's the thrust of the Power Query Challenge 6?

Well, as it turns out this is a bit tricky.  There are a few issues at play here:

  • The data is separated by line feeds within the cells
  • There are a different number of line feeds in each row
  • At least the number of line feeds is consistent for each cell in the entire row though!

So the challenge is this: break the table apart so that the data is in First Normal Form like the second image.

You can download the data source file (and post your solutions) in our forum here.  I'll give you my version tomorrow.