Creating Power Query Based Calendars

We’re super excited to announce that we have given the Monkey Tools calendar creator feature an upgrade.  In fact, it’s such a power-up that this feature has graduated into its own full-grown monkey!  We call it the Calendar Monkey.

In our initial version, the Calendar Creator would create the queries necessary to load a single-column calendar of unique dates into the data model, adding PeriodID columns for 364-day calendars like 445 and 13 fiscal periods.

The Calendar Monkey adds a couple of pretty powerful features to this original mix.  Let’s walk through the experience quickly to see what I mean.

Step 1:  Define the Calendar Boundaries

Step 1 screen of the Calendar Monkey, allowing you to choose your calendar type, the calendar start/end boundaries and year end

Not much has really changed here.  You can still:

  • Pick your calendar type (12-month, 13-month, 445, 454, 544), and define your custom year end (including a different month for 12-month calendars)
  • Define a name for your calendar table query
  • Choose the load destination
  • Pick any valid date columns for the Start and End date of your data

These settings allow our monkey to build the calendar to dynamically span the entire range of your data on every refresh.
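
If you’re curious what that dynamic spanning looks like under the hood, here is a rough M sketch of the pattern (the table and column names are stand-ins for illustration, and this is not Monkey Tools’ exact generated code):

    let
        // Stand-in for your fact table; the real query reads these bounds from your data
        Source = #table(
            type table [Order Date = date],
            {{#date(2023, 1, 5)}, {#date(2023, 3, 20)}}
        ),
        StartDate = List.Min(Source[Order Date]),
        EndDate = List.Max(Source[Order Date]),
        DayCount = Duration.Days(EndDate - StartDate) + 1,
        // One row per day across the full span of the data
        Dates = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),
        Calendar = Table.TransformColumnTypes(
            Table.FromList(Dates, Splitter.SplitByNothing(), {"Date"}),
            {{"Date", type date}}
        )
    in
        Calendar

Because StartDate and EndDate are read from the data itself, a calendar built this way re-spans automatically on every refresh.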

The only real difference here is that we’ve added a checkbox and a Next button.  (Notice that Create is still available, if you just want to accept the monkey’s default choices for the rest of the options you’re about to see.)

Step 2:  Choose Calendar Columns

One of the things that always bothered us about our original version is that it created the calendar’s Date column, but then left it up to you to add the different date formats that you wanted.  So we decided to improve that, as you can see here:

Step 2 of the Calendar Monkey screen provides a series of check boxes allowing you to specify which date format columns you'd like to add to your model

Our monkey pre-selects the most common date formats, but if you ever check/uncheck one, it will learn your preferred defaults and provide those the next time you go to inject a calendar.  The columns shown dynamically react to your choices in Step 1 as well… if you use a Dec 31 year end for a 12-month calendar, you only need the first two columns – so that’s what the Calendar Monkey will show you.  If you have a custom year end (like Jun 30 or Sep 30), you may also want Fiscal columns, so the monkey provides those as options too.  And if you work with a 364-day calendar like a 445 variant or a 13 fiscal periods calendar… a final column of PeriodIDs shows up in that blank spot too.
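
For reference, the date format columns on offer map onto standard M date functions.  A minimal hand-rolled equivalent (the column choices here are examples only, not the monkey’s full list) might look like:

    let
        // A small stand-in calendar; in practice you'd start from the Date column above
        Dates = List.Dates(#date(2023, 1, 1), 365, #duration(1, 0, 0, 0)),
        Calendar = Table.TransformColumnTypes(
            Table.FromList(Dates, Splitter.SplitByNothing(), {"Date"}),
            {{"Date", type date}}
        ),
        AddYear = Table.AddColumn(Calendar, "Year", each Date.Year([Date]), Int64.Type),
        AddMonth = Table.AddColumn(AddYear, "Month", each Date.Month([Date]), Int64.Type),
        AddMonthName = Table.AddColumn(AddMonth, "Month Name", each Date.MonthName([Date]), type text),
        AddDayName = Table.AddColumn(AddMonthName, "Day Name", each Date.DayOfWeekName([Date]), type text)
    in
        AddDayName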

Step 3: Adding Relationships

Depending on your choices in Step 1, the Calendar Monkey will determine whether you are presented with this page or whether it is skipped.  If you choose to load your calendar to the Data Model (or Data Model & Worksheet), the monkey will list every date column loaded to the data model.  The purpose of this is simple: to let you decide whether your new calendar table should be linked to any of the columns listed.

For any columns you check, the Calendar Monkey will do its best to create those relationships after loading your table to the data model.  (There are some things that can prevent the monkey from accomplishing this, such as a relationship that would have to be created as inactive.)

Step 3 of the Calendar Monkey provides checkboxes for each date column in the data model, allowing you to declare which tables you'd like to relate your calendar table to.

Step 4: Creating the Calendar

Even though the Calendar Monkey has a lot of work to do when you click Create, it also knows the value of good feedback.  For this reason, it will update you on its progress as it completes each of the individual tasks, as you can see here:

This screen shows feedback from the Calendar Monkey, letting you know what has been done, as well as what you still need to do.

Now unfortunately, there are a couple of things that the monkey is unable to do (thanks to a lack of security clearance with the Excel data model).  Rather than just ignore these essential tasks, however, it will tell you what needs to be done, with the exact steps to do so.

As a bit of a pro-tip here… you don’t actually need to close the Calendar Monkey window to take action on those steps… so keep it open until you’ve hopped into the data model and made the advised changes!  Of course, if you understand what’s happening, and know the steps you’ll need to take, there is also an option to automatically close the summary screen upon completion.

Looking at the data model with the new calendar table already related to the other tables

What the Calendar Monkey cannot do

There are three things that the Calendar Monkey can’t do at this time:

  1. Automatically hide the keys on the “many” side of the relationship (the foreign keys)
  2. Automatically create the sorting hierarchies to sort Month Name by Month Number and Day Name by Weekday Number
  3. Create the calendar in a Power BI file

Honestly, while we joked earlier that the Calendar Monkey doesn’t have security clearance, the reality is that a limitation in the data model’s extensibility API is preventing us from solving the first two items.  The third… it’s on our backlog.

So how do you get the new Calendar Monkey?

You need Monkey Tools version 1.0.7493.29574 or higher, and you'll have the Calendar Monkey ready to do your bidding.

If you haven’t already, head over to the Monkey Tools product page to download a copy.

If you already have Monkey Tools installed, it will automatically update within a couple of weeks, or you can request the update now by going to Monkey Tools -> Options -> Check For Update Now…

Name Worksheets After Queries

Have you ever loaded a Power Query to a worksheet and then renamed the sheet to match the query?  It's a shame that there is no built-in option to name worksheets after queries, as this would be handy.

Well, after seeing this request come up in the forums last week, we thought that this would be a great feature to add to the DestinationSleuth in Monkey Tools.  So as of build 1.0.7433.38066... it's done!

How to Name Worksheets After Queries

We wish that we could add this as an option in the Close & Load dialog, but sadly that's not possible.  So we did the next best thing...  Once you've loaded your queries, you simply need to open our DestinationSleuth and:

  1. Select the queries you're after
  2. Click the Rename Sheets button

Using DestinationSleuth to select queries and change the names of their host worksheets

At that point, we'll quickly loop through the host worksheets and rename them to match the query loaded to each sheet.

Name Worksheets After Queries While Changing Load Destinations

You might also notice a new checkbox called "Name Sheets After Queries".  This checkbox allows you to name worksheets after queries while changing a load destination to create a new table.  It's also super easy to use:

  1. Select the query (or queries) you wish to change
  2. Choose to change the load destination to a Table
  3. Check the Name Sheets After Queries checkbox
  4. Click the Update Load Destinations button

Using DestinationSleuth to change a load destination from Connection Only to Table, and update the worksheet to the Query name at the same time.

We'll change the Load Destinations, creating the worksheets AND naming them to match the query in one step.

This is also a "Forever Free" Feature

As mentioned in my last post, while the DestinationSleuth's colour highlighting is only available in the trial and pro versions of Monkey Tools, the ability to Change Multiple Load Destinations at Once is a "forever free" feature.  And so is the ability to rename worksheets after queries!  All you need is Monkey Tools version 1.0.7433.38066 or higher, and you'll have that ability at your disposal.

If you haven’t already, head over to the Monkey Tools product page to download a copy.

If you already have Monkey Tools installed, it will automatically update within a couple of weeks, or you can request the update now by going to Monkey Tools -> Options -> Check For Update Now…

Change Multiple Load Destinations at Once

Have you ever built multiple queries in one Power Query session?  You get to choose one load destination, then have to change each of the incorrect ones... one by one.  Have you ever wished you could change multiple load destinations at once?  Well now you can - you just need Monkey Tools!

Check out the New and Improved DestinationSleuth!

Following on the heels of last week's feature updates, we've added functionality to the DestinationSleuth that allows you to change load destinations of any one or more queries.

Naturally, DestinationSleuth still gives you a very visual view of the different query load destinations all in one place, but do you see that new option just to the left of the Exit button?

DestinationSleuth showing the different load destinations in colour

Assume that you've created 5 new queries in the workbook and loaded them all to Connection Only.  But now you want to change 4 of them to the data model.  Rather than click each query, change the load destination, wait, then move on to the next one... Now you can just launch DestinationSleuth and:

  1. Select the load destination
  2. Select each of the queries you want to repoint (hold down your CTRL key)
  3. Click Change

DestinationSleuth being used to change multiple load destinations at once

Sure, it still takes time, but at least you can walk away and let it cook, rather than slowly shepherd it through the entire process.

Delete Host Worksheets

You'll also notice a little checkbox called "Delete Host Worksheets".  You know that issue where you accidentally load a table to the worksheet, then change the query to Connection Only?  It leaves the worksheet behind.  Now true, you can always just delete the worksheet (which will actually set the query to Connection Only automatically). But what if you need to change 5 of them?  It's easy with DestinationSleuth:

  1. Select the load destination
  2. Check the "Delete Host Worksheets" box
  3. Select each of the queries you want to repoint (hold down your CTRL key)
  4. Click Change

We will repoint your queries AND remove the worksheets that were holding the query results.  (Naturally, you want to be really sure you actually want to do this, but it's handy if you do.)

This is a "Forever Free" Feature

While the DestinationSleuth's colour highlighting is only available in the trial and pro versions of Monkey Tools, the ability to Change Multiple Load Destinations at Once is a "forever free" feature.  All you need is Monkey Tools version 1.0.7423.41125 or higher, and you'll have that ability at your disposal.

If you haven’t already, head over to the Monkey Tools product page to download a copy.

If you already have Monkey Tools installed, it will automatically update within a couple of weeks, or you can request the update now by going to Monkey Tools -> Options -> Check For Update Now…

Monkey Tools Update Now Available

We’re pleased to announce the first Monkey Tools update is now available for download! This one contains a new feature, some new logic and an update to one of our data connectors. Read on for more information!

I can’t believe it’s been a month since our initial release, but here we are.  Since that release, a couple of notable things happened:

  • We published a blog post on PivotSleuth and Measure Tables. This was a learning experience for me, as I discovered something new about Measure Tables.  I always knew that you could mark your Fact tables as Measure tables by hiding all the unaggregated columns, but I didn’t realize that this also means that disconnected Measure tables will then suppress the “Relationships between tables may be needed” error.
  • One of my friends hit me with an interesting curve ball: he turned on “Store datasets using enhanced metadata format” in the Power BI preview features. And as it turned out, some of the methods we’d been using to analyze the Power BI model disappear when you do that.

Both of these have led to some improvements in the software, which we’re proud to say are finally available to you.

What’s new in version 1.0.7418.29970?

There are three major things that are new in this version:

A New Power BI Connector

This was actually a huge amount of work for us, as not only did we have to build a new connector to read the new Power BI file format, but we also had to analyze the file as it was opening to see if it was in the classic format or not.  And to make it harder, if you have the Enhanced Metadata Format turned on, a legacy file requires using our initial connector, but any refresh must be done with the new connector.  Fun times for a developer and, as you can imagine, it took us a bit of effort to pull it off.  It’s actually this piece that has held us back on the other features, as the connector MUST work, and it impacted everything.

While most of this work is invisible to the end user, there are two things that we hope you do notice:

  • Connecting to the new file format is much faster than using the legacy format.
  • We also took the time to remove the reliance on configuring the version of Power BI that launches for you by default. We now just launch Power BI using your system default, then bind to it, no matter how many versions you have on your machine, or which they are.

Updated Functionality in the PivotSleuth

As we learned in the PivotSleuth and Measure Tables blog post, hiding all unaggregated columns on a disconnected measure table flags the table as an official Measure table and suppresses the “Relationships between tables may be needed” message.  For that reason, PivotSleuth needed to recognize that this is acceptable.

In other words, when the Measures table is a properly formatted disconnected table, it needed to (and now does) show that there are no issues:

PivotSleuth giving a clean bill of health for measures from a disconnected measure table

But when that disconnected Measures table has a visible column, PivotSleuth should not only show the issues, but also tell you what needs to be done to fix them.  And now it does:

PivotSleuth showing issues for measures used from an improperly formatted measure table

New QueryMonkey Feature: Add Measure Table

As mentioned before, our philosophy is “Build better, faster”.  For that reason we’ve added a new QueryMonkey feature for you:  Add Measure Table.

This feature will prompt you for a name for the table…

Prompting the user to enter a name for the new Measures table

And then create a new empty table in the data model for you:

An empty table in the data model
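
For context, an empty measure table takes remarkably little M.  A sketch of the kind of query involved (the column name is just an example) is simply a zero-row table:

    let
        // One placeholder column and zero rows; hiding the column afterwards
        // is what flags the table as an official measure table
        Source = #table({"Measures"}, {})
    in
        Source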

Unfortunately, the Excel team hasn’t given us a way to programmatically hide columns in the data model (we could REALLY use that ability), so we can’t take that last critical step for you:  Hiding the Measures column to prevent the "Relationships between tables may be needed" message.  But never fear, we do tell you exactly what needs to be done:

Advice from PivotSleuth on what to do to turn the new table into a proper measure table

So while we typically store our measures on the Fact tables, rather than a disconnected Measure table, we totally get that a lot of people like this approach.  Hopefully this makes it a lot easier for you!

How do I get the Monkey Tools Update?

The answer to this depends on whether or not you’ve installed Monkey Tools yet.

If you haven’t, then head over to the Monkey Tools product page to download a copy.

If you already have Monkey Tools installed, it will automatically update within a couple of weeks, or you can request the update now by going to Monkey Tools -> Options -> Check For Update Now…

Happy sleuthing! 🙂

How do You Design Your Data Model?

On this blog, I showcase a lot of different techniques for manipulating and reshaping data.  If you follow the blog, you already know this, and you know it's a pretty important topic to me.  But the thing we shouldn't lose sight of is WHY we do this. It's to drive analytics. I'm fairly convinced that the majority of the loyal readers here already know this. Thus, I wanted to ask your opinion on something...

Raw data to data model, passing through the human brain

How do you design your data model?

What I'm specifically interested in is how you approach designing the Fact and Dimension tables you use for your Power Pivot model.  And I'm not specifically talking about Power Query here. We all know you should be using what you learned from our recently relaunched Power Query Academy to do the technical parts.  😉

What I'm more interested in is the thought process you go through before you get to the technical bit of doing the data reshaping.

If you read books on setting up a data model, you'll probably be told that you need to do the following four steps:

  1. Identify the business process
  2. Determine the grain of the model
  3. Design your Dimension tables
  4. Design the Fact tables

So if you're asked "how do you design your data model", do these steps resonate with you, and why?

Do you consciously sit down, and work through each of these steps in order?  I suspect that many self-service BI analysts skip the first step entirely as they are implicitly familiar with their business process.  (As a consultant, I ask a lot of questions in this area to try and understand this before building anything.)

Do you design the reports on paper, then work backwards to the data you'll need, go find it and reshape it?  Or do you go the other way, trying to collect and reshape the data, then build reports once you think you have what you need?

Do you explicitly define the model grain?  And if you do, what does that mean to you?  Is it restricted to "I want transactions at a monthly/daily/hourly basis"? Or do you go deeper, like "I want transactions at a daily basis and want to break them down by customer, region and product"?

Why the question?

There are actually two reasons why I'm asking this question:

Reason 1 is that I think healthy discussion makes all of us better.  I'd like to hear your thoughts on this as I'm probably going to learn something that I haven't discovered in my own learning journey.

Reason 2 is that my whole business is around teaching people how to do these things, and I'm always looking to make things clearer.  The more opinions I hear (even if they contrast with each other), the more I can help people understand this topic.

So sound off, please!  We'd all love to hear how you approach the task of building a data model.

Do Data Types Matter in Power Query?

One of the things that I find a bit misunderstood in Power Query is whether or not Data Types matter. I mean, I’m sure everyone agrees to some point that they do, but do you know just how much they matter, and why?

Over the weekend, I received an email from one of our loyal students which read, in part:

I am going under the assumption that in Power BI nothing is free. Steps, calculated columns, measures and so on could be cheap but they are never free. It is with this premise in mind that I pose the following theory.

Users should sparingly use Change Type in Power Query. It is fine to Change Type when converting Date/Time to Date, Decimal Number to Whole Number, and others that actually change the value. It is a waste of resources to Change Type from Decimal or Whole Number to Currency. Even if you want the column to look like Currency, you can apply that format in the Data Model view and save one Power Query step.

On its face, this theory seems somewhat reasonable. After all, adding additional steps is bound to add some overhead to the Power Query process in most cases. And let’s be honest, in small models, it may make no difference to you at all. But when things get bigger…

Data Types vs Formats

To understand why Data Types matter, we need to get something very clear right off the bat: Data Types and Formats are not the same thing. Data Types dictate what kind of data you have and determine how much memory is allocated to store a value. Formatting, on the other hand, tells you how you want the values to appear. To see this in practice, have a look at the following values in Power Query, where the Data Type has been set to Currency:

Column set to Currency Data Type

Notice that the only place you see a $ sign is in the header. And see how the decimal numbers do not line up? I can tell you from many years of teaching accountants that this drives them bonkers. That 1.7 needs to be 1.70! But you don’t do this here, you do that in the Excel worksheet, Power Pivot model or Power BI visual.

The key to remember here (a short M sketch follows the list):

  • In Power Query, you define Data TYPES
  • In Power Pivot, Excel or Power BI, you define Data FORMATTING
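
Here’s that division of labour as M code, a minimal sketch on a tiny stand-in table:

    let
        // "Amount" is an example column name on an inline stand-in table
        Source = #table(type table [Amount = number], {{1.7}, {2.29}}),
        // The data TYPE is declared here, in Power Query...
        Typed = Table.TransformColumnTypes(Source, {{"Amount", Currency.Type}})
        // ...while the $ sign and decimal FORMATTING are applied later, in the
        // worksheet, the Power Pivot model, or a Power BI visual
    in
        Typed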

Excel’s Data Types

In the classic days of Excel, we only had four data types to worry about. Those were:

  • Text,
  • Numbers,
  • Blanks, and
  • Errors

(Although represented by textual patterns like #REF!, #N/A, #NAME?, and the like, errors actually count as a different data type.) Today it gets a bit more complicated with Stock and Geography data types, as I blogged about here, but anything else was just a number formatted with a number formatting string. Examples of these include the following (a quick M demonstration follows the list):

  • Date: format it as a number, and you’ll see the number of days since Jan 1, 1900
  • Time: a little trickier, but this is just a fraction of a day where .25 is 6:00 AM, .5 is 12:00 PM and .75 is 6:00 PM. And if all you have is the time, and you format it as a DateTime, you get that time on Jan 1, 1900.
  • True/False (Boolean): while this shows in the cell as TRUE/FALSE, which looks like text, if you multiply them by 1 you’ll see that TRUE equates to 1, where FALSE equates to 0.
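
You can verify the time and Boolean behaviours from Power Query itself, since M exposes the same underlying values.  (One caveat: M’s date serials are based on Dec 30, 1899 rather than Excel’s 1900 system, so whole dates convert to slightly different numbers.)

    let
        QuarterDay = Number.From(#time(6, 0, 0)),    // 0.25 - 6:00 AM is a quarter of a day
        Noon = Number.From(#time(12, 0, 0)),         // 0.5
        TrueAsNumber = Number.From(true)             // 1 (false returns 0)
    in
        [QuarterDay = QuarterDay, Noon = Noon, TrueAsNumber = TrueAsNumber]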

But that was Excel, where data types and number formats were the same thing. That’s not Power Query.

Power Query’s Data Types

Unlike Excel, which has a pretty short list of data types, Power Query seemingly has a ton:

  • Numeric Data Types:
    • Decimal Number
    • Currency (Fixed Decimal)
    • Whole Number
    • Percentage
  • DateTime Data Types:
    • Date/Time
    • Date
    • Time
    • Date/Time/Zone
    • Duration
  • True/False (Boolean)
  • Text
  • Others (Binary, Tables, Records, Lists, and more)
  • Any (the dangerous “undefined” type which allows the application to determine the correct data type)

The key to recognize is that each of these data types is DISTINCT, meaning that each of these is different in some way from every other. (While we won’t get into it in this post, unlike Excel which implicitly converts data from one type to another, Power Query requires explicit type conversion via functions like Number.From() or Date.ToText(), which can be frustrating at times!)
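
As a quick illustration of that explicit conversion requirement (a hand-rolled example, not from the original post):

    let
        // In Excel, ="100"+1 coerces silently; the M expression "100" + 1 raises an error
        Converted = Number.From("100") + 1,                       // 101 - explicit conversion
        AsText = Date.ToText(#date(2020, 6, 30), "yyyy-MM-dd")    // "2020-06-30"
    in
        [Converted = Converted, AsText = AsText]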

For the purpose of this post, however, I want to focus on the first three numeric types: Decimal Number, Currency and Whole Number, and ask the question: Does the Data Type matter?

Illustration Background

Let’s have a look at an example. For the illustration, I set up a Power Query chain that looks like this:

Viewing our Query Chain

The Data table had 100,000 randomly generated [Sales] records that vary between 1.27317262341058 and 100000.017761279, and randomly generated [Customers] that vary between 1 and 1000.

The only change I made in the Whole, Currency and Decimal types was to set the Data Type for the [Sales] column accordingly. In the Customers table, I removed the [Sales] column and duplicates from the Customers column.
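
If you want to reproduce a similar test, here is one way to generate comparable random data in M (my sketch of the setup, not the exact queries used for the illustration):

    let
        // List.Random(count, optional seed) returns values in the range [0, 1)
        SalesValues = List.Transform(List.Random(100000, 1), each _ * 100000),
        CustomerIDs = List.Transform(List.Random(100000, 2), each Number.RoundDown(_ * 1000) + 1),
        Data = Table.FromColumns({SalesValues, CustomerIDs}, {"Sales", "Customers"}),
        Typed = Table.TransformColumnTypes(Data, {{"Sales", type number}, {"Customers", Int64.Type}})
    in
        Typed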

I then created 3 quick measures to sum the column up, and dropped them on a Pivot:

Summing up our 3 measures

I don’t think it would surprise anyone here that the [Whole Sum] is slightly different than the [Currency Sum] and [Decimal Sum]. After all, the numbers were rounded at the source before being added up. And so far, we can see that the [Currency Sum] and [Decimal Sum] look pretty much the same. At least until we expand them a bit:

Decimal differences between Currency and Decimal Data Types

The only thing that should be a surprise here is that currency only holds up to four decimals, not two as most people expect. This actually makes sense when you start thinking about foreign exchange transactions, and how they are always carried to four decimal places.

But is that it? Is a Data Type just about rounding? As it turns out the answer to that is both yes and no.

Testing the Model Memory

The next step here was to test the model memory and see how Power Pivot is storing the data. To that end, here’s a little table that shows exactly that:

Table illustrating how Power Pivot is storing the data

Before we get into this, I want to call out something important here. The Data Type that is showing has been read from Power Pivot. Notice that Whole Number shows as such, consistent with Power Query. But the Currency and Decimal tables both show Decimal. As it turns out, Power Pivot doesn’t make a distinction between these two data types. However, the distinction between these two Data Types matters to you anyway, as I’ll explain.

So, what does that all mean?

In the Whole Number table, I had rounded off all the decimals. This left 63,815 unique values. And because Power Pivot compresses based on unique values, it deals with this column very well, resulting in a total of 260.54 KB to store these values.

In the Currency table, I effectively rounded all the values off to four decimal places. This left 99,996 unique values in the table (only 4 values were repeats). Despite this, Power Pivot did a good job of compressing the values in memory, resulting in 390.75 KB to store them.

Then we get to the Decimal column. There are only 4 more unique values than in the Currency column, but the memory takes a colossal 5,234.47 KB to store the values, vs the 390.75 KB of Currency. What the heck is going on?

The answer lies in the fact that Power Pivot has to carry all of those decimal places, and once it does, it flips to storing data using Hash Encoded memory. That’s a deep topic for another time, but suffice it to say that this is a bad thing, as Value encoding is much more efficient. (Hash is generally used for Text, and Value for… you guessed it… values!)

Interestingly, if you round the Decimal Number to 5 decimals you end up with 99,999 unique values in the column and a very minor change to the memory used. But if you round it to 4 decimals, the memory of the Decimals column compresses the same as Currency, and the memory need drops to the same 390.75 KB.

Table illustrating memory requirements of the different Data Types
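
In M terms, either of these approaches caps the precision before the data hits the model (a sketch using an inline stand-in for the Data query):

    let
        // Stand-in for the Data query from the illustration
        Source = #table(type table [Sales = number], {{1.27317262341058}, {100000.017761279}}),
        // Option 1: round explicitly while keeping the Decimal Number type
        Rounded = Table.TransformColumns(Source, {{"Sales", each Number.Round(_, 4), type number}}),
        // Option 2: declare Currency (Fixed Decimal), which stores at most 4 decimal places
        AsCurrency = Table.TransformColumnTypes(Source, {{"Sales", Currency.Type}})
    in
        AsCurrency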

Why Model Memory Matters

And this is the final part of the equation to me. Not all time is created equal. My users will accept a 2-minute refresh of the data model. They might say it takes time, but +/-30 seconds in a refresh isn’t anything that they’ll get overly concerned with. They’ll click Refresh, go grab a coffee, then come back to work with the file.

But if they click a slicer and it takes 10 seconds to redraw the Pivot Table or Power BI visuals? Watch out! They’ll claim the model is too slow, ineffective, wrong and useless. I’d MUCH rather push resource consumption into the initial refresh in order to build a memory-efficient model that performs well when being used for analysis.

Wait, what happened to the Data Types?

To be fair, Power Pivot’s compression mechanism is more about unique values and the length of precision than it is about Data Types. But it’s up to you to choose the correct Data Type to future-proof your model and make sure that the compression algorithms can be applied.

But due to the way Excel has always worked, the way the Data Types are named, and the fact that most modellers don’t have a clear understanding of Formatting vs Data Types… users are more likely to pick Decimal over Currency. I mean, why would I ever format my units as Currency? (In Power BI this is called a Fixed Decimal, although it still shows with the currency symbol.)

We need to recognize that trailing decimals really do happen in business. Let’s look at that Units column for a second. Naturally we never sell a partial unit… or do we? I’ll give you 3 examples of things I’ve seen in the Food and Beverage industry that forever changed my opinion on this:

  1. We had a couple of customers who insisted that they be able to purchase half a muffin. I’m not even kidding here. The killer for me was that it only showed up every 10,000 rows of transactions or so, meaning that the column often got set to Whole Number incorrectly.
  2. The salesperson agrees to provide those 300 units for $5,000. No big deal, except that they monkey with the price field in your database to make it work, and suddenly you have a sales price of $16.666667. Fortunately, this one usually gets rounded via setting it to a Currency Data Type, as that’s logical. But what if you set it to decimal or left it undefined?
  3. Things go the other way and the customer agrees to a package price of $5,000 for something that usually carries a price of $17.00 per unit. It gets rung into the system with a reverse-engineered quantity of 294.1176470588235 to get it exactly right.

The last is the real kicker as the memory suddenly flips from Value to Hash, the RAM needed to refresh the data expands exponentially and brings your model to its knees. And now every slicer click has gone from 1 second to update your visuals to the better part of 10 seconds. And your audience is screaming that the model is “unusable”.

My Recommended Practice

The final step in every query I build that loads to a destination (an Excel Table or the Data Model) is to correctly define my Data Types, as sketched after the list below. This does two things for me:

  1. It ensures that I never accidentally load a column with an undefined Data Type. (In the case of Dates, they load to Excel as numbers, and to the Data Model as text!)
  2. It ensures that I’m cutting down to the minimum number of decimals I’ll ever need for my analysis.
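
Here’s roughly what that final step looks like (a tiny stand-in table; in a real query this is simply the last step before load):

    let
        Source = #table(
            {"Date", "Customer", "Units", "Sales"},
            {{#date(2023, 1, 15), "A. Smith", 0.5, 1.7}}
        ),
        // The last step before loading: explicit types for every column
        Typed = Table.TransformColumnTypes(
            Source,
            {
                {"Date", type date},
                {"Customer", type text},
                {"Units", Currency.Type},    // Fixed Decimal tolerates the half-muffin
                {"Sales", Currency.Type}
            }
        )
    in
        Typed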

Does it take more processing time? Maybe marginally. But does it future-proof my solution to keep it performant? Definitely. And it ensures the time to do so happens in the refresh, not when the model is being used for analysis.

Some Final Thoughts

The issues you saw here with memory also affect DateTimes in a big way, as they are quite similar to decimal numbers, where time is represented as the fraction of a day.
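
One common mitigation (my suggestion here, not something the tests above covered) is to split a DateTime into separate Date and Time columns, so each compresses over far fewer unique values:

    let
        // Hypothetical table with a single [Created] datetime column
        Source = #table({"Created"}, {{#datetime(2023, 1, 15, 14, 30, 0)}}),
        AddDate = Table.AddColumn(Source, "Date", each DateTime.Date([Created]), type date),
        AddTime = Table.AddColumn(AddDate, "Time", each DateTime.Time([Created]), type time),
        Result = Table.RemoveColumns(AddTime, {"Created"})
    in
        Result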

The article touches on some things from a much bigger topic: how to optimize a Power Pivot/Power BI data model. I actually teach a full-day course on this topic for CPABC, where we cover how the Vertipaq engine compresses its data, and why you want to avoid calculated columns and fall in love with measures. We make extensive use of Power Query in this course to reshape tables (applying correct data types along the way) so that they can be consumed by the data model in a more efficient manner. We dissect a poorly built model, test its memory, and then rebuild it in stages, seeing how our changes made an impact.

And if you’re looking for tools to help with this process… stay tuned to this blog. I’m hoping to have an announcement on that front within the next couple of months.