Column Quality and Distribution Features are Finally Here

If you've worked with Power Query in Power BI, you've already seen the Column Quality and Distribution features.  But I'm super excited to say that they've finally arrived in the Insider build of Excel!

What are the Column Quality and Distribution Features?

If you're not familiar with these, let's break them down a bit:

Data Quality Indicators in Column Headers

In its simplest form, you'll see that you now get an indicator in the header of each column, as shown here:

Column quality in Power Query Headers

Notice that red bar at the top of column C?  That's happening because there is an error in the column!  This is SUPER awesome, as we've been teaching for years that you have to scroll down looking for errors each time you change from text to a numeric data type.  This should really help, as the column header will make it obvious that something went wrong.  All green?  You should be good to go.  See red?  You've got potential problems to deal with.  (To be fair, you may still get errors after loading, as it is working with a data preview, but still, this will be a BIG help.)

The nice thing about this is that it is always on; nothing needs to be set up, as it's enabled by default.  (And as far as I know, you can't turn it off - but then why would you want to?)

The Column Quality Feature

This one is enabled by going to the View tab and choosing "Column Quality", resulting in a view that looks like this:

Power Query Column Preview Settings

Which adds 3 new lines to the top of our columns:

Column Quality in Power Query

Now, I don't plan on leaving this on all the time. However, the cool thing here is that I can very quickly identify that 20% of my data rows in column C are errors.  And while I don't have any empty rows, it would show me that at a glance if I did.

The Column Distribution Feature

The idea behind the Column Distribution feature is to give you a quick glance at the distribution of values in your columns.  Is there a single value that shows up more than others, or is the data uniformly distributed?  This can help you quickly identify if the data has any consistency to it or not:

Column Distribution in Power Query

Notice that in Column a, we have 3 distinct values (b, d, e), but only one value that is truly unique (b only occurs once in the entire data set).  Meanwhile, in column b, each value occurs exactly once in the data set.  On a data set this small, this isn't going to be super insightful. But say you are pulling in data that spans your monthly summaries for the last year. If suddenly you see 13 values where you are expecting 12... that's a good indicator you might want to look a little deeper.

Something else that is worth noting is that you don't have to keep the Column Quality on to get the Column Distribution chart.  These features can be used together or individually.

The less obvious Column Profile Feature

The final checkbox in the Data Preview menu is the Column Profile feature.  This one is very cool, as it only activates when you select an entire column.  When you do so, it takes over the preview window, providing most of the information from the previous two settings.

Column Profile in Power Query

We've got even more stats (although I wish it also showed the % of values like in Column Quality), as well as the value distribution chart.

What I love about this is that it can be called up on-demand by selecting the column, but doesn't get in my way when I don't need it.

My Thoughts on this Feature

I've been waiting for this to hit Excel for over a year now, and am SUPER happy to see it finally show up.  Personally, I'll be running with Column Profile (the last option) selected, but not Column Quality or Column Distribution. The reason is pretty simple... I want to minimize the amount of info between me and my data.  The Column Profile feature will give me what I need in an on-demand fashion when I see the data colours change at the top of my column. 🙂

How do I Get These Features?

This feature is so hot and fresh that it showed up for Office Insiders this morning (v1910, build 12112.x), so you need at least that version to get these features.  Hopefully it will start rolling through the regular update channels next month!

 

Do Data Types Matter in Power Query?

One of the things that I find a bit misunderstood in Power Query is whether or not Data Types matter. I mean, I’m sure everyone agrees to some point that they do, but do you know just how much they matter, and why?

Over the weekend, I received an email from one of our loyal students which read, in part:

I am going under the assumption that in Power BI nothing is free. Steps, calculated columns, measures and so on could be cheap but they are never free. It is with this premise in mind that I pose the following theory.

Users should sparingly use Change Type in Power Query. It is fine to Change Type when converting Date/Time to Date, Decimal Number to Whole Number, and others that actually change the value. It is a waste of resources to Change Type from Decimal or Whole Number to Currency. Even if you want the column to look like Currency, you can apply that format in the Data Model view and save one Power Query step.

On the face, this theory seems somewhat reasonable. After all, adding additional steps is bound to add some overhead to the Power Query process in most cases. And let’s be honest, in small models, it may make no difference to you at all. But when things get bigger…

Data Types vs Formats

To understand why Data Types matter, we need to get something very clear right off the bat: Data Types and Formats are not the same thing. Data Types dictate what kind of data you have and determine how much memory is allocated to store a value. Formatting, on the other hand, tells you how you want the values to appear. To see this in practice, have a look at the following values in Power Query, where the Data Type has been set to Currency:

Column set to Currency Data Type

Notice that the only place you see a $ sign is in the header. And see how the decimal numbers do not line up? I can tell you from many years of teaching accountants that this drives them bonkers. That 1.7 needs to be 1.70! But you don’t fix this here; you do that in the Excel worksheet, Power Pivot model or Power BI visual.

The key to remember here:

  • In Power Query, you define Data TYPES
  • In Power Pivot, Excel or Power BI, you define Data FORMATTING

Excel’s Data Types

In the classic days of Excel, we only had four data types to worry about. Those were:

  • Text,
  • Numbers,
  • Blanks, and
  • Errors

(Errors, although represented by textual patterns like #REF!, #N/A, #NAME?, and the like, actually count as a distinct data type.) Today it gets a bit more complicated with the Stock and Geography data types, as I blogged about here, but historically anything else was just a number formatted with a number formatting string. Examples of these include:

  • Date: format it as a number, and you’ll see the number of days since Jan 1, 1900
  • Time: a little trickier, but this is just a fraction of a day where .25 is 6:00 AM, .5 is 12:00 PM and .75 is 6:00 PM. And if all you have is the time, and you format it as a DateTime, you get that time on Jan 1, 1900.
  • True/False (Boolean): while this shows in the cell as TRUE/FALSE, which looks like text, if you multiply them by 1 you’ll see that TRUE equates to 1, where FALSE equates to 0.
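You can check the fraction-of-a-day math for yourself. Here's a quick sketch in Power Query's M language (my own illustration, not from the original workbook):

```m
let
    // 6:00 AM is one quarter of the way through the day
    MorningAsNumber = Number.From(#time(6, 0, 0)),  // 0.25
    // ...and 0.75 of a day converts back to 6:00 PM
    EveningAsTime = Time.From(0.75)
in
    [MorningAsNumber = MorningAsNumber, EveningAsTime = EveningAsTime]
```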

But that was Excel, where data types and number formats were the same thing. That’s not Power Query.

Power Query’s Data Types

Unlike Excel, which has a pretty short list of data types, Power Query seemingly has a ton:

  • Numeric Data Types:
    • Decimal Number
    • Currency (Fixed Decimal)
    • Whole Number
    • Percentage
  • DateTime Data Types:
    • Date/Time
    • Date
    • Time
    • Date/Time/Timezone
    • Duration
  • True/False (Boolean)
  • Text
  • Others (Binary, Tables, Records, Lists, and more)
  • Any (the dangerous “undefined” type which allows the application to determine the correct data type)

The key to recognize is that each of these data types is DISTINCT, meaning that each is different in some way from every other. (While we won’t get into it in this post, unlike Excel, which implicitly converts data from one type to another, Power Query requires explicit type conversion via functions like Number.From() or Date.ToText(), which can be frustrating at times!)
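As a hedged sketch of what that explicit conversion looks like in practice (my own example values, not from the original post):

```m
let
    TextValue = "100",
    // TextValue + 1 would error: M will not implicitly coerce text to a number
    Converted = Number.From(TextValue) + 1,  // 101
    // Going the other way is just as explicit:
    AsText = Date.ToText(#date(2019, 9, 30), "yyyy-MM-dd")  // "2019-09-30"
in
    [Converted = Converted, AsText = AsText]
```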

For the purpose of this post, however, I want to focus on the first three numeric types: Decimal Number, Currency and Whole Number, and ask the question: Does the Data Type matter?

Illustration Background

Let’s have a look at an example. For the illustration, I set up a Power Query chain that looks like this:

Viewing our Query Chain

The Data table had 100,000 randomly generated [Sales] records that vary between 1.27317262341058 and 100000.017761279, and randomly generated [Customers] that vary between 1 and 1000.

The only change I made in the Whole, Currency and Decimal types was to set the Data Type for the [Sales] column accordingly. In the Customers table, I removed the [Sales] column and duplicates from the Customers column.

I then created 3 quick measures to sum the column up, and dropped them on a Pivot:

Summing up our 3 measures

I don’t think it would surprise anyone here that the [Whole Sum] is slightly different than the [Currency Sum] and [Decimal Sum]. After all, the numbers were rounded at the source before being added up. And so far, we can see that the [Currency Sum] and [Decimal Sum] look pretty much the same. At least until we expand them a bit:

Decimal differences between Currency and Decimal Data Types

The only thing that should be a surprise here is that Currency only holds up to four decimals, not two as most people expect. This actually makes sense when you start thinking about foreign exchange transactions, and how they are always carried to four decimal places.
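For reference, setting a column to Currency in the UI just writes a Table.TransformColumnTypes() step behind the scenes. A sketch of what that looks like (the "Sales" column name comes from my example; Currency.Type is the Fixed Decimal type):

```m
let
    Source = ...,  // placeholder: your source table goes here
    // Typing the column as Currency (Fixed Decimal) limits the stored
    // values to four decimal places
    Typed = Table.TransformColumnTypes(Source, {{"Sales", Currency.Type}})
in
    Typed
```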

But is that it? Is a Data Type just about rounding? As it turns out the answer to that is both yes and no.

Testing the Model Memory

The next step here was to test the model memory and see how Power Pivot is storing the data. To that end, here’s a little table that shows exactly that:

Table illustrating how Power Pivot is storing the data

Before we get into this, I want to call out something important here. The Data Type that is showing has been read from Power Pivot. Notice that Whole Number shows as such, consistent with Power Query. But the Currency and Decimal tables both show Decimal. As it turns out, Power Pivot doesn’t make a distinction between these two data types. However, the distinction between these two Data Types matters to you anyway, as I’ll explain.

So, what does that all mean?

In the Whole Number table, I had rounded off all the decimals. This left 63,815 unique values. And because Power Pivot compresses based on unique values, it deals with this column very well, resulting in a total of 260.54 KB to store these values.

In the Currency table, I effectively rounded all the values off to four decimal places. This left 99,996 unique values in the table (only 4 values were repeats). Despite this, Power Pivot did a good job of compressing the values in memory, resulting in 390.75 KB to store them.

Then we get to the Decimal column. There are only 4 more unique values than in the Currency column, but the memory takes a colossal 5,234.47 KB to store the values, vs the 390.75 KB of Currency. What the heck is going on?

The answer lies in the fact that Power Pivot has to carry all of those decimal places, and once it does, it flips to storing the data using Hash Encoded memory. That’s a deep topic for another time, but suffice it to say that this is a bad thing, as Value encoding is much more efficient. (Hash is generally used for Text, and Value for… you guessed it… values!)

Interestingly, if you round the Decimal Number to 5 decimals you end up with 99,999 unique values in the column and a very minor change to the memory used. But if you round it to 4 decimals, the memory of the Decimals column compresses the same as Currency, and the memory need drops to the same 390.75 KB.

Table illustrating memory requirements of the different Data Types
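If you want to apply that rounding yourself while keeping the Decimal Number type, a one-step sketch (assuming a [Sales] column, as in my example):

```m
let
    Source = ...,  // placeholder: your source table goes here
    // Rounding to 4 decimals cuts the precision Power Pivot must store,
    // letting the column compress as well as the Currency version
    Rounded = Table.TransformColumns(
        Source, {{"Sales", each Number.Round(_, 4), type number}})
in
    Rounded
```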

Why Model Memory Matters

And this is the final part of the equation to me. Not all time is created equal. My users will accept a 2-minute refresh of the data model. They might say it takes time, but +/-30 seconds in a refresh isn’t anything that they’ll get overly concerned with. They’ll click Refresh, go grab a coffee, then come back to work with the file.

But if they click a slicer and it takes 10 seconds to redraw the Pivot Table or Power BI visuals? Watch out! They’ll claim the model is too slow, ineffective, wrong and useless. I’d MUCH rather push resource consumption into the initial refresh in order to build a memory-efficient model that performs well when being used for analysis.

Wait, what happened to the Data Types?

To be fair, Power Pivot’s compression mechanism is more about unique values and the length of precision than it is about Data Types. But it’s up to you to choose the correct Data Type to future-proof your model and make sure that the compression algorithms can be applied.

But due to the way Excel has always worked, the way the Data Types are named, and the fact that most modellers don’t have a clear understanding of Formatting vs Data Types… users are more likely to pick Decimal over Currency. I mean, why would I ever format my units as Currency? (In Power BI this is called a Fixed Decimal, although it still shows with the currency symbol.)

We need to recognize that lagging decimals really do happen in business. Let’s look at that Units column for a second. Naturally we never sell a partial unit… or do we? I’ll give you 3 examples of things I’ve seen in the Food and Beverage industry that forever changed my opinion on this:

  1. We had a couple of customers who insisted that they be able to purchase half a muffin. I’m not even kidding here. The killer for me was that it only showed up every 10,000 rows of transactions or so, meaning that the column often got set to Whole Number incorrectly.
  2. The salesperson agrees to provide those 300 units for $5,000. No big deal, except that they monkey with the price field in your database to make it work, and suddenly you have a sales price of $16.666667. Fortunately, this one usually gets rounded via setting it to a Currency Data Type, as that’s logical. But what if you set it to Decimal or left it undefined?
  3. Things go the other way and the customer agrees to a package price of $5,000 for something that usually carries a price of $17.00 per unit. It gets rung into the system with a reverse-engineered quantity of 294.1176470588235 to get it exactly right.

The last is the real kicker as the memory suddenly flips from Value to Hash, the RAM needed to refresh the data expands exponentially and brings your model to its knees. And now every slicer click has gone from 1 second to update your visuals to the better part of 10 seconds. And your audience is screaming that the model is “unusable”.

My Recommended Practice

The final step in every query I build which loads to a destination (an Excel Table or the Data Model) is to correctly define my Data Types. This does two things for me:

  1. It ensures that I never accidentally load a column with an undefined Data Type. (In the case of Dates, they load to Excel as numbers, and to the Data Model as text!)
  2. It ensures that I’m cutting down to the minimum number of decimals I’ll ever need for my analysis.
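In practice, that final step is a single Table.TransformColumnTypes() call covering every column. A sketch with hypothetical column names:

```m
let
    Source = ...,  // placeholder: the output of your previous steps
    SetTypes = Table.TransformColumnTypes(
        Source,
        {
            {"Date", type date},            // loads as a true date, not text
            {"Customer", type text},
            {"Units", type number},         // Decimal: partial units do happen!
            {"Price", Currency.Type},       // Fixed Decimal: four decimal places
            {"Invoice Number", Int64.Type}  // Whole Number
        }
    )
in
    SetTypes
```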

Does it take more processing time? Maybe marginally. But does it future-proof my solution to keep it performant? Definitely. And it ensures the time to do so happens in the refresh, not when the model is being used for analysis.

Some Final Thoughts

The issues you saw here with memory also affect DateTimes in a big way, as they are quite similar to decimal numbers, where time is represented as the fraction of a day.

The article touches on some things from a much bigger topic: how to optimize a Power Pivot/Power BI data model. I actually teach a full day course on this topic for CPABC, where we cover how the Vertipaq engine compresses its data, and why you want to avoid calculated columns and fall in love with measures. We make extensive use of Power Query in this course to reshape tables (applying correct data types along the way) so that they can be consumed by the data model in a more efficient manner. We dissect a poorly built model, test its memory and then rebuild it in stages, seeing how our changes made an impact.

And if you’re looking for tools to help with this process… stay tuned to this blog. I’m hoping to have an announcement on that front within the next couple of months.

Power Query Intellisense and Highlighting in Excel

I'm super excited that Power Query Intellisense and Highlighting features are finally here in Excel!  It's been a long time coming, with these features debuting in Power BI Desktop months ago.

Where do Intellisense and Highlighting Show up?

Let's look at all three places that Intellisense and Highlighting are now exposed in Excel's Power Query.

Intellisense and Highlighting in the Formula Bar

The first thing you should notice about the image below is that Excel now adds colour highlighting to the text in the formula bar.  "Text" values are shown in red, keywords in blue, and data types in green:

Intellisense and Highlighting in Excel's Power Query Formula Bar

In addition, you can see Intellisense showing in the picture above as well.  In this case the cursor is immediately after the opening parenthesis character.  Unlike in the past, we now get a nice syntax popup box that tells us what needs to come next in this function.

Intellisense in Custom Columns

This is the place where I really wanted to see this happen.  Here's a look at Intellisense working live as I started to build a column using a Text function:

Intellisense in Excel's Custom Column Dialog

Pretty sweet, as this actually starts to display the options.

(Now, to be fair, in my build of Excel, this feature only seems to consistently work if I rename the column first, then tab into the formula builder.  If I just click straight into the formula, bypassing the tab order, it doesn't seem to kick in.)

Syntax Highlighting in Custom Columns

How about syntax highlighting?  The formula here is only meant for demo purposes, but you get the idea of how it can look:

Syntax Highlighting in Excel's Custom Column Dialog

Blue keywords for if, then, else.  Hard coded values (like 3) show up in green like data types, and text is showing in red.

Intellisense and Highlighting in the Advanced Editor

And finally, Intellisense and highlighting work in the Advanced Editor as well.  Here's a look at the syntax highlighting:

Syntax Highlighting in Excel's Advanced Editor

And here's what happens when we start manually adding a new step, which kicks in the Intellisense in the Advanced Editor window:

Intellisense in Excel's Advanced Editor

How do you get these features?

First, you'll need to be on Office 365.  Sorry, Excel 2016 and 2019, but my understanding is that these new features are only coming to subscription users.  (That's another reason that - in my opinion - you should never buy another 4-digit-numbered release of Excel ever again.)

If you are on subscription, you get them with your regular updates.  This feature set hit the Insider build channel last month in version 1907.  It has also now been deployed to those on the Monthly Targeted channel running version 1907, build 11901.20080 or newer.

If you're not on the Monthly Targeted channel, you'll need some patience, as it's coming.  Just keep checking those updates!

Microsoft Business Applications Summit Recordings

Ken had a blast in June going down to Atlanta for the Microsoft Business Applications Summit (MBAS). There, he and co-author Miguel Escobar led an in-person workshop together for the first time. Ken also presented a breakout session on best practices for spreadsheet modeling, data shaping and data analysis. However, his real highlight was helping people with their Excel questions at the Ask the Experts booth.

Ken at the Microsoft Business Applications Summit 2019

Ken Puls hanging out at the Ask The Experts booth with his 'M is for Data Monkey' co-author Miguel Escobar and Excel legend Bill Jelen (aka Mr. Excel).

At MBAS, Microsoft also unveiled their new Power Query website. It's wonderful to finally have an official Microsoft site dedicated to this amazing tool. In addition, we're extremely proud that the Excelguru blog, along with the books and website created as part of our Power Query Training project, are listed in the Resources section!

Microsoft Power Query Website

We are thrilled to be included on the Resources page of the new Power Query official website!

On-demand Session Recordings

If you weren't able to make it to MBAS, or didn't get to all the sessions you wanted to, Microsoft has tons of on-demand recordings available for FREE! This is an amazing resource to help you continue to learn about and explore the Microsoft ecosystem. You can check them all out at the Microsoft Business Applications Summit website. Microsoft broke the sessions down into 4 streams: Power BI, PowerApps, Microsoft Flow, and Microsoft Dynamics.

Excel Sessions

Microsoft included Excel sessions in the Power BI stream under the "Other" category. Thus, they may be a bit tricky to find. Luckily, the Excel team gathered together a list of available recordings that we wanted to share. This includes Ken's session with David Monroy of Microsoft on data modeling and analysis best practices:

Power Query Sessions

The Power Query sessions at Microsoft Business Applications Summit were also part of the Power BI stream. Hence, we've compiled a list of available recordings to make them easier to find:

Unfortunately, we do not have a recording of Ken and Miguel's workshop on Working with Data in the Power Platform and Excel.

Microsoft Business Applications Summit 2020

Mark your calendar - next year's event will be held in Anaheim, CA on April 20 and 21. Additionally, you can sign up to receive updates for MBAS 2020 info on the event site. Perhaps we'll see you there!

Self Service BI with Excel and Power BI

This week Ken is at the Microsoft Business Applications Summit in Atlanta. However, we're super excited that he'll be back on home turf in July for our next public training session. The 3-day Self Service BI Boot Camp in Vancouver will change your company's reporting game forever!

Is Self Service BI a Double-Edged Sword?

Matthew Roche, a member of Microsoft's Power BI CAT Team and noted sword enthusiast, recently posted about a trend he's seeing. His global customers often describe self service BI as a "double-edged sword." After thinking about this comparison, he felt the simile held up. But perhaps not in the way you think!

In the post on his blog, Matthew summed it up nicely. Having two sharp edges is only dangerous to the wielder if they don't know how to effectively use each of the distinct and complementary edges together.

Approaching Self Service BI for the First Time

Perhaps you've been thinking about implementing some self service BI solutions within your organization. And you've been hearing a lot of buzz about Power BI. But you're unsure where to start or how Power BI fits in with your existing data platforms and reporting tools.

Our Self Service BI Boot Camp, with 3 full days of immersive hands-on learning, might be just the place to begin. Guided by the Excelguru, Ken Puls, you'll work with both Excel and Power BI to build some really cool self service BI solutions! We'll not only show you which tool is right for which kind of job, but how to use them together in one solution.

What will the Boot Camp Cover?

We know that data is not usually stored in nicely curated databases and often - even when it is - the data analyst doesn’t have access to it. Instead, we must piece together data provided in text files, Excel files, web pages, and even the body of emails to build the business intelligence solutions we need. Until now this has been a painful process with a great deal of cut and paste effort, and updates have been tedious and error prone. That stops with this workshop!

Building Modern BI Solutions

In the Self-Service BI Boot Camp we'll dive into:

  • a deep exploration of Power Query for gathering, cleaning, and preparing your ugly data
  • the benefits, concepts, and key terminology of Dimensional Modeling
  • an introduction to DAX, including:
    • how to create a variety of DAX measures
    • how to control their Filter Context
    • understanding how DAX measures are calculated

Learn how CALCULATE works in our Self Service BI Boot Camp

  • calendar intelligence, such as:
    • building calendar tables
    • using the “Golden Date” pattern
    • extending our model to report based on our own year-end
  • how to leverage the strengths of Power BI for sharing and reporting
  • specific features of Excel and Power BI that every analyst should know, plus:
    • which tool to use for which job
    • how they can be used together
  • recommended best practices for boosting the performance of your self service BI solution

You can read more about the class on the Excelguru website. Likewise, check out the course outline to see what will be covered in each of the 3 days.

Top 5 Reasons to Attend

  1. Our Self Service BI Boot Camp is loaded with hands-on, practical experience. As Ken's friend, Thomas LaRock, recently commented on Twitter, "Telling isn't teaching." Our philosophy is that you need to actually work with a tool in order to really learn it.
  2. The small class size provides a more intimate learning environment. Working with a small group allows Ken to interact directly with attendees, answering your questions and fielding discussions as they come up. You get to pose questions and discuss scenarios with one of the world's leading self service BI experts. However, this means that spots are limited, so don't miss out on your chance to sign up.
  3. There's no better time to come to Vancouver than in July - it's a beautiful city all year round, but is in its prime this time of year. That being said, while we can't 100% guarantee great weather, we CAN guarantee you'll get some GREAT training!
  4. Early registration pricing is now in effect. You can save $200 if you sign up before June 24, 2019. Simply go to our website and enter the following coupon code at checkout: SSBIJULY2019.
  5. Registration includes some valuable resources to take with you, so you can refer to them later. You'll get a copy of Ken's slides, many of which contain handy recipes to walk you step-by-step through the techniques. Additionally, you receive copies of all the class example files, including completed versions for later review. These examples are based on real-world scenarios and provide you with techniques that you can begin applying to your own data right away.

Unpivot Subcategorized Tables with Power Query

So what are you waiting for? Come and join us to revolutionize your reporting process!

 

Creating Two-Tiered, Multi-Frequency Period Cycles

Sometimes, what should be easy code to write has unexpected pitfalls, resulting in opportunities for new learning. I needed to generate a one- or two-tiered stream of date values with multi-frequency period cycles: monthly vs annual.

For example, after installing a piece of equipment I wanted to schedule 12 monthly calibrations, beginning with the month after installation, followed by 3 annual calibrations, beginning 6 months after the last monthly calibration.

This is a table of schedule parameters that defines a few sample scenarios:

Example table outlining desired multi-frequency period cycles

Create a Simple List using List.Generate()

My plan was to create generic schedule scenarios, based on specific types of equipment and use period parameters relative to the installation month.

My first thought was to use Power Query’s List.Dates() function to create the sequence of dates, but List.Dates() only increments by combinations of days/hours/minutes/seconds, which is not helpful when you need to increment by months. Consequently, I turned to the List.Generate() function to generate the list of periods.

First I wrote some M-Code to test the concept using hard-coded parameters:

Query: JustCreateList
M Code:

M code for JustCreateList

And the results were this:

Results for JustCreateList query

Great! That was easy. If I can make a list and add it to each row of a table, then I can expand each list and build cycle dates.
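Since the original M code above only appears as an image, here's a reconstructed sketch of the concept (the parameter names are mine, hard-coded just like the test):

```m
let
    InstallDate = #date(2019, 1, 15),
    Periods = 12,   // number of cycles to schedule
    Frequency = 1,  // months between cycles
    CycleDates = List.Generate(
        // start one frequency-interval after the install date
        () => [Period = 1, Date = Date.AddMonths(InstallDate, Frequency)],
        // keep going until we've generated the requested number of periods
        each [Period] <= Periods,
        each [Period = [Period] + 1, Date = Date.AddMonths([Date], Frequency)],
        // return just the date from each state record
        each [Date]
    )
in
    CycleDates  // 12 monthly dates: Feb 15, 2019 through Jan 15, 2020
```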

Use List Values to Create Dates

Query: BuildDates_HardcodedParams
M-Code:

M code for BuildDates_HardcodedParams

And the results are:

Results for BuildDates_HardcodedParams query

So far, so good. (I may leave early today.)

Use Column Values as List.Generate() Parameters

Next, let’s use column values to drive the Lists. What could be easier, right? Here’s one of my many failed attempts:

Query: BuildDates_ColValParams_V1
M-Code:

M code for BuildDates_ColValParams_V1

And the results are:

Error generated by BuildDates_ColValParams_V1

Wait! What??? Where are my lists? And, what “field access”?

I Discover “Internal Context” and “External Context”

I won’t go into all of the iterations I tried to get that darn code to work. Suffice it to say that I tried every conceivable variation of “this”, “each”, “_” and anything else I could think of. All to no avail… Same error message.

Now, I could have built a separate function (which I did, and it worked):

Query: fnCreateList
M-Code:

M code for fnCreateList

I also could have embedded a function within the M-Code (which I did, and it worked):

Query: SingleTier_EmbeddedFunction
M-Code:

M code for SingleTier_EmbeddedFunction

BUT, I wanted a concise, elegant solution. I wanted to just put the List.Generate() function in my code and reference the table columns, just like I would with an Excel function (which I did and…well…you saw the results). I needed help.

So, I posted on Ken’s forum. And who do you think volunteered to assist me? Miguel! (Thank you, Miguel.) Once he established that my interest in referencing column values in List.Generate() had gone way past idle curiosity and quest…all the way to “mission from god”, he gave me a working example and a concise explanation.

Handling Internal and External Context

Because the List.Generate() function is essentially recursive, it has to look at its own sequential results and test them against its internal limits. It needed explicit code to indicate when it should use external context (table references) and when it should use internal context (intermediate function results). I won’t pretend that I understand the “why” of the required syntax, but I quickly picked up on the “how”.

Here’s the working M-Code that uses column values in the List.Generate() function to create the first date sequence:

Query: SingleTier_EmbeddedInnerContext
M-Code:

M code for SingleTier_EmbeddedInnerContext highlighting (listval)

By simply creating a dummy parameter (listval), the List.Generate() function automatically populated it with its own sequential results and it understood that the column references pointed to the Table! I could have named that parameter “moonchild” or “Sue” or anything else. All that mattered is that it had something to populate.
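Putting that together, here's a reconstructed sketch of the pattern (column and parameter names are mine, since the original code is only shown as an image). The outer each binds the current row, while naming the generator's parameter (listval) keeps the internal and external contexts separate:

```m
let
    Source = ...,  // placeholder: the schedule parameters table
    AddCycleDates = Table.AddColumn(
        Source, "CycleDates",
        each List.Generate(
            // naked field references like [InstallDate] fall through to
            // the external context: the current table row
            () => [Period = 1, Date = Date.AddMonths([InstallDate], [Frequency])],
            // listval carries the internal context: the generator's own
            // intermediate results
            (listval) => listval[Period] <= [Periods],
            (listval) => [Period = listval[Period] + 1,
                          Date = Date.AddMonths(listval[Date], [Frequency])],
            (listval) => listval[Date]
        )
    )
in
    AddCycleDates
```

Expanding the new list column then yields one row per cycle date.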

Now, my results were valid:

Results for SingleTier_EmbeddedInnerContext query

Over the first major hurdle and it was a BIG one!

Combining Lists to Generate Two-Tiered Multi-Frequency Period Cycles

The rest of the solution was relatively easy after that. I needed to allow for a second tier of cycle dates. The basic steps were:

  1. If Tier1 begins after the Install Date, make a placeholder for the Install Date… a period zero.
  2. Create the Tier1 date sequence.
  3. If there’s a Tier2 sequence, create it.
  4. Sequentially append the constructed cycle date sequences.
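In M terms, the final append (step 4) might look something like this sketch (the names and placeholder logic are my own illustration, not the actual solution):

```m
// Hypothetical sketch of step 4: append the optional period-zero
// placeholder, the Tier1 dates, and the optional Tier2 dates.
AllCycleDates = List.Combine({
    PeriodZero,   // {[Install Date]} when Tier1 starts after install, otherwise {}
    Tier1Dates,   // built with List.Generate() as shown earlier
    Tier2Dates    // {} when there is no second tier
})
```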

Since that involves M-Code not covered in this blog post, I’ll just post the final solution:

Query: TwoTier_EmbeddedInnerContext
M-Code:

M code for TwoTier_EmbeddedInnerContext highlighted (listval)

And here are some of the results of my table containing the desired two-tiered, multi-frequency period cycles:

Final table displaying our multi-frequency period cycles

Power Query/Get and Transform has steered my career into new, creative, challenging, and rewarding directions. I’d like to thank Ken for giving me the opportunity to share what I learned about internal/external context and my learning process. I welcome any and all questions and comments.

-Ron Coderre

 

Task Tracking with Power Query

Did you know Power Query can be used as a task tracking tool? This might sound quite unusual, but the method described here has been used to solve a real business case. The example I will use is rather simplified but still close to reality, and will demonstrate how to build task tracking with Power Query.

Laying out the Scenario

Vicky is the manager of a small team that deals with customer questions on various topics. One of her duties is to distribute these questions among her subordinates. After that, each of them should take action and report the status of each task.

The problem is – how can each employee see what tasks are assigned to him/her and fill in the respective information for each task? At the same time, Vicky should at any moment be able to assign a new task and review the statuses of old ones. This is the table Vicky needs:
Task Tracking with Power Query

Unfortunately, she has no other tool at hand except Excel. Luckily, she can set up task tracking with Power Query right in Excel, which could work perfectly in this case.

Setting up Task Tracking with Power Query

So let's start building the solution.

1. Load the left table (in this example, called Filled by Manager) into Power Query.
Manager's table to assign tasks

2. Next, create one query for each employee by filtering the Employee column.
Create individual employee queries
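Behind the scenes, each employee query is just a reference to the manager's table with a filter applied. A minimal sketch (the table, query, and column names here are assumptions) could look like:

```m
// Hypothetical sketch: one query per employee, filtering the manager's table
let
    Source    = Excel.CurrentWorkbook(){[Name = "FilledByManager"]}[Content],
    IvanTasks = Table.SelectRows(Source, each [Employee] = "Ivan")
in
    IvanTasks
```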

3. Load each Employee table into a separate Excel sheet. (Of course, these could be in different files linked to the source table.)
Sample employee table showing tasks assigned

4. Then, create a table for each employee to fill in the actions and statuses.
Employee's task tracking worksheet

You can see in the above picture what each employee will have in his/her worksheet - a green table on the left with the tasks assigned to them, and a yellow table on the right where he/she has to fill in the respective information.

Creating the Filled By Employees Table

5. Load all the Employee tables into Power Query.
Load all the employee tables into Power Query

6. Append them in a new query (in this example, called Statuses).
Append all the employee tables into new Statuses query
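The append itself is a one-liner in M. A sketch, assuming the individual status queries are named after the employees:

```m
// Hypothetical sketch: stack the employees' status tables into one query
Statuses = Table.Combine({IvanStatuses, MariaStatuses})
```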

You are probably guessing what the next step is – load the Statuses query into Excel, right next to the Filled By Manager table.

However, the result is not what we would expect.
The Filled by Manager table is not matching the newly loaded Fill by Employees table

Note that the first row of the Manager’s table shows a task assigned to Ivan on 27.01.2019, but row 1 of the Employee’s table shows a task assigned to Maria on 09.02.2019.

In order to fix this mess, we need one additional query.

Building the Task Code Query

7. Once again, load the Manager’s table into Power Query and remove all columns except for Task Code.
Task Code Column

Task Code is a unique identifier of each task. In this example, it is simply composed of the employee's name and the number of occurrences of this name in the register up to the respective row. In Excel language, the formula is:
Use COUNTIF to create unique task identifiers

The trick is that we fix (anchor) the first row of the range in column F (containing the employees' names), while the end row is left relative, so the count grows as the formula is copied down.
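Based on that description, the formula probably looks something like this (the exact cell references are an assumption, with the names in column F and the formula entered on row 2):

```excel
=F2 & COUNTIF($F$2:F2, F2)
```

Because only the start of the range is anchored, copying the formula down makes the count grow row by row, so the second occurrence of "Ivan" becomes Ivan2, the third Ivan3, and so on — giving each task a unique code.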

8. Merge the Register Employees and Statuses queries together.
Merge the Register Employees and Statuses tables

9. Finally, expand the table and voila - it is in the required order. The only thing left is to load it back into the Manager’s table.
Final table for task tracking with Power Query
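Steps 8 and 9 could be sketched in M like this (the query and column names are assumptions):

```m
// Hypothetical sketch: left-join the Statuses query onto the ordered
// task codes, then expand the columns filled in by the employees.
Merged = Table.NestedJoin(
    RegisterEmployees, {"Task Code"},
    Statuses, {"Task Code"},
    "Statuses", JoinKind.LeftOuter
),
Expanded = Table.ExpandTableColumn(Merged, "Statuses", {"Action", "Status"})
```

Because the left table drives the row order, the expanded result comes out in the same order as the Manager's table.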

Now, any time she needs to, Vicky can refresh the Filled by Employees table and see the updated statuses of each task.

Likewise, each one of her subordinates can simply refresh the Manager’s table (the green one that is on left of his/her tab) to see any new tasks that have been assigned.

You could also automate the refresh operation with VBA. For more details, refer to Chapter 16 of Ken's M is for (Data) Monkey book.

Final Words

This article presents nothing new and unusual as a Power Query technique. What is new and unusual is the way Power Query has been used for solving a typical business problem. This is just additional proof of how powerful and useful this tool is.

You can find the example file here: Task tracking with PQ

Become a Data Master with Power Query

Ken is really excited to be teaching his popular Master Your Data with Power Query class for the first time in New York City! Join us this spring for a small group hands-on workshop and learn how to become a Data Master with Power Query.

Learn to become a Data Master with Power Query

Join Ken Puls for a live hands-on session in New York, NY on April 17, 2019.

What's so great about Power Query?

If there is one thing you need to learn in Excel today, Power Query is it. With Power Query, you can clean, reshape, and combine your data with ease. No more tedious cutting and pasting between multiple files. No more manually removing garbage rows or adding new columns. And no more repeating the same time-consuming steps whenever you need to refresh the data.

Instead, once you have your data the way you want it, all you have to do is click refresh and it will be ready to be loaded into the next day's/week's/month's/quarter's report. You can even schedule these refreshes to happen automatically!

What will be covered in the workshop?

The day will begin with a quick overview of Excel tables, PivotTables, and what makes "good" data. Next, Ken will show you how to import data from a wide variety of files, including Excel workbooks, CSV and TXT files, databases, and even entire folders. You'll be able to clean, transform, and refresh your data in Power Query with just a few clicks.

Ken will also show you how to append (or stack) data from multiple tables and 7 ways to merge (or join) tables without any VLOOKUPs. You'll be able to pivot data like this:

Pivoting Data with Power Query

and unpivot data like this:

Unpivoting Data with Power Query

But wait, there's more!

Ken will teach you some more advanced techniques using conditional logic. He will also give an overview of best practices for structuring your queries and query folding. Additionally, you'll receive a copy of the course slides to refer back to. Many of these slides contain handy recipes that will lead you step-by-step through the data transformation techniques.

Not only that, you will be able to directly ask a leading Power Query expert to help you with challenges you are currently facing with your own data. That kind of in-person access is invaluable!

How can Power Query help me?

A data wrangler spends the majority of their time just gathering, cleaning, and preparing data before it can even be used in a report, chart, or other data model. Instead, become a Data Master with Power Query and get hours of your time back. For example, Ken was able to help a workshop attendee automate a workflow in 30 seconds, saving them 6 hours per week!

Power Query is the data preparation tool of the future, not only for Excel but also for Power BI Desktop, Microsoft Flow, and more. Thus, everything you learn in this course is transferable to other technologies - giving you more bang for your buck.

How do I make the case to my boss for sending me to the course?

As Ken mentioned in a previous blog post, the cost of the course can look like a lot up front. This is especially true if you must also pay for travel, hotel, etc. But divide the $499 USD registration fee + any expenses by your hourly rate. You'll see a return on investment pretty quickly with the time you save using Power Query.

However, the real value of the training comes in when you look at what you can do for your company with that extra time. You can now focus on analyzing the data instead of preparing it. See, Power Query turns your data into information. It allows you to identify new opportunities, make better decisions, and add real value to your organization.

Where do I go to become a Data Master with Power Query?

Ken will be leading this full-day workshop on Wednesday, April 17, 2019 at the NYC Seminar and Conference Center in New York City. But this is a small group session so there are only a limited number of spots available. Register today to secure your chance to receive personal guidance from a world-class Power Query expert. Go to the Excelguru website to view the full course description and register online.

Cache Shared Nodes Fix is Live

At long last, we have confirmation that the Cache Shared Nodes Fix is live in Excel.  If you're not familiar with this issue, it's one of the most important changes implemented in Power Query in quite some time.  You can read more about the issue in my guest post on Rob Collie's blog here.

What versions of Excel will get the Cache Shared Nodes Fix?

The Cache Shared Nodes fix is available to:

  • Office 365 subscribers
  • Excel 2019 (non-subscription) versions

This leaves you with the inefficient multi-refresh challenge if you are using Excel 2010, Excel 2013 or Excel 2016.  My understanding is that Microsoft does not intend to back-port the fix to these versions.  What that means for you is that in order to get the fix, you will need to upgrade to a newer version.

Do I have the Cache Shared Nodes Fix?

You need to be running Excel 16.0.10726.* to have the update.  To check if you have it, go to File --> Account --> About Excel.  Your current version and build are listed at the top:

Office 365 Insider's Build

Excel 2019 Professional Plus (non-subscription)

How do I update my Excel 365/2019 to get the Cache Shared Nodes Fix?

For users of Excel 2019, make sure your Windows Update settings include the advanced option to get updates for other Microsoft Software.  If your version is not updated yet, it should come through on your next update cycle.

For users of Office 365, you should actually already have the fix in place.  If not, go to File --> Account --> Update Options.

(There is a possible exception for Office 365 if you're running on the Deferred Channel for updates.  If that's the case, you either need to get onto a more current channel, or... wait until the deferred channel also has the fix.)

Building BI in Excel Course

Are you interested in learning how to clean and shape data with Power Query, as well as how to model it using Power Pivot? Don’t know which of these mysterious skills to tackle first? Want to learn about building BI in Excel where you create refreshable and maintainable solutions?

Good news: Ken Puls will be in Wellington, New Zealand on February 25-26, 2019 leading a live 2-day, hands-on session covering these essential skills!

What does Building BI in Excel cover?

In Day 1, you’ll learn how Power Query can clean up, reshape and combine your data with ease – no matter where it comes from. You can convert ASCII files into tables, combine multiple text files in one shot, and even un-pivot data. These techniques are not only simple, but an investment in the future! With Power Query’s robust feature set at your fingertips, and your prepared data, you can begin building BI in Excel using Power Pivot. The best part is that these dynamic business intelligence models are refreshable with a single click.

Un-pivoting Subcategorized Data

Un-pivoting subcategorized data is easy with Power Query

Day 2 focuses on Power Pivot, a technology that is revolutionizing the way that we look at data inside Microsoft Excel. Power Pivot allows you to link multiple tables together without a single VLOOKUP statement. It also enables you to pull data together from different tables, databases, the web, and other sources like never before. But this just scratches the surface! We'll also focus on proper dimensional modeling techniques and working with DAX formulas to report on your data the way you need to see it.

Top Selling Servers Report

Build dynamic reports that are easy to filter and refresh

Who is this course for?

Building BI in Excel is for anyone who regularly spends hours gathering, cleaning and/or consolidating data in Excel. It's also valuable for anyone responsible for building and maintaining reports. Participants must have experience using PivotTables. Prior exposure to Power Pivot and Power Query is not required, but it is a bonus.

Where do I sign up?

We are offering this course in conjunction with Auldhouse, a leading computer training company in New Zealand. Go to the Auldhouse site and use the following promo code EARLYBIRD20 to give yourself a 20% discount.

This will knock $300 NZ off the course, bringing it down to $1200. That’s $600 per day for pretty much the best introduction to both Power Query and Power Pivot that money can buy! Then use your new skills to free up 90% of your data-wrangling time, giving you time to negotiate a 20% pay increase*. Unbeatable ROI!

Don't miss out, the early bird discount is only available until January 31, 2019! Visit the Auldhouse site today for full details and registration.

*Numbers are indicative only, your mileage may vary. Heck, it may be way better than that!