The Data Insights 2 Day Master Class

I’m super excited to be presenting a Data Insights 2 Day Master Class in Wellington, NZ with my good friend Matt Allington.  This is the first time we’ll be working together to bring our unique strengths to our participants in a joint session format, and it’s going to be AWESOME!

Ad for the Data Insights Masterclass in Wellington NZ

How is the event going to work?

We think you’ll love this.  We’re going to divide our group in two.  You’ll get a one full day with me on Dimensional Modeling, and one full day with Matt, which focuses on the DAX formula language.  These two components are essential to understand when you want to build truly dynamic, scalable and stable data models, and we're going to cover both in detail.

What is covered in the Dimensional Modeling day?

Ken will be looking deeply at how to structure your data for a successful Excel/Power BI data model.  You’ll learn how your data should be shaped, what the data model expects in its tables, and a variety of techniques and patterns to work around common join problems.  Our goal here is very simple: to teach you everything you need to lay the foundation for a data model that will stand the test of time.

But not only will you lean practical hands on techniques to lay this groundwork, you’ll learn the key terminology at play.  By the time you leave this session you’ll be able to identify things like ‘facts’, ‘dimensions’, ‘relationships’, ‘schemas’, ‘slowly moving dimensions’ and much more.  Armed with this knowledge you will be able to not only design your own models properly, but you’ll be able to understand other materials you reference during your career.

As you might expect from one of the world’s leading voices on Power Query, there’s going to be a heavy focus on Power Query in this course.  But it's Power Query with a purpose: to feed a Power Pivot Data Model.

What is covered in the DAX Formula day?

Matt will take you into the world of DAX formulas, exploring how this incredible language can be used to summarize virtually any statistic you want to know.  He’s one of the world’s experts in the DAX language and will teach you not only what you SHOULD do with DAX, but what you SHOULDN’T.

When Is This?

Soon!  It’s going to be hosted in Wellington, NZ on Feb 24 and 25, 2020.  But the good news is that there are still seats available, and we’d LOVE to see you there with us.

How Much and Where Do I Sign Up?

Great questions!  Head over to ExceleratorBI for all those details.

Remove Dynamic Number of Top Rows

Removing the top five rows from a data set is easy in Power Query, but what do you do when the number of rows changes?  There isn’t a built-in Remove Dynamic Number of Top Rows function.  In this post we’ll look at how to set this up.

Illustration of the issue

Assume you have the following report, and you’re only interested in the Cider sales:

With Cider starting in row 9, we’d need to remove the top 8 rows.  That’s fairly easy.  You just need to:

  • Go to Remove Rows -> Remove Top Rows -> 8
  • Promote headers
  • Do whatever else you need to do to the data

But then you get an updated version of the data set, and it looks like this:

Uh oh.  Best case, if you run the previously generated Power Query script, you’ll end up with the following result:

But more likely, if you promoted the clean header row from the original data set, you’ll get a step level error since the revised data set doesn’t yield a “Cider” column when row 1 (shown above) is promoted to header:

Regardless of which one of these scenarios appears worse to you, I think we can agree that neither one is desired.  So how do we make this work on a dynamic basis?

Solution Architecture

The way I approach this issue is to split the job into 3 queries as follows:

Let’s look at how this works in practice…

Query 1:    Raw Data

The purpose of this query is quite simple:

  • Connect to the Raw Data source
  • Perform any preliminary cleanup
  • Rename the query as “Raw Data” (add something descriptive if you have many data sources
  • Set the query to load as a Connection Only query (disable the load in Power BI)

The key thing to note here is that we’re not doing any work to remove top rows beyond things that we know will ALWAYS occur.  We may want to drop columns and other things to reduce our data set, we just don’t want to touch anything we can’t guarantee will be exactly the same when we get updated data.

In the case of the data sample I showed above, I’m just going to connect to the data set and load it as connection only.  (While I could make an argument that the first 3 rows will always need to go, I will get rid of those when filtering to just the cider header anyway.)

Query 2:    Generate the Dynamic Row Number

The next step is to generate the number that will indicates the dynamic number of top rows we are looking for.  Despite the fact that the row which holds our data is changing, this is actually relatively easy once you know how:

  • Right click the Raw Data query -> Reference
  • Go to Add Column -> Add Index Column -> From 0
  • Filter one of the columns to the data you are looking for
  • Right click the [Index] Column -> Remove Other Columns
  • Go to Home -> Keep Rows -> Keep Top Rows -> 1
  • Right click the value in the cell -> Drill Down
  • Rename the query as “HeaderRows”
  • Set the query to load as a Connection Only query (disable the load in Power BI)

You now have a query that will dynamically pick up the number of rows to be removed from the top of the data set before it encounters the text you are looking for.

Step 3:       Remove Dynamic Number of Top Rows

So now comes the magical part:

  • Right click the Raw Data query -> Reference
  • Go to Home -> Remove Rows -> Remove Top Rows
  • Type in the current number of rows to remove (for this example, we’ll assume it is 8 rows)

The formula bar will now be showing the formula =Table.Skip(Source, x ) where x is the value you typed in:

  • Replace the value with “HeaderRows”

CAUTION!  Power Query is case sensitive.  You must spell and case HeaderRows EXACTLY as you did previously.  And if you separated those two words with a space, you need to escape it with hash marks and quotes:  #"Header Rows"

If you’ve replaced everything correctly, you should see that everything still works:

Does it Work?

Here’s what we see when we point RawData to the second data set I showed earlier:

The sample file for this example can be downloaded here.