Excel 2016 Updates

I was a bit surprised to see some Excel 2016 updates when I opened it up this morning.  For reference, I am on an Office 365 early release program – so I might get these a bit before you do – but how cool is this? Some of the key ones that made me take note:

New Formulas

We’ve got some new formulas to add to our arsenal.  I haven’t tried any of them yet, but the ones listed were:

  • CONCAT
  • TEXTJOIN
  • IFS
  • SWITCH

Chris Webb just posted a blog on the first two, IFS sounds useful, but SWITCH… Are you kidding me?

image

I LOVE that function in Power Pivot and am just itching for an excuse to use this one in a real world Excel project.

A New Chart Type

When Excel 2016 first came out, we saw some new chart types added to the product for the first time in… ages.  Those included:

  • Treemap
  • Sunburst
  • Histogram
  • Waterfall

And now we got another for those of us on the subscription:

  • Funnel Charts

This is a pretty simple one, but here’s a sample mocked up in about 3 seconds:

image

A Power Query (Get & Transform) Update

I put this last, but to me this is the biggest deal of the whole bunch.  The Power Query engine has been updated to version 2.29.4217.xxx.  It’s hard to see what’s been added, as the update hasn’t been released for Excel 2010/2013 yet, nor has a detailed feature page…

Having said that, a feature that I asked for a while back has finally been implemented:  Monospaced Fonts.

The importance of this is huge.  Power Query has always been big on using a pretty font, which wasn’t monospaced.  I.e. the characters weren’t the same width.  This is a big problem if you are trying to split by number of characters, as they just don’t line up.

Now, there is still an issue… Power Query is still aggressively trimming spaces (something that started with version 2.28.xxx) as you can see below:

SNAGHTMLdcb06b0

But, if you go to the Advanced tab and click the new Monospaced option, you get this beautiful view:

SNAGHTMLdcbe2f6

How much easier will that be for splitting columns based on width?  Like 1000% easier, that’s how much!

 Dear Power Query team

This is a fantastic feature, thank you.  I’ve got two asks for you:

  1. Can you get us the update for Excel 2010/2013 fairly soon?  We need this there as well.
  2. Can you please give me an option to set Monospaced as the default way to display my queries?  This is not due to the overzealous trimming issue (which I do want to see fixed) but rather because this is the way I need to see my data come in every time.

Thanks!

More about Excel 2016 Updates

If you want to see Microsoft’s official page listing all the new features in this Office update, or if you’d like to get into their early release program, have a read here: https://support.office.com/en-us/article/What-s-New-and-Improved-in-Office-2016-for-Office-365-95c8d81d-08ba-42c1-914f-bca4603e1426?ui=en-US&rs=en-US&ad=US

Excel Summit South

Yes, you read that right.  If you haven't heard yet, I'll be coming to New Zealand and Australia in just under a month!  And the entire purpose of the trip is to come and share Excel knowledge for with my friends and colleagues south of the Equator.

I'm pretty jazzed about this, and not just because I get to go to the southern hemisphere for the first time in my life.  And also not just because I get to talk about Excel when I'm there.  That would be enough, but no… I'm jazzed because I get to do this with some pretty cool friends who are world respected leaders in their area.

Excel Summit South

The main purpose for my trip is the Excel Summit South conference.  Two days, two tracks of advanced Excel material in 3 different cities:

  • Mar 3&4: Auckland, New Zealand
  • Mar 6&7: Syndey, Australia
  • Mar 9&10: Melbourne, Australia

And the best part about this conference is that – while it's sponsorsed by Price Waterhouse Coopers – regisration is open to everyone.  So basically, you can check the schedule, pick the sessions that interest you, and learn things that will impact your Excel skills.  In other words, if Valuation Modelling isn't your thing, then you can go to a Power Query class.  And if Power Query isn't your thing… well… you're kind of odd, but there will be something that is.  Smile

The cast and crew for this conference really can't be beat.  Charles Williams, Bill Jelen, Jon Peltier, Zack Barresse, Liam Bastick are all Excel MVP's on the bill (as well as Ingeborg Hawighorst in our New Zealand apperance.)  Heck, we've even got a couple of guys from Microsoft attending and presenting as well.  This is a fantastic opportunity to not only meet some of the big hitter independant Excel folks out there, but also to talk to Microsoft directly.  How can you pass that up?

My Sessions

If it hasn't shown yet, I'm seriously looking forward to this conference.  Personally I'll be leading two sessions:

An End to Manual Effort: The Power Query Effect

What Power Query is, why you care, and how it can re-shape and transform the data experience.  What's really special about this session is that I'm going to take this data and turn it over to Jon Peltier who is then going to take it and turn it into a dashboard.  This is perfect, as I'm demoing how to automate data cleanup, and Jon will show you how to use it to add true business value… the real life cycle of Excel data in just a couple of hours.

The Impact of Power Pivot

This one will be fascinating, especially for those who have never seen Power PIvot in action before.  In just an hour I'll show you how big business BI (business intelligence) is at the fingertips of anyone with an Excel Pro Plus license.  It's applicable to companies as small as one employee, and scales up to multi employee small businesses, and even large businesses.  (Departments in large corporations eat this up, as they effectively just act as a small business within the larger whole.)

Register for Excel Summit South now!

Tickets are going fast for this event in all cities, so we ecourage you to register sooner rather than later, and hope to see you there!  You can find out more details and register at:  https://excelsummitsouth.wordpress.com/

First Power Query class of 2016!

The intake is closing soon for our first Power Query class of 2016, which starts on February 3, 2016.

If you haven't heard about this, or you've been considering taking it but haven't signed up yet, you've been missing out.  We truly believe that you'll never take a course that can have this much impact on your job.  If you routinely clean up and prepare data before you can analyze it, and you're NOT using Power Query to do it, you're putting in too much effort and doing too many things over again.  Quite simply, you (or your staff) are wasting their time.  You owe it to yourself to join us and find out how you can significantly decrease or eliminate data preparation time and devote your skills to what they were hired for: analyzing results and reacting to them.

What is included?

We've reviewed the course since we started airing it last year, and overall have been very pleased with the feedback that we've received, as well as the way it's been delivered.  In case you weren't aware, every registration includes:

  • Full downloadable recordings of the entire training event.  (So if you have to miss some time, it's okay, as you get to download it later to re-watch it on your schedule.  We've found people really like this, as it helps not only with time zone issues, but also allows you to review the material at a later date when you are trying to implement your own solutions.)
  • Copies of every workbook used in the workshop delivery
  • Access to our SQL Azure database so you can practice working with data in SQL
  • 6 practice labs with full written and video solutions.
  • Real world examples to explain not only how to do the job, but also the value proposition of using Power Query
  • Explanations and demos of pitfalls, hurdles and gotchas!
  • A free digital copy of M is for Data Monkey
  • A Q&A day to ask questions about applying the techniques to YOUR data

Course Improvements

We're really proud of all of that.  But one part bothered us in our intial setup… we felt that our Q&A day came a bit too fast, and didn't allow people enough time to really use Power Query to any great degree.  To that end we are still offering a Q&A day – heck, we think this is a huge value proposition to the course as you can submit your own issues and we will solve them for you! – but we have bumped the date out a bit.  Instead of hosting our Q&A session one week after the main course, we are now hosting it two weeks after the final day.  We feel that this should allow more time for our attendees to experiment with their data and submit even more challenges for Miguel and I to solve and demo for you.  And remember, the entire Q&A session is recorded for you to download too… so even if you can't make it, you can still submit your questions and get them answered!

Long lasting training resources

We've worked really hard on this course, and tried to make this one of the most complete training packages on the planet.  We've included as many resources as possible to get you up and running with kick-butt and maintainable solutions as quickly as possible.  We've worked hard to give you resources that will FAR outlast your time in our class and impact the way you work with data forever.  Don't miss your opportunity to jump on this, as our next intake won't be until some time in April!  Why miss out on a whole 2 months of productivity gains?

Even better, the skills you'll learn here aren't just applicable to Excel 2010 and higher… they are also applicable to Power BI and Power BI desktop.  So you're learning material that will help you with multiple programs in one session!

Need professional development hours?

We are more than happy to provide you with a certificate of completion, as well as the actual hours you are logged in online.

Discounts available if you register now!

This training will pay for itself, we're sure of it.  But to make that even more likely, we're offering you a 10% discount on the list price of $595 USD.  Use code GPCPA1 at checkout and we'll knock $59.50 off the price, but only until January 31…which is coming up in a couple of days!

Register for the first Power Query class of 2016 here

To register or learn more about the course, head on over to http://powerquery.training/course  We hope you'll join us so that we can help transform your Excel skills into a whole new level of awesomely efficient!

Removing offset duplicates

This post solves a tricky issue of removing offset duplicates or, in other words, removing items from a list that exist not only in a different column, but also on different rows.

Problem History

This data format is based on a real life example that my brother in law sent me. He is a partner in a public practice accounting firm*, and has software to track all his clients. As he’s getting prepared for tax season he wants to get in contact with all of his clients, but his tax software dumps out lists in a format like this:

SNAGHTML16416c58

As you can see, the clients are matched to their spouses, but each client (spouse or not) has their own row in the data too. While this is great to build a list of unique clients, we only want to send one letter to each household.

The challenges we have to deal with here is to create a list of unique client households by removing the spouse (whomever shows up second) from the list. The things we need to be careful of:

  • Not accidentally removing too many people based on last name
  • Getting the duplicate removal correct even if the spouse has a different last name

You can download a file with the data and solution here if you'd like to follow along.

The solution

Alright, so how do we deal with this then?  Well, the first thing, naturally, is to pull the data into Power Query:

  • Click in the table –> create a new query –> From Table

This will launch us in to the Power Query editor where we can start to make some magic happen.

The first thing we need to do is to give each line in our client file a unique client ID number.  To do that:

  • Go to Add Column –> Add Index Column
  • Right click the Index column –> Rename –> ClientID

Which creates a nice numbered list for us:

image

So basically, what we have here now is a client ID for each "Client" (not spouse) in our list.

Figuring out the Spouse's ClientID

The next step to this problem is to work out the Spouse's client ID for each row as well.  To do that we're going to employ a little trick I've actually been dying to need to use.  Winking smile

See, ever since I've started teaching Power Query to people, I've mentioned that when you go to append or merge tables, you have to option to use merge the table you're working on against itself.  As I've said for ages "I don't know when I'll need to use this, but one day I will, and it's comforting to know that I can."  Well… that day is finally here!

  • Go to Home –> Merge Queries
  • From the drop down list, pick to merge the query to itself

image

Now comes the tricky part… we want to merge the Client with the Spouse, so that we can get the ClientID number that is applicable to the entries in the Spouse columns.  So:

  • In the top table, select Client FirstName –> hold down CTRL –> select Client LastName
  • In the bottom table, select Spouse FirstName –> hold down CTRL –> select Spouse LastName

The result should look like this:

image

Once you have that set up correctly, follow these steps to merge and extract the necessary data:

  • Click OK

The results look like this:

image

Before you go further, have a look at the order of the ClientID records.  Nothing special, they are in numerical order… remember that…

Now, let's extract the key components from that column of tables (i.e. the ClientID for the Spouse):

  • Click the Expand arrow to the top right of the newly created NewColumn
  • Uncheck all the items in the filter except the ClientID column
  • Uncheck the default prefix option at the bottom
  • Click OK
  • Right click the new ClientID.1 column –> Rename –> SpouseID

And the results look like this:

image

Looks good, and if you check the numbers, you'll see that our new column has essentially looked up the spouse's name and pulled the correct value from the ClientID column.  (Zoe Ng has a client ID of 2.  Zoe is also Tony Fredrickson's spouse – as we can see on row 4 – and the Spouse ID points back to Zoe's value of 2.

Remember how I mentioend to pay attention to the order of the records in the previous step?  Have a look at the ClientID column now.  I have NO IDEA why this changed, but it happend as soon as we expanded the merged column.  I'm sure there must be some logic to it, but it escapes me.  If you know, please share in the comments.  It doesn't affect anything – we could sort it back into ClientID order easily - it's just odd.

At any rate, we can now fully solve the issue!

Removing Offset Duplicates

So we have finally arrived at the magic moment where we can finish this off.  How?  With the use of a custom column:

  • Go to Add Column –> Add Custom Column
  • Provide a name of "Keep?"
  • Enter the following formula:
    • if [ClientID]<[SpouseID] then "Keep" else "Remove"
  • Click OK

And here is what you'll end up with:

image

That's right!  A nice column you can filter on.

The trick here is that we are using the first person in the list as the primary client, and the spouse as the secondary, since the list is numbered from top to bottom.  Since we've looked up the spouses ID number, we can then use some very simple math to check if the ClientID number is less than the Spouse's ClientID. If it is we have the primary client, if not, we have the spouse.

So let's filter this down now:

  • Filter the Keep? column and uncheck the Remove item in the filter
  • Select the ClientID, SpouseID and Keep? columns –> right click –> remove

And finally we can go to Home –> Close & Load

And there you are… a nice list created by removing offset duplicates to leave us with a list of unqiue households:

SNAGHTML166a1007

*Speaking of accountants

Just a quick note to say that even though I'm an accountant, my brother in law Jason is so good at tax that I use him to do mine.  If you need a good accountant in BC, Canada, look him up here.

Aggregate data while expanding columns

Without question, the Expand feature that shows up when you are merging tables using Power Query is very useful.  But one of the things I’ve never called out is that – when you are merging tables – you have the opportunity to aggregate data while expanding columns.

Scenario Background

Let’s say that we have these two tables of data:

SNAGHTML2493b688

The one of the left is Inventory items, and holds the SKU (a unique product identifier), the brand, type and sale price.  The table on the right is our Sales table, and holds the transaction date, SKU sold (many instances here), brand (for some reason) and the sales quantity.  And, as it happens, I already have two queries set up as connections to these tables:

SNAGHTML249f8aea

(Both of these were created by selecting a cell in the table, creating a new query –> From Table, setting the data types, then going to Close & Load To… –> Only Create Connection.)

The goal here is to work out the total sales for each SKU number, and maybe the average as well.  You can follow along with the workbook found here.

Step 1:  Join the Sales table to the Inventory table

The first thing we need to do is merge the two tables together.  We will use the default (Left Outer) join (as described in this post) to make this happen:

  • Go to the Workbook Queries pane –> right click Inventory –> Merge
  • Choose Sales for the bottom table
  • Select the SKU column in each table

image

  • Click OK

And we’ll end up with the following table in Power Query:

image

There are now two ways to get the totals we need.

Method 1:  Expand then Group

This is the approach that I’ve usually taken, as it feels like a logical breakdown to me.  So here goes:

  • Click the Expand icon to the top right of the NewColumn column
  • Choose to Expand the Date and Sales Quantity (as the other columns already exist in the Inventory table.)
  • Uncheck the “Use original column name as prefix” checkbox

 

image

  • Click OK

You should end up with a list of 20 items, as many of the sales items (like Granville Island Ale) are sold many times:

SNAGHTML24ab8bbb

 

Next, we need to group them up, as this is too detailed.

  • Go to Transform –> Group By
  • Set up the grouping levels as follows

image

The key to understanding this is that the fields in the top will be preserved, the fields in the bottom will be aggregated (or grouped) together.  Any columns in your original data set that you don’t specify will just be ignored.

  • Click OK

The results are as we’d hoped for:

image

That’s cool, so let’s finalize this query:

  • Call the query “Expand and Group”
  • Go to Home –> Close & Load

Method 2: Aggregate data while expanding columns

Now let’s look at an alternate method to do the same thing…

Start by following Step 1 exactly as shown above.  None of that changes.  It’s not until we get to the part where we have the tables merged and showing a column of tables that the methods depart.

So this time:

  • Click the expand button
  • Click the Aggregate button at the top of the expand window:

image

Notice how the view changes immediately!

The logic here is that, if the field is a date or text, it defaults to offering a count of the data in that column for each sales item I have.  But if I click on the Sum of Sales Quantity, I get the option to add additional aggregation levels:

image

After selecting the Sum and Average for Sales Quantity:

  • Ensure “Use original column names as prefix” is unchecked
  • Click OK

And, as you can see, the data is already grouped for us, with results consistent to what we created by first expanding and then grouping the data:

image

This is cool, as we don’t have to first expand, then group.  And while I haven’t tested this, it only stands to reason that this method should be faster than having to expand all records then group them afterwards.

One thing that is a bit of a shame is that we can’t name the columns in the original aggregation, so we do have to do that manually now:

  • Right click Sum of Sales Quantity –> Rename –> Total Units Sold
  • Right click Average of Sales Quantity –> Rename –> Avg Units Sold

And finalize the query

  • Rename the query to Expand and Aggregate
  • Go to Home –> Close & Load

Keep rows that begin with alphabetical characters

Even though this hits on techniques used on this blog before, a colleague asked today “I have a lot of garbage names in a column and only want to keep rows that begin with an alphabetical character.”  I figured we’d explore that here.

The issue

The data at hand has a bunch of garbage records, as well as some that are valid.  Anything that starts with a number, punctuation or other special character needs to go, leaving only items that start with alphabetical characters.

So basically, we want to take the table in blue, and convert it to the table shown in green, below:

image

Click here to download the workbook with both the raw data as well as the finished solution.

Getting started

Naturally, the first thing we need to do is bring the data into Power Query:

  • Select a cell in the data range
  • Create a new query –> From Table

Now, with the data in the table, our first temptation is to immediately try to come up with a way to figure out how to filter to only alpha characters.  Unfortunately, there are 26 of them, and a custom filter will only allow 2 at a time:

image

Okay, that won’t work, so we lean to custom columns…

We could write a function to do it that looks like this:

=if Text.StartWith([Name],”A”) then [Name] else if Text.StartsWith([Name],”B” then [Name] else… “Remove”

Of course, we would replace the … with a whole bunch more nested “if” statements to do the rest of the alphabet.  Very slow and tedious though.

Or we could fall back on the technique shown in my post on Separating Values & Text.  So let’s do just that.

Keep rows that begin with alphabetical characters

To get started, let’s do the following:

  • Add a custom column
  • Use the following formula:

=Text.SplitAny(Text.Lower([Name]),”abcdefghijklmnopqrstuvwxyz”){0}

Based on the concepts covered in the previous post, this formula takes the Name column, converts it to lower case, then splits the text at any occurrence of the letters provided between the quotes.  This actually returns a list, so we use {0} to drill into the first instance.  The trick here is that if the text starts with a letter, it splits and results in a blank record.  If it’s not alphabetical, however, it gives us the character(s) that aren’t alphabetical:

image

So using that, we can wrap the function in an “if” statement instead.

  • Click the gear icon next to the Added Custom step
  • Modify the formula as follows

= if Text.SplitAny(Text.Lower([Name]),”abcdefghijklmnopqrstuvwxyz”){0} then “Remove” else null

This results in a column that helpfully tells us what we can remove:

image

So we can now simply:

  • Filter the Custom column to only keep null values (uncheck “Remove”)
  • Remove the Custom column

And voila!  We have now restricted our list to only show names that start with alphabetical characters.

image

Win a copy of M is for Data Monkey

We thought it might be fun to have a contest and give people a chance to win a copy of M is for Data Monkey.  So here goes!

What you can win

We’re going to give away 5 e-book copies of the book, but you need to do something in order to win.

large_323_MDataMonkey700

How you can win a copy of M is for Data Monkey

Basically what you need to do is this:

  1. Download the full instructions and source files for the challenge.
  2. Send the workbook you used to solve the challenge to us at info@powerquery.training before 3PM on Jan 27, 2016.

In a nutshell, what you’re going to need to do is convert the data on the left into a table as shown on the right, using Power Query.

image

We’ll announce the winners on the http://powerquery.training website on February 1, 2016, and contact the winners by email.

Oh by the way…

We sent this challenge out to the people who subscribed to our Power Query newsletter… and got a pretty cool comment back from one entrant:

“After taking your class, this seemed to be a pretty straightforward problem.”

Awesome!  That is EXACTLY what we wanted to hear!  Do you want to feel that comfortable manipulating data?  We’ve still got spots open in our next course intake starting on February 3, 2016.   Register here!

Creating a custom calendar in Power Query

As it’s the beginning of a new year, I thought it might be interesting to show my spin on creating a custom calendar in Power Query. This topic has been covered by many others, but I’ve never put my own signature on it.

Our goal

If you’re building calendar intelligence in Power Pivot for custom calendars, you pretty much need to use Rob Collie’s GFTIW pattern as shown below:

=CALCULATE([Measure],
ALL(Calendar445),
FILTER(
ALL(Calendar445),
Calendar445[PeriodID]=VALUES(Calendar445[PeriodID])-1
)
)

Note:  The pattern as written above assumes that your calendar table is called “Calendar445”.  If it isn’t, you’ll need to change that part.

This pattern is pretty robust, and, as shown above, will allow you to return the value of the measure for the prior period you provide.  But the big question here is how you create the needed columns to do that.  So this article will focus on building a calendar with the proper ID columns that you can use to create a 445, 454, 455 or 13 month/year calendar.  By doing so, we open up our ability to use Rob Collie’s GFITW pattern for a custom calendar intelligence in Power Pivot.

For more on this pattern see http://ppvt.pro/GFITW

A bit of background

If you’ve never used one of these calendars, the main concept is this:  Comparing this month vs last month doesn’t provide an apples to apples comparison for many businesses.  This is because months don’t have an consistent number of days.  In addition, comparing May 1 to May 1 is only good if your business isn’t influenced by the day of the week.  Picture retail for a second.  Wouldn’t it make more sense to compare Monday to Monday?  Or the first Tuesday of this month vs the first Tuesday of last month?  That’s hard to do with a standard 12 month calendar.

So this is the reason for the custom calendar.  It basically breaks up the year into chunks of weeks, with four usual variants:

  • 445:  These calendars have 4 quarters per year, 3 “months” per quarter, with 4 weeks, 4 weeks and 5 weeks respectively.
  • 454:  Similar to the 445, but works in a 4 week, 5 week, 4 week pattern.
  • 544:  Again, similar to 445, but works in a 5 week, 4 week, 4 week pattern
  • 13 periods: These calendars have 13 “months” per year, each made up of 4 weeks

The one commonality here is that, unlike a standard calendar, the custom calendar will always have 364 days per year (52 weeks x 7 days), meaning that their year end is different every year.

Creating a custom calendar

In order to work with Rob’s pattern, we need 5 columns:

  • A contiguous date column (to link to our Fact table)
  • YearID
  • QuarterID
  • MonthID
  • WeekID
  • DayID

With each of those, we can pretty much move backwards or forwards in time using the GFITW pattern.

Creating a custom calendar – Creating a contiguous date column

To create our contiguous date column, we have a few options.  We could follow the steps in this blog post on creating a dynamic calendar table.  Or we could skip the fnGetParameter function, and just directly query our parameter table.  Whichever method you choose, there is one REALLY important thing you need to do:

Your calendar start date must be the first date of (one of) your fiscal year(s).

It can be this year or last year, but you need to determine that.  I’m going to assume for this year that my year will start on Sunday, Jan 3, 2016, so I’ll set up a basic table in Excel to hold the dates for my calendar:

SNAGHTML25692

Notice the headers are “Parameter” and “Value”, and I also named this table “Parameters” via the Table Tools –> Design tab.  For reference, the Start Date is hard coded to Jan 3, 2016, and the End Date is a formula of B4+364*2 (running the calendar out two years plus a day.)

Now I’m ready to pull this into Power Query and build my contiguous list of dates.

  • Select any cell in the table –> Create a new query –> From Table
  • Remove the Changed Type step (as we don’t really need it)

This should leave you with a single step in your query (Source), and a look at your data table.

  • Click the fx button on the formula bar to add a new custom step

SNAGHTML78078[4]

This will create a new step that points to the previous step, showing =Source in the formula bar.  Let’s drill in to one of the values on the table.  Modify the formula to:

=Source[Value]{0}

image

Reading this, we’ve taken the Source step, drilled into the [Value] column, and extracted the value in position 0.  (Remembering that Power Query starts counting from 0.)

Now this is cool, but I’m going to want to use this in a list, and to get a range of values in a list, I need this as a number.  So let’s modify this again.

=Number.From(Source[Value]{0})

 

 

image

Great stuff, we’ve not got the date serial number for our start date. Let’s just rename this step of the query so we can recognize it.

  • Right click the Custom1 step –> Rename –> StartDate

Now, let’s get the end date.

  • Copy everything in the formula bar
  • Click the fx button to create a new step
  • Select everything in the formula bar –> paste the formula you copied
  • Update the formula as follows:

=Number.From(Source[Value]{1})

That should give you the date serial number for the End Date:

image

Let’s rename this step as well:

  • Right click the Custom1 step –> Rename –> EndDate

We’ve now got what we need to create our calendar:

  • Click the fx button to create a new step
  • Replace the text in the formula bar with this:

={StartDate..EndDate}

If you did this right, you’ve got a nice list of numbers (if you didn’t, check the spelling, as Power Query is case sensitive).  Let’s convert this list into something useable:

  • Go to List Tools –> Transform –> To Table –> OK
  • Right click Column1 –> Rename –> DateKey
  • Right click DateKey –> Change Type –> Date
  • Change the query name to Calendar445
  • Right click the Change Type step –> Rename –> DateKey

The result is a nice contiguous table of dates that runs from the first day of the fiscal year through the last date provided:

SNAGHTML16e36f

Creating a custom calendar – Adding the PeriodID columns

Now that we have a list of dates, we need to add our PeriodID columns which will allow the GFITW to function.

 

 

 

 

Creating a custom calendar – DayID column

This column is very useful when calculating other columns, but can also be used in the GFITW formula to navigate back and forward over days that overlap a year end.  To create it:

  • Go to Add Column –> Index –> From 1
  • Change the formula that shows up in the formula bar to:

=Table.AddIndexColumn(DateKey, "DayID", 1, 1)

  • Right click the Added Index step –> Rename –> DayID

NOTE:  The last two steps are optional.  Instead of changing the formula in the formula bar, you could right click and rename the Index column to DayID.  Personally, I like to have less steps in my window though, and by renaming those steps I can see exactly where each column was created when I’m reviewing it later.

What we have now is a number that starts at 1 and goes up for each row in the table.  If you scroll down the table, you’ll see that this value increases to 729 for the last row of the table.  (Day 1 + 364*2 = Day 729).

Creating a custom calendar – YearID column

Next, let’s create a field that will let us navigate over different years.  To do this, we will write a formula that targets the DayID column:

  • Go to Add Column –> Add Custom Column
    • Name:  YearID
    • Formula:  =Number.RoundDown(([DayID]-1)/364)+1
  • Right click the Added Custom step –> Rename –> YearID

If you scroll down the table, you’ll see that our first year shows a YearID of 1, and when we hit day 365 it changes:

image

The reason this works for us is this:  We can divide the DayID by 364 and round it down.  This gives us 0 for the first year values, hence the +1 at the end.  The challenge, however, is that this only works up to the last day of the year, since dividing 364 by 364 equals 1.  For that reason, we subtract 1 from the DayID column before dividing it by 364. The great thing here is that this is a pattern that we can exploit for some other fields…

Creating a custom calendar – QuarterID column

This formula is very similar to the YearID column:

  • Go to Add Column –> Add Custom Column
    • Name:  QuarterID
    • Formula:  =Number.RoundDown(([DayID]-1)/91)+1
  • Right click the Added Custom step –> Rename –> QuarterID

The result is a column that increased its value every 91 days:

image

It’s also worth noting here that this value does not reset at the year end, but rather keeps incrementing every 90 days.

Creating a custom calendar – MonthID column

The formula for this column is the tricky one, and depends on which version of the calendar you are using.  We’re still going to create a new custom column, and we’ll call it MonthID.  But you’ll need to pick the appropriate formula from this list based on the calendar you’re using:

Calendar Type Formula
445 Number.RoundDown([DayID]/91)*3+
( if Number.Mod([DayID],91)=0 then 0
else if Number.Mod([DayID],91)<=28 then 1
else if Number.Mod([DayID],91)<=56 then 2
else 3
)
454 Number.RoundDown([DayID]/91)*3+
( if Number.Mod([DayID],91)=0 then 0
else if Number.Mod([DayID],91)<=28 then 1
else if Number.Mod([DayID],91)<=63 then 2
else 3
)
544 Number.RoundDown([DayID]/91)*3+
( if Number.Mod([DayID],91)=0 then 0
else if Number.Mod([DayID],91)<=35 then 1
else if Number.Mod([DayID],91)<=63 then 2
else 3
)
13 periods Number.RoundDown(([DayID]-1)/28)+1

 

As I’m building a 445 calendar here, I’m going to go with the 445 pattern, which will correctly calculate an ever increasing month ID based on a pattern of 4 weeks, 4 weeks, then 5 weeks.  (Or 28 days + 28 days + 35 days.)

SNAGHTML5ac6c5

This formula is a bit tricky, and – like the GFITW pattern – you honestly don’t have to understand it to make use of it.  In this case this is especially true, as the formula above never changes.

If you’re interested however, the most important part to understand is what is happening in each of the Number.Mod functions.  That is the section that is influencing how many weeks are in each period.  The key values you see there:

  • 0:  Means that you hit the last day of the quarter
  • 28:  This is 4 weeks x 7 days
  • 35:  This is 5 weeks x 7 days
  • 56:  This is 8 weeks x 7 days
  • 63:  This is 9 weeks x 7 days

The Number.RoundDown portion divides the number of days in the DayID column by 91, then rounds down.  That will return results of 0 through 3 for any given value.  We then multiply that number by 3 in order to return values of 0, 3, 6, 9 (which turns out to be the month of the end of the prior quarter.)

The final piece of this equation is to add the appropriate value to the previous step in order to get it in the right quarter.  For this we look at the Mod (remainder) of days after removing all multiples of 91.  In the case of the 445, if the value is <= 28 that means we’re in the first 4 weeks, so we add one.  If it’s >28 but <=56, that means it’s in the second 4 weeks, so we add two.  We can assume that anything else should add 3… except if there was no remainder.  In that case we don’t add anything as it’s already correct.

Creating a custom calendar – WeekID column

WeekID is fortunately much easier, returning to the same pattern we used for the YearID column:

  • Go to Add Column –> Add Custom Column
    • Name:  WeekID
    • Formula:  =Number.RoundDown(([DayID]-1)/7)+1
  • Right click the Added Custom step –> Rename –> WeekID

The result is a column that increases its value every 7 days:

 

SNAGHTML4fc488

Finalizing the custom calendar

The last thing we should do before we load our calendar is define our data types.  Even though they all look like numbers here, the reality is that many are actually defined as the “any” data type.  This is frustrating, as you’d think a Number.Mod function would return a number and not need subsequent conversion.

  • Right click the DateKey column –> Change Type –> Date
  • Right click each ID column –> Change Type –> Decimal Number
  • Go to Home –> Close & Load To…
    • Choose Only Create Connection
    • Check Add to Data Model
    • Click OK

And after a quick sort in the data model, you can see that the numbers have continued to grow right through the last date:

image

Final Thoughts

We now have everything we need in order to use the GFITW pattern and get custom calendar intelligence from Power Pivot.  Simply update the PeriodID with the period you wish to use.  For example, if we had a Sales$ measure defined, we can get last month’s sales using the following:

=CALCULATE([Sales$],
ALL(Calendar445),
FILTER(
ALL(Calendar445),
Calendar445[MonthID]=VALUES(Calendar445[MonthID])-1
)
)

As an added bonus, as we’re using Power Query, the calendar will update every time we refresh the data in the workbook.  We never really have to worry about updating it, as we can use a dynamic formula to drive the start and end dates of the calendar.

As you can see from reading the post, the tricky part is really about grabbing the right formula for the MonthID.  The rest are simple and consistent, it’s just that one that gets a bit wonky, as the number of weeks can change.  (To be fair, this would be a problem for the quarter in a 13 period calendar as well… one of those quarter will need 4 weeks where the rest will need 3.)

One thing we don’t have here is any fields to use as Dimensions (Row or Column labels, Filters, or for Slicers.)  The reason I elected not to include those here is that the post is already very long, and they’re not necessary to the mechanics of the GFITW formula.

 

 

If you’d like a copy of the completed calendar, you can download it here.  Be warned though, that I created this in Excel 2016.  It should work nicely with Excel 2013 and higher, but you may have to rebuilt it in a new workbook if you’re using Excel 2010 due to the version difference on the Power Pivot Data Model.

Best of the Holiday Season

Hey folks,

Just a quick note here to let you know that the blog is closed this week.  (We’re taking a well deserved break to enjoy some family time, and hope you get the chance to do the same.)  But don’t worry, we’ll be back in January with more posts!

I also wanted to throw out a quick thank you to all of you who have been faithfully reading the blog on a weekly basis.  Your support means a great deal, especially since I took the plunge to go full time with Excelguru back in May.  It’s been a great year in that regard, as it’s given me the opportunity to build many things, including M is for Data Monkey.  Much of the material for this book was inspired by blog comments and email, and I feel fortunate to have been able to translate those into a book that is being so well received.

image

We wish you the very best of the holiday season, whichever you celebrate, wherever in the world you are.  See you in 2016!

Merge Tables using Inner and Anti Joins in Power Query

Last week’s blog post on merging tables using Outer joins has proved to be pretty popular.  (I guess I’m not the only one who struggled with this concept!) This week we’ll look at the remaining three options, showing how to merge tables using Inner and Anti joins.

The Inner and Anti join types:

Again, we have three join types to explore this week:

  • Inner Join
  • Left Anti Join
  • Right Anti Join

If you read last week’s article you may already have an idea of what you’ll be seeing here, but we’ll make sure we go through it in full anyway.  (If you HAVEN’T read last week’s article, you might want to do so, as this one just builds on steps that readers will already be comfortable with.)

Sample Data

We’re going to work with the same set of data as we did last week, although we have a different sample file for it (to hold the completed queries.)  That file can be downloaded here.

As a reminder, here’s what we’re set up with:

SNAGHTML4303a120

As I explained last week:

… two tables of data, one called Transactions, and one called ChartOfAccounts.

Now, the key piece you want to watch here is that we need the Account-Dept combination to exist in both tables in order to make a perfect join between them.  Think VLOOKUP for a second… if you tried to make a VLOOKUP for account 10045 from the Transactions table against the ChartOfAccounts table, what would you get?  Of course, you’d get #N/A since 10045 doesn’t exist in the ChartOfAccounts table.

In this case we have items in both tables that don’t exist in the other.  (The yellow records in the Transactions table don’t have a match in the ChartOfAccounts table, and the red records in ChartOfAccounts don’t have a match in the Transactions table.)  With these differences we can test how each of the first three join types available to us behave when we try to merge the data in both tables together.

We’ve already got the ChartOfAccounts query and Transactions queries set up as connection only queries, so we’re ready to jump right into comparing the join types.

Merge Tables using Inner and Anti Joins – Inner Join

This join type stands out somewhat from the others in that there is no “left” or “right” version.  Let’s build the join to explore why that is:

  • Open the Workbook Queries pane
  • Right click the Transactions query and choose Merge
  • Select ChartOfAccounts for the bottom table
  • For the top query (Transactions) select Account, hold down CTRL and select Dept
  • For the bottom query (ChartOfAccounts) select Account, hold down CTRL and select Dept
  • Change the Join type to “Inner (only matching rows)”
  • Click OK

Like last week, the data lands in Power Query, and we’ll take the following steps to expand the rows:

  • Right click the NewColumn column –> Rename –> COA
  • Click the Expand icon on the top right of the COA column
  • Leave all the defaults and click OK

The results of these steps are shown here:

image

We can now see the meaning of the Inner Join.  Unlike the Full Outer Join that pulls in all records from both tables whether there is a match or not, the Inner Join only pulls in rows that exist in both the left and right tables.  In other words… all those red and yellow rows shown in our original data set?  They’re missing from this output.

I can see this being very useful for indentifying matching records, giving a list to show which ones matched without polluting the data set with non-matching items.

Let’s finalize this query and move to the Anti Joins:

  • Change the name of the query to Inner
  • Go to Home –> Close & Load –> Close & Load To… –> Only Create Connection

Merge Tables using Inner and Anti Joins – Left Anti Join

Now that we’ve seen Outer and Inner joins, give some thought as to what an Anti join might do…

Done?  Let’s go see if you’re right.

  • Right click the Transactions query and choose Merge
  • Select ChartOfAccounts for the bottom table
  • For the top query (Transactions) select Account, hold down CTRL and select Dept
  • For the bottom query (ChartOfAccounts) select Account, hold down CTRL and select Dept
  • Change the Join type to “Left Anti (rows only in first)”
  • Click OK

And when the data gets to Power Query:

  • Right click the NewColumn column –> Rename –> COA
  • Click the Expand icon on the top right of the COA column
  • Leave all the defaults and click OK

You should now have this:

image

So this time, we only see the records from the left table (Transactions) that had no matching record in the ChartOfAccounts (right) table.  How cool is that?  This allows us to immediately identify records with no matches at all.

Let’s finalize this query as well:

  • Change the name of the query to Left Anti
  • Go to Home –> Close & Load –> Close & Load To… –> Only Create Connection

Merge Tables using Inner and Anti Joins – Right Anti Join

By this point I’m sure you can predict where this is going.  So let’s get to it:

  • Right click the Transactions query and choose Merge
  • Select ChartOfAccounts for the bottom table
  • For the top query (Transactions) select Account, hold down CTRL and select Dept
  • For the bottom query (ChartOfAccounts) select Account, hold down CTRL and select Dept
  • Change the Join type to “Right Anti (rows only in second)”
  • Click OK

And when the data gets to Power Query:

  • Right click the NewColumn column –> Rename –> COA
  • Click the Expand icon on the top right of the COA column
  • Leave all the defaults and click OK

And, in anti-climactic fashion (ha!) we end up with the following:

image

 

Yes indeed, as you probably predicted, only the records shown highlighted in yellow in the ChartOfAccounts (right) table show up in this set.  No record that has a match in the Transactions (left) table, nor any records in the Transactions table without a match in the ChartOfAccounts table show up.

We can now finish our final comparison:

  • Change the name of the query to Right Anti
  • Go to Home –> Close & Load –> Close & Load To… –> Only Create Connection

Final Thoughts

I think this is pretty cool stuff.  If you’d asked me how many ways there are to join data, I think I would have been hard pressed to answer six until I wrote this up.  But seeing it laid out in full, with each option detailed… I can see where these would each be useful in their own right.  I hope you enjoyed taking the journey with me here!

Oh… and by the way… if you’d like to download one workbook with all six join types included… you can do that here.