I got an email from a reader this morning who asked how to calculate start and end dates for a given employee when they have had multiple terms of employment. Since it's been a while since we've had a technical post on the blog, I thought that this would be a good one to cover.
In this case (which you can download here) we are given the table shown below on the left, and we need to create the table shown on the right:
As you can see, John's start date needs to be listed as Jan 1, 2013 and his end date needs to be listed based on the last date he worked here; Oct 31, 2016.
How to Calculate Start and End Dates using Power Query
My first thought was "we'll need a custom function to do this", but as it turns out, there is a a MUCH easier way to accomplish this, and it's 100% user interface driven as well. I'm virtually certain that the performance will also be much better over larger data sets as well (although I haven't specifically tested this.)
Let's take a look:
Step 1: Connect to the data
This is pretty easy, just select the table and use Power Query to connect to the data:
- Excel 2010/2013: Power Query --> From Table
- Excel 2016: Data --> From Table/Range
We'll be launched into Power Query and will be looking at our short little table:
Step 2: Calculate Start and End Dates via Grouping
The trick here is to actually use Power Query's Grouping feature to calculate the start and end dates. To do this:
- Go to Transform --> Group By
The dialog will open and is already offering to group by Name, which is what we need. Now we just need to select the grouping levels. The first is going to be our Start Date, so we'll rename it as such and change to calculate a Min of the From column:
The effect here is that this will provide the lowest value from the "From" column for each employee. A perfect start.
Next, we need to add a new grouping level to get the End Date. To do that:
- Click Add Aggregation
- Configure the new column as follows:
- New column name: End Date
- Operation: Max
- Column: To
It should look as follows:
And believe it or not, you're done!
How this works
The key here is that the grouping dialog in Power Query works for all records in the group. This is really important, as the first column has no bearing on subsequent columns… if it did, we'd get the max for the first record, which is not at all what we'd be looking for. Instead, the Group By will restrict to find all records for John, then will pull the Min and Max out of the remaining three rows, returning those as the values.
The other thing that is worth noting here is that the order of the source data is irrelevant. We could have provided either of these options and the answers would still have been calculated correctly:
It's also worth mentioning that this technique to calculate start and end dates will also work in both Excel and in Power BI, as the feature set is identical between the two products.
Sometimes things that look hard, are actually really easy when we have the right tools in our hands, and this happens to be one of those situations.