Merge Data Based on Two Columns

This past weekend I attended SQL Saturday in Portland, OR.  While I was there, I attended Reza Rad’s session on Advanced Data Transformations with Power Query.  During that session, Reza showed a cool trick to merge data based on two columns through the user interface… without concatenating the columns first.

The Issue

Assume for a second that we have data that looks like this:

SNAGHTML64de019

There’s two tables, and we want to join the account name to the transaction.  The problem is that the unique key to join these two tables (which isn’t super obvious here) is a combination of the Acct and Dept fields.  (Elsewhere in the data the same account exists in multiple departments.

To get started, I created two connection only queries, one to each table.

  • Select a cell in the left table (Transactions) –> create a new query –> From Table –> Close & Load To… Connection only
  • Select a cell in the right table (COA) –> create a new query –> From Table –> Close & Load To… Connection only

My Original Approach

Now, with both of those created, I want to merge the data so I get the account name on each row of the Transactions table.  So how…?

Originally I would have edited each query, selected the Acct and Dept columns, and merged the two columns together, probably separating them with a custom delimiter.  (This can be done via the Merge command on the Transform or the Add Column tab.)

Essentially, by concatenating the columns, I end up with a single column that I can use to dictate the matches.

Reza’s presentation showed that this isn’t actually necessary, and I don’t need to merge those columns at all…

Merge Data Based on Two Columns

So here’s how we can get those records from the COA Table into the Transactions table:

  • Right click the Transactions query in the Workbook Queries pane
  • Choose Merge
  • Select the COA query

The data now looks like this, asking for us to select the column(s) we wish to use for the merge:

SNAGHTML70855ae

So here’s the secret:

  • Under Transactions, click the Acct column
  • Hold down the CTRL key
  • Click the Dept column

And Power Query indicates the order of the columns you selected.  It will essentially use this as a temporary concatenated value!

image

So now do the same to the COA table:

image

And then complete the merge.  As you can see, you get a new column of data in your query:

image

of course, we can expand NewColumn to get just the Name field, and everything is working perfectly!

image

End Thoughts

This is pretty cool, although not super discoverable.  The really nice piece here is that it can save you the work of creating extra columns if you only need them to merge your data.

I should also mention that Reza showed this trick in Power BI Desktop, not Excel.  But because it’s Power Query dealing with the data in both, it works in both.  How cool is that?

Breaking Power Query via Power Pivot is a thing of the past

I’m pleased to let people know that breaking Power Query via Power Pivot is a thing of the past … at least for users of Excel 2013 or higher.  (Sorry, if you’re on 2010, you still need to be careful.)

The information has been around for a bit, and it’s one of the topics we cover in our http://powerquery.training/course as well: how to break your Power Query by doing one of the following actions in Power Pivot:

  • Renaming a table
  • Renaming a column sourced from Power Query
  • Deleting a column sourced from Power Query

Any of these three actions would set your query into an un-editable state, but worse, nothing would appear to happen.  The query would refresh as normal, until you eventually tried to change it.  At that point all hell would break loose and your only option was to rebuild your query (and related data model table) from scratch.

This has been covered in detail in the following sources:

But now, breaking Power Query via Power Pivot is a thing of the past…

This issue was fixed in Excel 2016, but it left many of us hanging with an older version that still exhibited the problems.  If you’re on 2013, however, that problem has now been fixed.  I share the links at the bottom of the post to make sure you’re updated, but first I’ll demonstrate that the fix is really working.

To set the stage, I created a simple Calendar table in Power Pivot, and loaded it to the Data Model.

Corruption Method #1:  Deleting Columns

My first test was to attempt to delete the Year column in Power Pivot.  At first it looks like nothing has really changed:

SNAGHTMLf5b1a5d

But when I click Yes, Power Pivot comes back with a message to let me know that I can’t do it after all:

SNAGHTMLf5c18b3

Hooray!  This is fantastic news, as it means that I can’t actually destroy my entire data model.  Beautiful!

Corruption Method #2:  Renaming Columns

Next I tried to rename the Year column to myYear.

SNAGHTMLf5d7bae

Nope.  Can’t break the model that way either.

Corruption Method #3:  Renaming the Table

Finally, I tried to rename the table from Calendar to myCalendar:

SNAGHTMLf614112

And it looks like we’re protected from shooting our model in the foot too.

My thoughts on the fix

I’m 99% happy with this fix.  It protects us from accidentally blowing up our data models, which is super important.  Especially because it was possible to break the model and still run for months without every realizing it.  That just shouldn’t be allowed to happen.  So why am I not 100% happy?

Well, the first part is that Excel 2010 users are still susceptible to the issue.  That’s a challenge, although to be fair Microsoft has been pretty forthcoming that the Load to Data Model hack is not truly a supported method anyway.  So really, there’s not much of a surprise there.  I’m not holding any points back on this one.

The last part – the remaining 1% for me - is that the fix, as implemented, means that you cannot ever rename a table in Power Pivot that was source from Power Query.  In fact, even if you go back to Power Query and rename the table there, it still shows under the original name in Power Pivot.  Granted it’s not a total show stopper, but you do want to give some thought to your query naming before you push it into the data model that very first time.

How can you ensure you have the fix?

If you’re running automatic updates for Office 2013, you should already have the fix in place.  But if you want to check (or you don’t), then here’s the deal:

The full support KB article on the subject can be found here.

It will direct you to install the following updates:

  • KB3039800: update for Office 2013 – From October 13, 2015
  • KB3039739: update for Office 2013 – September 8, 2015
  • KB3085502:  MS15-099 security update for Excel 2013 – September 8, 2015

(There is a 32 and 64 bit version of each, so make sure you pick up the right version.)

For reference, I just tried to install them, without checking if they’d been installed first.  Fortunately it does a check first, so for me each of them came back with a message like this:

image

So there, you go.  Great news for users of Power Query and Power Pivot 2013 and higher.  You can now model with the confidence that you won’t accidentally blow up your solution!

MZ-Tools 8.0 for VBA

One of my favourite add-ins of all time just got an upgrade, and I’m super stoked about it.  Why?  Because I can use it again!

As I began my VBA journey, there were two add-ins that I used all the time:

Both were invaluable, with SmartIndenter allowing right click access to re-indent code, and MZ Tools providing a TON of useful content.  (My favourite was the error handling template I could just inject with a couple of clicks.)

It became painful to work on or debug VBA code on anyone’s PC who didn't’ have these tools installed, and the became part of the default installation routine for my machine.

Why I’ve been Add-in free for years

Unfortunately, both MZ Tools (3.0) and SmartIndenter were written in VB6, which meant that they were restricted to the 32 bit versions of Excel.  And that meant that the day I started using Power Pivot, I lost the ability to use either add-in.  (Okay, to be fair I could have stuck with 32 bit Excel for Power Pivot… except there was no way I was doing that.  The need for more memory accessed trumped the tools that made my VBA life easier.)

I’ve now been running without the aid of these tools for about 5 years… which is shocking… and STILL miss them.  A few times over the last few years, I even made some attempts to replicate some of these features on my own, but I could never figure out how to get VB.NET to hook into the VBIDE, so gave up on it.  Instead I focussed on tools I could control, building add-ins and software in other areas.  (It always irked me that I couldn’t figure out how to hook the VBIDE though!)

No longer Add-in Free

For that reason, I was pretty jazzed when Carlos Quintero emailed out to say that he’s updated and released not only MZ-Tools for Visual Studio, but also MZ-Tools 8.0 for VBA.  That is FANTASTIC.  I’ve downloaded it, got it installed, and am already digging through the loads of features to customize my templates.

Unfortunately I’m not such a good judge of what’s new in this version (my memory of it is five years out of date) but here’s some of the stuff that I’m looking forward to (re-)acquainting myself with:

  • Dead code review.  I’ve already scanned a couple of my add-ins and found unused variables and unused routines that can be trimmed.
  • Statistics.  Kind of a vanity thing, maybe, but I’ve always wondered how many lines of code are actually in my XLGFileTools add-in.  As of today, the answer is 6,726.  (Maybe a couple less once I review the Dead Code report above)
  • Code templates:  I can’t wait to rebuild the error handling template.  I also remember in the past the ability to insert a comment block at the top of each routine/module very easily for documentation too.
  • The simple thing of being able to right click the Immediate window and choose Clear.  Oh my how I’ve missed you!

These are just some highlights, there are obviously tons more.

Worth the cost

if you look back you’ll see I don’t endorse many products, and certainly not as passionately as I am here.

The goal of MZ-Tools is to make your everyday programming life easier.  I 100% believe that it does that, and that it is worth the cost to purchase it – something I don’t say very often!  (Understand I’m not making any commission or advertising revenues off this, either.)  The software is just that good and useful.

But even better, if you are in the market for it, Carlos has a 50% sale on through the end of October.  That will save you $40 off the regular $79.95 price tag.  How can you beat that?

You can find it at http://mztools.com/index.aspx 

Happy coding!

Split by line breaks in Power Query

Some more savvy Excel users know that you can break text onto multiple lines in a cell by pressing Alt+Enter mid entry.  Today’s post explores how we can split by line breaks in order to break these types of cell contents into multiple columns.

Set up the data

To start with, let’s set up some simple data:

  • In cell A2, type “Text” and press Enter
  • In cell A3 type “This” –> Alt + Enter –> “is” –> Alt + Enter –> “text” –> Enter

The result should look like this:

SNAGHTMLa795c48

And now we’ll go and pull it in to Power Query:

  • Select the data –> create new query –> From Table

Split by Line Breaks

At this point, you’d certainly be forgiven for thinking that only the first line was pulled in.  But if you select the cell, you’ll see in the preview window that all the data is there:

image

So let’s try and split it up.

  • Right click the Text column –> Split Column –> By Delimiter

Unfortunately, there is no line break or carriage return option in the dialog, which means that you’ll need to pick “Custom”, and enter the special character for a Line Feed:

image

Even worse, with entering this, Power Query is overly aggressive when you click OK.  It assumes that this is special text, so escapes it to text, and appends some commands that actually mess you up:

image

Notice how we have two columns with nothing in the second.  What gives there?

To correct this code, we need to modify the formula in the formula bar to do two things:

  1. Undo the escaping that Power Query did on our #(lf) entry, and
  2. Remove the code that is telling which columns to import

So first, we need to replace:

"#(#)(lf)"

with

"#(lf)"

And second, we need to remove this completely:

,{"Text.1", "Text.2"}

And the results are much better:

image

The Good/Bad News

The bad news is that currently it’s a bit painful to do this.  The good news is that it can be done, and the better news is that Power Query is constantly being updated.  I’m sure it won’t be long before they give us an easier to use/more discoverable mechanism to make this work.

Other Special Characters

Should you need them, here are three special characters that you can refer to in Power Query:

  • Line feed: #(lf)
  • Carriage return: #(cr)
  • Tab: #(tab)

Clean WhiteSpace in PowerQuery

The other day as I was working through a model, I once again tripped upon the fact that Power Query’s Text.Trim function doesn’t clean whitespace inside the text string, only at the ends.  For those who are used to Excel’s TRIM function, this is a frustrating inconsistency.

Just to circle on it, here’s the difference:

Source Function Result
Excel =TRIM(“  trim   me  “) “trim me”
Power Query =Text.Trim(“  trim   me  “) “trim   me“

Typically, I’ve just gone through the cycle of replacing a double space with a single space a few times on the same column to deal with this issue.  The issue, of course, is that you need to do this twice if there are 4 spaces, but add more spaces, and you have to do this more times.  Doesn’t seem like a really robust solution.

At any rate, this time I emailed one of my friends on the Power Query team and suggested that they should implement a function to make this a bit easier.

My Suggestion for a Clean Whitespace Function

The gist of my suggestion was to create a new function that would not only trim the whitespace internally, but would also allow you to specify which character you want to clear out.  This way it would work nicely to clean whitespace in the shape of spaces (the usual culprit in my world), but would also allow you to substitute in other characters if needed.  (Maybe you need to replace all instances of repeating 0’s with a single 0.)

It got referred to another friend on the team, (who wishes to remain nameless,) and he responded with some pretty cool code.  I’ve taken that code, broken it down and modified it a bit, and the end result is a slightly different version that can work the same as Excel’s TRIM() function, but adds an optional parameter to make it even more robust.  For lack of a better name, I’m going to call it “PowerTrim”.  (Just trying to do my part to keep the Power in Power Query!) 😉

Here’s the function:

(text as text, optional char_to_trim as text) =>
let
char = if char_to_trim = null then " " else char_to_trim,
split = Text.Split(text, char),
removeblanks = List.Select(split, each _ <> ""),
result=Text.Combine(removeblanks, char)
in
result

And to implement it, you’d take the following steps:

  • Copy the code above
  • Create a new query –> From Other Sources –> Blank Query
  • Change the query name to PowerTrim
  • Go into the Advanced Editor
  • Select all the text and replace it with the code above –> Done

Like this:

image

How it Works

We’d call this from a custom column, feeding in a column of text, and specifying the character (or even string of characters) we’d like to trim.  The function then works through the following process:

  • It checks to see if the char_to_trim was provided, and uses a space if not
  • It splits the text by that character, resulting in a list:

image

(This list shows the word “bookkeeper” split by “e”)

It then:

  • Filters out any blank rows
  • Combines the remaining items using the original character to split by

(The original version was actually all rolled up in one line, but I find it easier to debug, step through, examine and play with when it’s separated.)

Demo

Here’s some examples of the function in action. I started with a raw table from Excel.  (Create a new query –> From Table)

image

And added a Custom column by going to Add Column –> Add Custom Column

  • Name:  Trim_null
  • Formula:  =PowerTrim([Text])

image

Notice that in the first row it trimmed the leading, trailing and internal spaces.  Just like Excel!  (Remember that if you used Power Query’s default Text.Trim() function, you would return “trim   me”, not “trim me”.)

Now, let’s add another and try with an alternate character… like 0.  Again, we go to Add Column –> Add Custom Column:

  • Name:  Trim_0
  • Formula:  =PowerTrim([Text],”0”)

image

In this case the extraneous zeroes are trimmed out of row 3, leaving only a single one.  Cool stuff.  Now what about the “e”. Let’s see how that one goes.

Once more to Add Column –> Add Custom Column:

  • Name:  Trim_0
  • Formula:  =PowerTrim([Text],”e”)

image

The first time I looked at this, I thought there was an issue with the function.  But then I remembered in this case we are removing all leading and trailing e’s, as well as replacing any duplicate e’s with a single e.  You can see that this is indeed what happened in both rows 2 and 4.

Final Thoughts

I wish there was a way to get this to easily role into the Text functions category, so that I could call it something like Text.PowerTrim() or even replace the Text.Trim() function with my own.  Unfortunately a query name can’t contain the period character, which kind of sucks.  I guess it’s to to protect you from accidentally overwriting a function, but I’d like the ability to do it intentionally.

Allocate Units Based on Dates Using Power Query

I ran into an interesting wrinkle in a model I’m building, where I need to allocate units based on dates.  The idea here is to allow a user to the number of units to allocate, the start date and the end date.  From there, I wanted to use Power Query to work out how many months have elapsed, and then tell me how many units should be allocated to each year in the period.

Background:

Here’s a look at my data (which you can download here):

SNAGHTML65ab32e

So the idea here is that I need to come up with a table that shows that data should be allocated as follows:

10-1-2015 8-53-05 AM

So, if we look at the Traditional Single Family, the sales cycle is the 6 months from Aug 2015 through Jan 2016.  With the first 5 months being in 2015 and the final month being in 2016, that means we need to allocate 5/6 of the total units to 2015 and 1/6 to 2016.

Allocate Units Based on Date: Method

My initial thought was to try and find a date difference or duration type function to return a count of months between two dates.  Unfortunately, such a function doesn’t seem to exist.  For that reason, I decided I’d just go ahead and build my own function to do the job.

Step 1: Create a function to return a list of months

To start with, I needed a list of month end dates.  I started a blank query, jumped into the Advanced Editor and built a simple query to provide a hard coded startdate and enddate, then create a list from one to the other:

let
startdate=#date(2015,8,1),
enddate=#date(2016,1,31),
Source = {Number.From(startdate)..Number.From(enddate)}
in
Source

That list yielded me a list of date serial numbers, so I then:

  • Went to Transform –> To Table
  • Changed the column’s data type to Date
  • Renamed the column to Date
  • Converted the column to month end dates (Transform –> Date –> Month End)
  • Removed Duplicates (Home –> Remove Duplicates)

The end result is a short table that shows only the month end dates:

SNAGHTML6629a7f

Step 2:  Add a Year End date column

I then needed to find a way to count the number of months in each year.  To do that I:

  • Added a year end column (Select the Date column –> Add Column –> Date –> Year –> End of Year)
  • Went to Transform –> Group By and set up the grouping as follows:
    • Group by:  EndOfYear
    • New column name: Months_x_Year
    • Opeartion:  CountRows

SNAGHTML6785ccf

Step 3: Modify to list Months in Period

At this point I realized that I had a pretty serious miss in my logic.  If I wanted to apply this as a proportion, I needed to also track the total amount of months in the period (so that I could allocate 5/6 to 2015 and 1/6 to 2016.)

To fix this, I added another level of grouping, but with a twist…

  • I removed the “Group By” column
  • I created an “Original” column, and set the operation to All Rows
  • I created a “Months_Total” column, set to SUM the Months_x_Year column

Here’s the configuration:

image

And the result:

SNAGHTML67d3653

This is pretty slick, as the grouping returned the total count of months, but also returned the original table.  Of course, when you expand the table using the double headed arrow to the top right of the Original column, it runs the Months_Total row down each row that gets added:

SNAGHTML67eeae8

Step 4:  Turn the routine into a function

The next step was to go back into the Advanced Editor, and turn this into a function. That’s actually not hard at all, requiring only three lines to be modified.  The first 4 lines of the function are shown here:

(startdate as date, enddate as date) as table =>
let
//startdate=#date(2015,8,1),
//enddate=#date(2016,1,31),

As you can see, I basically added the parameter line at the beginning (using the same variable names for startdate and enddate), then commented out the lines I initially used in order to populate the data I used to build my test case.

Finally, I renamed the function to fnGetAllocationBase, and saved it.

Step 5:  Using the function

To use the function, we basically now just load the original table, then feed the start/end dates in to it.  Here’s how I went through that process:

  • Select the table –> Power Query –> From Table
  • Select the First Month and Last Month columns –> Change Type –> Date
  • Add Column –> Add Custom Column
    • Formula:  =fnGetAllocationBase([First Month],[Last Month])

I now had a new column containing the tables I needed with my allocation basis:

image

As I didn’t need month granularity for my model, (we’re budgeting on an annual basis,) I’m now able to:

  • Remove the First Month and Last Month columns
  • Expand the columns from the Custom column
  • Add a new custom column with the following details:
    • Name:  Units
    • Formula:  =[Units To Allocate]*[Months_x_Year]/[Months_Total]
  • Remove the Units to Allocate, Months_x_Year and Months_Total columns
  • Set my data types

And the end result is a nice table that will serve my sales model nicely:

SNAGHTML68df2ee

Renewed as an MVP

I’m pleased to say that I received the email this morning to let me know that I’ve been renewed as an MVP for another year.

image

This is my 10th Microsoft Excel MVP award, and there have been a lot of changes in that time.  Although I do provide my opinions during my technical articles, typically I don’t often create blog posts that are solely introspective or opinion pieces.   With this being a milestone for me, I thought I’d do one today.

Some of the highlights since I’ve been an MVP:

  • We’ve seen four versions of (Windows) Excel since I was awarded:  Excel 2007, 2010, 2013 and now 2016.
  • Excel online pioneered as a “free for everyone service” (which you can access at www.onedrive.com)
  • Office 365 was released, and Microsoft is pushing hard to make this the new model for how you purchase Office in future.
  • Excel for iOS and Android hit the scene.  (iPhone was in it’s infancy back in 2005.)
  • Power Pivot was released – first as a free download for Excel 2010 (Microsoft did free?), then a second free iteration for Excel 2010, before finally becoming integrated into Excel 2013 and up.
  • Power Query was released, again as a free download for Excel 2010 AND 2013 (not only the current, but the past version?), experiencing MONTHLY updates (rather than major versions only) before being integrated into Excel 2016.
  • Power Query is continuing to get updates even after Excel 2016’s release, meaning that we’ve got upgrades continuing to be provided for 3 versions of Excel for the first time ever.

If you’d have asked me to predict any of that back in October 2005, there’s no way that I could have.  Whether you agree with them all or not, these changes have been revolutionary for both business intelligence in Excel, as well as the ability to connect from anywhere to get your work done.  (Or feel guilty about not getting it done!)

Of course, not all of these have been without flaws or problems.  These are just my opinions, but…

  • The pioneering of the Ribbon in Excel 2007 without a way to customize it was a disaster that should never have happened.  Microsoft was so focused on “their” new user interface that they didn’t even want to hear that it was less efficient and punished power users efficiency.  So much so that I wrote a book on it.  Thankfully, the product teams listened eventually, and we saw that change in Office 2010.
  • Office 365 has been met with confusion and angst. I think there’s been a couple of reasons for this.  The first is that they’re pushing a new subscription model, conflicting with over 30 years of Microsoft’s habitual model where we were able to buy once and “own” the product.  The second is that they’ve done a poor job of explaining that the Office 365 service is actually a bundle of products in many cases, not JUST your Office license.  I use Office 365 to host my email and even have a Sharepoint site.  There is no way that I could afford to run and maintain my own email/Sharepoint servers, and last time I checked - based on a 3 year update cycle – the cost was about the same to subscribe vs re-buy anyway.  Overall, I think the subscription is actually the right move, it’s just going to take time for people to get used to and embrace it.  It’s this model that will allow Microsoft to justify giving us constant and frequent updates.
  • Re-branding of the “Power Add-ins” was a huge mistake.  When Power Pivot, Power Query, Power View and Power Map were released, I met MANY high end Excel pros who celebrated that Microsoft had FINALLY recognized their role, and was releasing Power Tools aimed at them.  (For reference, I know the Excel team DOES recognize the value of power users, this was simply the perception of those I met.)  Now we’re seeing these tools re-branded down to “less intimidating” names: Power Query in Excel 2016 has basically had it’s identity removed, Power Map is now 3D maps, and Power View has been hidden away without a default place in the Ribbon.  I can’t say how much it frustrates me that there is a constant need to dumb down the interface and pretend that Excel is easy.  It’s not, and the team at Microsoft that controls the branding needs to recognize this.  The Excel pros that drive the majority of business intelligence need both the tools and the respect for what they do.  There is a vast amount of the program that is perfectly accessible to the 80% who use 20% of the program.  Let’s put some focus on the 20% that drive the majority of business intelligence world wide.

Now, having said all that, I want to point out that overall I’m really happy with where Excel has gone.  The Power products were real game changes for me.  So much so that I quit my day job in May to focus on ExcelGuru Consulting Inc, primarily on providing training services to help people get the most out of Excel.  We’ve got some great Excel courses available at GoSkills, and of course our awesome Power Query Training Workshop which will blow your mind.

Over the past 10 years I’ve learned that the Excel team is not just a faceless group of folks working in isolation from the real world.  I was very much of this opinion the first time I made a trip to Redmond… (I was newly minted, and we’d just been given the Ribbon in Excel 2007, so that will give you an idea of my mindset.)  In fact, as I’ve learned over the past 10 years, the Excel team are actually a collection of SUPER passionate people who LOVE Excel, and want it to be the best it can be.  I have a huge respect for all of them.  I count myself very fortunate to be in a place where I get to go meet with them annually and provide feedback, much of which (combined with that of my MVP colleagues) has influenced many different features for the better.

One last thing that makes this program so amazing is that the MVP award gives many of my colleagues a reason to gather in one place annually.  I’ll never forget walking into my first summit and being introduced to the Excel MVPs that had made the trip… the names of legends that I’d been learning from.  And the welcome from them as they calmly just pointed out that hero-worship wasn’t allowed, as I was just one of them.  So odd, humbling and empowering.  Over the past 10 years I’ve made a lot of friends whom I’ve met in Redmond at the summit, and been able to fly around the world to meet even more.  It’s been just fantastic. The summit is the highlight of my year, for the simple reason that I get to go and hang out with people who truly understand me.

I have no idea where the next 10 years will take us, but I’m excited to see it unfold.