Merge Tables using Outer Joins in Power Query

The term “Join” comes from the database world, and I’ll admit that I’ve struggled with understanding it... especially when you combine it with some other keywords.  So this week I thought it would be good to explore it in a bit more detail, especially to show the different results when we merge tables using outer joins.

Different types of Outer Joins

There are actually three flavours of Outer Join you could come across, and each works in a different way.  Those three flavours are:

  • Left Outer join
  • Right Outer join
  • Full Outer join

Which… to an Excel person… mean very little.  To confuse matters more, there are also Inner joins, and Anti joins.  (We’ll look at those next week though.)

Sample Data

To illustrate the different join types, we are going to work with the set of data shown below (which you can download here.)

[Screenshot: the two sample tables]

So two tables of data, one called Transactions, and one called ChartOfAccounts.

Now, the key piece you want to watch here is that we need the Account-Dept combination to exist in both tables in order to make a perfect join between them.  Think VLOOKUP for a second… if you tried to make a VLOOKUP for account 10045 from the Transactions table against the ChartOfAccounts table, what would you get?  Of course, you’d get #N/A since 10045 doesn’t exist in the ChartOfAccounts table.

In this case we have items in both tables that don’t exist in the other.  (The yellow records in the Transactions table don’t have a match in the ChartOfAccounts table, and the red records in ChartOfAccounts don’t have a match in the Transactions table.)  With these differences we can test how each of the first three join types available to us behave when we try to merge the data in both tables together.

Groundwork

Of course, the first thing we need is a pointer to each table for Power Query to work.  So let’s set that up first.

  • Click in the Transactions table –> New Query –> From Table
  • Go to Home –> Close & Load –> Close & Load To… –> Only Create Connection
  • Click in the COA Table –> New Query –> From Table
  • Go to Home –> Close & Load –> Close & Load To… –> Only Create Connection

I also right clicked each query in the Workbook Queries pane and chose to move them to a group I called Source, just to keep them organized:

[Screenshot: the Workbook Queries pane with both queries grouped under Source]

This will work nicely.  We have a pointer to both tables, but didn’t cause any data duplication by loading a new copy to a worksheet.

Merge Tables using Outer Joins – Left Outer Join

Okay, let’s get started.  We need to join these based on the combination of the Account and Dept number.  Fortunately we can use the trick discussed in this post to do this without creating a concatenated key manually first:

  • Right click the Transactions query and choose Merge
  • Select ChartOfAccounts for the bottom table
  • For the top query (Transactions) select Account, hold down CTRL and select Dept
  • For the bottom query (ChartOfAccounts) select Account, hold down CTRL and select Dept

Your join selections should now look like this:

[Screenshot: the Merge dialog with Account and Dept selected in both tables]

Notice that the Join Kind in the bottom is “Left Outer (all from first, matching from second)”.  Let’s click OK and see what that means.

When you get into the Power Query editor:

  • Right click the NewColumn column –> Rename –> COA
  • Click the Expand icon on the top right of the COA column
  • Leave all the defaults and click OK

The resulting query should look as follows (barring the colour of course):

[Screenshot: the expanded Left Outer join results]

So this is a “Left Outer Join” – the default choice for Power Query.  It returns all records from the left table (the top table in the Merge dialog) along with the matching values it finds based on the lookup columns.  Essentially this is the same as VLOOKUP: it returns a match where one exists, and null where it can’t find a matching record.  (The main difference between VLOOKUP and Power Query is that if Power Query finds multiple matching records, it returns all of them, not just one.)

Again, the key point here is that every value from the Left table is returned, whether there is a match or not in the Right table.  The yellow rows here match the yellow rows in the original Transaction table shown at the beginning of the post.

But… notice also that accounts 10040 and 11000 (the red accounts in the COA table) do not show up in the listing at all.  This is your Left Outer join in action.  It pulls all records from the left table, and any matches from the right (or null where no records on the right match).  It never looks at the right side at all to see if records exist there that don’t exist in the left-hand table.
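For the curious, the merge step that Power Query generates behind the scenes should look something like this – a sketch assuming the query names used above, so check your own formula bar for the exact text:

= Table.NestedJoin(Transactions, {"Account", "Dept"}, ChartOfAccounts, {"Account", "Dept"}, "NewColumn", JoinKind.LeftOuter)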

Okay, so now we can see what’s happening here, let’s finish it off:

  • Change the name of the query to LeftOuter
  • Go to Home –> Close & Load –> Close & Load To… –> Only Create Connection

Merge Tables using Outer Joins – Right Outer Join

Rather than discuss the difference here, let’s just demo it.  One caveat… it’s important to understand that you need to start with the same base table in order to make parallel comparisons!

  • Right click the Transactions query and choose Merge
  • Select ChartOfAccounts for the bottom table
  • For the top query (Transactions) select Account, hold down CTRL and select Dept
  • For the bottom query (ChartOfAccounts) select Account, hold down CTRL and select Dept
  • Change the Join type to “Right Outer (all from second, matching from first)”
  • Click OK

And then we’ll do the same thing we did before so that we can compare the results:

  • Right click the NewColumn column –> Rename –> COA
  • Click the Expand icon on the top right of the COA column
  • Leave all the defaults and click OK

If you followed along correctly, your result should look like this:

[Screenshot: the expanded Right Outer join results]

Notice the big difference here?  This time the values from the Right table (ChartOfAccounts) show whether there is a match in the Left (Transactions) table or not.  The red rows (containing 10040 and 11000 from our original table) are now present, where they weren’t in the previous scenario.

But the (yellow) items that were in the Left (Transaction) table which don’t have a match in the Right (ChartOfAccounts) table?  They’re nowhere to be seen!

Looking back at my days of writing SQL code from scratch, I now know why I struggled with this: no one ever explained to me which table was Left and which was Right.  Lacking that knowledge, there really isn’t anything to explain what these joins are truly doing.  But once you see that the table we start our merge from is the Left table, and the one we are joining to it is the Right table… it starts to make a LOT more sense.  The only kicker we have with Power Query is that the Left table is the top one in our Merge dialog, and the Right is the bottom.  But you can certainly see why the UI was designed this way (imagine trying to fit it on your screen if it were side by side!)
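For reference, the only piece that changes in the generated M (versus the Left Outer version shown earlier) is the final JoinKind argument:

= Table.NestedJoin(Transactions, {"Account", "Dept"}, ChartOfAccounts, {"Account", "Dept"}, "NewColumn", JoinKind.RightOuter)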

At any rate, we can now compare and contrast those two joins.  Let’s finish this one off and look at the final join we’ll examine today.

  • Change the name of the query to RightOuter
  • Go to Home –> Close & Load –> Close & Load To… –> Only Create Connection

Merge Tables using Outer Joins – Full Outer Join

By now, you know the drill…

  • Right click the Transactions query and choose Merge
  • Select ChartOfAccounts for the bottom table
  • For the top query (Transactions) select Account, hold down CTRL and select Dept
  • For the bottom query (ChartOfAccounts) select Account, hold down CTRL and select Dept
  • Change the Join type to “Full Outer (all rows from both)”
  • Click OK

And now modify the query:

  • Right click the NewColumn column –> Rename –> COA
  • Click the Expand icon on the top right of the COA column
  • Leave all the defaults and click OK

And what do we have?

[Screenshot: the expanded Full Outer join results]

Aha!  So the Full Outer join makes sure we’ve got all items from both sides.  I can see this being SUPER useful for comparing two lists of transactions, like in a bank reconciliation.  Transactions that match would get lined up nicely, and any that needed attention would have a bunch of nulls beside them on either side.  Very cool.  (I’ll have to do a post on that some day!)
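And once again, the generated M should differ from the previous two versions only in the join kind (same naming assumptions as before):

= Table.NestedJoin(Transactions, {"Account", "Dept"}, ChartOfAccounts, {"Account", "Dept"}, "NewColumn", JoinKind.FullOuter)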

Wrapping Up

This shows how to merge tables using Outer Joins… I’ll be back next week to show the three remaining join types:

  • Inner Join
  • Left Anti Join
  • Right Anti Join

Identify Duplicates Using Power Query

Some time ago I got an email from Alex asking me if there was a way to identify duplicates using Power Query, but without removing non-duplicate records in the process.  This post explores how to do that.

Background

Suppose someone has given you a list like the one shown below (which you can download here if you’d like to follow along):

[Screenshot: the source Products table]

While multiple brands are okay here, we need a list that shows only unique SKU numbers.  The list provided to you was supposed to be duplicate free, but you’re not 100% sure that it actually is.  While it would be easy to just hit the SKU column with the Remove Duplicates function, you don’t want to do that.  Instead you’d like to identify which records have duplicate entries in the list.

So how do we do this?

Naturally, there will be a few different ways to do this.  I’m carving off one method that is the easiest to replicate via the user interface…

Step 1:  Link to the Data

Of course we’ll start by pulling the data into Power Query:

  • Click anywhere in the Products Table
  • Create a new query –> From Table

The data will be loaded into Power Query, and you’ll see two steps in the Applied Steps window:

  • Source (pointing to your source data)
  • Changed Type (setting the data types for the columns)

This might seem like an odd step right now, but we’re going to add an Index column to this table as well.  The reason will become apparent later, but for now:

  • Go to Add Column –> Add Index Column –> From 0

Your data should now look like this:

[Screenshot: the data with the new Index column]
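For reference, the generated step should read something like this in the formula bar (default step names assumed):

= Table.AddIndexColumn(#"Changed Type", "Index", 0, 1)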

Now we need to figure out how to flag any repeating SKU as a duplicate.

Step 2: Identify Duplicates via Grouping Rows

The trick here is to use the Group By feature in Power Query, while preserving the relevant matching records.

NOTE:  We cover the Grouping feature in Chapter 14 of M is for Data Monkey.

Here’s how we do this:

  • Go to Transform –> Group By
  • Set your Group By Options as follows:
    • Group By:   SKU Number
    • New column name:  Duplicates –> Count Rows

Next, click the + to the right of the “New column name” section to add another aggregation.  Set it up as follows:

    • New column name:  ProductDetail –> All Rows

When you’re done, the dialog should look like this:

[Screenshot: the completed Group By dialog]

And upon clicking OK, the results will show that there are, indeed, items that show up more than once:

[Screenshot: the grouped results, with some counts greater than one]
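If you check the formula bar at this point, the grouping step should look something along these lines (step and column names assumed from the dialog above):

= Table.Group(#"Added Index", {"SKU Number"}, {{"Duplicates", each Table.RowCount(_), type number}, {"ProductDetail", each _, type table}})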

Let’s tweak this a bit, and subtract 1 from each value.  That would give us a truer representation of how many duplicates there are.

  • Select the Duplicates column –> Transform –> Subtract –> 1

Resulting in the following:

[Screenshot: the Duplicates column after subtracting 1]

Much better.  We’re now seeing that SKU 510010 appears to have 1 duplicate entry in the data set.
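Behind the scenes, that Subtract command just wraps the column in a transformation step – something like the following (assuming the default step name from the grouping):

= Table.TransformColumns(#"Grouped Rows", {{"Duplicates", each _ - 1, type number}})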

But there is still an issue here.  When we grouped our records, we lost not only the Brand column, but also any duplicate records.  Since the whole point of this exercise was to identify duplicates but not remove the duplicate records, we’re still not in a good place.

Step 3: Identify Duplicates and Show Duplicate Records

Let’s fix this.  Remember how we set up the ProductDetail column to show “All Rows”?  That step gave us the ability to do something pretty cool… it lets us get back all the lost records and product detail information we’re currently missing.

  • Click the Expand button at the top right of the ProductDetail column
  • Uncheck the SKU Number option (as we already have it)
  • Uncheck the option to “Use original column name as prefix”

[Screenshot: the expand options for the ProductDetail column]

As you can see, this will bring back all the details we lost earlier.

Step 4: Final Cleanup

But hang on a second.  Let’s look at this output a bit more closely…

[Screenshot: the expanded results, re-sorted by SKU Number]

Notice that it re-sorted the data.  That’s not exactly a desirable outcome, as we are trying to flag duplicates for a reason.  Maybe we want to know where they exist in an inventory count, or we have some other reason for wanting to preserve the original sort order of our data.  It’s for this reason that we added the Index column earlier.  That column came through with the All Rows step, so let’s put our data back into its original order:

  • Click the drop down arrow on the Index column –> Sort Ascending
  • Right click the Index column –> Remove

And we can now finalize the query:

  • Rename the query to ShowDuplicates
  • Go to Home –> Close & Load

Step 5: Make the Duplicates Obvious

With the data now in an Excel table, we can make the duplicates even more obvious by applying some conditional formatting to the table.  To do this:

  • Select all the values in the Duplicates column of the table
  • Go to Home –> Conditional Formatting –> Data Bars –> Choose a colour

I chose blue data bars, which makes the data look like this:

[Screenshot: the table with data bars applied to the Duplicates column]

Conclusion

Our goal is now complete.  We were able to identify duplicates and flag them without removing non-duplicate items.  In addition, we have preserved the original order of the data in case that was important to us for any reason.

Separate Values and Text – Part 2

In this week’s post we’re going to circle back to the original post on how to Separate Values and Text in Power Query from November 4, 2015.  That post attracted a couple of suggestions from Imke Feldman and Bill Szysz with better methods.

Other Posts on the Subject

All three of these posts are closely related and, as a reminder, which one you need depends on your specific situation.  Here are the three iterations:

  • My original post to Separate Values and Text (which breaks when measurements also contain numbers)
  • Keep Only Numbers in Power Query (based on Imke Feldman’s approach)
  • This post, which covers Bill Szysz’s method

The Issue at Hand

So why do we need to examine this again?  Well, the reality is that the solution I built worked perfectly for the data set that I used.  Bill, however, mocked up some different data which looked like this:

[Screenshot: Bill’s sample data]

Now, my friend Scott would tell you that the user (I’ll paraphrase this) “should get a stern lesson on proper data entry practices”, but if the data is already in the system… it’s too late and we have to deal with it.

If you tried my method, you’ll find that it fails, as shown below:

[Screenshot: the errors returned by the original method]

Basically, any measure that has a number in it, or commas or spaces mid-number… they’re all killers to my routine.  So Bill emailed me to show me how he would approach the situation.

Bill’s Method

I’ve broken Bill’s original submission down a bit, and built it in the way that I think most users would approach this, as you’ll see.  (Bill’s original submission was a bit more polished, but I want to show how I would have arrived at what he ended up with.)

If you’d like to follow along, the source workbook can be downloaded here.

Step 1: Pull in the Data

Of course, to start with, we need the data…

  • Create a new query –> From Table
  • Right click the Quantity column –> Transform –> lowercase

This last step is actually quite important.  The reason is that we now want to split the data apart at the first instance of a character between a and z.  Since Power Query is case sensitive, forcing the text to lowercase means that we won’t miss splitting based on a character in the A to Z set.  It also means that we give Power Query less processing to do, since it only has to look for 26 characters, not 52 (both lower and upper case.)

Step 2: Separate Values and Text

Now that we know what we want to do, let’s do it.  Let’s split the text by the first alpha character:

  • Go to Add Column –> Add Custom Column
    • New Column Name –> Value
    • Custom Column Formula:

=Text.SplitAny([Quantity],"abcdefghijklmnopqrstuvwxyz")

This formula is quite interesting, as it will split by any of the characters between the quotes.  Since we forced the text to lowercase, it will react to any letter of the alphabet, whether it was originally typed as a-z or A-Z.  But there is one small issue… it returns a list, not the text:

[Screenshot: the Value column containing List objects]

Since we’re only interested in the first item in this list at the moment (everything that precedes the first letter), we can modify the formula to drill into just the first element.  To do that:

  • Click the gear icon beside the Added Custom step in the Applied Steps window
  • Modify the formula to read as follows:

=Text.SplitAny([Quantity],"abcdefghijklmnopqrstuvwxyz"){0}

[Screenshot: the modified formula in the Custom Column dialog]

Remembering that Power Query counts from a base of zero, and that the number between the curly braces lets us drill into a specific item in a list, we now get back just the first element – everything before the first letter – as follows:

[Screenshot: the Value column now showing the text before the first letter]

With this done, we can extract the remaining values from the right using some text functions.  (You can learn more about these in my post on 5 Very Useful Text Formulas – Power Query edition, or by reading Chapter 17 of M is for Data Monkey.)

  • Go to Add Column –> Add Custom Column
    • New Column Name –> Measure
    • Custom Column Formula:

=Text.End(Text.From([Quantity]), Text.Length([Quantity])-Text.Length([Value]))
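To make the arithmetic concrete, here’s a worked example.  Assuming a quantity of “1.07kg” where the split step returned “1.07”, we keep the last 6 – 4 = 2 characters:

= Text.End("1.07kg", Text.Length("1.07kg") - Text.Length("1.07"))  // returns "kg"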

At this point, we can identify an issue in the way we stepped through the process.  Can you see it?

[Screenshot: the new Measure column]

In the original data set, the L (for litres) was capitalized.  In our output, it’s not.  If you don’t care about this, then skip step 3, but if you think this is important… we need to modify our steps a bit.

Step 3:  Fix the Lower Case Steps

We caused the issue shown above by converting the Quantity column to lowercase.  Because that column sticks around, we really need it to retain its original format so that when we split off the measure, its characters keep the correct case.  One option would be to modify the original formula to split on both lower and upper case letters, as follows:

=Text.SplitAny([Quantity],"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"){0}

But ideally we’d like to avoid that longer list.  So how do we convert the text to lowercase just for the split, while leaving the original column alone?

The answer is to nest the lowercase conversion into our first Added Custom step.  Let’s go modify that step:

  • Click the gear icon beside the Added Custom step
  • Modify the formula to read as follows:

=Text.SplitAny(Text.Lower([Quantity]),"abcdefghijklmnopqrstuvwxyz")

Now, let’s remove the Lowercased Text step and see if it still works.  (Don’t forget to select the Added Custom1 step after you remove the Lowercased Text step.)

[Screenshot: the query still working without the Lowercased Text step]

Note:  If you know you need this from the beginning, there is obviously no need to convert to lowercase, cause the error, and then fix it.  You could skip the pain and just wrap the column in a Text.Lower() function to begin with.  The reason I showed this route is that I find I frequently iterate this way when building my own data cleanup scenarios.

Step 4: Final Cleanup

The only thing left to do is convert the Value column to numbers.  You’ll get an error if you try it right away though, as there are still spaces in the middle of some numbers (the commas are okay, but the spaces aren’t.)

  • Right click the Value column –> Replace Values
    • Value to Find –> a single space
    • Replace With –> leave this area blank
  • Right click the Value column –> Change Type –> Decimal Number

[Screenshot: the Value column converted to Decimal Number]

And finally:

  • Rename your query
  • Go to Home --> Close & Load

With the results working nicely:

[Screenshot: the final loaded results]

Keep Only Numbers in Power Query

My last blog post was interesting in that I got a few emails about it.  Both Imke Feldman and Bill Szysz sent me better methods, and a blog commenter asked for a slightly different version.  For this post, I’m going to adapt Imke’s technique to show how we can Keep Only Numbers from a string of text (removing all other characters.)

Other Posts on the Subject

Each of these posts is targeted at a specific scenario with its own idiosyncrasies.  Which one you need depends on your specific situation, of course:

  • My original post to split off measurements leaving only the numbers (this will only work if there are no numbers in the measurement.)
  • The method in this post (which removes everything except the numbers – or, with a tweak, everything except the text – in the input)
  • Bill Szysz’s method to split off measurements (coming in a future post, but better than my original as it doesn’t break when measurements also include numbers)

In this Post:

In this post, we are going to keep only numbers in our data set.  Actually, we’ll also keep spaces and decimals the first time around, but we could easily modify the function to clear those too.  So for our first go, we’ll convert the data in the left column below, to show as displayed in the right column:

[Screenshot: the source data on the left, and the desired numbers-only output on the right]

Of course, I started by just pulling the data into Power Query via the From Table command.

How to Keep Only Numbers

Looking at this from a logic point of view, what we want to accomplish is to remove any character that is not a number.  Ideally, we would like to use a function like this in a custom column in order to do so:

=Text.Remove(text as nullable text, removeChars as any)

The first parameter should be pretty easy – we can just feed in the [Quantity] column – but how would we provide all the characters to the last parameter?

Here’s the cool part… removeChars is an “any” datatype… that means we’re not restricted to a single character, we can actually provide a list.  So all we need to do is find a way to create a list of the characters to remove.

This is where Imke’s email to me was really helpful.  She had a step similar to the following in her code:

CharsToRemove = List.Transform({33..45,47,58..126}, each Character.FromNumber(_))

So what does this do?  It actually creates a list of non-contiguous numbers (33-45, 47, 58-126), then transforms each value in the list into its character equivalent.  A partial set of the results is shown here:

[Screenshot: a partial view of the CharsToRemove list]

For reference, character 32 is a space, 46 is a period, and 48 through 57 are the digits 0 through 9 – facts that you can discover by changing the values inside the lists.
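If you ever want to sanity-check a character code, these two library functions are complementary – you can test either one in the formula bar of a blank query:

= Character.FromNumber(48)  // returns "0"
= Character.ToNumber("0")   // returns 48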

In order to use this, I just popped into the Advanced Editor, and pasted the line above right between the “let” and “Source=…” lines.  (Don’t forget to add a comma at the end.)  And with a nice list of values contained in the CharsToRemove step, we can now create the custom column from the Source step:

  • Add Columns –> Add Custom Column
    • Name:  Result
    • Formula:  =Text.Remove([Quantity],CharsToRemove)

And it loads up nicely:

[Screenshot: the Result column keeping only the numbers, spaces and decimals]

Now, keep in mind that the purpose of this query is to strip all characters except the numbers.  In the case of things like m2 and m3 in this data set, the trailing 2 or 3 gets kept as part of the value – but that is exactly what the query is designed to do.

The final M code for this solution is:

let
    CharsToRemove = List.Transform({33..45,47,58..126}, each Character.FromNumber(_)),
    Source = Excel.CurrentWorkbook(){[Name="RawData"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Quantity", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Result", each Text.Remove([Quantity],CharsToRemove))
in
    #"Added Custom"

Keeping Only Numbers

What if we wanted to also remove any spaces and decimals?  Easy enough, just add those values to the original list in the CharsToRemove step as follows:

CharsToRemove = List.Transform({32,46,33..45,47,58..126}, each Character.FromNumber(_))

And the result:

[Screenshot: the results with spaces and decimals removed too]

Removing Numbers Only

Now let’s keep the text and remove the numeric characters from 0-9 only.  To do this we modify the original list values again:

CharsToRemove = List.Transform({48..57}, each Character.FromNumber(_))

And the result:

[Screenshot: the results with only the digits removed]

End Result

This is pretty neat.  Once we recognize which character code represents each character, we can build a list of those characters to remove, and take care of them all in one shot.  To put it all together, here is a look at the different views all shown in one table:

[Screenshot: all variations shown side by side in one table]

You can also download the completed file here.

Last PowerQuery.Training Class of 2015!

There will be a regular blog post coming later this week, but we wanted to just throw out a quick heads up that we are currently accepting registrations for the last PowerQuery.Training class of 2015.

Registrations are open now for the class which begins on November 24, 2015.  This will be your last chance of 2015 to get an in depth training class on the best damn tool to hit Excel in 20 years.  (Sorry Power Pivot, but Power Query is going to reach more people overall.)

For more details on why you need to take this amazing live online workshop, check out the details here.

To register, you can follow this link (and click the Register button on the bottom right of the page.)

And don’t forget that when you register you get a free digital copy of our amazing new M is for Data Monkey book too.

Hope to see you there!

Separate Values and Text in Power Query

I recently received a comment on one of my blog posts asking how to separate values and text, especially when there is no common delimiter such as a space separating them.  This is a bit of an interesting one, as there is no obvious function to make this happen.

Background

The scenario the user has here is a list of values with their unit of measure, similar to this:

[Screenshot: the list of values with their units of measure]

The issue here is that we don’t really have anything to easily split this up, as there isn’t really a good pattern.  Sometimes there are spaces after the values, sometimes not.  The letters change, and there is no consistency in how many characters the values represent.  So how would you approach this?

You can download the sample workbook here.

My Route

I think that a solution for this type of problem is going to be specific to the data in use.  Looking at the sample data, I figured that I can probably bank on all the numbers being at the beginning of the string, and that I probably won’t see something like square meters expressed as m2.  Of course, if that assumption wasn’t correct, I’d have to come up with another method.

At any rate, the angle I approached this was to build a custom function to remove the leading numeric values.  That should leave me with the text values, which I could then replace in the original string.  Let’s take a look.

Removing Numbers

As we recommend in M is for Data Monkey, the way to build a custom function is to start with a regular query that lets us step through each piece we need to do.

So focussing on doing this through the user interface, here’s how I started this solution.

  • Create new Power Query –> From Other Sources –> Blank Query
  • In the formula bar, I typed in 1.07Kg (no quotes, just that text) and pressed Enter
  • I then right clicked the text in the Power Query window, and chose to convert it to a list

[Screenshot: the value converted to a list]

Of course, you can’t do a ton with Lists in the user interface, so I converted it to a table:

  • List Tools –> Transform –> To Table –> OK

To be fair, I could have started by creating a record or a list from scratch (as we show you how to do in M is for Data Monkey), but I didn’t really need to here in order to get up and running quickly.  Regardless, I’m now sitting in a nice place where I have the entire UI exposed to do what I need (which was my original goal.)

[Screenshot: the list converted to a table]

At this point, things become pretty easy:

  • Right click Column1 –> Replace Values –> Replace 0 with nothing
  • Repeat for 1 through 9 and the decimal character

This removed all numbers and decimals, leaving me with just text.  But because I know some of the values had spaces in them as well, I should deal with that:

  • Right click Column1 –> Transform –> Trim

[Screenshot: the trimmed text result]

The final thing I did was to drill into the data point itself, as I don’t really want to return a table when I convert this into a function.  To do that I needed to:

  • Click the fx on the left of the formula bar
  • Append the following to the text in the formula bar:  [Column1]{0}

[Screenshot: the drilled-down data point, with no Column1 header]

Notice that we now have just the data point, not the Column1 header.

Converting the Query to a Function

Now we’ve got a neat little query that will take a data point, sanitize it, and turn it into a data point with no leading values.  But how can I repurpose that to use it for every record?  The answer is to turn this query into a custom function, as we describe in Chapter 22 of M is for Data Monkey.  Here’s how we do it:

  • Go to View –> Advanced Editor
  • Right before the “let” line, add the following:

(Source) =>

  • Go and place two / characters in front of the current Source line in order to comment it out (otherwise it would overwrite the function input)

//Source = "1.07Kg",

  • Click Done
  • Rename the query to fxRemoveNumbers

That’s it.  We’ve converted it to a function.  Now you can go to Home –> Close & Load to save it, and it’s ready for use.  The interesting part here is that creating the logic is the hard part; converting it to a function is dead easy.
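Pieced together, the finished function looks something along these lines.  (This is a compressed sketch: where the recorded version has ten separate Replace Values steps – one per digit plus the decimal – I’ve rolled them into a single List.Accumulate call, and the step names are my own, not the recorder’s output.)

(Source) =>
let
    //Source = "1.07Kg",
    ToTable = Table.FromList({Source}, Splitter.SplitByNothing()),
    // remove the digits 0-9 and the decimal character from Column1
    Replaced = List.Accumulate({"0","1","2","3","4","5","6","7","8","9","."}, ToTable, (t, c) => Table.ReplaceValue(t, c, "", Replacer.ReplaceText, {"Column1"})),
    // trim any leftover spaces
    Trimmed = Table.TransformColumns(Replaced, {{"Column1", Text.Trim}})
in
    // drill in to return the single data point, not a table
    Trimmed[Column1]{0}

Functionally, the List.Accumulate version and the UI-recorded chain of Replace Values steps do the same thing.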

Separate Values and Text

So now let’s use our new function to separate values and text.  Here’s how I did this:

  • Select any cell in the table –> create a new query –> From Table
  • Go to Add Column –> Add Custom column
    • New column name:  Measure
    • Column formula:  =fxRemoveNumbers([Quantity])

And we’ve got a nice new column with just the textual values.

[Screenshot: the new Measure column with just the textual values]

Not bad – now we just need to figure out a way to replace the matching text in the Quantity column with nothing…  After checking MSDN’s Power Query formula guide, I found a function called Text.Replace() that looks like it should do just that:

  • Go to Add Column –> Add Custom column
    • New column name:  Value
    • Column formula:  =Text.Replace([Quantity],[Measure],"")

To summarize: we look at what is in the Quantity column and replace any instance of the text in the Measure column with the value between the two sets of quotes (i.e. nothing.)  The results are shown below:

[Screenshot: the Value column results]

Now it’s just a simple matter of doing some cleanup:

  • Right click the Value column –> Change Type –> Decimal Number
  • Right click the Quantity column –> Remove

[Screenshot: the final table with Value and Measure columns]

And there you go.  It’s finished.  We simply need to go to Home –> Close & Load to commit it, and then refresh it any time we need it.

M is for Data Monkey

The book is now available and is packed with good information that will help you solve this issue as well as many others.  Learn more about the book here.

Suggestion to Improve the Pivot Table Experience

This is a special post to discuss a suggestion to improve the Pivot Table experience, especially for Power Pivot users.

This week I’m at the 2015 MVP Summit in Redmond, WA.  It’s a trip I’m lucky enough to make every year, and certainly one of the annual events that I look forward to the most.  It’s a chance to reunite with my friends in the global community of Excel experts, as well as make some new friends there too.  In addition, we get the opportunity to meet with the Microsoft Excel engineers, give our feedback, and talk about the things that are/aren’t working in the program.

Of course, this doesn’t mean that they can or will implement the suggestions we have.  Excel is a massive program, and every feature change can cause bigger issues elsewhere.  But they do listen, and they do want this product to be the best it can be.  Like every company, they have to work out what they can afford to do, and where the best investments are for their limited resources.

In the spirit of the summit, I thought I’d share one of the ideas I have that I think would be really beneficial to Power Pivot users.  Maybe it makes the radar, maybe it doesn’t, but I think it would be a really useful change.  I’m fairly certain it could also be implemented without causing any issues with other features in the product as well.

The Issue

For those working with Power Pivot, you know the power of DAX.  This leads to creating many different DAX measures, each of which lands in the columns of the Pivot Table.  This is awesome, but it brings up a challenge with the usability of the Pivot Table field list:

[Screenshot: the Pivot Table field list]

Back when we just dropped singular fields into the Values area, things weren’t so bad.  I generally only ran with a few fields, and I didn’t feel super constrained by the size of the window.  Yes, I overran the limit on occasion, but it wasn’t a big deal.

With Power Pivot, things have changed.  I have so much more flexibility to write the DAX measures I need, which leads to many more columns being defined.  If you think about something like forecasting an annual cash flow statement, I’ll write at least 13 different measures (one for each month, plus a total).  And that’s just one scenario.  For a regular financial statement, the same thing: Actual, Budget, Variances, Year to date Actuals, Year to date Budgets, and so on.  Again, it’s not uncommon to see a statement with over 12 columns.

This proliferation of measures leads us to the issue… the Values area of the Pivot Table field list is too small today.  It only holds 3-4 visible fields at a time.  Trying to move a measure into the right place is a real pain, especially if you add a new measure to the bottom and have to drag it up.  I’m sure you’ve had massive “overscroll” problems where the thing seems to speed up to mach 5 JUST as you are trying to move it up that one last row…

The Slightly Better View

The Pivot Table field list has an alternate view called “Field Section and Areas Section Side-By-Side”.

[Screenshot: the “Field Section and Areas Section Side-By-Side” layout option]

This is a bit better, as we can at least see more fields in the area on the left.  But that’s only helpful for scrolling and finding the fields we need, not placing them on the Pivot:

[Screenshot: the side-by-side view, still showing only a few visible rows per area]

You see?  I’ve still only got three rows showing (four when my Excel is maximized on screen.)

But here’s the thing…

When I’m building my Pivot, I rarely end up putting anything in the Filters area, as I tend to use Slicers.  I might have a few fields in there that I don’t want users messing with (I hide the top rows of the Pivot Table), but generally I’m looking at between zero and two fields in there.

And when I build my Rows and Columns, I tend to drag them onto the Pivot and call it a day.  I could use more space on occasion when I’m layering on my Row fields, but Columns are usually sufficient – especially now that I’m writing DAX formulas.  The measure gets dragged into the Values area, and doesn’t need anything in the Columns area at all.  It’s partly for this reason that the small size of the Values area is killing me: the old logic for how the Pivot was built has essentially changed, with the description moving from the Columns area to the Values area.

What that means is that I’ve got a ton of wasted whitespace in my Filters and Columns area.  So why not reclaim that whitespace?

Suggestion to Improve the Pivot Table Experience

So here’s my suggestion to improve the Pivot Table experience: modify the “Field Section and Areas Section Side-By-Side” view as follows (excuse the rough mockup…)

[Mockup: collapsible arrows added to the Filters, Rows, Columns and Values areas]

The key changes here are really about the arrows to the right of the Filters, Rows, Columns and Values areas.  These are the same arrows as used in the Field List on the left, where the white arrow pointing to the right shows the area collapsed, and the black arrow shows the area expanded.

To be clear, the proportions aren’t correct here, but my thought is that the expanded areas consume an equal share of the remaining whitespace.  So if all four areas are expanded, they each get a 25% share of the remaining space, which is what we see in the current implementation.

But collapse one field (let’s say Filters), and each remaining area expands, as it now gets a 33% share of the remaining space.  Collapse two (as I’ve shown above), and the remaining two get 50% each.  Collapse three, and all remaining whitespace goes to the final area:

[Mockup: three areas collapsed, with the Values area taking all remaining space]

This would be fantastic, as it would let me build my Pivot much more easily.  I’d be able to see what I’m working with, especially on Pivot Tables with higher levels of Row or Values fields.

I didn’t scope this in, but it would also probably be a good idea to append a number in parentheses to each area as well, indicating how many fields exist in each.  So in this case:  [Mockup: the area headers with field counts appended in parentheses]

Naturally, when you’re first building a Pivot, it should open with all areas expanded to a 25% share… but bonus points if there is a way to save the default view for a configured Pivot.  The reason I say this is that my guess is that 75% of the time when I’m modifying a Pivot it’s the Values area I’m working in, 20% is Rows, 4% is Columns, and the remaining 1% of the time I’m modifying Filters.  Respecting that others have different uses though, the ability to choose which areas are expanded/collapsed by default on an already existing pivot would be incredible.

At any rate, that’s my idea.  Here’s hoping a program manager on the Excel team thinks there’s merit to it and starts to look at the feasibility.  Feel free to share your thoughts on the subject below.  :)

If you like this idea...

Please throw it some votes at Excel UserVoice.  The more votes it gets there, the more likely it will be implemented!

Merge Data Based on Two Columns

This past weekend I attended SQL Saturday in Portland, OR.  While I was there, I attended Reza Rad’s session on Advanced Data Transformations with Power Query.  During that session, Reza showed a cool trick to merge data based on two columns through the user interface… without concatenating the columns first.

The Issue

Assume for a second that we have data that looks like this:

[Screenshot: the Transactions and COA sample tables]

There are two tables, and we want to join the account name to the transaction.  The problem is that the unique key to join these two tables (which isn’t super obvious here) is a combination of the Acct and Dept fields.  (Elsewhere in the data, the same account exists in multiple departments.)

To get started, I created two connection only queries, one to each table.

  • Select a cell in the left table (Transactions) –> create a new query –> From Table –> Close & Load To… Connection only
  • Select a cell in the right table (COA) –> create a new query –> From Table –> Close & Load To… Connection only

My Original Approach

Now, with both of those created, I want to merge the data so I get the account name on each row of the Transactions table.  So how…?

Originally I would have edited each query, selected the Acct and Dept columns, and merged the two columns together, probably separating them with a custom delimiter.  (This can be done via the Merge command on the Transform or the Add Column tab.)

Essentially, by concatenating the columns, I end up with a single column that I can use to dictate the matches.

Reza’s presentation showed that this isn’t actually necessary, and I don’t need to merge those columns at all…

Merge Data Based on Two Columns

So here’s how we can get those records from the COA Table into the Transactions table:

  • Right click the Transactions query in the Workbook Queries pane
  • Choose Merge
  • Select the COA query

The data now looks like this, asking for us to select the column(s) we wish to use for the merge:

[Screenshot: the Merge dialog, waiting for column selections]

So here’s the secret:

  • Under Transactions, click the Acct column
  • Hold down the CTRL key
  • Click the Dept column

And Power Query indicates the order of the columns you selected.  It will essentially use this as a temporary concatenated value!

[Screenshot: the Transactions table with Acct and Dept selected, numbered 1 and 2]

So now do the same to the COA table:

[Screenshot: the COA table with the same columns selected]

And then complete the merge.  As you can see, you get a new column of data in your query:

[Screenshot: the new column of Table values in the query]
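In the generated M, those multi-column selections simply show up as lists of key columns.  The step should look something like this (query and column names assumed from the screenshots):

= Table.NestedJoin(Transactions, {"Acct", "Dept"}, COA, {"Acct", "Dept"}, "NewColumn", JoinKind.LeftOuter)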

Of course, we can expand NewColumn to get just the Name field, and everything is working perfectly!

[Screenshot: the expanded Name field]

End Thoughts

This is pretty cool, although not super discoverable.  The really nice piece here is that it can save you the work of creating extra columns if you only need them to merge your data.

I should also mention that Reza showed this trick in Power BI Desktop, not Excel.  But because it’s Power Query dealing with the data in both, it works in both.  How cool is that?

Breaking Power Query via Power Pivot is a thing of the past

I’m pleased to let people know that breaking Power Query via Power Pivot is a thing of the past … at least for users of Excel 2013 or higher.  (Sorry, if you’re on 2010, you still need to be careful.)

The information has been around for a bit, and it’s one of the topics we cover in our http://powerquery.training/course as well: how to break your Power Query by doing one of the following actions in Power Pivot:

  • Renaming a table
  • Renaming a column sourced from Power Query
  • Deleting a column sourced from Power Query

Any of these three actions would set your query into an uneditable state; but worse, nothing would appear to happen.  The query would refresh as normal until you eventually tried to change it.  At that point all hell would break loose, and your only option was to rebuild your query (and related data model table) from scratch.

This has been covered in detail in the following sources:

But now, breaking Power Query via Power Pivot is a thing of the past…

This issue was fixed in Excel 2016, but that left many of us hanging with an older version that still exhibited the problem.  If you’re on 2013, however, the problem has now been fixed there too.  I share the links at the bottom of the post to make sure you’re updated, but first I’ll demonstrate that the fix is really working.

To set the stage, I created a simple Calendar table in Power Pivot, and loaded it to the Data Model.

Corruption Method #1:  Deleting Columns

My first test was to attempt to delete the Year column in Power Pivot.  At first it looks like nothing has really changed:

[Screenshot: the confirmation prompt when deleting the Year column]

But when I click Yes, Power Pivot comes back with a message to let me know that I can’t do it after all:

[Screenshot: Power Pivot blocking the column deletion]

Hooray!  This is fantastic news, as it means that I can’t actually destroy my entire data model.  Beautiful!

Corruption Method #2:  Renaming Columns

Next I tried to rename the Year column to myYear.

[Screenshot: Power Pivot blocking the column rename]

Nope.  Can’t break the model that way either.

Corruption Method #3:  Renaming the Table

Finally, I tried to rename the table from Calendar to myCalendar:

[Screenshot: Power Pivot blocking the table rename]

And it looks like we’re protected from shooting our model in the foot too.

My thoughts on the fix

I’m 99% happy with this fix.  It protects us from accidentally blowing up our data models, which is super important – especially because it was possible to break the model and still run for months without ever realizing it.  That just shouldn’t be allowed to happen.  So why am I not 100% happy?

Well, the first part is that Excel 2010 users are still susceptible to the issue.  That’s a challenge, although to be fair, Microsoft has been pretty forthcoming that the Load to Data Model hack is not truly a supported method anyway.  So really, there’s not much of a surprise there, and I’m not holding any points back on this one.

The last part – the remaining 1% for me – is that the fix, as implemented, means that you cannot ever rename a table in Power Pivot that was sourced from Power Query.  In fact, even if you go back to Power Query and rename the query there, it still shows under the original name in Power Pivot.  Granted, it’s not a total show stopper, but you do want to give some thought to your query naming before you push it into the data model that very first time.

How can you ensure you have the fix?

If you’re running automatic updates for Office 2013, you should already have the fix in place.  But if you want to check (or you don’t), then here’s the deal:

The full support KB article on the subject can be found here.

It will direct you to install the following updates:

  • KB3039800: update for Office 2013 – From October 13, 2015
  • KB3039739: update for Office 2013 – September 8, 2015
  • KB3085502:  MS15-099 security update for Excel 2013 – September 8, 2015

(There is a 32 and 64 bit version of each, so make sure you pick up the right version.)

For reference, I just tried to install them, without checking if they’d been installed first.  Fortunately it does a check first, so for me each of them came back with a message like this:

[Screenshot: the installer reporting the update is already installed]

So there, you go.  Great news for users of Power Query and Power Pivot 2013 and higher.  You can now model with the confidence that you won’t accidentally blow up your solution!

MZ-Tools 8.0 for VBA

One of my favourite add-ins of all time just got an upgrade, and I’m super stoked about it.  Why?  Because I can use it again!

As I began my VBA journey, there were two add-ins that I used all the time:

  • MZ-Tools 3.0
  • Smart Indenter

Both were invaluable, with SmartIndenter allowing right click access to re-indent code, and MZ Tools providing a TON of useful content.  (My favourite was the error handling template I could just inject with a couple of clicks.)

It became painful to work on or debug VBA code on the PC of anyone who didn’t have these tools installed, and they became part of the default installation routine for my machine.

Why I’ve been Add-in free for years

Unfortunately, both MZ-Tools (3.0) and Smart Indenter were written in VB6, which meant that they were restricted to the 32 bit versions of Excel.  And that meant that the day I started using Power Pivot, I lost the ability to use either add-in.  (Okay, to be fair, I could have stuck with 32 bit Excel for Power Pivot… except there was no way I was doing that.  The need to access more memory trumped the tools that made my VBA life easier.)

I’ve now been running without the aid of these tools for about 5 years… which is shocking… and I STILL miss them.  A few times over the last few years, I even made some attempts to replicate some of these features on my own, but I could never figure out how to get VB.NET to hook into the VBIDE, so I gave up on it.  Instead I focussed on tools I could control, building add-ins and software in other areas.  (It always irked me that I couldn’t figure out how to hook the VBIDE though!)

No longer Add-in Free

For that reason, I was pretty jazzed when Carlos Quintero emailed out to say that he’s updated and released not only MZ-Tools for Visual Studio, but also MZ-Tools 8.0 for VBA.  That is FANTASTIC.  I’ve downloaded it, got it installed, and am already digging through the loads of features to customize my templates.

Unfortunately I’m not such a good judge of what’s new in this version (my memory of it is five years out of date) but here’s some of the stuff that I’m looking forward to (re-)acquainting myself with:

  • Dead code review.  I’ve already scanned a couple of my add-ins and found unused variables and unused routines that can be trimmed.
  • Statistics.  Kind of a vanity thing, maybe, but I’ve always wondered how many lines of code are actually in my XLGFileTools add-in.  As of today, the answer is 6,726.  (Maybe a couple less once I review the Dead Code report above)
  • Code templates:  I can’t wait to rebuild the error handling template.  I also remember in the past the ability to insert a comment block at the top of each routine/module very easily for documentation too.
  • The simple thing of being able to right click the Immediate window and choose Clear.  Oh my how I’ve missed you!

These are just some highlights, there are obviously tons more.

Worth the cost

If you look back, you’ll see I don’t endorse many products, and certainly not as passionately as I am here.

The goal of MZ-Tools is to make your everyday programming life easier.  I 100% believe that it does that, and that it is worth the cost to purchase it – something I don’t say very often!  (Understand I’m not making any commission or advertising revenues off this, either.)  The software is just that good and useful.

But even better, if you are in the market for it, Carlos has a 50% sale on through the end of October.  That will save you $40 off the regular $79.95 price tag.  How can you beat that?

You can find it at http://mztools.com/index.aspx 

Happy coding!