Power Query Dependencies Viewer

The November 2016 update is now out, and it finally brings the Power Query Dependencies viewer to Excel.  While it’s been in Power BI Desktop for a while (as Matt posted about a while ago), this is huge news for Excel, as this feature has been badly needed.

Viewing Power Query Dependencies

To get to the Power Query Dependencies view, you simply need to perform the following steps:

  • Edit any query (just to get into the Power Query editor)
  • Go to the View tab
  • Click the Query Dependencies button

image

Once you do so, you’ll be launched into the Power Query Dependencies window, as shown below:

image

At first glance…

So at first glance, this is pretty exciting and yet – if you work with complicated Power Query setups like I do – you’ll find the Query dependencies view a bit… lacking in some areas too.

First off, if your queries are complicated, the view really does open that small.  Wow.  There is a scaling button at the bottom, but it quickly scales things off-screen.  No problem, right?  We’ll just drag some boxes around then… oh… except we can’t.  Dragging any box drags the entire model, not just that one box.

What can you do with the Query Dependencies viewer?

Rather than focus on the stuff we can’t do, I want to take a look at what we can do (although I can’t help making a couple of suggestions along the way).

Maximizing the model layout

The first thing to point out is that, although it isn’t obvious, the window is resizable.  If you mouse over any border or corner, you’ll get the arrows indicating that you can left-click and drag to make the window bigger.

So normally the first thing I do is:

  • Move the window to the upper left of the screen
  • Drag the bottom right corner to make the model fill the entire screen
  • Click the little box icon in the bottom right corner by the scroll bar to “Fit to Screen”

After all, the reason I’m using this view is because the models are big!

Some things that would be really useful here:

  • It would be awesome if there was a Maximize button near the X in the top right (like the Power Query window and every other app has.)
  • It would also be good if we could double click the title bar and have it maximize the window (again, like so many apps out there.)

Either (or both) of those features would save me a lot of time.

Alternate Views for Tracing Query Dependencies

In the default view, the data sources are plotted at the top, and the queries cascade down below.  Fortunately, you’re not stuck with this view; there are four different ways to display the model:

image

In this instance I’ve chosen Left to Right, which puts the data sources on the left and fans the query dependencies out to the right-hand side.

Honestly, if I had my way, I’d probably use Bottom to Top (data sources at the bottom and data model tables at the top).  To me this should basically “bubble up” the model tables to the top of the screen.  Unfortunately, it doesn’t quite work like that… all we can guarantee is that the data sources will be at the bottom; the model tables could end up anywhere.

Ideally, I’d love to have an option to force the Data Sources to be lined up based on the first choice in that menu, and the Load Destinations (whether table or data model) be lined up in the viewer based on the option chosen for the second choice.  This would allow me to easily see the “From” and “To”, with the chain of what happened in between.

Tracing Query Dependencies

In the image below, I’ve selected one of the tables in the middle of the query dependencies tree:

image

The effect is that it highlights all precedent and dependent queries in the data flow.  That’s cool, and I’m okay with this being the default behaviour when I select a query.  Wouldn’t it be cool, though, if we also had:

  • A right click option to trace precedent queries only
  • A right click option to trace dependent queries only

Those would be super helpful for tracing a query’s flow without the extra noise, which is really important when you need to quickly dig into the key factors you probably want to know about your query dependencies.
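To make the difference between those two options concrete, here’s a rough Python sketch of what each trace would return.  This is not anything Power Query exposes.  It simply treats the dependencies view as a directed graph, with an edge from each source or query to every query that consumes it.  All the query and file names are hypothetical.

```python
from collections import deque


def trace(edges, start, direction):
    """Return every node reachable from `start` in one direction.

    edges: (upstream, downstream) pairs, e.g. ("Sales.csv", "Raw Sales").
    direction: "precedents" walks upstream, "dependents" walks downstream.
    """
    graph = {}
    for upstream, downstream in edges:
        if direction == "dependents":
            graph.setdefault(upstream, []).append(downstream)
        elif direction == "precedents":
            graph.setdefault(downstream, []).append(upstream)
        else:
            raise ValueError("direction must be 'precedents' or 'dependents'")
    seen = set()
    queue = deque([start])
    while queue:  # breadth-first walk from the selected query
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical model: two sources, one staging query, two loaded queries.
EDGES = [
    ("Sales.csv", "Raw Sales"),
    ("Budget.xlsx", "Raw Budget"),
    ("Raw Sales", "Sales Staging"),
    ("Sales Staging", "Sales"),
    ("Sales Staging", "Budget vs Actual"),
    ("Raw Budget", "Budget vs Actual"),
]

precedents = trace(EDGES, "Sales Staging", "precedents")  # {"Raw Sales", "Sales.csv"}
dependents = trace(EDGES, "Sales Staging", "dependents")  # {"Sales", "Budget vs Actual"}
```

Notice that Budget.xlsx and Raw Budget never appear in either trace for Sales Staging.  That is exactly the “extra noise” a one-directional trace would filter out.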

Identifying Load Destinations

So the very first thing I did when I threw this specific model into the query dependencies view was identify two queries that were not in the query chain.  “Awesome,” I thought, so I went and deleted them.  Then I restored from backup, as one of them was in use!

Don’t get me wrong, the view was correct, it’s just that the distinction for load destinations is so weak that I saw no arrows and assumed it was good to be removed.  As it turns out, the words actually matter here:

image

The Day Types table is created from a hard-coded list.  Since there are no queries flowing in or out of it (it is floating above the lines), I nuked it.  I missed the fact (especially with it being on the left) that it was actually loaded to the data model.

Raw Data-Departments, on the other hand, is pulling from the Current Workbook and is loaded as “Connection Only”.

So here are my thoughts:

  • I’d love to see nodes that are loaded to worksheets or the data model identified.  Either an icon in the top right or some shading would be ideal: something that makes them a bit less subtle than they are today.
  • I’m not a fan of the “Not loaded” term… it’s about as awesome as the “Load Disabled” that Power Query used to use about two years ago.  This should – in my opinion – be consistent with the UI and should read “Connection only”.  Not loaded makes it look like it isn’t working.

Navigating Query Dependencies

One of the issues I had above is that my Day Types table – being standalone – should not sit on top of any arrows… that’s just scary bad and misleading… but that’s actually part of a much bigger issue as this is kind of the style used throughout the entire tool:

image

This also leads me to another issue: I need to be able to follow these arrows.  Today the only option you have – because you can’t move the boxes – is to essentially print the query dependencies window (you’ll need screen capture software for that, since there isn’t a print button) and trace the arrows with a highlighter.

What I’d love to see in this instance is the ability to select one or more arrows and have them turn bold.  It would be an even bigger bonus if the tables on each end of the selected arrows were shaded as well.  That would actually solve a few issues mentioned earlier too, allowing us to really drill into the relationships we need to trace.

Overall Impressions of the Query Dependencies Viewer

Overall, it’s a good first version.  I’d really love to see some (or all) of the improvements I mentioned above, but it’s a HUGE amount better than what we had a month ago.

Extract Data from a Mixed Column

More and more I’m seeing examples where people are trying to extract data from a mixed column.  In other words, they have two data types in a single column, but need to find a way to extract one from the other.

Examining the issue

The sample data I’m using can be downloaded from this link.

I’m going to use Power BI Desktop for this, but the results will look identical in Excel using Power Query (except for the colour, of course.)

So let’s get started:

  • Get Data (new Query in Excel) –> From CSV –> MixedDataInColumn1.csv
  • Promote First Row as Headers

The issue can be seen in the red circles below… the report author injected each vendor’s name above the first of that vendor’s parts in the list.

image

So the issue here is how to extract the vendor name from the Part No column.  The problem is that there isn’t any obvious way to do this.  We have different textual values in every column, and they could change over time, so there’s really nothing that we can test for reliably in this case.

How to Extract Data from a Mixed Column

There are actually a few different ways to extract data from a mixed column… a few of which we demonstrate in our Power Query workshop.  I’m going to show just one here.

Step 1 – Identify a column with a pattern you can exploit

The key we are really looking for is a column which has values or dates for all rows other than the ones with our vendors.  In this case we actually have two: Part No and Cost.  Both have text on the vendor lines, but what looks like values on the rest.  The challenge is that we can’t always guarantee that Part No won’t have text in it; it’s completely possible we could see a part number like TH-6715.  So this leaves us with the Cost column.

Step 2 – Duplicate the identified column

This next set of steps is actually the trick that lets us work this out.

  • Right click the column in question and choose Duplicate Column
  • Right click the Cost – Copy column –> Change Type –> Whole Number
  • Right click the Cost – Copy column –> Replace Errors –> null

You should now have null values where the textual values were located:

image

Step 3 – Use a little conditional logic

We now have something that we can use in order to extract the Vendor name.  So let’s build a little bit of conditional logic:

  • Add Column –> Conditional Column
  • Configure the Conditional Column as follows:

image

The only trick here is to make sure you change the Output to a column so that you can select from the list of columns.

  • Click OK
  • Right click the Vendor column –> Fill Down

The result is shown below:

image

Step 4 – Clean up

We’re now at the point of clean up which entails:

  • Filter the Cost – Copy column to remove null values
  • Delete the Cost – Copy column
  • Set the data types on all columns

The results now look as follows:

image

At this point we can commit the query and we are good to go.
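If it helps to see the whole pattern in one place, here’s a rough Python sketch of the same logic outside Power Query.  The sample rows and vendor names are hypothetical, not the contents of MixedDataInColumn1.csv.  The float conversion stands in for Change Type plus Replace Errors; all that matters is whether the conversion succeeds.

```python
def to_number_or_none(value):
    """Stand-in for Change Type (to a number) followed by Replace Errors -> null."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def extract_vendors(rows):
    """Split vendor header rows out of a mixed Part No column."""
    result = []
    vendor = None  # value carried forward, like Fill Down
    for row in rows:
        cost_copy = to_number_or_none(row["Cost"])  # Step 2: duplicate and convert Cost
        if cost_copy is None:
            # Step 3: Cost - Copy is null, so Part No holds a vendor name.
            vendor = row["Part No"]
            # Step 4: the header row itself is filtered out (null Cost - Copy).
            continue
        # Remaining rows get the most recent vendor; Cost is kept as a number.
        result.append({"Vendor": vendor, **row, "Cost": cost_copy})
    return result


# Hypothetical sample rows; vendor lines have no usable number in Cost.
ROWS = [
    {"Part No": "Acme Corp", "Description": "", "Cost": ""},
    {"Part No": "1001", "Description": "Widget", "Cost": "12.50"},
    {"Part No": "1002", "Description": "Gadget", "Cost": "8"},
    {"Part No": "Bolt Bros", "Description": "", "Cost": ""},
    {"Part No": "TH-6715", "Description": "Hinge", "Cost": "3.25"},
]

clean = extract_vendors(ROWS)
```

Note that TH-6715 survives as a part.  Because the test runs on Cost rather than Part No, a text-like part number isn’t mistaken for a vendor.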

Final Thoughts

This is not a new trick by any means; I’ve been using it for a long time.  The biggest key is really about identifying patterns and thinking outside the box.

It’s unfortunately very easy to get focused on the primary column we want to solve and lose sight of the others.  (Trust me, I’ve been there too.)  Sometimes, though, when a column is particularly tough to deal with, we need to just step back and take a look at the bigger picture to see if there is a pattern in another column that we can exploit.  In fact, I’d say that this is probably one of the most important things to master when working with Power Query.

October News and Events

It’s a busy month here at Excelguru. Instead of a technical post we wanted to catch everyone up on our October news and events!

Live Course: Master Your Excel Data

Ken is teaching a LIVE, hands-on course in Victoria, BC on Friday, October 21 from 9:00am-4:30pm. This session is great for anyone who has to import and clean up data in Excel and will change the way you work with data forever! Ken will teach you how to use Excel Tables, Pivot Tables and Power Query. Space is limited to only 20 attendees, so don't miss out on your chance to sign up. For full details and to register for the session, visit: http://www.excelguru.ca/content.php?291-Live-Course-Master-Your-Excel-Data.

October News and Events: Power BI Meet-up

The next Vancouver Power BI User Group meet-up is happening on Thursday, October 13 from 5:30-7:00pm. Scott Stauffer, Microsoft Data Platform MVP, will be presenting on How to Operationalize Power BI. Together we’ll look at some solutions that might help pass your Power BI solution over to IT to manage enterprise-wide. Dinner and soft drinks will be provided. View the full details and sign up to attend at: http://www.meetup.com/Vancouver-Power-BI-User-Group/events/234126999/.

Microsoft MVP Award Received

For the 11th straight year, Ken has received the 2016 Most Valuable Professional Award from Microsoft! The previous 10 years, Ken’s award has been in the Excel category, but this year’s award is in the Data Platform category. The new category reflects the work he’s been doing this past year with Power Query and Power BI. Congratulations Ken, your guru status remains assured.

Our Team Has Grown

As we mentioned the other day, Rebekah Sax has recently joined the Excelguru team. She brings with her a wealth of experience in marketing, communications, event planning and administration. Please join us in welcoming Rebekah as she helps us make new connections and continue to grow.

Fix: Excel Formulas don’t update in Power Query tables

If you’re new to Power Query, chances are you’re more comfortable doing tricky mathematics using Excel formulas rather than Power Query formulas.  No shame there, but you’ve probably run into a situation where you set up the formulas, refresh your query, and the Excel formulas don’t update in Power Query’s output table.

I’ve worked with this issue for a long time, and it’s actually caused me to avoid using Excel formulas in tables generated via Power Query altogether.  Having said that, there is now an easy fix, which renders that avoidance obsolete.

The Issue:  Excel Formulas don't update in Power Query tables

Let’s take a quick look at this scenario.  We have a simple table called Animals as follows:

image

And it gets landed in another table.  But in this table, we added a new column called “Est” to the end, which holds the following formula: =[@Price]*[@Quantity]

image

So far so good, but what happens when we add a new line to our Animals table and refresh it?

image

Plainly, this is not good at all!

The Fix:  Excel Formulas don't update in Power Query tables

The fix is remarkably simple, once you know what to do:

Step 1: Change the Table Design Properties

  • Select any cell in the OUTPUT table (the green one)
  • Go to Table Tools –> Design –> Properties (External Table Data group)

image

  • Check the box next to Preserve column sort/filter/layout and click OK

image

Now, at this point, nothing appears to change.  In fact, even refreshing the table seems to make no difference.

Step 2: Ensure the Formulas are consistent

The reason the formulas still aren’t filling correctly is now a different one: the formula in the last column is no longer consistent.  Naturally, that means that Excel won’t auto-fill the formula, as it doesn’t know which is correct (the formula or the blank cell).  We need to fix that before this will work for us.

  • Copy from the first formula cell down the entire column.  (I’ve had reports that this DOES matter, and that copying from another cell may not fix it.)

Our data should now look something like this:

image

Step 3:  Test it

And now, when we add new data and refresh the Power Query…

image

Wrap-up Thoughts

On my Excel 2016, this behavior is now the default.  I don’t know when it changed, to be honest, and if your behavior is different, I’d love to know.  I’m running the Office Pro Plus subscription – first release.

On Excel 2010/2013, the old default of not updating the tables appears to prevail.  It’s actually for this reason that I covered this, as it came up as a question in my Power Query forum.

I’m not sure if this is good or bad, but this setting can/must be managed for each output table individually.  There doesn’t seem to be a way to set one behavior or the other to apply to all tables.  To be honest, I think they’ve got it right in Excel 2016, so at least it’s fixed if you’re current.  (And for reference, my understanding is that this required a patch to Excel, not Power Query, which is why I suspect that we likely won’t see it fixed for Excel 2010/2013.)

July 2016 Power Query Update

Hey folks,

I'm actually on vacation, so this post is going to be short.  I just wanted to make sure you all are aware that there is a new Power Query update available.

New features in the July 2016 Power Query update:

  • New SAP HANA connector.
  • New SharePoint Folder connector.
  • New Online Services connectors category.
  • Improved DB2 connector, now leveraging the Microsoft driver for IBM DB2.
  • Improved Text/CSV connector, now exposing editable settings in the preview dialog.
  • Improved relational database connectors, now including Schema information as a part of the Navigation hierarchy.
  • Data Source Settings enhancements, including “Change Source” capability.
  • Advanced Filter Rows dialog mode within the Query Editor.
  • Inline Input controls for Function invocation within the Query Editor.
  • Support for reordering Query Steps within the Query Editor by using drag and drop gestures.
  • Date picker support for input Date values in Filter Rows and Conditional Columns dialogs.
  • New context menu entry to create new queries from the Queries pane within the Query Editor.

My Thoughts (without actually using it yet)

Now, you can get the full picture at the official Microsoft blog, but I'll just call out a couple that I think are pretty darned important from a usability perspective.

  1. Continuing with last month's update where we got Drag and Drop for the query groups, we now get Drag and Drop for the query steps.  That is just plain AWESOME.
  2. The new Advanced Filter dialog looks pretty good.
  3. The Date Picker also looks pretty helpful.
  4. A context menu to create new queries is also SUPER helpful.  One thing I'd like to see added here, is the ability to set each new query to load to connection/table/data model from INSIDE the query editor.  (Currently, the choice you make is applied to ALL new queries - the main reason I have my defaults set to load to connection only.)

June 2016 Power Query Update

Yesterday, Microsoft released the June 2016 Power Query update.  Even though there are only four items on the list of new features, some of them are quite impactful.

What’s new in the June 2016 Power Query Update

The four new features are:

  • Conditional Columns
  • Column type indicator in Query Editor preview column headers
  • Reorder Queries and Query Groups inside Query Editor via drag and drop gestures
  • Query Management menu in Query Editor

Microsoft has a blog on this here, but let me hit these quickly in reverse order to give my comments as well:

Query Management Menu in Query Editor

Honestly, to me this is kind of a throwaway, “button for the sake of a button” feature.

image

Does it make things more discoverable?  Maybe.  But we can get to all these features by right clicking the query in the Queries pane on the left of the editor.  Personally, I would have rather seen them give me a feature to “pin” the Queries pane open and set that as a default, as I find the navigation from that area much more useful:

image

Reorder Queries via Drag and Drop

This is great… so great in fact, that the only real question is why it hasn’t worked in the past.  Time & resources is the answer, but it’s now working the way you’d expect it to work.

image

PS: if you don’t know how to group your queries, right-click one, choose “Move to Group” and select New Group.  Pretty handy for keeping things organized.

Column Type Indicator

This is BY FAR the most important of the upgrades.  The reason is that this has been a deadly area of weakness since day one.  If you’ve ever been burned by an “any” data type, you know why.  And if you haven’t… hopefully this will help ensure you don’t.

We can now plainly see which columns have been defined with each data type:

image

Notice how easy it is to tell that the “Client, Task and Notes” fields are text (as shown by the ABC icon in the column header).  Hours is a decimal number, Rate is a whole number, and Date… is undefined.  That one needs attention, as indicated by the question mark.  Very visual, and very badly needed for a long time.  This one feature is, in my opinion, worth the upgrade.

Conditional Columns

This is also a pretty cool feature, as it lets a non-coder build an if then else (if) statement.  Full caveat here: this is the image from the official Microsoft blog, not one of mine, but it shows you the general idea:

image

As cool as this is, there are some issues here:

  1. You can only feed out full columns as outputs, not formulas/equations.  So if I wanted to check a column and return [Hours]*[Rate] in one case and [Hours]*1.5*[Rate] in others, it won’t work.  (Instead I’ll get text.)  To do that you’ll still need to write your formulas manually.
  2. You can’t provide IFERROR style logic to check if something errors and react accordingly.  To do that you’ll still need to create your own custom column formula using the “try otherwise” formula.
  3. Assume you created a custom column using the “Add Custom Column” button, and manually wrote your “if then else” formula.  You then committed it and want to change the logic, so you click the gear icon in the Applied Steps window… and you’re taken to the Conditional Column interface shown above, not the original window where you can create more complex logic.  So if you want to modify that formula to be more complex than this new interface allows, you’ll now have to go to the Advanced Editor window.  I have suggested to Microsoft that they add a button to return us to the previous interface for this scenario.
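For comparison, here’s a Python sketch of the kind of logic that still needs a hand-written custom column.  It returns a calculation (not just a column) and falls back to null when the inputs won’t convert.  In M you’d express this with if … then … else and try … otherwise.  The “Overtime” rule and the sample rows are hypothetical; the Hours, Rate and Task column names come from the example above.

```python
def billable(row):
    """Hypothetical rule: overtime is billed at 1.5x, everything else at the base rate."""
    try:
        hours = float(row["Hours"])
        rate = float(row["Rate"])
    except (KeyError, TypeError, ValueError):
        return None  # similar in spirit to `try ... otherwise null` in M
    if row.get("Task") == "Overtime":
        return hours * 1.5 * rate  # a formula as the output, which the dialog can't produce
    return hours * rate
```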

Despite the shortcomings, we should recognize that this is a great new feature.  You can test how one column compares (equals, doesn’t equal, greater than, etc.) to another column or a specific value without having to manually write any M code.  You also aren’t obligated to feed out a column’s value; you can feed out text or values too.  So as long as your logic needs are fairly simple, you can use this feature.

Download the June 2016 Power Query update

You can pick it up from Microsoft’s site here:  https://www.microsoft.com/en-ca/download/details.aspx?id=39379

Also, I’ve started holding on to the previously released installers should you ever need to regress to a prior version.  You can find the installers I have in my forum here:  http://www.excelguru.ca/forums/showthread.php?5745-Installing-Power-Query

New Vancouver Power BI User Group

I’ve been thinking about this for a while and, after discussing it with a couple of others who are passionate about Power BI…  I’m pleased to announce that we have created a new Power BI User Group in Vancouver, BC!

What is the Power BI User Group about?

The goals of this user group are fairly simple:

We plan to meet monthly and have a presentation on using Power BI technologies.  (This could be Power BI Desktop, Excel, Power Query or Power Pivot.)  Whatever the presentation, and no matter how focussed it is on a specific area, it will ultimately be relevant to the over-arching Power BI path of taking your data from raw form to a published dashboard.  This user group is basically dedicated to bringing you content to inspire you and make you an expert in the Power BI technologies in your company.

Our secondary goal is to be a networking group for Power BI professionals.  If you’ve ever felt like the only one in your company that actually understands what you do… well that’s why we are here.  To give you someone to swap stories with, get ideas and maybe even change your career goals.  🙂

Oh… and did I mention that another goal we have is to keep these events free for attendees?

How can you get involved?

There are actually a few ways you can get involved with us…

If you’re looking to attend…

Then it’s simple.  Sign up at our Meetup site.  Then attend a meeting. That’s it.  No cost, no fuss.  All we ask is that you register in advance and attend if you say you’re coming.  (We have limited space in our venue right now, only able to seat about 25 people.)

If you’d like to sponsor the event…

We are looking for a sponsor to cover the cost of pizza and soft drinks for our user group attendees.  It shouldn’t be much, and we’d be happy to tell everyone how awesome your company is.  If you’d like to come on board as a sponsor, please get in touch with me via my contact form.

If you’d like to speak…

Got something cool that you’ve built using the Power BI technology stack?  Would you like to talk about how to actually get Power BI traction in a corporate environment?  Got some other relevant topic that you’re passionate about?  Come to an event and chat with us.  One of our key goals is to make sure we have good variety in our speakers!

When is the first meeting?

Great question!  We’re going to be meeting Thursday, July 14 at 5:30pm in downtown Vancouver.  I’ll be presenting on how to build this self-updating Power BI dashboard, which is originally sourced from PDF files.

Keep The Most Recent Entry

This week’s post was inspired by a question in my Power Query help forum.  The poster has a set of data and needs to keep the most recent entry for each person in the list.

Background

I never saw the user’s real data, but instead mocked up this sample, which you can download here:

image

Obviously it’s fairly simple, and the key thing to recognize here is what we’re after.  What the user needed was this:

image

As you can see, we want to keep the most recent entry only for each person.  In the case of Fred and Mark that is Mar 31, but Jim didn’t have any activity for March, with his last entry being Feb 29.  So how do we do it?

1st attempt to keep the most recent entry

I figured this was pretty easy, so advised the poster to do the following:

  1. Pull the data into Power Query
  2. Sort the Date column in Descending order
  3. Use the Remove Duplicates command on the Student column
  4. Give the query a name (I called mine “Unbuffered” for reasons that will become clear)
  5. Load it to the worksheet

Easy enough, right?  Except that we actually got this:

image

Huh?  What the heck is going on?  I tried changing the dates to text in an attempt to steal away Power Query’s ability to sort based on dates.  (Okay, it was a shot in the dark, and it didn’t work.)

As it turns out, the “Table.Distinct” command that is used to remove duplicates IGNORES previous sorts, going back to the original data sort order.  I’ll admit that this completely shocks me, and is not at all what I’d expect.
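Here’s a small Python sketch of the difference, assuming (hypothetically) that the raw data arrives oldest-first.  Removing duplicates only gives the answer we want when it works on the sorted rows.  Run against the original order, it keeps each person’s earliest entry instead.  This just mimics the behaviour described above; it is not how Table.Distinct is implemented.

```python
from datetime import date

# Hypothetical sample, arriving in ascending date order.
ENTRIES = [
    ("Fred", date(2016, 1, 31)),
    ("Jim", date(2016, 1, 31)),
    ("Mark", date(2016, 1, 31)),
    ("Fred", date(2016, 2, 29)),
    ("Jim", date(2016, 2, 29)),
    ("Mark", date(2016, 2, 29)),
    ("Fred", date(2016, 3, 31)),
    ("Mark", date(2016, 3, 31)),
]


def remove_duplicates(rows):
    """Keep the first row seen for each student, in whatever order the rows arrive."""
    seen = set()
    kept = []
    for student, when in rows:
        if student not in seen:
            seen.add(student)
            kept.append((student, when))
    return kept


# What we wanted: sort newest-first, then remove duplicates from the sorted rows.
wanted = remove_duplicates(sorted(ENTRIES, key=lambda row: row[1], reverse=True))

# What the unbuffered query effectively did: remove duplicates against the original order.
surprise = remove_duplicates(ENTRIES)
```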

So how do you keep the most recent entry?

There’s a few potential ways to deal with this:

  • Sort the data before it comes into your query.  This could potentially be done in a staging query, via a SQL sort command or some other method.  The challenge is that this isn’t always practical (using that custom SQL query breaks query folding, right?)
  • Issue some kind of command (like a group by) that creates a new table which is already sorted in the correct order.  Again, this would work, but really seems unnecessary unless you have some other need to do so.
  • Sort the table, then buffer it before removing duplicates.

Huh?  “Buffer” it?

Using Table.Buffer() to help keep the most recent entry

I’m not a master of explaining Table.Buffer() (yet), but basically you can look at it like this:  It pulls a copy of the table into memory, preventing Power Query from recalculating it.  This can be super useful if you’re passing tables to functions, but in this case can help us lock down the previous query steps before applying the duplicate removal.  When the query state is buffered, that is the “most recent” copy that Power Query will revert to.

Rather than adjust the previous query, here’s what I did in order to create the working solution:

  • Duplicated the “Unbuffered” query
  • Renamed the new query, calling it “Buffered”
  • Selected the “Sorted Rows” step we generated (just before the “Removed Duplicates” step)
  • Clicked the fx icon in the formula bar

image

As I’ve mentioned a few times on this blog, this creates a new step that simply refers to the previous query step.  I then just wrapped the text for the new Custom1 step in the formula bar with Table.Buffer():

image

And when you hit Close & Load, you get a different result than our previous query… you get the result we actually wanted:

image

So what’s happening here?

First, just to be clear (before Bill or Imke call me out on this), inserting the new step wasn’t entirely necessary.  I only did this to demonstrate the key difference in a distinct step of its own.  I could have easily just wrapped the Sorted Rows step in Table.Buffer() and it would have worked fine.  🙂

The key difference here is that the Table.Distinct() command used in the Removed Duplicates step will now only go back as far as the buffered table.  From the Excel user’s perspective, it’s kind of like we’ve copied all the steps before this, locked them in with a Paste Special command, and pointed Power Query to that version of the data instead of looking back at the original.

Cool!  I’m going to use Table.Buffer everywhere!

Um… don’t.  That’s actually a really bad idea.

Table.Buffer() needs to be used with a bit more care than that.  Every time you buffer a table, it needs to be processed and written into memory.  That takes resources.  You only want to use this command when it makes sense.  Some places where it does:

  • When you need to lock down previous steps to prevent things being ignored, like in the case above
  • When you want to pass a table to a function.  If you don’t buffer it first, the table may get re-calculated/refreshed before being passed into the function.  If you’re doing this for every row, that can be a lot of re-calculations.  In this case it may make sense to buffer the table, then send the buffered table into the function.  Even though the buffering takes overhead, it only happens once, which can speed things up.

Just also remember that the instant you buffer your table, you break any query folding ability, as the data is now in Excel’s memory space.  Something that is worth consideration if you’re doing large operations against a database.

Cartesian Product Joins (for the Excel person)

While I was at the PASS BA conference in San Jose, CA last week, I got an email from a reader asking if Power Query could create a Cartesian Product join.

Now I’m an Excel guy, and I’d never heard this term before.  Fortunately, I got the email while I was sitting around the table with a few of my geeky friends, many of whom came from the database world.  This was cool as their answer wasn’t “What is that?”, it was “Why would you want to?”  (As it turns out, there are some VERY good uses for this technique.)

Regardless, mine is not to wonder why, but rather to see if things can be done.  And, as you might expect, we can absolutely create a Cartesian Product (or Cartesian Square) using Power Query.  And actually, it’s REALLY easy when you know how.

Cartesian Product for the Excel person

So what the heck is a Cartesian Product anyway?  (Besides being really hard to spell!)

Picture you have two lists:

  • Automobile make
  • Paint colours

Plainly the two are not related in any classic sense.  (How do you match red to Dodge Ram?)  But assume that we can paint any vehicle we have in any colour of paint we have.  Of course, that’s done at the factory, so we need to make a product list that shows all of our possible combinations:  Dodge Ram – Red, Dodge Ram – Blue, Dodge Ram – Black, etc…

So basically, for each row on the Vehicle Make table, we need to assign every colour that exists in the Paint Colours table.
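In code terms, that means pairing every item in one list with every item in the other.  Here’s a minimal Python sketch using itertools.product.  The second make, Ford F-150, is just a hypothetical extra row.

```python
from itertools import product

makes = ["Dodge Ram", "Ford F-150"]  # second make is hypothetical
colours = ["Red", "Blue", "Black"]

# Every make paired with every colour: len(makes) * len(colours) rows.
catalogue = [f"{make} - {colour}" for make, colour in product(makes, colours)]
```

Two makes and three colours give six rows, with each make repeated once per colour.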

If you’d like to read more about this join/math, there is a good article on Wikipedia explaining it.

Creating a Cartesian Product in Power Query

To illustrate this, I’m actually going to use a deck of cards, as shown in the Wikipedia article I referenced above.  So we have two tables, as shown below:

image

(That’s table “Cards” on the left and table “Suits” on the right)

And now, what we want to do is create a join so that we get each suit assigned to each card.  (We could do this the other way around too, or we could just sort it after.  Either way gets us to the same place in the end.)

Setting up the Tables

So the first step is to set up the tables.  To do this, we simply pull each table into Power Query and set it up as a connection-only query.

  • Click in the appropriate table
  • Create a New Query –> From Table
  • Right click the only column –> Change Type –> Text
  • Home –> Close & Load –> Close & Load To… –> Only Create Connection

We should now have two queries in our workbook that are pointers to the underlying data:

image

Now, let’s set up a new query that references the cards query:

  • Workbook Queries Pane –> right click Cards –> Reference
  • Rename the query to “52 Card Deck”

Awesome, we’ve now got a simple query all ready to go:

image

Creating the Cartesian Product

The trick now is to create the Cartesian Square.  You’d think this would take some weird Voodoo magic, but it’s actually SUPER simple… just different from normal.  We can’t fall back on the whole “merge tables” experience, as we’d need to pick matching values between the columns… and there aren’t any.  For this reason, none of the join types I discuss in either of these articles will work:

So how do we do it?  Like this:

  • Add Column –> Add Custom Column
  • In the formula area, enter “Suits” (with no quotes)

Did you see what we did there?  We asked Power Query to provide the Suits table for each row of our cards table.  The result is a table of tables which – when you click in the whitespace beside the Table keyword – you can see contains our suits:

image

The final step is to click the little expand icon at the top right of the Custom column (clearing the default prefix checkbox as you do) to expand those nested tables.  The result is a completed table where each card has all four suits.
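For anyone who thinks better in code, here’s a rough Python model of those two steps.  It isn’t what Power Query runs internally; it just mimics the shape of the data.  The card and suit values are assumed (as above), and the “Suit” column name after expanding is a hypothetical stand-in for whatever your Suits table’s column is called.

```python
# Assumed table contents, as in the deck-of-cards example.
cards = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Spades", "Hearts", "Diamonds", "Clubs"]

# Step 1: the custom column. Every card row gets the whole Suits list,
# giving a "table of tables" (one nested list per row).
with_custom = [{"Card": card, "Custom": suits} for card in cards]

# Step 2: the expand. Each nested list is unpacked into one row per suit,
# repeating the Card value on every new row. "Suit" is a hypothetical
# column name standing in for the Suits table's own column.
deck = [
    {"Card": row["Card"], "Suit": suit}
    for row in with_custom
    for suit in row["Custom"]
]

print(len(deck))   # 52
```

The nested-then-expand approach is what makes this work without a join key: every row simply receives the entire second table, so nothing ever has to match.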

image

Not bad… no need to write any funky formulas, fill up or anything.  🙂

Sample Workbook

If you’d like to download the sample workbook, you can find it here.

(Now I just hope that when I want to find this article that I can remember how to spell Cartesian correctly!)

32 Bit Excel Memory Limit Increase!

So this is just huge, especially if you work with Power Pivot models and are stuck in 32 bit Excel… Microsoft has just released a 32 bit Excel Memory Limit increase for users of Excel 2016, as of build 16.0.6868.2060 (which is the current build for the Insiders program.)

image

UPDATE:  Effective June 7, 2016 (and build 15.0.4833.1000), there is now a patch available for Excel 2013 (both MSI and subscription versions).  More info here:  https://support.microsoft.com/en-us/kb/3115162

Why a 32 Bit Excel Memory Limit increase?

Users stuck on 32 bit Excel were limited to 2 GB of RAM for their Excel/Power Pivot models, no matter how much memory was available on the PC.  The answer to this in the past was to install the 64 bit version of Excel, as that could address up to 8 TB of memory (if you had it, of course.)

There has been a hack/patch available for a while (see below), and I spoke to a user at the PASS BA summit who told me that without it he simply couldn’t use Power Query at all.

How big an increase is it?

Before you start thinking that you’ll now get the same memory access as with 64 bit Excel, let’s disabuse you of that notion.  It’s better, but not parity.  How much you get actually depends on the bitness of your operating system.

  • 32 bit Windows:  up to 3 GB
  • 64 bit Windows:  up to 4 GB

I suspect the first is an operating system limit, while the second is the hard ceiling for any 32 bit process, since 32 bit addresses can only reach 4 GB in total.  The world needs to move to 64 bit, but this will help give companies (even more) time to make that move.
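Whatever the reason behind each limit, it’s worth seeing why 4 GB is the top end for any 32 bit program.  A 32 bit address can take 2^32 distinct values, one per byte, so here’s the arithmetic as a quick Python check:

```python
GIB = 2**30  # bytes in one gibibyte (what Windows reports as "GB")

# A 32 bit pointer can hold 2**32 distinct addresses, one per byte.
max_32bit_address_space = 2**32

print(max_32bit_address_space)          # 4294967296
print(max_32bit_address_space / GIB)    # 4.0
```

So no matter how much RAM is in the machine, a 32 bit Excel process can never see more than 4 GB of it.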

What about non-Power applications?

This change doesn’t just benefit Power Pivot and Power Query; it benefits anyone who has been running into memory constraints.  So if you’ve been running out of memory because you’ve been pushing huge data sets via VBA/SQL, you’ll love this too.

How about Excel 2010/2013?

Yeah, no.  Sorry.  This is part of the benefit of being current… Microsoft is building for the current version of Office.  Excel’s biggest competitor is previous versions of Excel, so by providing a fix like this to prior versions they’d actually be giving you reasons NOT to upgrade.  You’re in business, and I’m sure you understand that – as much as this sucks for you right now – you’d probably make the same call.

Having said that, if you want to install “the patch” to get that access in previous versions, Rob Collie has a link to it in point 3 of this article.

The “Official Word” from Microsoft…

You can find that by reading KB3160741 for more details.