While I was at the PASS BA conference in San Jose, CA last week, I got an email from a reader asking if Power Query could create a Cartesian Product join.
Now I’m an Excel guy, and I’d never heard this term before. Fortunately, I got the email while I was sitting around the table with a few of my geeky friends, many of whom came from the database world. This was cool as their answer wasn’t “What is that?”, it was “Why would you want to?” (As it turns out, there are some VERY good uses for this technique.)
Regardless, mine is not to wonder why, but rather to see if things can be done. And, as you might expect, we can absolutely create a Cartesian Product (or Cartesian Square)using Power Query. And actually, it’s REALLY easy when you know how.
Cartesian Product for the Excel person
So what the heck is a Cartesian Product anyway? (Besides being really hard to spell!)
Picture you have two lists:
- Automobile make
- Paint colours
Plainly the two are not related in any classic kind of term. (How do you match red to Dodge Ram?) But assume that we can paint any vehicle we have any colour of paint we have. Of course, that’s done at the factory, so we need to make a product list that shows all of our possible combinations: Dodge Ram – Red, Dodge Ram – Blue, Dodge Ram – Black, etc…
So basically, for each row on the Vehicle Make table, we need to assign every colour that exists in the Paint Colours table.
If you’d like to read more about this join/math, there is a good article on Wikipedia explain it.
Creating a Cartesian Product in Power Query
To illustrate this, I’m actually going to use a deck of cards, as shown in the Wikipedia article I referenced above. So we have two tables, as shown below:
(That’s table “Cards” on the left and table “Suits” on the right)
And now, what we want to do is create a join so that we get each suit assigned to each card. (We could do this the other way around too, or we could just sort it after. Either way gets us to the same place in the end.)
Setting up the Tables
So the first step is to set up the tables. To do this we simply pulled each table into Power Query and set up the query as a connection only query.
- Click in the appropriate table
- Create a New Query –> From Table
- Right click the only column –> Change Type –> Text
- Home –> Close & Load –> Close & Load To… –> Only Create Connection
We should now have two queries in our workbook that are pointers to the underlying data:
Now, let’s set up a new query that references the cards query:
- Workbook Queries Pane –> right click Cards –> Reference
- Rename the query to “52 Card Deck”
Awesome, we’ve now got a simple query all ready to go:
Creating the Cartesian Product
The trick now is to create the Cartesian Square. You’d think this would take some weird Voodoo magic, but it’s actually SUPER simple… just different than normal. We can’t fall back on the whole “merge tables” experience, as we’d need to pick matching values between the columns… and their aren’t any. For this reason, none of the join types I discuss in either of these articles will work:
- Merge Tables using Outer Joins in Power Query
- Merge Tables using Inner and Anti Joins in Power Query
So how do we do it? Like this:
- Add Column –> Add Custom Column
- In the formula area, enter “Suits” (with no quotes)
Did you see what we did there? We asked Power Query to provide the Suits table for each row of our cards table. The result is a table of tables which – when you click in the whitespace beside the Table keyword – you can see contains our suits:
The final step is to click that little expand icon on the top right of the Custom column (clear that default prefix checkbox as you do) to expand those records. And the result is a completed table where each card has all four suits.
Not bad… no need to write any funky formulas, fill up or anything. 🙂
If you’d like to download the sample workbook, you can find it here.
(Now I just hope that when I want to find this article that I can remember how to spell Cartesian correctly!)