Cartesian Product Joins (for the Excel person)

While I was at the PASS BA conference in San Jose, CA last week, I got an email from a reader asking if Power Query could create a Cartesian Product join.

Now I’m an Excel guy, and I’d never heard this term before.  Fortunately, I got the email while I was sitting around the table with a few of my geeky friends, many of whom came from the database world.  This was cool as their answer wasn’t “What is that?”, it was “Why would you want to?”  (As it turns out, there are some VERY good uses for this technique.)

Regardless, mine is not to wonder why, but rather to see if things can be done.  And, as you might expect, we can absolutely create a Cartesian Product (or Cartesian Square)using Power Query.  And actually, it’s REALLY easy when you know how.

Cartesian Product for the Excel person

So what the heck is a Cartesian Product anyway?  (Besides being really hard to spell!)

Picture you have two lists:

  • Automobile make
  • Paint colours

Plainly the two are not related in any classic kind of term.  (How do you match red to Dodge Ram?)  But assume that we can paint any vehicle we have any colour of paint we have.  Of course, that’s done at the factory, so we need to make a product list that shows all of our possible combinations:  Dodge Ram – Red, Dodge Ram – Blue, Dodge Ram – Black, etc…

So basically, for each row on the Vehicle Make table, we need to assign every colour that exists in the Paint Colours table.

If you’d like to read more about this join/math, there is a good article on Wikipedia explain it.

Creating a Cartesian Product in Power Query

To illustrate this, I’m actually going to use a deck of cards, as shown in the Wikipedia article I referenced above.  So we have two tables, as shown below:

image

(That’s table “Cards” on the left and table “Suits” on the right)

And now, what we want to do is create a join so that we get each suit assigned to each card.  (We could do this the other way around too, or we could just sort it after.  Either way gets us to the same place in the end.)

Setting up the Tables

So the first step is to set up the tables.  To do this we simply pulled each table into Power Query and set up the query as a connection only query.

  • Click in the appropriate table
  • Create a New Query –> From Table
  • Right click the only column –> Change Type –> Text
  • Home –> Close & Load –> Close & Load To… –> Only Create Connection

We should now have two queries in our workbook that are pointers to the underlying data:

image

Now, let’s set up a new query that references the cards query:

  • Workbook Queries Pane –> right click Cards –> Reference
  • Rename the query to “52 Card Deck”

Awesome, we’ve now got a simple query all ready to go:

image

Creating the Cartesian Product

The trick now is to create the Cartesian Square.  You’d think this would take some weird Voodoo magic, but it’s actually SUPER simple… just different than normal.  We can’t fall back on the whole “merge tables” experience, as we’d need to pick matching values between the columns… and their aren’t any.  For this reason, none of the join types I discuss in either of these articles will work:

So how do we do it?  Like this:

  • Add Column –> Add Custom Column
  • In the formula area, enter “Suits” (with no quotes)

Did you see what we did there?  We asked Power Query to provide the Suits table for each row of our cards table.  The result is a table of tables which – when you click in the whitespace beside the Table keyword – you can see contains our suits:

image

The final step is to click that little expand icon on the top right of the Custom column (clear that default prefix checkbox as you do) to expand those records.  And the result is a completed table where each card has all four suits.

image

Not bad… no need to write any funky formulas, fill up or anything.  🙂

Sample Workbook

If you’d like to download the sample workbook, you can find it here.

(Now I just hope that when I want to find this article that I can remember how to spell Cartesian correctly!)

20 thoughts on “Cartesian Product Joins (for the Excel person)

  1. Doh! That is sooo much easier than what I have been doing all this time: create a column of ones in both tables, and then merging on that column... this eliminates 3 or 4 steps!

  2. Also known as a cross join. These can be handy for setting up filter combinations independent of observation data.

  3. In Excel when you just pull columns from 2 different tables - the default is a "Cartesian" product (in the Query editor)

    This is true not just for Excel but virtually any database - Access/ SQL Server etc
    I am surprised that a Cartesian product is not a default option in PQ
    I mentioned this to the PQ team you replied that it has been "bubbled up for consideration in future product improvements" - no definitive dates as of now

  4. I've never been a big fan of the term "easier". If you know the data model then maybe, but I really think it comes down to what you're comfortable with. It's a different way, for sure, and is also easy. Won't dispute that, but I don't really think one trumps the other.

  5. Excel 2016 is (or will be) under the Get Data button on the Get & Transform group. It’s still Power Query. Just a quick note that everything you can do in 2013 we can still do in 2016, but you can ALSO do if you download Power Query. Don’t avoid doing that just for the sake of avoiding the download. It’s an amazing tool.

  6. Ken - here's a way to develop a 'Jaccard' fuzzy match using Cartesian product

    let
    Table1 = #table(type table [value = text],
    { {"spruce"}, {"cedar"}, {"birch"}, {"pine"}, {"poplar"}, {"aspen"},
    {"baobab"}, {"larch"}, {"willow"} }),

    Table2 = #table(type table [word = text],
    { {"sprce"}, {"cidar"}, {"pane"}, {"aspern"}, {"babab"} }),

    AddCol = Table.AddColumn(Table2, "Custom", each Table1),
    Expand = Table.ExpandTableColumn(AddCol, "Custom", {"value"}, {"value"}),

    FuzzyCalc = Table.AddColumn(Expand, "Prct", each
    2 * List.Count( List.Intersect(
    { Text.ToList([word]), Text.ToList([value]) } ) )
    / (List.Count(Text.ToList([word])) + List.Count(Text.ToList([value])))),

    SortRows = Table.Sort(FuzzyCalc,{{"word", Order.Ascending}, {"Prct", Order.Descending}}),
    AddIndex = Table.AddIndexColumn(SortRows, "Index", 1, 1),
    RemoveDupls = Table.Distinct(AddIndex, {"word"})
    in
    RemoveDupls

  7. This is already in the running for the coolest EXCEL power query function I've learned in 2018. The link to wikipedia article on cartesian products is just icing on the cake.

    After spending almost an hour on this problem and then finally finding your article, my brain is currently flooded in dopamine.
    /drool

  8. Thank you very much for sharing this. In my quest to overcome this, I part processed my data in PowerQuery and then went to Access. This will save a ton of time and actually enable automation.

  9. Hi Ken,
    Thanks for this great post.
    Can it be extended to more than 2 tables? For example, I have many tables (the number can vary every time) and get them with Excel.CurrentWorkbook function and I need to multiplay all of them. Is there a way this to be done?
    Thanks

  10. Hi Kristina,

    I can’t see why not. Even if you just run the pattern for the first two tables the do it again for the next, it should work fine.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.