Howdy. I was answering a thread here about basket analysis, and come up with one approach that essentially unwinds a cross-tab into a long 1D array, so that a pivot table can be used to analyze frequency of twin 'bundles' of goods in the basket in order to find the most popular. The problem with this is that because there were 7 items in a 'basket' then for every 1 row in the source list, my workings had to be extended 49 rows.

Since then I've been playing around with another approach that doesn't require unwinding, but instead the working sit in the same line as the row that it processes. To work out which two items are most frequently purchased together it uses the MODE function. Because the MODE function can't be used directly on Text, I assign each item in the basket (which is a string) to an integer, then using an algorithm to encode all the possible combinations of any two items in the basket (represented by two unique integers) into one unique integer in a way that you can later retrieve the two items from that integer.

This is done for each row (basket) in the list. Because we are now dealing with numbers instead of the source strings, now I can use the MODE to work out which encoded integer (which again represents two items) appears most frequently. Then I un-encode that integer to work out what the constituent items in the bundle were.

My encoding algorithm only handles two numbers, and the formula to encode them to an integer is this:
=0.5*((x+y)^2+3*x+y)
...where x is an integer representing item 1 and y is an integer representing item 2. There's an explanation of how this works - and how to un-encode the original items - at http://mathforum.org/library/drmath/view/56036.html

I'm curious to see how this can be extended to analysis of 3 or more items from a basket. Anyone have any experience of implementing encoding/unencoding of n items to/from an integer in Excel?

I've found some good links on this stuff (see the Other Possible Methods tab in the attached spreadsheet) but before I burn through hours and hours trying to work out how to do this in Excel I thought I'd ask the hive mind.

Note that this whole thing is a thought experiment only, and I'm only doing it with formulas to see if it can be done. So it's a formula challenge for the sake of it.

Cheers

Jeff

Basket analysis 4.xlsb