Thanks Roger. In fact I seemed to have missed this point entirely when I was debating which is the better method over at one of the LinkedIn groups. I have had some dialog recently with Charles Kyd, who first made this point to me, and he provided me with timed results including this method.
Originally Posted by Roger Govier
I made some changes to the way the timer works and conducted tests on all of the methods, including using a helper column to house row index #'s. You should notice quicker times because I no longer include the time taken to write the results to the range (explained below).
The results (milliseconds, average over 10 passes):
- INDEX; using MATCH to yield row index #s in helper co; Binary Search; Variable Col Index: 17ms
- VLOOKUP; Binary Search; Constant Col Index: 20ms
- INDEX&MATCH; Binary Search; Constant Col Index: 21ms
- VLOOKUP; Binary Search; Variable Col Index: 25ms
- INDEX&MATCH; Binary Search; Variable Col Index: 27ms
- INDEX; using MATCH to yield row index #'s in helper col; Linear Search; Variable Col Index: 316ms
- VLOOKUP; Linear Search; Variable Col Index: 1941ms
- VLOOKUP; Linear Search; Constant Col Index: 1944ms
- INDEX&MATCH; Linear Search; Variable Col Index: 1985ms
- INDEX&MATCH; Linear Search; Constant Col Index: 1988ms
Example formula for each method:
- =INDEX(Data!B:B,Lookup!$B2,0) (B2: =MATCH($A2,Data!$A:$A,1))
- =INDEX(Data!B:B,Lookup!$B2,0) (B2: =MATCH($A2,Data!$A:$A,0))
- Binary search is considerably faster than using linear search.
- H/VLOOKUP is faster than using INDEX with a nested MATCH function, but the difference is very slight.
- INDEX is a very fast calculating function. If you need to return results from multiple fields of the lookup table, using the same lookup value, use MATCH in a helper column to yield the row index #, and then use INDEX to yield the result. The speed improvement is considerable.
- Use INDEX&MATCH to perform right-to-left and bottom-to-top lookups. E.g: =INDEX(A:A,MATCH("x",B:B,1))
- Use INDEX&MATCH if your lookup table is sorted in descending order according to the lookup value field. E.g: =INDEX(B:B,MATCH("x",A:A,-1))
- Having a slice of humble pie, I suppose there is no real danger in advising people to favour INDEX&MATCH over H/VLOOKUP entirely, although I would still favour educating users to use both, but to be aware of the differences.
Testing this for yourself
I have attached my test workbook for anybody that wants to have a play. My test workbook included 8MB of data alone so I have provided an empty workbook. Here's how to populate with data:
- Download and install Random Sample Data Generator addin (do a Google search for it)
- In the Data sheet, using the sample data addin, import 100K records.
- Note that in the Data sheet, column I (Order) is used to sort the data randomly. So in this column list indexes 1:100000. When the macro runs it will sort by column A ascending for binary search methods, otherwise it will sort by column I for linear search methods.
- Extract a list of 16k distinct social security numbers from Data Column A and list these in column A in the sheet called Lookup.
- Run the macro called 'Timer' ALT+F8
The Timer procedure
The formulas for the different methods are stored in an array for the different columns in the Lookup sheet. These arrays are actually stored in names M_1 : M_10. The procedure then calls these names, in turn, and writes the formula to the Lookup sheet table. The whole while this happens calculation is set to manual. The workbook is set to Force Full Calculation (available only since Excel 2007) meaning that all formulas are volatile. Calculation is only invoked for the Lookup sheet, and the procedure only times the actual calculation (hence why the times are so much smaller).
Declare Function timeGetTime Lib "winmm.dll" () As Long
Private mlngSTART As Long
Private Sub Start()
mlngSTART = timeGetTime()
Private Function Finish() As Long
Finish = timeGetTime() - mlngSTART
Public Sub Timer()
Dim lngCalc As XlCalculation
Dim lngIt As Long
Dim lngarrResults() As Long
Dim vararrMethods() As Variant
Dim vararrFormulas() As Variant
Dim lngItem As Long
vararrMethods = Array("M_1", "M_2", "M_3", "M_4", "M_5", "M_6", "M_7", "M_8", "M_9", "M_10")
ReDim lngarrResults(0 To 9)
lngCalc = .Calculation
.Calculation = xlManual
.ScreenUpdating = False
ThisWorkbook.ForceFullCalculation = True
For lngItem = LBound(vararrMethods) To UBound(vararrMethods)
If CStr(Application.VLookup(vararrMethods(lngItem), shtResults.Range("A:C"), 3, False)) Like "*Binary*" Then
.Range("A1:I100001").Sort Key1:=.Range("A1"), Order1:=xlAscending, Header:=xlYes
.Range("A1:I100001").Sort Key1:=.Range("I1"), Order1:=xlAscending, Header:=xlYes
For lngIt = 0 To 9
vararrFormulas = Evaluate(vararrMethods(lngItem))
.Formula = vararrFormulas
.Copy .Resize(1600, 8)
Application.CutCopyMode = False
lngarrResults(lngIt) = Finish
shtResults.Range("A:A").Find(What:=vararrMethods(lngItem), LookAt:=xlWhole).Offset(, 5).Resize(1, 10).Value = lngarrResults
.Calculation = lngCalc
.ScreenUpdating = True
ThisWorkbook.ForceFullCalculation = False