Test results
Hi All
I would like to share my test results and describe how this has been tested. The test workbook is so large so I cannot attach it here.
The idea here is to compare binary lookup method versus linear lookup method, and then also to compare VLOOKUP against INDEX and MATCH combo. Again I understand the versatility offered by INDEX and MATCH, but what I am trying to determine is whether or not there is reason to abandon H/VLOOKUP entirely in favour of INDEX and MATCH.
Emphasis here should not be on the actual time it took to calculate each method, rather use the results to compare one method to the next. Actual times will depend on your machine, OS and office version.
First, the results (milliseconds):
- VLOOKUP, Binary Search, Constant Column Index: 165 ms
- VLOOKUP, Binary Search, Variable Column Index: 194 ms
- INDEX/MATCH, Binary Search, Constant Column Index: 205 ms
- INDEX/MATCH, Binary Search, Variable Column Index: 232 ms
- VLOOKUP, Linear Search, Constant Column Index: 2094 ms
- VLOOKUP, Linear Search, Variable Column Index: 2125 ms
- INDEX/MATCH, Linear Search, Constant Col Index: 2134 ms
- INDEX/MATCH, Linear Search, Variable Column Index: 2163 ms
Let me clarify for any readers unaware of what is meant by binary/linear search. Linear search is when an exact match is sought and the last argument of V/HLOOKUP is FALSE (1) or for MATCH it is 0. These assume that data is sorted ascending. Binary search is the method Excel uses to lookup data when instructed to yield an approximate match. In these instances we pass TRUE to the last argument of h/VLOOKUP, or 1 for match (and we assume data is ordered ascending).
Further reading -
Binary Search
Further reading -
Linear Search
Example formula for each method:
- =VLOOKUP($A2,Data!$A:B,2,TRUE)
- =VLOOKUP($A2,Data!$A:B,COLUMNS(Data!$A:B),TRUE)
- =INDEX(Data!$A:B,MATCH($A2,Data!$A:$A,1),2)
- =INDEX(Data!$A:$I,MATCH($A2,Data!$A:$A,1),MATCH(B$1,Data!$A$1:$I$1,0))
- =VLOOKUP($A2,Data!$A:B,2,FALSE)
- =VLOOKUP($A2,Data!$A:B,COLUMNS(Data!$A:B),FALSE)
- =INDEX(Data!$A:B,MATCH($A2,Data!$A:$A,0),2)
- =INDEX(Data!$A:$I,MATCH($A2,Data!$A:$A,0),MATCH(B$1,Data!$A$1:$I$1,0))
Binary search method here is the clear winner, so much so that I would suggest users be mindful of this and always ensure that they apply this method when doing lookup's on large lookup tables, assuming they can order the data ascending.
The difference between H/VLOOKUP versus INDEX & MATCH combo is very slight, but H/VLOOKUP still ranks better. Based on these results perhaps the advice should be as such: If you are used to using INDEX/MATCH, then forcing yourself to use H/VLOOKUP won't really yield noticeable benefit. If, on the other hand, you are already comfortable and used to H/VLOOKUP, then ignore the various calls to switch to INDEX & MATCH combo because the method you are using is already the better of the two.
My Sample (3 worksheets):
- Data - Columns A:H house random fake persons data. The key used, in the left-most column, is Social Security Number. I have exactly 100K records.
- Lookup - Column A houses a list or 16K Social Security Numbers and B:H the remaining fields to be populated using lookup formulas.
- Results - I have 8 rows of results (8 methods described) and 20 columns for the calculation time results. What is presented above is the average of the 20.
Testing Method:
I am retrieving the information for 7 fields for 16K social security numbers, and for 8 different methods. Thus I have 8 defined names for each (M_1 to M_8), each contains an array of 7 formula, 1 for each field to be retrieved. So for example, the name M_1 will house:
Code:
{"=VLOOKUP($A2,Data!$A:B,2,FALSE)","=VLOOKUP($A2,Data!$A:C,3,FALSE)","=VLOOKUP($A2,Data!$A:D,4,FALSE)","=VLOOKUP($A2,Data!$A:E,5,FALSE)","=VLOOKUP($A2,Data!$A:F,6,FALSE)","=VLOOKUP($A2,Data!$A:G,7,FALSE)","=VLOOKUP($A2,Data!$A:H,8,FALSE)"}
I then run a sub-routine to populate the fields for the 16K social securities and to time how long it takes to calculate the results. This is done 20 times and I track each result so that it can be aggregated to provide an average of the 20 at the end.
The routine has to make sure that the lookup table is ordered ascending by social security number for binary search, or order it randomly for linear search. I am using the timeGetTime API, which is said to be better than the alternatives because it has a 1 millisecond resolution. When applying the formula I actually use the evaluate method (because uses same calculation algorithm to yield results), and because I have mixed references in the formula when applied to the whole 7X16000 range the appropriate references move relatively. So in the code below there is no actual trigger for calculation. In fact this method won't care if you are switched to manual calculation.
The Timer Code:
Code:
Declare Function timeGetTime Lib "winmm.dll" () As Long
Private mlngSTART As Long
Private Sub Start()
mlngSTART = timeGetTime()
End Sub
Private Function Finish() As Long
Finish = timeGetTime() - mlngSTART
End Function
Public Sub Timer()
Dim lngIt As Long
Dim lngarrResults() As Long
Dim vararrMethods() As Variant
Dim lngItem As Long
vararrMethods = Array("M_1", "M_2", "M_3", "M_4", "M_5", "M_6", "M_7", "M_8")
ReDim lngarrResults(0 To 19)
For lngItem = LBound(vararrMethods) To UBound(vararrMethods)
If CStr(Application.VLookup(vararrMethods(lngItem), shtResults.Range("A:B"), 2, True)) Like "*Binary*" Then
With shtData
.Range("A1:I100001").Sort Key1:=.Range("A1"), Order1:=xlAscending, Header:=xlYes
End With
Else
With shtData
.Range("A1:I100001").Sort Key1:=.Range("I1"), Order1:=xlAscending, Header:=xlYes
End With
End If
For lngIt = 0 To 19
Start
With shtLookup
.Range("B2:H1601").Formula = Evaluate(vararrMethods(lngItem))
lngarrResults(lngIt) = Finish
.Range("B2:H1601").ClearContents
End With
Next lngIt
shtResults.Range("A:A").Find(What:=vararrMethods(lngItem)).Offset(, 3).Resize(1, 20).Value = lngarrResults
Next lngItem
End Sub
This is the best info I can come up with to compare binary versus linear search, and INDEX & MATCH combo versus H/VLOOKUP. I'm afraid I can't find any comprehensive tests that reveal INDEX&MATCH to be more efficient.
@Simon: I'm struggling to grasp why INDEX & MATCH combo is thought to be better when using lookups dependent on lookups? Both are flagged the same way and added to the dependency tree. Ultimately each will have to be evaluated for what they are, so using H/VLOOKUP would be quicker if dependent on another H/VLOOKUP rather than INDEX & MATCH combo.