INDEX+MATCH versus VLOOKUP

Jon von der Heyden

New member
Joined
Mar 25, 2011
Messages
24
Reaction score
0
Points
0
Location
Stellenbosch,South Africa
Website
www.exceldesignsolutions.com
Hi All

I have recently written training material on optimisation in Excel (formula specific) and I would like to validate this point.

Charles Williams over at decisionmodels.com has suggested that VLOOKUP is slightly faster than using a combination of INDEX and MATCH. I fully get the versatility offered by INDEX and MATCH, but what I am challenging here is sped comparison.

There have been recent discussions over at one of the linkedin groups and also recently over at MrExcel.com where a number of people have suggested that INDEX/MATCH combo is quicker. I have run various tests and all of my tests concur with Charles Williams assessment; i.e. VLOOKUP is faster. This makes sense to me because essentially MATCH and VLOOKUP work the same. Either we instruct Excel to perform a linear search, or a binary search. The INDEX method involves an extra function call and I assume this to be the reason why my tests reveal VLOOKUP as the quicker.

Does anyone here believe that INDEX and MATCH combo is quicker, and if so can you explain why you believe this and do you have test results to share? I know we are talking micro-optimisation here but I just want to make sure that the statements in my training material is correct.
 
Interesting... I've always been under the impression that Index/Match is faster... no evidence to support it though. I was always led to believe in huge workbooks with hundreds/thousands of calls, it would slow down drastically with VLOOKUP...

I do have a pretty massive model that uses a TON of index match calls, and it doesn't have any speed issues, but I never tried with VLOOKUP.
 
Hi Jon, have you been using simple VLOOKUPs? or have you tried cascading them where one is dependant on another, on a large worksheet i would imagine that is where INDEX, MATCH will shine in speed when manipulating a lot of data.
 
Test results

Hi All

I would like to share my test results and describe how this has been tested. The test workbook is so large so I cannot attach it here.

The idea here is to compare binary lookup method versus linear lookup method, and then also to compare VLOOKUP against INDEX and MATCH combo. Again I understand the versatility offered by INDEX and MATCH, but what I am trying to determine is whether or not there is reason to abandon H/VLOOKUP entirely in favour of INDEX and MATCH.

Emphasis here should not be on the actual time it took to calculate each method, rather use the results to compare one method to the next. Actual times will depend on your machine, OS and office version.

First, the results (milliseconds):
  1. VLOOKUP, Binary Search, Constant Column Index: 165 ms
  2. VLOOKUP, Binary Search, Variable Column Index: 194 ms
  3. INDEX/MATCH, Binary Search, Constant Column Index: 205 ms
  4. INDEX/MATCH, Binary Search, Variable Column Index: 232 ms
  5. VLOOKUP, Linear Search, Constant Column Index: 2094 ms
  6. VLOOKUP, Linear Search, Variable Column Index: 2125 ms
  7. INDEX/MATCH, Linear Search, Constant Col Index: 2134 ms
  8. INDEX/MATCH, Linear Search, Variable Column Index: 2163 ms

Let me clarify for any readers unaware of what is meant by binary/linear search. Linear search is when an exact match is sought and the last argument of V/HLOOKUP is FALSE (1) or for MATCH it is 0. These assume that data is sorted ascending. Binary search is the method Excel uses to lookup data when instructed to yield an approximate match. In these instances we pass TRUE to the last argument of h/VLOOKUP, or 1 for match (and we assume data is ordered ascending).

Further reading - Binary Search
Further reading - Linear Search

Example formula for each method:
  1. =VLOOKUP($A2,Data!$A:B,2,TRUE)
  2. =VLOOKUP($A2,Data!$A:B,COLUMNS(Data!$A:B),TRUE)
  3. =INDEX(Data!$A:B,MATCH($A2,Data!$A:$A,1),2)
  4. =INDEX(Data!$A:$I,MATCH($A2,Data!$A:$A,1),MATCH(B$1,Data!$A$1:$I$1,0))
  5. =VLOOKUP($A2,Data!$A:B,2,FALSE)
  6. =VLOOKUP($A2,Data!$A:B,COLUMNS(Data!$A:B),FALSE)
  7. =INDEX(Data!$A:B,MATCH($A2,Data!$A:$A,0),2)
  8. =INDEX(Data!$A:$I,MATCH($A2,Data!$A:$A,0),MATCH(B$1,Data!$A$1:$I$1,0))

Binary search method here is the clear winner, so much so that I would suggest users be mindful of this and always ensure that they apply this method when doing lookup's on large lookup tables, assuming they can order the data ascending.

The difference between H/VLOOKUP versus INDEX & MATCH combo is very slight, but H/VLOOKUP still ranks better. Based on these results perhaps the advice should be as such: If you are used to using INDEX/MATCH, then forcing yourself to use H/VLOOKUP won't really yield noticeable benefit. If, on the other hand, you are already comfortable and used to H/VLOOKUP, then ignore the various calls to switch to INDEX & MATCH combo because the method you are using is already the better of the two.

My Sample (3 worksheets):
  • Data - Columns A:H house random fake persons data. The key used, in the left-most column, is Social Security Number. I have exactly 100K records.
  • Lookup - Column A houses a list or 16K Social Security Numbers and B:H the remaining fields to be populated using lookup formulas.
  • Results - I have 8 rows of results (8 methods described) and 20 columns for the calculation time results. What is presented above is the average of the 20.

Testing Method:
I am retrieving the information for 7 fields for 16K social security numbers, and for 8 different methods. Thus I have 8 defined names for each (M_1 to M_8), each contains an array of 7 formula, 1 for each field to be retrieved. So for example, the name M_1 will house:

Code:
{"=VLOOKUP($A2,Data!$A:B,2,FALSE)","=VLOOKUP($A2,Data!$A:C,3,FALSE)","=VLOOKUP($A2,Data!$A:D,4,FALSE)","=VLOOKUP($A2,Data!$A:E,5,FALSE)","=VLOOKUP($A2,Data!$A:F,6,FALSE)","=VLOOKUP($A2,Data!$A:G,7,FALSE)","=VLOOKUP($A2,Data!$A:H,8,FALSE)"}

I then run a sub-routine to populate the fields for the 16K social securities and to time how long it takes to calculate the results. This is done 20 times and I track each result so that it can be aggregated to provide an average of the 20 at the end.

The routine has to make sure that the lookup table is ordered ascending by social security number for binary search, or order it randomly for linear search. I am using the timeGetTime API, which is said to be better than the alternatives because it has a 1 millisecond resolution. When applying the formula I actually use the evaluate method (because uses same calculation algorithm to yield results), and because I have mixed references in the formula when applied to the whole 7X16000 range the appropriate references move relatively. So in the code below there is no actual trigger for calculation. In fact this method won't care if you are switched to manual calculation.

The Timer Code:
Code:
Declare Function timeGetTime Lib "winmm.dll" () As Long

Private mlngSTART As Long

Private Sub Start()
    mlngSTART = timeGetTime()
End Sub

Private Function Finish() As Long
    Finish = timeGetTime() - mlngSTART
End Function

Public Sub Timer()
    Dim lngIt As Long
    Dim lngarrResults() As Long
    Dim vararrMethods() As Variant
    Dim lngItem As Long
    
    vararrMethods = Array("M_1", "M_2", "M_3", "M_4", "M_5", "M_6", "M_7", "M_8")
    ReDim lngarrResults(0 To 19)
    
    For lngItem = LBound(vararrMethods) To UBound(vararrMethods)
        If CStr(Application.VLookup(vararrMethods(lngItem), shtResults.Range("A:B"), 2, True)) Like "*Binary*" Then
            With shtData
                .Range("A1:I100001").Sort Key1:=.Range("A1"), Order1:=xlAscending, Header:=xlYes
            End With
        Else
            With shtData
                .Range("A1:I100001").Sort Key1:=.Range("I1"), Order1:=xlAscending, Header:=xlYes
            End With
        End If
        For lngIt = 0 To 19
            Start
            With shtLookup
                .Range("B2:H1601").Formula = Evaluate(vararrMethods(lngItem))
                lngarrResults(lngIt) = Finish
                .Range("B2:H1601").ClearContents
            End With
        Next lngIt
        shtResults.Range("A:A").Find(What:=vararrMethods(lngItem)).Offset(, 3).Resize(1, 20).Value = lngarrResults
    Next lngItem
End Sub

This is the best info I can come up with to compare binary versus linear search, and INDEX & MATCH combo versus H/VLOOKUP. I'm afraid I can't find any comprehensive tests that reveal INDEX&MATCH to be more efficient.

@Simon: I'm struggling to grasp why INDEX & MATCH combo is thought to be better when using lookups dependent on lookups? Both are flagged the same way and added to the dependency tree. Ultimately each will have to be evaluated for what they are, so using H/VLOOKUP would be quicker if dependent on another H/VLOOKUP rather than INDEX & MATCH combo.
 
Jon, well tested, and indeed is conclusive to advocate the use of H/Vlookup for binary search, looking at your test and data samples i believe you have chosen and proved the argument exhaustively when you consider the use of these would rarely span anywhere near your area of test data.

Prior to seeing this test i would have bet my wages on INDEX, MATCH being a lot faster in that size of data sample - i've gotta thank you for posting this, it's an eye opener :)
 
Thanks for your feedback Simon, it's nice to have your take on the subject. It makes me feel assured that my test method is sound. :)
To be honest it got me looking back over some replies i've given in the past relating to H/Vlookup and over the 15 or so that i looked at more than two thirds i have provided INDEX, MATCH as the solution regardless of the Op's request for a lookup as i believed that to be sound advice given their larger than average data set, however it seems i should provide BOTH solutions to the same end and allow the Op to choose but informing them of the performance issue around it, although i still think folk are crazy to have hundreds of vlookup in a worksheet :)
 
Hi Jon

Thank you for posting these interesting tests, which do show Vlookup to be faster than Index/Match.
However, for me the fact that Vlookup has to have it's key in the first column of the data, is often a hindrance, and that is an area where Index / Match comes into it's own as a much more versatile methodology, with only a very small performance hit when dealing with exact match.

Also, as you mentioned in an earlier posting, the extra call to Index, having performed the Match, may account for the slower speed.
I haven't carried out any exhaustive testing like yourself, but I fell pretty sure that where you have a series of Vlookups going across a row, to pull in different data items, each with a different offset from the key, that will be slower than my preferred method.

I carry out a single Match, to determine the row number required in the table (in a helper cell), and then use that value as the Index to the relevant column in the data table to pull back my result. Index itself, is exceptionally fast, and with only one Match function to be carried out it seems logical to me that this method will inherently be much faster than a whole series of Vlookups.

Coupled with the extra flexibility of Index / Match, it would take a lot to persuade me towards V or H Lookups.
 
Nice discussion guys.

Can you attach your test workbook Jon?

As far as I know, you can get INDEX + match to be (MUCH!) quicker but only when your lookup list is sorted. You can then do a non-exact match and check it against the value you we're looking up to determine whether an exact match was returned. Like this:

=IF(MATCH(A1,B2:B1000,1)=A1,INDEX(C2:C1000,MATCH(A1,B2:B1000,1)),NA())
 
I feel pretty sure that where you have a series of Vlookups going across a row, to pull in different data items, each with a different offset from the key, that will be slower than my preferred method {INDEX}.

I carry out a single Match, to determine the row number required in the table (in a helper cell), and then use that value as the Index to the relevant column in the data table to pull back my result. Index itself, is exceptionally fast, and with only one Match function to be carried out it seems logical to me that this method will inherently be much faster than a whole series of Vlookups.

Thanks Roger. In fact I seemed to have missed this point entirely when I was debating which is the better method over at one of the LinkedIn groups. I have had some dialog recently with Charles Kyd, who first made this point to me, and he provided me with timed results including this method.

I made some changes to the way the timer works and conducted tests on all of the methods, including using a helper column to house row index #'s. You should notice quicker times because I no longer include the time taken to write the results to the range (explained below).

The results (milliseconds, average over 10 passes):
  1. INDEX; using MATCH to yield row index #s in helper co; Binary Search; Variable Col Index: 17ms
  2. VLOOKUP; Binary Search; Constant Col Index: 20ms
  3. INDEX&MATCH; Binary Search; Constant Col Index: 21ms
  4. VLOOKUP; Binary Search; Variable Col Index: 25ms
  5. INDEX&MATCH; Binary Search; Variable Col Index: 27ms
  6. INDEX; using MATCH to yield row index #'s in helper col; Linear Search; Variable Col Index: 316ms
  7. VLOOKUP; Linear Search; Variable Col Index: 1941ms
  8. VLOOKUP; Linear Search; Constant Col Index: 1944ms
  9. INDEX&MATCH; Linear Search; Variable Col Index: 1985ms
  10. INDEX&MATCH; Linear Search; Constant Col Index: 1988ms

Example formula for each method:
  1. =INDEX(Data!B:B,Lookup!$B2,0) (B2: =MATCH($A2,Data!$A:$A,1))
  2. =VLOOKUP($A2,Data!$A:B,2,TRUE)
  3. =INDEX(Data!$A:B,MATCH($A2,Data!$A:$A,1),2)
  4. =VLOOKUP($A2,Data!$A:B,COLUMNS(Data!$A:B),TRUE)
  5. =INDEX(Data!$A:$I,MATCH($A2,Data!$A:$A,1),MATCH(C$1,Data!$A$1:$I$1,0))
  6. =INDEX(Data!B:B,Lookup!$B2,0) (B2: =MATCH($A2,Data!$A:$A,0))
  7. =VLOOKUP($A2,Data!$A:B,COLUMNS(Data!$A:B),FALSE)
  8. =VLOOKUP($A2,Data!$A:B,2,FALSE)
  9. =INDEX(Data!$A:$I,MATCH($A2,Data!$A:$A,0),MATCH(C$1,Data!$A$1:$I$1,0))
  10. =INDEX(Data!$A:B,MATCH($A2,Data!$A:$A,0),2)

Conclusions:
  1. Binary search is considerably faster than using linear search.
  2. H/VLOOKUP is faster than using INDEX with a nested MATCH function, but the difference is very slight.
  3. INDEX is a very fast calculating function. If you need to return results from multiple fields of the lookup table, using the same lookup value, use MATCH in a helper column to yield the row index #, and then use INDEX to yield the result. The speed improvement is considerable.

Further advice:
  1. Use INDEX&MATCH to perform right-to-left and bottom-to-top lookups. E.g: =INDEX(A:A,MATCH("x",B:B,1))
  2. Use INDEX&MATCH if your lookup table is sorted in descending order according to the lookup value field. E.g: =INDEX(B:B,MATCH("x",A:A,-1))
  3. Having a slice of humble pie, I suppose there is no real danger in advising people to favour INDEX&MATCH over H/VLOOKUP entirely, although I would still favour educating users to use both, but to be aware of the differences.

Testing this for yourself
I have attached my test workbook for anybody that wants to have a play. My test workbook included 8MB of data alone so I have provided an empty workbook. Here's how to populate with data:
  1. Download and install Random Sample Data Generator addin (do a Google search for it)
  2. In the Data sheet, using the sample data addin, import 100K records.
  3. Note that in the Data sheet, column I (Order) is used to sort the data randomly. So in this column list indexes 1:100000. When the macro runs it will sort by column A ascending for binary search methods, otherwise it will sort by column I for linear search methods.
  4. Extract a list of 16k distinct social security numbers from Data Column A and list these in column A in the sheet called Lookup.
  5. Run the macro called 'Timer' ALT+F8

The Timer procedure
The formulas for the different methods are stored in an array for the different columns in the Lookup sheet. These arrays are actually stored in names M_1 : M_10. The procedure then calls these names, in turn, and writes the formula to the Lookup sheet table. The whole while this happens calculation is set to manual. The workbook is set to Force Full Calculation (available only since Excel 2007) meaning that all formulas are volatile. Calculation is only invoked for the Lookup sheet, and the procedure only times the actual calculation (hence why the times are so much smaller).

Code:
Option Explicit

Declare Function timeGetTime Lib "winmm.dll" () As Long

Private mlngSTART As Long

Private Sub Start()
    mlngSTART = timeGetTime()
End Sub

Private Function Finish() As Long
    Finish = timeGetTime() - mlngSTART
End Function

Public Sub Timer()
    Dim lngCalc As XlCalculation
    Dim lngIt As Long
    Dim lngarrResults() As Long
    Dim vararrMethods() As Variant
    Dim vararrFormulas() As Variant
    Dim lngItem As Long
    
    vararrMethods = Array("M_1", "M_2", "M_3", "M_4", "M_5", "M_6", "M_7", "M_8", "M_9", "M_10")
    ReDim lngarrResults(0 To 9)
    
    With Application
        lngCalc = .Calculation
        .Calculation = xlManual
        .ScreenUpdating = False
    End With
    ThisWorkbook.ForceFullCalculation = True
    
    For lngItem = LBound(vararrMethods) To UBound(vararrMethods)
        If CStr(Application.VLookup(vararrMethods(lngItem), shtResults.Range("A:C"), 3, False)) Like "*Binary*" Then
            With shtData
                .Range("A1:I100001").Sort Key1:=.Range("A1"), Order1:=xlAscending, Header:=xlYes
            End With
        Else
            With shtData
                .Range("A1:I100001").Sort Key1:=.Range("I1"), Order1:=xlAscending, Header:=xlYes
            End With
        End If
        For lngIt = 0 To 9
            vararrFormulas = Evaluate(vararrMethods(lngItem))
            With shtLookup
                With .Range("B2:I2")
                    .Formula = vararrFormulas
                    .Copy .Resize(1600, 8)
                    Application.CutCopyMode = False
                End With
                Start
                    .Calculate
                lngarrResults(lngIt) = Finish
                .Range("B2:I1601").Clear
            End With
        Next lngIt
        shtResults.Range("A:A").Find(What:=vararrMethods(lngItem), LookAt:=xlWhole).Offset(, 5).Resize(1, 10).Value = lngarrResults
    Next lngItem
    
    With Application
        .Calculation = lngCalc
        .ScreenUpdating = True
    End With
    ThisWorkbook.ForceFullCalculation = False
End Sub
 

Attachments

  • Test Lookups (EMPTY).xlsm
    24 KB · Views: 111
Conclusions:
  1. Binary search is considerably faster than using linear search.
  2. H/VLOOKUP is faster than using INDEX with a nested MATCH function, but the difference is very slight.
  3. INDEX is a very fast calculating function. If you need to return results from multiple fields of the lookup table, using the same lookup value, use MATCH in a helper column to yield the row index #, and then use INDEX to yield the result. The speed improvement is considerable.

I would like to add to my third conclusion. It seems that the profound improvement of using a helper column to house the row index # number only really exists for a linear search. The improvement using helper column for binary search is so very slight (3 ms over 16K lookups over 100K row lookup table). Thus:

3. INDEX is a very fast calculating function. If you need to return results from multiple fields of the lookup table, using the same lookup value and using linear search method, use MATCH in a helper column to yield the row index #, and then use INDEX to yield the result. The speed improvement is considerable.
 
Hi Jon,

You missed a very important detail in your analysis. Binairy search returns non-exact matches if the searched item is not in the lookup list, linear search returns only exact results and NA if not found.

Therefor I suggest to include my example formula to the list, which combines both elements: a binary search, with an exact match.

I tried to add my suggested index/match with IF to the list, but I can't seem to add the defined name with the formula since it is too long. How on earth did you manage to define names with a refersto of over 255 characters?
 
Hi Jan

Thank you, this is indeed my next step. I want to test the method that you described and I also want to test the different ways of handling errors, e.g:

=IF(ISNA(lookup_formula),0,lookup_formula)
versus
=IFERROR(lookup_formula,0) *
versus
=LOOKUP(9.99E+307,CHOOSE({1;2},0,lookup_formula)) **

* available only since Excel 2007
** exhibit for # results only, use a similar rendition for string results.


Watch this space, I will post results when I finish the tests.

I tried to add my suggested index/match with IF to the list, but I can't seem to add the defined name with the formula since it is too long. How on earth did you manage to define names with a refersto of over 255 characters?

I created VBA arrays and then wrote the arrays out to the names. The 255 limit doesn't actually apply to the RefersTo property, rather it is a limitation when creating names in name manager. Have a look at Module1!FormulaArray. You need to enter your formula in Lookup!B2:I2 first, and then change the name in the sub-routine (currently M_10) and then run the routine.
 
I think the limitation is due to the refedit controls Name manager uses. The true limit is a lot larger indeed.
My Name Manager uses sendkeys (!) to create new range names, hence the reason why I couldn't get the new range name into the workbook. It uses sendkeys to work around a couple of internationalisation bugs in Excel.
 
Jon,

I just found this thread today. I find your research interesting. I've done similar in the past with similar results.

There are a couple additional scenarios that I would encourage you to test as well.

With the helper column in place for row index #'s, array-enter the INDEX formula down an entire relevant column. By this I mean to select all of the cells that will be used in the lookup output for one column, perhaps C2:C1001, and then array-enter the INDEX formula for that entire range in one go.

Similarly, if you are interested in returning the columns in the same order as the source data, you can extend this range over the entire output range - for all columns - resulting in just one array-entered formula for the entire output.

I have found these to be the quickest of all Excel lookups and it is a technique I teach in the Excel Hero Academy. I also alude to it in the Imposing INDEX portion of my new post, Excel Formulas.
 
Last edited:
Thanks, Ken.

You have a fantastic forum implementation here. Looks really sharp, and it has impressive participation. I used Vanilla Forums for my academy, primarily because they had a convenient plugin for WordPress, which allowed for a single sign-on. But this looks really, really good. I'm tempted to add vBulletin to my site now!
 
I love vBulletin. I tend to start with open source (and free) where I can, but I've moderated a few forums that use vBulletin. Honestly, you just can't beat it. Easy to set up, very tweakable, looks good, performs well... I can honestly say it's one piece of software I'm happy I paid for. :)
 
Indirect??

Hey, sorry quick question:

Which is faster,

1) a Linear Search Vlookup over a range, or
2) a Binary Search Match + an Indirect reference over the range?

Context/Example:

I'm trying to test whether a date is in a list of dates. My first option is VLOOKUP(Date, Range, 0). My second option is Indirect("LISTNAME" & MATCH(Date, Range, 1)) = Date. I'm wondering which one's faster.
 
Back
Top