Page 1 of 3 1 2 3 LastLast
Results 1 to 10 of 27

Thread: INDEX+MATCH versus VLOOKUP

  1. #1
    Acolyte Jon von der Heyden's Avatar
    Join Date
    Mar 2011
    Location
    Stellenbosch,South Africa
    Posts
    24
    Articles
    0

    INDEX+MATCH versus VLOOKUP



    Register for a FREE account, and/
    or Log in to avoid these ads!

    Hi All

    I have recently written training material on optimisation in Excel (formula specific) and I would like to validate this point.

    Charles Williams over at decisionmodels.com has suggested that VLOOKUP is slightly faster than using a combination of INDEX and MATCH. I fully get the versatility offered by INDEX and MATCH, but what I am challenging here is sped comparison.

    There have been recent discussions over at one of the linkedin groups and also recently over at MrExcel.com where a number of people have suggested that INDEX/MATCH combo is quicker. I have run various tests and all of my tests concur with Charles Williams assessment; i.e. VLOOKUP is faster. This makes sense to me because essentially MATCH and VLOOKUP work the same. Either we instruct Excel to perform a linear search, or a binary search. The INDEX method involves an extra function call and I assume this to be the reason why my tests reveal VLOOKUP as the quicker.

    Does anyone here believe that INDEX and MATCH combo is quicker, and if so can you explain why you believe this and do you have test results to share? I know we are talking micro-optimisation here but I just want to make sure that the statements in my training material is correct.
    Regards,
    Jon von der Heyden

  2. #2
    Administrator Ken Puls's Avatar
    Join Date
    Mar 2011
    Location
    Nanaimo, BC, Canada
    Posts
    2,091
    Articles
    79
    Blog Entries
    14
    Interesting... I've always been under the impression that Index/Match is faster... no evidence to support it though. I was always led to believe in huge workbooks with hundreds/thousands of calls, it would slow down drastically with VLOOKUP...

    I do have a pretty massive model that uses a TON of index match calls, and it doesn't have any speed issues, but I never tried with VLOOKUP.
    Ken Puls, FCPA, FCMA, MS MVP (Excel)

    Master your data with Power Query: Purchase your copy of my book M is for Data Monkey today!

    Main Site: http://www.excelguru.ca -||- Blog: http://www.excelguru.ca/blog -||- Forums: http://www.excelguru.ca/forums
    Check out the Excelguru Facebook Fan Page -||- Follow Me on Twitter

    If you've been given VBA code (a macro) for your solution, but don't know where to put it, CLICK HERE.

  3. #3
    Acolyte Jon von der Heyden's Avatar
    Join Date
    Mar 2011
    Location
    Stellenbosch,South Africa
    Posts
    24
    Articles
    0
    Thanks Ken. I shall dandy up my test workbook and post it here. Perhaps you or someone else can reveal if there are any flaws in the way I test it?
    Regards,
    Jon von der Heyden

  4. #4
    Administrator Simon Lloyd's Avatar
    Join Date
    Apr 2011
    Location
    Manchester, England
    Posts
    324
    Articles
    0
    Hi Jon, have you been using simple VLOOKUPs? or have you tried cascading them where one is dependant on another, on a large worksheet i would imagine that is where INDEX, MATCH will shine in speed when manipulating a lot of data.
    Kind regards,
    Simon Lloyd
    Microsoft Office Discussion

  5. #5
    Acolyte Jon von der Heyden's Avatar
    Join Date
    Mar 2011
    Location
    Stellenbosch,South Africa
    Posts
    24
    Articles
    0

    Test results

    Hi All

    I would like to share my test results and describe how this has been tested. The test workbook is so large so I cannot attach it here.

    The idea here is to compare binary lookup method versus linear lookup method, and then also to compare VLOOKUP against INDEX and MATCH combo. Again I understand the versatility offered by INDEX and MATCH, but what I am trying to determine is whether or not there is reason to abandon H/VLOOKUP entirely in favour of INDEX and MATCH.

    Emphasis here should not be on the actual time it took to calculate each method, rather use the results to compare one method to the next. Actual times will depend on your machine, OS and office version.

    First, the results (milliseconds):
    1. VLOOKUP, Binary Search, Constant Column Index: 165 ms
    2. VLOOKUP, Binary Search, Variable Column Index: 194 ms
    3. INDEX/MATCH, Binary Search, Constant Column Index: 205 ms
    4. INDEX/MATCH, Binary Search, Variable Column Index: 232 ms
    5. VLOOKUP, Linear Search, Constant Column Index: 2094 ms
    6. VLOOKUP, Linear Search, Variable Column Index: 2125 ms
    7. INDEX/MATCH, Linear Search, Constant Col Index: 2134 ms
    8. INDEX/MATCH, Linear Search, Variable Column Index: 2163 ms


    Let me clarify for any readers unaware of what is meant by binary/linear search. Linear search is when an exact match is sought and the last argument of V/HLOOKUP is FALSE (1) or for MATCH it is 0. These assume that data is sorted ascending. Binary search is the method Excel uses to lookup data when instructed to yield an approximate match. In these instances we pass TRUE to the last argument of h/VLOOKUP, or 1 for match (and we assume data is ordered ascending).

    Further reading - Binary Search
    Further reading - Linear Search

    Example formula for each method:
    1. =VLOOKUP($A2,Data!$A:B,2,TRUE)
    2. =VLOOKUP($A2,Data!$A:B,COLUMNS(Data!$A:B),TRUE)
    3. =INDEX(Data!$A:B,MATCH($A2,Data!$A:$A,1),2)
    4. =INDEX(Data!$A:$I,MATCH($A2,Data!$A:$A,1),MATCH(B$1,Data!$A$1:$I$1,0))
    5. =VLOOKUP($A2,Data!$A:B,2,FALSE)
    6. =VLOOKUP($A2,Data!$A:B,COLUMNS(Data!$A:B),FALSE)
    7. =INDEX(Data!$A:B,MATCH($A2,Data!$A:$A,0),2)
    8. =INDEX(Data!$A:$I,MATCH($A2,Data!$A:$A,0),MATCH(B$1,Data!$A$1:$I$1,0))


    Binary search method here is the clear winner, so much so that I would suggest users be mindful of this and always ensure that they apply this method when doing lookup's on large lookup tables, assuming they can order the data ascending.

    The difference between H/VLOOKUP versus INDEX & MATCH combo is very slight, but H/VLOOKUP still ranks better. Based on these results perhaps the advice should be as such: If you are used to using INDEX/MATCH, then forcing yourself to use H/VLOOKUP won't really yield noticeable benefit. If, on the other hand, you are already comfortable and used to H/VLOOKUP, then ignore the various calls to switch to INDEX & MATCH combo because the method you are using is already the better of the two.

    My Sample (3 worksheets):
    • Data - Columns A:H house random fake persons data. The key used, in the left-most column, is Social Security Number. I have exactly 100K records.
    • Lookup - Column A houses a list or 16K Social Security Numbers and B:H the remaining fields to be populated using lookup formulas.
    • Results - I have 8 rows of results (8 methods described) and 20 columns for the calculation time results. What is presented above is the average of the 20.


    Testing Method:
    I am retrieving the information for 7 fields for 16K social security numbers, and for 8 different methods. Thus I have 8 defined names for each (M_1 to M_8), each contains an array of 7 formula, 1 for each field to be retrieved. So for example, the name M_1 will house:

    PHP Code:
    {"=VLOOKUP($A2,Data!$A:B,2,FALSE)","=VLOOKUP($A2,Data!$A:C,3,FALSE)","=VLOOKUP($A2,Data!$A:D,4,FALSE)","=VLOOKUP($A2,Data!$A:E,5,FALSE)","=VLOOKUP($A2,Data!$A:F,6,FALSE)","=VLOOKUP($A2,Data!$A:G,7,FALSE)","=VLOOKUP($A2,Data!$A:H,8,FALSE)"
    I then run a sub-routine to populate the fields for the 16K social securities and to time how long it takes to calculate the results. This is done 20 times and I track each result so that it can be aggregated to provide an average of the 20 at the end.

    The routine has to make sure that the lookup table is ordered ascending by social security number for binary search, or order it randomly for linear search. I am using the timeGetTime API, which is said to be better than the alternatives because it has a 1 millisecond resolution. When applying the formula I actually use the evaluate method (because uses same calculation algorithm to yield results), and because I have mixed references in the formula when applied to the whole 7X16000 range the appropriate references move relatively. So in the code below there is no actual trigger for calculation. In fact this method won't care if you are switched to manual calculation.

    The Timer Code:
    Code:
    Declare Function timeGetTime Lib "winmm.dll" () As Long
    
    Private mlngSTART As Long
    
    Private Sub Start()
        mlngSTART = timeGetTime()
    End Sub
    
    Private Function Finish() As Long
        Finish = timeGetTime() - mlngSTART
    End Function
    
    Public Sub Timer()
        Dim lngIt As Long
        Dim lngarrResults() As Long
        Dim vararrMethods() As Variant
        Dim lngItem As Long
        
        vararrMethods = Array("M_1", "M_2", "M_3", "M_4", "M_5", "M_6", "M_7", "M_8")
        ReDim lngarrResults(0 To 19)
        
        For lngItem = LBound(vararrMethods) To UBound(vararrMethods)
            If CStr(Application.VLookup(vararrMethods(lngItem), shtResults.Range("A:B"), 2, True)) Like "*Binary*" Then
                With shtData
                    .Range("A1:I100001").Sort Key1:=.Range("A1"), Order1:=xlAscending, Header:=xlYes
                End With
            Else
                With shtData
                    .Range("A1:I100001").Sort Key1:=.Range("I1"), Order1:=xlAscending, Header:=xlYes
                End With
            End If
            For lngIt = 0 To 19
                Start
                With shtLookup
                    .Range("B2:H1601").Formula = Evaluate(vararrMethods(lngItem))
                    lngarrResults(lngIt) = Finish
                    .Range("B2:H1601").ClearContents
                End With
            Next lngIt
            shtResults.Range("A:A").Find(What:=vararrMethods(lngItem)).Offset(, 3).Resize(1, 20).Value = lngarrResults
        Next lngItem
    End Sub
    This is the best info I can come up with to compare binary versus linear search, and INDEX & MATCH combo versus H/VLOOKUP. I'm afraid I can't find any comprehensive tests that reveal INDEX&MATCH to be more efficient.

    @Simon: I'm struggling to grasp why INDEX & MATCH combo is thought to be better when using lookups dependent on lookups? Both are flagged the same way and added to the dependency tree. Ultimately each will have to be evaluated for what they are, so using H/VLOOKUP would be quicker if dependent on another H/VLOOKUP rather than INDEX & MATCH combo.
    Regards,
    Jon von der Heyden

  6. #6
    Administrator Simon Lloyd's Avatar
    Join Date
    Apr 2011
    Location
    Manchester, England
    Posts
    324
    Articles
    0
    Jon, well tested, and indeed is conclusive to advocate the use of H/Vlookup for binary search, looking at your test and data samples i believe you have chosen and proved the argument exhaustively when you consider the use of these would rarely span anywhere near your area of test data.

    Prior to seeing this test i would have bet my wages on INDEX, MATCH being a lot faster in that size of data sample - i've gotta thank you for posting this, it's an eye opener
    Kind regards,
    Simon Lloyd
    Microsoft Office Discussion

  7. #7
    Acolyte Jon von der Heyden's Avatar
    Join Date
    Mar 2011
    Location
    Stellenbosch,South Africa
    Posts
    24
    Articles
    0
    Thanks for your feedback Simon, it's nice to have your take on the subject. It makes me feel assured that my test method is sound.
    Regards,
    Jon von der Heyden

  8. #8
    Administrator Simon Lloyd's Avatar
    Join Date
    Apr 2011
    Location
    Manchester, England
    Posts
    324
    Articles
    0
    Quote Originally Posted by Jon von der Heyden View Post
    Thanks for your feedback Simon, it's nice to have your take on the subject. It makes me feel assured that my test method is sound.
    To be honest it got me looking back over some replies i've given in the past relating to H/Vlookup and over the 15 or so that i looked at more than two thirds i have provided INDEX, MATCH as the solution regardless of the Op's request for a lookup as i believed that to be sound advice given their larger than average data set, however it seems i should provide BOTH solutions to the same end and allow the Op to choose but informing them of the performance issue around it, although i still think folk are crazy to have hundreds of vlookup in a worksheet
    Kind regards,
    Simon Lloyd
    Microsoft Office Discussion

  9. #9
    Conjurer Roger Govier's Avatar
    Join Date
    Mar 2011
    Location
    Located near Abergavenny, South Wales, UK
    Posts
    113
    Articles
    0
    Hi Jon

    Thank you for posting these interesting tests, which do show Vlookup to be faster than Index/Match.
    However, for me the fact that Vlookup has to have it's key in the first column of the data, is often a hindrance, and that is an area where Index / Match comes into it's own as a much more versatile methodology, with only a very small performance hit when dealing with exact match.

    Also, as you mentioned in an earlier posting, the extra call to Index, having performed the Match, may account for the slower speed.
    I haven't carried out any exhaustive testing like yourself, but I fell pretty sure that where you have a series of Vlookups going across a row, to pull in different data items, each with a different offset from the key, that will be slower than my preferred method.

    I carry out a single Match, to determine the row number required in the table (in a helper cell), and then use that value as the Index to the relevant column in the data table to pull back my result. Index itself, is exceptionally fast, and with only one Match function to be carried out it seems logical to me that this method will inherently be much faster than a whole series of Vlookups.

    Coupled with the extra flexibility of Index / Match, it would take a lot to persuade me towards V or H Lookups.
    Regards

    Roger Govier
    Microsoft Excel MVP
    http://www.technology4u.co.uk

  10. #10
    Nice discussion guys.

    Can you attach your test workbook Jon?

    As far as I know, you can get INDEX + match to be (MUCH!) quicker but only when your lookup list is sorted. You can then do a non-exact match and check it against the value you we're looking up to determine whether an exact match was returned. Like this:

    =IF(MATCH(A1,B2:B1000,1)=A1,INDEX(C2:C1000,MATCH(A1,B2:B1000,1)),NA())
    Regards,
    Jan Karel Pieterse
    www.jkp-ads.com

Page 1 of 3 1 2 3 LastLast

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •