Page 2 of 3 FirstFirst 1 2 3 LastLast
Results 11 to 20 of 27

Thread: INDEX+MATCH versus VLOOKUP

  1. #11
    Acolyte Jon von der Heyden's Avatar
    Join Date
    Mar 2011
    Location
    Stellenbosch,South Africa
    Posts
    24
    Articles
    0


    Register for a FREE account, and/
    or Log in to avoid these ads!

    Quote Originally Posted by Roger Govier View Post
    I feel pretty sure that where you have a series of Vlookups going across a row, to pull in different data items, each with a different offset from the key, that will be slower than my preferred method {INDEX}.

    I carry out a single Match, to determine the row number required in the table (in a helper cell), and then use that value as the Index to the relevant column in the data table to pull back my result. Index itself, is exceptionally fast, and with only one Match function to be carried out it seems logical to me that this method will inherently be much faster than a whole series of Vlookups.
    Thanks Roger. In fact I seemed to have missed this point entirely when I was debating which is the better method over at one of the LinkedIn groups. I have had some dialog recently with Charles Kyd, who first made this point to me, and he provided me with timed results including this method.

    I made some changes to the way the timer works and conducted tests on all of the methods, including using a helper column to house row index #'s. You should notice quicker times because I no longer include the time taken to write the results to the range (explained below).

    The results (milliseconds, average over 10 passes):
    1. INDEX; using MATCH to yield row index #s in helper co; Binary Search; Variable Col Index: 17ms
    2. VLOOKUP; Binary Search; Constant Col Index: 20ms
    3. INDEX&MATCH; Binary Search; Constant Col Index: 21ms
    4. VLOOKUP; Binary Search; Variable Col Index: 25ms
    5. INDEX&MATCH; Binary Search; Variable Col Index: 27ms
    6. INDEX; using MATCH to yield row index #'s in helper col; Linear Search; Variable Col Index: 316ms
    7. VLOOKUP; Linear Search; Variable Col Index: 1941ms
    8. VLOOKUP; Linear Search; Constant Col Index: 1944ms
    9. INDEX&MATCH; Linear Search; Variable Col Index: 1985ms
    10. INDEX&MATCH; Linear Search; Constant Col Index: 1988ms


    Example formula for each method:
    1. =INDEX(Data!B:B,Lookup!$B2,0) (B2: =MATCH($A2,Data!$A:$A,1))
    2. =VLOOKUP($A2,Data!$A:B,2,TRUE)
    3. =INDEX(Data!$A:B,MATCH($A2,Data!$A:$A,1),2)
    4. =VLOOKUP($A2,Data!$A:B,COLUMNS(Data!$A:B),TRUE)
    5. =INDEX(Data!$A:$I,MATCH($A2,Data!$A:$A,1),MATCH(C$1,Data!$A$1:$I$1,0))
    6. =INDEX(Data!B:B,Lookup!$B2,0) (B2: =MATCH($A2,Data!$A:$A,0))
    7. =VLOOKUP($A2,Data!$A:B,COLUMNS(Data!$A:B),FALSE)
    8. =VLOOKUP($A2,Data!$A:B,2,FALSE)
    9. =INDEX(Data!$A:$I,MATCH($A2,Data!$A:$A,0),MATCH(C$1,Data!$A$1:$I$1,0))
    10. =INDEX(Data!$A:B,MATCH($A2,Data!$A:$A,0),2)


    Conclusions:
    1. Binary search is considerably faster than using linear search.
    2. H/VLOOKUP is faster than using INDEX with a nested MATCH function, but the difference is very slight.
    3. INDEX is a very fast calculating function. If you need to return results from multiple fields of the lookup table, using the same lookup value, use MATCH in a helper column to yield the row index #, and then use INDEX to yield the result. The speed improvement is considerable.


    Further advice:
    1. Use INDEX&MATCH to perform right-to-left and bottom-to-top lookups. E.g: =INDEX(A:A,MATCH("x",B:B,1))
    2. Use INDEX&MATCH if your lookup table is sorted in descending order according to the lookup value field. E.g: =INDEX(B:B,MATCH("x",A:A,-1))
    3. Having a slice of humble pie, I suppose there is no real danger in advising people to favour INDEX&MATCH over H/VLOOKUP entirely, although I would still favour educating users to use both, but to be aware of the differences.


    Testing this for yourself
    I have attached my test workbook for anybody that wants to have a play. My test workbook included 8MB of data alone so I have provided an empty workbook. Here's how to populate with data:
    1. Download and install Random Sample Data Generator addin (do a Google search for it)
    2. In the Data sheet, using the sample data addin, import 100K records.
    3. Note that in the Data sheet, column I (Order) is used to sort the data randomly. So in this column list indexes 1:100000. When the macro runs it will sort by column A ascending for binary search methods, otherwise it will sort by column I for linear search methods.
    4. Extract a list of 16k distinct social security numbers from Data Column A and list these in column A in the sheet called Lookup.
    5. Run the macro called 'Timer' ALT+F8


    The Timer procedure
    The formulas for the different methods are stored in an array for the different columns in the Lookup sheet. These arrays are actually stored in names M_1 : M_10. The procedure then calls these names, in turn, and writes the formula to the Lookup sheet table. The whole while this happens calculation is set to manual. The workbook is set to Force Full Calculation (available only since Excel 2007) meaning that all formulas are volatile. Calculation is only invoked for the Lookup sheet, and the procedure only times the actual calculation (hence why the times are so much smaller).

    Code:
    Option Explicit
    
    Declare Function timeGetTime Lib "winmm.dll" () As Long
    
    Private mlngSTART As Long
    
    Private Sub Start()
        mlngSTART = timeGetTime()
    End Sub
    
    Private Function Finish() As Long
        Finish = timeGetTime() - mlngSTART
    End Function
    
    Public Sub Timer()
        Dim lngCalc As XlCalculation
        Dim lngIt As Long
        Dim lngarrResults() As Long
        Dim vararrMethods() As Variant
        Dim vararrFormulas() As Variant
        Dim lngItem As Long
        
        vararrMethods = Array("M_1", "M_2", "M_3", "M_4", "M_5", "M_6", "M_7", "M_8", "M_9", "M_10")
        ReDim lngarrResults(0 To 9)
        
        With Application
            lngCalc = .Calculation
            .Calculation = xlManual
            .ScreenUpdating = False
        End With
        ThisWorkbook.ForceFullCalculation = True
        
        For lngItem = LBound(vararrMethods) To UBound(vararrMethods)
            If CStr(Application.VLookup(vararrMethods(lngItem), shtResults.Range("A:C"), 3, False)) Like "*Binary*" Then
                With shtData
                    .Range("A1:I100001").Sort Key1:=.Range("A1"), Order1:=xlAscending, Header:=xlYes
                End With
            Else
                With shtData
                    .Range("A1:I100001").Sort Key1:=.Range("I1"), Order1:=xlAscending, Header:=xlYes
                End With
            End If
            For lngIt = 0 To 9
                vararrFormulas = Evaluate(vararrMethods(lngItem))
                With shtLookup
                    With .Range("B2:I2")
                        .Formula = vararrFormulas
                        .Copy .Resize(1600, 8)
                        Application.CutCopyMode = False
                    End With
                    Start
                        .Calculate
                    lngarrResults(lngIt) = Finish
                    .Range("B2:I1601").Clear
                End With
            Next lngIt
            shtResults.Range("A:A").Find(What:=vararrMethods(lngItem), LookAt:=xlWhole).Offset(, 5).Resize(1, 10).Value = lngarrResults
        Next lngItem
        
        With Application
            .Calculation = lngCalc
            .ScreenUpdating = True
        End With
        ThisWorkbook.ForceFullCalculation = False
    End Sub
    Attached Files Attached Files
    Regards,
    Jon von der Heyden

  2. #12
    Acolyte Jon von der Heyden's Avatar
    Join Date
    Mar 2011
    Location
    Stellenbosch,South Africa
    Posts
    24
    Articles
    0
    Quote Originally Posted by Jon von der Heyden View Post
    Conclusions:
    1. Binary search is considerably faster than using linear search.
    2. H/VLOOKUP is faster than using INDEX with a nested MATCH function, but the difference is very slight.
    3. INDEX is a very fast calculating function. If you need to return results from multiple fields of the lookup table, using the same lookup value, use MATCH in a helper column to yield the row index #, and then use INDEX to yield the result. The speed improvement is considerable.
    I would like to add to my third conclusion. It seems that the profound improvement of using a helper column to house the row index # number only really exists for a linear search. The improvement using helper column for binary search is so very slight (3 ms over 16K lookups over 100K row lookup table). Thus:

    3. INDEX is a very fast calculating function. If you need to return results from multiple fields of the lookup table, using the same lookup value and using linear search method, use MATCH in a helper column to yield the row index #, and then use INDEX to yield the result. The speed improvement is considerable.
    Regards,
    Jon von der Heyden

  3. #13
    Hi Jon,

    You missed a very important detail in your analysis. Binairy search returns non-exact matches if the searched item is not in the lookup list, linear search returns only exact results and NA if not found.

    Therefor I suggest to include my example formula to the list, which combines both elements: a binary search, with an exact match.

    I tried to add my suggested index/match with IF to the list, but I can't seem to add the defined name with the formula since it is too long. How on earth did you manage to define names with a refersto of over 255 characters?
    Regards,
    Jan Karel Pieterse
    www.jkp-ads.com

  4. #14
    Acolyte Jon von der Heyden's Avatar
    Join Date
    Mar 2011
    Location
    Stellenbosch,South Africa
    Posts
    24
    Articles
    0
    Hi Jan

    Thank you, this is indeed my next step. I want to test the method that you described and I also want to test the different ways of handling errors, e.g:

    =IF(ISNA(lookup_formula),0,lookup_formula)
    versus
    =IFERROR(lookup_formula,0) *
    versus
    =LOOKUP(9.99E+307,CHOOSE({1;2},0,lookup_formula)) **

    * available only since Excel 2007
    ** exhibit for # results only, use a similar rendition for string results.


    Watch this space, I will post results when I finish the tests.

    I tried to add my suggested index/match with IF to the list, but I can't seem to add the defined name with the formula since it is too long. How on earth did you manage to define names with a refersto of over 255 characters?
    I created VBA arrays and then wrote the arrays out to the names. The 255 limit doesn't actually apply to the RefersTo property, rather it is a limitation when creating names in name manager. Have a look at Module1!FormulaArray. You need to enter your formula in Lookup!B2:I2 first, and then change the name in the sub-routine (currently M_10) and then run the routine.
    Regards,
    Jon von der Heyden

  5. #15
    I think the limitation is due to the refedit controls Name manager uses. The true limit is a lot larger indeed.
    My Name Manager uses sendkeys (!) to create new range names, hence the reason why I couldn't get the new range name into the workbook. It uses sendkeys to work around a couple of internationalisation bugs in Excel.
    Regards,
    Jan Karel Pieterse
    www.jkp-ads.com

  6. #16
    Jon,

    I just found this thread today. I find your research interesting. I've done similar in the past with similar results.

    There are a couple additional scenarios that I would encourage you to test as well.

    With the helper column in place for row index #'s, array-enter the INDEX formula down an entire relevant column. By this I mean to select all of the cells that will be used in the lookup output for one column, perhaps C2:C1001, and then array-enter the INDEX formula for that entire range in one go.

    Similarly, if you are interested in returning the columns in the same order as the source data, you can extend this range over the entire output range - for all columns - resulting in just one array-entered formula for the entire output.

    I have found these to be the quickest of all Excel lookups and it is a technique I teach in the Excel Hero Academy. I also alude to it in the Imposing INDEX portion of my new post, Excel Formulas.
    Last edited by Daniel Ferry; 2011-05-24 at 12:28 AM.

  7. #17
    Administrator Ken Puls's Avatar
    Join Date
    Mar 2011
    Location
    Nanaimo, BC, Canada
    Posts
    1,953
    Articles
    80
    Blog Entries
    14
    Hi Daniel, welcome to the forum. It's great to have you here!
    Ken Puls, FCPA, FCMA, MS MVP (Excel)

    Master your data with Power Query: Purchase your copy of my book M is for Data Monkey today!

    Main Site: http://www.excelguru.ca -||- Blog: http://www.excelguru.ca/blog -||- Forums: http://www.excelguru.ca/forums
    Check out the Excelguru Facebook Fan Page -||- Follow Me on Twitter

    If you've been given VBA code (a macro) for your solution, but don't know where to put it, CLICK HERE.

  8. #18
    Thanks, Ken.

    You have a fantastic forum implementation here. Looks really sharp, and it has impressive participation. I used Vanilla Forums for my academy, primarily because they had a convenient plugin for WordPress, which allowed for a single sign-on. But this looks really, really good. I'm tempted to add vBulletin to my site now!

  9. #19
    Administrator Ken Puls's Avatar
    Join Date
    Mar 2011
    Location
    Nanaimo, BC, Canada
    Posts
    1,953
    Articles
    80
    Blog Entries
    14
    I love vBulletin. I tend to start with open source (and free) where I can, but I've moderated a few forums that use vBulletin. Honestly, you just can't beat it. Easy to set up, very tweakable, looks good, performs well... I can honestly say it's one piece of software I'm happy I paid for.
    Ken Puls, FCPA, FCMA, MS MVP (Excel)

    Master your data with Power Query: Purchase your copy of my book M is for Data Monkey today!

    Main Site: http://www.excelguru.ca -||- Blog: http://www.excelguru.ca/blog -||- Forums: http://www.excelguru.ca/forums
    Check out the Excelguru Facebook Fan Page -||- Follow Me on Twitter

    If you've been given VBA code (a macro) for your solution, but don't know where to put it, CLICK HERE.

  10. #20

    Indirect??

    Hey, sorry quick question:

    Which is faster,

    1) a Linear Search Vlookup over a range, or
    2) a Binary Search Match + an Indirect reference over the range?

    Context/Example:

    I'm trying to test whether a date is in a list of dates. My first option is VLOOKUP(Date, Range, 0). My second option is Indirect("LISTNAME" & MATCH(Date, Range, 1)) = Date. I'm wondering which one's faster.

Page 2 of 3 FirstFirst 1 2 3 LastLast

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •