Identify Matching names

Taisir

New member
Joined
May 9, 2014
Messages
12
Reaction score
0
Points
0
Column “B” contains the authors and their affiliations for each paper (462 papers). The authors(s) for each affiliated organization are included within brackets just before the name of the organization.

Column A contains the last names of all authors of the 462 papers listed in Column B
Please see attached file

The question:
I need to identify those authors in column “A” that are affiliated with “King Saud Univ” per column “B”. Maybe by indicating True in column "C" in the same row that match the author affiliated with "King Saud Univ"

Is there any formula to do such job?. I have large number of files with similar job.

Help will be greatly appreciated.
Taisir
 

Attachments

  • Excel Forum Question.xlsx
    89 KB · Views: 22
It looks like ever cell in column B contains the phrase "King Saud Univ".

You can use:

=Isnumber(search("King Saud Univ",B2))

copied down.

This will give you TRUE if found, FALSE if not found.
 
It looks like ever cell in column B contains the phrase "King Saud Univ".

You can use:

=Isnumber(search("King Saud Univ",B2))

copied down.

This will give you TRUE if found, FALSE if not found.


Dear Sir,
Many thanks
Indeed, every cell in column "B" contains "King Saud University"

But I need to figure out which authors in column "A" are affiliated with “King Saud Univ” per column “B”.
 
I don't understand. Please give some examples of expected results and why they are.
 
I don't understand. Please give some examples of expected results and why they are.

many thanks again,

If you examine any cell in column "B", you will find that a number of organizations participated in publishing the research paper including "King Saud Univ". For example in in B2; Naushad, Mu is affiliated with King Saud Univ. The rest of the authors worked in that paper are affiliated with other organizations.

Column A contain a comprehensive list of all outhors fro all organizations. I need to identify only those affiliated with King Saud Univ.

I hope that helps.


All the best and many thanks again. Taisir
 
A little difficult to do with formulas, since you have some sections with multiple names in the [ ] brackets, making the string length with a large length range.

The best I can come up with (someone can probably do it better with some VBA and regular expressions) is.

Using helper formulas...

In C2 enter:

=IF(SEARCH("King Saud Univ",B2)<=100,LEFT(B2,100),MID(B2,SEARCH("King Saud Univ",B2)-64,1000))

copied down. This extras a certain number of characters before the searched key phrase. It is hoped that all the names between [ ] are captured.

in D2:

=SUBSTITUTE(MID(C2,FIND("[",C2)+1,FIND("]",C2)-FIND("[",C2)-1),",","")

copied down. This extracts the names in [ ] before the key phrase only. If you see #VALUE error, it means it did not extract all the names, and you may have to manually copy the names from column B into column D at that cell.

then, finally, in E2:

=ISNUMBER(LOOKUP(10^10,SEARCH(" "&$A$2:$A$463&" "," "&D2&" ")))

copied down. This will let you know if any of the names within the [ ] brackets is found in column A.

Please note. The accuracy of columns C and D may not be perfect. Randomly check names in column D against column B to ensure the correct set of names was extracted. If not, manually correct column D.


EDIT: I have revised the formulas for better accuracy. (Still not perfect, but better).
Also, for even better accuracy, you should change the names in column A to exactly match the names found within column B, including initials etc.

Example why... in D11, the name
Khan, Moonis Ali is extracted. But the search will find Ali in A7 and say it is a match, since Ali is one of the "words" in column D. The Khan name is repeated multiple times in column A. Are they all the same person?
 
Last edited:
Just a quick reply to thank you. I will try your solution. All the best Taisir
 
Note, that I have edited my formulas and commented more.
 
many thanks again
your formulas were a great help. I had only 20+ #VALUEs which I plan to do manually.

My concerns, I am not sure what the magnitude of the number of #VALUE when I use it with larger files 8000 entry.


I hope somebody in the forum will help with a VBA.

Unfortunately names in Column "A" are given Last Name space First Initial where as the names listed in the column B are Last Name comma First Name. Only the last Name is common between the two Columns
 
In your file, you have only last name listed in column A. If it is as you say, then you can add another helper column to make column A names look like column B names. Then you can compare column D to that column for hopefully more accurate results.

If you have 8000 entries, the formulas will work very slowly. In that case, you're better bet would be the VBA. But again, the cleaner the original file is, the easier the VBA solution would be too.
 
Hello again,
I attached the original data of column "A". I thought, originally, that parsing the column and provide the Last Name would simplify the process. I guess I was wrong>

I need the "True" or "False" next to each name in order to count the records for each for the organization which is the ultimate objective for me from the exercise.

I thought a formula in "C2" that looks at "A2" and check whether "A2" is located within any brackets just before the phrase "King Saud Univ" in column "B", would be possible. I guess I am wrong her also.

My apologies for taking that much of your time.

All the best again Taisir
 

Attachments

  • all names.xlsx
    35 KB · Views: 9
If the brackets were after the search phrase it would be much easier. But being before is more difficult, because you need to figure out how far back you need to go without going too far as to get too much, or too close as to not get enough (unless something is going over my head, and I missed an obviously easy solution to that. It is Friday afterall :) ).
 
The new names list isn't really any easier, since now you are listing initials instead of full names, which is what much of column B consists of. They need to be more consistent for better accuracy.
 
hello again

somebody sent me this formula without any instruction and could not get any benefit from. Does it make any sense to you? or how to use it?

=INDEX($A$1:$A$1475,AGGREGATE(15,6,ROW($A$1:$A$1475)/(ISNUMBER(SEARCH("King Saud Univ",$B$1:$B$1475))=TRUE),ROW(A1)))
 
If you are getting help in another forum, our rules (and theirs too more than likely) state that you should be posting links to those other threads/forums so that we aren't stepping on each other toes.

That formula only works if you have Excel 2010 and later versions of Excel. I put the formula in and it returns what is in column A at same row.
 
DID NOT KNOW THAT RULE. INDEED, I DID POSTED AT AT http://www.excelforum.com AND GOT THE FORMULA I POSTED.
ANY ADVICE ON WHAT I SHOULD DO NEXT TO GET ADVICE ON THIS.

THANKS
 
Another piece of advice. Using ALL CAPS in forums is considered yelling. You went from normal script to ALL CAPS. To me it means you are aggravated.
 
Hello again,

All CAPS wasn't intentional.
Not aggravated whatsoever. On the contrary, I am grateful and happy with the couple of formulas you provided which were great and I worked with already with the small files.

Thanks for the advice, Taisir
 
hello again

somebody sent me this formula without any instruction and could not get any benefit from. Does it make any sense to you? or how to use it?

=INDEX($A$1:$A$1475,AGGREGATE(15,6,ROW($A$1:$A$1475)/(ISNUMBER(SEARCH("King Saud Univ",$B$1:$B$1475))=TRUE),ROW(A1)))

If you are wanting help understanding a formula, I would seek it in the forum it came from. Other than that take a look at the 'Evaluate formula' facility on the Formula ribbon. Iwould say that this formula is not robust. Copying it could produce errors.
 
Last edited:
Back
Top