I have data like this with 6 columns
LINES AB1 AB2 AB3 AB4 AB5 P1 Z/Z T/T -/- T/T T/T P2 A/A A/A G/G Z/Z T/T 1 G/G T/T G/G T/T G/G 2 T/T A/A C/C C/C T/T 3 T/T G/G T/T G/G T/T 4 A/A C/C A/A A/A A/A 5 A/A A/A T/T T/T A/A P1 Z/Z T/T -/- T/T T/T P2 A/A A/A G/G Z/Z T/T 1 G/G T/T G/G T/T G/G 2 T/T A/A C/C C/C T/T 3 T/T G/G T/T G/G T/T 4 A/A C/C A/A A/A A/A 5 A/A A/A T/T T/T A/A
- First I want to find how many columns (from AB1 to AB5) are different for P1 and P2 ,
Eq means: Both P1 and P2 should contain same letters (alleles) or if any one of P1 or P2 contains Z/Z or -/- I should consider them as eq only. Now I am using IF condition for this
- I will compare lines column values from 1 with P2 across all the columns (from AB1 to AB5) in horizontal way and continue for remaining lines from 2 to 5. if they match I would like to give 1 else 0 and I would like to continue this till my programme encounters second set of P1 and P2. Presently I am doing this with this formula =if(D4=D$3,1,0).
- I will make sum for lines 1 to 5 across all the columns from columns AB1 to AB5, but I will include only columns showing different for P1 and P2 in my sum count. Now I am working on this with sumif formula.
- I will calculate percentage of matching lines 1 to 5 with P2 by dividing sum came from SUMIF with number of different markers between P1 and P2.
- I want to repeat this for remaining set of P1 and P2.
I am expecting like this
LINES XY1 XY2 XY3 XY4 XY5 P1 EQ NE EQ EQ EQ SUM % P2 1 1 0 0 1 0 0 0 0 2 0 1 0 0 1 1 100 3 0 0 0 0 1 0 0 4 1 0 0 0 0 0 0 5 1 1 0 0 0 1 100
Like this I have data in more than 5000 rows and at present I am doing in excel 2010 with different formulas but it is taking lot of my energy. If anyone provide VBA code for this task I will be more happy and it would be appreciated. Please see my attached file for better understanding, may be i did not explained my problem very well to you.
Regards,
Genetist.
Bookmarks