Retrieve list of values from array which has duplicates

TTTT

New member
Joined
Jul 18, 2018
Messages
25
Reaction score
0
Points
0
Excel Version(s)
365
Hi All
Hope you are well, I've been stuck on this problem for a while now, I've been able to figure out half of it but not the rest. Basically I have a data sheet that lists categories, levels and titles, and I'm trying to get a formula that will populate a grid of results based on the category and level of the job titles. There are duplicates in this data set so I only want to retrieve 1 of the duplicates and all unique values.

Also I'm looking for the formula to stop when there are no more results, and if a new result is added, the formula should pick this up and populate on the grid.

Lastly something more advance, I'm interested to find out from all the experts out there if its possible to list these results one line after another but in 1 cell? In the Grid tab I've added 2 examples of what the final grid should look like

Below is a sample of my data

View attachment sample.xlsx

Thank you so much in advance for your help and taking the time to look at my query

Regards
TTTT
 
Hope this helps, just works for your sample file not your real workbook
 

Attachments

  • sample (1).xlsx
    11.6 KB · Views: 20
Hi Azumi,
That is pretty cool thank you, I tried replicating your formula in my master and it works well, however it does not produce 1 of the duplicates for some... (I realised I forgot to add some duplicate job titles in the mix in my sample data set), also it does not appear to list them 1 title per line as it has so neatly done in the sample you've worked on. I've had to use wrap text and that does not nearly list?
 
Hi Azumi,

I've attached a revised spreadsheet which include duplicates (in this one you'll see Administration Assistant appear twice under group Administration and Level 2, so in the grid I need to to only appear once

Your help or anyone else's help would be greatly appreciated
 

Attachments

  • sample_2.xlsx
    12 KB · Views: 15
Try adding this helper column to D2 of List sheet (this will help with finding unique entries):

=A2&"_"&COUNTIFS(A$2:A2,A2,B$2:B2,B2,C$2:C2,C2)

copied down

Then change Azumi's ARRAY* formula to:

=SUBSTITUTE(IFERROR(INDEX(List!$C$2:$C$12,SMALL(IF((List!$B$2:$B$12=$A4)*(List!$D$2:$D$12=B$3&"_"&1),ROW(List!$C$2:$C$12)-ROW(List!$C$2)+1),1)),"")&IF(IFERROR(INDEX(List!$C$2:$C$12,SMALL(IF((List!$B$2:$B$12=$A4)*(List!$D$2:$D$12=B$3&"_"&1),ROW(List!$C$2:$C$12)-ROW(List!$C$2)+1),2)),"")="","","| "&IFERROR(INDEX(List!$C$2:$C$12,SMALL(IF((List!$B$2:$B$12=$A4)*(List!$D$2:$D$12=B$3&"_"&1),ROW(List!$C$2:$C$12)-ROW(List!$C$2)+1),2)),""))&IF(IFERROR(INDEX(List!$C$2:$C$12,SMALL(IF((List!$B$2:$B$12=$A4)*(List!$D$2:$D$12=B$3&"_"&1),ROW(List!$C$2:$C$12)-ROW(List!$C$2)+1),3)),"")="","","| "&IFERROR(INDEX(List!$C$2:$C$12,SMALL(IF((List!$B$2:$B$12=$A4)*(List!$D$2:$D$12=B$3&"_"&1),ROW(List!$C$2:$C$12)-ROW(List!$C$2)+1),3)),""))&IF(IFERROR(INDEX(List!$C$2:$C$12,SMALL(IF((List!$B$2:$B$12=$A4)*(List!$D$2:$D$12=B$3&"_"&1),ROW(List!$C$2:$C$12)-ROW(List!$C$2)+1),4)),"")="","","| "&IFERROR(INDEX(List!$C$2:$C$12,SMALL(IF((List!$B$2:$B$12=$A4)*(List!$D$2:$D$12=B$3&"_"&1),ROW(List!$C$2:$C$12)-ROW(List!$C$2)+1),4)),""))&IF(IFERROR(INDEX(List!$C$2:$C$12,SMALL(IF((List!$B$2:$B$12=$A4)*(List!$D$2:$D$12=B$3&"_"&1),ROW(List!$C$2:$C$12)-ROW(List!$C$2)+1),5)),"")="","","| "&IFERROR(INDEX(List!$C$2:$C$12,SMALL(IF((List!$B$2:$B$12=$A4)*(List!$D$2:$D$12=B$3&"_"&1),ROW(List!$C$2:$C$12)-ROW(List!$C$2)+1),5)),"")),"| ",CHAR(10))

* Array formula must be confirmed with CTRL+SHIFT+ENTER not just ENTER, then copied down and across

Format the result cells by going to Alignment tab and selecting to "Wrap text". You may have to autofit columns and/or rows afterward.
 
Try adding this helper column to D2 of List sheet (this will help with finding unique entries):

=A2&"_"&COUNTIFS(A$2:A2,A2,B$2:B2,B2,C$2:C2,C2)

copied down

Then change Azumi's ARRAY* formula to:

=SUBSTITUTE(IFERROR(INDEX(List!$C$2:$C$12,SMALL(IF((List!$B$2:$B$12=$A4)*(List!$D$2:$D$12=B$3&"_"&1),ROW(List!$C$2:$C$12)-ROW(List!$C$2)+1),1)),"")&IF(IFERROR(INDEX(List!$C$2:$C$12,SMALL(IF((List!$B$2:$B$12=$A4)*(List!$D$2:$D$12=B$3&"_"&1),ROW(List!$C$2:$C$12)-ROW(List!$C$2)+1),2)),"")="","","| "&IFERROR(INDEX(List!$C$2:$C$12,SMALL(IF((List!$B$2:$B$12=$A4)*(List!$D$2:$D$12=B$3&"_"&1),ROW(List!$C$2:$C$12)-ROW(List!$C$2)+1),2)),""))&IF(IFERROR(INDEX(List!$C$2:$C$12,SMALL(IF((List!$B$2:$B$12=$A4)*(List!$D$2:$D$12=B$3&"_"&1),ROW(List!$C$2:$C$12)-ROW(List!$C$2)+1),3)),"")="","","| "&IFERROR(INDEX(List!$C$2:$C$12,SMALL(IF((List!$B$2:$B$12=$A4)*(List!$D$2:$D$12=B$3&"_"&1),ROW(List!$C$2:$C$12)-ROW(List!$C$2)+1),3)),""))&IF(IFERROR(INDEX(List!$C$2:$C$12,SMALL(IF((List!$B$2:$B$12=$A4)*(List!$D$2:$D$12=B$3&"_"&1),ROW(List!$C$2:$C$12)-ROW(List!$C$2)+1),4)),"")="","","| "&IFERROR(INDEX(List!$C$2:$C$12,SMALL(IF((List!$B$2:$B$12=$A4)*(List!$D$2:$D$12=B$3&"_"&1),ROW(List!$C$2:$C$12)-ROW(List!$C$2)+1),4)),""))&IF(IFERROR(INDEX(List!$C$2:$C$12,SMALL(IF((List!$B$2:$B$12=$A4)*(List!$D$2:$D$12=B$3&"_"&1),ROW(List!$C$2:$C$12)-ROW(List!$C$2)+1),5)),"")="","","| "&IFERROR(INDEX(List!$C$2:$C$12,SMALL(IF((List!$B$2:$B$12=$A4)*(List!$D$2:$D$12=B$3&"_"&1),ROW(List!$C$2:$C$12)-ROW(List!$C$2)+1),5)),"")),"| ",CHAR(10))

* Array formula must be confirmed with CTRL+SHIFT+ENTER not just ENTER, then copied down and across

Format the result cells by going to Alignment tab and selecting to "Wrap text". You may have to autofit columns and/or rows afterward.

Hi NBVC
That is fantastic thank you! it works, another caveat sorry... is there a way to highlight duplicate position titles in the grid if they appear in different boxes?
  • eg. Position title: Administration Assistant - Appears under same category (eg. Administration) just different levels (eg. level 1, 2 and 3.3)
  • eg. Analyst, Systems - appears in the same level (eg. 3.2) just different category (eg. Administration and Development)
  • eg. Lead, Change - appears in different category and level (Administration and Development and level 3.2 and 4.2)
  • Sample attached - View attachment sample_3.xlsx
 
After selecting B4:C10 try this conditional format formula:

=SUMPRODUCT(((ISNUMBER(SEARCH(List!$C$2:$C$15,B4)))*(RIGHT(List!$D$2:$D$15)="1")))>1
 
After selecting B4:C10 try this conditional format formula:

=SUMPRODUCT(((ISNUMBER(SEARCH(List!$C$2:$C$15,B4)))*(RIGHT(List!$D$2:$D$15)="1")))>1

Hi NBVC
That is awesome thank you, one thing on that, it shades everything that had a duplicate, what about if for those examples we dont want to remove duplicate but it could be an error in category selected or level for that job title, so to correct the conditional formatting we either correct the category or level to match with the other is so in the grid it does not appear twice and therefore conditional formatting should not shade those boxes?

I tested your formula and it works great if we remove the duplicate/incorrect entry completely from the list then the boxes unshade? But what if we want to retain the duplicates and just correct the field that was incorrect? (the data set is pretty much a headcount listing, so despite the job titles often matching, theoretically so should the category and level for that job title, so we would want to identify the anomalies to correct them not remove them completely

I hope that helps explain
 
im not sure I understand.

You know you can only conditionally colour the whole cells, so as long as one of the items in the cell matches the conditions, the cell will get coloured.
 
Hi NBVC,

Thank you, currently it conditionally colours the whole cell when there is more than 1 value in the list, whereas I'm looking to have the cell conditionally coloured if the value appears on the grid twice (not on the list), so the position title can appear on the list view multiple times, but it if appears on the grid twice (eg. either under different categories or in different levels) then conditionally colour
 
Hi NBVC
I've attached a more substantive data set and adjusted your formula for the grid to this one, so I've noticed that your formula only goes down 5 values, is there a way to adjust the formula to take into consideration unlimited number of values?

The pivot tab might help explain the conditional formatting I'm trying to develop in the grid so I've shaded some position titles that appear under more than 1 level, so in grid view, I'd cells to conditionally colour when these position titles appear (to flag that the title appears in the grid more than once as illustrated in the pivot view off the data tab)

I hope that makes sense and the updated data set is easiler to work with

View attachment Sample_5.xlsx
 
Ok. Please see attached for proposal.

To adjust for larger number of returns, we will need to implement a vba function.

I like to use this one:

Code:
Function aconcat(a As Variant, Optional sep As String = "") As String
' Harlan Grove, Mar 2002
Dim y As Variant

If TypeOf a Is Range Then
For Each y In a.Cells
aconcat = aconcat & y.Value & sep
Next y
ElseIf IsArray(a) Then
For Each y In a
aconcat = aconcat & y & sep
Next y
Else
aconcat = aconcat & a & sep
End If

aconcat = Left(aconcat, Len(aconcat) - Len(sep))
End Function

Now add yet another helper Lookup column in the Data sheet after column A, with formula in B2 (this will be for the Conditional formatting):

Then apply this new formula to B4 of the Grid sheet, which uses the vba aconcat() function:

=SUBSTITUTE(SUBSTITUTE(TRIM(aconcat(IF((Data[[Lookup]:[Lookup]]=B$3&"_1")*(Data[[Level]:[Level]]=$A4),SUBSTITUTE(Data[[Job Title]:[Job Title]]," ","|"),"")," "))," ",CHAR(10)),"|"," ")

Confirmed with CTRL+SHIFT+ENTER and copied down and across the table.

Then for the conditional format I created to Dynamic Named Ranges... one called "Lookup2" and one called "Job_Title" referencing the corresponding columns in your Data sheet.

Then apply conditional format formula

=SUMPRODUCT(((ISNUMBER(SEARCH(Job_Title,B4)))*(RIGHT(Lookup2)="1")))>1

Note, due to the array formula covering large number of cells and larger references, you may experience some noticeable lag....
 

Attachments

  • Copy of Sample_5.xlsm
    149.4 KB · Views: 9
To adjust for larger number of returns, we will need to implement a vba function.
This one's quite light on resources.
Click the button on the Grid sheet of the attached.
I struggled with keeping all data visible, (Technical category, level 3.3 has 29 results (cell G8)); max row height meant that I had to reduce the font size, but Excel loves to change things for you so I had to code it in.
At the moment, the job titles highlighted in each cell are those where the job title exists at another level - categories are not taken into account at all. That can be changed.
There's a bit of hard-coding in there but it can be made more general.
I added a category (Executive) at Q3.
If this is path you want to go down we can continue…
 

Attachments

  • ExelGuru9201Sample_6.xlsm
    144.4 KB · Views: 10
Hi NBVC,
That is amazing thank you! I figured out what is happening with the conditional formatting. it is picking up all the duplicates that have the count figure of 1, so its counting 1, 11, 12, 13 etc, is there a way to adjust this so that it only counts the first number of "1" and not figures of 10?

I tried adding a "." at the end of the count formula which worked on the grid view but now the conditional formatting does not pick up the duplicate (eg. I changed say Manager, Project Development to appear in level 4 and 5, but the conditional formatting does not pick this up) - I tried to adjust the conditional formatting formula to reflect the "." added but no luck?
 
Omg p45cal, that is pretty amazing and is exactly what I am looking for, especially the duplication indicator of which position titles appear in the grid more than once! Thank you so much

NBVC - thank you also for your help! your tool is great as the grid updates on its own!
 
Thank You!

This one's quite light on resources.
Click the button on the Grid sheet of the attached.
I struggled with keeping all data visible, (Technical category, level 3.3 has 29 results (cell G8)); max row height meant that I had to reduce the font size, but Excel loves to change things for you so I had to code it in.
At the moment, the job titles highlighted in each cell are those where the job title exists at another level - categories are not taken into account at all. That can be changed.
There's a bit of hard-coding in there but it can be made more general.
I added a category (Executive) at Q3.
If this is path you want to go down we can continue…

Hi p45cal,
Thank you so much, sorry i did not see your post before responding to NBVC. This is the definitely what I am after. I tried to throw in some of my original data and noticed that it does not deal too well with the VBA coding :(
So I've made adjustments to your document to include additional columns of data that would be required in the final document, wondering if you could work your magic and adjust the coding? (I'm not savvy with VBA unfortunately to figure out what to change)

Reference to Cell G8 - would this still be an issue if position titles are added ontop of this increasing the # of results to over 29 values? or does your hard coding eliminate the issue?

Your help on this is truly appreciated
View attachment ExelGuru9201Sample_6.xlsm
 
In haste, 'cos it's my bedtime, try:
Code:
Sub blah()
Dim rngCellsToCheck  As Range
Set mydictionary = CreateObject("Scripting.Dictionary")    'this will contain dictionaries of each category-level job titles
Set JobTitleDic = CreateObject("Scripting.Dictionary")    'this will contain Job titles which exist at more than one level
Set CategoryHdrs = Sheets("Grid").Range("B3:Q3")    'hard coded
Set LevelHdrs = Sheets("Grid").Range("A4:A12")    'hard coded
CatLookup = CategoryHdrs.Value
LevelLookup = LevelHdrs.Value
For i = 1 To UBound(LevelLookup):  LevelLookup(i, 1) = CStr(LevelLookup(i, 1)): Next i
Set Tbl = Range("data").ListObject
'SourceData = Range("data").Value
SourceData = Tbl.DataBodyRange.Value
With Tbl.HeaderRowRange
  CatColm = Application.Match("Category", .Value, 0)
  LevelColm = Application.Match("Level", .Value, 0)
  JobTitleColm = Application.Match("Job Title", .Value, 0)
End With
For rw = 1 To UBound(SourceData)
  DicName = SourceData(rw, CatColm) & "|" & SourceData(rw, LevelColm)    '3 is cat,4 is level,5 is jobtitle.
  If Not mydictionary.exists(DicName) Then
    Set NewDic = CreateObject("Scripting.Dictionary")
    mydictionary.Add DicName, NewDic
  End If
  If Not mydictionary(DicName).exists(SourceData(rw, JobTitleColm)) Then mydictionary(DicName).Add SourceData(rw, JobTitleColm), SourceData(rw, JobTitleColm)
  
  'create list of job titles which exist at more than one level:
  If Not JobTitleDic.exists(SourceData(rw, JobTitleColm)) Then
    For ro = rw + 1 To UBound(SourceData)
      If SourceData(ro, JobTitleColm) = SourceData(rw, JobTitleColm) Then
        If SourceData(ro, LevelColm) <> SourceData(rw, LevelColm) Then
          JobTitleDic.Add SourceData(rw, JobTitleColm), SourceData(rw, JobTitleColm)
          Exit For
        End If
      End If
    Next ro
  End If
  
Next rw
Set rngDestn = Intersect(CategoryHdrs.EntireColumn, LevelHdrs.EntireRow)    'where the results will go
rngDestn.Clear
DestVals = rngDestn.Value
For Each DicName In mydictionary.keys
  x = Split(DicName, "|")
  Z = Join(mydictionary(DicName).keys, vbLf)
  DestVals(Application.Match(x(1), LevelLookup, 0), Application.Match(x(0), CatLookup, 0)) = Z
Next DicName

With rngDestn    'struggling a bit with font sizes, veretical alignment, both being changed by Excel without asking..
  .EntireColumn.ColumnWidth = 110    'much too wide to prevent word wrapping of single job titles
  .Value = DestVals
  .Font.Size = 9
  .EntireColumn.AutoFit
End With
'Highlight the multi-level job titles (this was more difficult that I thought it would be!):
Set rngCellsToCheck = rngDestn.SpecialCells(xlCellTypeConstants, 2)
If Not rngCellsToCheck Is Nothing Then
  For Each cll In rngCellsToCheck.Cells
    cvsplit = Split(cll.Value, vbLf)
    For Each jt In JobTitleDic.keys
      pop = Application.Match(jt, cvsplit, 0)
      If Not IsError(pop) Then
        posn = pop
        For j = 0 To pop - 2
          posn = posn + Len(cvsplit(j))
        Next j
        cll.Characters(Start:=posn, Length:=Len(jt)).Font.Color = -16776961
      End If
    Next jt
  Next cll
End If
End Sub

especially the duplication indicator of which position titles appear in the grid more than once!
Be aware that this is NOT just those titles appearing more than once; it's those appearing at more than one level. If there are ten similar job titles, but they're all at the same level, it won't be highlighted.
I will need to look more closely at your criteria at the end of your msg#6.

Reference to Cell G8 - would this still be an issue if position titles are added ontop of this increasing the # of results to over 29 values? or does your hard coding eliminate the issue?
Not necessarily. I've just reduced the font size in the code to a size that accommodates the 29 results. If there are 40 results, then I could reduec the size of the font to unreadable levels, but it would probably better to go looking for the full cells and reducing their font size until you can see that all results are visible.
If you're going to have a lot of such cells, then it might be worth considering putting more than one job title per line (at the moment, there's a linefeed between job titles (and it's this multiline property which prevents shrink-to-fit from working and forces word wrap in each cell)), but that also raises a readability problem in that you already use commas within a job title, so we'd have to use something else to identify them as different titles if there are more than one to a line.

Note also that it might be worth checking the job titles. For example there a job title Manager Asset, and another one Manager Assets, are they actually the same job title? I remember there being others but I've forgotten which they were.
 
Last edited:
In haste, 'cos it's my bedtime, try:
Code:
Sub blah()
Dim rngCellsToCheck  As Range <snip>


Hi p45cal
That is amazing thank you, the VBA coding works like a treat!

quote_icon.png
Originally Posted by TTTT
especially the duplication indicator of which position titles appear in the grid more than once!



Be aware that this is NOT just those titles appearing more than once; it's those appearing at more than one level. If there are ten similar job titles, but they're all at the same level, it won't be highlighted.
I will need to look more closely at your criteria at the end of your msg#6.

Thats what I we need, titles that appear in more than one level more and/or more than one category which I can see it does.

Not necessarily. I've just reduced the font size in the code to a size that accommodates the 29 results. If there are 40 results, then I could reduec the size of the font to unreadable levels, but it would probably better to go looking for the full cells and reducing their font size until you can see that all results are visible.
If you're going to have a lot of such cells, then it might be worth considering putting more than one job title per line (at the moment, there's a linefeed between job titles (and it's this multiline property which prevents shrink-to-fit from working and forces word wrap in each cell)), but that also raises a readability problem in that you already use commas within a job title, so we'd have to use something else to identify them as different titles if there are more than one to a line.
It shouldnt be an issue for now, but I'd imagine as the business grows there may be additional titles which will grow the list

Note also that it might be worth checking the job titles. For example there a job title Manager Asset, and another one Manager Assets, are they actually the same job title? I remember there being others but I've forgotten which they were.
I'll take note of that but at the moment I think we will treat it as two different positions
 
Last edited by a moderator:
Be aware that this is NOT just those titles appearing more than once; it's those appearing at more than one level. If there are ten similar job titles, but they're all at the same level, it won't be highlighted.
I will need to look more closely at your criteria at the end of your msg#6

Hi p45cal - I checked in with my colleague and the colour coding should be: if the job title appears across more than 1 category (so they can appear under more than 1 level, but would like it to be visibly identifiable if they across categories - would it be possible to achieve this via VBA? and if so how could we do this?
 
Change:
Code:
If SourceData(ro, LevelColm) <> SourceData(rw, LevelColm) Then

to:
Code:
If (SourceData(ro, LevelColm) <> SourceData(rw, LevelColm)) Or (SourceData(ro, CatColm) <> SourceData(rw, CatColm)) Then
to highlight Job Titles which exist on multiple Levels and/or multiple Categories.
 
Back
Top