Stratified random sampling

mrinal222

New member
Joined
Sep 12, 2017
Messages
1
Reaction score
0
Points
0
Hi,

I have to create an excel file where I have a list of IDs in column A and a category column is on its right. Through a macro, I have to create 3 new sheets in which I have to get a 10% random sample for each category from this sheet. I have a vb code for generating a sample of 10% randomly but it doesn't allow the 'stratification'. Quick help would be appreciated.View attachment Expected Result.xlsxView attachment SourceFile.xlsx
 
Welcome to the forum!

Most forums have a policy against solving homework problems. Is this homework or work?

While files are good ways to help others help you, you should include the vba code that you have so far.

To do that sort of thing, an autofilter method would be one approach to start with. The Column B on the created sheets is redundant.

Here are some of my standard routines that I might consider using for such. The concepts are: unique, sorting, and random numbers.
Code:
 'https://msdn.microsoft.com/en-us/library/system.collections.arraylist(v=vs.110).aspx
Function advArrayListSort(sn As Variant, Optional tfAscending1 As Boolean = True, _
    Optional tfAscending2 As Boolean = True, _
    Optional tfNumbersFirst As Boolean = True) As Variant
     
    Dim i As Long, c1 As Object, c2 As Object
    Dim a1() As Variant, a2() As Variant, a() As Variant
     
    Set c1 = CreateObject("System.Collections.ArrayList")
    Set c2 = CreateObject("System.Collections.ArrayList")
     
    For i = LBound(sn) To UBound(sn)
        If IsNumeric(sn(i)) = True Then
            c1.Add sn(i)
        Else
            c2.Add sn(i)
        End If
    Next i
     
    c1.Sort 'Sort ascendending
    c2.Sort 'Sort ascending
     
    If tfAscending1 = False Then c1.Reverse 'Sort and then Reverse to sort descending
    If tfAscending2 = False Then c2.Reverse 'Sort and then Reverse to sort descending
     
    a1() = c1.Toarray()
    a2() = c2.Toarray()
     
    If tfNumbersFirst = True Then
        a() = a1()
        For i = 1 To c2.Count
            ReDim Preserve a(UBound(a) + 1)
            a(UBound(a)) = a2(i - 1)
        Next i
    Else
        a() = a2()
        For i = 1 To c1.Count
            ReDim Preserve a(UBound(a) + 1)
            a(UBound(a)) = a1(i - 1)
        Next i
    End If
     
    advArrayListSort = a()
End Function


' http://www.excelforum.com/excel-programming-vba-macros/819998-filter-and-sort-scripting-dictionary.html
'Early Binding method requires Reference: MicroSoft Scripting Runtime, scrrun.dll
Function UniqueArrayByDict(Array1d() As Variant, Optional compareMethod As Integer = 0) As Variant
  'Dim dic As Object 'Late Binding method - Requires no Reference
  'Set dic = CreateObject("Scripting.Dictionary")  'Late or Early Binding method
  Dim dic As Dictionary     'Early Binding method
  Set dic = New Dictionary  'Early Binding Method
  Dim e As Variant
  dic.CompareMode = compareMethod
  'BinaryCompare=0
  'TextCompare=1
  'DatabaseCompare=2
  For Each e In Array1d
    If Not dic.Exists(e) Then dic.Add e, Nothing
  Next e
  UniqueArrayByDict = dic.Keys
End Function

Function VPickRndX(nArray() As Variant, iPick As Long) As Variant
  Dim i As Long, randIndex As Variant, Temp As Variant
  Randomize
  For i = 1 To iPick
    randIndex = Int(Rnd * UBound(nArray)) + 1
    Temp = nArray(i)
    nArray(i) = nArray(randIndex)
    nArray(randIndex) = Temp
  Next i
  ReDim Preserve nArray(1 To iPick)
  VPickRndX = nArray()
End Function
 
mrinal222 might be a data consultant at his firm tasked to 'conduct studies and training around data analysis across the Global Organization' according to some googling.
So he might even be developing a course! If he's on a training course I'd hazard a guess that any trainer would immediately see that my offering was not his student's own work.:eek:hwell:
In the attached (your Expected Result file with a few additions), a click of the button on the Population sheet will update (not create) the 3 other sheets.
There's a macro in there which updates and copies data from a pivot table on a hidden sheet:
Code:
Sub blah()
Dim RngToClear As Range
'Sheets("Population").Calculate 'might be needed if you're not on Automatic recalculation
With Sheets("Sheet1").PivotTables("PivotTable4")
  .PivotCache.Refresh
  Categories = Array("C21", "ASR", "DR")
  For Each Category In Categories
    .PivotFields("Category").CurrentPage = Category
    topn = .Parent.Range("D1").Value
    .PivotFields("Reference Ids").AutoShow xlAutomatic, xlTop, topn, "Sum of rand"
    Set SourceRng = .DataBodyRange.Offset(, -1)
    With Sheets(Category)
      Set RngToClear = Intersect(.UsedRange.Offset(1), .UsedRange)
      If Not RngToClear Is Nothing Then RngToClear.Clear
      SourceRng.Copy .Range("A2")
      .Range("B2").Resize(SourceRng.Rows.Count).Value = Category 'lose this line if you want as it populates column B which is just the same as the sheet's name.
    End With
  Next Category
End With
End Sub
 

Attachments

  • ExcelGuru8374Expected Result.xlsm
    79.1 KB · Views: 86
Last edited:
Code:
Sub M_snb()
   Application.ScreenUpdating = False
   sn = [transpose(A2:A1001&"_"&B2:B1001)]

   For j = 1 To 3
     sp = Filter(sn, Choose(j, "DR", "C21", "ASR"))
  
     Columns(10).ClearContents
     Cells(1, 10).Resize(UBound(sp) + 1).Name = "snb"
     [snb] = "=rand()"
     st = [index(rank(snb,snb),)]
     
     For jj = 1 To UBound(sp) \ 10
         c00 = c00 & "|" & sp(st(jj, 1) - 1)
     Next
   Next
   
   Columns(10).Resize(, 2).ClearContents
   st = Split(c00, "|")
   Cells(1, 10).Resize(UBound(st) + 1) = Application.Transpose(st)
   Cells(1, 10).Resize(UBound(st) + 1).TextToColumns , , , , 0, 0, 0, 0, -1, "_"
End Sub
 
Back
Top