
  • Using randomtag by group

    Dear Statalisters,

    I have a dataset with individuals belonging to different groups, and I am running a simulation with many repetitions. In each repetition, one individual is randomly drawn from each group using the sample command, e.g.

    Code:
    *A toy example
    clear
    *Generate 100 different groups
    set obs 100
    generate long group=_n
    *Generate 1000 individuals per group
    expand 1000
    bysort group: gen individual=_n
    
    *Sample 1 individual per group
    by group: sample 1, count
    Since sample relies on sorting the data (which makes the code run rather slowly), I would like to use the user-written command randomtag (from SSC) that tags the same observations that sample would select but does not sort the observations.

    My problem is that randomtag does not have a by() option, so I can't use it to sample one individual per group. Does anyone have an idea how to accomplish this with randomtag or with another workaround?

    If anyone has any ideas, please let me know, thank you in advance!

    Ali

  • #2
    Assuming that the data is already sorted by group, you can repeatedly pick one individual per group randomly without sorting via explicit subscripting (help subscripting). All you need is to generate a random observation index within each group. Something like:

    Code:
    version 15
    set seed 3121
    
    clear
    set obs 100
    gen long group=_n
    gen nid = runiformint(10,1000)
    expand nid
    bysort group: gen individual=_n
    
    by group: gen long pickid = runiformint(1,_N) if _n == 1
    by group: gen pick = _n == pickid[1]
    
    by group: replace pickid = runiformint(1,_N) if _n == 1
    by group: gen pick2 = _n == pickid[1]
    
    listsome if !mi(pickid) | pick | pick2, sepby(group) max(21)
    and the output generated by listsome (from SSC):
    Code:
    . listsome if !mi(pickid) | pick | pick2, sepby(group) max(21)
    
           +------------------------------------------------+
           | group   nid   indivi~l   pickid   pick   pick2 |
           |------------------------------------------------|
        1. |     1   558          1      230      0       0 |
       51. |     1   558         51        .      1       0 |
      230. |     1   558        230        .      0       1 |
           |------------------------------------------------|
      559. |     2   705          1      349      0       0 |
      579. |     2   705         21        .      1       0 |
      907. |     2   705        349        .      0       1 |
           |------------------------------------------------|
     1264. |     3   513          1        5      0       0 |
     1268. |     3   513          5        .      0       1 |
     1292. |     3   513         29        .      1       0 |
           |------------------------------------------------|
     1777. |     4   507          1       83      0       0 |
     1859. |     4   507         83        .      0       1 |
     1923. |     4   507        147        .      1       0 |
           |------------------------------------------------|
     2284. |     5    58          1       32      0       0 |
     2315. |     5    58         32        .      0       1 |
     2322. |     5    58         39        .      1       0 |
           |------------------------------------------------|
     2342. |     6   945          1      220      0       0 |
     2561. |     6   945        220        .      0       1 |
     2920. |     6   945        579        .      1       0 |
           |------------------------------------------------|
     3287. |     7   251          1      159      0       0 |
     3421. |     7   251        135        .      1       0 |
     3445. |     7   251        159        .      0       1 |
           +------------------------------------------------+
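
    Because runiformint(1,_N) always returns a valid within-group index, each group should end up with exactly one tagged observation per draw. A minimal sanity check, assuming the variables pick and pick2 from the code above are still in memory (npick and npick2 are hypothetical helper names used only for this check):

    ```stata
    * Count the tagged observations within each group for both draws
    by group: egen npick  = total(pick)
    by group: egen npick2 = total(pick2)

    * Stops with an error if any group has zero or more than one pick
    assert npick == 1 & npick2 == 1

    * Remove the helper variables again
    drop npick npick2
    ```

    The by group: prefix relies on the data still being sorted by group, as left by the bysort above.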



    • #3
      Thank you Robert, that's an elegant solution. A quick check indicates that it saves a significant amount of computing time per repetition compared to the sample approach; I will provide detailed figures on Monday. I'm even more amazed by the listsome ado: I needed this without even knowing it. Kudos!



      • #4
        As promised, here are some detailed figures. To compare the sample approach with Robert's suggestion, I generated a data set consisting of 1,000 groups with a varying number of individual members (between 1 and 1,000).

        Code:
        set seed 3121
        
        clear
        set obs 1000
        gen long group=_n
        gen nid = runiformint(1,1000)
        expand nid
        drop nid
        bysort group: gen individual=_n
        From this data set of roughly half a million observations, I randomly drew samples of 1 individual per group and repeated this step 1,000 times.

        Code:
        *A: Standard approach using sample command
        timer clear 1
        timer on 1
        quietly {
            forvalues i=1/1000 {
                preserve
                by group: sample 1, count
                if mod(`i',100)==0 {
                    local picked=_N
                    noisily di "repetition: `i' | sample size: `picked'"
                }
                restore
            }
        }
        timer off 1

        *B: Subscripting approach by Robert
        timer clear 2
        timer on 2
        gen long pickid=.
        gen pick=.
        quietly {
            forvalues i=1/1000 {
                by group: replace pickid = runiformint(1,_N) if _n == 1
                by group: replace pick = _n == pickid[1]
                if mod(`i',100)==0 {
                    count if pick==1
                    local picked=r(N)
                    noisily di "repetition: `i' | sample size: `picked'"
                }
            }
        }
        timer off 2
        timer list
        Result:
        Code:
        .                 timer list
           1:    733.06 /        1 =     733.0620
           2:     77.16 /        1 =      77.1560
        The result indicates that Robert's approach is faster by a factor of roughly 9.5 (about 733 versus 77 seconds for 1,000 repetitions). Thanks again!
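
        One reason approach B avoids preserve/restore is that the tag can be used directly with an if qualifier. A rough sketch of how a per-repetition statistic could be computed this way, assuming pickid and pick already exist from the code above; the outcome variable y and the scalar summean are hypothetical and only serve as an illustration:

        ```stata
        * Hypothetical outcome variable, generated only for this illustration
        gen double y = rnormal()

        * Running sum of the per-repetition sample means
        scalar summean = 0

        quietly {
            forvalues i = 1/1000 {
                * Draw one random observation index per group and tag it
                by group: replace pickid = runiformint(1,_N) if _n == 1
                by group: replace pick = _n == pickid[1]

                * Use the tagged observations without dropping anything
                summarize y if pick, meanonly
                scalar summean = summean + r(mean)
            }
        }

        di "average of the 1,000 sample means: " summean/1000
        ```

        Since no observations are dropped, the full data set stays in memory and sorted by group throughout the loop.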
