  • Randomization Code In Stata - Some Concerns!

    I'm using the following Stata code for randomization:

    use data, clear
    version 12
    set seed 2846895
    isid id
    sort id
    gen random = runiform()
    bysort branch: gen st_size = _N
    sort branch random
    by branch: gen st_index = _n
    gen treatment = 0
    replace treatment = 1 if st_index <= st_size/2


    My dataset has a variable branch with three categories containing 15, 15, and 14 observations, respectively. My code runs fine; the only problem is that it doesn't seem to divide treatment and control equally within each category. Is there any way to ensure that the randomization returns equal treatment and control groups for the branch with 14 observations and, for the branches with 15, gives the control group one more observation than the treatment group?

    Thanks!

  • #2
    I have to say I don't really see why you're not getting equal (or for the branches with 15 observations, a 7/8 split) allocations here. But your code can be cleaned up a bit. You don't need either the st_size or st_index variables for this purpose (though I calculate them anyway below on the theory that you have some other use for them.) And some of the code is a bit cumbersome.

    Code:
    // CREATE SOME DEMONSTRATION DATA
    clear*
    set obs 3
    gen int branch = _n
    gen size = 15
    replace size = 14 in 3
    expand size
    gen id = _n
    
    // DO THE ALLOCATION
    set seed 2846895
    gen random = runiform()
    by branch (random), sort: gen index = _n
    by branch (random): gen treatment = (2*_n <= _N)
    Note: In a data set this small it doesn't matter, but more generally, when using random numbers like this, it is best to store them as doubles. In a large data set, you could get duplicate values of random at float precision. In fact, in a really large data set, shuffling the data requires using two random numbers, each stored as a double.
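
    A minimal sketch of the two-key shuffle described in the note, run on the same kind of demonstration data. The names u1 and u2 are hypothetical, and building branch from _n is only a shortcut for the demo; in real data you would keep your own branch and id variables.

    ```stata
    // Demonstration data: three branches with 15, 15, and 14 observations
    clear
    set obs 44
    gen id = _n
    gen int branch = 1 + (_n > 15) + (_n > 30)

    // Fix the starting order so the result is reproducible for a given seed
    isid id
    sort id
    set seed 2846895

    // Two random keys, both stored as doubles (u1 and u2 are hypothetical names)
    gen double u1 = runiform()
    gen double u2 = runiform()

    // Sort within branch on u1, with u2 breaking any ties in u1,
    // then assign treatment to the first half of each branch
    by branch (u1 u2), sort: gen byte treatment = (2*_n <= _N)

    tab branch treatment
    ```

    Sorting on both keys means a tie in u1, however unlikely, does not leave the order within a branch to chance outside the seed.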



    • #3
      Your code seems to do what I understand you to want: within each branch, the treatment and control groups are the same size, or the control group gets the extra case.
      Code:
      // create pretend data
      clear
      set obs 44
      generate id = _n
      generate branch = 1 in 1/15
      replace branch = 2 in 16/30
      replace branch = 3 in 31/44
      // do the process
      version 12
      set seed 2846895
      isid id
      sort id
      gen random = runiform()
      bysort branch: gen st_size = _N
      sort branch random
      by branch: gen st_index = _n
      gen treatment = 0
      replace treatment = 1 if st_index <= st_size/2
      
      tab branch treatment
      Code:
      . tab branch treatment
      
                 |       treatment
          branch |         0          1 |     Total
      -----------+----------------------+----------
               1 |         8          7 |        15 
               2 |         8          7 |        15 
               3 |         7          7 |        14 
      -----------+----------------------+----------
           Total |        23         21 |        44
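
      If you would rather have Stata confirm the split than read it off the table, a check along the following lines may help. It is only a sketch: it rebuilds the pretend data so that it runs on its own, and n_treat is a hypothetical variable name. The condition st_index <= st_size/2 is true for indices 1 to 7 when st_size is 15 (since 15/2 = 7.5), so treatment gets floor(N/2) cases and control gets the rest.

      ```stata
      // Rebuild the pretend data and the allocation
      clear
      set obs 44
      gen id = _n
      gen branch = 1 + (_n > 15) + (_n > 30)
      set seed 2846895
      gen double random = runiform()
      bysort branch (random): gen st_size = _N
      by branch: gen st_index = _n
      gen treatment = st_index <= st_size/2

      // Count treated cases per branch (n_treat is a hypothetical name)
      by branch: egen n_treat = total(treatment)

      // Treatment should get floor(N/2) cases in every branch
      assert n_treat == floor(st_size/2)

      // Control should match treatment or have exactly one more case
      assert inlist(st_size - 2*n_treat, 0, 1)
      ```

      If either assert fails, Stata stops with an error, so a silent run means every branch has the intended split.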
