  • Random assignment within strata - specific group sizes

    Hello,

    I want to assign observations to groups within strata, with specific and uneven group sizes: for example, 400 observations in group 1, 180 observations in group 2, etc. I have more observations than the groups combined require.


    My existing code:
    ***************************************************************
    set seed 12345 /* Seed for random number*/
    bys strata: gen rand = uniform() if !missing(strata) /* Generate random number within strata*/

    egen N_r=total(sample==1) /*Total number of obs in sample*/
    gen share_r1=(400/N_r) /*400 obs in group 1*/
    gen share_r2=(180/N_r) /*180 obs in group 2*/

    * Assign treatments:
    gen str4 T_group="" /*Create the string group variable (omit if it already exists)*/
    replace T_group="1" if rand> (0) & !missing(rand)
    replace T_group="2" if rand> (0+share_r1) & !missing(rand)
    replace T_group="Rest" if rand> (0+share_r1+share_r2) & !missing(rand)

    ***************************************************************

    With this method I get almost the right number of observations per group, but not exactly. I guess this is because the random numbers are not exactly evenly spread over [0,1].
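The behaviour described above can be reproduced with a minimal sketch on made-up data (the 650 simulated observations are an assumption for illustration, not the poster's data). Cutting uniform draws at a fixed share gives a count that follows a binomial distribution, so it lands near 400 but rarely exactly on it.

```stata
* Hypothetical data: 650 observations, one uniform draw each
clear
set seed 12345
set obs 650
gen double rand = runiform()

* Count the draws that fall at or below the group 1 cutoff of 400/650
count if rand <= 400/650
local n1 = r(N)
display "Observations at or below the cutoff: `n1' (target 400)"
```

Changing the seed changes the count, which is why a fixed cutoff on the random number cannot guarantee an exact group size.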

    I also tried the command randomtag, but I could not get it to work within strata.

    I am using Stata 14.


    Thanks a lot for helping!

    Best,
    Louise


  • #2
    I'm not entirely clear what you want here. Do you want 400 observations from each stratum in group 1, and 180 from each stratum in group 2? Or do you want a total of 400 observations in group 1? If the latter, how do you allocate them among the strata? Please clarify. It would also be helpful to provide an example of your data, using the -dataex- command, so that some code can be tested. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.




    • #3
      Hello,
      Thanks for your reply Clyde.
      I will try to clarify my question below. (Unfortunately I am not allowed to share any data, but I will keep this advice in mind for the future.)

      I want the total number of observations in treatment group 1 to be 400, and the total number of observations in treatment group 2 to be 180. I have about 650 observations in total (N_r) so about 70 observations left over ("Rest").
      The strata are explicit and are based on sex (female, male), income (4 income intervals), and year of birth (4 year of birth intervals). This gives me 32 strata. I want the treatments to be randomized within the strata. That is, to keep the shares of treatment groups similar across strata (400/650 in treatment group 1, and 180/650 in treatment group 2). Some strata have many observations and some other strata have a lot fewer observations, due to the distribution of the data.

      Best,
      Louise



      • #4
        If you have 650 individuals in the dataset and you wish to create two groups according to some conditions, and you already know that 70 out of them will be left out, I gather this is not a random selection.

        From reading both #1 and #3, I got the impression that you want some sort of case-control design, with a 1:3 ratio and groups paired according to some covariates.
        Last edited by Marcos Almeida; 24 Apr 2018, 02:53.
        Best regards,

        Marcos



        • #5
          Hello,
          Thanks for your reply Marcos.

          I do have different treatments that I want to give, and I want to study the effect of those treatments on an outcome. I want to randomize within strata to ensure that the main covariates are balanced across treatments, which may improve efficiency and may be useful for a heterogeneity analysis of treatment effects across covariates.

          I manage to randomize the treatments within strata and get roughly the right number of observations in each treatment group, but not exactly the right numbers (see the code above).

          What do you think about using a loop to assign the 400 observations with the lowest random numbers to treatment group 1, then assign the 180 not-yet-assigned observations with the lowest random numbers to treatment group 2, etc.?
          Or maybe, instead of a loop, just rank the observations according to the random number, then identify the random number at rank 400 and assign all observations with a random number at or below it to treatment group 1?

          Or is there a much simpler command (like randomtag)?

          Best,
          Louise



          • #6
            Maybe just like this?

            **********************************
            set seed 12345 /* Seed for random number*/
            bys strata: gen rand = uniform() if !missing(strata) /* Generate random number within strata*/

            sort rand
            egen rank=rank(rand)

            * Assign treatments:
            gen str4 T_group="" /*Create the string group variable (omit if it already exists)*/
            replace T_group="1" if rank>(0) & !missing(rand)
            replace T_group="2" if rank>(400) & !missing(rand)
            replace T_group="Rest" if rank>(400+180) & !missing(rand)

            **********************************



            • #7
              No, the code in #6 is just a single overall randomization of the entire sample. If your strata are sufficiently large, then they will, with high probability, have approximately the desired proportions in each group. But if you have some small strata, these could still work out far from your intended proportions. For a randomization that achieves the desired proportions in both large and small strata (to the extent divisibility permits):

              Code:
              local p1 = 400/650
              local p2 = (180+400)/650
              
              gen double shuffle = runiform()
              
              by stratum (shuffle), sort: gen group = 1 if _n <= floor(`p1'*_N)
              by stratum (shuffle): replace group = 2 if missing(group) & _n <= floor(`p2'*_N)
              replace group = 3 if missing(group)
              Note: If you want to make your group a string variable with values "1", "2", and "Rest", there is no law against that. But at some point you will probably want to analyze this data and almost any such analysis will be easier if the group variable is numeric. Hence my use of 1, 2, and 3.
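As a quick check of the allocation above, here is a minimal sketch on made-up data. The 650 simulated observations and the skewed `stratum` variable are assumptions for illustration, not the poster's data. Because floor() rounds down within every stratum, the total in group 1 will typically come out a little under 400, and the combined total in groups 1 and 2 a little under 580, with the remainder going to group 3.

```stata
* Hypothetical test data: 650 observations spread unevenly over up to 32 strata
clear
set seed 12345
set obs 650
* Squaring the uniform draw skews the strata sizes, so some strata are small
gen int stratum = ceil(32 * runiform()^2)

* Cumulative target shares, as in the code above
local p1 = 400/650
local p2 = (180+400)/650

* Random order within stratum, then cut at the cumulative shares
gen double shuffle = runiform()
by stratum (shuffle), sort: gen byte group = 1 if _n <= floor(`p1'*_N)
by stratum (shuffle): replace group = 2 if missing(group) & _n <= floor(`p2'*_N)
replace group = 3 if missing(group)

* Inspect the split within each stratum and the overall totals
tabulate stratum group, row
tabulate group
```

The row percentages in the first table show how close each stratum comes to the 400/650 and 180/650 shares; very small strata can only approximate them.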



              • #8
                Great, thanks a lot Clyde!
