  • randomly assign treatment to control group with treated group's distribution

    I want to randomly assign a treatment to my control group. I have state, year, and the treatment variable. In the treatment group, each state is treated in only one year, so in my control group I want each selected state to be assigned only one treatment year across the entire period.

    I have all 50 states from 2012 to 2020. All treatments start in 2016 or later, and my treatment group has the following distribution:
    count year
    1 2016
    3 2017
    7 2018
    9 2019
    9 2020
    This means only 1 state is treated in 2016, 3 states in 2017, and so on.

    Here is a data example with two states in the treatment group and two states in the control group:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str20 state float(year cyear dif)
    "Alabama"  2018    .  0
    "Alabama"  2013    .  0
    "Alabama"  2012    .  0
    "Alabama"  2014    .  0
    "Alabama"  2017    .  0
    "Alabama"  2020    .  0
    "Alabama"  2019    .  0
    "Alabama"  2015    .  0
    "Alabama"  2016    .  0
    "Alabama"  2021    .  0
    "Alaska"   2018    .  0
    "Alaska"   2012    .  0
    "Alaska"   2020    .  0
    "Alaska"   2015    .  0
    "Alaska"   2013    .  0
    "Alaska"   2019    .  0
    "Alaska"   2017    .  0
    "Alaska"   2016    .  0
    "Alaska"   2021    .  0
    "Alaska"   2014    .  0
    "Arizona"  2017 2018 -1
    "Arizona"  2021 2018  3
    "Arizona"  2016 2018 -2
    "Arizona"  2012 2018 -6
    "Arizona"  2015 2018 -3
    "Arizona"  2014 2018 -4
    "Arizona"  2019 2018  1
    "Arizona"  2013 2018 -5
    "Arizona"  2020 2018  2
    "Arizona"  2018 2018  0
    "Arkansas" 2012 2019 -7
    "Arkansas" 2016 2019 -3
    "Arkansas" 2020 2019  1
    "Arkansas" 2013 2019 -6
    "Arkansas" 2019 2019  0
    "Arkansas" 2017 2019 -2
    "Arkansas" 2015 2019 -4
    "Arkansas" 2021 2019  2
    "Arkansas" 2014 2019 -5
    "Arkansas" 2018 2019 -1
    end

    cyear is the year the state got the treatment, and dif is the number of years from the treatment year (year minus cyear). I tried the following, but the random assignment does not restrict each state to a single treatment, so a state can end up with more than one treatment:

    Code:
    
    gen ran=uniform()
    gen treatment =. 
    sort year  ran 
    by year: replace treatment =(_n <= 1) if (year == 2016) & t ==.
    by year: replace treatment =(_n <= 3) if (year == 2017) & t ==. 
    by year: replace treatment =(_n <= 7) if (year == 2018) & t ==.
    by year: replace treatment =(_n <= 9) if (year == 2019) & t ==.
    by year: replace treatment =(_n <= 5) if (year == 2020) & t ==.

  • #2
    This may help get things rolling

    Code:
    egen sid = group(state)
    xtset sid year
    g cyear_pseudo = cyear
    g dif_pseudo = dif
    egen controlx = sum(dif) , by(sid)
    g control = controlx==0
    g u = runiform()
    bys sid: replace u = u[1]
    summ u if control
    replace cyear_pseudo = 2016 if control & u == `r(min)'
    replace dif_pseudo = year - cyear_pseudo



    • #3
      Originally posted by George Ford
      This may help get things rolling

      Code:
      egen sid = group(state)
      xtset sid year
      g cyear_pseudo = cyear
      g dif_pseudo = dif
      egen controlx = sum(dif) , by(sid)
      g control = controlx==0
      g u = runiform()
      bys sid: replace u = u[1]
      summ u if control
      replace cyear_pseudo = 2016 if control & u == `r(min)'
      replace dif_pseudo = year - cyear_pseudo
      Hello George,

      Thanks for the reply. I think this code only works if I want a single random assignment, right?

      I have been thinking about this for a few days and believe the following could work, but I do not know whether it will redraw states that have already been drawn:

      Code:
      preserve 
      tempfile control 
      drop if treat == 1 
      drop treat 
      gen random = 1 
      save `control', replace 
      
      tempfile 2015
      sample 5, count 
      gen treat = 1 
      gen t = 2015
      save `2015', replace 
      restore 
      
      preserve 
      use `control', clear
      tempfile 2016 
      sample 2, count 
      gen treat = 1 
      gen t = 2016
      save `2016', replace 
      restore 
      
      
      preserve
      
      clear
      
      tempfile allyear
      
      append using "`2015'" "`2016'"
      
      duplicates drop id year, force
      
      save `allyear', replace
      
      restore 
      
      *original dataset
      use new_area, clear 
      drop treat 
      gen random = 0 
      gen start = 0
      
      append using `allyear'
      
      by id year, sort: gen test =_N
      
      *drop original and replace the new one 
      
      drop if test == 2 & random == 0
      
      
      *get the treatment year
      by id, sort: egen t2=max(t)
      by id, sort: egen treat2=max(treat)
      *treatment period 
      gen did0=1 if year >= t2 & treat2== 1 
      replace did0=0 if did0 ==.



      • #4
        You said "I only want to assign one treatment to one state across entire period". Perhaps I misunderstood.

        Do you want only one state to get a treatment in each round of a simulation, with all the others being a control?



        • #5
          Originally posted by George Ford
          you said "I only want to assign one treatment to one state across entire period". perhaps I misunderstood.

          Do you want only one state to get a treatment in each round of a simulation, with all the others being a control?
          For my data, I already have Y for the control and treatment groups, but the control group has no treatment date, so I want to randomly assign one to each selected control state.

          Since I need to follow the treatment group's distribution, I would randomly select some control states to be treated in each year.

          For example, in the treatment group, 1 state was treated in 2016 and 3 states were treated in 2017. Suppose that in the control group Alabama is randomly selected as the one state treated in 2016. Then in 2017, when I randomly draw 3 states from the control group, I do not want Alabama to be in the set of control states that can be drawn.
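
A minimal sketch of the draw described above, under stated assumptions: each control state gets one random number, the states are ranked by it, and blocks of ranks are mapped to years using the counts from the first post (1, 3, 7, 9, 9), so no state can be picked twice. The toy panel of 40 control states and the names sid, u, tag, rk, cyear_pseudo and dif_pseudo are illustrative, not taken from the real data. If the control pool is smaller than the total count (29), the last years are only partly filled.

```stata
* Toy panel (assumption): 40 hypothetical control states, 2012-2020
clear
set seed 1234
set obs 40
gen sid = _n
expand 9
bysort sid: gen year = 2011 + _n

* One uniform draw per state, copied to all of that state's rows
bysort sid (year): gen double u = runiform() if _n == 1
bysort sid (year): replace u = u[1]

* Rank the states by their draw, using one row per state
egen tag = tag(sid)
egen rk = rank(u) if tag, unique
bysort sid (rk): replace rk = rk[1]

* Map rank blocks to years: 1 state, then 3, 7, 9 and 9 states
gen cyear_pseudo = .
replace cyear_pseudo = 2016 if rk == 1
replace cyear_pseudo = 2017 if inrange(rk, 2, 4)
replace cyear_pseudo = 2018 if inrange(rk, 5, 11)
replace cyear_pseudo = 2019 if inrange(rk, 12, 20)
replace cyear_pseudo = 2020 if inrange(rk, 21, 29)

* Event time relative to the assigned year (missing if never assigned)
gen dif_pseudo = year - cyear_pseudo
```

States ranked above 29 keep a missing cyear_pseudo and stay untreated. Changing the seed gives a different draw.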



          • #6
            Code:
            g control = mi(cyear)
            forv i = 1/10 {
                capture drop fyear fdif u ru rru
                g fyear = cyear
                g fdif = dif
                g u = runiform() if year==2012 & control
                egen ru = rank(u)
                egen rru = min(ru), by(state)
                replace rru = . if rru>4
                replace fyear = 2016*(rru==1) + 2017*(rru>1) if !mi(rru)
                replace fdif = year - fyear if control
                ** dostuff
            }



            • #7
              Originally posted by George Ford
              Code:
              g control = mi(cyear)
              forv i = 1/10 {
                  capture drop fyear fdif u ru rru
                  g fyear = cyear
                  g fdif = dif
                  g u = runiform() if year==2012 & control
                  egen ru = rank(u)
                  egen rru = min(ru), by(state)
                  replace rru = . if rru>4
                  replace fyear = 2016*(rru==1) + 2017*(rru>1) if !mi(rru)
                  replace fdif = year - fyear if control
                  ** dostuff
              }
              Thank you for the help. I think I figured it out by dropping the assigned states each year and then appending them back:

              Code:
              set seed  1234
              
              cap drop ran treatment
              
              
              
              tempfile control
              drop if t != .
              gen ran=uniform()
              gen treatment = .
              save `control', replace 
              
              
              
              use `control' ,clear 
              tempfile 2016
              sort year  ran 
              by year: replace treatment =(_n <= 1) if (year == 2016) 
              egen c16 = sum(treatment), by(state)
              replace c16 = 2016 if c16 == 1 
              replace cyear = c16 if c16 == 2016
              
              save `2016', replace 
              
              use `2016', clear 
              tempfile 2017
              drop if c16 == 2016 
              sort year  ran 
              by year: replace treatment =(_n <= 3) if (year == 2017) 
              egen c17 = sum(treatment), by(state)
              replace c17 = 2017 if c17 == 1 
              replace cyear = c17 if c17 == 2017
              save `2017', replace
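
One way to check an assignment like this, sketched on a hypothetical six-row panel (the state names and values are made up for illustration): each state should carry a single assigned year on all of its rows, and the number of states per assigned year should match the target counts.

```stata
* Hypothetical mini panel: A assigned 2016, B assigned 2017, C never assigned
clear
input str10 state int(year cyear)
"A" 2016 2016
"A" 2017 2016
"B" 2016 2017
"B" 2017 2017
"C" 2016 .
"C" 2017 .
end

* Each state must have the same assigned year on every row;
* this stops with an error if a state received two different years
bysort state (year): assert cyear == cyear[1]

* Count states per assigned year, using one row per state
egen one = tag(state)
tabulate cyear if one, missing
```

On the real data, the tabulated counts can be compared with the treated group's distribution from the first post.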
