Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Simulation (x,y,d=1)

    Hello,

    I am doing a simple simulation with stata12 which contains one variable (x), an outcome y and a binary treatment indicator (d). If i generate x then all observations are part of the same distribution regardless of the distribution of the treatment and control group after the attribution to one of the groups. But is it possible to have different distributions of x for the treated and control observations? Because in reality you are often confronted with groups that differ in their initial distribution(s) of x (treated individuals might be older than controls for example, etc...). The thing is that you somehow have to set the distribution before the assignment to treatment or control group is conducted. I do not really like my solution so far, maybe someone comes up with a better idea:

    set obs 200
    generate x=rnormal()
    replace x=rnormal(2,1) if _n>150
    gen u = invnorm(uniform())
    gen e = invnorm(uniform())
    gen d=-0.5+1.5*x+u
    replace d=(d>1)
    gen y=10*x+d*100+e

    The thing is that all observations >150 have a much higher chance to get treated. So I kind of created two "subpopulations", but still this is not exactly what I am looking for. I want to generate 200 observations with x, then comes the assignment to treatment and now I want to (somehow backwards) say: ok now change the initial distribution of x for treatment and controls. Later on I want to conduct propensity score matching using psmatch2.

    I appreciate any help!!

  • #2
    Jan,

    I'm not sure I follow your question, as it sounds a bit circular. Nonetheless, I think the code you provided may be mostly OK. If you take out the replace x=rnormal(2,1) if _n>150, you still end up with two different distributions of x for treated and control subjects, by virtue of the fact that the treatment "assignment" assumes a positive relationship between x and the likelihood of being treated. You can verify this by doing mean x, over(d) or ttest x, by(d) to see how different they are. Although you cannot precisely control the exact distribution of x(treated) and x(control), you should still be able to adjust the gen d=-0.5+1.5*x+u statement to get close to what you want.

    This approach makes theoretical sense, as well. If x is a characteristic that influences treatment (or no treatment), then you should calculate the likelihood of being treated given x, not the other way around.

    Regards,
    Joe

    Comment


    • #3
      thx for your reply. But I would like to simulate a situation where the polulations of treated and controls have different distributions because they differ systematicially. But based on the observable characteristic(s). Let us just consider one variable: age. There is a labor policy and escpecially elderly people want to apply for this program. Next step would be that only 50% can take the program because of scarve ressources. And the data you have to compare in order to get the ATT differs in age (say the distribution or mean rather mean is lower). So even before you have the decision if an individual gets treated or not the control group and the (possible) treated group differ sysmetticiall in age. I have seen some papers where they decide at the beginning say: 1/3 of observations are treated and then they continue, so there is no index function really. But I want to have an index function. I hope it became clearer now, maybe it is the language, I am not a native speaker.

      Comment


      • #4
        Just keep you updated Joe: I think I might have found a solution. My syntex "sytle" is amateur-ish but here is my do-file:

        clear all
        program define outcome1, rclass
        drop _all
        set more off
        set obs 1000
        gen xc=rnormal()
        replace xc=0 if _n>499
        gen xt=rnormal(2,1)
        replace xt=0 if _n<500
        gen u = invnorm(uniform())
        gen e = invnorm(uniform())
        gen x=xc
        replace x=xt if _n>499
        gen d=-0.5+x+e if _n>500
        replace d=0 if (d==.) | (d<0.9)
        replace d=1 if (d>0.5)
        gen y=10*x+d*100+u
        sort u
        logit d x
        predict pr, pr
        psmatch2 d, outcome(y) pscore(pr)
        return scalar att = r(att)
        qui psmatch2 d x, outcome(y)
        pstest x, both
        return scalar meanbiasbef = r(meanbiasbef)
        return scalar meanbiasaft = r(meanbiasaft)
        end

        outcome1
        corr x d y


        simul att=r(att) meanbiasbef=r(meanbiasbef) meanbiasaft=r(meanbiasaft), reps(100) dots:outcome1
        summ


        Then I calculate the bias:
        gen bias=(att-100)
        sum bias


        And now I plot the standardized bias und the bias:
        twoway (scatter bias meanbiasaft) (lpoly bias meanbiasaft)


        I hope that will work

        Comment

        Working...
        X