Simulation (x,y,d=1)

Jan Nash

Join Date: May 2014

Posts: 5
#1

Simulation (x,y,d=1)

28 May 2014, 04:25

Hello,

I am doing a simple simulation with stata12 which contains one variable (x), an outcome y and a binary treatment indicator (d). If i generate x then all observations are part of the same distribution regardless of the distribution of the treatment and control group after the attribution to one of the groups. But is it possible to have different distributions of x for the treated and control observations? Because in reality you are often confronted with groups that differ in their initial distribution(s) of x (treated individuals might be older than controls for example, etc...). The thing is that you somehow have to set the distribution before the assignment to treatment or control group is conducted. I do not really like my solution so far, maybe someone comes up with a better idea:

set obs 200
generate x=rnormal()
replace x=rnormal(2,1) if _n>150
gen u = invnorm(uniform())
gen e = invnorm(uniform())
gen d=-0.5+1.5*x+u
replace d=(d>1)
gen y=10*x+d*100+e

The thing is that all observations >150 have a much higher chance to get treated. So I kind of created two "subpopulations", but still this is not exactly what I am looking for. I want to generate 200 observations with x, then comes the assignment to treatment and now I want to (somehow backwards) say: ok now change the initial distribution of x for treatment and controls. Later on I want to conduct propensity score matching using psmatch2.

I appreciate any help!!
Tags: None
Joe Canner

Join Date: Mar 2014

Posts: 580
#2

28 May 2014, 11:22

Jan,

I'm not sure I follow your question, as it sounds a bit circular. Nonetheless, I think the code you provided may be mostly OK. If you take out the replace x=rnormal(2,1) if _n>150, you still end up with two different distributions of x for treated and control subjects, by virtue of the fact that the treatment "assignment" assumes a positive relationship between x and the likelihood of being treated. You can verify this by doing mean x, over(d) or ttest x, by(d) to see how different they are. Although you cannot precisely control the exact distribution of x(treated) and x(control), you should still be able to adjust the gen d=-0.5+1.5*x+u statement to get close to what you want.

This approach makes theoretical sense, as well. If x is a characteristic that influences treatment (or no treatment), then you should calculate the likelihood of being treated given x, not the other way around.

Regards,
Joe
Comment
Jan Nash

Join Date: May 2014

Posts: 5
#3

29 May 2014, 16:25

thx for your reply. But I would like to simulate a situation where the polulations of treated and controls have different distributions because they differ systematicially. But based on the observable characteristic(s). Let us just consider one variable: age. There is a labor policy and escpecially elderly people want to apply for this program. Next step would be that only 50% can take the program because of scarve ressources. And the data you have to compare in order to get the ATT differs in age (say the distribution or mean rather mean is lower). So even before you have the decision if an individual gets treated or not the control group and the (possible) treated group differ sysmetticiall in age. I have seen some papers where they decide at the beginning say: 1/3 of observations are treated and then they continue, so there is no index function really. But I want to have an index function. I hope it became clearer now, maybe it is the language, I am not a native speaker.
Comment
Jan Nash

Join Date: May 2014

Posts: 5
#4

30 May 2014, 07:42

Just keep you updated Joe: I think I might have found a solution. My syntex "sytle" is amateur-ish but here is my do-file:

clear all
program define outcome1, rclass
drop _all
set more off
set obs 1000
gen xc=rnormal()
replace xc=0 if _n>499
gen xt=rnormal(2,1)
replace xt=0 if _n<500
gen u = invnorm(uniform())
gen e = invnorm(uniform())
gen x=xc
replace x=xt if _n>499
gen d=-0.5+x+e if _n>500
replace d=0 if (d==.) | (d<0.9)
replace d=1 if (d>0.5)
gen y=10*x+d*100+u
sort u
logit d x
predict pr, pr
psmatch2 d, outcome(y) pscore(pr)
return scalar att = r(att)
qui psmatch2 d x, outcome(y)
pstest x, both
return scalar meanbiasbef = r(meanbiasbef)
return scalar meanbiasaft = r(meanbiasaft)
end

outcome1
corr x d y

simul att=r(att) meanbiasbef=r(meanbiasbef) meanbiasaft=r(meanbiasaft), reps(100) dots:outcome1
summ

Then I calculate the bias:
gen bias=(att-100)
sum bias

And now I plot the standardized bias und the bias:
twoway (scatter bias meanbiasaft) (lpoly bias meanbiasaft)

I hope that will work
Comment

Announcement

Simulation (x,y,d=1)

Comment

Comment

Comment