Dear all:
What I would like to do is run placebo regressions for a difference-in-differences regression (y = b0 + b1*event + b2*treatment + b3*(treatment*event) + e, for which I get a highly significant b3 coefficient). In these regressions I want to assign the treatment variable randomly across individuals, to show that the actual distribution of treatment status is driving my results and not some background process that might coincide with the treatment.
To do this I use the mean of the treatment variable
sum treatment, meanonly
and then generate 1000 random treatment variables, each equal to 1 with probability equal to this mean:
set seed 123456789
forval i=1(1)1000{
    * placebo treatment: 1 with probability equal to the observed treatment share
    qui: gen treat`i'=runiform()<=`r(mean)'
}
Then I run 1000 regressions, one for each randomly generated treatment variable, and count how many estimates of the difference-in-differences effect b3 are significant at the 5% level. Done as above, I get 63 out of 1000, which is a bit above the 5% level. This might call into question whether it is really the actual treatment variable that generates the results, or some other factor not captured in the regression.
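The regression-and-count step described above could be sketched roughly as follows. This is only an illustration: the variable names y and event are assumptions taken from the equation, inter is a hypothetical helper variable, and plain OLS standard errors are assumed (no clustering or other options).

```stata
* Sketch only: y and event are assumed variable names taken from the equation;
* treat1-treat1000 are the placebo treatment variables generated beforehand.
local nsig = 0
forvalues i = 1/1000 {
    * placebo interaction term (treatment*event); inter is a hypothetical helper
    quietly generate inter = treat`i' * event
    quietly regress y event treat`i' inter
    * two-sided test of b3 = 0; the p-value is returned in r(p)
    quietly test inter
    if r(p) < 0.05 {
        local ++nsig
    }
    drop inter
}
display "Placebo b3 significant at 5%: `nsig' of 1000"
```

The counter nsig then holds the number of placebo regressions in which b3 is significant at the 5% level.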
Now come the questions!
When I do exactly the same but set a new seed before generating each placebo variable
local run 123456788
forval i=1(1)1000{
    * increment the seed so that every placebo variable gets its own seed
    local ++run
    set seed `run'
    qui: gen treat`i'=runiform()<=`r(mean)'
}
and then run the 1000 placebo regressions, I get only 51 out of 1000 coefficients that are significant, which is almost exactly the number (50) you would expect at the 5% significance threshold.
So my question is: why is that, and which is the better (the correct) approach to generating the placebo variables here?
Thanks in advance
Best
Felix