Hi Statalist,
I am trying to do power calculations by simulation for a difference-in-differences design in a repeated cross-section. Treatment is assigned at the village level, so I want to generate clustered data. I have consulted the Stata FAQ and a number of Statalist posts, including this helpful answer by Clyde Schechter and this other helpful answer from Joseph Coveney. But when I implemented what I thought was a reasonable adaptation of that advice, I found that increasing the intra-cluster correlation (ICC) gave me greater power rather than less. That doesn't seem right to me. The code block below shows the program that generates the data; I then use simulate with this program to estimate power. Even when I run the program just once, I get higher t-statistics with a higher ICC, which is consistent with higher power but is not what I expected. Can anyone see whether I made a mistake in the way I generated the data? Thanks a lot!
Code:
capture program drop ddpowersimu
program ddpowersimu, rclass
    version 17.0

    // Input parameters
    syntax, nperclust(integer) /// observations per cluster per period
        treat_ratio(real)      /// ratio of treated to untreated (not used below)
        clust_num(integer)     /// number of clusters
        icc(real)              /// intra-cluster correlation
        b1(real)               /// b1 under the alternative hypothesis
        sd(real)               /// standard deviation of outcome (not used below)
        [ alpha(real 0.05)     /// alpha level
        ]

    // Generate random data: one row per cluster, then expand
    clear
    set obs `clust_num'
    gen int clust = _n              // cluster id
    gen byte x = mod(_n, 2)         // treatment: odd-numbered clusters are treated
    expand `nperclust'
    scalar ntotal = `nperclust'*`clust_num'
    sort clust
    expand 2                        // two periods per cluster
    gen t = 0                       // time
    by clust, sort: gen memb_num = _n
    sort clust memb_num
    // second half of each cluster's rows is the post period
    by clust: replace t = 1 if memb_num > (ntotal/`clust_num')

    // y variable: cluster effect u plus individual error e, total variance 1
    scalar sd_u = sqrt(`icc')
    scalar sd_e = sqrt(1-`icc')
    by clust (memb_num), sort: gen u = rnormal(0, sd_u) if _n == 1
    by clust (memb_num): replace u = u[1]   // same u for every row in the cluster
    gen e = rnormal(0, sd_e)
    gen mu = `b1'*x*t
    gen y = mu + e + u

    // Fit diff-in-diff regression with cluster-robust standard errors
    reg y x##t, vce(cluster clust)

    // Return the p-value on the interaction and the rejection indicator
    mat a = r(table)
    local p1 = el(a, rownumb(a,"pvalue"), colnumb(a,"1.x#1.t"))
    return scalar pvalue = `p1'
    return scalar reject = (`p1' < `alpha')
end
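To make it easier to check my data generation, here is the model I believe the program above implements, written out as an equation. The subscripts (i for individual, c for cluster/village, t for period) and the symbol \rho for the icc() option are my own notation, not something taken from the posts I consulted, and I may have mis-transcribed my own code.

```latex
y_{ict} = b_1 \, x_c \, t + u_c + e_{ict},
\qquad u_c \sim N(0,\ \rho),
\qquad e_{ict} \sim N(0,\ 1-\rho)

\text{so that}\quad
\operatorname{Var}(u_c + e_{ict}) = \rho + (1-\rho) = 1,
\qquad
\mathrm{ICC} = \frac{\operatorname{Var}(u_c)}{\operatorname{Var}(u_c) + \operatorname{Var}(e_{ict})} = \rho
```

Here x_c is the village-level treatment indicator and t is the period indicator. The cluster effect u_c is drawn once per cluster and is the same in both periods. The total error variance is fixed at 1, so raising the ICC lowers the variance of e.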
Code:
simulate reject = r(reject) pvalue=r(pvalue), reps(100) seed(1234): ddpowersimu, clust_num(10) nperclust(10) b1(0.5) sd(1) icc(0.1) treat_ratio(0.5)
sum reject
