Thanks for the helpful tips, Ben! Obtaining the correct ATT with weights is no problem, but I am struggling to see where the weights go in the G matrices so that the IF is calculated correctly. On p.60 of the aforementioned paper you explain the use of survey weights in the logistic regression example. This clarifies that instead of N one should use the sum of the weights (W), and that in the simple case the weight w just goes into the cross product. I am not sure exactly how this applies to the other G terms. G11 is the simplest case, but I am unsure about G21, G22 and G32 and about the scaling of the IFs for eta01 and eta11. I tried out several variants, but csdid_rif did not return the right ATT, so I would be grateful for further guidance.
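Below is a sketch of the general pattern. It assumes the standard weighted M-estimation setup, with sampling weights $w_i$, $W=\sum_i w_i$, stacked estimating equations $h_i(\theta)$, and the sign convention used in the code, where $G$ is minus the average derivative. This is the usual textbook result rather than a transcription of p.60. The weight multiplies each observation's contribution to $G$ and to the variance, while $h_i$ itself stays unweighted:

```latex
\sum_i w_i\, h_i(\hat\theta) = 0, \qquad
G = -\frac{1}{W}\sum_i w_i\,\frac{\partial h_i(\hat\theta)}{\partial\theta'}, \qquad
\mathit{IF}_i = G^{-1} h_i(\hat\theta), \qquad
\widehat{V}(\hat\theta) = \frac{1}{W^{2}}\sum_i w_i^{2}\,\mathit{IF}_i\,\mathit{IF}_i'
```

For the logit block, $h_{1i} = X_i(D_i - p_i)$ gives $G_{11} = \frac{1}{W}\sum_i w_i\,p_i(1-p_i)\,X_iX_i'$, which is `cross(X, w :* p :* (1 :- p), X) / W`. The variance expression ignores small-sample and clustering corrections.

Some downstream commands may average the IF without weights. Whether csdid_rif does this is an assumption worth checking in its help file. In that case, the rescaled $\mathit{IF}^{*}_i = \frac{N}{W}\,w_i\,\mathit{IF}_i$ has an unweighted mean of zero, and $\frac{1}{N^{2}}\sum_i \mathit{IF}^{*}_i\mathit{IF}^{*\prime}_i$ equals $\widehat V$ above. So $\hat\theta + \mathit{IF}^{*}_i$ averages to $\hat\theta$.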
Code:

preserve
keep if year==2022 & female==0

mata
Dnm = "treat"; Xnm = "$controls"
Ynm = "outcome"; Znm = "$controls"
N = st_nobs()
D = st_data(., Dnm); X = st_data(., Xnm), J(N, 1, 1)
Y = st_data(., Ynm); Z = st_data(., Znm), J(N, 1, 1)
w = st_data(., "weight")
W = sum(w)    // sum of the sampling weights (sum(W) would refer to an undefined variable)

// estimate logit and create weights
stata("quietly logit " + Dnm + " " + Xnm + " [pw=weight]")
p = invlogit(X * st_matrix("e(b)")')
w0 = p :/ (1 :- p) :* !D
st_store(., st_addvar("double", "w0"), w0)
stata("gen est_weight=w0*weight")
w0_weight = "est_weight"
est_weight = st_data(., w0_weight)

// estimate regression model
stata("quietly regress " + Ynm + " " + Znm + " if " + Dnm + "==0 [iw=est_weight]")
Zg0 = Z * st_matrix("e(b)")'

// compute IF for eta01
h1 = X :* (D - p)
G11inv = invsym(cross(X, w :* p :* (1 :- p), X) / W)   //I think this is now ok after adding w and W
h2 = Z :* w0 :* (Y :- Zg0)                             //Should this use est_weight instead of w0?
G21 = cross(-h2, X) / W                                //Needs editing
G22inv = invsym(cross(Z, w0, Z) / W)                   //Needs editing, perhaps using est_weight instead of w0, or equivalently w :* w0?
eta01 = mean(Zg0, D :* w)
h3 = D :* (Zg0 :- eta01)
G32 = colsum(-D :* Z) / W                              //Needs editing
IF_eta01 = W/sum(D :* w) * (h3 - (h2 - h1 * G11inv' * G21') * G22inv' * G32')   //Not sure here

// compute IF for eta11
eta11 = mean(Y, D :* w)
IF_eta11 = W/sum(D :* w) * D :* w :* (Y :- eta11)      //Needs editing

// compute IF for ATT
ATT = eta11 - eta01
ATT
st_local("att", strofreal(ATT, "%18.0g"))   //store the ATT in a local to build the RIF from the IF; %18.0g keeps full precision (the default %9.0g rounds)
IF_ATT = IF_eta11 - IF_eta01
st_store(., st_addvar("double", "if_att"), IF_ATT)     //store the IF for all observations

// display results (point estimate, mean of IF, standard error)
(ATT, eta11, eta01)', mean((IF_ATT, IF_eta11, IF_eta01))', sqrt(diagonal(variance((IF_ATT, IF_eta11, IF_eta01)) / W)) * sqrt((W-1)/W)
end

sum if_att   //check whether the IF is on average 0

* Calculate the RIF from the IF produced above
cap drop RIF_att
gen RIF_att = if_att + `att'

* Now obtain the ATT with clustered standard errors
csdid_rif RIF_att, cluster(cluster_var)

* Save for male population
keep id RIF_att year
rename RIF_att RIF_male
tempfile male_merge
save `male_merge'
restore

I then repeat this for female==1, merge that result onto the main dataset as well, and run csdid_rif followed by the test command.
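The blocks for the estimator in the code can be worked out by hand under the standard weighted M-estimation pattern: each observation's contribution to $G$ is multiplied by $w_i$ and divided by $W=\sum_i w_i$, while $h_i$ stays unweighted. This is an unverified hand derivation, not something checked against csdid_rif output. The notation is as follows:

* $\omega_i = \frac{p_i}{1-p_i}(1-D_i)$ is `w0`.
* $h_{1i}, h_{2i}, h_{3i}$ are the rows of `h1`, `h2` and `h3`.
* $G_{21}$ uses $\partial\omega_i/\partial b' = \omega_i X_i'$.

```latex
\begin{aligned}
G_{11} &= \frac{1}{W}\sum_i w_i\, p_i(1-p_i)\, X_i X_i' \\
G_{21} &= -\frac{1}{W}\sum_i w_i\, h_{2i}\, X_i' \\
G_{22} &= \frac{1}{W}\sum_i w_i\, \omega_i\, Z_i Z_i' \\
G_{32} &= -\frac{1}{W}\sum_i w_i\, D_i\, Z_i' \\
G_{33} &= \frac{1}{W}\sum_i w_i\, D_i \\[4pt]
\mathit{IF}^{\eta_{01}}_i &= G_{33}^{-1}\Big(h_{3i} - G_{32}\, G_{22}^{-1}\big(h_{2i} - G_{21}\, G_{11}^{-1} h_{1i}\big)\Big) \\
\mathit{IF}^{\eta_{11}}_i &= G_{33}^{-1}\, D_i\,(Y_i - \hat\eta_{11})
\end{aligned}
```

Read as Mata, this would suggest the following:

* `G21 = cross(-h2, w, X) / W`
* `G22inv = invsym(cross(Z, w :* w0, Z) / W)`, i.e. `est_weight`, with `h2` itself left as is
* `G32 = colsum(-w :* D :* Z) / W`
* the existing prefactor `W/sum(D :* w)` for both IFs
* `IF_eta11` without the extra `w :*`

csdid_rif may average the RIF unweighted, which is again an assumption to check. If it does, `IF_ATT` would then need to be rescaled by `(N/W) :* w` before the ATT is added.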