I have been testing various commands in Stata 18 to implement heterogeneity-robust DID estimators for repeated cross-sectional data, specifically hdidregress, jwdid, and csdid. I have come across a few differences when comparing results across models, and am hoping to get answers to the three questions below.
I am using repeated cross-sectional data with a binary treatment and staggered treatment adoption, and I have the following variables:
count: number of deaths (dependent variable)
census_tract: group level variable at which treatment occurs
period: time variable in months
treated: equals 1 if the census tract is treated in the given period, and 0 otherwise
firsttreat: period that the census tract first receives treatment, and 0 for all census tracts that never receive treatment
city: city corresponding to the census tract
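For reference, the relationship between treated and firsttreat described above can be checked with a short sketch like the one below. It assumes treatment is absorbing (a tract stays treated once treated), that period is a positive integer, and that every tract is observed in its first treated period. firsttreat_chk is a hypothetical helper variable, not part of my data.

```stata
* Sketch only: rebuild the first-treatment period from the treated indicator.
* firsttreat_chk is a hypothetical name used for checking.
bysort census_tract: egen firsttreat_chk = min(cond(treated == 1, period, .))

* Never-treated tracts have no treated period, so code them as 0
replace firsttreat_chk = 0 if missing(firsttreat_chk)

* Both assertions should pass if the variables match the definitions above
assert firsttreat_chk == firsttreat
assert treated == (firsttreat > 0 & period >= firsttreat)
```

If either assert fails, the cohort definition passed to gvar() and the treatment indicator passed to hdidregress would not describe the same design, which could by itself produce differences across commands.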
Question 1 - hdidregress twfe vs. jwdid [ivar()]
Based on the documentation, when jwdid is used without ivar(), the data are assumed to be repeated cross-sections, which to my understanding should match hdidregress twfe. hdidregress and jwdid have different default control groups, so after specifying the same control group in both, I expected lines (1) and (2) below to produce the same point estimates. However, this was not the case: only when ivar() was included in jwdid, line (3) below, were the point estimates identical to those produced by hdidregress in line (2).
Why would jwdid with ivar() match hdidregress twfe, while jwdid without ivar() differs?
Question 2 - "not-yet-treated" vs. "never treated" control group
hdidregress twfe gives the same result regardless of whether controlgroup(notyet) or controlgroup(never) is specified, and displays the following note:
"note: never-treated group encountered; controlgroup(notyet) is equivalent to controlgroup(never)."
Why are the estimates the same for hdidregress twfe when using the not-yet-treated vs. never-treated control groups? hdidregress [ra ipw aipw] and jwdid give different estimates when the never-treated control group is used rather than the not-yet-treated control group.
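To put the note in context, the cohort structure can be inspected with something like the sketch below. It only uses the variables listed above, plus tract_tag, which is a hypothetical helper variable marking one observation per census tract.

```stata
* Sketch only: count census tracts in each treatment cohort.
* tract_tag is a hypothetical helper variable.
egen byte tract_tag = tag(census_tract)

* firsttreat == 0 is the never-treated group; other values are adoption periods
tabulate firsttreat if tract_tag

* Number of never-treated tracts
count if tract_tag & firsttreat == 0
```

This only describes the data. It does not by itself explain why the two controlgroup() choices coincide for twfe.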
Question 3 - jwdid "never" option
When using jwdid with the "never" option declared, we get different estimates than when using the default not-yet-treated control group. The estimates obtained when using the "never" option are consistent with those of csdid and hdidregress ra (i.e. lines (4) (5) (6) below produce consistent point estimates).
The documentation for jwdid states that when the "never" option is declared, "for each group/cohort, the period g-1 is [excluded] from the specification". Does this mean that when "never" is declared, the "pre-treatment" comparison period is only the period immediately prior to treatment (consistent with the Callaway and Sant'Anna approach), rather than an average over all pre-treatment periods (which to my understanding is the usual "pre-treatment" comparison for the extended TWFE approach)?
Code (Question 1):
(1) jwdid count, tvar(period) gvar(firsttreat) cluster(city)
(2) hdidregress twfe (count) (treated), group(census_tract) time(period) vce(cluster city) controlgroup(notyet)
(3) jwdid count, ivar(census_tract) tvar(period) gvar(firsttreat) cluster(city)
Code (Question 3):
(4) jwdid count, tvar(period) gvar(firsttreat) cluster(city) never
(5) csdid count, time(period) gvar(firsttreat) never cluster(city)
(6) hdidregress ra (count) (treated), group(census_tract) time(period) vce(cluster city) controlgroup(never)
Any advice/clarification is greatly appreciated!