I’m seeing that when I run:
reghdfe dep_var if treatID == 0, absorb(id year) resid(resid_all)
the stored residuals (resid_all) are only non-missing for the treatID == 0 observations (i.e., those actually used in the regression). All treatID == 1 rows show missing.
Because I restrict the estimation to the control subsample, reghdfe doesn’t compute or cache FE-adjusted predictions or residuals for the excluded (treated) group. However, for my two-stage DID I need residuals (or fitted values) across the entire panel, not just the sample that was regressed on.
I cannot simply switch to regress + predict because my dataset is enormous (millions of observations) and I have dozens of high-dimensional fixed effects; regress cannot handle that many FEs efficiently.
Has anyone discovered a workaround or option—within reghdfe or via a small manual step—that allows one to generate residuals for every observation, including those outside of e(sample)? Thanks!