CSDID ATT(g,t) Interpretation and Interpretation with Covariates

Alex Soltoff

Join Date: Feb 2026

Posts: 4
#1

13 Feb 2026, 10:50

FernandoRios

Hi All,

I am using csdid to examine the impact of healthcare facility acquisitions by large investment firms (the “event”) on nurse salary spending (the outcome). For simplicity, assume my panel is balanced with no missingness.

My core specification is:
csdid outcome, ivar(facility_id) time(year) gvar(acquisitionyear)
and a second version that adds state indicators:
csdid outcome i.state, ivar(facility_id) time(year) gvar(acquisitionyear)
where state is a set of U.S. state dummies. My unit of time is years, so rows in my dataset represent facility-year observations. I have two conceptual questions about what csdid is doing under the hood.

1. How is ATT(g,t) constructed?

My current understanding is that ATT(g,t) is built from within-unit changes over time. In very simplified terms, I have been thinking of it as:
(average within-treated facility change from t−1 to t) − (average within-control facility change from t−1 to t).

Is this intuition broadly correct?

More specifically, can I confirm that csdid is relying on within-unit changes (i.e., differences over time within each facility), rather than something like the below?
(mean of treated cohort at t − mean of treated cohort at t−1) − (mean of controls at t − mean of controls at t−1)

2. What do covariates like i.state do in csdid?

In the second specification I include state dummies:
csdid outcome i.state, ivar(facility_id) time(year) gvar(acquisitionyear)
What is csdid doing behind the scenes to calculate treatment effects when I add a categorical variable like i.state?

My current intuition is:
If inverse probability weighting (IPW) is used, it reweights control observations: Controls from states that are underrepresented relative to the treated group receive more weight, and controls from overrepresented states receive less weight, so that the weighted control group better reflects where treated hospices are located.

The regression adjustment (including i.state) accounts for persistent differences across states by estimating and subtracting each state’s average outcome level in a given year (for example, average nurse salary spending in that state-year). After removing those state-year differences, the treatment effect is calculated using the remaining within-state changes between t and t-1. In practice, this means the estimate is driven mainly by how outcomes change for treated hospices relative to untreated hospices operating in the same states, rather than by comparisons across very different states.

Is this second intuition broadly correct?

Many thanks in advance for any clarification this forum might provide!

Jeff Wooldridge

Last edited by Alex Soltoff; 13 Feb 2026, 10:53.
Tags: CallawaySantAnna, causal inference, csdid, panel regression, staggered adoption
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2291
#2

15 Feb 2026, 16:12

Alex: Your intuition is correct, except that for ATT(g,t) the change is from g-1 to t. With panel data, csdid is the same as creating differences, for each unit, from g-1 to t. Those differences are then averaged for treated cohort g. You have a choice about what to average for the control group: the never treated units or those not treated by period t. As I show in my work with Soo Jeong Lee [Lee and Wooldridge (2025)], the cohort assignment is unconfounded with respect to the difference, and therefore, once you've created the "long" differences, you can apply any treatment effect estimator you like. Without controls, there are basically two estimators, depending on whether you use all pre-treatment periods (as in Wooldridge (2025, Empirical Economics), lags only) or just the one in period g-1 (CS).
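
The long-difference construction described above can be sketched by hand for a single post-treatment (g,t) pair. This is only an illustration, not what csdid runs internally. It assumes never-acquired facilities are coded acquisitionyear == 0, uses never-treated controls with no covariates, and the values of g and t are placeholders. The point estimate is intended to mirror the corresponding ATT(g,t); the standard errors will not match csdid's.

Code:

```stata
* Sketch only: hand-computed ATT(g,t) for one post-treatment (g,t) pair.
* Assumptions: acquisitionyear == 0 marks never-acquired facilities;
* g and t below are placeholder values.
local g = 2016
local t = 2018
local base = `g' - 1

preserve
    * Keep cohort g plus the never-treated controls
    keep if inlist(acquisitionyear, `g', 0)
    * Keep only the base period g-1 and the target period t
    keep if inlist(year, `base', `t')
    * Within-facility long difference: outcome at t minus outcome at g-1
    bysort facility_id (year): gen double long_diff = outcome[2] - outcome[1] if _N == 2
    * Reduce to one row per facility
    bysort facility_id (year): keep if _n == 1
    gen byte treated = acquisitionyear == `g'
    * Coefficient on treated = mean long difference in cohort g
    * minus mean long difference among the controls
    regress long_diff treated
restore
```

With a balanced panel, averaging these within-facility differences gives the same number as differencing the group means at t and g-1, because the same facilities enter both means.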

The role of covariates is to make the parallel trends assumption more plausible. Technically, you assume unconfoundedness of the cohort assignment with respect to the long differences, conditional on X. You can use regression adjustment or IPW or AIPW or IPWRA. In the case of i.state, there's no reason to go beyond regression because the dummies are exhaustive and mutually exclusive. So you might as well use

Code:

jwdid outcome i.state, ivar(facility_id) tvar(year) gvar(acquisitionyear) never

The above gives the leads and lags (event study) estimate. If you drop "never," it's the lags only estimate. If the probability of treatment is one for some states, these states will be dropped in the regression (but they will likely cause problems for the CS AIPW estimator).
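
For comparison, the same choices (regression adjustment only, and which control group to average over) can also be requested from csdid itself. The lines below are a sketch using the variable names from the first post; they rely on csdid's method() and notyet options and its estat aggregations, and they are not meant to reproduce the jwdid estimates exactly.

Code:

```stata
* Regression adjustment only; never-treated units are the default controls
csdid outcome i.state, ivar(facility_id) time(year) gvar(acquisitionyear) method(reg)

* Same estimator, but with not-yet-treated units as the control group
csdid outcome i.state, ivar(facility_id) time(year) gvar(acquisitionyear) method(reg) notyet

* Aggregations of the ATT(g,t) estimates after either fit
estat simple   // overall ATT
estat group    // ATT by cohort
estat event    // event-study (dynamic) ATTs
```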
Alex Soltoff

Join Date: Feb 2026

Posts: 4
#3

16 Feb 2026, 09:48

Thanks so much Dr. Wooldridge! Incredibly helpful. I mistakenly wrote t-1 instead of g-1.
Alex Soltoff

Join Date: Feb 2026

Posts: 4
#4

19 Feb 2026, 11:38

Jeff Wooldridge Another quick (and ignorant) question here. In reality, I have some states with up to 97 never acquired facilities and some states with only 1 or 2. For smaller cohorts like 2016 (see attached table) my cohort-aggregated ATT estimates start to get really crazy when I include state as a covariate. My cohort-aggregated estimates more or less make sense when my treated cohorts are large; however, the overall sample average treatment effects are quite a bit larger when controlling for state.

Might you have advice on the following:
1. How should I handle regression/model adjustments when states have varying amounts of data (some states have tons of controls/treated, some have very few)?
2. Am I better off including states as a fixed effects variable (fevar) or including state as an outcome in models?

Thanks again for any assistance this community can provide!
Attached Files

Table1_AcquisitionsStataQuestion.pdf (109.4 KB, 1 view)

Last edited by Alex Soltoff; 19 Feb 2026, 11:54.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2291
#5

21 Feb 2026, 09:20

Good question. If you include the state dummies among the covariates, you will initially estimate a separate effect by state (as well as by cohort and time period). That might be too much. On the other hand, including state FEs among the covariates means you're allowing flexible heterogeneous trends. If some states have few observations, then the standard errors might not be trustworthy if the model is too flexible. But it's hard to know.
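
One way to see how thin the state-by-cohort cells are before deciding how flexible to make the model is to count facilities in each cell. This is a minimal sketch using the variable names from the first post; the flag variable one_per_facility is a made-up name, and the code assumes acquisitionyear is constant within facility.

Code:

```stata
* Flag one row per facility so each facility is counted once
egen byte one_per_facility = tag(facility_id)

* Facilities by state (rows) and acquisition cohort (columns)
tabulate state acquisitionyear if one_per_facility
```

Cells with only one or two facilities are where state-specific effects rest on very little data.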
Alex Soltoff

Join Date: Feb 2026

Posts: 4
#6

23 Feb 2026, 13:22

Thanks again Jeff Wooldridge! Very helpful.