CSDID with Individual-Level Data - Treatment is at the Group Level

Khushi Surana

Join Date: Jan 2025

Posts: 3
#1


10 Feb 2025, 05:33

Hello Statalist,

I am using the Callaway & Sant’Anna (2021) Difference-in-Differences (DiD) estimator in Stata (csdid) to analyze the impact of a policy change. However, I am running into an issue due to the structure of my data and treatment assignment.

My Data Structure:
Unit of Observation: Individual-level (households).
Treatment Assignment: Treatment is assigned at the province level.
Time Variable: year_month.
Treatment Timing Variable (gvar): The year_month when a province first received treatment (all individuals in a province share the same treatment timing).

When I run the following command:
csdid y, ivar(province) time(year_month) gvar(first_treat) method(dripw) notyet

I get the error (duplicate time and gvar values).
This is because of the duplicate values of gvar (treatment timing) and time (survey wave) across multiple individuals in the same province. Since csdid estimates group-time average treatment effects (GATTs), I am wondering:
Do I need to collapse my data to the province-time level? If so, will this affect comparisons with a standard TWFE DiD model, which I am also running?

Can csdid handle individual-level data when treatment happens at the province level? If so, how should I define gvar to avoid issues?

Will something like this work?
egen time_treated = csgvar(treatment), tvar(year_month) ivar(province_id)
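
For reference, a minimal sketch of how the gvar described above (the first treated year_month per province) could be built with built-in commands only. It assumes a 0/1 variable treatment that switches on in the month a province is first treated and stays on afterwards. The toy data and all variable names are hypothetical and only serve to make the example runnable.

```stata
clear
set seed 12345
* Toy repeated cross-section: 3 provinces x 6 months x 5 households per cell
set obs 90
generate province   = ceil(_n / 30)
generate year_month = mod(ceil(_n / 5) - 1, 6) + 1

* Province 1 is treated from month 3, province 2 from month 5, province 3 never
generate byte treatment = (province == 1 & year_month >= 3) | ///
                          (province == 2 & year_month >= 5)

* First treated period per province; 0 marks never-treated provinces
generate treated_month = year_month if treatment == 1
bysort province: egen first_treat = min(treated_month)
replace first_treat = 0 if missing(first_treat)
drop treated_month

tabulate province first_treat
```

Every household in a province ends up with the same first_treat value, which is what the group-level treatment timing requires.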
Luisa Nazareno

Join Date: Feb 2025

Posts: 1
#2

20 Feb 2025, 15:15

I have a very similar challenge, and was hoping FernandoRios would provide some insight. I have been browsing StataList for a while and could not find a clear response to this yet.
FernandoRios

Join Date: Apr 2014

Posts: 2471
#3

21 Feb 2025, 07:28

Well, the problem here is that the data are not a panel but a repeated cross-section.
When you add ivar(), you are telling csdid that your data are a panel. The command tries to verify that and returns the error you reported.
One alternative: treat the data as a repeated cross-section, that is, omit ivar(). In this case, only group fixed effects are considered, and adding province fixed effects makes little to no sense.
Second alternative: use jwdid with the fevar() option.
HTH
F
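
A rough sketch of what the two alternatives could look like, using the variable names from post #1. It assumes csdid, drdid, and jwdid are installed from SSC and that first_treat is coded 0 for never-treated provinces. The cluster(province) option is an added assumption, not something stated in this thread, and the option names should be checked against help csdid and help jwdid for the installed versions.

```stata
* Assumes: ssc install csdid, ssc install drdid, ssc install jwdid

* Alternative 1: repeated cross-section, so no ivar().
* cluster(province) is an assumption, added because treatment varies by province.
csdid y, time(year_month) gvar(first_treat) method(dripw) notyet cluster(province)

* Alternative 2: jwdid without ivar(), adding province fixed effects through fevar()
jwdid y, tvar(year_month) gvar(first_treat) fevar(province)
```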
Khushi Surana

Join Date: Jan 2025

Posts: 3
#4

12 Apr 2025, 02:59

Hi FernandoRios, thank you so much for your response, and sorry for the late reply. I fixed the problem and could run the command. I have one more problem. I also have state-level controls, such as GDP and the Housing Price Index, which are the same for every individual in a state at a given point in time. So I run into a multicollinearity problem (an error in DRDID), but I need these variables as controls.
The VIFs are all under 5, and the correlation between the macro variables is 0.77.

Since GDP is only available at a yearly frequency, I use a GDP*time_trend variable. The Housing price index is available at a monthly frequency. I tried detrending the variables, but it didn't work. What can I do to address this?

Can you provide some insights?
FernandoRios

Join Date: Apr 2014

Posts: 2471
#5

12 Apr 2025, 03:37

You probably can't.
If drdid fails for a single case, you can try running each model by hand:
basically, regress y on the x's separately for the pre-treatment treated, pre-treatment control, post-treatment treated, and post-treatment control samples.
If variables drop out in any of these, you can't add them to the main csdid.
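
A minimal sketch of the by-hand check described above, on toy data built so that the treated cohort contains a single state. All variable names (state, period, hpi, first_treat) are hypothetical. Within one state and one period a state-level covariate is constant, so regress should report it as omitted in the treated cells.

```stata
clear
set seed 2025
* Toy data: 4 states x 4 periods x 25 households per cell
set obs 400
generate state  = ceil(_n / 100)
generate period = mod(ceil(_n / 25) - 1, 4) + 1

* State 1 is first treated in period 3; states 2-4 are never treated (gvar = 0)
generate first_treat = cond(state == 1, 3, 0)

* State-level covariate: identical for every household in a state-period
generate hpi = 100 + 5 * state + 2 * period
generate y   = 1 + 0.02 * hpi + rnormal()

* One 2x2 comparison: cohort 3, base period 2 versus post period 3
generate byte treated_g = (first_treat == 3)
generate byte control_g = (first_treat == 0)

* One outcome regression per cell
regress y hpi if treated_g & period == 2   // single state: hpi is constant
regress y hpi if treated_g & period == 3   // single state: hpi is constant
regress y hpi if control_g & period == 2   // three states: hpi varies
regress y hpi if control_g & period == 3   // three states: hpi varies
```

On real data, the same four regressions would be repeated for each cohort and period pair that csdid compares. Any covariate reported as omitted in one of the cells is a likely source of the DRDID error.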
Khushi Surana

Join Date: Jan 2025

Posts: 3
#6

14 Apr 2025, 00:45

Hi Fernando,
Thank you so much for your insight. I won't be able to include the variables because there is no within-group variation: every individual in a state has the same value of the state-level covariate. However, a TWFE regression allows for that. So, how can I justify dropping the macro covariates from my main specification? I haven't found any leads.
FernandoRios

Join Date: Apr 2014

Posts: 2471
#7

14 Apr 2025, 05:38

Well, TWFE allowing for that is just an illusion or a misspecification.
But it is hard to say, as this is a case-by-case issue.
Awxij Smith

Join Date: Apr 2025

Posts: 1
#8

16 Apr 2025, 22:13

I have a similar question and could not figure out what to do. I'm studying a school-level treatment on students’ test scores. The treatment is at the school-year level and is staggered across schools over a 10-year period. I have individual student longitudinal test score data.
I collapsed the data to the school-year-grade level and ran csdid as a panel. I want to include school, grade, and year fixed effects.

egen id = group(school grade) // create a unique id at the school-grade level
* n_students (placeholder name): number of students in the school-year-grade cell
csdid depvar [weight=n_students], ///
ivar(id) time(year) gvar(first_treat) cluster(school) never long2

I'm not sure if I'm handling the data correctly.
(1) When I subset the data by grade (grades 3–8), I keep getting grade-specific coefficients that are larger than the overall treatment effect estimated using all grades stacked together.
(2) I also want to conduct subgroup analyses by race and poverty level. Should I collapse the data to the school-year-grade-poverty-race level? Then, should I create a unique ID at the school-grade-poverty-race level and run the data as a panel?
Any help will be appreciated! Thank you!
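
For concreteness, a minimal sketch of the collapse step described in this post, on toy student-level data. All variable names (school, grade, year, score, first_treat, n_students) are hypothetical. It keeps the cell size so it can be used as a weight and checks that the school-grade id is unique within year, which a panel setup requires.

```stata
clear
set seed 42
* Toy student-level data: 2 schools x 2 grades x 3 years x 10 students per cell
set obs 120
generate school = ceil(_n / 60)
generate grade  = 3 + mod(ceil(_n / 30) - 1, 2)
generate year   = 2015 + mod(ceil(_n / 10) - 1, 3)

* School 1 is first treated in 2016; school 2 is never treated (gvar = 0)
generate first_treat = cond(school == 1, 2016, 0)
generate score = 50 + 2 * grade + rnormal(0, 5)

* One row per school-grade-year cell, keeping the number of students in the cell
collapse (mean) score (count) n_students = score (first) first_treat, ///
    by(school grade year)

* Panel id at the school-grade level; isid exits with an error if id-year repeats
egen id = group(school grade)
isid id year
list, sepby(id)
```

The same pattern would extend to finer cells (for example, adding race and poverty to by() and to group()), at the cost of smaller and possibly empty cells.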
