Staggered Difference-in-Difference and Parallel Trends

Shanya Jain

Join Date: Jul 2023

Posts: 6
#1

06 Jul 2023, 09:30

Hi,

My data are set up as follows. I have birth-level data from two survey rounds, 2015-16 and 2019-2021. The treatment was rolled out at the district level in three phases. For example, of 600 districts in total, 300 are treated in Phase 1 (Dec 2017 to Apr 2018), 200 in Phase 2 (May 2018 to Dec 2018), and 100 in Phase 3 (Jan 2019 to Dec 2019). By Jan 2020, all districts had been treated. I want to run a staggered DID and compare children's outcomes in the pre- and post-treatment periods.

Can I run the following regression for this purpose?

Y_idt = α + β Treat_dt + Z_s + X_idt + θ_t + ε_idt
i – child, d – district, t – birth year
Treat_dt – switches from 0 to 1 once district d has been treated by time t

Can I use csdid in this case?

Also, in this setup, how do I check the parallel trends assumption? I am confused about the Stata code for this case.

Please help as I am new to this staggered DID literature.

Thanks in advance.
Tags: difference in difference, stata code
George Ford

Join Date: Aug 2014

Posts: 3177
#2

06 Jul 2023, 09:53

It looks like most of the treatment occurred in the window between the two survey rounds (2016 to 2019), so staggered DID is probably not useful. Your data are basically split into pre- and post-treatment periods, with the exception of the last group, which you could exclude. You could run a generalized DID that includes time since treatment (treat dummy, timesincetreat*treat), which would be easier than staggered DID. You might still include the group fully treated by 2020, since 2/3 of its years in the second round are treated.
Shanya Jain

Join Date: Jul 2023

Posts: 6
#3

11 Jul 2023, 08:23

Thank you so much for your response, and sorry for the late reply. I want to confirm if we can check for parallel trends in this case. If yes, then how can we do that?
If not, then how can I control for any pre-trends?
George Ford

Join Date: Aug 2014

Posts: 3177
#4

11 Jul 2023, 08:47

The data looks like this, I think. Three groups G1, G2, G3, and 5 periods.

Period:  1          |  2         3         4         |  5
         Observe Y  |  treat_G1  treat_G2  treat_G3  |  Observe Y
                    |       DON'T OBSERVE Y          |

All your obs are treated by the end. There is no untreated group. You have diff-in-diff across treatment timing only. No obvious approach to PP analysis.

Interesting problem.

Shanya Jain

Join Date: Jul 2023

Posts: 6
#5

11 Jul 2023, 11:47

Sorry if I wasn't clear earlier; I should have given the details properly.
My data look like this:

Unit ID (i)	District category (d)	Year(t)	Treat
001	1	2015	0
002	1	2016	0
003	1	2017	1
004	2	2015	0
005	2	2018	1
006	2	2016	0
007	3	2017	0
008	3	2019	1
009	3	2018	0
010	3	2020	1
011	1	2019	1
012	1	2018	1
013	2	2021	1
014	2	2019	1
015	3	2016	0
016	3	2015	0

Here Unit ID is the child's ID. District category is the phase (as described earlier) in which the child's district was treated, Year is the year of birth, and Treat is a dummy variable that takes the value 1 if district d had been treated by time t, and 0 otherwise.

I understand your point that, with no untreated group, I only have diff-in-diff across treatment timing, and that I can use a generalised DID here. But would that analysis answer my question? I want to check whether children's outcomes improve after the treatment relative to before it, in early-treated districts versus late-treated districts.
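To make the variable definitions concrete, here is a minimal sketch that takes the rows of the table above, recovers each district category's first treated birth year, and rebuilds Treat together with a time-since-treatment variable of the kind suggested earlier in the thread. It is written in Python only to spell out the logic; the names `first_treated` and `derive` are made up for illustration, and annual birth years are an assumption of the table's coding (the phases themselves started mid-year).

```python
# Rows copied from the table above:
# (unit_id, district_category, birth_year, treat)
rows = [
    ("001", 1, 2015, 0), ("002", 1, 2016, 0), ("003", 1, 2017, 1),
    ("004", 2, 2015, 0), ("005", 2, 2018, 1), ("006", 2, 2016, 0),
    ("007", 3, 2017, 0), ("008", 3, 2019, 1), ("009", 3, 2018, 0),
    ("010", 3, 2020, 1), ("011", 1, 2019, 1), ("012", 1, 2018, 1),
    ("013", 2, 2021, 1), ("014", 2, 2019, 1), ("015", 3, 2016, 0),
    ("016", 3, 2015, 0),
]

# First treated birth year per category = earliest year observed with treat == 1.
first_treated = {}
for _, cat, year, treat in rows:
    if treat == 1:
        first_treated[cat] = min(year, first_treated.get(cat, year))


def derive(cat, year):
    """Return (treat, time_since_treat) for a child born in `year` in category `cat`."""
    g = first_treated[cat]
    treat = int(year >= g)
    # Years elapsed since the category was first treated; 0 before treatment.
    time_since = max(year - g, 0)
    return treat, time_since


for uid, cat, year, treat in rows:
    d_treat, since = derive(cat, year)
    assert d_treat == treat  # the rule reproduces the Treat column of the table
    print(uid, cat, year, d_treat, since)
```

The first treated year per category is also the cohort variable that staggered DID estimators typically ask for, so it is worth storing alongside Treat.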


George Ford

Join Date: Aug 2014

Posts: 3177
#6

11 Jul 2023, 12:47

It's not a panel in UnitID. You basically have a panel of 3 districts and 8 years. Everyone in a district gets the treatment in a specific year. Is there any other way to group the data (county, state, etc.)?
Shanya Jain

Join Date: Jul 2023

Posts: 6
#7

12 Jul 2023, 04:49

Yes, it's not a panel in UnitID. I pooled the data by appending the two DHS survey rounds. I have 640 districts and 7 years in my dataset, and around 700,000 observations in total. The 3 district categories I mentioned correspond to the 3 phases in which districts were treated.
George Ford

Join Date: Aug 2014

Posts: 3177
#8

12 Jul 2023, 13:11

Try collapsing the data to the 640 districts. Then you have a panel.
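A minimal sketch of what collapsing means here, assuming the cells are district by birth year: each cell keeps the mean outcome and the number of births behind it. The records, district IDs, and the `collapse` helper are hypothetical and only illustrate the mechanics.

```python
from collections import defaultdict

# Hypothetical child-level records: (district_id, birth_year, outcome).
# All values are made up for illustration.
children = [
    (101, 2015, 1.0),
    (101, 2015, 3.0),
    (101, 2018, 4.0),
    (202, 2015, 2.0),
    (202, 2018, 2.0),
    (202, 2018, 6.0),
]


def collapse(records):
    """Collapse child-level rows to one row per (district, year).

    Returns {(district, year): (mean outcome, number of children)}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for district, year, y in records:
        sums[(district, year)] += y
        counts[(district, year)] += 1
    return {cell: (sums[cell] / counts[cell], counts[cell]) for cell in sorted(sums)}


panel = collapse(children)
for (district, year), (mean_y, n) in panel.items():
    print(district, year, mean_y, n)
```

Keeping the cell size matters: when the regressors vary only at the district-year level, weighting the cell means by the number of births gives the same coefficients as OLS on the individual records, so the collapsed panel need not lose the individual-level interpretation.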
Shanya Jain

Join Date: Jul 2023

Posts: 6
#9

15 Jul 2023, 07:52

Thank you for your quick responses.
If I collapse, my analysis would be at the district level, whereas I want to do an individual-level analysis. My outcome variable is at the individual level. I want to know how to use csdid (Callaway and Sant'Anna's estimator) with repeated cross-sections.
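For intuition only, here is a rough sketch of the building block behind a Callaway and Sant'Anna style estimate on repeated cross-sections: a 2x2 DID of cell means for cohort g in year t against cohorts not yet treated. This is not csdid itself, which also handles covariates and inference. The data, the `att` helper, and the choice of base years are illustrative assumptions. As far as I know, csdid treats the data as repeated cross-sections when ivar() is omitted and has a notyet option for this comparison group, but please check help csdid.

```python
from statistics import mean

# Hypothetical repeated cross-section: (cohort = first treated year, birth_year, outcome).
# Outcomes are made up so that cohorts share a common trend and treatment adds +2.
sample = [
    (2017, 2016, 1.0), (2017, 2016, 3.0), (2017, 2017, 4.0), (2017, 2017, 6.0),
    (2017, 2018, 5.0), (2017, 2018, 7.0),
    (2018, 2016, 1.0), (2018, 2016, 3.0), (2018, 2017, 2.0), (2018, 2017, 4.0),
    (2018, 2018, 5.0), (2018, 2018, 7.0),
    (2019, 2016, 0.0), (2019, 2016, 2.0), (2019, 2017, 1.0), (2019, 2017, 3.0),
    (2019, 2018, 2.0), (2019, 2018, 4.0),
]


def cell_mean(data, cohorts, year):
    """Mean outcome of children born in `year` in any of `cohorts`; None if the cell is empty."""
    vals = [y for g, t, y in data if g in cohorts and t == year]
    return mean(vals) if vals else None


def att(data, g, t):
    """2x2 DID for cohort g in year t against not-yet-treated cohorts.

    Base year is g - 1 for post-treatment years (t >= g) and t - 1 for
    pre-treatment years, where the estimate acts as a placebo (pre-trend) check.
    Returns None when a needed cell or comparison cohort is missing.
    """
    base = g - 1 if t >= g else t - 1
    cohorts = {c for c, _, _ in data}
    comparison = {c for c in cohorts if c > max(t, base) and c != g}
    if not comparison:
        return None
    cells = [cell_mean(data, {g}, t), cell_mean(data, {g}, base),
             cell_mean(data, comparison, t), cell_mean(data, comparison, base)]
    if any(c is None for c in cells):
        return None
    return (cells[0] - cells[1]) - (cells[2] - cells[3])


for g in (2017, 2018, 2019):
    for t in (2017, 2018):
        print(g, t, att(sample, g, t))
```

Estimates for years before a cohort's own treatment year are the pre-trend checks: they should be close to zero if trends are parallel. Because every district in this thread is treated by 2019, a not-yet-treated comparison exists only for birth years before 2019, so effects for the last cohort, and for any cohort from 2019 on, cannot be estimated this way.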