coding staggered difference-in-difference (where each treatment time period is different for each unit)

Ryan Kim

Join Date: Aug 2018

Posts: 26
#1

09 Aug 2018, 19:35

Hello,

I am trying to code a difference-in-differences model (DD hereafter) in which the treatment start period differs across units (in my case, hotels).
This type of DD is sometimes referred to as staggered DD, and I think it fits my case,
because an exogenous impact started in a different time period for each hotel.
Here is an example of what the data look like (I left a blank line between hotels so the rows are easier to tell apart):

hotel month time treated staggered_DD
A 1 1 1 1
A 2 1 1 1
A 3 1 1 1
A 4 1 1 1
A 5 1 1 1

B 1 0 1 0
B 2 0 1 0
B 3 1 1 1
B 4 1 1 1
B 5 1 1 1

C 1 0 1 0
C 2 0 1 0
C 3 0 1 0
C 4 1 1 1
C 5 1 1 1

D 1 0 0 0
D 2 0 0 0
D 3 0 0 0
D 4 0 0 0
D 5 0 0 0

where
time equals 1 in the month the impact started and in every month after it, and 0 before the impact started.
treated equals 1 when a hotel is in the treatment group and 0 when it is in the control group.
staggered_DD is the interaction term (time*treated), which serves as the DD variable.
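
To make the coding of the three variables concrete, here is a minimal sketch in plain Python (not Stata) that rebuilds the example table from each hotel's first treated month. The dictionary `first_treated_month` is a hypothetical helper, not a variable in my data; hotel A is coded as 1 only because it is already treated in the first month I observe, and its true start month is unknown.

```python
# Hypothetical first treated month per hotel, read off the example table.
# None marks the never-treated control hotel. Hotel A is coded 1 because it
# is already treated in the first observed month (true start is unknown).
first_treated_month = {"A": 1, "B": 3, "C": 4, "D": None}
months = range(1, 6)

rows = []
for hotel, start in first_treated_month.items():
    treated = 0 if start is None else 1
    for month in months:
        # time: 1 from the hotel's own start month onward, 0 before it
        time = 1 if start is not None and month >= start else 0
        rows.append({
            "hotel": hotel,
            "month": month,
            "time": time,
            "treated": treated,
            "staggered_DD": time * treated,  # interaction term
        })

for r in rows:
    print(r["hotel"], r["month"], r["time"], r["treated"], r["staggered_DD"])
```

The printed rows should match the table above, one line per hotel-month.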

For the staggered DD equation, I have the following in mind:

y_it=α_i+δ_t+∑r_k*D_ik+ϵ_it

where
y_it is the outcome variable for hotel i in month t.
α_i is the hotel fixed effect.
δ_t is the month fixed effect.
r_k is the coefficient on the variable D_ik.
D_ik is an indicator variable that equals 1 if hotel i was treated by the impact starting in period k, in all time periods t at or after k (t>=k), and 0 otherwise.
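
As a sketch of one possible reading of D_ik, the plain Python below builds one indicator per start month k for the example hotels. The function `D` and the dictionary `start_month` are hypothetical names for illustration, and the explicit `month` argument is my assumption, since the equation writes D_ik without a t subscript.

```python
# Assumed start month per hotel, taken from the example table.
# None marks the never-treated hotel; hotel A's true start is unknown and is
# coded 1 only because it is treated from the first observed month.
start_month = {"A": 1, "B": 3, "C": 4, "D": None}
months = range(1, 6)

# Distinct start months k that appear in the sample
cohorts = sorted({k for k in start_month.values() if k is not None})

def D(hotel, k, month):
    """Return 1 if the hotel's treatment started in month k and month >= k."""
    return int(start_month[hotel] == k and month >= k)

for hotel in start_month:
    for month in months:
        indicators = [D(hotel, k, month) for k in cohorts]
        print(hotel, month, dict(zip(cohorts, indicators)))
```

Under this reading, summing D over k for a given hotel and month reproduces the staggered_DD column in the table.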

(Thank you for reading the long description above)
My questions are:

1. When I run a staggered DD in this case, do I include only the staggered_DD variable in the regression?
I ask because I tried including time, treated, and time*treated, but Stata dropped the DD variable due to multicollinearity.
Therefore, I thought it would make sense to create a staggered_DD variable and run the DD regression with only this variable (since staggered_DD already captures the interaction of treatment and post-treatment).
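
A possible explanation for the dropped term, based only on the example table (the real data may differ): because time is always 0 for the control hotel, time*treated is identical to time, row for row, and treated never varies within a hotel. The plain Python sketch below checks both facts; the names `table`, `rows`, and the two flags are hypothetical.

```python
# Example table: hotel -> (time for months 1..5, treated)
table = {
    "A": ([1, 1, 1, 1, 1], 1),
    "B": ([0, 0, 1, 1, 1], 1),
    "C": ([0, 0, 0, 1, 1], 1),
    "D": ([0, 0, 0, 0, 0], 0),
}

# Flatten to (hotel, month, time, treated) rows
rows = [
    (hotel, month, time, treated)
    for hotel, (times, treated) in table.items()
    for month, time in enumerate(times, start=1)
]

time_col = [time for (_, _, time, _) in rows]
interaction = [time * treated for (_, _, time, treated) in rows]

# If the two columns are identical, one of them cannot be estimated
interaction_equals_time = interaction == time_col

# If treated is constant within every hotel, hotel fixed effects absorb it
treated_constant_within_hotel = all(
    len({tr for (h, _, _, tr) in rows if h == hotel}) == 1
    for hotel in table
)

print(interaction_equals_time, treated_constant_within_hotel)
```

If both flags are true in the real data as well, the regression cannot separate time from time*treated, which would be consistent with Stata dropping one of them.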

2. My other question (apart from the coding) is whether I should include or exclude the observations for hotel A when running the staggered DD.
Hotel A is an example of a unit that was already treated before I started collecting data.
In my case, there is no way to obtain historical data because I am web scraping it, and some units such as hotel A were already treated before I began collecting.
I am worried that if I include units like hotel A, my DD coefficient might be overestimated.
Should I simply exclude those cases when running the staggered DD?
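
This does not settle whether hotel A should be excluded, but here is a minimal plain Python sketch of how such units could be flagged, so that the regression can be run with and without them as a comparison. The dictionary `staggered_dd` and the three group names are hypothetical.

```python
# staggered_DD values for months 1..5, copied from the example table
staggered_dd = {
    "A": [1, 1, 1, 1, 1],
    "B": [0, 0, 1, 1, 1],
    "C": [0, 0, 0, 1, 1],
    "D": [0, 0, 0, 0, 0],
}

# Treated in every observed month: no pre-treatment period in the sample
always_treated = [h for h, dd in staggered_dd.items() if min(dd) == 1]

# Never treated in the sample: the control group
never_treated = [h for h, dd in staggered_dd.items() if max(dd) == 0]

# Observed both before and after treatment starts
switchers = [h for h, dd in staggered_dd.items() if min(dd) == 0 and max(dd) == 1]

# Sample with the always-treated hotels removed
restricted_sample = {h: dd for h, dd in staggered_dd.items() if h not in always_treated}

print(always_treated, never_treated, switchers, sorted(restricted_sample))
```

In the example, only hotel A lands in the always-treated group, so the restricted sample keeps hotels B, C, and D.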

I searched for similar cases before posting. I would like to confirm whether my approach in question 1 is correct, and I could not find any similar Q&A for question 2. I would greatly appreciate any help.
Thank you so much!

Last edited by Ryan Kim; 09 Aug 2018, 19:42.

Ryan Kim

Join Date: Aug 2018
Posts: 26

09 Aug 2018, 19:42

hotel	month	time	treated	staggered_DD
A	1	1	1	1
A	2	1	1	1
A	3	1	1	1
A	4	1	1	1
A	5	1	1	1
B	1	0	1	0
B	2	0	1	0
B	3	1	1	1
B	4	1	1	1
B	5	1	1	1
C	1	0	1	0
C	2	0	1	0
C	3	0	1	0
C	4	1	1	1
C	5	1	1	1
D	1	0	0	0
D	2	0	0	0
D	3	0	0	0
D	4	0	0	0
D	5	0	0	0

I remade the table so that it is easier to read.
