Staggered Difference in Difference*

Akanksha Aggarwal

Join Date: Nov 2020

Posts: 81
#1

23 Nov 2020, 06:54

Hi

My data has the following setup. I have individual-level data for the years 1995, 2000, 2005, 2010 and 2015. The treatment happened at the state level in different years between 2002 and 2014 (for example, state 1 in 2003, state 2 in 2011, etc.). Some states were never treated. I want to run a staggered DID and compare differences in outcomes between treated and untreated states.

Can I run the following regression for the same?

Y_ist= α + βT_st+ ϒ_s+ θ_t+ ε_ist,
i – individual, s – state, t – year
T_st – Whether state s had the treatment by year t

1) Also, will it matter if I run the regression at individual or state level (since all the variation in the RHS is coming at the state level)?
2) Since the treatment is given at the state level, the standard errors should be clustered at the state level, right?

Also, in this setup, how do I check for the parallel trends assumption?

Thanks.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

23 Nov 2020, 10:53

Can I run the following regression for the same?

Y_ist= α + βT_st+ ϒ_s+ θ_t+ ε_ist,
i – individual, s – state, t – year
T_st – Whether state s had the treatment by year t

More or less, yes. My reservation is that you are using the same symbol, Y, for a predictor on the right hand side and the outcome on the left. What you are calling Y_s on the right hand side should just be an indicator for the state.

1) Also, will it matter if I run the regression at individual or state level (since all the variation in the RHS is coming at the state level)?

That's not true. The T_st and θ_t terms on the right hand side vary at the observation (individual) level. You must run this at the individual level.

2) Since the treatment is given at the state level, the standard errors should be clustered at the state level, right?

Correct.
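
As a rough illustration of the model and the clustering discussed here, a minimal Stata sketch might look like the following. All variable names are assumptions made for illustration and do not come from the original data: course_fee is the outcome, state and year are numeric identifiers, and treat_year holds the year in which a state was first treated (missing for never-treated states).

```stata
* Hypothetical variable names; adapt to the actual data set.
* T_st: 1 once state s has been treated by year t, 0 otherwise.
generate byte treated = !missing(treat_year) & year >= treat_year

* State and year fixed effects, standard errors clustered by state.
regress course_fee i.treated i.state i.year, vce(cluster state)
```

The coefficient on 1.treated plays the role of β in the equation above.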

Also, in this setup, how do I check for the parallel trends assumption?

Group the observations into those states that were never treated versus those that were. Discard all data from treated individuals at or after the start of treatment and then -collapse- that into mean outcomes for the treated and untreated groups in each year. Now plot the mean outcomes over year in those groups. The graphs should be roughly parallel.
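
The steps just described could be sketched in Stata roughly as follows. The variable names are assumptions for illustration only: course_fee is the outcome and treat_year is the year a state was first treated (missing if never treated).

```stata
* Hypothetical variable names; adapt to the actual data set.
preserve
generate byte ever_treated = !missing(treat_year)

* Discard observations from treated states at or after treatment starts.
drop if ever_treated == 1 & year >= treat_year

* Mean outcome by group and year.
collapse (mean) course_fee, by(ever_treated year)

twoway (connected course_fee year if ever_treated == 0) ///
       (connected course_fee year if ever_treated == 1), ///
       legend(order(1 "Never treated" 2 "Eventually treated")) ///
       xtitle("Year") ytitle("Mean outcome")
restore
```

With staggered adoption, the set of states contributing to the eventually treated line shrinks as states begin treatment, which is worth keeping in mind when reading the later years of the plot.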
Akanksha Aggarwal

Join Date: Nov 2020

Posts: 81
#3

23 Nov 2020, 11:30

Thanks for the quick response.

My reservation is that you are using the same symbol, Y, for a predictor on the right hand side and the outcome on the left. What you are calling Y_s on the right hand side should just be an indicator for the state.

Sorry, the symbol on the right hand side is gamma, whereas the one on the left hand side is the letter Y. They look very similar. I should have been more careful.

The T_st and θ_t terms on the right hand side vary at the observation (individual) level. You must run this at the individual level.

I don't understand the above. I get that T_st and θ_t are varying at the individual level, but they take the same value for individuals belonging to the same state and year, and I don't have any other individual-level variable on the RHS. So how would running the regression at the state level change things?
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

23 Nov 2020, 11:54

I don't understand the above. I get that T_st and θ_t are varying at the individual level, but they take the same value for individuals belonging to the same state and year, and I don't have any other individual-level variable on the RHS. So how would running the regression at the state level change things?

I think we are thinking differently about the meaning of "run the regression at the X level." Please clarify, with Stata code, what you are thinking of as the two alternatives.
Akanksha Aggarwal

Join Date: Nov 2020

Posts: 81
#5

23 Nov 2020, 12:11

First alternative (at the individual level)

Y_ist= α + βT_st+ a_s+ θ_t+ ε_ist,
i – individual, s – state, t – year
T_st – Whether state s had the treatment by year t

Second alternative (at the state level)

Y_st= α + βT_st+ a_s+ θ_t+ ε_st,

s – state, t – year
Y_st – the average of the dependent variable over all individuals in state s at time t
T_st – Whether state s had the treatment by year t

My question would be whether β would come out to be the same in both empirical specifications.

Sorry, I am still at the modelling stage of the analysis and have yet to work on the Stata code.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#6

23 Nov 2020, 14:07

Yes, β would come out the same, but the standard errors, test statistics, and confidence intervals would differ. The standard errors in the state level model will be too low, and your confidence intervals too narrow: you will have falsely elevated statistical power.
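
One way to compare the two alternatives on a given data set is sketched below. The variable names are assumptions for illustration: course_fee is the outcome and treated is the T_st indicator. One detail worth noting: the point estimates coincide exactly only when the state-year means are weighted by the number of individuals in each cell. An unweighted regression on the means will generally give a different β when cell sizes differ.

```stata
* Hypothetical variable names; adapt to the actual data set.
* (a) Individual-level regression.
regress course_fee i.treated i.state i.year, vce(cluster state)

* (b) State-year means, keeping the cell size as a weight.
preserve
collapse (mean) course_fee (max) treated (count) n_obs = course_fee, ///
    by(state year)
regress course_fee i.treated i.state i.year [aweight = n_obs], ///
    vce(cluster state)
restore
```

Placing the two outputs side by side shows how the standard errors and confidence intervals compare in practice.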
Akanksha Aggarwal

Join Date: Nov 2020

Posts: 81
#7

27 Nov 2020, 02:35

Hi. I ran the staggered DID but the results are statistically insignificant.

At this point, I am wondering whether I should have controlled for individual-level characteristics. My dependent variable is college fees. Basically, I would tweak the model as follows:

Y_ist= α + βT_st+ sX_i+ a_s+ θ_t+ ε_ist,

i – individual, s – state, t – year
T_st – Whether state s had the treatment by year t
X_i – vector of individual characteristics

I am unsure how to think about controlling for individual level characteristics in a staggered DID setup.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#8

27 Nov 2020, 10:40

I am unsure how to think about controlling for individual level characteristics in a staggered DID setup.

It is no different than thinking about adjusting for individual level characteristics in any other analysis of observational data. Variables that are associated with both your predictor(s) of interest and the outcome need to be included in your model to avoid omitted variable bias unless they lie on the causal pathway between the predictor(s) and the outcome, or unless they would introduce collider ("Berkson") bias. Sometimes it is also helpful to include additional variables that explain outcome variance but are independent of your predictors so as to give you better precision in your estimates. That said, your sample size will put a limit on how many variables you can include, and you may have to use some judgment in selecting them.

All of that said, remember that you should not be modifying your model in a search for statistically significant findings. That's not science. That's p-hacking. Some would even call it scientific misconduct. In order for your conclusions to be considered hypothesis confirming, the model should have been specified before looking at any results. Then you do the analysis and live with the results, whether they are to your liking or not. If prior knowledge in this area does not permit the adequate definition of a model in advance of any results, then you are doing exploratory data analysis, which is fine provided you do not then pass your results off as conclusions supported by data. And always be particularly cognizant that a finding that is critically dependent on the choice of covariates that were not seen as obviously necessary in the first place is likely to be a Type I error anyway.
Akanksha Aggarwal

Join Date: Nov 2020

Posts: 81
#9

27 Nov 2020, 21:21

Thanks so much. Learnt so much from your second paragraph.

Can you please check if my following understanding is right :

In the static staggered DID, we assume that the treatment effect is the same over time (in the sense that a state treated for two years and a state treated for five years are assigned the same treatment effect, i.e. β in the earlier expressions). If I want to specify another model in which the treatment effect varies with the length of exposure to treatment, I will have to use what is called a dynamic staggered DID, which can be specified as follows:

Y_ist = α + Σ_{t=1}^{T} β_t Years_of_Treatment_t + a_s + θ_t + ε_ist,

i – individual, s – state, t – year
Years_of_Treatment_t – Represents a dummy for t years of treatment
T is the maximum number of years any state has been treated

Can you please check if the above is right ?

Questions

1) How should we think about the parallel trends assumption in the case of dynamic DID? It remains the same as mentioned by you above in the case of static staggered DID, right?
2) Can we say that if none of the β's in the dynamic staggered DID is significant, then the β in the static DID will never be significant either?
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#10

28 Nov 2020, 10:29

Yes that model would allow for a separate effect depending on the number of years of treatment. Bear in mind that this model includes a larger number of predictors, so precise estimation will require a larger sample. Another way of thinking of this is that the number of units that received N years of treatment is going to be smaller, perhaps much smaller, than the number of units that ever received any treatment at all. So the estimates of these effects are based on less information and will be less precise than the overall estimate. (They may also be less biased--what we have here is a trade-off between bias and precision.)
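
A minimal Stata sketch of such a duration-specific model, under assumed variable names (course_fee as the outcome, treat_year as the first treatment year, missing if never treated), might look like this. Counting the first treatment year as year 1 is a choice made here for illustration, not something specified in the thread.

```stata
* Hypothetical variable names; adapt to the actual data set.
* Years of exposure, counting the first treatment year as year 1;
* 0 for never-treated states and for years before treatment.
generate int years_treated = ///
    cond(!missing(treat_year) & year >= treat_year, year - treat_year + 1, 0)

* Check how many observations support each exposure level.
tabulate years_treated

* One coefficient per exposure level, relative to the untreated base (0).
regress course_fee ib0.years_treated i.state i.year, vce(cluster state)
```

The tabulation makes the precision issue visible: exposure levels observed in only a few state-years will have correspondingly imprecise coefficients, and binning adjacent durations is one possible response.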

Question 1. Yes.

Question 2. No, we cannot say that. As noted in my first paragraph, the betas in the dynamic model are based on less information, so they will have wider standard errors. Consequently, even if those effects are just as large as, or even larger than the beta from the static model, they may not be statistically significant even though the latter is. It is a common but serious error to confuse statistical significance of an effect with "real effect" or "important effect." This is one of the serious problems that has led the American Statistical Association to recommend abandonment of the concept of statistical significance. See https://www.tandfonline.com/doi/full...5.2019.1583913 for the "executive summary" and https://www.tandfonline.com/toc/utas20/73/sup1 for all 43 supporting articles. Or https://www.nature.com/articles/d41586-019-00857-9 for the tl;dr.
Akanksha Aggarwal

Join Date: Nov 2020

Posts: 81
#11

28 Nov 2020, 11:13

Thanks again for this excellent answer and also for introducing me to this idea of statistical significance of an effect probably not meaning any real effect. Very interesting! Thanks for sharing the articles. Will read all.

They may also be less biased--what we have here is a trade-off between bias and precision

I don't fully follow the above though. Why might the estimates in the dynamic model be less biased than the estimates in the static model?
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#12

28 Nov 2020, 11:23

So, what I mean is this. Suppose that the real world data generating process is such that the effect does really depend on the duration of treatment. Let's assume, for the sake of discussion, that the real treatment effect increases with treatment duration. Then the static model, which assumes that treatment effect is independent of duration, is incorrect and is providing you with, at best, an estimate of some kind of average treatment effect across time. As such it will systematically overestimate the treatment effect for those entities who received only short durations of treatment, and systematically underestimate the effect for those who received long durations. These are systematic errors which cannot be fixed with a larger sample size: they are bias. By contrast, in the dynamic model, there is much less underestimation or overestimation. There may be some, because the actual treatment effect might vary continuously with duration of treatment and you are approximating that by binning time into 1 year intervals. So there will be a little bit of over and underestimation resulting from that, but it will be considerably less than in the static model. So there will be less bias in the estimates from the dynamic model.
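
A stylized numerical illustration of this point, under assumptions that are not part of the thread: suppose the true effect after k years of treatment is β_k = δk with δ > 0, and suppose, purely for illustration, that the static coefficient recovers the simple average of the β_k over the observed durations k = 1, ..., K. (In practice the averaging weights in a staggered design can be more complicated than this.)

```latex
\bar{\beta} = \frac{1}{K}\sum_{k=1}^{K} \delta k = \delta\,\frac{K+1}{2},
\qquad
\bar{\beta} - \beta_k = \delta\left(\frac{K+1}{2} - k\right).
```

With K = 5, the static estimate is 3δ: it overstates the one-year effect by 2δ and understates the five-year effect by 2δ. Neither error depends on the sample size, which is what makes it bias rather than imprecision.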
Akanksha Aggarwal

Join Date: Nov 2020

Posts: 81
#13

28 Nov 2020, 11:36

Got it. Thanks so much.
Akanksha Aggarwal

Join Date: Nov 2020

Posts: 81
#14

29 Nov 2020, 07:29

Hi. I am trying to check whether the trends are parallel. The lines of course are not parallel. But can this also be checked through a regression, to see if there is a statistically significant differential trend?
Or do we just look for exactly parallel lines?

Code:

reg course_fee treatment i.state i.year, cluster(state)

I ran the above for pre-treatment years, thinking the coefficient on treatment should be insignificant for the parallel trends to go through, but the variable year was dropped because of multicollinearity. Where am I going wrong?
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#15

29 Nov 2020, 11:12

No, you are not looking for exactly parallel lines; that almost never happens in the real world. Nor is a statistically significant difference in the slopes the criterion either; statistical significance is not a measure of the difference, it is contaminated by sample size issues, and the application of a significance cutoff is just arbitrary anyway. What you want is for the slopes to be close enough for practical purposes. Reading off your graph as best I can, it appears that over the timespan from 1995 to 2007, the difference in the outcome variable between the groups grew from about 800 to about 1200. So the difference in slopes is about 400 units in 12 years, or 100 units every three years. In the real world context of your research, is that meaningful or is that negligible? That's a judgment call that you have to make on the basis of your understanding of the underlying science and the consequences of differences of this magnitude for things that matter in the real world. It is not a statistical issue and there is no statistical test that answers the question. If you are not confident of your judgment in the area, a literature review and a discussion with others in your field might be helpful.
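
If it helps to put a number and a confidence interval on the difference in slopes rather than reading it off the graph, one possible sketch is below. It is not a test of parallel trends, only a way to estimate the magnitude that then has to be judged substantively. The variable names are assumptions for illustration: course_fee is the outcome and treat_year is the first treatment year (missing if never treated).

```stata
* Hypothetical variable names; adapt to the actual data set.
generate byte ever_treated = !missing(treat_year)

* Pre-treatment data only: never-treated states in all years,
* eventually treated states before their treatment begins.
regress course_fee i.ever_treated##c.year ///
    if ever_treated == 0 | year < treat_year, vce(cluster state)

* The coefficient on 1.ever_treated#c.year is the estimated
* difference in slopes per year; scale it to a three-year gap.
display "Slope difference over 3 years: " 3 * _b[1.ever_treated#c.year]
```

The confidence interval reported for the interaction term indicates how precisely that per-year difference is estimated.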

I ran the above for pre-treatment years, thinking the coefficient on treatment should be insignificant for the parallel trends to go through, but the variable year was dropped because of multicollinearity. Where am I going wrong?

I don't know what you mean when you say that the variable year was dropped. You do not have a variable year in the model. You have a group of variables, i.year, in the model. You can expect one year to be omitted as the reference category, as would be the case in any regression involving indicator variables. You may also find that one other year is dropped due to collinearity with a pre-post variable. You don't show a variable named pre-post in your regression command, but perhaps that is the variable you are calling treatment. In any case, without knowing how the variables in that model were created, I can't say more. I can't even assess whether that regression is an appropriate way to look at the parallel trends assumption or not.

Finally, since you have used the "s-word" twice in the post, I am provoked to point out that focusing on statistical significance is likely to prove confusing or even misleading. The concept of statistical significance is complicated, poorly taught, and widely misunderstood. The American Statistical Association has recommended that the concept of statistical significance be abandoned. See https://www.tandfonline.com/doi/full...5.2019.1583913 for the "executive summary" and https://www.tandfonline.com/toc/utas20/73/sup1 for all 43 supporting articles. Or https://www.nature.com/articles/d41586-019-00857-9 for the tl;dr. In interpreting your results, focus on the coefficients themselves, along with the confidence intervals to get a sense of how precise those coefficients are. Model predictions calculated using the -predict- or -margins- commands are also useful for interpreting these models. Statistical significance verdicts are not.