  • Difference-in-Differences with Multiple Treatment Effects At Different Time Periods

    I am trying to run a diff-in-diff in which the exogenous shocks are tsunamis. One of the response variables is production in a given country in a given year.

    I could focus on one tsunami at a time, but then I would have only one observation in the treatment group. Hence, I chose to use data for many tsunamis in different years in different countries. The control group also differs for each tsunami and is chosen on the basis of pre-treatment covariate balance. Several countries make up the control group for each treatment.

    Now, I am not interested in the heterogeneous nature of different tsunamis. Rather, I am interested in the average impact of tsunamis, as if they were all the same treatment, accepting that this gives a downward-biased estimate of the impact of a large one. Given a tsunami in 1999 in country X, I may choose country Y as a control. But given another tsunami in 1980 in country Y, country Y is then the treatment group, and X may well be in the control group. I never allow overlapping shocks in the following sense: if X is treated in 1990, then it cannot be used as a control for a treatment occurring in 1990-93 (a safe time window).
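    A minimal Python sketch of the overlap rule, assuming it is symmetric (no control may have its own tsunami within `window` years on either side of the treated event's year). The `Tsunami` class, the `eligible_controls` function, and the countries W, X, Y, Z are hypothetical; the covariate-balance matching itself is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tsunami:
    country: str
    year: int


def eligible_controls(event, events, countries, window=3):
    """Return the countries allowed as controls for `event`.

    Assumed rule: exclude any country with a tsunami within `window` years
    of event.year. The treated country is excluded automatically, since its
    own shock is 0 years away.
    """
    shocked = {e.country for e in events if abs(e.year - event.year) <= window}
    return sorted(c for c in countries if c not in shocked)


# Hypothetical events and country pool
events = [Tsunami("X", 1990), Tsunami("Y", 1980), Tsunami("Z", 1992)]
countries = ["W", "X", "Y", "Z"]
print(eligible_controls(events[0], events, countries))  # ['W', 'Y']
print(eligible_controls(events[1], events, countries))  # ['W', 'X', 'Z']
```

Z is excluded as a control for X's 1990 tsunami because its own 1992 shock falls inside the window, while X remains eligible as a control for Y's 1980 tsunami.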

    So it seems to me that, for each treatment, I would have to include in the dependent variable the production of the treated and control countries in the periods around the treatment date. The independent variables would be year fixed effects, country fixed effects, a dummy that equals 1 if the country is the treated one (which would be 1 for country X in 1990 but 0 for country X in 1980), a dummy that equals 1 in the post-treatment period, and the interaction of these two dummies.
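    A minimal Python sketch of how one event's block of such a stacked dataset could be built with the three dummies described above. The function name `stack_event`, the two-year window on each side, and counting the shock year as post-treatment are assumptions for illustration; the fixed-effects estimation itself is not shown.

```python
def stack_event(event_id, treated_country, shock_year, controls, panel,
                n_pre=2, n_post=2):
    """Return one event's rows for a stacked diff-in-diff dataset.

    `panel` maps (country, year) -> production. The event window runs from
    shock_year - n_pre to shock_year + n_post, and the shock year itself is
    counted as post-treatment (an assumption).
    """
    rows = []
    for country in [treated_country, *controls]:
        for year in range(shock_year - n_pre, shock_year + n_post + 1):
            if (country, year) not in panel:
                continue  # skip country-years with no production data
            treated = int(country == treated_country)
            post = int(year >= shock_year)
            rows.append({
                "event": event_id, "country": country, "year": year,
                "y": panel[(country, year)],
                "treated": treated, "post": post,
                "treated_post": treated * post,
            })
    return rows


# Hypothetical panel: constant production for two countries, 1975-1995
panel = {(c, yr): 100.0 for c in ("X", "Y") for yr in range(1975, 1996)}
stacked = (stack_event("X-1990", "X", 1990, ["Y"], panel)
           + stack_event("Y-1980", "Y", 1980, ["X"], panel))
print(len(stacked))  # 20
```

Country X gets `treated` = 1 in the block for its own 1990 tsunami and 0 in the 1980 block where it serves as a control, and the `event` column identifies which block each row belongs to.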

    This has the problem that some tsunamis have a much larger magnitude than others, so treatment is quite heterogeneous. I could circumvent this criticism with subsample analysis by high/low intensity. Another alternative would be to include, for each treatment, its own treatment dummy, its own post-treatment dummy, and the interaction of the two. But this would reduce the degrees of freedom (though the problem might not be too bad if I have many countries in the control groups or more than one period before and after each shock date) and yield an unparsimonious model.
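    One hypothetical way to form the high/low-intensity subsamples is a median split on some magnitude measure. The event ids and magnitude values below are made up, and putting ties at the median into the low group is an arbitrary assumption.

```python
from statistics import median


def split_by_intensity(magnitude):
    """Split event ids into (high, low) at the median magnitude.

    Ties at the median are put in the low group (an arbitrary choice).
    """
    cut = median(magnitude.values())
    high = sorted(e for e, m in magnitude.items() if m > cut)
    low = sorted(e for e, m in magnitude.items() if m <= cut)
    return high, low


# Hypothetical events and magnitudes, for illustration only
magnitude = {"A-1980": 7.1, "B-1990": 8.9, "C-1995": 6.5, "D-2004": 9.1}
high, low = split_by_intensity(magnitude)
print(high)  # ['B-1990', 'D-2004']
print(low)   # ['A-1980', 'C-1995']
```

Each subsample would then be run through the same pooled specification, which keeps one set of treatment parameters per subsample rather than one per tsunami.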

    My question is whether my approach is problematic for some reason I am missing, and whether there is any recommended literature dealing with this.

    Thanks in advance.



  • #2
    One thing that you have not talked about is how you will deal with the fact that you are creating, in effect, matched pairs (actually, matched tuples, since you allow more than one control per case). So your analysis needs to accommodate the non-independence of observations this creates. What you really have here is a multi-level model. The highest level is the matched tuple (one for each tsunami, so you could index them by the tsunamis themselves). Below that are the treated and control countries within each tsunami-tuple. Since a given country may participate in multiple tsunamis (as case or control), you have a multiple-membership structure. Then, within countries, you have repeated observations over time. So my first thought would be to use a multi-level model here. To maximize the chances that the error terms are in fact independent of the predictors, I would include covariates very generously.
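    A Python sketch of the country-to-tuple membership structure described above, assuming equal weights per country (a common default in multiple-membership models). It only builds the structure and weights; it does not fit a multi-level model, and the tuple contents are hypothetical.

```python
from collections import defaultdict


def membership_weights(tuples):
    """Map each country to equal weights over the tuples it belongs to.

    `tuples` maps a tsunami_tuple id to its countries (treated and controls).
    Each country's weights sum to 1 (equal weighting is an assumption).
    """
    memberships = defaultdict(list)
    for tuple_id, countries in tuples.items():
        for country in countries:
            memberships[country].append(tuple_id)
    return {
        country: {t: 1 / len(ids) for t in ids}
        for country, ids in memberships.items()
    }


# Hypothetical tuples: treated country first, then its matched controls
tuples = {"X-1990": ["X", "Y", "W"], "Y-1980": ["Y", "X"], "Z-2004": ["Z", "W"]}
weights = membership_weights(tuples)
print(weights["X"])  # {'X-1990': 0.5, 'Y-1980': 0.5}
print(weights["Z"])  # {'Z-2004': 1.0}
```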

    Now, since you are looking at economic outcomes, you may be inclined to avoid multi-level models and prefer to shoehorn the data into a panel model. That gets particularly difficult in your case. This Procrustean bed more commonly arises with nested data, such as repeated observations within firms within industries, or replications within countries within regions. There, the least-bad approach is to use the middle level (firm or country) as the panel variable and then use a cluster-robust variance estimator clustered at the highest level (industry or region). In your case the panel variable would be country and the highest level would be tsunami_tuple, but you won't have that option here because of the multiple-membership structure. Stata will notice that the same country can belong to different tsunami_tuples, so it won't let you do this. That's not just Stata being finicky: the cluster-robust VCE doesn't make sense in this context. So if I were to model this as panel data with country as the panel variable, I would include i.tsunami_tuple as an additional fixed effect: with nested data that wouldn't be an option, but with multiple membership it is! That may not completely correct the non-independence of observations within countries, but it will reduce it, because some of that non-independence is eliminated by conditioning on the tsunami_tuple variable.
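    The nesting problem can be checked mechanically: clustering at a higher level with country as the panel variable requires each country to fall in exactly one cluster. A small Python sketch (the `is_nested` helper and the rows are hypothetical) shows why stacked tsunami tuples fail that check while firms within industries pass it.

```python
def is_nested(rows, inner="country", outer="tsunami_tuple"):
    """Return True if every `inner` id appears under exactly one `outer` id."""
    first_outer = {}
    for row in rows:
        # setdefault records the first outer id seen for this inner id
        if first_outer.setdefault(row[inner], row[outer]) != row[outer]:
            return False
    return True


stacked = [
    {"country": "X", "tsunami_tuple": "X-1990"},
    {"country": "Y", "tsunami_tuple": "X-1990"},  # Y as a control
    {"country": "Y", "tsunami_tuple": "Y-1980"},  # Y as the treated country
    {"country": "X", "tsunami_tuple": "Y-1980"},  # X as a control
]
firms = [
    {"firm": "f1", "industry": "steel"},
    {"firm": "f1", "industry": "steel"},
    {"firm": "f2", "industry": "retail"},
]
print(is_nested(stacked))                                # False
print(is_nested(firms, inner="firm", outer="industry"))  # True
```

The same failure is what makes tuple fixed effects identifiable here: because countries cross tuples, tuple dummies are not collinear with country dummies.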



    • #3
      Thank you, Clyde.

      What if, alternatively, I include all countries and relevant periods in a panel and regress y on country fixed effects, time fixed effects, a dummy that equals 1 if a country is treated during certain periods, and controls? I would use Huber-White standard errors clustered by country.

      For instance, see equation (8) of the paper at https://economics.mit.edu/files/11572 (page 16). In their setting, a state is treated (the common-law exception) for t > T, where T depends on the state; for t < T, that state effectively behaves as a control. In the setting I am proposing above, the idea is the same, except that a country is treated for a < t < b instead of for t > T, and behaves as a control outside that window.
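      A minimal Python sketch of the two-way fixed-effects coefficient on a window dummy D_it = 1 for a_i < t < b_i. It assumes a balanced panel, where two-way demeaning (subtracting unit and time means and adding back the grand mean) is equivalent to including both sets of dummies. The data, windows, and effect size are synthetic and noiseless, and the country-clustered standard errors are not computed.

```python
def twfe_beta(y, d):
    """Two-way fixed-effects OLS coefficient on `d` for a balanced panel.

    `y` and `d` map (unit, time) -> value. In a balanced panel, subtracting
    unit means and time means and adding back the grand mean removes unit
    and time fixed effects exactly (Frisch-Waugh-Lovell).
    """
    units = sorted({u for u, _ in y})
    times = sorted({t for _, t in y})

    def demean(v):
        grand = sum(v.values()) / len(v)
        unit_mean = {u: sum(v[(u, t)] for t in times) / len(times) for u in units}
        time_mean = {t: sum(v[(u, t)] for u in units) / len(units) for t in times}
        return {(u, t): v[(u, t)] - unit_mean[u] - time_mean[t] + grand
                for u in units for t in times}

    y_dd, d_dd = demean(y), demean(d)
    return sum(y_dd[k] * d_dd[k] for k in d_dd) / sum(x * x for x in d_dd.values())


# Hypothetical treatment windows (a, b): treated only for a < t < b
windows = {"A": (2, 6), "B": (5, 9)}
units, times = ["A", "B", "C", "D"], range(1, 11)
alpha = {"A": 10.0, "B": 20.0, "C": 5.0, "D": 0.0}  # unit effects
d = {(u, t): int(u in windows and windows[u][0] < t < windows[u][1])
     for u in units for t in times}
# Noiseless outcome: unit effect + linear time effect + 2.0 * treatment
y = {(u, t): alpha[u] + 0.5 * t + 2.0 * d[(u, t)] for u in units for t in times}
print(round(twfe_beta(y, d), 6))  # 2.0
```

Because the synthetic outcome has no noise and a single constant effect, the estimate simply recovers the 2.0 built into the data; units C and D are never treated and act purely as controls.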

      Would this be a better approach in your opinion?



      • #4
        The situation in that paper is not analogous to yours. They are using the complete data in the County Business Patterns data set. By contrast, you are selecting controls for each treatment and matching those controls to the treatments on covariates. So you are creating a new non-independence of observations that has no analog in the paper you linked. In their data, all of the non-independence of observations is attributable to firm and time effects. In your case, you have introduced an additional source, matching, that must somehow be accounted for.

        If, as you suggest in your second paragraph, you dispense with the matching and use all countries and relevant periods, then that analytic approach will be appropriate to that data. The drawback to that approach is that it will probably be quite inefficient compared to the matched data design, as you will sacrifice the variance reduction your matching scheme provides.

        There is no ideal solution to this dilemma. The desired properties of the analysis are (in no particular order):

        1. consistent estimation
        2. efficient estimation
        3. proper accounting for dependencies among observations

        There is no analysis that provides all three of these for this problem. You can have any two and must choose to sacrifice one. Statistics, like life, involves choices with trade-offs.

        As for which approach is better, that depends on your research goals, your audience, and probably on aspects of the underlying science that I wouldn't know about (e.g., just how much variance reduction you actually get from matching on those covariates).
