  • Multi-groups or aggregated binary groups in treatment variable for difference-in-difference models?

    Hi Statalisters

    I'm doing a difference-in-difference (DID) analysis and I wonder whether I'm specifying my model correctly. Specifically, my questions are:

    1. Should the treatment status variable be constructed as a binary variable when multiple units receive treatment?
    2. Should all data be aggregated to treatment status level?

    First, a brief description of the data: I have a region-quarter panel with 6 regions and 24 time units per region, for a total of 144 observations. Of the 6 regions, 4 received treatment and 2 are controls. I aggregated monthly data to quarterly data because my outcome is the suicide rate, and rates based on a numerator below 20 are potentially unstable (the "rule of twenty" in epidemiology).

    I've constructed a binary treatment status variable:
    Code:
    gen intervention = inlist(region, 2, 4, 5, 6)
    label def intervention 1 "Intervention" 0 "Control"
    label val intervention intervention
    I've not aggregated the data to the treatment variable level. Instead, I've used regional-level data. I've seen some argue that all data should be aggregated to the treatment variable level, such as Wing et al. (2018): "In most cases, it makes sense to aggregate the data so that outcomes are measured at the same level as the treatment variable ... ." However, I've also seen applications that use the same approach as me, i.e. assigning treatment value = (0,1) to multiple units and then running the analyses with several units.

    My main model is:
    Code:
    outcome i.treatment##i.post i.quarter control_variable1 control_variable2, fe
    After reading Angrist & Pischke (2014, part 5.2), I see that this DID model can - and perhaps should - be modelled with a multi-region approach instead of a binary treatment variable, although I'm uncertain about the practical implementation of this. For instance,
    Code:
    outcome i.region##i.post i.quarter control_variable1 control_variable2, fe
    does not seem like a valid approach nor a traditional DID-design to me.
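
    For reference, the first (binary-treatment) model above can be written out as an equation. This is only a sketch in notation chosen here, not taken from the thread: r indexes regions, t indexes time units, \alpha_r is the region fixed effect implied by the fe option, and x_{rt} holds the two control variables. It assumes `quarter` denotes the quarter of the year; if `quarter` instead indexes all 24 periods, Post_t is collinear with the quarter dummies and drops out.

    ```latex
    y_{rt} = \alpha_r
           + \lambda\,\mathrm{Post}_t
           + \delta\,(\mathrm{Treat}_r \times \mathrm{Post}_t)
           + \sum_{q} \gamma_q\,\mathbf{1}[\mathrm{quarter}_t = q]
           + x_{rt}'\beta
           + \varepsilon_{rt}
    ```

    The main effect of Treat_r does not appear because it is constant within a region and is therefore absorbed by \alpha_r. The coefficient \delta is the DID estimate.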

    Hopefully, some of you know how to approach this binary/multi-group problem.

    Best
    Tarjei

    References
    Angrist & Pischke (2014) Mastering 'Metrics.
    Wing et al. (2018) Designing Difference in Difference Studies: Best Practices for Public Health Policy Research. Annual Review of Public Health. 39: 453-69.

  • #2
    Well, both models are correct, but they answer different questions. So it boils down to: what is your research question?

    The i.treatment##i.post model will answer the question "what is the effect of treatment?"

    The i.region##i.post model will answer the question "what are the differences in outcomes among the various regions?" It will be difficult to summarize the answers to those questions into a single estimate of the treatment effect.

    To which question do you seek the answer? Proceed accordingly.
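
    To make the contrast concrete, the two specifications might be run roughly as follows. This is a sketch under assumptions: the thread does not name the estimation command, so xtreg is assumed from the fe option, and `time` is a hypothetical variable holding the period index (1 to 24).

    ```stata
    * Assumptions: xtreg is the estimator; time is a hypothetical period index
    xtset region time

    * Binary treatment: the coefficient on 1.treatment#1.post is the single DID estimate.
    * The main effect of treatment is time-invariant, so Stata omits it under fe.
    xtreg outcome i.treatment##i.post i.quarter control_variable1 control_variable2, fe

    * Region by post: one interaction per region, each relative to the base region.
    * The region main effects are likewise absorbed by the fixed effects.
    xtreg outcome i.region##i.post i.quarter control_variable1 control_variable2, fe
    ```

    The first command returns one interaction coefficient, while the second returns five (one for each non-base region), which is why it does not reduce to a single treatment effect.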



    • #3
      Thanks for your helpful comments, Clyde!

      My research question is "what is the effect of treatment?" I realize that the alternative model I proposed in my last post misrepresented what Angrist & Pischke (2014) actually suggest. In section 5.2, they discuss how to apply DID when several units receive the same treatment starting at different times, so that there's no common pre- or post-period. This is different from my design, where there is a clear common pre- and post-period. Thus the "traditional" DID approach I use seems to be correct when multiple units receive treatment.

      To be on the safe side regarding aggregation, I aggregated the regional variables to treatment status level and compared the DID results for the main model between a version with variables at the regional level (6 groups) and a version with variables at the treatment status level (2 groups). The coefficient for the DID interaction term is essentially the same: -.82 [95% CI -1.48 to -.18] in the former vs -.80 [95% CI -1.43 to -.16] in the latter.
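
      An aggregation check of this kind could look roughly like the sketch below. The variable names `deaths`, `population`, and `time` are hypothetical (the thread does not give them), and a rate per 100,000 is assumed. The control variables are left out because the thread does not say how they were aggregated.

      ```stata
      * Sketch only: deaths, population, and time are hypothetical variable names
      preserve

      * Sum counts within treatment group and period, then recompute the rate
      collapse (sum) deaths population, by(treatment time post)
      gen rate = 1e5 * deaths / population

      * Two groups x 24 periods = 48 observations; 1.treatment#1.post is the DID term
      regress rate i.treatment##i.post

      restore
      ```

      Recomputing the rate from summed counts weights each region by its population, whereas averaging the regional rates would weight regions equally. The two choices can give slightly different estimates.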
