
  • How do 'control units' that never receive treatment affect estimates when using time and individual fixed effects?

    I'm looking at the effect of fires on visits to national parks and national forests. My dataset consists of ~500 geographical units over 10 years, and I'm using an FE Poisson model with year fixed effects too. Different units in the dataset have fires in different years, so I have two options for controls: either the pre-fire periods from places that eventually have fires, or those plus places that never have fires. I'm leaning towards the former because it seems plausible that places that do and don't have fires are different in ways I don't understand and can't control for; a cursory look at the data confirms that this is indeed the case.

    The estimates are quite different when I do and don't include the places that never have fires as controls. I'm trying to understand why/how places that never have fires affect the results at all given that I have unit and time fixed effects. My understanding is that these fixed effects mean that the result is computed by comparing the change in visits because of fire within a given unit to the change in visits because of fire in all the other units. How then would the visit levels from places that don't have fires even matter?

    Thank you!


  • #2
    Well, if you are using external controls, you do have to try to select them in ways that minimize their differences from the fire-experiencing units (or, at least, minimize their differences from the fire-experiencing units during their pre-fire epochs). Actually, one popular way of identifying causal effects is both to use external controls and to contrast the pre- and post-fire experience within the units having fire. This approach is commonly called difference-in-differences (DID). In the simple, classic case, all of the fire-experiencing units have their fires on the same date, so that there are two time periods for the data set: pre-fire and post-fire. You would then get longitudinal data on both fire-experiencing and non-fire-experiencing units covering both the pre- and post-fire time periods. The outcome is then regressed in a model with an interaction (## in Stata's factor-variable notation) between the exposure status (fire-experiencing or not) and the time period (pre- or post-fire).
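
    To make the classic case concrete, here is a minimal Python sketch of the two differences. It is not from the thread: the group means are made-up numbers and `did_estimate` is a hypothetical helper. In a linear model with the exposure-by-period interaction, the interaction coefficient would equal this quantity.

```python
# Hypothetical mean visits by group and period (made-up numbers, illustration only).
means = {
    ("fire", "pre"): 100.0,
    ("fire", "post"): 70.0,
    ("no_fire", "pre"): 80.0,
    ("no_fire", "post"): 90.0,
}


def did_estimate(m):
    """Classic 2x2 difference-in-differences on group-by-period means."""
    # First difference: change over time within each group.
    change_fire = m[("fire", "post")] - m[("fire", "pre")]
    change_no_fire = m[("no_fire", "post")] - m[("no_fire", "pre")]
    # Second difference: subtract the change seen without any fire.
    return change_fire - change_no_fire


did = did_estimate(means)
print(did)  # -40.0
```

    Here the fire units fall by 30 while the no-fire units rise by 10, so the change attributed to fire is -40 rather than -30.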

    Your data would not lend itself directly to that because the fires occurred at different times in different places, so that there is no cutoff defining pre vs post fire that can be applied to the non-fire-experiencing units' data. But there is a similar approach, generalized difference-in-differences. In this instance we would have a single variable that is 1 in the fire-experiencing units' observations that come after the fire, and 0 in all other observations (including all non-fire-experiencing units' observations). This variable is used as the predictor in a regression model that incorporates fixed effects for both time and unit. This is the generalized DID estimator of the causal effect of fires.
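
    As a rough sketch of how that single variable could be built (Python, not from the thread): the unit names, the years, and the helper `post_fire_indicator` are all hypothetical, and treating the fire year itself as "after the fire" is an assumption you may want to change.

```python
def post_fire_indicator(fire_year, year):
    """Return 1 once a unit has had its fire, else 0.

    fire_year is None for units that never have a fire, so they are always 0.
    Assumption: the fire year itself counts as post-fire.
    """
    return int(fire_year is not None and year >= fire_year)


# Hypothetical units with staggered fire years; None means no fire in the data.
first_fire_year = {"park_a": 2012, "forest_b": 2015, "park_c": None}
years = range(2010, 2018)

# One row per unit-year: (unit, year, indicator).
panel = [
    (unit, year, post_fire_indicator(fire_year, year))
    for unit, fire_year in first_fire_year.items()
    for year in years
]

for row in panel:
    print(row)
```

    The indicator column is what would then enter the regression alongside the unit and year fixed effects.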

    With both classical and generalized DID estimation, it is important to verify that in the pre-fire observations, the outcome trends in the fire-experiencing and non-fire-experiencing units are similar. This is referred to as the parallel trends assumption, and if it is not met, it is not really appropriate to interpret the results as a causal effect estimator.
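
    One crude way to eyeball this, sketched in Python with made-up data (not from the thread), is to compare the average year-over-year change in the pre-fire observations of fire units with that of the no-fire units. The helper `mean_yearly_change` and the panel layout are hypothetical, and this is a descriptive check only, not a formal test.

```python
from statistics import mean


def mean_yearly_change(series):
    """Average year-over-year change in a list of yearly outcomes."""
    return mean(b - a for a, b in zip(series, series[1:]))


# Hypothetical yearly visits, indexed from year 0.
# fire_year is the index of the first post-fire year; None means never a fire.
panel = {
    "A": {"fire_year": 3, "visits": [50, 52, 54, 40, 42, 44]},
    "B": {"fire_year": 4, "visits": [30, 32, 34, 36, 25, 27]},
    "C": {"fire_year": None, "visits": [70, 72, 74, 76, 78, 80]},
    "D": {"fire_year": None, "visits": [20, 25, 30, 35, 40, 45]},
}

# Fire units: use only the observations before the fire.
fire_pre = [
    mean_yearly_change(u["visits"][: u["fire_year"]])
    for u in panel.values()
    if u["fire_year"] is not None
]
# No-fire units: use all observations.
no_fire = [
    mean_yearly_change(u["visits"])
    for u in panel.values()
    if u["fire_year"] is None
]

pre_trend_fire = mean(fire_pre)
pre_trend_no_fire = mean(no_fire)
print(pre_trend_fire, pre_trend_no_fire)
```

    In this invented example the two averages differ (unit D grows faster than the rest), which is the kind of gap that would make the no-fire units questionable as controls.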

    Your situation is more complicated still because you have degrees of severity of fire to account for, but the method generalizes fairly directly to this situation as well.

    You might find https://www.annualreviews.org/doi/pd...-040617-013507 interesting and helpful.



    • #3
      Dear Clyde Schechter,

      Thank you very much! I am indeed using generalized DID to estimate my coefficients of interest, and designating 1 for fire experiencing observations that come after fire and 0 otherwise.

      I think I'm still a little confused about how no-fire observations would affect the estimates. Individual-level fixed effects would mean that only the within variation is being used to compute the change in visits because of the fire for a given unit, and then the same is being done for all the other units, and then the software is producing a coefficient estimate by comparing all these changes. For a unit that doesn't even have a fire, then, how would it even be compared to the rest of the units, given that with fixed effects we don't care about differences in the mean level of visits in different polygons?

      Thank you!



      • #4
        Suppose you did your DID analysis without any no-fire units. Of course, you couldn't do a DID analysis that way because there would be only one difference. It would be the within-unit difference in outcome between post-fire and pre-fire. But that doesn't actually tell you anything about the effect of the fire, because it is entirely possible that some change would have occurred even in the absence of fire, and you don't know how much. What the control observations do is provide a calculation of how much within-unit change occurs in the absence of fire, and that much (which, evidently, is not caused by fire) is subtracted out from the change observed in the units that did experience fires to leave an amount of change that is attributable to fire. In the classical DID analysis, the amount of change in the control units shows up as the coefficient of the pre-post indicator variable. In the generalized DID analysis the amount of change in the control units shows up distributed over the time fixed effects.
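
        The subtraction can be seen in a small simulation. This is a sketch under loud assumptions, not the original poster's model: it uses a linear regression with dummy variables instead of FE Poisson, a noise-free made-up panel, a constant fire effect, and a hypothetical helper `twfe_effect` built on NumPy's least squares.

```python
import numpy as np


def twfe_effect(y, d, unit, year):
    """OLS coefficient on d with unit and year dummies (two-way fixed effects)."""
    units = np.unique(unit)
    years = np.unique(year)
    unit_dummies = (unit[:, None] == units[None, :]).astype(float)
    # Drop the first year dummy to avoid collinearity with the unit dummies.
    year_dummies = (year[:, None] == years[None, 1:]).astype(float)
    X = np.column_stack([d, unit_dummies, year_dummies])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0]


# Made-up panel: 6 units, 8 years, no noise (all values are assumptions).
n_years = 8
fire_year = {0: 3, 1: 4, 2: 5, 3: None, 4: None, 5: None}  # None = never a fire
true_effect = -5.0
trend_per_year = 2.0  # change that happens everywhere, fire or not

rows = []
for u, fy in fire_year.items():
    for t in range(n_years):
        post = 1.0 if fy is not None and t >= fy else 0.0
        visits = 10.0 * u + trend_per_year * t + true_effect * post
        rows.append((u, t, post, visits))

unit, year, d, y = (np.array(col) for col in zip(*rows))

# Naive within-unit comparison: post-fire mean minus pre-fire mean.
naive = np.mean([
    y[(unit == u) & (d == 1)].mean() - y[(unit == u) & (d == 0)].mean()
    for u, fy in fire_year.items() if fy is not None
])

estimate = twfe_effect(y, d, unit, year)
print(naive, estimate)
```

        In this invented data the naive pre/post difference comes out positive (about 3) because it mixes the fire effect with the common upward trend, while the two-way fixed effects estimate recovers roughly -5 once the year dummies absorb the change that occurs without fire.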



        • #5
          Dear Clyde Schechter,

          Thank you so much! That's very helpful, and resolves my confusion. I really appreciate it!

          Regards,
          Mansi
