  • Panel data: Fixed Effects vs. Difference in Difference

    Hello everyone,

    I am investigating the effects of an exogenous economic shock on the unsecured credit market. I have a panel data set consisting of 6 states and monthly data over a period of five years resulting in 360 observations.

    So far I have used the difference-in-differences approach (see the regression below). However, in the forum I have seen similar panel data regressions that use fixed effects and thus the xtreg command. I do not fully understand how these two approaches differ. Given that I have accounted for state-specific and year-specific effects in the regression, are there any differences between the two methodologies that are not captured and should be tested? Also, is the reg command appropriate for panel data regressions?

    I have been using the following regression in my analysis:

    Code:
     reg marketsize treatment time_variable interaction_term ///
         Dstate2 Dstate3 Dstate4 Dstate5 Dstate6 ///
         Dyear2 Dyear3 Dyear4 Dyear5, robust
    marketsize: Size of the unsecured credit market in USD.
    treatment: Indicator equal to 1 for states affected by the shock.
    Control group: States not affected by the shock (treatment = 0).
    time_variable: Indicator equal to 1 in the post-shock period.
    interaction_term: treatment x time_variable.
    Dstate2-Dstate6: State dummies. Capture the state-specific features.
    Dyear2-Dyear5: Year dummies. Capture the year-specific features.

    If there is a difference in the approaches: What are the pros and cons of using a fixed effect model instead of a Difference in Difference model?

    Any help would be greatly appreciated.

  • #2
    Your (pseudo-)code emulates fixed-effects regression by including indicator variables for states and years ("dummies") in an ordinary regression. If, instead, you had a single variable encoding state and another encoding year, you could run a fixed-effects regression:

    Code:
    xtset state year
    xtreg marketsize i.treatment##i.time i.year, fe
    and you would get the same results you are getting now, displayed somewhat differently, and without the estimates of state-level effects. The models are completely equivalent and you would reach identical conclusions from them. The -xtreg, fe- approach is quicker to code, and if done as shown you have the advantage of being able to use the -margins- command to simplify and help with interpretation. But at the end of the day, everything comes out the same. It's just a matter of convenience.
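    To illustrate that last point: after the factor-variable version of the regression, something like the following would recover the cell means and the DID contrast directly (a sketch, assuming variables named treatment and time as in the code above):

    Code:
     * expected outcome in each treatment-by-period cell
     margins treatment#time
     * treatment effect in each period; the difference across periods is the DID
     margins time, dydx(treatment)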

    Now, you have referred to your model as a difference-in-differences model, but it is not, strictly speaking, that. The classical DID model would not include the year indicators. In fact, when you run your model as shown, you have collinearity between treatment and the state indicators, and also between time_variable and the year indicators. So you will find that Stata either drops the treatment variable or one of the state indicators, and either drops time_variable or one of the year indicators. So you may well be left with just the interaction term and the state and year indicators. This model is known as the "generalized difference-in-differences" model. It has the advantage of being able to handle situations where treatment does not start in the same year in all the treated states, or where treatment is intermittent. The classical DID model is only valid in the simpler case where all states begin treatment simultaneously and the treatment (or at least its effects) remains in effect for all time thereafter. The classical DID model, however, is easier to interpret, because it lumps all the pre-shock time periods into a single variable.
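    For concreteness, a generalized DID specification along those lines might be sketched as follows (treated_now is a hypothetical name for an indicator that is 1 in state-months where the treatment is in effect; with a single simultaneous shock it coincides with your interaction term, and time_variable stands for your post-shock indicator):

    Code:
     * generalized DID: state fixed effects, year indicators, and an
     * indicator for "treatment currently in effect"
     gen treated_now = treatment * time_variable
     xtreg marketsize treated_now i.year, fe vce(robust)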



    • #3
      Thank you for your informative answer, Mr. Schechter.

      As the output from my model and your specified fixed effects model would be identical, I assume that the underlying assumptions for both models are identical, and thus that I should be indifferent in choosing between them?

      The output of the regression is indeed impacted by collinearity, dropping one state dummy and one year dummy. Thus the question arises: is the current model correctly specified?

      The main variable of interest is, of course, the interaction term, and much of the state-specific effects would probably be captured by the treatment variable. However, when I estimate a simplified DID model (treatment, time variable, and interaction term only), the adjusted explanatory power of the regression is considerably below that of the current model (40% versus 90%). The dependent variable also has a positive trend in all states, but if I have understood the DID approach correctly, this trend is accounted for in the model, given that the assumptions for DID hold. The shock is exogenous and impacts the treatment group at the same time. However, the strength of its impact varies over the period: it hits the treatment group hard at first and then fades away. For that reason, I have tested the dependent variable over different time periods before and after the shock.
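      Concretely, what I mean by testing different time periods is something along these lines (a sketch; Dpost1-Dpost3 are hypothetical indicators for the first, second, and third year after the shock):

      Code:
       * let the treatment effect differ by post-shock year
       reg marketsize treatment Dpost1 Dpost2 Dpost3 ///
           c.treatment#c.Dpost1 c.treatment#c.Dpost2 c.treatment#c.Dpost3 ///
           Dstate2 Dstate3 Dstate4 Dstate5 Dstate6, robust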

      Thanks again for your advice.
      Last edited by Ole Petter Aas; 10 Nov 2018, 05:31.



      • #4
        As the output from my model and your specified fixed effects model would be identical, I assume that the underlying assumptions for both models are identical, and thus that I should be indifferent in choosing between them?
        Yes. To the extent that you would not be indifferent between them, it is a matter of ease of coding, transparency of coding, and such matters. From a purely statistical perspective, you would be completely indifferent in choosing between them.

        The output of the regression is indeed impacted by collinearity, dropping one state dummy and one year dummy. Thus the question arises: is the current model correctly specified?
        Yes, it is correctly specified. Or, more exactly, the collinearity does not constitute mis-specification. Collinearity is essentially harmless because Stata (or any other statistical package I know of) will simply drop something to break it, and the remaining model remains equivalent to the original model plus some identifying constraint(s). The only issue you might have is that you might prefer, for ease of understanding or aesthetic considerations, to drop different things from the ones Stata chooses. In that case, you can re-run the model with a different specification where you intentionally omit the things you want dropped and leave only a non-collinear set of variables. But this in no way affects the predictions or fit of the model, nor any substantive conclusions you would draw.
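        For instance, one such re-specification (a sketch, assuming you have single state and year variables encoded as in #2, with a hypothetical variable did standing in for your interaction term) intentionally omits treatment and time_variable and keeps the full sets of indicators:

        Code:
         * keep only a non-collinear set: the interaction plus state and
         * year indicators (the main effects are absorbed by these)
         gen did = treatment * time_variable
         reg marketsize did i.state i.year, robust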

        However, making a simplified DID model (treatment, time-variable, and interaction term) the adjusted explanatory power of the regression is considerably below the current model (40% against 90%).
        Yes, with more predictors, you explain some of the noise in the outcome and get a cleaner estimate. The drawback of the bigger model, however, is that it becomes very complicated to estimate things like the expected outcomes in each group before and after the policy shift or intervention. Those are now broken down into many small pieces, and re-assembling them becomes complicated. But it may well be that you don't really care about those things, and in that case the larger model has no drawbacks (provided you have enough observations that you are not overfitting).



        • #5
          Thank you for taking the time to give such a thorough answer, Mr. Schechter.
          This was truly helpful and cleared up all my questions.
