  • How to include Fixed Effects in a Diff-in-diff specification?

    I'm doing a difference-in-difference report on whether the EU Industrial Emissions Directive (implemented in 2011) had an effect on exports, with UK exports as my treatment group and Australia as my control group. I have panel data from 1990-2018. This is my equation:


    Exports = β0 + β1*y2011 + β2*Country + β3*y2011*Country + control variables + ε

    where y2011 = 1 for observations from 2011 onwards, 0 otherwise. Country=1 for UK, 0 otherwise.

    I attempted to add fixed effects to the regression but encountered several issues.
    1) When I added (,fe) to the end of my xtreg command, it knocked out my 'country' dummy variable due to collinearity.
    2) When I added i.y2011 and i.country to include time and country fixed effects via dummy variables, instead of using the (,fe) within estimator, I found that the coefficients I got (apart from β0) were exactly the same as in the regression without the fixed effects.

    I am unsure whether I need a different equation to take fixed effects into account. I saw online that the equation has to change form to include fixed effects, but I am not sure how to change my current equation accordingly or how to do this in Stata. Any help with this is greatly appreciated.

  • #2
    Well, you don't show the actual code you used, so I'm going to base my responses on my best guess as to what you did.

    I assume that you had -xtset- or -tsset- your data with country as the panel variable.

    I'll imagine that your original regression looked like
    Code:
    xtreg exports i.y2011##i.country /* and "control" variables*/, fe
    OR PERHAPS
    xtreg exports i.y2011 i.y2011#i.country /* and "control" variables*/, fe
    If that's what you ran, bear in mind that the -fe- automatically incorporates a representation of the country indicators. They are not literally included in the calculations, but they are accomplished by within-country de-meaning. There is no output shown for them, but their effects on the regression are there. This is how -xtreg, fe- works in Stata.
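The within-country de-meaning mentioned above can be written out as a short derivation. The notation here is generic and assumed for illustration (it is not taken from the original post): i indexes panel units, t indexes years, and \alpha_i is the unit-level fixed effect.

```latex
% Fixed-effects model for unit i in year t
y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}

% Average each unit's observations over time
\bar{y}_i = \alpha_i + \bar{x}_i'\beta + \bar{\varepsilon}_i

% Subtract: the fixed effect \alpha_i cancels
y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)
```

Any regressor that is constant over time within a unit has x_{it} - \bar{x}_i = 0 for every observation, so it carries no information after de-meaning. That is why a time-invariant indicator is reported as omitted under -fe-.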

    If you then added i.country to that command, Stata would recognize that you are attempting to introduce the country variables twice: once explicitly with i.country, and a second time implicitly with -fe-. But you can't have the same variable appear twice in any regression command: that's the collinearity that Stata complained about and resolved by dropping the country indicators you tried to insert. And, of course, the results will be identical to what you get from the code I showed above, except for the constant term (which is meaningless anyway).

    Anyway, if you did the -xtset- as I described and used -xtreg, fe-, the country effects are properly accounted for and nothing more is needed.
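A minimal sketch of the equivalence described above, run on Stata's shipped example panel rather than the poster's data. It assumes internet access for -webuse- and uses the variables of the grunfeld dataset (company, year, invest, mvalue, kstock), which are unrelated to the exports question. The slope coefficients from -xtreg, fe- should match those from -regress- with explicit panel indicators; only the constant differs.

```stata
* Example panel data: 10 companies observed over 20 years
webuse grunfeld, clear
xtset company year

* Within (fixed-effects) estimator: company effects are absorbed by de-meaning
xtreg invest mvalue kstock, fe
scalar b_fe = _b[mvalue]

* Same model with explicit company indicators (least-squares dummy variables)
regress invest mvalue kstock i.company
scalar b_lsdv = _b[mvalue]

* The two slope estimates agree up to numerical precision
display "xtreg, fe: " b_fe "   regress + i.company: " b_lsdv
assert reldif(b_fe, b_lsdv) < 1e-6
```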

    That said, if you want reassurance as to whether you ran the regression correctly, you have to show what you actually ran: the code, not an equation that you think the code represents.

    • #3
      Thanks Clyde for your response. Here is the code I used:
      Code:
      xtset c_id Year
      
      regress totalexportsA y2011 Country y2011Country gdppc natresrent PartnershipAg2008, vce(robust)
      
      xtreg totalexportsA y2011 Country y2011Country gdppc natresrent PartnershipAg2008,fe vce(robust)
      
      xtreg totalexportsA i.y2011 i.Country y2011Country gdppc rdspend natresrent PartnershipAg2008, vce(robust)
      After reading your comments I then did the following:
      Code:
      regress totalexportsA y2011 Country y2011Country gdppc natresrent PartnershipAg2008, vce(robust)
      
      xtreg totalexportsA y2011 y2011Country gdppc natresrent PartnershipAg2008,fe vce(robust)
      
      xtreg totalexportsA i.y2011 y2011Country gdppc natresrent PartnershipAg2008, fe vce(robust)
      After doing this, the coefficients were again the same as in the original regression without any fixed effects, but many of my p-values became insignificant. I think this is the outcome I was looking for, but should I expect to get the same coefficients as in the regression without fixed effects? And if the p-values are now insignificant after fixed effects have been implemented, what does this generally mean?

      • #4
        OK, I think I understand what you have better now. You don't explain c_id, but I assume that it is some variable that identifies countries (as opposed to the variable named Country, which is a simple dichotomy distinguishing the UK (1) from all other countries (0)). I was quite confused by your original presentation: you referred to country-level fixed effects, so I assumed you meant fixed effects based on the variable Country, which is not what you meant or did.

        I also assume that the variable y2011Country is a homebrew variable calculated as the product of y2011 and Country (a bad practice: use factor variable notation instead).

        The simplest and, in my view, best way to do this regression and include c_id level fixed effects is:

        Code:
        xtreg totalexportsA i.y2011##i.Country gdppc natresrent PartnershipAg2008, fe vce(robust)
        When you run this, Stata will tell you that i.Country is collinear with the fixed effects and is omitted from the analysis, and that is correct. The variable Country is constant over time within any c_id: it's 1 for the c_id that denotes the UK, and 0 for all the others. So it is collinear and must be dropped. No harm done by that. The i.y2011#i.Country interaction part is not collinear and remains, and of course this is the principal outcome of interest in your DID analysis.
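A sketch of the behavior described above, on simulated data. Everything in the data-generating step is an assumption made for illustration: 20 hypothetical countries, one treated unit (c_id 1), an outcome named exports, and a true interaction effect of 2. The variable names c_id, Year, y2011 and Country mirror the thread. The output should flag 1.Country as omitted while keeping 1.y2011#1.Country.

```stata
clear
set seed 12345

* Hypothetical panel: 20 countries, 1990-2018, country 1 is the treated unit
set obs 20
generate int c_id = _n
generate byte Country = (c_id == 1)
generate double u = rnormal()              // time-invariant country effect
expand 29
bysort c_id: generate int Year = 1989 + _n
generate byte y2011 = (Year >= 2011)

* Simulated outcome with an assumed DID effect of 2
generate double exports = 10 + u + 0.5*y2011 + 2*y2011*Country + rnormal()

xtset c_id Year
xtreg exports i.y2011##i.Country, fe vce(robust)

* The DID estimate is the coefficient on the interaction term
lincom 1.y2011#1.Country
```

With only one treated unit the estimate is noisy, so it will not equal 2 exactly; the point is which terms survive, not the value.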

        Because you have panel data, the use of -regress- rather than -xtreg- is inappropriate, unless the -xtreg- output indicates that sigma_u and rho are both zero or extremely close to zero. If that is the case, it indicates that there is little or no c_id level variation in your outcome beyond whatever might be accounted for by gdppc through PartnershipAg2008, y2011 and Country. (That sometimes happens in real life, but it's unusual, so if this is what you really have, then it is more likely that your data are messed up.) Also, if sigma_u and rho really are zero, or extremely close, then it also means that the results from -xtreg- and -regress- will be essentially the same.
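One way to inspect sigma_u and rho directly, sketched on the grunfeld example data (an assumption: this is not the poster's dataset, and -webuse- needs internet access). After -xtreg, fe-, both quantities appear at the bottom of the output and are also available as the stored results e(sigma_u) and e(rho).

```stata
webuse grunfeld, clear
xtset company year

xtreg invest mvalue kstock, fe

* Panel-level standard deviation and the share of variance due to the panel effect
display "sigma_u = " e(sigma_u)
display "sigma_e = " e(sigma_e)
display "rho     = " e(rho)
```

A rho well above zero, as in this example, indicates substantial unit-level variation that -regress- would ignore.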

        If you want further advice about interpreting your results, be sure to show them when you post back.

        • #5
          Many thanks Clyde, if I want to also include time fixed effects would I put i.y2011 instead of y2011 (where y2011=1 for observations including and after 2011, 0 otherwise)?

          Also if I want to do a basic DID regression without any fixed effects at all, would it then be appropriate to use the normal reg command, or should I avoid it completely? I want to be able to compare results with and without the use of fixed effects.
          Last edited by Mathew Chandy; 06 Apr 2020, 15:24.

          • #6
            Because y2011 is a dichotomous variable, in the regression itself, and when it is not part of an interaction term, it makes no difference whether you specify y2011 or i.y2011: you will get identical results. If you want time fixed effects, you need a variable that designates the actual year of each observation. If that variable is called year, then you would put i.year (and the i. prefix is mandatory in this case) into the model. When you do that, expect the y2011 variable to be omitted as it will be collinear with the i.year variables.
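A sketch of both points, again on the grunfeld example data rather than the poster's. The indicator post and its 1945 cutoff are hypothetical stand-ins for y2011. The first two models should give the same coefficient with or without the i. prefix; the third adds year indicators, and the output will mark whichever term Stata drops for collinearity.

```stata
webuse grunfeld, clear
xtset company year

* Hypothetical post-period indicator, playing the role of y2011
generate byte post = (year >= 1945)

* A 0/1 variable entered plainly or with the i. prefix gives the same estimate
xtreg invest post mvalue kstock, fe
scalar b_plain = _b[post]
xtreg invest i.post mvalue kstock, fe
scalar b_factor = _b[1.post]
assert reldif(b_plain, b_factor) < 1e-8

* Year fixed effects: the i. prefix is required so year enters as indicators.
* post is an exact function of year, so one term is reported as omitted.
xtreg invest i.post i.year mvalue kstock, fe
```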

            "I want to be able to compare results with and without the use of fixed effects."
            This makes no sense to me. Unless you get fixed effects results with sigma_u and rho both zero or very close to that (in which case the regression results will be the same either way), only the -xtreg, fe- version is right because you have panel data. If you use a regression that ignores panel effects in panel data, you just get incorrect results. Comparing right and wrong makes no sense to me.

            If you have seen DID analyses done using -regress- and not -xtreg, fe-, then there are two possibilities. One is that it was just done incorrectly. The other, I think more likely, explanation would be that the data is not panel data: the people (or firms, or countries, or whatever they are) in the pre-intervention data are different from the ones in the post-intervention data, so there is no panel structure.

            • #7
              Many thanks for your response Clyde. Would a Hausman test be required for any kind of DiD analysis to determine whether fixed effects can be used?

              • #8
                There are many for whom the choice between -fe- and -re- is based entirely on the Hausman test. I am not among them. In my view, in the typical DID setting, we are interested in knowing the change in the outcome within units associated with the intervention. Since it is a within-unit question, it calls for a within-unit answer, and that's what -fe- gives you. -re- gives you an estimate that is a mixture of within- and between- effects, and in this context the between-effects are not germane. So for these purposes I tend to stick with -fe-. What using the Hausman test does for you is allow you to use the -re- analysis when the two analyses produce essentially the same results. The Hausman test, after all, is nothing more or less than a test that the coefficient estimates obtained from -fe- and -re- are the same. You may wonder, then, why anybody ever uses an -re- model for DID or cares about the Hausman test. The answer is that when the two models produce (essentially) the same coefficient estimates, the -re- estimates are more efficient, that is, have smaller standard errors. And that's a good thing. So why don't I rely on the Hausman test? It's because I don't believe that a significance test of equality of the coefficients is the right way to decide whether they are equal enough for the purpose.

                I am, in general, a skeptic of significance testing and a big fan of the American Statistical Association's recommendation that such tests no longer be used. See https://www.tandfonline.com/doi/full...5.2019.1583913 for the "executive summary" and
                https://www.tandfonline.com/toc/utas20/73/sup1 for all 43 supporting articles. Or https://www.nature.com/articles/d41586-019-00857-9 for the tl;dr. Even to the limited extent that I can see a role for the use of significance tests, I think they answer the wrong question here. I think what is relevant is not whether the -fe- and -re- coefficients differ by a "statistically significant" amount but whether they differ by a practically significant amount.
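For readers who want to see both approaches side by side, here is a minimal sketch on the grunfeld example data (an assumption: not the poster's data, and -webuse- needs internet access). It runs the conventional Hausman test and then lists the -fe- and -re- coefficients next to each other, which is one way to judge whether they differ by a practically meaningful amount.

```stata
webuse grunfeld, clear
xtset company year

* Fit both models with the default (non-robust) VCE, which -hausman- requires
xtreg invest mvalue kstock, fe
estimates store fe
xtreg invest mvalue kstock, re
estimates store re

* Conventional test of whether the fe and re coefficients are equal
hausman fe re

* Side-by-side coefficients and standard errors for a practical comparison
estimates table fe re, b(%9.4f) se
```

How large a difference counts as practically meaningful depends on the scale of the outcome and the question at hand; the table only makes the comparison visible.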

                All of that said, of course, the Hausman test is widely used and widely endorsed by many. It is especially popular with economists and econometricians, and if that is your target audience (as I imagine given the names of your study variables) then you may want to follow the general practice in that discipline.

                • #9
                  Hi Clyde

                  As mentioned in your previous post, to add time fixed effects I should add i.year to the regression. Would I have to remove y2011 from the regression like shown here?
                  Code:
                   
                   xtreg totalexportsA i.year##i.Country gdppc natresrent PartnershipAg2008, fe vce(robust)
                  Or do I include it into my regression like so:

                  Code:
                   
                   xtreg totalexportsA i.y2011##i.Country i.year gdppc natresrent PartnershipAg2008, fe vce(robust)

                  • #10
                    What you are showing in #9 are two different models. In the first one, you are introducing an interaction of all of the years with all of the countries. The distinction between the pre- and post-intervention years is completely missing from that model. It makes no attempt to do a DID estimation of your intervention effect.

                    The second model retains the DID estimation of the intervention effect and adds yearly shocks to the modeling. Now, y2011 is going to be collinear with the i.year indicators. So something will be dropped. I believe with the particular syntax you show, y2011 will be dropped. But 1.y2011#i.Country will still be there, and that is the key thing you need from the model. So this one is the way to go.
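A sketch of the second model on simulated data. The data-generating step is entirely hypothetical (20 countries, one treated unit, a linear year trend standing in for yearly shocks, an assumed DID effect of 2); only the variable names follow the thread. Whichever term Stata reports as omitted, the interaction coefficient should remain and can be pulled out with -lincom-.

```stata
clear
set seed 2468

* Hypothetical panel: 20 countries, 1990-2018, country 1 is the treated unit
set obs 20
generate int c_id = _n
generate byte Country = (c_id == 1)
generate double u = rnormal()
expand 29
bysort c_id: generate int Year = 1989 + _n
generate byte y2011 = (Year >= 2011)

* Outcome with a common year trend and an assumed DID effect of 2
generate double exports = 10 + u + 0.1*(Year - 1990) + 2*y2011*Country + rnormal()

xtset c_id Year

* DID with unit fixed effects (-fe-) and year fixed effects (i.Year)
xtreg exports i.y2011##i.Country i.Year, fe vce(robust)

* The intervention effect is the interaction coefficient
lincom 1.y2011#1.Country
```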
