  • Why is FE vs RE even an issue?

    Hi all,

    I am really confused by fixed-effects vs random-effects estimators. Aren't they explaining two different things?
    Why don't people report their FE estimates, interpret them as the average change within a given unit, and use RE to explain the overall change?
    Why do people use the Hausman test at all? Why do we compare the results of these two estimators when they are completely different?

    Thank you,
    Mahtab
    Last edited by Mahtab Karimi; 05 Jul 2023, 21:13.

  • #2
    Mahtab:
    it's a matter of consistency and efficiency.
    The -fe- estimator is always consistent, but it is inefficient if -re- is the way to go.
    The -re- estimator is inconsistent if -fe- is the way to go.
    In addition, the -fe- estimator uses only within-panel variation, whereas the -re- estimator combines within- and between-panel variation. Therefore, their aims are actually different, but what matters is how well they fit your data.
    Kind regards,
    Carlo
    (StataNow 18.5)
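A minimal Python sketch (not Stata) of the within/between distinction follows. It uses a hypothetical toy panel in which the unit effect c_i is deliberately correlated with x; all names and numbers are illustrative assumptions.

The within estimator (what -fe- computes) uses only deviations from each unit's mean, so it recovers the true slope. The between estimator (what -be- computes) and pooled OLS also pick up the unit effect.

```python
from statistics import mean

# Hypothetical balanced panel: y_it = BETA * x_it + c_i, with no idiosyncratic noise.
# The unit effect c_i = 3*i rises with i, and so does x_it = i + t: they are correlated.
BETA = 1.0
units = range(5)    # i = 0..4
periods = range(3)  # t = 0..2
data = [(i, i + t, BETA * (i + t) + 3 * i) for i in units for t in periods]


def slope(xs, ys):
    """OLS slope of ys on xs, with an intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def unit_means(rows, col):
    """Mean of column `col` for each unit id (stored in column 0)."""
    return {i: mean(r[col] for r in rows if r[0] == i) for i in {r[0] for r in rows}}


xbar, ybar = unit_means(data, 1), unit_means(data, 2)

# Within (FE) estimator: OLS on unit-demeaned data, i.e. within-unit variation only
within = slope([x - xbar[i] for i, x, _ in data], [y - ybar[i] for i, _, y in data])
# Between (BE) estimator: OLS of the unit means of y on the unit means of x
between = slope([xbar[i] for i in units], [ybar[i] for i in units])
# Pooled OLS ignores the panel structure altogether
pooled = slope([x for _, x, _ in data], [y for _, _, y in data])

print(f"within={within:.3f} between={between:.3f} pooled={pooled:.3f}")
```

Running it prints within=1.000 between=4.000 pooled=3.250. Because -re- mixes within and between variation, in a setting like this it would generally also be pulled away from the true slope.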



    • #3
      Thank you so much, Carlo!

      So which is preferred in academic research, consistency or efficiency?
      And is it common to report both FE and RE?

      Best,
      Mahtab



      • #4
        Mahtab:
        the decision is not trivial, as highlighted in Hsiao C. Analysis of Panel Data. Third Edition. New York: Cambridge University Press, 2014:47-48. The same source rightly warns readers about the expected differences between the coefficients of the two specifications.
        The choice should go to the estimator that fits your data better than the alternative specification.
        Usually, the -fe- estimator is fitted first, followed by -re-.
        The two regressions are then compared via -hausman- (which has its own drawbacks, though; for instance, it does not support non-default standard errors).
        Sometimes, you may want to follow Mundlak's approach and test whether the main assumption of the -re- estimator (that is, that the panel-wise effect is uncorrelated with the vector of regressors) holds.
        Both consistency and efficiency are relevant (the former for the point estimates, the latter for the standard errors and related inference).
        Finally, you may occasionally find papers reporting, say, -fe- in the baseline analysis and -re- as a sort of sensitivity analysis. Setting aside papers that present both specifications for teaching purposes, this methodological choice may be difficult to justify in real-world research.
        Kind regards,
        Carlo
        (StataNow 18.5)
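For intuition, the Hausman statistic for a single coefficient can be sketched in Python as below. -hausman- itself works with the full coefficient vector and covariance matrices, so this is only the one-coefficient special case.

The sketch assumes default (non-robust) standard errors and that -re- is efficient under the null, in line with the caveat above. The coefficient values are hypothetical. Plugging in the clustered standard errors shown later in this thread would not give a valid test.

```python
import math


def hausman_1coef(b_fe, se_fe, b_re, se_re):
    """One-coefficient Hausman statistic with default (non-robust) SEs.

    H = (b_fe - b_re)^2 / (se_fe^2 - se_re^2), compared with a chi2(1).
    Returns (H, p_value). The variance difference must be positive, which is
    what the premise "RE is efficient under the null" implies.
    """
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        raise ValueError("se_fe^2 - se_re^2 must be positive for this test")
    h = (b_fe - b_re) ** 2 / var_diff
    # Upper tail of chi2(1): P(Z^2 > h) = P(|Z| > sqrt(h)) = erfc(sqrt(h / 2))
    p_value = math.erfc(math.sqrt(h / 2))
    return h, p_value


# Hypothetical estimates, for illustration only
H, P = hausman_1coef(b_fe=0.50, se_fe=0.10, b_re=0.30, se_re=0.06)
print(f"H = {H:.2f}, p = {P:.4f}")
```

It prints H = 6.25, p = 0.0124, i.e. a rejection of -re- at the 5% level for these made-up numbers.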



        • #5
          Thank you so much, Carlo!
          Below is what I actually have; I would appreciate any comments.

          I have an unbalanced panel of firms observed over 7 years, and I aim to explain a measure of annual expenditure as a function of firm characteristics. Here is what I get:
          Code:
          .             xtreg log_EXP index log_S_insiders log_wage_emp log_Q  lver_N log_assets i.industry, be
          
          Between regression (regression on group means)  Number of obs     =      1,337
          Group variable: cik                             Number of groups  =        215
          
          R-squared:                                      Obs per group:
               Within  = 0.0612                                         min =          2
               Between = 0.5739                                         avg =        6.2
               Overall = 0.5036                                         max =          7
          
                                                          F(44,170)         =       5.20
          sd(u_i + avg(e_i.)) = 1.05243                   Prob > F          =     0.0000
          
          --------------------------------------------------------------------------------
                 log_EXP | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          ---------------+----------------------------------------------------------------
                   index |   .2308241    .136832     1.69   0.093    -.0392845    .5009326
          log_S_insiders |   .0615516   .0726994     0.85   0.398    -.0819583    .2050615
            log_wage_emp |   1.133431    .366029     3.10   0.002     .4108832    1.855978
                   log_Q |   .9710553   .2026017     4.79   0.000     .5711162    1.370994
                  lver_N |  -.4491829   .4566272    -0.98   0.327    -1.350573    .4522069
              log_assets |   .2211069   .0797908     2.77   0.006     .0635985    .3786154
                         |
                industry |
                         |  (output omitted)
                         |
                   _cons |  -8.008713    4.20781    -1.90   0.059      -16.315    .2975735
          --------------------------------------------------------------------------------
          
          
          
          
             xtreg log_EXP index log_S_insiders log_wage_emp log_Q  lver_N log_assets i.DataYearFiscal  , fe vce(cluster cik)
          
          Fixed-effects (within) regression               Number of obs     =      1,337
          Group variable: cik                             Number of groups  =        215
          
          R-squared:                                      Obs per group:
               Within  = 0.1604                                         min =          2
               Between = 0.2169                                         avg =        6.2
               Overall = 0.1958                                         max =          7
          
                                                          F(12,214)         =      11.08
          corr(u_i, Xb) = 0.2910                          Prob > F          =     0.0000
          
                                              (Std. err. adjusted for 215 clusters in cik)
          --------------------------------------------------------------------------------
                         |               Robust
                 log_EXP | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
          ---------------+----------------------------------------------------------------
                   index |  -.0069452   .0086144    -0.81   0.421    -.0239252    .0100348
          log_S_insiders |  -.0172232   .0091188    -1.89   0.060    -.0351974     .000751
            log_wage_emp |   .4382606   .1847722     2.37   0.019     .0740541     .802467
                   log_Q |   .2021079   .0474071     4.26   0.000     .1086632    .2955526
                  lver_N |  -.2445439   .1333582    -1.83   0.068    -.5074076    .0183199
              log_assets |  -.0232125   .0616467    -0.38   0.707     -.144725       .0983
                         |
          DataYearFiscal |
                         |  (output omitted)
                         |
                   _cons |   2.424077   1.873559     1.29   0.197    -1.268917    6.117071
          ---------------+----------------------------------------------------------------
                 sigma_u |  1.3461253
                 sigma_e |  .27281735
                     rho |  .96054599   (fraction of variance due to u_i)
          --------------------------------------------------------------------------------
          
          
          
             xtreg log_EXP index log_S_insiders log_wage_emp log_Q  lver_N log_assets i.DataYearFiscal  , re vce(cluster cik)
          
          Random-effects GLS regression                   Number of obs     =      1,337
          Group variable: cik                             Number of groups  =        215
          
          R-squared:                                      Obs per group:
               Within  = 0.1576                                         min =          2
               Between = 0.2273                                         avg =        6.2
               Overall = 0.2118                                         max =          7
          
                                                          Wald chi2(12)     =     146.93
          corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
          
                                              (Std. err. adjusted for 215 clusters in cik)
          --------------------------------------------------------------------------------
                         |               Robust
                 log_EXP | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
          ---------------+----------------------------------------------------------------
                   index |   -.002204   .0085652    -0.26   0.797    -.0189915    .0145836
          log_S_insiders |  -.0153687   .0091181    -1.69   0.092    -.0332398    .0025025
            log_wage_emp |   .6626753   .1777452     3.73   0.000      .314301     1.01105
                   log_Q |   .2452634   .0462054     5.31   0.000     .1547026    .3358242
                  lver_N |  -.2655674   .1302892    -2.04   0.042    -.5209295   -.0102053
              log_assets |  -.0118288   .0492383    -0.24   0.810    -.1083341    .0846765
                         |
          DataYearFiscal |
                         |  (output omitted)
                         |
                   _cons |   -.235188   1.883227    -0.12   0.901    -3.926246     3.45587
          ---------------+----------------------------------------------------------------
                 sigma_u |   1.184759
                 sigma_e |  .27281735
                     rho |  .94964471   (fraction of variance due to u_i)
          --------------------------------------------------------------------------------
          
          .                                 xttest0
          
          Breusch and Pagan Lagrangian multiplier test for random effects
          
                  log_EXP[cik,t] = Xb + u[cik] + e[cik,t]
          
                  Estimated results:
                                   |       Var     SD = sqrt(Var)
                          ---------+-----------------------------
                           log_EXP |   2.053936       1.433156
                                 e |   .0744293       .2728174
                                 u |   1.403654       1.184759
          
                  Test: Var(u) = 0
                                       chibar2(01) =  2821.09
                                    Prob > chibar2 =   0.0000
          
          
                xtreg log_EXP index log_S_insiders log_wage_emp log_Q  lver_N log_assets i.DataYearFiscal i.industry , re vce(cluster cik)
          
          Random-effects GLS regression                   Number of obs     =      1,337
          Group variable: cik                             Number of groups  =        215
          
          R-squared:                                      Obs per group:
               Within  = 0.1587                                         min =          2
               Between = 0.5022                                         avg =        6.2
               Overall = 0.4660                                         max =          7
          
                                                          Wald chi2(39)     =          .
          corr(u_i, X) = 0 (assumed)                      Prob > chi2       =          .
          
                                              (Std. err. adjusted for 215 clusters in cik)
          --------------------------------------------------------------------------------
                         |               Robust
                 log_EXP | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
          ---------------+----------------------------------------------------------------
                   index |  -.0041962   .0085993    -0.49   0.626    -.0210505    .0126582
          log_S_insiders |  -.0164125    .009219    -1.78   0.075    -.0344813    .0016564
            log_wage_emp |   .5394175   .1758741     3.07   0.002     .1947107    .8841243
                   log_Q |   .2352516   .0474967     4.95   0.000     .1421598    .3283435
                  lver_N |  -.2755942   .1315859    -2.09   0.036    -.5334979   -.0176905
              log_assets |   .0176353   .0520406     0.34   0.735    -.0843625     .119633
                         |
          DataYearFiscal |
                         |  (output omitted)
                         |
                industry |
                         |  (output omitted)
                         |
                   _cons |   1.797011   1.813508     0.99   0.322    -1.757398    5.351421
          ---------------+----------------------------------------------------------------
                 sigma_u |  1.0429924
                 sigma_e |  .27281735
                     rho |  .93596171   (fraction of variance due to u_i)
          --------------------------------------------------------------------------------
          
          .                                 xttest0
          
          Breusch and Pagan Lagrangian multiplier test for random effects
          
                  log_EXP[cik,t] = Xb + u[cik] + e[cik,t]
          
                  Estimated results:
                                   |       Var     SD = sqrt(Var)
                          ---------+-----------------------------
                           log_EXP |   2.053936       1.433156
                                 e |   .0744293       .2728174
                                 u |   1.087833       1.042992
          
                  Test: Var(u) = 0
                                       chibar2(01) =  2849.50
                                    Prob > chibar2 =   0.0000


          Finally, I also ran a Mundlak test on my -re- estimates, and it points to the FE estimator. If I go with FE, I can't show the effect of the industry dummies on my dependent variable, which could be interesting and important.
          Should I try to find a better fit for the FE?
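Below is a rough Python sketch of the Mundlak device in the linear case, using hypothetical data. Adding each unit's mean of x as an extra regressor makes the coefficient on x reproduce the within (FE) slope. The test of -re- then amounts to testing whether the coefficient on the mean is zero.

Pooled OLS is used here only to keep the sketch short. The solve() and ols() helpers are illustrative and not part of any library.

```python
from statistics import mean

# Hypothetical unbalanced panel: (unit id, x, y). Numbers are made up for illustration.
rows = [
    (1, 1.0, 2.1), (1, 2.0, 3.9), (1, 4.0, 8.2),
    (2, 3.0, 9.0), (2, 5.0, 12.8),
    (3, 6.0, 15.1), (3, 7.0, 17.2), (3, 9.0, 20.7), (3, 10.0, 23.1),
]


def solve(a, b):
    """Solve the square system a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def ols(X, y):
    """OLS coefficients via the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    return solve(xtx, xty)


ids = sorted({i for i, _, _ in rows})
xbar = {i: mean(x for j, x, _ in rows if j == i) for i in ids}
ybar = {i: mean(yy for j, _, yy in rows if j == i) for i in ids}

# Mundlak: pooled OLS of y on a constant, x_it and the unit mean xbar_i
X = [[1.0, x, xbar[i]] for i, x, _ in rows]
y = [yy for _, _, yy in rows]
_, b_x, b_xbar = ols(X, y)

# Within (FE) slope for comparison
xd = [x - xbar[i] for i, x, _ in rows]
yd = [yy - ybar[i] for i, _, yy in rows]
b_within = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

print(f"Mundlak b_x = {b_x:.6f}, within = {b_within:.6f}, b_xbar = {b_xbar:.6f}")
```

In Stata, the panel means are commonly built with -bysort cik: egen- and added to -xtreg, re-, followed by -test- on their coefficients.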

          (dummy variables are not reported here)

          Sorry this got so long. Thank you,
          Mahtab
          Last edited by Mahtab Karimi; 06 Jul 2023, 01:24.



          • #6
            Mahtab:
            1) I'd set aside the -be- estimator, as it has very limited practical use;
            2) you may want to test whether -re- is the way to go with the community-contributed module -xtoverid-;
            3) if -fe- is the way to go and -i.industry- is time-invariant, there's nothing you can do but live without the -i.industry- coefficient. That said, as the -fe- estimator focuses on within-panel variation, a within-panel constant (that is, a time-invariant predictor) by definition cannot contribute to explaining the variation of the regressand.
            Kind regards,
            Carlo
            (StataNow 18.5)
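The point in 3) can be checked mechanically. In this Python sketch, which uses hypothetical firms and industry codes, an industry dummy that never changes within a firm becomes a column of zeros after the within transformation. No variation is left from which -fe- could estimate its coefficient.

```python
from statistics import mean

# Hypothetical firm-year rows: (firm id, year, industry code).
# Industry never changes within a firm, like -i.industry- in #5.
rows = [
    ("A", 2015, 3), ("A", 2016, 3), ("A", 2017, 3),
    ("B", 2015, 7), ("B", 2016, 7),
    ("C", 2016, 3), ("C", 2017, 3), ("C", 2018, 3),
]

# Dummy for one (hypothetical) industry code
d = [1.0 if ind == 3 else 0.0 for _, _, ind in rows]

# Each firm's mean of the dummy
firm_mean = {
    firm: mean(di for (f, _, _), di in zip(rows, d) if f == firm)
    for firm in {f for f, _, _ in rows}
}

# Within transformation: subtract each firm's own mean
d_within = [di - firm_mean[f] for (f, _, _), di in zip(rows, d)]
print(d_within)  # all zeros: nothing left for -fe- to estimate
```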



            • #7
              Thank you so much.



              • #8
                I have a follow-up question on this post and would appreciate folks' help.

                What happens in terms of efficiency and consistency if I use plain OLS with an indicator variable for each firm instead of an FE model? Aside from the very high R^2, what else changes?

                Thanks,
                Mahtab



                • #9
                  What stops you from trying?
                  ---------------------------------
                  Maarten L. Buis
                  University of Konstanz
                  Department of history and sociology
                  box 40
                  78457 Konstanz
                  Germany
                  http://www.maartenbuis.nl
                  ---------------------------------



                  • #10
                    Mahtab:
                    the first issue that springs to my mind is the difference in computational time between -regress- with the panel variable entered as a categorical variable and -xtreg, fe- (I made that comparison some years ago using "https://www.stata-press.com/data/r17/nlswork.dta", and I had to stop -regress- after many hours without any results).
                    In addition, -xtreg, fe- reports -sigma_u-, -sigma_e- and -rho- (see the output in #5), and with default standard errors it also reports the F test that all u_i = 0.
                    Kind regards,
                    Carlo
                    (StataNow 18.5)
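-rho- is simply the share of the composite error variance due to u_i. A small Python check uses the -sigma_u- and -sigma_e- values printed in #5. It reproduces the reported -rho- values: .96054599 for -fe-, and .94964471 for -re- without industry dummies.

```python
def rho(sigma_u, sigma_e):
    """Fraction of the composite error variance due to the panel effect u_i:
    rho = sigma_u^2 / (sigma_u^2 + sigma_e^2)."""
    return sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)


# sigma_u and sigma_e as reported by -xtreg, fe vce(cluster cik)- in #5
rho_fe = rho(1.3461253, 0.27281735)
# sigma_u and sigma_e as reported by -xtreg, re vce(cluster cik)- in #5 (no industry dummies)
rho_re = rho(1.184759, 0.27281735)

print(f"rho (fe) = {rho_fe:.4f}, rho (re) = {rho_re:.4f}")
```

It prints rho (fe) = 0.9605, rho (re) = 0.9496. In this application, most of the unexplained variation lies between firms rather than within them.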



                    • #11
                      Thank you, Maarten and Carlo,

                      @Maarten, I tried, and the coefficients are almost the same. I was wondering whether I am missing something theoretically.
                      @Carlo, I don't have a computational problem; I was worried about whether the estimated R-squareds are reliable. I guess they are not. Is reporting -rho- valuable?

                      Also, after fitting an FE model, if someone asks me "Are you controlling for the panel fixed effects?", I should say yes, right? But I can't provide any estimate of them.

                      Thanks,
                      Mahtab
                      Last edited by Mahtab Karimi; 21 Jul 2023, 13:20.



                      • #12
                        My view is that, in the case with many units and small T, there is one model: the unobserved effects model, which I write as y(i,t) = x(i,t)*b + c(i) + u(i,t). The unobserved variable c(i) does not change over time. If we maintain exogeneity of the x(i,t) with respect to the u(i,s) in all time periods s, the choice between RE and FE essentially comes down to whether c(i) is correlated with (some elements of) x(i,t). If there is correlation, FE is preferred, as the within transformation removes c(i). For this reason, FE is typically preferred for estimating causal effects. However, it's possible that, with good controls, what is left in c(i) can be uncorrelated with x(i,t). If the FE estimates are imprecise, that's when one usually tries to see if RE is sufficient. That's where the Hausman test comes in (preferably the robust version of it). The default really should be fixed effects, as it's most convincing for estimating causal effects (unless you know the treatment is effectively randomized). Sometimes FE and RE are the same. But if they aren't, and you opt for RE, you'd better have a good story and statistics to back it up.
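One way to see why FE and RE are sometimes (nearly) the same: the textbook RE GLS estimator is OLS on quasi-demeaned data, y_it - theta*ybar_i. The weight theta depends on the variance components and on the number of periods T.

The Python sketch below applies the balanced-panel formula at a few values of T, so it is only an approximation for an unbalanced panel. It plugs in the -sigma_u- and -sigma_e- reported by -xtreg, re- in #5. There, theta comes out around 0.91 for T = 7, so the RE transformation is close to the within (FE) one.

```python
import math


def re_theta(sigma_u, sigma_e, T):
    """Quasi-demeaning weight of the random-effects GLS transformation,
    y_it - theta * ybar_i (and likewise for x), for a unit observed T times:
    theta = 1 - sqrt(sigma_e^2 / (T * sigma_u^2 + sigma_e^2)).
    theta = 0 gives pooled OLS; theta close to 1 approaches the within (FE) transformation."""
    return 1.0 - math.sqrt(sigma_e ** 2 / (T * sigma_u ** 2 + sigma_e ** 2))


# Variance components as reported by -xtreg, re- in #5 (model without industry dummies)
SIGMA_U, SIGMA_E = 1.184759, 0.27281735

for T in (2, 6.2, 7):  # min, average and max observations per firm in #5
    print(f"T = {T}: theta = {re_theta(SIGMA_U, SIGMA_E, T):.3f}")
```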

                        You should get the same estimates when you put in the unit-specific dummies and use OLS as when using xtreg, fe. The standard errors won't be exactly the same (probably). It's the same estimator, but conventions differ about computing standard errors (though they should be close, especially if you vce(cluster id)).
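The equivalence between OLS with unit dummies (LSDV) and -xtreg, fe- follows from the Frisch-Waugh-Lovell theorem: partialling out the unit dummies is exactly the within transformation. A Python sketch on a hypothetical unbalanced panel shows that the two slope estimates coincide. It uses a small hand-written Gaussian-elimination solver and does not compute standard errors.

```python
from statistics import mean

# Hypothetical unbalanced panel: (unit id, x, y)
rows = [
    (1, 1.0, 3.0), (1, 2.0, 4.1), (1, 3.5, 6.2),
    (2, 2.0, 1.0), (2, 4.0, 5.3),
    (3, 0.5, 7.4), (3, 1.5, 8.0), (3, 3.0, 9.9),
]
ids = sorted({i for i, _, _ in rows})


def solve(a, b):
    """Solve the square system a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


# LSDV: regress y on x plus one dummy per unit (no common intercept)
X = [[x] + [1.0 if i == j else 0.0 for j in ids] for i, x, _ in rows]
y = [yy for _, _, yy in rows]
k = len(X[0])
xtx = [[sum(r[p] * r[q] for r in X) for q in range(k)] for p in range(k)]
xty = [sum(r[p] * yi for r, yi in zip(X, y)) for p in range(k)]
b_lsdv = solve(xtx, xty)[0]

# Within estimator: OLS on unit-demeaned data (what -xtreg, fe- computes)
xbar = {i: mean(x for j, x, _ in rows if j == i) for i in ids}
ybar = {i: mean(yy for j, _, yy in rows if j == i) for i in ids}
xd = [x - xbar[i] for i, x, _ in rows]
yd = [yy - ybar[i] for i, _, yy in rows]
b_within = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

print(f"LSDV slope = {b_lsdv:.6f}, within slope = {b_within:.6f}")
```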



                        • #13
                          Thank you so much for the very helpful explanation.
                          Let's say I use OLS with dummies instead of xtreg, fe. What should I report for R^2?

                          Thanks,
                          Mahtab



                          • #14
                            Mahtab:
                            1) if you go -xtreg, fe-, you should report the within R-squared;
                            2) if you go pooled OLS (I would not go this way if there's evidence of a panel-wise effect), there is only one R-squared.
                            Kind regards,
                            Carlo
                            (StataNow 18.5)
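On the R-squared question in #13: OLS with unit dummies and -xtreg, fe- leave the same residuals, but they measure them against different totals.

- The LSDV R-squared compares them with the total variation of y, including the between-unit differences absorbed by the dummies.
- The within R-squared compares them with the within-unit variation only.

A Python sketch on hypothetical data, where units differ mainly in their levels, shows the gap.

```python
from statistics import mean

# Hypothetical panel: (unit id, x, y); units differ mostly in their level of y
rows = [
    (1, 1.0, 10.2), (1, 2.0, 10.9), (1, 3.0, 12.1),
    (2, 1.5, 20.4), (2, 2.5, 20.8), (2, 3.5, 22.0),
    (3, 0.5, 30.1), (3, 1.0, 30.9), (3, 2.0, 31.3),
]
ids = sorted({i for i, _, _ in rows})
xbar = {i: mean(x for j, x, _ in rows if j == i) for i in ids}
ybar = {i: mean(y for j, _, y in rows if j == i) for i in ids}

# Within slope and residual sum of squares (the LSDV residuals are identical)
xd = [x - xbar[i] for i, x, _ in rows]
yd = [y - ybar[i] for i, _, y in rows]
b = sum(a * c for a, c in zip(xd, yd)) / sum(a * a for a in xd)
ssr = sum((v - b * u) ** 2 for u, v in zip(xd, yd))

# Within R^2: SSR relative to the variation of y around each unit's own mean
r2_within = 1 - ssr / sum(v * v for v in yd)

# LSDV R^2: same SSR, relative to the variation of y around the grand mean
ys = [y for _, _, y in rows]
gy = mean(ys)
r2_lsdv = 1 - ssr / sum((y - gy) ** 2 for y in ys)

print(f"within R2 = {r2_within:.3f}, LSDV R2 = {r2_lsdv:.3f}")
```

This is why the R-squared from OLS with firm dummies is typically very high, as noted in #8.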



                            • #15
                              Thanks, everyone.
