Why is FE vs RE even an issue?

Mahtab Karimi

Join Date: Jul 2023

Posts: 25
#1

05 Jul 2023, 21:08

Hi all,

I am really confused about fixed-effects vs random-effects estimators. Aren't they explaining two different things?
Why don't people report their FE estimates, interpret them as the average change within a given unit, and then use RE to explain the overall change?
Why do people even use the Hausman test? Why do we compare the results of these two estimators when they are completely different?

Thank you,
Mahtab

Last edited by Mahtab Karimi; 05 Jul 2023, 21:13.
Tags: fixed effects, panel data, random effec
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17713
#2

05 Jul 2023, 22:29

Mahtab:
it's a matter of consistency and efficiency.
The -fe- estimator is always consistent (given strictly exogenous regressors), but it is inefficient if -re- is the way to go.
The -re- estimator is inconsistent if -fe- is the way to go.
In addition, the -fe- estimator exploits only within-panel variation, whereas the -re- estimator also exploits between-panel variation (it combines the within and between estimators). Therefore, their aims are actually different, but what matters is how well they fit your data.
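A quick way to see the consistency point is a small simulation. The sketch below is written in plain Python rather than Stata. The data-generating process is entirely hypothetical: a true slope of 1, with the panel effect c_i correlated with x.

It compares three slopes:
- the within (-fe-) slope;
- the between (-be-) slope;
- pooled OLS.

Roughly speaking, -re- is a weighted combination of the within and between estimators, so in this setting it inherits part of the between estimator's bias.

```python
import random


def simulate_panel(n_units=2000, n_periods=5, beta=1.0, corr=0.8, seed=42):
    """Hypothetical DGP: y_it = beta*x_it + c_i + u_it, with x_it correlated with c_i."""
    rng = random.Random(seed)
    rows = []  # (unit, x, y)
    for i in range(n_units):
        c = rng.gauss(0, 1)  # unobserved panel effect
        for _ in range(n_periods):
            x = corr * c + rng.gauss(0, 1)
            rows.append((i, x, beta * x + c + rng.gauss(0, 1)))
    return rows


def unit_means(rows):
    """Return {unit: (mean of x, mean of y)}."""
    acc = {}
    for i, x, y in rows:
        sx, sy, n = acc.get(i, (0.0, 0.0, 0))
        acc[i] = (sx + x, sy + y, n + 1)
    return {i: (sx / n, sy / n) for i, (sx, sy, n) in acc.items()}


def slope(xs, ys):
    """OLS slope of ys on xs with an intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return sxy / sum((a - mx) ** 2 for a in xs)


rows = simulate_panel()
means = unit_means(rows)

# Within (-fe-): regress unit-demeaned y on unit-demeaned x (no intercept needed)
xd = [x - means[i][0] for i, x, _ in rows]
yd = [y - means[i][1] for i, _, y in rows]
b_fe = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

# Between (-be-): regression on the unit means
b_be = slope([m[0] for m in means.values()], [m[1] for m in means.values()])

# Pooled OLS, ignoring the panel structure
b_pooled = slope([x for _, x, _ in rows], [y for _, _, y in rows])

print(f"within: {b_fe:.3f}  between: {b_be:.3f}  pooled: {b_pooled:.3f}")
```

With these settings the within slope should land near the true value of 1. The between and pooled slopes are pulled upward by the correlation between x and c_i.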

Kind regards,
Carlo
(Stata 19.0)
Mahtab Karimi

Join Date: Jul 2023

Posts: 25
#3

05 Jul 2023, 23:07

Thank you so much, Carlo!

So which one is preferred in academic research? I mean, between consistency and efficiency.
And is it common to report both FE and RE?

Best,
Mahtab
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17713
#4

05 Jul 2023, 23:53

Mahtab:
the decision is not trivial, as highlighted in Hsiao C. Analysis of Panel Data. Third Edition. New York: Cambridge University Press, 2014:47-48. The same source wisely warns readers about the expected difference between the coefficients of the two specifications.
The choice should go to the estimator that fits your data better than the alternative specification.
Usually, the -fe- estimator is fitted first, followed by the -re- one.
The two regressions are then compared via -hausman- (which has its own drawbacks, though; for instance, it does not support non-default standard errors).
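For reference, here is the classical Hausman statistic as it is usually stated in textbooks. It contrasts the two coefficient vectors over the k time-varying regressors they share, under the null that the panel effect is uncorrelated with the regressors:

```latex
H = (\hat{b}_{FE} - \hat{b}_{RE})^{\prime}
    \left[ \widehat{\mathrm{Var}}(\hat{b}_{FE}) - \widehat{\mathrm{Var}}(\hat{b}_{RE}) \right]^{-1}
    (\hat{b}_{FE} - \hat{b}_{RE})
    \;\overset{a}{\sim}\; \chi^2_{k}
```

The variance of the difference reduces to the difference of the variances only because -re- is assumed fully efficient under the null. Once standard errors are made robust or clustered, that efficiency premise no longer holds, which is one way to see why -hausman- does not accept non-default standard errors. In finite samples the bracketed matrix can also fail to be positive definite.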
Sometimes, you may want to take the Mundlak approach and test whether the main assumption of the -re- estimator (that is, that the panel-wise effect is uncorrelated with the vector of regressors) holds.
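Below is a minimal sketch of the Mundlak idea in plain Python. It uses a hypothetical unbalanced panel in which x is correlated with the panel effect, and adds each unit's mean of x to a pooled regression.

Two things follow:
- the coefficient on x reproduces the within (-fe-) estimate exactly;
- the coefficient on the unit mean picks up the correlation that the -re- assumption rules out.

In practice the test is a (cluster-robust) Wald test that the mean coefficients are zero. This sketch only computes the point estimates. The helper names (`make_panel`, `solve`, `ols`) are made up for illustration.

```python
import random


def make_panel(n_units=500, beta=0.5, corr=0.8, seed=1):
    """Hypothetical unbalanced panel (2 to 7 periods per unit); x is correlated with c_i."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_units):
        c = rng.gauss(0, 1)
        for _ in range(rng.randint(2, 7)):
            x = corr * c + rng.gauss(0, 1)
            rows.append((i, x, beta * x + c + rng.gauss(0, 1)))
    return rows


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[r][n] / m[r][r] for r in range(n)]


def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yv for row, yv in zip(X, y)) for p in range(k)]
    return solve(xtx, xty)


rows = make_panel()
acc = {}
for i, x, yv in rows:
    sx, sy, n = acc.get(i, (0.0, 0.0, 0))
    acc[i] = (sx + x, sy + yv, n + 1)
xbar = {i: sx / n for i, (sx, _, n) in acc.items()}
ybar = {i: sy / n for i, (_, sy, n) in acc.items()}

# Mundlak regression: pooled OLS of y on a constant, x_it and the unit mean of x
X = [[1.0, x, xbar[i]] for i, x, _ in rows]
Y = [yv for _, _, yv in rows]
_, b_x, b_xbar = ols(X, Y)

# Within (-fe-) slope for comparison
xd = [x - xbar[i] for i, x, _ in rows]
yd = [yv - ybar[i] for i, _, yv in rows]
b_fe = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

print(f"Mundlak b_x = {b_x:.6f}, within b = {b_fe:.6f}, b_xbar = {b_xbar:.3f}")
```

The equality of b_x and the within slope is algebraic, not a feature of this particular draw. The deviations x_it - xbar_i are orthogonal to both the constant and xbar_i, even in an unbalanced panel.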
Both consistency and efficiency are relevant (the former for the point estimates, the latter for the standard errors and related stuff).
Finally, from time to time you may find papers reporting, say, -fe- in the baseline analysis and -re- as a sort of sensitivity analysis. Setting aside papers that present both specifications for teaching purposes, this methodological choice may be difficult to justify in real-world research.

Kind regards,
Carlo
(Stata 19.0)

Mahtab Karimi

Join Date: Jul 2023

Posts: 25
#5

06 Jul 2023, 01:16

Thank you so much, Carlo!
Below is what I actually have; I would appreciate any comments.

I have an unbalanced panel of firms observed over 7 years, and I aim to explain a measure of annual expenditure as a function of the firms' characteristics. Here is what I get:

Code:

.             xtreg log_EXP index log_S_insiders log_wage_emp log_Q  lver_N log_assets i.industry, be

Between regression (regression on group means)  Number of obs     =      1,337
Group variable: cik                             Number of groups  =        215

R-squared:                                      Obs per group:
     Within  = 0.0612                                         min =          2
     Between = 0.5739                                         avg =        6.2
     Overall = 0.5036                                         max =          7

                                                F(44,170)         =       5.20
sd(u_i + avg(e_i.)) = 1.05243                   Prob > F          =     0.0000

--------------------------------------------------------------------------------
       log_EXP | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
         index |   .2308241    .136832     1.69   0.093    -.0392845    .5009326
log_S_insiders |   .0615516   .0726994     0.85   0.398    -.0819583    .2050615
  log_wage_emp |   1.133431    .366029     3.10   0.002     .4108832    1.855978
         log_Q |   .9710553   .2026017     4.79   0.000     .5711162    1.370994
        lver_N |  -.4491829   .4566272    -0.98   0.327    -1.350573    .4522069
    log_assets |   .2211069   .0797908     2.77   0.006     .0635985    .3786154
               |
      industry |
           -------------------------------------------------------

               |
         _cons |  -8.008713    4.20781    -1.90   0.059      -16.315    .2975735
--------------------------------------------------------------------------------




   xtreg log_EXP index log_S_insiders log_wage_emp log_Q  lver_N log_assets i.DataYearFiscal  , fe vce(cluster cik)

Fixed-effects (within) regression               Number of obs     =      1,337
Group variable: cik                             Number of groups  =        215

R-squared:                                      Obs per group:
     Within  = 0.1604                                         min =          2
     Between = 0.2169                                         avg =        6.2
     Overall = 0.1958                                         max =          7

                                                F(12,214)         =      11.08
corr(u_i, Xb) = 0.2910                          Prob > F          =     0.0000

                                    (Std. err. adjusted for 215 clusters in cik)
--------------------------------------------------------------------------------
               |               Robust
       log_EXP | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
         index |  -.0069452   .0086144    -0.81   0.421    -.0239252    .0100348
log_S_insiders |  -.0172232   .0091188    -1.89   0.060    -.0351974     .000751
  log_wage_emp |   .4382606   .1847722     2.37   0.019     .0740541     .802467
         log_Q |   .2021079   .0474071     4.26   0.000     .1086632    .2955526
        lver_N |  -.2445439   .1333582    -1.83   0.068    -.5074076    .0183199
    log_assets |  -.0232125   .0616467    -0.38   0.707     -.144725       .0983
               |
DataYearFiscal |
      -------------------------------------------------------------
               |
         _cons |   2.424077   1.873559     1.29   0.197    -1.268917    6.117071
---------------+----------------------------------------------------------------
       sigma_u |  1.3461253
       sigma_e |  .27281735
           rho |  .96054599   (fraction of variance due to u_i)
--------------------------------------------------------------------------------



   xtreg log_EXP index log_S_insiders log_wage_emp log_Q  lver_N log_assets i.DataYearFiscal  , re vce(cluster cik)

Random-effects GLS regression                   Number of obs     =      1,337
Group variable: cik                             Number of groups  =        215

R-squared:                                      Obs per group:
     Within  = 0.1576                                         min =          2
     Between = 0.2273                                         avg =        6.2
     Overall = 0.2118                                         max =          7

                                                Wald chi2(12)     =     146.93
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

                                    (Std. err. adjusted for 215 clusters in cik)
--------------------------------------------------------------------------------
               |               Robust
       log_EXP | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
---------------+----------------------------------------------------------------
         index |   -.002204   .0085652    -0.26   0.797    -.0189915    .0145836
log_S_insiders |  -.0153687   .0091181    -1.69   0.092    -.0332398    .0025025
  log_wage_emp |   .6626753   .1777452     3.73   0.000      .314301     1.01105
         log_Q |   .2452634   .0462054     5.31   0.000     .1547026    .3358242
        lver_N |  -.2655674   .1302892    -2.04   0.042    -.5209295   -.0102053
    log_assets |  -.0118288   .0492383    -0.24   0.810    -.1083341    .0846765
               |
DataYearFiscal |
      
               |
         _cons |   -.235188   1.883227    -0.12   0.901    -3.926246     3.45587
---------------+----------------------------------------------------------------
       sigma_u |   1.184759
       sigma_e |  .27281735
           rho |  .94964471   (fraction of variance due to u_i)
--------------------------------------------------------------------------------

.                                 xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        log_EXP[cik,t] = Xb + u[cik] + e[cik,t]

        Estimated results:
                         |       Var     SD = sqrt(Var)
                ---------+-----------------------------
                 log_EXP |   2.053936       1.433156
                       e |   .0744293       .2728174
                       u |   1.403654       1.184759

        Test: Var(u) = 0
                             chibar2(01) =  2821.09
                          Prob > chibar2 =   0.0000


   xtreg log_EXP index log_S_insiders log_wage_emp log_Q lver_N log_assets i.DataYearFiscal i.industry, re vce(cluster cik)

Random-effects GLS regression                   Number of obs     =      1,337
Group variable: cik                             Number of groups  =        215

R-squared:                                      Obs per group:
     Within  = 0.1587                                         min =          2
     Between = 0.5022                                         avg =        6.2
     Overall = 0.4660                                         max =          7

                                                Wald chi2(39)     =          .
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =          .

                                    (Std. err. adjusted for 215 clusters in cik)
--------------------------------------------------------------------------------
               |               Robust
       log_EXP | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
---------------+----------------------------------------------------------------
         index |  -.0041962   .0085993    -0.49   0.626    -.0210505    .0126582
log_S_insiders |  -.0164125    .009219    -1.78   0.075    -.0344813    .0016564
  log_wage_emp |   .5394175   .1758741     3.07   0.002     .1947107    .8841243
         log_Q |   .2352516   .0474967     4.95   0.000     .1421598    .3283435
        lver_N |  -.2755942   .1315859    -2.09   0.036    -.5334979   -.0176905
    log_assets |   .0176353   .0520406     0.34   0.735    -.0843625     .119633
               |
DataYearFiscal |
       ---------------------------------------------
               |
      industry |
        ---------------------------------------------
               |
         _cons |   1.797011   1.813508     0.99   0.322    -1.757398    5.351421
---------------+----------------------------------------------------------------
       sigma_u |  1.0429924
       sigma_e |  .27281735
           rho |  .93596171   (fraction of variance due to u_i)
--------------------------------------------------------------------------------

.                                 xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        log_EXP[cik,t] = Xb + u[cik] + e[cik,t]

        Estimated results:
                         |       Var     SD = sqrt(Var)
                ---------+-----------------------------
                 log_EXP |   2.053936       1.433156
                       e |   .0744293       .2728174
                       u |   1.087833       1.042992

        Test: Var(u) = 0
                             chibar2(01) =  2849.50
                          Prob > chibar2 =   0.0000

Finally, I also ran a Mundlak test on my -re- estimates, and it points to the FE estimator. If I go with FE, I can't show the effect of the industry dummies on my dependent variable, which could be interesting and important.
Should I try to find a better fit for the FE?

(dummy variables are not reported here)

Sorry it got too long. Thank you,
Mahtab

Last edited by Mahtab Karimi; 06 Jul 2023, 01:24.


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17713
#6

06 Jul 2023, 01:27

Mahtab:
1) I'd set aside the -be- estimator, as it has very limited practical use;
2) you may want to test whether -re- is the way to go with the community-contributed module -xtoverid-;
3) if -fe- is the way to go and -i.industry- is time-invariant, there's nothing you can do but live without the -i.industry- coefficients. That said, as the -fe- estimator focuses on within-panel variation, by definition a within-panel constant (that is, a time-invariant predictor) plays no role in explaining the variation of the regressand.
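The within transformation can be sketched in a few lines of Python to show why a time-invariant regressor drops out. The firms, the `manufacturing` dummy and `log_sales` are hypothetical values, not the variables from this thread:

```python
def within_transform(unit_ids, values):
    """Subtract each unit's own mean (the 'within' transformation used by -fe-)."""
    sums, counts = {}, {}
    for i, v in zip(unit_ids, values):
        sums[i] = sums.get(i, 0.0) + v
        counts[i] = counts.get(i, 0) + 1
    return [v - sums[i] / counts[i] for i, v in zip(unit_ids, values)]


# Hypothetical firms: 'A' observed for 3 years, 'B' for 2 years
firm = ["A", "A", "A", "B", "B"]
manufacturing = [1, 1, 1, 0, 0]        # time-invariant industry dummy
log_sales = [2.0, 2.5, 3.1, 1.2, 1.0]  # time-varying regressor

print(within_transform(firm, manufacturing))  # [0.0, 0.0, 0.0, 0.0, 0.0]
print(within_transform(firm, log_sales))
```

Once demeaned, the dummy is a column of zeros, so -xtreg, fe- has nothing left from which to estimate its coefficient.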

Kind regards,
Carlo
(Stata 19.0)
Mahtab Karimi

Join Date: Jul 2023

Posts: 25
#7

06 Jul 2023, 08:58

Thank you so much.
Mahtab Karimi

Join Date: Jul 2023

Posts: 25
#8

20 Jul 2023, 12:25

I have a follow-up question on this post and would appreciate folks' help.

What happens in terms of efficiency and consistency if I use a simple OLS with an indicator variable for each firm instead of an FE model? Aside from the super-high R^2, what else will change?

Thanks,
Mahtab
Maarten Buis

Join Date: Mar 2014

Posts: 3459
#9

21 Jul 2023, 00:36

What stops you from trying?

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17713
#10

21 Jul 2023, 00:45

Mahtab:
the difference in computational time between -regress- with the panel variable entered as a categorical variable vs -xtreg,fe- (I did that comparison using "https://www.stata-press.com/data/r17/nlswork.dta" some years ago and had to stop -regress- after many hours without any results) is the first issue that springs to my mind.
In addition, -xtreg,fe- with default standard errors provides detail on -sigma_e- and -sigma_u- along with -rho-.
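For the record, -rho- is the share of the composite error variance that is due to the panel effect:

rho = sigma_u^2 / (sigma_u^2 + sigma_e^2)

Below is a one-line Python check against the values printed in the -xtreg- output earlier in the thread:

```python
def rho(sigma_u, sigma_e):
    """Fraction of the composite error variance due to the panel effect u_i."""
    return sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)


# sigma_u and sigma_e as printed in the -xtreg, fe- and -xtreg, re- output in this thread
print(round(rho(1.3461253, 0.27281735), 6))  # compare with rho = .96054599 (fe)
print(round(rho(1.184759, 0.27281735), 6))   # compare with rho = .94964471 (re)
```

Note that sigma_u, and hence rho, differs between the -fe- and -re- fits, while sigma_e is the same in both outputs.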

Kind regards,
Carlo
(Stata 19.0)
Mahtab Karimi

Join Date: Jul 2023

Posts: 25
#11

21 Jul 2023, 13:17

Thank you, Maarten and Carlo,

@Maarten, I tried, and the coefficients are almost the same. I was wondering whether I am missing something theoretically.
@Carlo, I do not have a computational problem; I was worried about whether the estimated R-squareds are reliable. I guess they are not. Is reporting -rho- valuable?

Also, after using an FE model, if someone asks me, "Are you controlling for the panel fixed effects?", I should say yes, right? But I can't provide any estimate of them.

Thanks,
Mahtab

Last edited by Mahtab Karimi; 21 Jul 2023, 13:20.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2175
#12

21 Jul 2023, 16:53

My view is that, in the case with many units and small T, there is one model: the unobserved effects model, which I write as y(i,t) = x(i,t)*b + c(i) + u(i,t). The unobserved variable c(i) does not change over time. If we maintain exogeneity of the x(i,t) with respect to the u(i,s) in all time periods s, the choice between RE and FE essentially comes down to whether c(i) is correlated with (some elements of) x(i,t). If there is correlation, FE is preferred, as the within transformation removes c(i). For this reason, FE is typically preferred for estimating causal effects. However, it's possible that, with good controls, what is left in c(i) can be uncorrelated with x(i,t). If the FE estimates are imprecise, that's when one usually tries to see if RE is sufficient. That's where the Hausman test comes in (preferably the robust version of it). The default really should be fixed effects, as it's most convincing for estimating causal effects (unless you know the treatment is effectively randomized). Sometimes FE and RE are the same. But if they aren't, and you opt for RE, you'd better have a good story, and statistics to back it up.
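Here are the model and the within transformation behind -xtreg, fe-, written in the same notation. T_i periods are allowed for unit i, since the panel discussed in this thread is unbalanced.

```latex
\begin{aligned}
y_{it} &= x_{it} b + c_i + u_{it}, \qquad t = 1, \dots, T_i \\
\bar{y}_i &= \bar{x}_i b + c_i + \bar{u}_i, \qquad \bar{y}_i = T_i^{-1} \textstyle\sum_{t=1}^{T_i} y_{it} \\
y_{it} - \bar{y}_i &= (x_{it} - \bar{x}_i) b + (u_{it} - \bar{u}_i)
\end{aligned}
```

Subtracting the second line from the first eliminates c_i, which is why FE does not need c_i to be uncorrelated with x_{it}. RE instead assumes that correlation away and keeps c_i + u_{it} as a composite error.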

You should get the same estimates when you put in the unit-specific dummies and use OLS as when using xtreg, fe. The standard errors won't be exactly the same (probably). It's the same estimator, but conventions differ about computing standard errors (though they should be close, especially if you vce(cluster id)).
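The equivalence can be checked numerically. The Python sketch below uses hypothetical data and a tiny Gauss-Jordan solver written for this example rather than any library routine. It fits OLS with one dummy per unit and compares the slope with the within estimator. Only point estimates are compared; as noted above, the standard errors can differ because of degrees-of-freedom conventions.

```python
import random


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[r][n] / m[r][r] for r in range(n)]


def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yv for row, yv in zip(X, y)) for p in range(k)]
    return solve(xtx, xty)


# Hypothetical unbalanced panel: 6 units, 2 to 7 periods each
rng = random.Random(7)
rows = []
for i in range(6):
    c = rng.gauss(0, 2)
    for _ in range(rng.randint(2, 7)):
        x = 0.5 * c + rng.gauss(0, 1)
        rows.append((i, x, 0.3 * x + c + rng.gauss(0, 0.5)))

units = sorted({i for i, _, _ in rows})

# LSDV: OLS of y on x plus one dummy per unit (the dummies replace the constant)
X = [[x] + [1.0 if i == u else 0.0 for u in units] for i, x, _ in rows]
Y = [yv for _, _, yv in rows]
b_lsdv = ols(X, Y)[0]

# Within estimator: demean x and y unit by unit, then regress without a constant
acc = {}
for i, x, yv in rows:
    sx, sy, n = acc.get(i, (0.0, 0.0, 0))
    acc[i] = (sx + x, sy + yv, n + 1)
xd = [x - acc[i][0] / acc[i][2] for i, x, _ in rows]
yd = [yv - acc[i][1] / acc[i][2] for i, _, yv in rows]
b_within = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

print(f"LSDV slope = {b_lsdv:.10f}, within slope = {b_within:.10f}")
```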
Mahtab Karimi

Join Date: Jul 2023

Posts: 25
#13

21 Jul 2023, 20:05

Thank you so much for the very helpful explanation.
Let's say I use OLS with dummies instead of -xtreg, fe-. What should I report for R^2?

Thanks,
Mahtab
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17713
#14

22 Jul 2023, 00:54

Mahtab:
1) if you go -xtreg,fe-, you should report the within Rsq;
2) if you go pooled OLS (I would not go this way if there's evidence of a panel-wise effect), there is only one Rsq.
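OLS with unit dummies and -xtreg, fe- give the same slope, so they also leave the same residuals (a consequence of the Frisch-Waugh-Lovell theorem). The two R-squareds therefore differ only in the denominator:
- the within R-squared divides by the variation of y around each unit's own mean;
- the dummy-variable R-squared divides by the variation around the grand mean, which includes all the between-unit differences the dummies absorb.

Below is a hypothetical Python illustration with large firm effects:

```python
import random

# Hypothetical panel: 50 firms, 2 to 7 years each, large firm effects c
rng = random.Random(3)
rows = []
for i in range(50):
    c = rng.gauss(0, 3)
    for _ in range(rng.randint(2, 7)):
        x = rng.gauss(0, 1)
        rows.append((i, x, 0.4 * x + c + rng.gauss(0, 1)))

acc = {}
for i, x, yv in rows:
    sx, sy, n = acc.get(i, (0.0, 0.0, 0))
    acc[i] = (sx + x, sy + yv, n + 1)
xbar = {i: sx / n for i, (sx, _, n) in acc.items()}
ybar = {i: sy / n for i, (_, sy, n) in acc.items()}

# Within regression on demeaned data; its residuals equal the LSDV residuals
xd = [x - xbar[i] for i, x, _ in rows]
yd = [yv - ybar[i] for i, _, yv in rows]
b = sum(a * v for a, v in zip(xd, yd)) / sum(a * a for a in xd)
ssr = sum((v - b * a) ** 2 for a, v in zip(xd, yd))

y_all = [yv for _, _, yv in rows]
grand = sum(y_all) / len(y_all)
r2_within = 1 - ssr / sum(v * v for v in yd)                  # variation around unit means
r2_lsdv = 1 - ssr / sum((v - grand) ** 2 for v in y_all)      # variation around grand mean
print(f"within R2 = {r2_within:.3f}, LSDV R2 = {r2_lsdv:.3f}")
```

The dummy-variable R-squared is inflated by the firm effects themselves. As far as we understand it, the within R-squared that -xtreg, fe- reports is computed from the demeaned data in this spirit; the Stata manual gives the exact definition.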

Kind regards,
Carlo
(Stata 19.0)
Mahtab Karimi

Join Date: Jul 2023

Posts: 25
#15

22 Jul 2023, 08:47

Thanks, everyone.