Slightly different estimators between Correlated Random Effects / Mundlak Regression and Fixed Effects?

Nikolaus Schueler

Join Date: Oct 2022
Posts: 16
#1


15 Jul 2024, 07:54

Dear all,

I have some questions regarding the Correlated Random Effects approach for panel data in comparison to the Fixed Effects Approach.

I am using a panel of 419 IDs over 15 years. I am investigating the effects of 13 time-varying independent variables (X1...X13) on a continuous outcome variable Y1, and I am planning to add time-constant variables later, which is why I am interested in the Mundlak approach. I therefore also computed the means of all 13 time-varying variables (prefix "m_").

As far as I know, the regression coefficients for time-varying variables obtained via Fixed Effects estimation are supposed to be (exactly?) the same as those from the CRE/Mundlak approach, and yet my results differ at least slightly, as you can see below:

Fixed Effects Regression:

Code:

 xtreg Y1 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13, fe vce(cluster ID)

Code:

Fixed-effects (within) regression               Number of obs     =      6,285
Group variable: ID                              Number of groups  =        419

R-sq:                                           Obs per group:
     within  = 0.5562                                         min =         15
     between = 0.0835                                         avg =       15.0
     overall = 0.3207                                         max =         15

                                                F(13,418)         =     180.30
corr(u_i, Xb)  = -0.3159                        Prob > F          =     0.0000

                                   (Std. Err. adjusted for 419 clusters in ID)
------------------------------------------------------------------------------
             |               Robust
          Y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X1 |   55.02444   6.379792     8.62   0.000     42.48397    67.56492
          X2 |    13.8213   3.903985     3.54   0.000     6.147415    21.49519
          X3 |   11.46342   3.493078     3.28   0.001     4.597231    18.32961
          X4 |   7.465484   4.077655     1.83   0.068     -.549781    15.48075
          X5 |   636.4464   326.3867     1.95   0.052    -5.117453     1278.01
          X6 |  -25.81329   8.716326    -2.96   0.003    -42.94658   -8.679995
          X7 |   13.20441   31.39781     0.42   0.674    -48.51287    74.92168
          X8 |  -2.024588   .6302402    -3.21   0.001    -3.263423   -.7857531
          X9 |    .004547    .006382     0.71   0.477    -.0079978    .0170917
         X10 |   -.005173   .0058291    -0.89   0.375     -.016631    .0062851
         X11 |  -280.4201   37.11896    -7.55   0.000    -353.3832   -207.4571
         X12 |   80.67299   22.88321     3.53   0.000     35.69249    125.6535
         X13 |   35.65685   13.67972     2.61   0.009      8.76724    62.54647
       _cons |   56.10956   21.37515     2.62   0.009     14.09339    98.12574
-------------+----------------------------------------------------------------
     sigma_u |  25.115418
     sigma_e |  20.710376
         rho |  .59524574   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Correlated RE/Mundlak:

Code:

xtreg Y1 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 m_X1 m_X2 m_X3 m_X4 m_X5 m_X6 m_X7 m_X8 m_X9 m_X10 m_X11 m_X12 m_X13, i(ID) vce(cluster ID) re

Code:

Random-effects GLS regression                   Number of obs     =      6,285
Group variable: ID                              Number of groups  =        419

R-sq:                                           Obs per group:
     within  = 0.5562                                         min =         15
     between = 0.3116                                         avg =       15.0
     overall = 0.4664                                         max =         15

                                                Wald chi2(26)     =    2639.61
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                                   (Std. Err. adjusted for 419 clusters in ID)
------------------------------------------------------------------------------
             |               Robust
          Y1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X1 |   55.09628    6.37843     8.64   0.000     42.59479    67.59778
          X2 |   13.86291   3.902933     3.55   0.000     6.213306    21.51252
          X3 |   11.50361   3.498475     3.29   0.001     4.646728     18.3605
          X4 |   7.485479   4.080035     1.83   0.067    -.5112435     15.4822
          X5 |   627.0565   323.8607     1.94   0.053     -7.69889    1261.812
          X6 |  -25.74439   8.706317    -2.96   0.003    -42.80846   -8.680323
          X7 |   14.46523     31.356     0.46   0.645    -46.99141    75.92186
          X8 |  -1.982969   .6405773    -3.10   0.002    -3.238477   -.7274601
          X9 |   .0043312   .0064786     0.67   0.504    -.0083665     .017029
         X10 |  -.0052707   .0057847    -0.91   0.362    -.0166086    .0060671
         X11 |  -280.7377   37.10206    -7.57   0.000    -353.4564    -208.019
         X12 |   81.23751   23.02698     3.53   0.000     36.10546    126.3696
         X13 |   35.68913   13.52256     2.64   0.008     9.185402    62.19286
        m_X1 |   63.07001   17.62765     3.58   0.000     28.52045    97.61956
        m_X2 |  -39.11101   10.73944    -3.64   0.000    -60.15992    -18.0621
        m_X3 |   7.172989   7.381349     0.97   0.331     -7.29419    21.64017
        m_X4 |  -9.116723   4.497211    -2.03   0.043    -17.93109   -.3023521
        m_X5 |   661.9385    433.975     1.53   0.127    -188.6368    1512.514
        m_X6 |  -48.76732   16.34738    -2.98   0.003     -80.8076   -16.72703
        m_X7 |   652.9672   296.4707     2.20   0.028      71.8953    1234.039
        m_X8 |   1.328787   .7064285     1.88   0.060    -.0557874    2.713361
        m_X9 |   .0169634   .0122089     1.39   0.165    -.0069655    .0408923
       m_X10 |   .0033445   .0064843     0.52   0.606    -.0093645    .0160534
       m_X11 |   368.3401   85.69027     4.30   0.000     200.3902    536.2899
       m_X12 |   -78.2951   43.22375    -1.81   0.070    -163.0121    6.421903
       m_X13 |  -43.49377   25.80072    -1.69   0.092    -94.06225    7.074716
       _cons |   74.06801   24.27393     3.05   0.002     26.49198     121.644
-------------+----------------------------------------------------------------
     sigma_u |  18.355635
     sigma_e |  20.710376
         rho |  .43994227   (fraction of variance due to u_i)
------------------------------------------------------------------------------

My questions:

1.) Are those slight differences between the estimation results from FE and those from CRE estimation a cause for concern?
2.) In the literature on CRE/Mundlak it is said that the correlated random-effects model relaxes the assumption of zero correlation between the entity-specific error and the time-varying variables (Schunck (2013): Within and between Estimates in Random-Effects Models: Advantages and Drawbacks of Correlated Random Effects and Hybrid Models). Looking at my results from the CRE estimation, how can I be sure that this is actually the case? That is, how can I be sure that the variable means actually "absorbed" this correlation?

I would be really glad if you could help me with this.

Best regards
Nikolaus

(Version: Stata 16)

Last edited by Nikolaus Schueler; 15 Jul 2024, 08:05.
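One way to see how far apart the two sets of estimates are is to store both fits and tabulate them side by side. The following is only a sketch: it assumes the variable names from #1 and that the data are already -xtset-, and the stored-estimate names fe_fit and cre_fit are made up for illustration.

```stata
* Sketch: compare FE and CRE/Mundlak coefficients side by side.
* fe_fit and cre_fit are hypothetical names for the stored results.
local xvars X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

* Build the list of mean variables (m_X1 ... m_X13) from the list above
local mvars
foreach v of local xvars {
    local mvars `mvars' m_`v'
}

xtreg Y1 `xvars', fe vce(cluster ID)
estimates store fe_fit

xtreg Y1 `xvars' `mvars', re i(ID) vce(cluster ID)
estimates store cre_fit

* Show only the time-varying regressors, with standard errors
estimates table fe_fit cre_fit, keep(`xvars') b(%12.6f) se(%12.6f)
```

If the mean variables are the ID-specific means taken over exactly the observations used in the FE regression, the two coefficient columns should agree up to numerical precision in a balanced panel like this one.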


Clyde Schechter

Join Date: Apr 2014

Posts: 30091
#2

15 Jul 2024, 09:07

Your emulation of the CRE model is not quite right. You do not merely add the mean value variables to the model. You also have to mean-center the original variables. After running your original -xtreg, fe- regression you do this:

Code:

foreach v of varlist Y1 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 {
    summ `v' if e(sample), meanonly
    gen `v'_m = r(mean)
    gen `v'_c = `v' - `v'_m
}
xtreg *_c *_m, re i(id) vce(cluster id)

Notice, and this is important, that in calculating the means and centered variables, you must do the calculations restricted to the estimation sample of the original regression. If you do them on the whole data set, but some of the data set was omitted from the -xtreg, fe- for whatever reason, you will get incorrect variables and incorrect results.

Actually, you can save yourself some trouble. Instead of doing all this "by hand," download the -xthybrid- command from SSC and use that--it implements all of this for you.
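For readers who have not used community-contributed commands before, installing from SSC is a one-liner. This sketch assumes a working internet connection; the available options are best checked in the help file rather than taken from here.

```stata
* Install the community-contributed -xthybrid- command from SSC
ssc install xthybrid

* Read its help file for syntax and options
help xthybrid
```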

Last edited by Clyde Schechter; 15 Jul 2024, 09:15.

Nikolaus Schueler

Join Date: Oct 2022
Posts: 16
#3

15 Jul 2024, 10:18

Dear Clyde,
thank you for your response!

As far as I understand it, there are two different kinds of models:

1.) The model you describe is referred to as the hybrid model which indeed uses the demeaned variables in addition to the mean value variables.
2.) The second one is referred to as the Correlated RE model which uses the original value variables. This one was - among others - described by Prof. Wooldridge in "Wooldridge (2013): Introductory Econometrics, A Modern Approach, pp. 497-499" (I don't know whether I am allowed to post screenshots from the chapter in here, which is why I do it like this). If I use the -xthybrid- command, there is an option called "cre" which actually calculates this exact model.

Code:

 xthybrid Y1 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13, clusterid(ID) cre vce(cluster ID) se t p star full

Code:

Model model
--------------------------------------------------------------------------------------------------------------------------------------------------------------------

Mixed-effects GLM                               Number of obs     =      6,285
Family:                Gaussian
Link:                  identity
Group variable:              ID                 Number of groups  =        419

                                                Obs per group:
                                                              min =         15
                                                              avg =       15.0
                                                              max =         15

Integration method: mvaghermite                 Integration pts.  =          7

                                                Wald chi2(26)     =    2622.53
Log pseudolikelihood = -28490.503               Prob > chi2       =     0.0000
                                   (Std. Err. adjusted for 419 clusters in ID)
------------------------------------------------------------------------------
             |               Robust
          Y1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       W__X1 |   55.02444   6.373189     8.63   0.000     42.53322    67.51567
       W__X2 |    13.8213   3.899945     3.54   0.000     6.177553    21.46506
       W__X3 |   11.46342   3.489463     3.29   0.001     4.624197    18.30264
       W__X4 |   7.465484   4.073435     1.83   0.067     -.518302    15.44927
       W__X5 |   636.4464   326.0489     1.95   0.051    -2.597774    1275.491
       W__X6 |  -25.81329   8.707305    -2.96   0.003    -42.87929   -8.747284
       W__X7 |    13.2044   31.36532     0.42   0.674    -48.27049     74.6793
       W__X8 |  -2.024588   .6295879    -3.22   0.001    -3.258558   -.7906185
       W__X9 |    .004547   .0063754     0.71   0.476    -.0079485    .0170425
      W__X10 |   -.005173   .0058231    -0.89   0.374     -.016586    .0062401
      W__X11 |  -280.4201   37.08054    -7.56   0.000    -353.0967   -207.7436
      W__X12 |   80.67299   22.85953     3.53   0.000     35.86915    125.4768
      W__X13 |   35.65686   13.66556     2.61   0.009     8.872846    62.44086
       D__X1 |   60.99144   17.24184     3.54   0.000     27.19806    94.78482
       D__X2 |  -38.48039    10.7337    -3.59   0.000    -59.51805   -17.44273
       D__X3 |    6.23338   7.146347     0.87   0.383    -7.773203    20.23996
       D__X4 |  -8.892025    4.48763    -1.98   0.048    -17.68762   -.0964317
       D__X5 |    568.623   440.7778     1.29   0.197    -295.2857    1432.532
       D__X6 |  -45.20005   16.16263    -2.80   0.005    -76.87822   -13.52187
       D__X7 |   718.6223   326.4671     2.20   0.028      78.7585    1358.486
       D__X8 |   1.474413   .7388249     2.00   0.046     .0263432    2.922484
       D__X9 |   .0083508   .0107495     0.78   0.437    -.0127179    .0294195
      D__X10 |   .0024656   .0063364     0.39   0.697    -.0099536    .0148847
      D__X11 |   382.5769   88.65636     4.32   0.000     208.8136    556.3402
      D__X12 |  -65.78881   40.41732    -1.63   0.104    -145.0053    13.42769
      D__X13 |  -39.10997   25.56868    -1.53   0.126    -89.22366    11.00373
       _cons |   80.27785   22.84029     3.51   0.000     35.51171     125.044
-------------+----------------------------------------------------------------
ID           |
   var(_cons)|   332.8481   47.42801                      251.7429    440.0835
-------------+----------------------------------------------------------------
    var(e.Y1)|   427.9691   47.59812                      344.1462    532.2086
------------------------------------------------------------------------------

Now the results for the time-varying variables are exactly the same as those from the FE regression, and yet I don't know which mistake I made in my "original", self-developed CRE.

Furthermore, can you tell me how I can find out whether the addition of mean values variables absorbs the correlation between the regressors and the entity-specific error?

Thanks and best regards!

Last edited by Nikolaus Schueler; 15 Jul 2024, 10:34.
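The hybrid and CRE specifications discussed in this post are reparameterisations of each other. Writing Xbar_i for the ID-specific mean, the hybrid model is Y_it = b_W*(X_it - Xbar_i) + b_B*Xbar_i + ..., while the CRE model is Y_it = b_W*X_it + d*Xbar_i + ..., so that d = b_B - b_W. A minimal sketch with a single regressor follows; it assumes m_X1 holds the ID-specific mean of X1 over the estimation sample, and c_X1 is a hypothetical name for the within-ID deviation.

```stata
* Sketch with one regressor for brevity.
* Assumption: m_X1 is the ID-specific mean of X1 over the estimation sample.
* c_X1 is a hypothetical name for the deviation from that mean.
generate double c_X1 = X1 - m_X1

* Hybrid form: within-ID deviation plus ID mean
xtreg Y1 c_X1 m_X1, re i(ID) vce(cluster ID)

* Between effect minus within effect
lincom m_X1 - c_X1

* CRE/Mundlak form: original variable plus ID mean
xtreg Y1 X1 m_X1, re i(ID) vce(cluster ID)
```

In the second regression, the coefficient on X1 should equal the coefficient on c_X1 from the first, and the coefficient on m_X1 should equal the -lincom- result. This is consistent with the "D__" prefix in the -xthybrid, cre- output above.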


Clyde Schechter

Join Date: Apr 2014

Posts: 30091
#4

15 Jul 2024, 11:41

Quote:

1.) The model you describe is referred to as the hybrid model which indeed uses the demeaned variables in addition to the mean value variables.
2.) The second one is referred to as the Correlated RE model which uses the original value variables. This one was - among others - described by Prof. Wooldridge in "Wooldridge (2013): Introductory Econometrics, A Modern Approach, pp. 497-499" (I don't know whether I am allowed to post screenshots from the chapter in here, which is why I do it like this). If I use the -xthybrid- command, there is an option called "cre" which actually calculates this exact model.

Yes, sorry, I confused the two.

Quote:

Furthermore, can you tell me how I can find out whether the addition of mean values variables absorbs the correlation between the regressors and the entity-specific error?

I'm not sure what you mean by "absorbs" the correlation.

Quote:

I don't know which mistake I made in my "original", self-developed CRE.

There are two common mistakes made doing this. You might not have restricted the mean calculations to the estimation sample of the original fixed-effects model. In that situation, the means would be incorrect, though probably fairly close, and could account for the kind of results you found. The other common mistake is the one I made in the code I showed in #2: there I calculated the overall means in the estimation sample instead of group-specific means. (I should not have used -summarize-; I should have used -by group_id: egen `v'_mean = mean(`v') if e(sample)-.) Here things will usually go further awry, but sometimes they come out fairly close if there is relatively little dispersion among the group-specific means.
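Putting the two points above together, a hand-rolled CRE/Mundlak regression could look like the sketch below. It is not a definitive implementation: it assumes the variable names from #1, and insample and the gm_ prefix are hypothetical names chosen so as not to clash with the existing m_ variables.

```stata
* Sketch: group-specific means computed only over the FE estimation sample.
* insample and gm_* are hypothetical names.
local xvars X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

* Step 1: run the FE regression and remember which observations it used
xtreg Y1 `xvars', fe vce(cluster ID)
generate byte insample = e(sample)

* Step 2: ID-specific means, restricted to that sample
local gmeans
foreach v of local xvars {
    bysort ID: egen double gm_`v' = mean(`v') if insample
    local gmeans `gmeans' gm_`v'
}

* Step 3: RE regression on the original variables plus their ID means
xtreg Y1 `xvars' `gmeans' if insample, re i(ID) vce(cluster ID)
```

Saving e(sample) in a variable right after the FE regression keeps the sample definition available even after later estimation commands overwrite the stored results.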

Nikolaus Schueler

Join Date: Oct 2022

Posts: 16
#5

15 Jul 2024, 13:17

Dear Clyde,

Quote:

There are two common mistakes made doing this. You might not have restricted the mean calculations to the estimation sample of the original fixed-effects model. In that situation, the means would be incorrect, though probably fairly close, and could account for the kind of results you found. The other common mistake is the one I made in the code I showed in #2: there I calculated the overall means in the estimation sample instead of group-specific means. (I should not have used -summarize-; I should have used -by group_id: egen `v'_mean = mean(`v') if e(sample)-.) Here things will usually go further awry, but sometimes they come out fairly close if there is relatively little dispersion among the group-specific means.

You were right, I calculated the means over the wrong time period! Now it works out!

Quote:

I'm not sure what you mean by "absorbs" the correlation.

The random-effects model assumes that the omitted heterogeneity is uncorrelated with the regressors, which is quite unlikely in many circumstances, and in my case the Hausman test clearly indicates that there is correlation between the regressors and the entity-specific effect. The CRE approach claims that adding the means of the time-varying variables takes care of this problem. But is there any way to find out whether - after implementing the means - the results are no longer biased by some remaining correlation?

Last edited by Nikolaus Schueler; 15 Jul 2024, 13:42.

Clyde Schechter

Join Date: Apr 2014

Posts: 30091
#6

15 Jul 2024, 15:14

Quote:

But is there any way to find out whether - after implementing the means - the results are no longer biased by some remaining correlation?

The use of the CRE model assures that there is no correlation between the residuals or random intercepts and the regressors. But the possibility always remains that there is correlation to unmeasured variables. That possibility is only eliminated in randomized studies or in "experiments of nature" where the exposure variable is compellingly exogenous.
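Related to the Hausman test mentioned in #5: the CRE regression offers a regression-based counterpart, namely a joint test that the coefficients on all the mean variables are zero, which also works with cluster-robust standard errors. The sketch below assumes the variable names from #1 and correctly computed m_ variables. It only indicates whether the means matter, that is, whether plain RE without the means would be adequate; it cannot rule out correlation with unmeasured time-varying factors.

```stata
* Sketch: joint test of the mean variables after the CRE regression.
local xvars X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13

* Build the list m_X1 ... m_X13
local mvars
foreach v of local xvars {
    local mvars `mvars' m_`v'
}

xtreg Y1 `xvars' `mvars', re i(ID) vce(cluster ID)

* H0: all coefficients on the mean variables are zero
testparm `mvars'
```

A small p-value here points in the same direction as a rejecting Hausman test: the ID-level heterogeneity is correlated with the regressors, so the means (or fixed effects) are needed.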

Nikolaus Schueler

Join Date: Oct 2022

Posts: 16
#7

15 Jul 2024, 23:25

Thank you, that helped a lot!
