  • Slightly different estimators between Correlated Random Effects / Mundlak Regression and Fixed Effects?

    Dear all,

    I have some questions regarding the Correlated Random Effects approach for panel data in comparison to the Fixed Effects Approach.

    I am using a panel of 419 IDs over 15 years. I am investigating the effects of 13 time-varying independent variables (X1...X13) on a continuous outcome variable Y1, and I am also planning to include time-constant variables, which is why I am interested in using the Mundlak approach. I therefore additionally computed the ID-specific means of all 13 time-varying variables (prefix "m_").

    As far as I know, the regression coefficients for time-varying variables obtained via Fixed Effects estimation are supposed to be (exactly?) the same as for the CRE/Mundlak approach, and yet my results seem to differ at least slightly, as you can see below:

    Fixed Effects Regression:

    Code:
     xtreg Y1 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13, fe vce(cluster ID)
    Code:
    Fixed-effects (within) regression               Number of obs     =      6,285
    Group variable: ID                              Number of groups  =        419
    
    R-sq:                                           Obs per group:
         within  = 0.5562                                         min =         15
         between = 0.0835                                         avg =       15.0
         overall = 0.3207                                         max =         15
    
                                                    F(13,418)         =     180.30
    corr(u_i, Xb)  = -0.3159                        Prob > F          =     0.0000
    
                                       (Std. Err. adjusted for 419 clusters in ID)
    ------------------------------------------------------------------------------
                 |               Robust
              Y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              X1 |   55.02444   6.379792     8.62   0.000     42.48397    67.56492
              X2 |    13.8213   3.903985     3.54   0.000     6.147415    21.49519
              X3 |   11.46342   3.493078     3.28   0.001     4.597231    18.32961
              X4 |   7.465484   4.077655     1.83   0.068     -.549781    15.48075
              X5 |   636.4464   326.3867     1.95   0.052    -5.117453     1278.01
              X6 |  -25.81329   8.716326    -2.96   0.003    -42.94658   -8.679995
              X7 |   13.20441   31.39781     0.42   0.674    -48.51287    74.92168
              X8 |  -2.024588   .6302402    -3.21   0.001    -3.263423   -.7857531
              X9 |    .004547    .006382     0.71   0.477    -.0079978    .0170917
             X10 |   -.005173   .0058291    -0.89   0.375     -.016631    .0062851
             X11 |  -280.4201   37.11896    -7.55   0.000    -353.3832   -207.4571
             X12 |   80.67299   22.88321     3.53   0.000     35.69249    125.6535
             X13 |   35.65685   13.67972     2.61   0.009      8.76724    62.54647
           _cons |   56.10956   21.37515     2.62   0.009     14.09339    98.12574
    -------------+----------------------------------------------------------------
         sigma_u |  25.115418
         sigma_e |  20.710376
             rho |  .59524574   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------

    Correlated RE/Mundlak:

    Code:
    xtreg Y1 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 m_X1 m_X2 m_X3 m_X4 m_X5 m_X6 m_X7 m_X8 m_X9 m_X10 m_X11 m_X12 m_X13, i(ID) vce(cluster ID) re
    Code:
    Random-effects GLS regression                   Number of obs     =      6,285
    Group variable: ID                              Number of groups  =        419
    
    R-sq:                                           Obs per group:
         within  = 0.5562                                         min =         15
         between = 0.3116                                         avg =       15.0
         overall = 0.4664                                         max =         15
    
                                                    Wald chi2(26)     =    2639.61
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
    
                                       (Std. Err. adjusted for 419 clusters in ID)
    ------------------------------------------------------------------------------
                 |               Robust
              Y1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              X1 |   55.09628    6.37843     8.64   0.000     42.59479    67.59778
              X2 |   13.86291   3.902933     3.55   0.000     6.213306    21.51252
              X3 |   11.50361   3.498475     3.29   0.001     4.646728     18.3605
              X4 |   7.485479   4.080035     1.83   0.067    -.5112435     15.4822
              X5 |   627.0565   323.8607     1.94   0.053     -7.69889    1261.812
              X6 |  -25.74439   8.706317    -2.96   0.003    -42.80846   -8.680323
              X7 |   14.46523     31.356     0.46   0.645    -46.99141    75.92186
              X8 |  -1.982969   .6405773    -3.10   0.002    -3.238477   -.7274601
              X9 |   .0043312   .0064786     0.67   0.504    -.0083665     .017029
             X10 |  -.0052707   .0057847    -0.91   0.362    -.0166086    .0060671
             X11 |  -280.7377   37.10206    -7.57   0.000    -353.4564    -208.019
             X12 |   81.23751   23.02698     3.53   0.000     36.10546    126.3696
             X13 |   35.68913   13.52256     2.64   0.008     9.185402    62.19286
            m_X1 |   63.07001   17.62765     3.58   0.000     28.52045    97.61956
            m_X2 |  -39.11101   10.73944    -3.64   0.000    -60.15992    -18.0621
            m_X3 |   7.172989   7.381349     0.97   0.331     -7.29419    21.64017
            m_X4 |  -9.116723   4.497211    -2.03   0.043    -17.93109   -.3023521
            m_X5 |   661.9385    433.975     1.53   0.127    -188.6368    1512.514
            m_X6 |  -48.76732   16.34738    -2.98   0.003     -80.8076   -16.72703
            m_X7 |   652.9672   296.4707     2.20   0.028      71.8953    1234.039
            m_X8 |   1.328787   .7064285     1.88   0.060    -.0557874    2.713361
            m_X9 |   .0169634   .0122089     1.39   0.165    -.0069655    .0408923
           m_X10 |   .0033445   .0064843     0.52   0.606    -.0093645    .0160534
           m_X11 |   368.3401   85.69027     4.30   0.000     200.3902    536.2899
           m_X12 |   -78.2951   43.22375    -1.81   0.070    -163.0121    6.421903
           m_X13 |  -43.49377   25.80072    -1.69   0.092    -94.06225    7.074716
           _cons |   74.06801   24.27393     3.05   0.002     26.49198     121.644
    -------------+----------------------------------------------------------------
         sigma_u |  18.355635
         sigma_e |  20.710376
             rho |  .43994227   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    My questions:

    1.) Are those slight differences between the estimation results from FE and those from CRE estimation a cause for concern?
    2.) The literature on CRE/Mundlak says that the correlated random-effects model relaxes the assumption of zero correlation between the entity-specific error and the time-varying variables (Schunck (2013): Within and between Estimates in Random-Effects Models: Advantages and Drawbacks of Correlated Random Effects and Hybrid Models). Looking at my results from the CRE estimation, how can I be sure that this is actually the case? So how can I be sure that the variable means actually "absorbed" this correlation?

    I would be really glad if you could help me with this.

    Best regards
    Nikolaus

    (Version: Stata 16)
    Last edited by Nikolaus Schueler; 15 Jul 2024, 08:05.

  • #2
    Your emulation of the CRE model is not quite right. You do not merely add the mean value variables to the model. You also have to mean-center the original variables. After running your original -xtreg, fe- regression you do this:

    Code:
    foreach v of varlist Y1 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 {
        summ `v' if e(sample), meanonly
        gen `v'_m = r(mean)
        gen `v'_c = `v' - `v'_m
    }
    xtreg *_c *_m, re i(id) vce(cluster id)
    Notice, and this is important, that in calculating the means and centered variables, you must do the calculations restricted to the estimation sample of the original regression. If you do them on the whole data set, but some of the data set was omitted from the -xtreg, fe- for whatever reason, you will get incorrect variables and incorrect results.

    Actually, you can save yourself some trouble. Instead of doing all this "by hand," download the -xthybrid- command from SSC and use that--it implements all of this for you.
    Last edited by Clyde Schechter; 15 Jul 2024, 09:15.



    • #3
      Dear Clyde,
      thank you for your response!

      As far as I understand it, there are two different kinds of models:

      1.) The model you describe is referred to as the hybrid model which indeed uses the demeaned variables in addition to the mean value variables.
      2.) The second one is referred to as the Correlated RE model which uses the original value variables. This one was - among others - described by Prof. Wooldridge in "Wooldridge (2013): Introductory Econometrics, A Modern Approach, pp. 497-499" (I don't know whether I am allowed to post screenshots from the chapter in here, which is why I do it like this). If I use the -xthybrid- command, there is an option called "cre" which actually calculates this exact model.
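
      For reference, the two specifications can be written side by side. The notation is assumed here for illustration and is not copied from the thread or from either cited text: $\bar{x}_i$ is the vector of ID-specific means of the time-varying regressors, $r_i$ is the random intercept, and $\varepsilon_{it}$ is the idiosyncratic error.

      ```latex
      \begin{align*}
      \text{CRE (Mundlak):}\quad y_{it} &= \alpha + x_{it}\beta + \bar{x}_i\gamma + r_i + \varepsilon_{it} \\
      \text{Hybrid:}\quad y_{it} &= \alpha + (x_{it}-\bar{x}_i)\beta + \bar{x}_i\delta + r_i + \varepsilon_{it},
      \qquad \delta = \beta + \gamma
      \end{align*}
      ```

      The second line is only a reparameterization of the first, so $\beta$ is the within (FE) coefficient in both. What changes is the reading of the mean-variable coefficient: $\delta$ is a between-type effect, while $\gamma$ is the difference between the between and within effects.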

      Code:
       xthybrid Y1 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13, clusterid(ID) cre vce(cluster ID) se t p star full
      Code:
      Model model
      --------------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      Mixed-effects GLM                               Number of obs     =      6,285
      Family:                Gaussian
      Link:                  identity
      Group variable:              ID                 Number of groups  =        419
      
                                                      Obs per group:
                                                                    min =         15
                                                                    avg =       15.0
                                                                    max =         15
      
      Integration method: mvaghermite                 Integration pts.  =          7
      
                                                      Wald chi2(26)     =    2622.53
      Log pseudolikelihood = -28490.503               Prob > chi2       =     0.0000
                                         (Std. Err. adjusted for 419 clusters in ID)
      ------------------------------------------------------------------------------
                   |               Robust
                Y1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
             W__X1 |   55.02444   6.373189     8.63   0.000     42.53322    67.51567
             W__X2 |    13.8213   3.899945     3.54   0.000     6.177553    21.46506
             W__X3 |   11.46342   3.489463     3.29   0.001     4.624197    18.30264
             W__X4 |   7.465484   4.073435     1.83   0.067     -.518302    15.44927
             W__X5 |   636.4464   326.0489     1.95   0.051    -2.597774    1275.491
             W__X6 |  -25.81329   8.707305    -2.96   0.003    -42.87929   -8.747284
             W__X7 |    13.2044   31.36532     0.42   0.674    -48.27049     74.6793
             W__X8 |  -2.024588   .6295879    -3.22   0.001    -3.258558   -.7906185
             W__X9 |    .004547   .0063754     0.71   0.476    -.0079485    .0170425
            W__X10 |   -.005173   .0058231    -0.89   0.374     -.016586    .0062401
            W__X11 |  -280.4201   37.08054    -7.56   0.000    -353.0967   -207.7436
            W__X12 |   80.67299   22.85953     3.53   0.000     35.86915    125.4768
            W__X13 |   35.65686   13.66556     2.61   0.009     8.872846    62.44086
             D__X1 |   60.99144   17.24184     3.54   0.000     27.19806    94.78482
             D__X2 |  -38.48039    10.7337    -3.59   0.000    -59.51805   -17.44273
             D__X3 |    6.23338   7.146347     0.87   0.383    -7.773203    20.23996
             D__X4 |  -8.892025    4.48763    -1.98   0.048    -17.68762   -.0964317
             D__X5 |    568.623   440.7778     1.29   0.197    -295.2857    1432.532
             D__X6 |  -45.20005   16.16263    -2.80   0.005    -76.87822   -13.52187
             D__X7 |   718.6223   326.4671     2.20   0.028      78.7585    1358.486
             D__X8 |   1.474413   .7388249     2.00   0.046     .0263432    2.922484
             D__X9 |   .0083508   .0107495     0.78   0.437    -.0127179    .0294195
            D__X10 |   .0024656   .0063364     0.39   0.697    -.0099536    .0148847
            D__X11 |   382.5769   88.65636     4.32   0.000     208.8136    556.3402
            D__X12 |  -65.78881   40.41732    -1.63   0.104    -145.0053    13.42769
            D__X13 |  -39.10997   25.56868    -1.53   0.126    -89.22366    11.00373
             _cons |   80.27785   22.84029     3.51   0.000     35.51171     125.044
      -------------+----------------------------------------------------------------
      ID           |
         var(_cons)|   332.8481   47.42801                      251.7429    440.0835
      -------------+----------------------------------------------------------------
          var(e.Y1)|   427.9691   47.59812                      344.1462    532.2086
      ------------------------------------------------------------------------------
      Now the results for the time-varying variables are actually the exact same as the ones in the FE, and yet I don't know which mistake I made in my "original", self-developed CRE.

      Furthermore, can you tell me how I can find out whether the addition of mean values variables absorbs the correlation between the regressors and the entity-specific error?

      Thanks and best regards!
      Last edited by Nikolaus Schueler; 15 Jul 2024, 10:34.



      • #4
        1.) The model you describe is referred to as the hybrid model which indeed uses the demeaned variables in addition to the mean value variables.
        2.) The second one is referred to as the Correlated RE model which uses the original value variables. This one was - among others - described by Prof. Wooldridge in "Wooldridge (2013): Introductory Econometrics, A Modern Approach, pp. 497-499" (I don't know whether I am allowed to post screenshots from the chapter in here, which is why I do it like this). If I use the -xthybrid- command, there is an option called "cre" which actually calculates this exact model.
        Yes, sorry I confused the two.

        Furthermore, can you tell me how I can find out whether the addition of mean values variables absorbs the correlation between the regressors and the entity-specific error?
        I'm not sure what you mean by "absorbs" the correlation.

        I don't know which mistake I made in my "original", self-developed CRE.
        There are two common mistakes made when doing this. You might not have restricted the mean calculations to the estimation sample of the original fixed-effects model. In that situation the means would be incorrect, though probably fairly close, which could account for the kind of results you found. The other common mistake is the one I made in the code I showed in #2: there I calculated the overall means in the estimation sample instead of group-specific means. (I should not have used -summarize-; I should have used -by group_id: egen `v'_mean = mean(`v') if e(sample)-.) With that mistake things will usually go further awry, but sometimes they come out fairly close if there is relatively little dispersion among the group-specific means.
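
        The group-specific calculation described above can be sketched as follows. This is a minimal illustration on simulated data, not the original poster's data set: the panel dimensions, the seed, the variables a, X1, X2, Y1 and the coefficient values are all assumptions made for the example. The panel is balanced, and in that case the CRE coefficients on the time-varying variables should reproduce the FE coefficients.

        ```stata
        * Simulate a small balanced panel (all names and values are illustrative)
        clear
        set seed 2024
        set obs 100
        gen ID = _n
        gen a = rnormal()                  // unit effect
        expand 8                           // 8 periods per ID
        bysort ID: gen year = _n
        gen X1 = a + rnormal()             // correlated with the unit effect
        gen X2 = rnormal()
        gen Y1 = 1 + 2*X1 - X2 + a + rnormal()
        xtset ID year

        * Step 1: FE regression defines the estimation sample
        xtreg Y1 X1 X2, fe vce(cluster ID)
        scalar b_fe_X1 = _b[X1]
        scalar b_fe_X2 = _b[X2]

        * Step 2: group-specific means, restricted to that sample
        foreach v of varlist X1 X2 {
            bysort ID (year): egen m_`v' = mean(`v') if e(sample)
        }

        * Step 3: CRE/Mundlak model = RE with the original variables plus their means
        xtreg Y1 X1 X2 m_X1 m_X2, re vce(cluster ID)
        display "FE: " scalar(b_fe_X1) "   CRE: " _b[X1]
        ```

        The -if e(sample)- qualifier does nothing here because no observation is dropped, but it is the guard that matters when the FE regression omits some rows.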



        • #5
          Dear Clyde,

          There are two common mistakes made when doing this. You might not have restricted the mean calculations to the estimation sample of the original fixed-effects model. In that situation the means would be incorrect, though probably fairly close, which could account for the kind of results you found. The other common mistake is the one I made in the code I showed in #2: there I calculated the overall means in the estimation sample instead of group-specific means. (I should not have used -summarize-; I should have used -by group_id: egen `v'_mean = mean(`v') if e(sample)-.) With that mistake things will usually go further awry, but sometimes they come out fairly close if there is relatively little dispersion among the group-specific means.
          You were right: I calculated the means over the wrong time period. Now it works out!

          I'm not sure what you mean by "absorbs" the correlation.
          The random-effects model assumes that the omitted heterogeneity is uncorrelated with the regressors, which is quite unlikely in many circumstances, and in my case the Hausman test clearly indicates that there is correlation between the regressors and the entity-specific effect. The CRE approach claims that adding the mean values of the time-varying variables takes care of this problem. But is there any way to find out whether - after implementing the means - the results are no longer biased by some remaining correlation?
          Last edited by Nikolaus Schueler; 15 Jul 2024, 13:42.



          • #6
            But is there any way to find out whether - after implementing the means - the results are no longer biased by some remaining correlation?
            The use of the CRE model assures that there is no correlation between the residuals or random intercepts and the regressors. But the possibility always remains that there is correlation to unmeasured variables. That possibility is only eliminated in randomized studies or in "experiments of nature" where the exposure variable is compellingly exogenous.
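
            A related diagnostic, not mentioned in this thread, is the regression-based form of the Hausman test: after fitting the CRE model, jointly test whether the coefficients on the mean variables are zero. Rejection indicates that the plain RE assumption fails, which is the situation the mean variables are meant to handle. It says nothing about correlation with omitted time-varying variables. A minimal sketch on simulated data follows; the data-generating process, the seed, and all variable names are assumptions for the example.

            ```stata
            * Simulated balanced panel in which X1 is correlated with the unit effect
            clear
            set seed 2024
            set obs 200
            gen ID = _n
            gen a = rnormal()
            expand 8
            bysort ID: gen year = _n
            gen X1 = a + rnormal()
            gen X2 = rnormal()
            gen Y1 = 1 + 2*X1 - X2 + a + rnormal()
            xtset ID year

            * ID-specific means of the time-varying regressors
            foreach v of varlist X1 X2 {
                bysort ID (year): egen m_`v' = mean(`v')
            }

            * CRE model, then a cluster-robust joint Wald test on the means
            xtreg Y1 X1 X2 m_X1 m_X2, re vce(cluster ID)
            testparm m_X1 m_X2
            ```

            Unlike the classical -hausman- command, this test can be used together with cluster-robust standard errors.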



            • #7
              Thank you, that helped a lot!

