
  • Equivalence between Mundlak and FE estimator: what about the generalized residuals of the control function?

    I want to model the effect of an endogenous (left-censored) explanatory variable on a continuous outcome variable using an unbalanced panel dataset. For this, I use a control function approach.

    Because of the left-censoring, the selection equation/reduced form equation is estimated with a correlated random effects tobit model (xttobit in Stata, with time averages of the time-varying variables as additional explanatory variables, i.e., Mundlak). From this model, I compute the generalised residuals. I use the formula from Wooldridge (2014) for this.

    I then estimate the outcome equation using pooled OLS, again with time averages of the time-varying variables (Mundlak). To this model I add the generalised residuals from the first step.

    Wooldridge (2015) shows that a pooled OLS estimator with time averages is equivalent to the fixed effects estimator. When I compare my results for the outcome equation between both estimators I do not get the same results. I found that the reason for this is that I do not add the time average of the generalised residuals to my pooled OLS model.

    Consequently, my questions are:
    1. Do I have to add the time average of the generalised residuals to the pooled OLS model?
    2. Can I choose to use FE estimation? In the literature, pooled OLS or CRE seems to be used most often for the control function approach.
    Thank you!

    Sources: (I have also posted these questions to Cross Validated, where I have not yet received a reply.)

  • #2
    Originally posted by Charlotte Fabri
    Wooldridge (2015) shows that a pooled OLS estimator with time averages is equivalent to the fixed effects estimator. When I compare my results for the outcome equation between both estimators I do not get the same results.
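    You probably have an unbalanced panel. The equivalence holds exactly in a balanced panel: in the example below, the coefficients on mvalue and kstock are identical across the two estimators. The standard errors differ and need to be corrected to reflect the fact that the time averages are generated regressors.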

    Code:
    *GRUNFELD DATASET: BALANCED PANEL
    webuse grunfeld, clear
    xtset company time
    
    *FIXED EFFECTS
    xtreg invest mvalue kstock, fe
    
    *GENERATE TIME AVERAGES
    foreach var in mvalue kstock{
        bys company: egen t_avg_`var'= mean(`var')
    }
    
    *POOLED OLS WITH TIME AVERAGES
    regress invest mvalue kstock t_avg_mvalue t_avg_kstock
    Results:

    Code:
    . *FIXED EFFECTS
    . xtreg invest mvalue kstock, fe
    
    Fixed-effects (within) regression               Number of obs     =        200
    Group variable: company                         Number of groups  =         10
    
    R-squared:                                      Obs per group:
         Within  = 0.7668                                         min =         20
         Between = 0.8194                                         avg =       20.0
         Overall = 0.8060                                         max =         20
    
                                                    F(2,188)          =     309.01
    corr(u_i, Xb) = -0.1517                         Prob > F          =     0.0000
    
    ------------------------------------------------------------------------------
          invest | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          mvalue |   .1101238   .0118567     9.29   0.000     .0867345    .1335131
          kstock |   .3100653   .0173545    17.87   0.000     .2758308    .3442999
           _cons |  -58.74393   12.45369    -4.72   0.000    -83.31086     -34.177
    -------------+----------------------------------------------------------------
         sigma_u |  85.732501
         sigma_e |  52.767964
             rho |  .72525012   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(9, 188) = 49.18                     Prob > F = 0.0000
    
    . *GENERATE TIME AVERAGES
    . foreach var in mvalue kstock{
      2.     bys company: egen t_avg_`var'= mean(`var')
      3. }
    
    . *POOLED OLS WITH TIME AVERAGES
    . regress invest mvalue kstock t_avg_mvalue t_avg_kstock
    
          Source |       SS           df       MS      Number of obs   =       200
    -------------+----------------------------------   F(4, 195)       =    248.41
           Model |  7824402.58         4  1956100.65   Prob > F        =    0.0000
        Residual |  1535541.34       195  7874.57095   R-squared       =    0.8359
    -------------+----------------------------------   Adj R-squared   =    0.8326
           Total |  9359943.92       199  47034.8941   Root MSE        =    88.739
    
    ------------------------------------------------------------------------------
          invest | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          mvalue |   .1101238   .0199392     5.52   0.000     .0707997    .1494479
          kstock |   .3100653   .0291847    10.62   0.000     .2525072    .3676235
    t_avg_mvalue |   .0245223   .0210374     1.17   0.245    -.0169679    .0660124
    t_avg_kstock |  -.2780339   .0532672    -5.22   0.000    -.3830875   -.1729802
           _cons |  -8.527114     11.089    -0.77   0.443    -30.39688    13.34265
    ------------------------------------------------------------------------------
    



    • #3
      This also holds for unbalanced panels:

      In the balanced case, it has been known for some time – see Mundlak (1978) – that the FE estimator can be computed as a pooled OLS estimator using the original data and adding the time averages of the covariates as additional explanatory variables. Conveniently, this algebraic result carries over to the unbalanced case.
      The reason I am not getting exactly the same coefficients is the 'generalized residuals' variable, not a failure of the equivalence between the two estimators.

      Source:
      Wooldridge, J. M. (2019). Correlated random effects models with unbalanced panels. Journal of Econometrics, 211(1), 137-150. https://doi.org/10.1...om.2018.12.010
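The point about the generalised residuals can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated, and `gres` is a hypothetical stand-in for the first-stage generalised residual (it is drawn at random here, not computed from a tobit model). It shows the algebra only: pooled OLS reproduces the FE coefficient on `x` when the time average of every time-varying regressor, including `gres`, is added.

```stata
* Simulated unbalanced panel; all variable names are illustrative assumptions
clear
set seed 2024
set obs 200
generate id = _n
generate c  = rnormal()              // unit-level heterogeneity
expand 4 + mod(id, 3)                // 4 to 6 periods per unit
bysort id: generate t = _n
xtset id t

generate x    = c + rnormal()
generate gres = 0.5*c + rnormal()    // stand-in for the generalised residual
generate y    = 1 + 2*x + 0.7*gres + c + rnormal()

* Time averages of all time-varying regressors, stored in double precision
foreach var in x gres {
    bysort id: egen double m_`var' = mean(`var')
}

* Fixed effects benchmark
xtreg y x gres, fe
scalar b_fe = _b[x]

* Pooled OLS without the time average of gres: coefficient on x differs from FE
regress y x gres m_x, vce(cluster id)
display "difference without m_gres: " _b[x] - b_fe

* Pooled OLS with the time average of gres: coefficient on x matches FE
regress y x gres m_x m_gres, vce(cluster id)
display "difference with m_gres:    " _b[x] - b_fe
```

The second difference should be zero up to numerical precision. The clustered standard errors do not account for the first-stage estimation of a real generalised residual; that correction is a separate issue from the coefficient equivalence shown here.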



      • #4
        Originally posted by Charlotte Fabri
        This also holds for unbalanced panels:
        If you generate time averages using the FE estimation sample, then yes. It is not clear whether this is what you did because you do not show any code or output.
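A minimal sketch of what "using the FE estimation sample" could look like, assuming the `nlswork` example dataset (an unbalanced panel with missing values) and an arbitrary choice of regressors; neither comes from the original poster's model. The time averages are computed only over the observations that `xtreg, fe` actually used, as flagged by `e(sample)`.

```stata
* Unbalanced example panel shipped with Stata
webuse nlswork, clear
xtset idcode year

* Fixed effects; keep a marker for the observations actually used
xtreg ln_wage tenure hours, fe
generate byte in_fe = e(sample)
scalar b_tenure_fe = _b[tenure]

* Time averages over the estimation sample only, in double precision
foreach var in tenure hours {
    bysort idcode: egen double t_avg_`var' = mean(`var') if in_fe
}

* Pooled OLS with time averages on the same sample
regress ln_wage tenure hours t_avg_tenure t_avg_hours if in_fe, vce(cluster idcode)
display "FE minus Mundlak, tenure: " b_tenure_fe - _b[tenure]
```

If the averages were instead computed over all rows, including rows dropped from the regression because another variable is missing, the two sets of coefficients would generally no longer coincide.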
