  • -xtoverid- or Variable Addition Test (VAT)

    Dear community,

    after finding heteroscedasticity and autocorrelation in my panel data, I account for them with -xtreg ..., fe/re vce(cluster Company_ID)-. Because of the clustered standard errors, I am unable to run the conventional -hausman- command.

    My regression is as follows: ROA = c.Var1_c##i.Industry Var2 Var3 Var4 Var5 i.Year, where Var1 is compensation to the CEO, Var2-Var5 are control variables, and Industry is a categorical variable (coded 1 to 10 for different industries). I winsorized at (5 95). Var1_c is the mean-centered version of Var1, which itself is the square root of compensation to the CEO. The centering was done to reduce the multicollinearity arising from the interaction term.

    I have come across two ways to determine whether to use RE or FE: 1) the -xtoverid- command and 2) Variable Addition Test (VAT) as discussed by J. Wooldridge.

    1) -xtoverid-

    Code:
    . gen Interaction = Var1_c*Industry
    
    . xi: xtreg ROA c.Var1_c i.Industry Interaction Var2 Var3 Var4 Var5 i.Year, re vce(cluster Company_ID)
    
    . xtoverid
    
    Test of overidentifying restrictions: fixed vs random effects
    Cross-section time-series model: xtreg re  robust cluster(Company_ID)
    Sargan-Hansen statistic  26.143  Chi-sq(11)   P-value = 0.0062
    This result indicates that I should use FE.
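
    A note on the interaction term above, offered as a sketch rather than a definitive fix: multiplying Var1_c by the numeric Industry code yields a single slope that scales with the arbitrary 1 to 10 coding, which is not the same model as c.Var1_c##i.Industry. Since -xi- has its own interaction syntax, the RE model can be fitted with one interaction per industry before calling -xtoverid-. The example uses the nlswork data, with tenure and ind_code standing in (as assumptions) for Var1_c and Industry:

    ```stata
    * xtoverid is community-contributed: ssc install xtoverid
    use "https://www.stata-press.com/data/r16/nlswork.dta", clear
    * i.ind_code*tenure expands to industry dummies, tenure,
    * and one dummy-by-tenure interaction per industry
    xi: xtreg ln_wage i.ind_code*tenure age, re vce(cluster idcode)
    * robust FE-vs-RE test on the model with the full set of interactions
    xtoverid
    ```

    With the variables of this post, the analogous call would be -xi: xtreg ROA i.Industry*Var1_c Var2 Var3 Var4 Var5 i.Year, re vce(cluster Company_ID)-.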

    2) Variable Addition Test

    Code:
    . egen Var1bar = mean(Var1_c), by(Company_ID)
    
    . xtreg ROA c.Var1_c Var1bar i.Industry c.Var1_c#i.Industry Var2 Var3 Var4 Var5 i.Year, re vce(cluster Company_ID)
    
    Random-effects GLS regression                   Number of obs     =        466
    Group variable: Company_ID                      Number of groups  =         99
    
    R-sq:                                           Obs per group:
         within  = 0.2107                                         min =          1
         between = 0.3837                                         avg =        4.7
         overall = 0.4470                                         max =          6
    
                                                    Wald chi2(29)     =     312.45
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
    
                                           (Std. Err. adjusted for 99 clusters in Company_ID)
    -----------------------------------------------------------------------------------------
                            |               Robust
                    ROA_new |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ------------------------+----------------------------------------------------------------
                    Var1_c  |   .0004388   .0002735     1.60   0.109    -.0000973    .0009749
                   Var1bar  |   .0001705    .000198     0.86   0.389    -.0002176    .0005585
                            |
                 ...
                            |
                       Year |
                      2015  |   .0009689   .0020876     0.46   0.643    -.0031227    .0050605
                      2016  |   .0001361   .0021731     0.06   0.950     -.004123    .0043952
                      2017  |   .0041183   .0030348     1.36   0.175    -.0018297    .0100663
                      2018  |  -.0000602   .0035568    -0.02   0.986    -.0070314     .006911
                      2019  |  -.0037212   .0041598    -0.89   0.371    -.0118744    .0044319
                            |
                      _cons |   .1900278   .0648387     2.93   0.003     .0629462    .3171093
    ------------------------+----------------------------------------------------------------
                    sigma_u |  .03361633
                    sigma_e |  .01488897
                        rho |  .83600278   (fraction of variance due to u_i)
    -----------------------------------------------------------------------------------------
    As the p-value of Var1bar (0.389) is > 0.05, this result indicates that I should use RE.

    What method should I use? Did I execute both commands/methods correctly?

    Thank you,
    Pietro

  • #2
    Pietro:
    as per its helpfile, the community-contributed module -xtoverid- already implements the augmented regression described by Jeff Wooldridge in his Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press, 2002: 290-91.
    Hence, I would go -fe-, as per the -xtoverid- outcome.
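    For readers who want to see what such an augmented regression looks like by hand, here is a minimal sketch on the nlswork data. It assumes the usual Mundlak-type setup: add the panel-level mean of every time-varying regressor to the RE model and test the means jointly under the cluster-robust VCE. The names touse, age_bar and tenure_bar are made up for the illustration. Note that with a single added mean (as with your Var1bar) the test does not cover the other time-varying regressors:

    ```stata
    use "https://www.stata-press.com/data/r16/nlswork.dta", clear
    * flag the estimation sample so the means use the same observations as the model
    generate byte touse = !missing(ln_wage, age, tenure)
    * panel-level means of each time-varying regressor
    egen double age_bar    = mean(age)    if touse, by(idcode)
    egen double tenure_bar = mean(tenure) if touse, by(idcode)
    * RE model augmented with the means, cluster-robust standard errors
    xtreg ln_wage age tenure age_bar tenure_bar if touse, re vce(cluster idcode)
    * joint Wald test: rejection speaks against RE and in favour of FE
    test age_bar tenure_bar
    ```

    The decision rests on the joint test of all the added means, not on the p-value of any single one of them.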
    As an aside, I would also double-check whether your regression is correctly specified via an augmented regression that includes the fitted and sq_fitted values among the predictors.
    As you can see from the following toy example, although the -xtoverid- outcome points toward -fe-, the model is misspecified (the sq_fitted values reach statistical significance):
    Code:
    . use "https://www.stata-press.com/data/r16/nlswork.dta"
    . xtreg ln_wage age, fe vce(cluster idcode)
    
    Fixed-effects (within) regression               Number of obs     =     28,510
    Group variable: idcode                          Number of groups  =      4,710
    
    R-sq:                                           Obs per group:
         within  = 0.1026                                         min =          1
         between = 0.0877                                         avg =        6.1
         overall = 0.0774                                         max =         15
    
                                                    F(1,4709)         =     884.05
    corr(u_i, Xb)  = 0.0314                         Prob > F          =     0.0000
    
                                 (Std. Err. adjusted for 4,710 clusters in idcode)
    ------------------------------------------------------------------------------
                 |               Robust
         ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .0181349   .0006099    29.73   0.000     .0169392    .0193306
           _cons |   1.148214   .0177153    64.81   0.000     1.113483    1.182944
    -------------+----------------------------------------------------------------
         sigma_u |  .40635023
         sigma_e |  .30349389
             rho |  .64192015   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    . predict fitted_fe, xb
    (24 missing values generated)
    
    . g sq_fitted_fe=fitted_fe^2
    (24 missing values generated)
    
    . xtreg ln_wage age fitted_fe sq_fitted_fe , fe vce(cluster idcode)
    note: fitted_fe omitted because of collinearity
    
    Fixed-effects (within) regression               Number of obs     =     28,510
    Group variable: idcode                          Number of groups  =      4,710
    
    R-sq:                                           Obs per group:
         within  = 0.1087                                         min =          1
         between = 0.1006                                         avg =        6.1
         overall = 0.0865                                         max =         15
    
                                                    F(2,4709)         =     507.42
    corr(u_i, Xb)  = 0.0440                         Prob > F          =     0.0000
    
                                 (Std. Err. adjusted for 4,710 clusters in idcode)
    ------------------------------------------------------------------------------
                 |               Robust
         ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .1295461   .0133923     9.67   0.000     .1032908    .1558014
       fitted_fe |          0  (omitted)
    sq_fitted_fe |  -1.816244   .2188484    -8.30   0.000     -2.24529   -1.387199
           _cons |   3.034438   .2292752    13.23   0.000     2.584951    3.483925
    -------------+----------------------------------------------------------------
         sigma_u |  .40391529
         sigma_e |  .30245467
             rho |  .64073313   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    
    . xtreg ln_wage age, re vce(cluster idcode)
    
    Random-effects GLS regression                   Number of obs     =     28,510
    Group variable: idcode                          Number of groups  =      4,710
    
    R-sq:                                           Obs per group:
         within  = 0.1026                                         min =          1
         between = 0.0877                                         avg =        6.1
         overall = 0.0774                                         max =         15
    
                                                    Wald chi2(1)      =    1064.91
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
    
                                 (Std. Err. adjusted for 4,710 clusters in idcode)
    ------------------------------------------------------------------------------
                 |               Robust
         ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .0185667    .000569    32.63   0.000     .0174516    .0196819
           _cons |   1.120439   .0159154    70.40   0.000     1.089245    1.151632
    -------------+----------------------------------------------------------------
         sigma_u |  .36972456
         sigma_e |  .30349389
             rho |  .59743613   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    . xtoverid
    
    Test of overidentifying restrictions: fixed vs random effects
    Cross-section time-series model: xtreg re  robust cluster(idcode)
    Sargan-Hansen statistic  14.529  Chi-sq(1)    P-value = 0.0001
    
    . predict fitted, xb
    (24 missing values generated)
    
    . g sq_fitted=fitted^2
    (24 missing values generated)
    
    . xtreg ln_wage age fitted sq_fitted , re vce(cluster idcode)
    note: fitted omitted because of collinearity
    
    Random-effects GLS regression                   Number of obs     =     28,510
    Group variable: idcode                          Number of groups  =      4,710
    
    R-sq:                                           Obs per group:
         within  = 0.1087                                         min =          1
         between = 0.1015                                         avg =        6.1
         overall = 0.0870                                         max =         15
    
                                                    Wald chi2(2)      =    1258.33
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
    
                                 (Std. Err. adjusted for 4,710 clusters in idcode)
    ------------------------------------------------------------------------------
                 |               Robust
         ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .1406041   .0123812    11.36   0.000     .1163374    .1648708
          fitted |          0  (omitted)
       sq_fitted |   -1.96055   .1995384    -9.83   0.000    -2.351639   -1.569462
           _cons |   3.009213   .1945142    15.47   0.000     2.627972    3.390453
    -------------+----------------------------------------------------------------
         sigma_u |   .3654049
         sigma_e |  .30245467
             rho |  .59342665   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    .
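
    One possible follow-up, sketched under the assumption that missing curvature in age is what drives the result: add a quadratic in age and repeat the check. The fitted values themselves are left out because, as the notes in the output above show, they are collinear with the regressors. xb_fe and xb_fe_sq are illustrative names:

    ```stata
    use "https://www.stata-press.com/data/r16/nlswork.dta", clear
    * FE model with a quadratic in age
    xtreg ln_wage c.age##c.age, fe vce(cluster idcode)
    * linear prediction and its square
    predict double xb_fe, xb
    generate double xb_fe_sq = xb_fe^2
    * augmented regression: a significant xb_fe_sq would still signal misspecification
    xtreg ln_wage c.age##c.age xb_fe_sq, fe vce(cluster idcode)
    test xb_fe_sq
    ```

    If the squared fitted values remain significant, further terms or regressors may be needed before the -fe-/-re- comparison is meaningful.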
    Kind regards,
    Carlo
    (Stata 19.0)
