-xtoverid- or Variable Addition Test (VAT)

Pietro Russo

Join Date: Jul 2021
Posts: 7

03 Aug 2021, 11:34

Dear community,

After finding heteroscedasticity and autocorrelation in my panel data, I account for both with -xtreg ..., fe/re vce(cluster Company_ID)-. Because -hausman- cannot be used after estimation with cluster-robust standard errors, I am unable to run the conventional -hausman- test.

My regression is as follows: ROA = c.Var1_c##i.Industry Var2 Var3 Var4 Var5 i.Year, with Var1 being compensation to the CEO, Var2-5 control variables and Industry a categorical variable (coded 1 to 10 for the different industries). I winsorized at the 5th and 95th percentiles. Var1_c is the mean-centered version of Var1, which itself is the square root of compensation to the CEO. Centering was done to reduce the multicollinearity arising from the interaction term.

I have come across two ways to determine whether to use RE or FE: 1) the -xtoverid- command and 2) Variable Addition Test (VAT) as discussed by J. Wooldridge.

1) -xtoverid-

Code:

. gen Interaction = Var1_c*Industry

. xi: xtreg ROA c.Var1_c i.Industry Interaction Var2 Var3 Var4 Var5 i.Year, re vce(cluster Company_ID)

. xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re  robust cluster(Company_ID)
Sargan-Hansen statistic  26.143  Chi-sq(11)   P-value = 0.0062

This result (p = 0.0062) indicates that I should use FE.
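One point that may be worth checking in this setup: `Var1_c*Industry` multiplies the compensation variable by the numeric industry code (1 to 10), which yields a single product term rather than the industry-specific slopes implied by c.Var1_c##i.Industry. As far as I know, -xtoverid- does not accept factor-variable notation, so a common workaround is to let -xi- build the dummies and their interactions. A minimal sketch on the public nlswork data, where race and age are stand-ins for Industry and Var1_c (an assumption for illustration, not the original data):

```stata
* Stand-in data: race plays the role of Industry, age that of Var1_c.
use "https://www.stata-press.com/data/r16/nlswork.dta", clear

* -xtoverid- is community-contributed; install it, and the packages it
* is assumed here to rely on, from SSC if they are missing.
foreach pkg in ivreg2 ranktest xtoverid {
    capture which `pkg'
    if _rc {
        ssc install `pkg'
    }
}

* i.race*age asks -xi- for the race dummies, age itself, and one age
* slope per non-base race category (main effects plus interactions).
xi: xtreg ln_wage i.race*age tenure, re vce(cluster idcode)

* Cluster-robust test of fixed vs random effects.
xtoverid
```

With the original variables, the analogous call would presumably be `xi: xtreg ROA i.Industry*Var1_c Var2 Var3 Var4 Var5 i.Year, re vce(cluster Company_ID)` followed by `xtoverid`.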

2) Variable Addition Test

Code:

. egen Var1bar = mean(Var1_c), by(Company_ID)

. xtreg ROA c.Var1_c Var1bar i.Industry c.Var1_c#i.Industry Var2 Var3 Var4 Var5 i.Year, re vce(cluster Company_ID)

Random-effects GLS regression                   Number of obs     =        466
Group variable: Company_ID                      Number of groups  =         99

R-sq:                                           Obs per group:
     within  = 0.2107                                         min =          1
     between = 0.3837                                         avg =        4.7
     overall = 0.4470                                         max =          6

                                                Wald chi2(29)     =     312.45
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                                       (Std. Err. adjusted for 99 clusters in Company_ID)
-----------------------------------------------------------------------------------------
                        |               Robust
                ROA_new |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------+----------------------------------------------------------------
                Var1_c  |   .0004388   .0002735     1.60   0.109    -.0000973    .0009749
               Var1bar  |   .0001705    .000198     0.86   0.389    -.0002176    .0005585
                        |
             ...
                        |
                   Year |
                  2015  |   .0009689   .0020876     0.46   0.643    -.0031227    .0050605
                  2016  |   .0001361   .0021731     0.06   0.950     -.004123    .0043952
                  2017  |   .0041183   .0030348     1.36   0.175    -.0018297    .0100663
                  2018  |  -.0000602   .0035568    -0.02   0.986    -.0070314     .006911
                  2019  |  -.0037212   .0041598    -0.89   0.371    -.0118744    .0044319
                        |
                  _cons |   .1900278   .0648387     2.93   0.003     .0629462    .3171093
------------------------+----------------------------------------------------------------
                sigma_u |  .03361633
                sigma_e |  .01488897
                    rho |  .83600278   (fraction of variance due to u_i)
-----------------------------------------------------------------------------------------

As the p-value of Var1bar (0.389) is greater than 0.05, this result indicates that I should use RE.
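For comparison, the variable addition test is usually described as adding the panel means of all time-varying regressors (a Mundlak-type device), not only the mean of the variable of interest, and then testing those means jointly. A minimal sketch, assuming the public nlswork data with age and tenure as stand-ins for Var1_c and the controls; in the real model the means of Var2-Var5 and of the interaction terms would presumably be needed as well:

```stata
* Stand-in data; replace ln_wage, age, tenure, idcode with the real names.
use "https://www.stata-press.com/data/r16/nlswork.dta", clear

* Restrict to complete cases so the panel means are computed on the
* same observations that enter the regression.
keep if !missing(ln_wage, age, tenure)

* Panel (within-person) mean of every time-varying regressor.
foreach v of varlist age tenure {
    egen double `v'_bar = mean(`v'), by(idcode)
}

* RE regression augmented with the panel means, clustered on the panel id.
xtreg ln_wage age tenure age_bar tenure_bar, re vce(cluster idcode)

* Joint test that all added means are zero; rejection argues against RE.
test age_bar tenure_bar
```

Because -test- uses the clustered variance matrix from the preceding estimation, the joint test stays valid under heteroscedasticity and within-panel correlation.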

What method should I use? Did I execute both commands/methods correctly?

Thank you,
Pietro

Tags: categorical, fixed effects, interaction, panel data, Time Series

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17707

04 Aug 2021, 01:56

Pietro:
as per its help file, the community-contributed module -xtoverid- already includes the augmented regression described by Jeff Wooldridge in his Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press, 2002: 290-91.
Hence, I would go -fe-, as per the -xtoverid- outcome.
As an aside, I would also double-check whether your regression is correctly specified via an augmented regression that includes the fitted and sq_fitted values among the predictors.
As you can see from the following toy example, although the -xtoverid- outcome points toward -fe-, the model is misspecified (the sq_fitted values reach statistical significance):

Code:

. use "https://www.stata-press.com/data/r16/nlswork.dta"
. xtreg ln_wage age, fe vce(cluster idcode)

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1026                                         min =          1
     between = 0.0877                                         avg =        6.1
     overall = 0.0774                                         max =         15

                                                F(1,4709)         =     884.05
corr(u_i, Xb)  = 0.0314                         Prob > F          =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0181349   .0006099    29.73   0.000     .0169392    .0193306
       _cons |   1.148214   .0177153    64.81   0.000     1.113483    1.182944
-------------+----------------------------------------------------------------
     sigma_u |  .40635023
     sigma_e |  .30349389
         rho |  .64192015   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. predict fitted_fe, xb
(24 missing values generated)

. g sq_fitted_fe=fitted_fe^2
(24 missing values generated)

. xtreg ln_wage age fitted_fe sq_fitted_fe , fe vce(cluster idcode)
note: fitted_fe omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1087                                         min =          1
     between = 0.1006                                         avg =        6.1
     overall = 0.0865                                         max =         15

                                                F(2,4709)         =     507.42
corr(u_i, Xb)  = 0.0440                         Prob > F          =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .1295461   .0133923     9.67   0.000     .1032908    .1558014
   fitted_fe |          0  (omitted)
sq_fitted_fe |  -1.816244   .2188484    -8.30   0.000     -2.24529   -1.387199
       _cons |   3.034438   .2292752    13.23   0.000     2.584951    3.483925
-------------+----------------------------------------------------------------
     sigma_u |  .40391529
     sigma_e |  .30245467
         rho |  .64073313   (fraction of variance due to u_i)
------------------------------------------------------------------------------


. xtreg ln_wage age, re vce(cluster idcode)

Random-effects GLS regression                   Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1026                                         min =          1
     between = 0.0877                                         avg =        6.1
     overall = 0.0774                                         max =         15

                                                Wald chi2(1)      =    1064.91
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0185667    .000569    32.63   0.000     .0174516    .0196819
       _cons |   1.120439   .0159154    70.40   0.000     1.089245    1.151632
-------------+----------------------------------------------------------------
     sigma_u |  .36972456
     sigma_e |  .30349389
         rho |  .59743613   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re  robust cluster(idcode)
Sargan-Hansen statistic  14.529  Chi-sq(1)    P-value = 0.0001

. predict fitted, xb
(24 missing values generated)

. g sq_fitted=fitted^2
(24 missing values generated)

. xtreg ln_wage age fitted sq_fitted , re vce(cluster idcode)
note: fitted omitted because of collinearity

Random-effects GLS regression                   Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1087                                         min =          1
     between = 0.1015                                         avg =        6.1
     overall = 0.0870                                         max =         15

                                                Wald chi2(2)      =    1258.33
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .1406041   .0123812    11.36   0.000     .1163374    .1648708
      fitted |          0  (omitted)
   sq_fitted |   -1.96055   .1995384    -9.83   0.000    -2.351639   -1.569462
       _cons |   3.009213   .1945142    15.47   0.000     2.627972    3.390453
-------------+----------------------------------------------------------------
     sigma_u |   .3654049
     sigma_e |  .30245467
         rho |  .59342665   (fraction of variance due to u_i)
------------------------------------------------------------------------------
.
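A compact version of the same specification check with more than one regressor, as a sketch only. Because the linear prediction is an exact linear combination of the regressors, it is always dropped for collinearity (as in the output above), so only its square is added here; tenure is an arbitrary second regressor chosen for illustration:

```stata
use "https://www.stata-press.com/data/r16/nlswork.dta", clear

* Baseline fixed-effects model with cluster-robust standard errors.
xtreg ln_wage age tenure, fe vce(cluster idcode)

* Linear prediction (xb only, excluding the fixed effect) and its square.
predict double fitted_fe, xb
generate double sq_fitted_fe = fitted_fe^2

* Augmented regression: a significant squared term hints at misspecification.
xtreg ln_wage age tenure sq_fitted_fe, fe vce(cluster idcode)
test sq_fitted_fe
```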

Kind regards,
Carlo
(Stata 19.0)
