xtreg - error or feature

M. Cleves

Join Date: Jun 2014
Posts: 21

04 Nov 2014, 07:02

I got unexpected output when running -xtreg- in Stata 13.1. I think the output is wrong and could be misleading.

I first fit a model using -xtreg, mle-:

Code:

. use http://www.stata-press.com/data/r13/nlswork
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. xtset idcode
       panel variable:  idcode (unbalanced)

. xtreg ln_w grade age  ttl_exp   tenure   i.south if south==1, mle  
note: 1.south omitted because of collinearity

Fitting constant-only model:
Iteration 0:   log likelihood =  -5043.018
Iteration 1:   log likelihood = -4771.2104
Iteration 2:   log likelihood =  -4728.134
Iteration 3:   log likelihood = -4725.7725
Iteration 4:   log likelihood = -4725.7612

Fitting full model:
Iteration 0:   log likelihood = -3441.4136
Iteration 1:   log likelihood =  -3433.151
Iteration 2:   log likelihood = -3433.1413

Random-effects ML regression                    Number of obs      =     11501
Group variable: idcode                          Number of groups   =      2138

Random effects u_i ~ Gaussian                   Obs per group: min =         1
                                                               avg =       5.4
                                                               max =        15

                                                LR chi2(4)         =   2585.24
Log likelihood  = -3433.1413                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grade |   .0730982   .0024847    29.42   0.000     .0682283    .0779681
         age |  -.0034273   .0010427    -3.29   0.001     -.005471   -.0013837
     ttl_exp |   .0302944   .0017186    17.63   0.000     .0269261    .0336627
      tenure |   .0083978    .001274     6.59   0.000     .0059007    .0108948
     1.south |          0  (omitted)
       _cons |    .559949   .0393612    14.23   0.000     .4828025    .6370956
-------------+----------------------------------------------------------------
    /sigma_u |   .2579499   .0053908                      .2475975    .2687351
    /sigma_e |    .283043   .0020695                      .2790158    .2871284
         rho |   .4537161   .0113972                      .4314628    .4761157
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)= 3445.57 Prob>=chibar2 = 0.000

As expected, the variable south, which takes the values 0 and 1, is dropped because I restricted the model to south==1.
Now, if I fit the same model using GLS (the default):

Code:

. xtreg ln_w grade age  ttl_exp   tenure   i.south if south==1,   

Random-effects GLS regression                   Number of obs      =     11501
Group variable: idcode                          Number of groups   =      2138

R-sq:  within  = 0.1293                         Obs per group: min =         1
       between = 0.4414                                        avg =       5.4
       overall = 0.3206                                        max =        15

                                                Wald chi2(5)       =  58459.39
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grade |   .0731688   .0025421    28.78   0.000     .0681864    .0781513
         age |  -.0033255   .0010488    -3.17   0.002    -.0053811   -.0012699
     ttl_exp |   .0301467   .0017275    17.45   0.000     .0267609    .0335324
      tenure |   .0083321   .0012727     6.55   0.000     .0058377    .0108265
             |
       south |
          0  |          0  (empty)
          1  |   .5570191   .0400338    13.91   0.000     .4785543    .6354839
             |
       _cons |          0  (omitted)
-------------+----------------------------------------------------------------
     sigma_u |  .26686548
     sigma_e |  .28204221
         rho |  .47237215   (fraction of variance due to u_i)
------------------------------------------------------------------------------

South is not dropped, but the constant is. In fact, the value of the constant is being reported as the coefficient on south.
This looks like a bug to me.

I know that simply dropping the variable south from the model statement fixes the problem, but it is still a bug that could be misleading.
For example, assume that you only have south==1 in your data and you do not realize it. Fitting the model that includes "south" would lead to an incorrect inference about the variable south.
Also the matrices e(b) and e(V) are incorrectly labeled, which compounds the problem if you are using them for output and analyses.
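
A quick way to see both issues, as a minimal sketch assuming the same nlswork data and model as above: list the column names stored with e(b), then tabulate south within the estimation sample. A single row in the tabulation means the variable is constant there, so any coefficient reported for it is really the intercept.

```stata
use http://www.stata-press.com/data/r13/nlswork, clear
xtset idcode
xtreg ln_w grade age ttl_exp tenure i.south if south==1

* Column names of e(b): shows which term each stored coefficient is labeled as
matrix list e(b)
local names : colfullnames e(b)
display "`names'"

* Does south vary within the estimation sample? One row means it is constant.
tabulate south if e(sample)
display "distinct values of south in the estimation sample: " r(r)
```

If the last line reports 1, the coefficient shown for south should not be read as a regional effect.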

Richard Williams

Join Date: Apr 2014

Posts: 5008
#2

04 Nov 2014, 08:06

Whether you drop south or drop the constant, the results are correct either way. I agree that I would rather drop south, but when something has to be dropped because of perfect collinearity you leave yourself at the mercy of Stata as to what it is that gets dropped.
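
One way to check this equivalence, sketched here under the assumption that the nlswork data and the south==1 restriction from the first post are used: fit the model once without south and once with i.south, save the linear prediction from each fit, and look at the largest absolute difference. The variable names xb_cons, xb_south and diff are made up for the illustration.

```stata
use http://www.stata-press.com/data/r13/nlswork, clear
xtset idcode

* Fit 1: south left out, so the constant is certain to be kept
xtreg ln_w grade age ttl_exp tenure if south==1
predict double xb_cons if e(sample), xb

* Fit 2: the original specification; Stata decides what to omit
xtreg ln_w grade age ttl_exp tenure i.south if south==1
predict double xb_south if e(sample), xb

* If both fits are the same model, the predictions agree up to rounding
generate double diff = abs(xb_cons - xb_south)
summarize diff
```

A maximum of diff near zero would indicate that only the labeling differs between the two fits, not the fitted model.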

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam

Martin Bresslein

Join Date: Apr 2014
Posts: 51

04 Nov 2014, 08:09

Also...

Code:

. use http://www.stata-press.com/data/r13/nlswork
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. xtset idcode
       panel variable:  idcode (unbalanced)


. keep if south == 1
(16851 observations deleted)

. xtreg ln_w grade age  ttl_exp   tenure   i.south
note: 1.south omitted because of collinearity

Random-effects GLS regression                   Number of obs      =     11501
Group variable: idcode                          Number of groups   =      2138

R-sq:  within  = 0.1293                         Obs per group: min =         1
       between = 0.4414                                        avg =       5.4
       overall = 0.3206                                        max =        15

                                                Wald chi2(4)       =   3091.00
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grade |   .0731688   .0025421    28.78   0.000     .0681864    .0781513
         age |  -.0033255   .0010488    -3.17   0.002    -.0053811   -.0012699
     ttl_exp |   .0301467   .0017275    17.45   0.000     .0267609    .0335324
      tenure |   .0083321   .0012727     6.55   0.000     .0058377    .0108265
     1.south |          0  (omitted)
       _cons |   .5570191   .0400338    13.91   0.000     .4785543    .6354839
-------------+----------------------------------------------------------------
     sigma_u |  .26686548
     sigma_e |  .28204221
         rho |  .47237215   (fraction of variance due to u_i)
------------------------------------------------------------------------------

daniel klein

Join Date: Mar 2014

Posts: 3860
#4

04 Nov 2014, 08:26

I think the question here is why the choice of which variable to drop depends (or should depend) on the selected estimator. I agree that it seems like a (minor) bug that Stata would (i) drop different variables depending on the selected estimator and (ii) drop the constant that it adds in the first place.

One thing I noticed from a quick glance into xtreg.ado is that the initial checks for collinearity do not pass touse (a temporary variable that identifies the estimation sample) to _rmcoll (an undocumented built-in command that flags variables to be omitted), which I think they should do. However, since these checks are the same for both model types, this cannot explain the observed differences.

It probably has something to do with the way the respective subroutines xtreg_re and xtreg_ml handle collinear variables. But the details are to be dealt with by others.

Best
Daniel

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014

Posts: 700
#5

04 Nov 2014, 09:19

Daniel's observation about _rmcoll explains the difference in the choice of omitted coefficient between Mario's and Martin's calls to xtreg, re.

xtreg should be calling _rmcoll using if to identify the estimation sample.
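
The effect of the missing sample restriction can be illustrated by calling _rmcoll directly, once over all observations and once with the if qualifier. This is only a sketch: it assumes _rmcoll accepts an if qualifier and the expand option for factor variables and returns the flagged list in r(varlist), and it says nothing about how xtreg calls it internally.

```stata
use http://www.stata-press.com/data/r13/nlswork, clear

* Check over all observations: south varies here, so it is not
* collinear with the constant
_rmcoll grade age ttl_exp tenure i.south, expand
display "`r(varlist)'"

* Same check restricted to the estimation sample: south is constant here,
* so 1.south duplicates the constant and would be expected to be flagged
_rmcoll grade age ttl_exp tenure i.south if south==1, expand
display "`r(varlist)'"
```

Comparing the two displayed lists shows why a check that ignores the estimation sample leaves south in the model, so the collinearity only surfaces later in the estimation routine.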

M. Cleves

Join Date: Jun 2014

Posts: 21
#6

04 Nov 2014, 09:52

Thank you Jeff.