
  • xtreg: error or feature?

    I got unexpected output when running -xtreg- (Stata 13.1). I think the output is in error and could be misleading.

    I first fit a model using -xtreg, mle-:
    Code:
    . use http://www.stata-press.com/data/r13/nlswork
    (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
    
    . xtset idcode
           panel variable:  idcode (unbalanced)
    
    . xtreg ln_w grade age  ttl_exp   tenure   i.south if south==1, mle  
    note: 1.south omitted because of collinearity
    
    Fitting constant-only model:
    Iteration 0:   log likelihood =  -5043.018
    Iteration 1:   log likelihood = -4771.2104
    Iteration 2:   log likelihood =  -4728.134
    Iteration 3:   log likelihood = -4725.7725
    Iteration 4:   log likelihood = -4725.7612
    
    Fitting full model:
    Iteration 0:   log likelihood = -3441.4136
    Iteration 1:   log likelihood =  -3433.151
    Iteration 2:   log likelihood = -3433.1413
    
    Random-effects ML regression                    Number of obs      =     11501
    Group variable: idcode                          Number of groups   =      2138
    
    Random effects u_i ~ Gaussian                   Obs per group: min =         1
                                                                   avg =       5.4
                                                                   max =        15
    
                                                    LR chi2(4)         =   2585.24
    Log likelihood  = -3433.1413                    Prob > chi2        =    0.0000
    
    ------------------------------------------------------------------------------
         ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           grade |   .0730982   .0024847    29.42   0.000     .0682283    .0779681
             age |  -.0034273   .0010427    -3.29   0.001     -.005471   -.0013837
         ttl_exp |   .0302944   .0017186    17.63   0.000     .0269261    .0336627
          tenure |   .0083978    .001274     6.59   0.000     .0059007    .0108948
         1.south |          0  (omitted)
           _cons |    .559949   .0393612    14.23   0.000     .4828025    .6370956
    -------------+----------------------------------------------------------------
        /sigma_u |   .2579499   .0053908                      .2475975    .2687351
        /sigma_e |    .283043   .0020695                      .2790158    .2871284
             rho |   .4537161   .0113972                      .4314628    .4761157
    ------------------------------------------------------------------------------
    Likelihood-ratio test of sigma_u=0: chibar2(01)= 3445.57 Prob>=chibar2 = 0.000
    As expected, the variable south, which takes the values 0 and 1, is dropped: because I restricted the model to south==1, it is collinear with the constant.
    Now, if I fit the same model using GLS (the default):

    Code:
    . xtreg ln_w grade age  ttl_exp   tenure   i.south if south==1,   
    
    Random-effects GLS regression                   Number of obs      =     11501
    Group variable: idcode                          Number of groups   =      2138
    
    R-sq:  within  = 0.1293                         Obs per group: min =         1
           between = 0.4414                                        avg =       5.4
           overall = 0.3206                                        max =        15
    
                                                    Wald chi2(5)       =  58459.39
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000
    
    ------------------------------------------------------------------------------
         ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           grade |   .0731688   .0025421    28.78   0.000     .0681864    .0781513
             age |  -.0033255   .0010488    -3.17   0.002    -.0053811   -.0012699
         ttl_exp |   .0301467   .0017275    17.45   0.000     .0267609    .0335324
          tenure |   .0083321   .0012727     6.55   0.000     .0058377    .0108265
                 |
           south |
              0  |          0  (empty)
              1  |   .5570191   .0400338    13.91   0.000     .4785543    .6354839
                 |
           _cons |          0  (omitted)
    -------------+----------------------------------------------------------------
         sigma_u |  .26686548
         sigma_e |  .28204221
             rho |  .47237215   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    The variable south is not dropped, but the constant is. In fact, the estimate of the constant is reported as the coefficient on south (1.south).
    This looks like a bug to me.

    I know that simply dropping the variable south from the model statement fixes the problem, but it is still a bug that could be misleading.
    For example, suppose that all observations in your data have south == 1 and you do not realize it. Fitting a model that includes south would then lead to an incorrect inference about the variable south.
    Also, the matrices e(b) and e(V) are labeled incorrectly, which compounds the problem if you use them for further output or analysis.
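    One user-side safeguard against the "you do not realize it" scenario is to check, before fitting, whether any regressor has zero variance in the estimation sample. The sketch below is a minimal illustration under stated assumptions: the data are simulated, and y, x and south are hypothetical stand-ins for the nlswork variables. It relies only on summarize and its stored result r(sd); it is not something xtreg does itself.

    ```stata
    * Hypothetical simulated data: every observation has south == 1,
    * mimicking a data set where this goes unnoticed
    clear
    set seed 2015
    set obs 100
    generate south = 1
    generate x     = rnormal()
    generate y     = 1 + 0.5*x + rnormal()

    * Flag any regressor that is constant in the estimation sample.
    * The sample here is approximated as complete cases on the model
    * variables; add the model's own if condition when there is one.
    local regressors x south
    local nconst 0
    foreach v of local regressors {
        quietly summarize `v' if !missing(y, x, south)
        if r(sd) == 0 {
            display as error "`v' is constant in the estimation sample"
            local ++nconst
        }
    }
    display "constant regressors found: `nconst'"
    ```

    With these data only south is reported, so the warning appears before any estimation command has a chance to decide what to omit.
    
    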

  • #2
    Whether you drop south or the constant, the results are correct either way. I agree that I would rather drop south, but when something has to be dropped because of perfect collinearity, you are at the mercy of Stata as to what gets dropped.
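    The point can be checked directly in the OLS case: when south equals 1 for every observation in the sample, the south column and the constant column are identical, so dropping either one gives the same fit, and the surviving coefficient takes the same value. The sketch below uses simulated data (y, x and south are hypothetical) and regress rather than xtreg; the two GLS logs in this thread, which report .5570191 once on 1.south and once on _cons, show the same thing for xtreg, re.

    ```stata
    * Hypothetical simulated data; south varies in the full data
    clear
    set seed 2015
    set obs 200
    generate south = _n > 100
    generate x     = rnormal()
    generate y     = 1 + 0.5*x + rnormal()

    * (a) Drop south: the intercept is estimated as _cons
    quietly regress y x if south == 1
    scalar b_cons = _b[_cons]
    scalar bx_a   = _b[x]

    * (b) Drop the constant instead: the south column (all ones in
    *     this sample) plays the role of the intercept
    quietly regress y x south if south == 1, noconstant
    scalar b_south = _b[south]
    scalar bx_b    = _b[x]

    display "coef on _cons = " b_cons "   coef on south = " b_south
    display "coef on x:      " bx_a "  vs  " bx_b
    ```

    Both specifications give the same slope on x and the same intercept value; only the label attached to the intercept differs, which is exactly the labeling problem raised in the first post.
    
    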
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      Also...

      Code:
      use http://www.stata-press.com/data/r13/nlswork
      (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
      
      . xtset idcode
             panel variable:  idcode (unbalanced)
      
      
      . keep if south == 1
      (16851 observations deleted)
      
      . xtreg ln_w grade age  ttl_exp   tenure   i.south
      note: 1.south omitted because of collinearity
      
      Random-effects GLS regression                   Number of obs      =     11501
      Group variable: idcode                          Number of groups   =      2138
      
      R-sq:  within  = 0.1293                         Obs per group: min =         1
             between = 0.4414                                        avg =       5.4
             overall = 0.3206                                        max =        15
      
                                                      Wald chi2(4)       =   3091.00
      corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000
      
      ------------------------------------------------------------------------------
           ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
             grade |   .0731688   .0025421    28.78   0.000     .0681864    .0781513
               age |  -.0033255   .0010488    -3.17   0.002    -.0053811   -.0012699
           ttl_exp |   .0301467   .0017275    17.45   0.000     .0267609    .0335324
            tenure |   .0083321   .0012727     6.55   0.000     .0058377    .0108265
           1.south |          0  (omitted)
             _cons |   .5570191   .0400338    13.91   0.000     .4785543    .6354839
      -------------+----------------------------------------------------------------
           sigma_u |  .26686548
           sigma_e |  .28204221
               rho |  .47237215   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------



      • #4
        I think the question here is why the choice of which variable to drop does (or should) depend on the selected estimator. I agree that it seems like a (minor) bug that Stata (i) drops different variables depending on the selected estimator and (ii) drops the constant that it adds in the first place.

        One thing I noticed from a quick glance at xtreg.ado is that the initial checks for collinearity do not pass touse (a temporary variable that identifies the estimation sample) to _rmcoll (a built-in programmer's command that flags variables to be omitted), which I think they should. However, since these checks are the same for both model types, this cannot explain the observed differences.
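        The touse point can be illustrated with _rmcoll itself: whether south is flagged depends on whether the check sees the whole data set or only the estimation sample. This is a sketch with simulated data (x and south are hypothetical), assuming _rmcoll accepts an if qualifier and leaves the number of omitted variables in r(k_omitted). It does not reproduce xtreg's internal code.

        ```stata
        * Hypothetical simulated data: south varies in the full data set
        clear
        set seed 2015
        set obs 200
        generate south = _n > 100
        generate x     = rnormal()

        * Check without restricting to the estimation sample:
        * south varies, so nothing is flagged
        _rmcoll x south
        display "omitted, full data:         " r(k_omitted)

        * Check restricted to the estimation sample: south is constant
        * there and collinear with the implied constant, so it is flagged
        _rmcoll x south if south == 1
        display "omitted, estimation sample: " r(k_omitted)
        ```

        If the up-front check runs on the full data, the collinearity is only discovered later, inside the estimator-specific code, which is consistent with the GLS fit in the first post dropping _cons; restricting the data first (as in #3) lets the check flag south instead.
        
        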

        It probably has something to do with the way the respective subroutines xtreg_re and xtreg_ml handle collinear variables, but I will leave the details to others.

        Best
        Daniel



        • #5
          Daniel's observation about _rmcoll explains why Mario's and Martin's calls to xtreg, re omit different coefficients.

          xtreg should be calling _rmcoll with an if qualifier that identifies the estimation sample.



          • #6
            Thank you, Jeff.
