I got unexpected output when running -xtreg- in Stata 13.1. I think the output is in error and could be misleading.
When I fit the model using -xtreg, mle-, the variable south, which takes the values 0 and 1, is dropped, as expected, because I restricted the estimation sample to south==1.
When I fit the same model using GLS (the default), south is not dropped, but the constant is. In fact, the estimate of the constant is reported as the coefficient on south.
Here are the details. I first fit the model using -xtreg, mle-:
Code:
. use http://www.stata-press.com/data/r13/nlswork
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)
. xtset idcode
panel variable: idcode (unbalanced)
. xtreg ln_w grade age ttl_exp tenure i.south if south==1, mle
note: 1.south omitted because of collinearity
Fitting constant-only model:
Iteration 0: log likelihood = -5043.018
Iteration 1: log likelihood = -4771.2104
Iteration 2: log likelihood = -4728.134
Iteration 3: log likelihood = -4725.7725
Iteration 4: log likelihood = -4725.7612
Fitting full model:
Iteration 0: log likelihood = -3441.4136
Iteration 1: log likelihood = -3433.151
Iteration 2: log likelihood = -3433.1413
Random-effects ML regression                    Number of obs      =     11501
Group variable: idcode                          Number of groups   =      2138

Random effects u_i ~ Gaussian                   Obs per group: min =         1
                                                               avg =       5.4
                                                               max =        15

                                                LR chi2(4)         =   2585.24
Log likelihood  = -3433.1413                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grade |   .0730982   .0024847    29.42   0.000     .0682283    .0779681
         age |  -.0034273   .0010427    -3.29   0.001     -.005471   -.0013837
     ttl_exp |   .0302944   .0017186    17.63   0.000     .0269261    .0336627
      tenure |   .0083978    .001274     6.59   0.000     .0059007    .0108948
     1.south |          0  (omitted)
       _cons |    .559949   .0393612    14.23   0.000     .4828025    .6370956
-------------+----------------------------------------------------------------
    /sigma_u |   .2579499   .0053908                      .2475975    .2687351
    /sigma_e |    .283043   .0020695                      .2790158    .2871284
         rho |   .4537161   .0113972                      .4314628    .4761157
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)= 3445.57 Prob>=chibar2 = 0.000
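You can confirm why 1.south was omitted by tabulating south over the estimation sample right after the ML fit. This is a minimal sketch. It assumes the -xtreg, mle- model above is the most recent estimation, so that e(sample) marks exactly the observations it used.

Code:
```stata
* Run immediately after: xtreg ln_w grade age ttl_exp tenure i.south if south==1, mle
* e(sample) flags the observations used by the most recent estimation command.
tabulate south if e(sample)

* Every observation in the estimation sample has south==1, so the 1.south
* indicator equals the constant and cannot be estimated separately.
assert south == 1 if e(sample)
```

If the -assert- passes silently, 1.south is a column of ones in the estimation sample. That is the collinearity the ML note reports.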
Now, if I fit the same model using GLS (the default):
Code:
. xtreg ln_w grade age ttl_exp tenure i.south if south==1,
Random-effects GLS regression                   Number of obs      =     11501
Group variable: idcode                          Number of groups   =      2138

R-sq:  within  = 0.1293                         Obs per group: min =         1
       between = 0.4414                                        avg =       5.4
       overall = 0.3206                                        max =        15

                                                Wald chi2(5)       =  58459.39
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grade |   .0731688   .0025421    28.78   0.000     .0681864    .0781513
         age |  -.0033255   .0010488    -3.17   0.002    -.0053811   -.0012699
     ttl_exp |   .0301467   .0017275    17.45   0.000     .0267609    .0335324
      tenure |   .0083321   .0012727     6.55   0.000     .0058377    .0108265
             |
       south |
          0  |          0  (empty)
          1  |   .5570191   .0400338    13.91   0.000     .4785543    .6354839
             |
       _cons |          0  (omitted)
-------------+----------------------------------------------------------------
     sigma_u |  .26686548
     sigma_e |  .28204221
         rho |  .47237215   (fraction of variance due to u_i)
------------------------------------------------------------------------------
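If the number reported for 1.south is really the constant, it should line up with the _cons from the same GLS model fit without south. The sketch below fits both models and stores the two estimates in scalars. I chose the names b_south and b_cons for illustration only. I have not verified that the two values agree to every digit, so compare them yourself.

Code:
```stata
* GLS random-effects fit that includes i.south (the problematic model)
quietly xtreg ln_w grade age ttl_exp tenure i.south if south==1
scalar b_south = _b[1.south]

* Same model with south dropped from the variable list (the workaround)
quietly xtreg ln_w grade age ttl_exp tenure if south==1
scalar b_cons = _b[_cons]

* Show the two estimates side by side
display "coefficient on 1.south, model with i.south: " %9.0g b_south
display "_cons, model without south:                 " %9.0g b_cons
```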
This looks like a bug to me.
I know that simply dropping south from the variable list fixes the problem, but the behavior is still a bug, and it could be misleading.
For example, suppose that every observation in your data has south==1 and you do not realize it. Fitting the model that includes south would then lead to an incorrect inference about south. The GLS output above reports a coefficient of .5570191 (z = 13.91) on 1.south, but that number is really the constant.
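One guard against this scenario is to check, before fitting, whether any regressor is constant in the sample the model will actually use. The sketch below builds that sample with -mark- and -markout-. The indicator name touse is hypothetical, and the regressor list is copied from the model above.

Code:
```stata
* Build an indicator for the estimation sample: the -if- restriction plus
* nonmissing values of the dependent variable and every regressor.
capture drop touse
mark touse if south==1
markout touse ln_wage grade age ttl_exp tenure south

* Warn about any regressor that takes a single value in that sample. Such a
* variable is collinear with the constant, so its coefficient is not identified.
foreach v of varlist grade age ttl_exp tenure south {
    quietly summarize `v' if touse
    if r(min) == r(max) {
        display as error "warning: `v' is constant (= " r(min) ") in the estimation sample"
    }
}
```

For this model the loop should flag only south.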
Also, the matrices e(b) and e(V) are incorrectly labeled. This compounds the problem if you use them for output or further analyses.
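To see the labeling directly, you can list e(b) and e(V) after the GLS fit and print the full column names of e(b). This is a sketch for inspection only. The local macro name names is arbitrary, and I have not reproduced the exact labels Stata 13.1 prints.

Code:
```stata
* Refit the GLS model that includes i.south
quietly xtreg ln_w grade age ttl_exp tenure i.south if south==1

* List the coefficient vector and variance matrix with their row/column labels
matrix list e(b)
matrix list e(V)

* Collect the full column names of e(b), including factor-variable and
* omitted-term prefixes, so that a program can check them
local names : colfullnames e(b)
display "`names'"
```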
