I got unexpected output when running -xtreg- in Stata 13.1. I think the output is in error and could be misleading.
I first fit a model using -xtreg, mle-:

Code:

. use http://www.stata-press.com/data/r13/nlswork
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. xtset idcode
       panel variable:  idcode (unbalanced)

. xtreg ln_w grade age ttl_exp tenure i.south if south==1, mle
note: 1.south omitted because of collinearity

Fitting constant-only model:

Iteration 0:   log likelihood = -5043.018
Iteration 1:   log likelihood = -4771.2104
Iteration 2:   log likelihood = -4728.134
Iteration 3:   log likelihood = -4725.7725
Iteration 4:   log likelihood = -4725.7612

Fitting full model:

Iteration 0:   log likelihood = -3441.4136
Iteration 1:   log likelihood = -3433.151
Iteration 2:   log likelihood = -3433.1413

Random-effects ML regression                    Number of obs      =     11501
Group variable: idcode                          Number of groups   =      2138

Random effects u_i ~ Gaussian                   Obs per group: min =         1
                                                               avg =       5.4
                                                               max =        15

                                                LR chi2(4)         =   2585.24
Log likelihood  = -3433.1413                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grade |   .0730982   .0024847    29.42   0.000     .0682283    .0779681
         age |  -.0034273   .0010427    -3.29   0.001     -.005471   -.0013837
     ttl_exp |   .0302944   .0017186    17.63   0.000     .0269261    .0336627
      tenure |   .0083978    .001274     6.59   0.000     .0059007    .0108948
     1.south |          0  (omitted)
       _cons |    .559949   .0393612    14.23   0.000     .4828025    .6370956
-------------+----------------------------------------------------------------
    /sigma_u |   .2579499   .0053908                      .2475975    .2687351
    /sigma_e |    .283043   .0020695                      .2790158    .2871284
         rho |   .4537161   .0113972                      .4314628    .4761157
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01) = 3445.57  Prob >= chibar2 = 0.000

As expected, the variable south, which takes values (0, 1), is dropped because I restricted the sample to south==1.

Now, if I fit the same model using GLS (the default):

Code:

. xtreg ln_w grade age ttl_exp tenure i.south if south==1,

Random-effects GLS regression                   Number of obs      =     11501
Group variable: idcode                          Number of groups   =      2138

R-sq:  within  = 0.1293                         Obs per group: min =         1
       between = 0.4414                                        avg =       5.4
       overall = 0.3206                                        max =        15

                                                Wald chi2(5)       =  58459.39
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grade |   .0731688   .0025421    28.78   0.000     .0681864    .0781513
         age |  -.0033255   .0010488    -3.17   0.002    -.0053811   -.0012699
     ttl_exp |   .0301467   .0017275    17.45   0.000     .0267609    .0335324
      tenure |   .0083321   .0012727     6.55   0.000     .0058377    .0108265
             |
       south |
          0  |          0  (empty)
          1  |   .5570191   .0400338    13.91   0.000     .4785543    .6354839
             |
       _cons |          0  (omitted)
-------------+----------------------------------------------------------------
     sigma_u |  .26686548
     sigma_e |  .28204221
         rho |  .47237215   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Here south is not dropped, but the constant is. In fact, the value of the constant is being reported as the coefficient on south.

This looks like a bug to me.

I know that simply leaving the variable south out of the model statement avoids the problem, but the behavior is still a bug that could be misleading. For example, suppose your data happen to contain only observations with south==1 and you do not realize it: fitting a model that includes south would then lead to incorrect inference about that variable.

Also, the matrices e(b) and e(V) are incorrectly labeled, which compounds the problem if you are using them for further output and analysis.
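For completeness, here is a minimal sketch of the workaround mentioned above, together with a way to check the labeling directly. Since the estimation sample is restricted to south==1, south is constant within the sample and can simply be left out of the varlist; listing e(b) shows the column names actually attached to the coefficients:

```stata
* Workaround: with the sample restricted to south==1, south is constant,
* so omit it from the varlist (assumes the same nlswork setup as above)
xtreg ln_w grade age ttl_exp tenure if south==1

* Inspect the column names of the coefficient vector to verify the labeling
matrix list e(b)
```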