Dear Stata users,
I am using Stata 14.0/MP and have the following problem (which also applies to Stata 12/SE). According to Rabe-Hesketh and Skrondal's Stata multilevel book, xtreg, mle and xtmixed (or mixed) should give the same results.
I've run both commands on many data sets and they do give identical parameter estimates and almost identical standard deviations (although, as I read in an unfinished discussion here (xtmixed vs. xtreg - results are not identical), the standard deviations sometimes differ). However, I have a data set where the parameter estimates are exactly the same but the predicted residuals and predicted random effects are quite different, and it seems that xtreg, mle is right and xtmixed is wrong.
Does anybody know whether there is a problem with the residual and random-effect predictions from xtmixed (or mixed)? Is my reasoning right?
As I cannot share my own data set, I've built a data set that shows the same problem; see the code below. When you run it, you will see that the estimates are identical but the predictions are quite different. The reason I think the problem is in xtmixed (or mixed) is a property of HLM models: the mean of the residuals within each cluster should be 0. In my data set this holds for xtreg, mle but not for xtmixed.
I would truly appreciate anyone's help on this issue.
Thanks a lot,
Nico Rojas
** Comparison of residuals and random effects between xtreg and xtmixed
** ------------------------------------------------------------------ **
** Generate data set
clear
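set obs 1000
set seed 12345 /*Arbitrary seed so that the simulated data are reproducible*/
gen id=ceil(_n/10) /*100 clusters of exactly 10 obs each; trunc(_n/10) gave clusters of 9 and 1 obs*/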
gen x1=runiform()*5*id
gen y=rnormal() + runiform()*x1
gen lny = ln(y) /*Note: ln() returns missing for y <= 0; we could drop those obs beforehand, it does not matter*/
** Run xtmixed and generate residuals and reffects
xtmixed lny x1, || id:
predict u_hlm1, reffect
predict e_hlm1, resid
** Run xtreg and generate residuals and reffects
xtset id
xtreg lny x1, mle
predict u_hlm2, u
predict e_hlm2, e
** Generate differences
gen dif_u = abs(u_hlm2 - u_hlm1)
gen dif_e = abs(e_hlm2 - e_hlm1)
** Tabulate differences (Note that there shouldn't be any differences)
tabstat dif_u, s(min p50 max)
tabstat dif_e, s(min p50 max)
** Check if residual mean in each cluster is equal to 0
bysort id: egen prom_e_hlm1=mean(e_hlm1)
bysort id: egen prom_e_hlm2=mean(e_hlm2)
tabstat prom_e_hlm1, s(min p50 max)
tabstat prom_e_hlm2, s(min p50 max)
** Note:
/*
It does not matter whether the "u" statistics are computed keeping only one obs per cluster,
because there are exactly ten obs per cluster.
*/
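The note above can also be checked directly rather than assumed. This is only a minimal sketch: it assumes the variables created by the code above (id, dif_u, prom_e_hlm1, prom_e_hlm2) are still in memory, and the names one_per_id and n_obs are made up for illustration.

```stata
* Flag one observation per cluster (hypothetical variable name),
* choosing it among obs where the random-effect difference is not missing
egen byte one_per_id = tag(id) if !missing(dif_u)

* Cluster-level summaries with each cluster counted once
tabstat dif_u prom_e_hlm1 prom_e_hlm2 if one_per_id, s(n min p50 max)

* Count how many obs per cluster have a non-missing lny,
* i.e. how many can enter the estimation sample
bysort id: egen n_obs = count(lny)
tabulate n_obs if one_per_id
```

If the last table shows clusters with fewer than ten usable observations, the one-row-per-cluster summary and the all-rows summary need not agree.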