xtmixed (or mixed) vs xtreg: residual and random effects problem

Nico Rojas

Join Date: Dec 2015

Posts: 4
#1

30 Dec 2015, 15:32

Dear Stata users,

I am using Stata 14.0/MP and have the following problem (which also applies to Stata 12/SE). According to Rabe-Hesketh and Skrondal's Stata multilevel book, xtreg, mle and xtmixed (or mixed) should give the same results.

I've run both commands on many data sets and they do give identical parameter estimates and almost identical standard deviations (although, as I read in an unfinished discussion here (xtmixed vs. xtreg - results are not identical), the standard deviations sometimes differ). However, I have a data set where the parameter estimates are exactly the same, but the residual predictions and the random-effects predictions are quite different, and it seems that xtreg, mle is right and xtmixed is wrong.

Does anybody know whether there is a problem with the residual and random-effects predictions in xtmixed (or mixed)? Is my reasoning right?

As I cannot share my own data set, I've built a data set that shows the same problem; see the code below. When you run it, you will see that the estimates are identical but the predictions are quite different. The reason I think the problem is in xtmixed (or mixed) is a property of HLM models: the mean of the residuals within each cluster should be 0. In my data set this holds for xtreg, mle but not for xtmixed.

I would truly appreciate anyone's help on this issue.

Thanks a lot,

Nico Rojas

** Comparison of residuals and random effects between xtreg and xtmixed
** ------------------------------------------------------------------ **

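** Generate data set
clear
set seed 12345 /*arbitrary seed, added so that the draws are reproducible*/
set obs 1000
gen id=ceil(_n/10) /*100 clusters of 10 obs each; trunc(_n/10) gave clusters of 9, 10 and 1 obs*/
gen x1=runiform()*5*id
gen y=rnormal() + runiform()*x1
gen lny = ln(y) /*Note: lny is missing for y<=0; we could drop those obs, it does not matter*/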

** Run xtmixed (ML requested explicitly) and generate residuals and random effects
xtmixed lny x1 || id:, mle

predict u_hlm1, reffects
predict e_hlm1, residuals

** Run xtreg and generate residuals and random effects
xtset id
xtreg lny x1, mle

predict u_hlm2, u
predict e_hlm2, e

** Generate differences
gen dif_u = abs(u_hlm2 - u_hlm1)
gen dif_e = abs(e_hlm2 - e_hlm1)

** Tabulate differences (Note that there shouldn't be any differences)
tabstat dif_u, s(min p50 max)
tabstat dif_e, s(min p50 max)

** Check whether the residual mean in each cluster is equal to 0
bysort id: egen prom_e_hlm1=mean(e_hlm1)
bysort id: egen prom_e_hlm2=mean(e_hlm2)

tabstat prom_e_hlm1, s(min p50 max)
tabstat prom_e_hlm2, s(min p50 max)

** Note:
/*
It does not matter whether the "u" statistics are computed keeping only one obs per cluster,
because there are exactly ten obs per cluster.
*/
Nico Rojas

Join Date: Dec 2015

Posts: 4
#2

08 Jan 2016, 16:05

An update on this subject:

I asked a friend to run the same computation in R, and the level-1 residuals were almost identical to those from xtmixed but not to those from xtreg. This is somewhat puzzling, because it is the xtreg residuals that sum to 0 within each cluster.
Roman Mostazir

Join Date: Apr 2014

Posts: 873
#3

10 Jan 2016, 15:01

Hi Nico,

Would it be possible to post a sample of the results from both the Stata and the R output? I too would like to see a reply on this topic from someone at Stata, and I hope that an example of the two would encourage them to have a quick look.

I noticed the same difference when the 'mle' option is used with 'xtreg'. Without the 'mle' option, the 'mixed' and 'xtreg' residuals (both upper and lower level) differ only after the third decimal place. I don't have R at the moment, otherwise I could have looked at the difference there. But yes, I agree that intuitively 'xtreg' with 'mle' should give the same residual variance as 'mixed', since both implement ML.
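The comparison described above can be sketched roughly as follows. This is only an illustration, assuming the simulated data from post #1 (lny, x1, id) are in memory; the variable names u_re, e_re, u_mx, e_mx, d_u and d_e are made up for this sketch, and no particular output is claimed.

```stata
* Sketch: compare xtreg (default random-effects estimator, no mle option)
* with mixed. Assumes lny, x1 and id from post #1 are in memory.
xtset id
xtreg lny x1, re
predict double u_re, u
predict double e_re, e

* mixed requires Stata 13 or later; use xtmixed in older versions
mixed lny x1 || id:, mle
predict double u_mx, reffects
predict double e_mx, residuals

* Absolute differences between the two sets of predictions
gen double d_u = abs(u_re - u_mx)
gen double d_e = abs(e_re - e_mx)
tabstat d_u d_e, s(min p50 max)
```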

Regards,

Roman
Nico Rojas

Join Date: Dec 2015

Posts: 4
#4

12 Jun 2017, 18:00

Dear all,

First, I would like to apologize to Roman Mostazir for not answering sooner. Even though I'm subscribed to this thread, somehow I didn't see the reply; I'll work on that. I have since solved the problem, and the solution was in Rabe-Hesketh and Skrondal's "Multilevel and Longitudinal Modeling Using Stata, Vol. I" (2012).

xtreg and mixed (or xtmixed) may be almost the same for estimating the betas and the variance components (sigma_u, sigma_e). However, their predict options compute the residuals quite differently. xtreg, mle treats the random effects as fixed parameters and estimates them by maximum likelihood, whereas mixed uses a Bayesian approach and computes the mean of the posterior distribution of the random effect. Interestingly, for an individual cluster both approaches converge if the number of observations is large and/or the estimated variance of the random effect is large relative to the estimated variance of the idiosyncratic error.
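To make the two approaches concrete, here is a minimal sketch that rebuilds both predictions by hand for a random-intercept model. It assumes the data and the variables u_hlm1 and u_hlm2 from post #1 are in memory, and it uses the usual shrinkage factor psi/(psi + theta/n) for the posterior mean. The names psi, theta, xb_fix, tot_res, n_obs, u_ml, shrink, u_eb, chk_ml and chk_eb are hypothetical, chosen only for this illustration.

```stata
* Sketch: ML-type and empirical Bayes predictions of the random intercept
xtset id
xtreg lny x1, mle
scalar psi   = e(sigma_u)^2      // estimated random-intercept variance
scalar theta = e(sigma_e)^2      // estimated idiosyncratic variance

predict double xb_fix, xb        // fixed part of the model
gen double tot_res = lny - xb_fix

* Cluster size (non-missing obs) and cluster mean of the total residual
bysort id: egen n_obs = count(tot_res)
bysort id: egen double u_ml = mean(tot_res)

* Shrink the cluster mean towards zero to get the posterior mean
gen double shrink = psi / (psi + theta / n_obs)
gen double u_eb   = shrink * u_ml

* If the description above is right, u_ml should be close to u_hlm2
* (xtreg, mle) and u_eb close to u_hlm1 (xtmixed)
gen double chk_ml = abs(u_ml - u_hlm2)
gen double chk_eb = abs(u_eb - u_hlm1)
tabstat chk_ml chk_eb, s(min p50 max)
```

The shrinkage factor approaches 1 when n_obs is large or when psi is large relative to theta, which is the convergence of the two approaches mentioned above.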

The random effects from mixed are the same as those from SAS (proc mixed) and R (lmer). This is probably because the Bayesian approach leads to predictions with a lower mean squared error (at least among linear predictors) and because the MLE approach treats the random effect as fixed instead of random. In that sense, it is somewhat puzzling that Stata is more flexible, since you can go either way by choosing xtreg or xtmixed, but:

i) the help for "predict" after xtreg does not say that you are making a big decision.
ii) you cannot choose one approach or the other within xtreg or within mixed. The capability is already there; it is just a matter of connecting the two commands.

With respect to the model properties, Rabe-Hesketh and Skrondal (2012) state that the prediction is conditionally biased. Although I'm not completely sure yet, it seems this means that for any individual cluster the prediction is biased, which is why E(e_ij | u_i) = 0 does not hold for the mean of the predicted e_ij within an individual cluster i. The advantage would be that this conditional bias is offset by a lower mean squared error over the entire population.
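A short worked derivation may make the conditional bias concrete. It is only a sketch that treats the betas and the variance components as known; the symbols \psi, \theta, n_i, R_i and the tildes (for predictions) are notation assumed here, not taken from the book or from this thread.

```latex
\begin{align*}
y_{ij} &= x_{ij}\beta + u_i + e_{ij},
  \qquad \operatorname{Var}(u_i) = \psi, \quad \operatorname{Var}(e_{ij}) = \theta, \\
\tilde u_i &= R_i\,(u_i + \bar e_{i\cdot}),
  \qquad R_i = \frac{\psi}{\psi + \theta/n_i}, \quad
  \bar e_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} e_{ij}, \\
\operatorname{E}(\tilde u_i \mid u_i) &= R_i\,u_i, \\
\frac{1}{n_i}\sum_{j=1}^{n_i} \tilde e_{ij}
  &= (u_i + \bar e_{i\cdot}) - \tilde u_i
   = (1 - R_i)\,(u_i + \bar e_{i\cdot}).
\end{align*}
```

Since R_i < 1, the posterior-mean prediction is shrunk towards zero given the true u_i, and the within-cluster mean of the predicted residuals is not zero. Both effects vanish as R_i approaches 1, that is, as n_i grows or as \psi becomes large relative to \theta.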

Thanks again.
Best regards.
Nico.
Roman Mostazir

Join Date: Apr 2014

Posts: 873
#5

15 Jun 2017, 16:07

Thanks, Nico, for getting to the root of the problem. I had completely forgotten about this old thread and didn't follow it up. I found the Rabe-Hesketh book in the library today and had a chance to look at it. Yes, the Bayesian approach to prediction is clearly specified there, but the discrepancy between xtreg and mixed when using 'predict' is not acknowledged. At least we know about it now. Many thanks again for your effort.
Best,

Roman