Discrepancy in results between SAS proc mixed and STATA mixed

Lazaro Mwakesi

Join Date: May 2015

Posts: 3
#1

06 May 2015, 04:50

Hello all,

I'm fairly new to STATA, so I am teaching myself as I go along. I am proficient in SAS, so to be sure I am doing the right thing in STATA, I am currently fitting the same model in both packages and comparing the results. I am using PROC MIXED in SAS and the mixed command in STATA.

I am baffled by the differences in the covariance parameter estimates between the two packages, although the log-likelihood values and the fixed-effects solutions are approximately the same. SAS uses the Newton-Raphson algorithm while STATA uses the EM algorithm to converge to a solution; in theory, both algorithms should lead to the same solution. I am using REML in both, and the denominator degrees of freedom method in both is Kenward-Roger. To clarify, I am estimating variances (not SDs) in STATA so that I can compare the results directly with those from SAS.

SAS CODE
proc mixed data=yr_2 covtest ic method=reml;
model z_theory_r = zaq / solution ddfm=kr chisq;
random intercept /subject=uni_id;
run;

STATA CODE

mixed z_theory_r zaq if yr_prog==2 || uni_id: , var reml dfmethod(kr)

Does anyone have an idea why the results for the random effects differ? Thanks!
Xiao Yang (StataCorp)

StataCorp Employee

Join Date: Apr 2014

Posts: 3
#2

06 May 2015, 12:01

Lazaro asked why the results for the random effects in a mixed model with the Kenward-Roger method differ between SAS and Stata.

The main reason for the difference in the covariance parameter estimates between Stata and SAS is that the two packages use different methods to estimate the parameters. As Lazaro mentioned, SAS uses the Newton-Raphson algorithm, whereas Stata uses the Pinheiro-Bates algorithm. Although in theory both algorithms should lead to the same solution, in practice there may be numerical differences between the parameter estimates. Differences in the variance component estimates will generally also lead to differences in the denominator degrees of freedom.

In the case of the Kenward-Roger DDF method, the estimated degrees of freedom from Stata and SAS may also differ even if the estimates of the coefficients and variance components are the same. By default, Stata uses the expected information matrix when computing DDF, as described in Kenward and Roger (1997). SAS uses the observed information matrix instead.

As of update 22apr2015, you can use the new oim suboption of the dfmethod() option to use the observed information matrix in the computation. This should produce results similar to SAS provided that the estimates of model parameters are similar.

So, to obtain DDF computation similar to SAS, Lazaro needs to specify the suboption oim by typing:

Code:

. mixed z_theory_r zaq if yr_prog==2 || uni_id:, reml dfmethod(kroger, oim)

Reference:
Kenward, M. G., and J. H. Roger. 1997. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics 53: 983-997.
Lazaro Mwakesi

Join Date: May 2015

Posts: 3
#3

07 May 2015, 08:45

Thanks for the insight, Xiao Yang. Your diagnosis seems to be correct. I have confirmed from the literature that the Pinheiro and Bates algorithm (a hybrid of EM and NR) computes the variance components from the expected Fisher information matrix, since the observed Fisher information matrix is not accessible. This algorithm, as I understand it, is also implemented in the R functions lme() and lmer(), which were written by Bates et al. I ran the same model in R and the results are similar to what STATA outputs.

I downloaded my STATA 14 SE version on 22apr2015, but for some reason I am unable to run the code below that you recommended.

code:
mixed z_theory_r zaq if yr_prog==2 || uni_id:, reml dfmethod(kroger, oim)

because of the error message "invalid 'oim'". Instead, based on the STATA mixed documentation, I used

code:
mixed z_theory_r zaq if yr_prog==2 || uni_id:, reml dfmethod(kroger) vce(oim)

The results I obtained are similar to what I had before, so I haven't found a solution to the problem yet. Since I am unable to use dfmethod(kroger, oim), what other options are there in STATA to work around the problem?

Reference: Missing Data in Clinical Studies - Geert Molenberghs, Michael Kenward
Xiao Yang (StataCorp)

StataCorp Employee

Join Date: Apr 2014

Posts: 3
#4

07 May 2015, 12:23

Dear Lazaro,

You can check your current update level by typing

Code:

. update query

You will always want to update to the latest version. Please let us know if you still have trouble running the command after updating.
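
For anyone following along, a minimal sketch of the update sequence. The commands are standard Stata commands, but whether a restart or a further step is requested afterwards depends on the installation, so treat this as an outline rather than an exact recipe.

Code:

* report the installed update level and whether newer updates exist
update query

* download and install all available official updates
update all

* afterwards, check the revision date reported for the installed Stata
about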

Also, dfmethod(kroger, oim) is not equivalent to dfmethod(kroger) and vce(oim). These are two different specifications.
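
To illustrate the distinction, here is a sketch of the two specifications side by side. It rests on one assumption that should be checked against the [ME] mixed manual entry: that vce(oim) is already the default variance estimator for mixed, which would explain why adding it left the earlier results unchanged.

Code:

* vce(oim) selects how the standard errors of the estimates are computed;
* it does not change the information matrix used for the Kenward-Roger DDF
mixed z_theory_r zaq if yr_prog==2 || uni_id:, reml dfmethod(kroger) vce(oim)

* the oim suboption of dfmethod() switches the Kenward-Roger DDF computation
* from the expected to the observed information matrix
mixed z_theory_r zaq if yr_prog==2 || uni_id:, reml dfmethod(kroger, oim)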
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#5

07 May 2015, 19:22

Lazaro, see FAQ section 12 for how to format commands and results with CODE delimiters. That will make your posts easier to read.

Steve Samuels
Statistical Consulting

Stata 14.2
Lazaro Mwakesi

Join Date: May 2015

Posts: 3
#6

08 May 2015, 06:08

Dear Xiao,

Thanks for the response. After the update, I was able to run the command. As you mentioned before, the DDF don't line up with what SAS outputs because the variance components from SAS and Stata are not the same.

As I understand it, when using the Pinheiro and Bates algorithm, the estimated variance components will differ from what one would get from the observed Fisher information matrix (except in cases of classical exponential distributions). My question, as a matter of interest: is there any consideration of applying a correction to the Pinheiro and Bates algorithm estimates in future versions of Stata, so that there is little (if any) difference between the estimates derived from the observed and expected Fisher information matrices?

I ask because my understanding is that estimates from the Observed Fisher Information Matrix are to be preferred.

Dear Steve,

Noted and thanks.
Xiao Yang (StataCorp)

StataCorp Employee

Join Date: Apr 2014

Posts: 3
#7

08 May 2015, 12:38

Dear Lazaro,

We are not aware of any corrections available for the Pinheiro and Bates algorithm. If you have any reference paper, we would like to take a look at it. There are no plans at the moment to modify this algorithm in future versions of Stata. Thank you very much!

Xiao
Joseph Coveney

Join Date: Apr 2014

Posts: 4457
#8

10 May 2015, 23:03

Originally posted by Lazaro Mwakesi

I am baffled by the differences in the results of covariance parameters in the 2 softwares. However, I have noticed the values of the log-likelihood and solution for fixed effects are approximately the same. . . . Does anyone have an Idea why the results for the random effects differ ?

I find this disturbing. Could you give us an idea of how big the discrepancy is between PROC MIXED and Stata's mixed?

Perhaps you could attach excerpts from the SAS .LST and .LOG text files and the Stata .log file to illustrate the discrepancy and the associated model-fit information. (I assume that you just forgot to copy into your post the CLASS uni_id statement that you had just before the MODEL statement in your actual PROC MIXED code.)

Also, have you tried fitting the model using generalized least squares?

Code:

xtreg z_theory_r c.zaq if yr_prog == 2, i(uni_id) re

Does the squared sigma_u value agree more with the variance component from PROC MIXED or Stata's mixed (or neither)?
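
A sketch of how that comparison might be set up, assuming the variable names from the original post. The GLS and REML estimators of the variance component are not the same, so only rough agreement should be expected.

Code:

* random-effects GLS fit; e(sigma_u) is the estimated SD of the random intercept
xtreg z_theory_r c.zaq if yr_prog == 2, i(uni_id) re
display "xtreg sigma_u squared = " e(sigma_u)^2

* REML fit; the variance option reports variance components rather than SDs
mixed z_theory_r zaq if yr_prog == 2 || uni_id:, reml variance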

I assume that you've verified that the data=yr_2 dataset loaded into SAS for PROC MIXED gives identical summary statistics to the dataset loaded into Stata with the if yr_prog==2 restriction.
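
On the Stata side, a check along these lines could be compared against the corresponding SAS summary output (for example, from PROC MEANS). The variable name one_per_uni is made up for this illustration.

Code:

* observation count, mean, SD, min and max for the two model variables
summarize z_theory_r zaq if yr_prog == 2

* number of distinct uni_id groups in the estimation subset
egen byte one_per_uni = tag(uni_id) if yr_prog == 2
count if one_per_uni == 1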

It wouldn't be possible to share the yr_2 SAS dataset, would it? (Or at least the three pertinent variables of the dataset.) I think that it is important to better understand the nature of the discrepancy and the circumstances under which we should expect such a discrepancy to arise.