  • A multilevel model succeeds in Stata but fails in R

    Hi,

    I fit a simple variance-components multilevel model to a large longitudinal dataset (obs > 60,000). The model has three levels: obs -> individual -> community. The dependent variable 'phi' is binary, with 95% zeros and 5% ones.

    I tried both Stata and R (the 'lme4' package).

    The Stata code is:

    melogit phi || commid: || idind:

    The R code is:

    fit <- glmer(phi~ (1 | commid/idind), data = dt, family = binomial("logit"))


    The problem is that Stata estimates this model quickly, while R returns a convergence warning: "Model failed to converge with max|grad| = 0.0377982 (tol = 0.001, component 1)".

    I reckon the problem is the highly unbalanced dependent variable, which R struggles to handle (when I tried other, more balanced binary dependent variables, R managed).
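
    For reference, a common first check with this lme4 warning is to refit with a different optimizer and a larger iteration budget, and to compare estimates across all available optimizers; a minimal sketch, assuming the same dt data frame as above:

    library(lme4)
    # Retry with the bobyqa optimizer and a larger evaluation budget
    fit2 <- glmer(phi ~ (1 | commid/idind), data = dt,
                  family = binomial("logit"),
                  control = glmerControl(optimizer = "bobyqa",
                                         optCtrl = list(maxfun = 2e5)))
    # Refit with every available optimizer; near-identical estimates suggest
    # a false-positive warning rather than a genuine convergence failure
    summary(allFit(fit2))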


    Does anyone know why Stata and R perform so differently in this case?
    Last edited by Runguo Wu; 22 May 2019, 08:45.

  • #2
    Originally posted by Runguo Wu:
    Does anyone know why Stata and R perform so differently in this case?
    I don't know, but there might be differences in model parameterization, in the algorithm that implements the model, in the algorithm that searches for the maximum, or in the maximization tolerances. Any number of factors could account for this.

    It's also my impression that development in R is driven more by methodological advancement: the emphasis is on the academic community implementing cutting-edge statistical methods (there are exceptions, of course). In commercial software development and maintenance (not just Stata; I sense it also with SAS), the emphasis seems different: more on nuts-and-bolts software engineering, and on implementing a statistical method only once it has been around long enough that there is demonstrated commercial interest in it and a better understanding of its frailties across a wider variety of use cases. The result is that the implementation is sometimes a little more rigorously engineered to accommodate those limitations, in the interest of avoiding (paying) customer dissatisfaction.

    These are, of course, generalizations, but I suspect that these kinds of differences between the open-source, free-as-in-beer (but not cheap) realm and the proprietary one might account for at least some examples like yours.



    • #3
      I have had a similar experience. Stata's -xtlogit- and -melogit- fit multilevel models using adaptive quadrature with 12 and 7 quadrature points by default. R's -glmer- uses the Laplace method by default, which is equivalent to one-point adaptive quadrature. This method can be very inaccurate and sometimes doesn't converge. For models with a single random effect, -glmer- has an option to use adaptive quadrature with more points. See https://data.princeton.edu/pop510/hospmle for an example where Stata converges with the default settings, while R fails with Laplace but converges with 12 quadrature points; a sketch of the R side follows below. That page also has links to runs using Bayesian methods.
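
      A minimal lme4 sketch of that example (the data frame name and column names are mine, inferred from the hospital.dat example on that page; nAGQ is -glmer-'s adaptive quadrature control, and nAGQ = 1, the default, is Laplace):

      library(lme4)
      # hospital.dat from the page above
      hospital <- read.table("https://data.princeton.edu/pop510/hospital.dat",
                             col.names = c("hosp", "loginc", "distance",
                                           "dropout", "college", "mother"))
      # Default fit: nAGQ = 1, i.e. the Laplace approximation
      fit1 <- glmer(hosp ~ loginc + distance + factor(dropout) +
                        factor(college) + (1 | mother),
                    data = hospital, family = binomial("logit"))
      # Same model with 12-point adaptive Gauss-Hermite quadrature
      fit12 <- update(fit1, nAGQ = 12)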



      • #4
        You may find relevant information in the following: McCoach, D. B., et al. (2018). "Does the Package Matter? A Comparison of 5 Common Multilevel Modeling Software Packages." Journal of Educational and Behavioral Statistics, 43(5): 594-627.



        • #5
          Germán, thanks for that. I don't know whether it has anything to do with "engineering" or anything else I was trying to bring up, but the difference seems to be more than just using Laplace quadrature. -xtlogit- won't allow fewer than four integration points, but -melogit- will allow single-point Laplace quadrature. Although the coefficient estimates are largely similar to those from R's -glmer- (including underestimating the variance component), Stata does converge with Laplace integration, with the absolute values of the gradient elements all of order 10^-7 or better. I doubt that Stata is off: any misstep in getting the gradient right in the context of Laplace integration would long since have been brought to light in complaints of Stata always going kerflooey with cross-classified models.

          .
          . version 15.1

          .
          . clear *

          .
          . infile hosp loginc distance dropout college mother ///
          >         using https://data.princeton.edu/pop510/hospital.dat
          (1,060 observations read)

          .
          . local linesize `c(linesize)'

          . set linesize 102

          .
          . melogit hosp c.(loginc distance) i.(dropout college) || mother: , ///
          >         intmethod(laplace) gradient // nolog

          Fitting fixed-effects model:

          Iteration 0:   log likelihood = -539.11554
          Iteration 1:   log likelihood = -537.46251
          Iteration 2:   log likelihood = -537.45771
          Iteration 3:   log likelihood = -537.45771

          Refining starting values:

          Grid node 0:   log likelihood = -528.09149

          Fitting full model:

          ------------------------------------------------------------------------------------------------------
          Iteration 0:
                                                             log likelihood = -528.09149
          Gradient vector (length = 107.6826):
                    eq1:       eq1:       eq1:       eq1:       eq1:       eq1:       eq1:  loading8:      var9:
                                           0b.         1.        0b.         1.
                 loginc   distance    dropout    dropout    college    college      _cons      _cons      _cons
          r1  -87.45773  -58.38878          0  -15.81123          0   1.144455   -16.6433          0  -3.022222
          ------------------------------------------------------------------------------------------------------
          Iteration 1:
                                                             log likelihood = -524.68104
          Gradient vector (length = 9.428585):
                    eq1:       eq1:       eq1:       eq1:       eq1:       eq1:       eq1:  loading8:      var9:
                                           0b.         1.        0b.         1.
                 loginc   distance    dropout    dropout    college    college      _cons      _cons      _cons
          r1  -7.555169  -5.235591          0  -1.422595          0      .0652  -1.469113          0   .4689472
          ------------------------------------------------------------------------------------------------------
          Iteration 2:
                                                             log likelihood = -524.58269
          Gradient vector (length = .3427744):
                    eq1:       eq1:       eq1:       eq1:       eq1:       eq1:       eq1:  loading8:      var9:
                                           0b.         1.        0b.         1.
                 loginc   distance    dropout    dropout    college    college      _cons      _cons      _cons
          r1  -.2780908  -.1703766          0  -.0462404          0   .0017095  -.0462375          0   .0827806
          ------------------------------------------------------------------------------------------------------
          Iteration 3:
                                                             log likelihood =  -524.5819
          Gradient vector (length = .0013233):
                    eq1:       eq1:       eq1:       eq1:       eq1:       eq1:       eq1:  loading8:      var9:
                                           0b.         1.        0b.         1.
                 loginc   distance    dropout    dropout    college    college      _cons      _cons      _cons
          r1   .0000876   .0002478          0   .0000629          0  -9.68e-06   .0000888          0   .0012923
          ------------------------------------------------------------------------------------------------------
          Iteration 4:
                                                             log likelihood =  -524.5819
          Gradient vector (length = 3.19e-07):
                    eq1:       eq1:       eq1:       eq1:       eq1:       eq1:       eq1:  loading8:      var9:
                                           0b.         1.        0b.         1.
                 loginc   distance    dropout    dropout    college    college      _cons      _cons      _cons
          r1   1.63e-07   1.13e-07          0   3.48e-08          0  -2.94e-09   4.13e-08          0   2.44e-07
          ------------------------------------------------------------------------------------------------------

          Mixed-effects logistic regression               Number of obs     =      1,060
          Group variable:          mother                 Number of groups  =        501

                                                          Obs per group:
                                                                        min =          1
                                                                        avg =        2.1
                                                                        max =         10

          Integration method:     laplace

                                                          Wald chi2(4)      =     112.45
          Log likelihood =  -524.5819                     Prob > chi2       =     0.0000
          ------------------------------------------------------------------------------
                  hosp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                loginc |   .5504028   .0709469     7.76   0.000     .4113494    .6894563
              distance |  -.0774142   .0316873    -2.44   0.015    -.1395201   -.0153083
             1.dropout |  -1.947307   .2444338    -7.97   0.000    -2.426388   -1.468225
             1.college |   1.023255   .3726075     2.75   0.006     .2929575    1.753552
                 _cons |  -3.294535   .4667035    -7.06   0.000    -4.209257   -2.379813
          -------------+----------------------------------------------------------------
          mother       |
             var(_cons)|   1.250743   .4076078                      .6603401    2.369017
          ------------------------------------------------------------------------------
          LR test vs. logistic model: chibar2(01) = 25.75       Prob >= chibar2 = 0.0000

          .
          . assert e(converged)

          .
          . // matrix list e(gradient)
          .
          . set linesize `linesize'

          .
          . exit

          end of do-file


          .


          Whatever the case, there still seems to be a difference in what I would call the rigor of the implementation.



          • #6
            Hi both,

            Thank you for your suggestions. I suspect R cannot substitute for Stata in some cases. In R, increasing the number of quadrature points works only with a single random effect; it does not work for a three-level model.
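
            A minimal sketch of that limitation, assuming the same dt data frame as in #1; lme4 stops with an error to the effect that nAGQ > 1 is available only for models with a single, scalar random-effects term:

            library(lme4)
            # Nested random effects (two terms), so adaptive quadrature with
            # nAGQ > 1 is rejected; only the Laplace default (nAGQ = 1) runs
            fit <- glmer(phi ~ (1 | commid/idind), data = dt,
                         family = binomial("logit"), nAGQ = 7)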

            Comment


            • #7
              Runguo Wu: that was exactly the point. With R you can fit multilevel logit models accurately only if you have a single random effect, which rules out a three-level model such as yours. If you want reliable maximum likelihood estimates, my recommendation is to use Stata, which offers adaptive quadrature for a wide range of models. Alternatively, you can turn Bayesian and use Stan, which can be run from the command line, R, or Stata.
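
              A hedged sketch of that Bayesian route from R via brms (the package choice is mine; plain Stan via cmdstanr, or Stata's -bayesmh-, would be alternatives), which compiles and runs a Stan model from an lme4-style formula:

              library(brms)
              # MCMC sidesteps the quadrature issue entirely; the priors here
              # are the brms defaults and would deserve thought in real use
              fitB <- brm(phi ~ (1 | commid/idind), data = dt,
                          family = bernoulli("logit"),
                          chains = 4, cores = 4)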

              Joseph Coveney: I agree that there are implementation differences in what's supposed to be the same method, as shown by the fact that Stata converged with Laplace and R didn't. But I think it is important to keep in mind that the Laplace approximation, no matter how rigorously implemented, is notoriously unreliable for logit models. In the example at hand it underestimates the variance of the random effect by 20%. It is not Stata that's off; it's Laplace.

