  • A multilevel model succeeds in Stata but fails in R

    Hi,

    I fit a simple variance-components multilevel model to a large longitudinal dataset (obs > 60,000). The model has three levels: obs -> individual -> community. The dependent variable 'phi' is binary, with 95% zeros and 5% ones.

    I tried both Stata and R (the 'lme4' package).

    The Stata code is:

    melogit phi || commid: || idind:

    The R code is:

    fit <- glmer(phi~ (1 | commid/idind), data = dt, family = binomial("logit"))


    The problem is that Stata estimates this model quickly, while R returns a convergence warning: "Model failed to converge with max|grad| = 0.0377982 (tol = 0.001, component 1)".

    I reckon the problem is the highly unbalanced dependent variable, which R struggles to handle (when I tried other, more balanced binary dependent variables, R managed).
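
    For reference, a common first check with this lme4 warning is to refit with a different optimizer and a larger iteration budget, and to compare estimates across all available optimizers; a minimal sketch, assuming the same dt data frame as above:

    library(lme4)
    # Retry with the bobyqa optimizer and a larger evaluation budget
    fit2 <- glmer(phi ~ (1 | commid/idind), data = dt,
                  family = binomial("logit"),
                  control = glmerControl(optimizer = "bobyqa",
                                         optCtrl = list(maxfun = 2e5)))
    # Refit with every available optimizer; near-identical estimates suggest
    # a false-positive warning rather than a genuine convergence failure
    summary(allFit(fit2))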


    Does anyone know why Stata and R perform so differently in this case?
    Last edited by Runguo Wu; 22 May 2019, 08:45.

  • #2
    Originally posted by Runguo Wu:
    Does anyone know why Stata and R perform so differently in this case?
    I don't know, but there might be differences in model parameterization, in the algorithm that implements the model, in the algorithm that searches for the maximum, or in the maximization tolerances. Any number of factors could account for this.

    It's also my impression that development in R is driven more by methodological advancement: the emphasis is on the academic community implementing cutting-edge statistical methods (there are exceptions, of course). In commercial software development and maintenance (not just Stata; I sense it also with SAS), the emphasis seems different: more on nuts-and-bolts software engineering, and on implementing a statistical method only once it has been around long enough that there is demonstrated commercial interest in it and a better understanding of its frailties across a wider variety of use cases. The result is that the implementation is sometimes a little more rigorously engineered to accommodate those limitations, in the interest of avoiding (paying) customer dissatisfaction.

    These are, of course, generalizations, but I suspect that these kinds of differences between the open-source, free-as-in-beer (but not cheap) realm and the proprietary one might account for at least some examples like yours.



    • #3
      I have had a similar experience. Stata's -xtlogit- and -melogit- fit multilevel models using adaptive quadrature with 12 and 7 quadrature points by default. R's -glmer- uses the Laplace method by default, which is equivalent to one-point adaptive quadrature. This method can be very inaccurate and sometimes doesn't converge. For models with a single random effect, -glmer- has an option to use adaptive quadrature with more points. See https://data.princeton.edu/pop510/hospmle for an example where Stata converges with the default settings, while R fails with Laplace but converges with 12 quadrature points; a sketch of the R side follows below. That page also has links to runs using Bayesian methods.
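
      A minimal lme4 sketch of that example (the data frame name and column names are mine, inferred from the hospital.dat example on that page; nAGQ is -glmer-'s adaptive quadrature control, and nAGQ = 1, the default, is Laplace):

      library(lme4)
      # hospital.dat from the page above
      hospital <- read.table("https://data.princeton.edu/pop510/hospital.dat",
                             col.names = c("hosp", "loginc", "distance",
                                           "dropout", "college", "mother"))
      # Default fit: nAGQ = 1, i.e. the Laplace approximation
      fit1 <- glmer(hosp ~ loginc + distance + factor(dropout) +
                        factor(college) + (1 | mother),
                    data = hospital, family = binomial("logit"))
      # Same model with 12-point adaptive Gauss-Hermite quadrature
      fit12 <- update(fit1, nAGQ = 12)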



      • #4
        You may find relevant information in the following: McCoach, D. B., et al. (2018). "Does the Package Matter? A Comparison of 5 Common Multilevel Modeling Software Packages." Journal of Educational and Behavioral Statistics, 43(5): 594-627.



        • #5
          Germán, thanks for that. I don't know whether it has anything to do with "engineering" or anything else I was trying to bring up, but the difference seems to be more than just using Laplace quadrature. -xtlogit- won't allow fewer than four integration points, but -melogit- will allow single-point Laplace quadrature. Although the coefficient estimates are largely similar to those from R's -glmer- (including underestimating the variance component), Stata does converge with Laplace integration, with the absolute values of the gradient elements all of order 10^-7 or better. I doubt that Stata is off: any misstep in getting the gradient right in the context of Laplace integration would long since have been brought to light in complaints of Stata always going kerflooey with cross-classified models.

          .
          . version 15.1

          .
          . clear *

          .
          . infile hosp loginc distance dropout college mother ///
          >         using https://data.princeton.edu/pop510/hospital.dat
          (1,060 observations read)

          .
          . local linesize `c(linesize)'

          . set linesize 102

          .
          . melogit hosp c.(loginc distance) i.(dropout college) || mother: , ///
          >         intmethod(laplace) gradient // nolog

          Fitting fixed-effects model:

          Iteration 0:   log likelihood = -539.11554
          Iteration 1:   log likelihood = -537.46251
          Iteration 2:   log likelihood = -537.45771
          Iteration 3:   log likelihood = -537.45771

          Refining starting values:

          Grid node 0:   log likelihood = -528.09149

          Fitting full model:

          ------------------------------------------------------------------------------------------------------
          Iteration 0:
                                                             log likelihood = -528.09149
          Gradient vector (length = 107.6826):
                    eq1:       eq1:       eq1:       eq1:       eq1:       eq1:       eq1:  loading8:      var9:
                                           0b.         1.        0b.         1.
                 loginc   distance    dropout    dropout    college    college      _cons      _cons      _cons
          r1  -87.45773  -58.38878          0  -15.81123          0   1.144455   -16.6433          0  -3.022222
          ------------------------------------------------------------------------------------------------------
          Iteration 1:
                                                             log likelihood = -524.68104
          Gradient vector (length = 9.428585):
                    eq1:       eq1:       eq1:       eq1:       eq1:       eq1:       eq1:  loading8:      var9:
                                           0b.         1.        0b.         1.
                 loginc   distance    dropout    dropout    college    college      _cons      _cons      _cons
          r1  -7.555169  -5.235591          0  -1.422595          0      .0652  -1.469113          0   .4689472
          ------------------------------------------------------------------------------------------------------
          Iteration 2:
                                                             log likelihood = -524.58269
          Gradient vector (length = .3427744):
                    eq1:       eq1:       eq1:       eq1:       eq1:       eq1:       eq1:  loading8:      var9:
                                           0b.         1.        0b.         1.
                 loginc   distance    dropout    dropout    college    college      _cons      _cons      _cons
          r1  -.2780908  -.1703766          0  -.0462404          0   .0017095  -.0462375          0   .0827806
          ------------------------------------------------------------------------------------------------------
          Iteration 3:
                                                             log likelihood =  -524.5819
          Gradient vector (length = .0013233):
                    eq1:       eq1:       eq1:       eq1:       eq1:       eq1:       eq1:  loading8:      var9:
                                           0b.         1.        0b.         1.
                 loginc   distance    dropout    dropout    college    college      _cons      _cons      _cons
          r1   .0000876   .0002478          0   .0000629          0  -9.68e-06   .0000888          0   .0012923
          ------------------------------------------------------------------------------------------------------
          Iteration 4:
                                                             log likelihood =  -524.5819
          Gradient vector (length = 3.19e-07):
                    eq1:       eq1:       eq1:       eq1:       eq1:       eq1:       eq1:  loading8:      var9:
                                           0b.         1.        0b.         1.
                 loginc   distance    dropout    dropout    college    college      _cons      _cons      _cons
          r1   1.63e-07   1.13e-07          0   3.48e-08          0  -2.94e-09   4.13e-08          0   2.44e-07
          ------------------------------------------------------------------------------------------------------

          Mixed-effects logistic regression               Number of obs     =      1,060
          Group variable:          mother                 Number of groups  =        501

                                                          Obs per group:
                                                                        min =          1
                                                                        avg =        2.1
                                                                        max =         10

          Integration method:     laplace

                                                          Wald chi2(4)      =     112.45
          Log likelihood =  -524.5819                     Prob > chi2       =     0.0000
          ------------------------------------------------------------------------------
                  hosp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                loginc |   .5504028   .0709469     7.76   0.000     .4113494    .6894563
              distance |  -.0774142   .0316873    -2.44   0.015    -.1395201   -.0153083
             1.dropout |  -1.947307   .2444338    -7.97   0.000    -2.426388   -1.468225
             1.college |   1.023255   .3726075     2.75   0.006     .2929575    1.753552
                 _cons |  -3.294535   .4667035    -7.06   0.000    -4.209257   -2.379813
          -------------+----------------------------------------------------------------
          mother       |
             var(_cons)|   1.250743   .4076078                      .6603401    2.369017
          ------------------------------------------------------------------------------
          LR test vs. logistic model: chibar2(01) = 25.75       Prob >= chibar2 = 0.0000

          .
          . assert e(converged)

          .
          . // matrix list e(gradient)
          .
          . set linesize `linesize'

          .
          . exit

          end of do-file


          .


          Whatever the case, there still seems to be a difference in what I would call the rigor of the implementation.



          • #6
            Hi both,

            Thank you for your suggestions. I suspect R cannot substitute for Stata in some cases. In R, increasing the number of quadrature points works only with a single random effect; it does not work for a three-level model.
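
            A minimal sketch of that limitation, assuming the same dt data frame as in #1; lme4 stops with an error to the effect that nAGQ > 1 is available only for models with a single, scalar random-effects term:

            library(lme4)
            # Nested random effects (two terms), so adaptive quadrature with
            # nAGQ > 1 is rejected; only the Laplace default (nAGQ = 1) runs
            fit <- glmer(phi ~ (1 | commid/idind), data = dt,
                         family = binomial("logit"), nAGQ = 7)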

            Comment


            • #7
              Runguo Wu: that was exactly the point. With R you can fit multilevel logit models accurately only if you have a single random effect, which rules out a three-level model such as yours. If you want reliable maximum likelihood estimates, my recommendation is to use Stata, which offers adaptive quadrature for a wide range of models. Alternatively, you can turn Bayesian and use Stan, which can be run from the command line, R, or Stata.
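
              A hedged sketch of that Bayesian route from R via brms (the package choice is mine; plain Stan via cmdstanr, or Stata's -bayesmh-, would be alternatives), which compiles and runs a Stan model from an lme4-style formula:

              library(brms)
              # MCMC sidesteps the quadrature issue entirely; the priors here
              # are the brms defaults and would deserve thought in real use
              fitB <- brm(phi ~ (1 | commid/idind), data = dt,
                          family = bernoulli("logit"),
                          chains = 4, cores = 4)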

              Joseph Coveney: I agree that there are implementation differences in what's supposed to be the same method, as shown by the fact that Stata converged with Laplace and R didn't. But I think it is important to keep in mind that the Laplace approximation, no matter how rigorously implemented, is notoriously unreliable for logit models. In the example at hand it underestimates the variance of the random effect by 20%. It is not Stata that's off; it's Laplace.

