  • melogit model not converging

    I have specified an melogit mixed effects model with survey design parameters, and it does not converge.

    The data are from annual state-wide school-based surveys with two-stage sampling and weights that adjust for a few demographic variables. A total of 15 states are included, each surveyed in multiple years over a 25-year period. State is included as a random effect with a weight of 1, because each state was selected with certainty.

    The simplest model that I am testing is based on the following svyset specification

    svyset sitecode, weight(weight2) strata(stratum) || _n, weight(weight)

    and the command

    svy: melogit outcome predictor || sitecode_i:

    Stata support showed me that running this produces more than 30 iterations flagged as "not concave", during which the log likelihood stays unchanged and no improvement can be made.

    Could anyone suggest a solution for analyzing these data?


  • #2
    Non-converging models are a difficult problem. In this case, I would start by trying to find out whether a particular parameter is causing the difficulty. So run the model again and specify the -iterate(#)- option. Choose a value of # large enough to reach the point where the log likelihood stops changing, and add on a couple for good measure. Stata will then run the estimation, but will stop and give interim results at that point. These results are incorrect and not usable, but by looking at the output you may be able to identify a parameter of the model that is clearly problematic. Most often, in my experience, it is a variance component that is being estimated as some number extremely close to zero. Or it may be one of the fixed effects with an absurdly large standard error. Refining the model by eliminating the problematic parameter is then your solution.
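
    As a rough illustration of the -iterate(#)- suggestion, assuming the svyset and model specification from the original post, the run might look like the sketch below. The value 10 is an arbitrary placeholder, not a recommendation.

    ```stata
    * Sketch only: stop after a fixed number of iterations and inspect
    * the interim estimates. The value 10 is an arbitrary illustration.
    svyset sitecode, weight(weight2) strata(stratum) || _n, weight(weight)
    svy: melogit outcome predictor || sitecode_i:, iterate(10)

    * The displayed results are NOT valid estimates. Look only for a variance
    * component near zero or a fixed effect with a huge standard error.
    ```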



    • #3
      Also, there are a number of maximization options to try. See the help for melogit and for maximize. Also, try the non-survey, unweighted, version of melogit, which might help with starting values.
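
      A minimal sketch of that last idea, assuming the variable names from the original post: fit the unweighted model, save its coefficient vector, and pass it to the survey version through -from()-. The matrix name b_unw is illustrative, and there is no guarantee the unweighted fit converges on these data.

      ```stata
      * Sketch: fit the model with no survey design and no weights.
      * -difficult- is one of the maximization options in -help maximize-.
      melogit outcome predictor || sitecode_i:, difficult

      * Keep the estimates (b_unw is an illustrative name) and inspect them.
      matrix b_unw = e(b)
      matrix list b_unw

      * Use them as starting values for the survey-weighted fit.
      * Assumes the data have already been -svyset-.
      svy: melogit outcome predictor || sitecode_i:, from(b_unw)
      ```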

      I'd also like to remind you of the strong preference on Statalist for registering with full-real names, both to promote collegiality and to show respect for one another. You can reregister by hitting the "contact us" button at the bottom right.
      Last edited by Steve Samuels; 03 Aug 2018, 07:21.
      Steve Samuels
      Statistical Consulting

      Stata 14.2



      • #4
        Thank you both for your suggestions. I am trying a number of them, and this is a tough nut to crack.

        The minimum model that I am attempting to fit has only one fixed-effect predictor and one random-effect variable, so I cannot reduce it beyond that unless I remove the random effect. No number of iterations in the range that I have specified has allowed the model to converge. I have tried all of the startvalues() options, one startgrid, the "difficult" option, and some combinations of them.
        melogit without the survey design parameters resulted in the error message "cannot compute an improvement -- discontinuous region encountered".

        The one model so far that has not resulted in an error message is with meqrlogit, without the survey design parameters.
        The command and output follow. If this is a useful start, would either of you be able to advise how to identify the starting values and apply them to the model with survey design variables? Many thanks!

        meqrlogit outcome predictor || state:

        Refining starting values:

        Iteration 0: log likelihood = -195483.17 (not concave)
        Iteration 1: log likelihood = -195351.85 (not concave)
        Iteration 2: log likelihood = -195343.46 (backed up)

        Performing gradient-based optimization:

        Iteration 0: log likelihood = -195343.46
        Iteration 1: log likelihood = -195341.4
        Iteration 2: log likelihood = -195341.24
        Iteration 3: log likelihood = -195341.24

        Mixed-effects logistic regression               Number of obs     =    449,775
        Group variable: sitecode_i                      Number of groups  =         15

                                                        Obs per group:
                                                                      min =     10,804
                                                                      avg =   29,985.0
                                                                      max =    101,742

        Integration points =   7                        Wald chi2(1)      =     511.26
        Log likelihood = -195341.24                     Prob > chi2       =     0.0000

        ------------------------------------------------------------------------------
              couse4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
           predictor |  -.2222694   .0098301   -22.61   0.000    -.2415361   -.2030027
               _cons |  -1.624712   .0815647   -19.92   0.000    -1.784575   -1.464848
        ------------------------------------------------------------------------------

        ------------------------------------------------------------------------------
          Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
        -----------------------------+------------------------------------------------
        state: Identity              |
                          var(_cons) |   .0989762   .0362949      .0482383    .2030811
        ------------------------------------------------------------------------------
        LR test vs. logistic model: chibar2(01) = 4159.32     Prob >= chibar2 = 0.0000




        • #5
          So, after running your -meqrlogit- successfully, do this:

          Code:
          matrix b = e(b)
          svyset sitecode, weight(weight2) strata(stratum) || _n, weight(weight)
          svy: melogit outcome predictor || sitecode_i:, from(b) refineopts(iterate(0))
          This will cause Stata to use the coefficients estimated by -meqrlogit- as starting values for the -svy: melogit- command.

          Good luck!

          Added: See -help maximize- for more information about -from()- and -refineopts()-.



          • #6
            Many thanks. This produced the error message, "option refineopts() not allowed". Perhaps it should be modified in some way?

            By the way, I read that meqrlogit differs from melogit in using the QR decomposition of the variance-components matrix. I wonder if an option to use this exists for melogit?



            • #7
              Hmm, -refineopts()- is a -maximize- option, and most of Stata's maximum likelihood estimators allow most -maximize- options, but on further inquiry it does appear that -melogit- does not allow -refineopts()-. So just leave out that part and go with -from(b)-.

              The "option" to use the QR decomposition in -melogit- is to use -meqrlogit- instead! This is precisely the difference between the two commands. Unfortunately, -svy:- does not support -meqrlogit-, so I'm afraid you are stuck with -melogit-'s estimation approach.



              • #8
                Clyde,

                Thank you again.

                With -from(b)- , the following error message results:

                "initial vector: extra parameter eq1:ip_i found
                specify skip option if necessary"

                I added a skip option as follows:

                svy: melogit outcome predictor || sitecode_i:, from(b, skip)

                and received the following error message:

                "initial values not feasible"

                Any hope left for meglm on these data?
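
                One thing that may be worth checking, offered as a sketch rather than a confirmed fix: the two commands may not label or scale the variance parameter the same way. The code below assumes that -meqrlogit- stores the log of the random-intercept standard deviation as the third element of e(b) and that -melogit- expects a variance in that position. Both assumptions should be verified with -matrix list- before relying on this. The names bqr, v0, and b0 are illustrative.

                ```stata
                * Sketch under assumptions: verify the layout of e(b) before use.
                meqrlogit outcome predictor || sitecode_i:
                matrix bqr = e(b)
                matrix list bqr

                * Assumed layout: columns 1-2 are the fixed effects (predictor, _cons);
                * column 3 is ln(SD) of the random intercept.
                * Convert ln(SD) to a variance: var = exp(2*ln(SD)).
                scalar v0 = exp(2*bqr[1,3])
                matrix b0 = (bqr[1,1], bqr[1,2], v0)

                * -copy- makes -from()- match by position instead of by column name.
                * Assumes the data have already been -svyset-.
                svy: melogit outcome predictor || sitecode_i:, from(b0, copy)
                ```

                Matching by position avoids the column-name mismatch that triggers the "extra parameter" message, but it only helps if the order and scale of the parameters really do line up.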




                • #9
                  Perhaps the model is too simple. As is, it assumes that the logits of site means have a Gaussian distribution centered around a single mean. If, in fact, site proportions are grouped around different means, with sparse areas between them, this could account for some of the difficulty. Perhaps the groupings are associated with known site characteristics. I suggest that you plot histograms of the crude site proportions and their logits.
                  Steve Samuels
                  Statistical Consulting

                  Stata 14.2
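
                  A minimal sketch of the plots suggested in #9, assuming outcome is coded 0/1 and that unweighted ("crude") proportions are what is wanted. The names p, logit_p, h_p, and h_logit are illustrative.

                  ```stata
                  * Sketch: one row per site, holding the unweighted proportion with outcome == 1.
                  preserve
                  collapse (mean) p = outcome, by(sitecode_i)

                  * logit(p) = ln(p/(1-p)); it is missing when p is exactly 0 or 1.
                  generate logit_p = logit(p)

                  * Histograms of the site proportions and of their logits.
                  histogram p, name(h_p, replace)
                  histogram logit_p, name(h_logit, replace)
                  restore
                  ```

                  With only 15 sites the histograms will be coarse, so listing the collapsed values before -restore- may be just as informative.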



                  • #10
                    Thank you. I do not see groupings of site proportions around different means; at least, they are generally not far apart or obvious to me. Each site (i.e. state), though, was surveyed annually over a number of years, and I am not sure whether I should look at the cumulative or the individual-year site means. I would also have to ask how to use those groupings in the analysis.

                    I have begun to revisit gllamm for these data, and it is not running into convergence issues. However, I am getting the error message "1m2 invalid name", for example with the following analysis:

                    gllamm outcome predictor sitecode_i i(state) family(binomial) link(logit) pweight(weight) nip(12) nrf(1) adapt trace eform

                    The output shows fixed effects but not random effects, and I do not know whether the fixed effects are valid to interpret.

                    With continued appreciation for advice on getting a valid analysis for these data.

