
  • help with the error message after xtmixed

    Hi all, I am using xtmixed to run a multilevel regression model, and I have two continuous outcomes: student reading scores and student math scores. The model runs without any problem when math scores are the outcome measure; my code is the following:

    Code:
    xtmixed MATH AGEYEARS AGEYRS_SQR ANYSPED || SCHID:, || CHILDID: AGEYEARS ANYSPED, mle cov(un) var pweight(C1_7FC0)
    Then I replaced the math outcome with reading scores, and all of a sudden Stata gives me no output; instead it spits out an error. Below is the code for the reading outcome:

    Code:
    xtmixed READ AGEYEARS AGEYRS_SQR ANYSPED || SCHID:, || CHILDID: AGEYEARS ANYSPED, mle cov(un) var pweight(C1_7FC0)
    and here is the error message

    Code:
    Hessian is not negative semidefinite
    last estimates not found
    r(301);
     
    end of do-file
     
    r(301);
    All I did was replace the outcome variable, and I don't know what is going on. Any clues for me, please?

    Thanks!!!

  • #2
    In Stata 14, the output of help xtmixed tells me
    xtmixed has been renamed to mixed. xtmixed continues to work but, as of Stata 13, is no longer an official part of Stata.
    Assuming you are not using a very old version of Stata, you can find documentation for mixed in the Stata Multilevel Mixed-Effects Reference Manual PDF which, like all Stata PDF documentation, is included in the Stata installation (since version 11) and is accessible from within Stata - for example, through Stata's Help menu.

    Perhaps newer code has better procedures for avoiding or dealing with this fundamental problem.



    • #3
      Just to clarify what the message means. -mixed- (or the older -xtmixed-) fits these models by maximum likelihood estimation. Think back to differential calculus. When you want to find the value of x that maximizes y = f(x) you do that by solving the equation df(x)/dx = 0. But, not all solutions of df(x)/dx = 0 are necessarily maxima. To distinguish, you look at the second derivative d2f(x)/dx2. If the second derivative is negative you are at a maximum. Otherwise you might be at a minimum, or if it is zero, you could be at a maximum, minimum, or point of inflection--which requires further exploration.

      So, the likelihood is a very complicated function of the model parameters (regression coefficients, variance components) and the data themselves. The values of the regression coefficients and variance components are sought which maximize the likelihood function. The multivariate analog of df(x)/dx is the vector of partial derivatives of the likelihood function with respect to the regression coefficients and variance components; it is known as the gradient. And the analog of the second derivative is a matrix of crossed second partial derivatives that goes by the name of the Hessian. So, analogously, to know whether we have arrived at a maximum of the likelihood function, we would like to be assured that the Hessian is negative definite, so that we know we are not at a local minimum or a saddle point, etc. In this case, it turned out not to be.
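
      In symbols, writing theta for the stacked vector of regression coefficients and variance components and l(theta) for the log likelihood, the two conditions just described are (a compact restatement of the above):

      Code:
      % first-order condition: the gradient (the vector of partial derivatives) vanishes
      \nabla \ell(\theta) = 0
      % second-order condition: the Hessian of second partial derivatives,
      %   H_{ij}(\theta) = \frac{\partial^2 \ell(\theta)}{\partial \theta_i \, \partial \theta_j},
      % must be negative definite at the solution for it to be a local maximum:
      x' H(\theta) \, x < 0 \quad \text{for all } x \neq 0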

      Anyway, since the likelihood function itself depends on all of the data, changing the values of any variable will change the likelihood function, and consequently also the gradient and the Hessian. This is not like the X'X matrix in OLS regression, which does not depend on the outcome variable. The likelihood is a function of all of the model variables. So it is not surprising that changing only the dependent variable can result in this problem.

      As for what to do, there is no easy answer here. William Lisowski's suggestion to switch to -mixed- may help, but I don't know if the maximization algorithm used in -mixed- is any different; you may get the same problem. If you do, it suggests that the maximization algorithm has landed on something that may not be a maximum. You can try a few things (a sketch of these options appears at the end of this post):

      1. Try -reml- instead of -mle- (or, if you used -reml- before, try -mle-).
      2. Try giving Stata some starting values. For example, you might just run -regress- with the same variables and use those coefficients as starting points for -mixed-. Or try running the model without the top-level variance component and then use the starting values from there.
      3. Try the -difficult- option.
      4. Try a different -technique()- option.
      5. Try the -hessian- option. This will cause Stata to display the evolving Hessian matrix with each iteration. If you are very lucky, you will spot a column of zeroes, or a group of columns that are obviously linearly dependent. In that case, removing one of the offending variables from the model will help.

      It is also possible that none of these things will help: it may just be that this particular model cannot be fit to this particular set of data. Unlike simpler models such as one-level linear, logistic, or Poisson regression, where the existence of a maximum is mathematically guaranteed, the likelihood function of a multi-level model may lack any maximum. Or it may have one that is simply hard to reach without getting trapped at a minimum or saddle point en route. You might have to change your model specification (try starting with a very simple model, adding complexity step by step toward your target model, and stopping at the step before you hit problems). Or you might need to restrict your analysis to some subset of the data.
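
      To make suggestions 1 and 3 through 5 concrete, here is a minimal sketch applied to the poster's own command (the variable names and the pweight() specification are the poster's; try one change at a time, and note that REML may not be combinable with sampling weights, in which case suggestion 1 is unavailable). Suggestion 2, starting values, is taken up later in this thread.

      Code:
      * 1. REML instead of ML: replace -mle- with -reml-
      xtmixed READ AGEYEARS AGEYRS_SQR ANYSPED || SCHID:, || CHILDID: AGEYEARS ANYSPED, reml cov(un) var pweight(C1_7FC0)

      * 3. the -difficult- option: alternative stepping in nonconcave regions
      xtmixed READ AGEYEARS AGEYRS_SQR ANYSPED || SCHID:, || CHILDID: AGEYEARS ANYSPED, mle cov(un) var pweight(C1_7FC0) difficult

      * 4. a different maximization algorithm, e.g. BFGS
      xtmixed READ AGEYEARS AGEYRS_SQR ANYSPED || SCHID:, || CHILDID: AGEYEARS ANYSPED, mle cov(un) var pweight(C1_7FC0) technique(bfgs)

      * 5. display the evolving Hessian to look for zero or linearly dependent columns
      xtmixed READ AGEYEARS AGEYRS_SQR ANYSPED || SCHID:, || CHILDID: AGEYEARS ANYSPED, mle cov(un) var pweight(C1_7FC0) hessian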



      • #4
        Thank you all for your replies!



        • #5
          Originally posted by Clyde Schechter (see #3 above)
          Hi Clyde, thanks for sharing the five possible ways to solve this "Hessian is not negative semidefinite" problem. I also encountered this problem recently. Regarding your suggestion #2 about giving Stata starting values: how do I set the starting values in Stata? By adding the -emiterate(#)- option to -mixed-? Could you please elaborate on this point a little more?



          • #6
            I suppose you have nothing to lose by trying the emiterate(#) option, but I doubt it will prove helpful. -mixed- begins by running EM estimation and then takes those results as its starting values for ML anyway. One of the drawbacks of EM estimation is that it is slow to converge (and has a higher risk of not converging at all). So adding a few more cycles of EM is unlikely to accomplish much.

            In the post you quoted, I recommended two approaches for starting values: estimate with -regress- (ignoring the multi-level structure) and use those results as starting values, or just leave out the highest level in the model, and, if that converges, use those results as starting values. That said, don't forget about option #5 from that quote: it may be that your model is simply not estimable as written and needs simplification.
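
            A minimal sketch of those two approaches, using the variable names from the original post; either way, the fitted coefficients are saved with -matrix b = e(b)-, and how to hand that vector to the estimator is the subject of the posts that follow.

            Code:
            * approach 1: ignore the multilevel structure entirely and save the coefficients
            regress READ AGEYEARS AGEYRS_SQR ANYSPED
            matrix b = e(b)

            * approach 2: drop the highest (SCHID) level; if this converges, save its results
            xtmixed READ AGEYEARS AGEYRS_SQR ANYSPED || CHILDID: AGEYEARS ANYSPED, mle cov(un) var
            matrix b = e(b)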



            • #7
              Originally posted by Clyde Schechter (see #6 above)
              Thanks Clyde, I tried the emiterate(#) option, but it didn't work for me. So I wanted to try setting the starting values. However, I really don't know how to set starting values in the -mixed- command. Is there an option in -mixed- for setting starting values? I read the help file for -mixed- but didn't find anything. Could you help me out?



              • #8
                Oh, I see. I misunderstood the thrust of your question. Actually, the option is not found under -mixed-, because the starting options are just passed through from -mixed- to the maximization routines. If you go to -help maximize- you will see an option -from()-, and that is what you need. Click on the blue link to -init_specs- to see the details of how to use it. Just add this -from()- option to your -mixed- command.



                • #9
                  Originally posted by Clyde Schechter (see #8 above)
                  Hi Clyde, thanks for sharing these solutions. But when I add the -from(b)- option to my -mixed- or -xtmixed- command, it says "option from() not allowed". I have seen examples showing how to use the -from()- option with -sem- and other estimation commands, but none with -xtmixed-, so I don't know how to use this option in the -xtmixed- case. Could you give me some hints?



                  • #10
                    Well, I distinctly remember doing this in the past, and it was possible. But apparently no longer. You can, however, do the equivalent by switching from -mixed- (the current name for what used to be called -xtmixed-) to -meglm- and specifying the from() option there.

                    Code:
                    . webuse pig, clear
                    (Longitudinal analysis of pig weights)
                    
                    . regress weight week
                    
                          Source |       SS           df       MS      Number of obs   =       432
                    -------------+----------------------------------   F(1, 430)       =   5757.41
                           Model |  111060.882         1  111060.882   Prob > F        =    0.0000
                        Residual |  8294.72677       430  19.2900622   R-squared       =    0.9305
                    -------------+----------------------------------   Adj R-squared   =    0.9303
                           Total |  119355.609       431  276.927167   Root MSE        =     4.392
                    
                    ------------------------------------------------------------------------------
                          weight | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                    -------------+----------------------------------------------------------------
                            week |   6.209896   .0818409    75.88   0.000     6.049038    6.370754
                           _cons |   19.35561   .4605447    42.03   0.000     18.45041    20.26081
                    ------------------------------------------------------------------------------
                    
                    . matrix b = e(b)
                    
                    . meglm weight week || id:, from(b)
                    
                    Fitting fixed-effects model:
                    
                    Iteration 0:   log likelihood = -1251.2506  
                    Iteration 1:   log likelihood = -1251.2506  
                    
                    Refining starting values:
                    
                    Grid node 0:   log likelihood = -1150.6253
                    
                    Fitting full model:
                    
                    Iteration 0:   log likelihood = -1150.6253  (not concave)
                    Iteration 1:   log likelihood = -1036.1793  
                    Iteration 2:   log likelihood =  -1017.912  
                    Iteration 3:   log likelihood = -1014.9537  
                    Iteration 4:   log likelihood = -1014.9268  
                    Iteration 5:   log likelihood = -1014.9268  
                    
                    Mixed-effects GLM                               Number of obs     =        432
                    Family: Gaussian
                    Link:   Identity
                    Group variable: id                              Number of groups  =         48
                    
                                                                    Obs per group:
                                                                                  min =          9
                                                                                  avg =        9.0
                                                                                  max =          9
                    
                    Integration method: mvaghermite                 Integration pts.  =          7
                    
                                                                    Wald chi2(1)      =   25337.48
                    Log likelihood = -1014.9268                     Prob > chi2       =     0.0000
                    -------------------------------------------------------------------------------
                           weight | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                    --------------+----------------------------------------------------------------
                             week |   6.209896   .0390124   159.18   0.000     6.133433    6.286359
                            _cons |   19.35561   .5974047    32.40   0.000     18.18472    20.52651
                    --------------+----------------------------------------------------------------
                    id            |
                        var(_cons)|   14.81745   3.124202                      9.801687    22.39989
                    --------------+----------------------------------------------------------------
                     var(e.weight)|   4.383264   .3163349                      3.805112    5.049261
                    -------------------------------------------------------------------------------
                    LR test vs. linear model: chibar2(01) = 472.65        Prob >= chibar2 = 0.0000



                    • #11
                      Originally posted by Clyde Schechter (see #10 above)
                      Thanks a lot! I'll give it a try!



                      • #12
                        Ashley Li, you can also use the -bayes- prefix to fit a Bayesian version of the model, which might be more robust to convergence issues.
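
                        For example, something along these lines (a hedged sketch: the -bayes- prefix requires Stata 15 or newer, default priors are used here, and the cov(un) and pweight() pieces of the original command are omitted because those settings would need rethinking in a Bayesian fit):

                        Code:
                        * Bayesian fit of the same multilevel structure with default priors
                        bayes: mixed READ AGEYEARS AGEYRS_SQR ANYSPED || SCHID: || CHILDID: AGEYEARS ANYSPED

                        * inspect MCMC convergence diagnostics before trusting the results
                        bayesgraph diagnostics _all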



                        • #13
                          Originally posted by Jackson Monroe (see #12 above)
                          Thank you for the suggestion!

