
  • help with the error message after xtmixed

    Hi all, I am using xtmixed to run a multilevel regression model, and I have two continuous outcomes: student reading scores and student math scores. The model runs without any problem when math scores are the outcome measure; my code is the following:

    Code:
    xtmixed MATH AGEYEARS AGEYRS_SQR ANYSPED || SCHID:, || CHILDID: AGEYEARS ANYSPED, mle cov(un) var pweight(C1_7FC0)
    Then I replaced the math outcome with reading scores, and all of a sudden Stata gives me no output; instead it spits out an error. Below is the code for the reading outcome:

    Code:
    xtmixed READ AGEYEARS AGEYRS_SQR ANYSPED || SCHID:, || CHILDID: AGEYEARS ANYSPED, mle cov(un) var pweight(C1_7FC0)
    and here is the error message

    Code:
    Hessian is not negative semidefinite
    last estimates not found
    r(301);
     
    end of do-file
     
    r(301);
    All I did was replace the outcome variable, and I don't know what is going on. Any clues for me, please?

    Thanks!!!

  • #2
    In Stata 14, the output of help xtmixed tells me
    xtmixed has been renamed to mixed. xtmixed continues to work but, as of Stata 13, is no longer an official part of Stata.
    Assuming you are not using a very old version of Stata, you can find documentation for mixed in the Stata Multilevel Mixed-Effects Reference Manual PDF which, like all Stata PDF documentation, is included in the Stata installation (since version 11) and is accessible from within Stata - for example, through Stata's Help menu.

    Perhaps newer code has better procedures for avoiding or dealing with this fundamental problem.



    • #3
      Just to clarify what the message means. -mixed- (or the older -xtmixed-) fits these models by maximum likelihood estimation. Think back to differential calculus. When you want to find the value of x that maximizes y = f(x) you do that by solving the equation df(x)/dx = 0. But, not all solutions of df(x)/dx = 0 are necessarily maxima. To distinguish, you look at the second derivative d2f(x)/dx2. If the second derivative is negative you are at a maximum. Otherwise you might be at a minimum, or if it is zero, you could be at a maximum, minimum, or point of inflection--which requires further exploration.

      So, the likelihood is a very complicated function of the model parameters (regression coefficients, variance components) and the data themselves. The values of the regression coefficients and variance components are sought which maximize the likelihood function. The multivariate analog of df(x)/dx is the vector of partial derivatives of the likelihood function with respect to the regression coefficients and variance components; it is known as the gradient. And the analog of the second derivative is a matrix of crossed second partial derivatives that goes by the name of the Hessian. So, analogously, to know whether we have arrived at a maximum of the likelihood function, we would like to be assured that the Hessian is negative definite, so that we know we are not at a local minimum or a saddle point, etc. In this case, it turned out not to be.
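
      In symbols, writing theta for the stacked vector of regression coefficients and variance components and l(theta) for the log likelihood, the two conditions just described are (a compact restatement of the above):

      Code:
      % first-order condition: the gradient (the vector of partial derivatives) vanishes
      \nabla \ell(\theta) = 0
      % second-order condition: the Hessian of second partial derivatives,
      %   H_{ij}(\theta) = \frac{\partial^2 \ell(\theta)}{\partial \theta_i \, \partial \theta_j},
      % must be negative definite at the solution for it to be a local maximum:
      x' H(\theta) \, x < 0 \quad \text{for all } x \neq 0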

      Anyway, since the likelihood function itself depends on all of the data, changing the values of any variable will change the likelihood function, and consequently also the gradient and the Hessian. This is not like the X'X matrix in OLS regression, which does not depend on the outcome variable. The likelihood is a function of all of the model variables. So it is not surprising that changing only the dependent variable can result in this problem.

      As for what to do, there is no easy answer here. William Lisowski's suggestion to switch to -mixed- may help, but I don't know if the maximization algorithm used in -mixed- is any different; you may get the same problem. If you do, it suggests that the maximization algorithm has landed on something that may not be a maximum. You can try a few things (a sketch of these options appears at the end of this post):

      1. Try -reml- instead of -mle- (or, if you used -reml- before, try -mle-).
      2. Try giving Stata some starting values. For example, you might just run -regress- with the same variables and use those coefficients as starting points for -mixed-. Or try running the model without the top-level variance component and then use the starting values from there.
      3. Try the -difficult- option.
      4. Try a different -technique()- option.
      5. Try the -hessian- option. This will cause Stata to display the evolving Hessian matrix with each iteration. If you are very lucky, you will spot a column of zeroes, or a group of columns that are obviously linearly dependent. In that case, removing one of the offending variables from the model will help.

      It is also possible that none of these things will help: it may just be that this particular model cannot be fit to this particular set of data. Unlike simpler models such as one-level linear, logistic, or Poisson regression, where the existence of a maximum is mathematically guaranteed, the likelihood function of a multi-level model may lack any maximum. Or it may have one that is simply hard to reach without getting trapped at a minimum or saddle point en route. You might have to change your model specification (try starting with a very simple model, adding complexity step by step toward your target model, and stopping at the step before you hit problems). Or you might need to restrict your analysis to some subset of the data.
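
      To make suggestions 1 and 3 through 5 concrete, here is a minimal sketch applied to the poster's own command (the variable names and the pweight() specification are the poster's; try one change at a time, and note that REML may not be combinable with sampling weights, in which case suggestion 1 is unavailable). Suggestion 2, starting values, is taken up later in this thread.

      Code:
      * 1. REML instead of ML: replace -mle- with -reml-
      xtmixed READ AGEYEARS AGEYRS_SQR ANYSPED || SCHID:, || CHILDID: AGEYEARS ANYSPED, reml cov(un) var pweight(C1_7FC0)

      * 3. the -difficult- option: alternative stepping in nonconcave regions
      xtmixed READ AGEYEARS AGEYRS_SQR ANYSPED || SCHID:, || CHILDID: AGEYEARS ANYSPED, mle cov(un) var pweight(C1_7FC0) difficult

      * 4. a different maximization algorithm, e.g. BFGS
      xtmixed READ AGEYEARS AGEYRS_SQR ANYSPED || SCHID:, || CHILDID: AGEYEARS ANYSPED, mle cov(un) var pweight(C1_7FC0) technique(bfgs)

      * 5. display the evolving Hessian to look for zero or linearly dependent columns
      xtmixed READ AGEYEARS AGEYRS_SQR ANYSPED || SCHID:, || CHILDID: AGEYEARS ANYSPED, mle cov(un) var pweight(C1_7FC0) hessian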



      • #4
        Thank you all for your replies!



        • #5
          Originally posted by Clyde Schechter (see #3 above)
          Hi Clyde, thanks for sharing the five possible ways to solve this "Hessian is not negative semidefinite" problem. I also encountered this problem recently. Regarding your suggestion #2 about giving Stata starting values: how do I set the starting values in Stata? By adding the -emiterate(#)- option to -mixed-? Could you please elaborate on this point a little more?



          • #6
            I suppose you have nothing to lose by trying the emiterate(#) option, but I doubt it will prove helpful. -mixed- begins by running EM estimation and then takes those results as its starting values for ML anyway. One of the drawbacks of EM estimation is that it is slow to converge (and has a higher risk of not converging at all). So adding a few more cycles of EM is unlikely to accomplish much.

            In the post you quoted, I recommended two approaches for starting values: estimate with -regress- (ignoring the multi-level structure) and use those results as starting values, or just leave out the highest level in the model, and, if that converges, use those results as starting values. That said, don't forget about option #5 from that quote: it may be that your model is simply not estimable as written and needs simplification.
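
            A minimal sketch of those two approaches, using the variable names from the original post; either way, the fitted coefficients are saved with -matrix b = e(b)-, and how to hand that vector to the estimator is the subject of the posts that follow.

            Code:
            * approach 1: ignore the multilevel structure entirely and save the coefficients
            regress READ AGEYEARS AGEYRS_SQR ANYSPED
            matrix b = e(b)

            * approach 2: drop the highest (SCHID) level; if this converges, save its results
            xtmixed READ AGEYEARS AGEYRS_SQR ANYSPED || CHILDID: AGEYEARS ANYSPED, mle cov(un) var
            matrix b = e(b)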



            • #7
              Originally posted by Clyde Schechter (see #6 above)
              Thanks Clyde, I tried the emiterate(#) option, but it didn't work for me. So I wanted to try setting the starting values. However, I really don't know how to set starting values in the -mixed- command. Is there an option in -mixed- for setting starting values? I read the help file for -mixed- but didn't find anything. Could you help me out?



              • #8
                Oh, I see. I misunderstood the thrust of your question. Actually, the option is not found under -mixed-, because the starting options are just passed through from -mixed- to the maximization routines. If you go to -help maximize- you will see an option -from()-, and that is what you need. Click on the blue link to -init_specs- to see the details of how to use it. Just add this -from()- option to your -mixed- command.



                • #9
                  Originally posted by Clyde Schechter (see #8 above)
                  Hi Clyde, thanks for sharing these solutions. But when I add the -from(b)- option to my -mixed- or -xtmixed- command, it says "option from() not allowed". I have seen examples showing how to use the -from()- option with -sem- and other estimation commands, but none with -xtmixed-, so I don't know how to use this option in the -xtmixed- case. Could you give me some hints?



                  • #10
                    Well, I distinctly remember doing this in the past, and it was possible. But apparently no longer. You can, however, do the equivalent by switching from -mixed- (the current name for what used to be called -xtmixed-) to -meglm- and specifying the from() option there.

                    Code:
                    . webuse pig, clear
                    (Longitudinal analysis of pig weights)
                    
                    . regress weight week
                    
                          Source |       SS           df       MS      Number of obs   =       432
                    -------------+----------------------------------   F(1, 430)       =   5757.41
                           Model |  111060.882         1  111060.882   Prob > F        =    0.0000
                        Residual |  8294.72677       430  19.2900622   R-squared       =    0.9305
                    -------------+----------------------------------   Adj R-squared   =    0.9303
                           Total |  119355.609       431  276.927167   Root MSE        =     4.392
                    
                    ------------------------------------------------------------------------------
                          weight | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                    -------------+----------------------------------------------------------------
                            week |   6.209896   .0818409    75.88   0.000     6.049038    6.370754
                           _cons |   19.35561   .4605447    42.03   0.000     18.45041    20.26081
                    ------------------------------------------------------------------------------
                    
                    . matrix b = e(b)
                    
                    . meglm weight week || id:, from(b)
                    
                    Fitting fixed-effects model:
                    
                    Iteration 0:   log likelihood = -1251.2506  
                    Iteration 1:   log likelihood = -1251.2506  
                    
                    Refining starting values:
                    
                    Grid node 0:   log likelihood = -1150.6253
                    
                    Fitting full model:
                    
                    Iteration 0:   log likelihood = -1150.6253  (not concave)
                    Iteration 1:   log likelihood = -1036.1793  
                    Iteration 2:   log likelihood =  -1017.912  
                    Iteration 3:   log likelihood = -1014.9537  
                    Iteration 4:   log likelihood = -1014.9268  
                    Iteration 5:   log likelihood = -1014.9268  
                    
                    Mixed-effects GLM                               Number of obs     =        432
                    Family: Gaussian
                    Link:   Identity
                    Group variable: id                              Number of groups  =         48
                    
                                                                    Obs per group:
                                                                                  min =          9
                                                                                  avg =        9.0
                                                                                  max =          9
                    
                    Integration method: mvaghermite                 Integration pts.  =          7
                    
                                                                    Wald chi2(1)      =   25337.48
                    Log likelihood = -1014.9268                     Prob > chi2       =     0.0000
                    -------------------------------------------------------------------------------
                           weight | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                    --------------+----------------------------------------------------------------
                             week |   6.209896   .0390124   159.18   0.000     6.133433    6.286359
                            _cons |   19.35561   .5974047    32.40   0.000     18.18472    20.52651
                    --------------+----------------------------------------------------------------
                    id            |
                        var(_cons)|   14.81745   3.124202                      9.801687    22.39989
                    --------------+----------------------------------------------------------------
                     var(e.weight)|   4.383264   .3163349                      3.805112    5.049261
                    -------------------------------------------------------------------------------
                    LR test vs. linear model: chibar2(01) = 472.65        Prob >= chibar2 = 0.0000



                    • #11
                      Originally posted by Clyde Schechter (see #10 above)
                      Thanks a lot! I'll give it a try!



                      • #12
                        Ashley Li, you can also use the -bayes- prefix to fit a Bayesian version of the model, which might be more robust to convergence issues.
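
                        For example, something along these lines (a hedged sketch: the -bayes- prefix requires Stata 15 or newer, default priors are used here, and the cov(un) and pweight() pieces of the original command are omitted because those settings would need rethinking in a Bayesian fit):

                        Code:
                        * Bayesian fit of the same multilevel structure with default priors
                        bayes: mixed READ AGEYEARS AGEYRS_SQR ANYSPED || SCHID: || CHILDID: AGEYEARS ANYSPED

                        * inspect MCMC convergence diagnostics before trusting the results
                        bayesgraph diagnostics _all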



                        • #13
                          Originally posted by Jackson Monroe (see #12 above)
                          Thank you for the suggestion!

