
  • Logit model convergence not achieved when adding non-linearities

    Hi,

    I am working a logit model with a binary dependent variable. Some of the independent variables I am using are continuous, and from some preliminary graphical analysis (and theory) I have reasons to believe that they have a non-linear relationship with my dependent variable. However, when I try to add squared terms for one of these variables, my model does not achieve convergence. I have found that one way around this is to drop extreme values of the problematic independent variable. While the model achieves convergence with the squared term after this, I am not sure this is the appropriate way of dealing with the problem (I have no way of knowing if these extreme values are errors in the data or not, and therefore no valid reason to drop them). Does anyone have some advice? Should I abandon taking non-linearities into account because my model does not converge when they are added? Are there other ways around this?

    Thanks



  • #2
Amy, you're not providing enough detail to obtain useful suggestions.

    Paste at least the exact command you executed and the error message Stata returned.

How did you enter the squared terms? Did you use factor-variable notation, or did you generate the variables yourself? If you generated them, did you demean them before squaring?

Do I understand correctly that the model fails to converge only when the square of a particular variable is added? Does the model converge when you estimate the same model excluding only the square of this particular variable? Can you provide detailed summary statistics of the problematic variable?
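For reference, here is a minimal sketch of the two ways a squared term is usually entered; the outcome and covariates from the -lbw- example data are used purely for illustration:

Code:
. webuse lbw, clear

. * factor-variable notation: Stata creates the square internally
. logit low c.lwt##c.lwt age i.race smoke

. * hand-generated square: demean first to reduce the correlation
. * between the linear and squared terms
. summarize lwt, meanonly
. generate double lwt_c  = lwt - r(mean)
. generate double lwt_c2 = lwt_c^2
. logit low c.lwt_c c.lwt_c2 age i.race smoke

With factor-variable notation, postestimation commands such as -margins- also know the two terms belong to the same underlying variable.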

    Comment


    • #3
You can try the -difficult- option and increase the number of iterations. If I am not wrong, the default maximum number of iterations is 16,000; you can increase it and see whether Stata finds a solution:

      Code:
      . webuse lbw, clear
      (Hosmer & Lemeshow data)
      
      . logit low age lwt i.race smoke ptl ht ui, difficult  iterate(30000)
      
      Iteration 0:   log likelihood =   -117.336  
      Iteration 1:   log likelihood = -101.28644  
      Iteration 2:   log likelihood = -100.72617  
      Iteration 3:   log likelihood =   -100.724  
      Iteration 4:   log likelihood =   -100.724  
      
      Logistic regression                             Number of obs     =        189
                                                      LR chi2(8)        =      33.22
                                                      Prob > chi2       =     0.0001
      Log likelihood =   -100.724                     Pseudo R2         =     0.1416
      
      ------------------------------------------------------------------------------
               low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               age |  -.0271003   .0364504    -0.74   0.457    -.0985418    .0443412
               lwt |  -.0151508   .0069259    -2.19   0.029    -.0287253   -.0015763
                   |
              race |
            black  |   1.262647   .5264101     2.40   0.016     .2309024    2.294392
            other  |   .8620792   .4391532     1.96   0.050     .0013548    1.722804
                   |
             smoke |   .9233448   .4008266     2.30   0.021      .137739    1.708951
               ptl |   .5418366    .346249     1.56   0.118     -.136799    1.220472
                ht |   1.832518   .6916292     2.65   0.008     .4769494    3.188086
                ui |   .7585135   .4593768     1.65   0.099    -.1418484    1.658875
             _cons |   .4612239    1.20459     0.38   0.702    -1.899729    2.822176
      ------------------------------------------------------------------------------
      
      .

      Comment


      • #4
Originally posted by Joro Kolev
Amy, you're not providing enough detail to obtain useful suggestions.

Paste at least the exact command you executed and the error message Stata returned.

How did you enter the squared terms? Did you use factor-variable notation, or did you generate the variables yourself? If you generated them, did you demean them before squaring?

Do I understand correctly that the model fails to converge only when the square of a particular variable is added? Does the model converge when you estimate the same model excluding only the square of this particular variable? Can you provide detailed summary statistics of the problematic variable?
        Hi Joro,

Thanks for the reply. Yes, the model fails to converge only when the squared term for a particular independent variable is included, which I added using factor-variable notation. The message I get is

        Code:
        convergence not achieved
        This is followed by the regression output. I am just not sure I can interpret this given that convergence was not achieved.

        Summary statistics for the variable that when squared leads to non-convergence are:

        Code:
                        
      Percentiles      Smallest
 1%            7              1
 5%           36              1
10%           65              1       Obs             100,358
25%          153              1       Sum of Wgt.     100,358

50%          291                      Mean           304.1462
                        Largest       Std. Dev.      188.2457
75%          434           2809
90%          554           2852       Variance       35436.44
95%          623           2934       Skewness       .7562467
99%          760           3191       Kurtosis       6.167888

        Comment


        • #5
          Try dividing the problematic variable by 100. Sometimes rescaling variables helps with convergence problems.
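A minimal sketch of that rescaling; the variable names here are placeholders for your own:

Code:
. * y is the binary outcome, x the problematic continuous predictor
. generate double x100 = x/100
. logit y c.x100##c.x100 other_covariates

Rescaling changes only the units of the coefficients, not the fitted probabilities, so nothing is lost by trying it.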
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          EMAIL: [email protected]
          WWW: https://academicweb.nd.edu/~rwilliam/

          Comment


          • #6
Did you add the -difficult iterate(30000)- options as in #3? The problematic variable has "outliers": the 99th percentile is 760, but the four largest values are on the order of 3,000...
Do the parameter estimates of the variables other than the problematic square change a lot between the model without the problematic square and the model with it?

            Comment


            • #7
Originally posted by Richard Williams
              Try dividing the problematic variable by 100. Sometimes rescaling variables helps with convergence problems.
This is a good idea. Probably an even better idea would be to standardise the problematic variable: subtract the mean and divide by the standard deviation.
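A sketch of that standardisation, with placeholder variable names; -egen, std()- does the centring and scaling in one step:

Code:
. * y is the binary outcome, x the problematic continuous predictor
. egen double zx = std(x)
. logit y c.zx##c.zx other_covariates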

              Comment


              • #8
Originally posted by Joro Kolev

This is a good idea. Probably an even better idea would be to standardise the problematic variable: subtract the mean and divide by the standard deviation.
I have tried 30,000 iterations with the -difficult- option and convergence is still not achieved. I have tried dividing the variable by 100 and standardising it, but my model is still not converging. Should I just give up trying to introduce the squared term? I see no big change in the parameter estimates of the non-problematic variables when the square is included (although I do not think I can interpret these, as convergence is not achieved).

                Comment


                • #9
Amy, try increasing the tolerance to see whether that gets you convergence, like this:

                  Code:
                  . webuse lbw, clear
                  (Hosmer & Lemeshow data)
                  
                  . logit low age lwt i.race smoke ptl ht ui, difficult  iterate(30000) tolerance(0.001)

                  Comment


                  • #10
Did you try rescaling the problematic variable? Oops, I see you did.
                    -------------------------------------------
                    Richard Williams, Notre Dame Dept of Sociology
                    StataNow Version: 19.5 MP (2 processor)

                    EMAIL: [email protected]
                    WWW: https://academicweb.nd.edu/~rwilliam/

                    Comment


                    • #11
Originally posted by Joro Kolev
Amy, try increasing the tolerance to see whether that gets you convergence, like this:

                      Code:
                      . webuse lbw, clear
                      (Hosmer & Lemeshow data)
                      
                      . logit low age lwt i.race smoke ptl ht ui, difficult iterate(30000) tolerance(0.001)
Thanks for all the advice; unfortunately, this also did not work for me!

                      Comment


                      • #12
The last piece of advice I have for you: feed the parameter estimates from the model excluding the problematic square term as initial values, like this:

                        Code:
                        . webuse lbw, clear
                        (Hosmer & Lemeshow data)
                        
                        . qui logit low age lwt i.race smoke ptl ht
                        
                        . mat b = e(b)
                        
                        
                        
                        . logit low age lwt i.race smoke ptl ht ui, difficult iterate(30000) tolerance(0.001) from(b)
                        
                        Iteration 0:   log likelihood = -102.06354  
                        Iteration 1:   log likelihood = -100.73193  
                        Iteration 2:   log likelihood =   -100.724  
                        Iteration 3:   log likelihood =   -100.724  
                        
                        Logistic regression                               Number of obs   =        189
                                                                          LR chi2(8)      =      33.22
                                                                          Prob > chi2     =     0.0001
                        Log likelihood =   -100.724                       Pseudo R2       =     0.1416
                        
                        ------------------------------------------------------------------------------
                                 low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                                 age |  -.0271003   .0364504    -0.74   0.457    -.0985418    .0443412
                                 lwt |  -.0151508   .0069259    -2.19   0.029    -.0287253   -.0015763
                                     |
                                race |
                                  2  |   1.262647   .5264101     2.40   0.016     .2309024    2.294392
                                  3  |   .8620791   .4391532     1.96   0.050     .0013548    1.722804
                                     |
                               smoke |   .9233448   .4008266     2.30   0.021      .137739    1.708951
                                 ptl |   .5418366    .346249     1.56   0.118     -.136799    1.220472
                                  ht |   1.832518   .6916292     2.65   0.008     .4769494    3.188086
                                  ui |   .7585135   .4593768     1.65   0.099    -.1418484    1.658875
                               _cons |   .4612239    1.20459     0.38   0.702    -1.899729    2.822176
                        ------------------------------------------------------------------------------
And if this does not work, I have no more ideas.

                        Comment


                        • #13
                          I doubt this will work. But you can try another program that will do logit, e.g. glm or fracreg. You could also try using probit instead.

                          Also I don't think it is necessarily horrible to truncate or drop the most extreme values, especially if it appears the squared term should be in there. You of course want to explain what you did. I'd rather see you attempt a reasonable explanation with caveats than just throw up your hands.
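Sketches of those alternatives, with placeholder variable names; -glm- with a binomial family and logit link fits the same model as -logit- but with a different optimization routine, which sometimes succeeds where -logit- fails:

Code:
. * the same logit model fit via glm
. glm y c.x##c.x other_covariates, family(binomial) link(logit)

. * probit link instead of logit
. probit y c.x##c.x other_covariates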
                          -------------------------------------------
                          Richard Williams, Notre Dame Dept of Sociology
                          StataNow Version: 19.5 MP (2 processor)

                          EMAIL: [email protected]
                          WWW: https://academicweb.nd.edu/~rwilliam/

                          Comment


                          • #14
                            Along the lines of Richard's last suggestion, you could try the user-written (SSC) -firthlogit-. The penalization can help convergence. Be sure to center and re-scale your problematic predictor beforehand.
Code:
* centre at the sample mean (304.1462, from the summary statistics in #4) and rescale
generate double new_predictor = (problematic_predictor - 304.1462) / 100
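A usage sketch, assuming -firthlogit- is not yet installed; the variable names are placeholders, and you should check -help firthlogit- for whether your version accepts factor-variable notation:

Code:
. ssc install firthlogit
. firthlogit y c.new_predictor##c.new_predictor other_covariates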

                            Comment


                            • #15
Originally posted by Richard Williams
                              I doubt this will work. But you can try another program that will do logit, e.g. glm or fracreg. You could also try using probit instead.

                              Also I don't think it is necessarily horrible to truncate or drop the most extreme values, especially if it appears the squared term should be in there. You of course want to explain what you did. I'd rather see you attempt a reasonable explanation with caveats than just throw up your hands.
Thanks, that is useful advice. I understand that scaling continuous variables has an impact on convergence, but I do not think I understand very well why that is. Would you be able to expand on that?

                              Comment
