
  • Logit model convergence not achieved when adding non-linearities

    Hi,

    I am working a logit model with a binary dependent variable. Some of the independent variables I am using are continuous, and from some preliminary graphical analysis (and theory) I have reasons to believe that they have a non-linear relationship with my dependent variable. However, when I try to add squared terms for one of these variables, my model does not achieve convergence. I have found that one way around this is to drop extreme values of the problematic independent variable. While the model achieves convergence with the squared term after this, I am not sure this is the appropriate way of dealing with the problem (I have no way of knowing if these extreme values are errors in the data or not, and therefore no valid reason to drop them). Does anyone have some advice? Should I abandon taking non-linearities into account because my model does not converge when they are added? Are there other ways around this?

    Thanks



  • #2
Amy, you're not providing enough detail to obtain useful suggestions.

    Paste at least the exact command you executed and the error message Stata returned.

How did you enter the squared terms? Did you use factor-variable notation, or did you generate the variables yourself? If you generated them, did you demean them before squaring?

Do I understand correctly that the model fails to converge only when the square of a particular variable is added? Does the model converge when you estimate the same model excluding only the square of this particular variable? Can you provide detailed summary statistics of the problematic variable?
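For reference, here is a minimal sketch of the two ways a squared term is usually entered; the outcome and covariates from the -lbw- example data are used purely for illustration:

Code:
. webuse lbw, clear

. * factor-variable notation: Stata creates the square internally
. logit low c.lwt##c.lwt age i.race smoke

. * hand-generated square: demean first to reduce the correlation
. * between the linear and squared terms
. summarize lwt, meanonly
. generate double lwt_c  = lwt - r(mean)
. generate double lwt_c2 = lwt_c^2
. logit low c.lwt_c c.lwt_c2 age i.race smoke

With factor-variable notation, postestimation commands such as -margins- also know the two terms belong to the same underlying variable.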

    Comment


    • #3
You can try the -difficult- option and increase the number of iterations. If I am not wrong, the default maximum number of iterations is 16,000; you can increase it and see whether Stata finds a solution:

      Code:
      . webuse lbw, clear
      (Hosmer & Lemeshow data)
      
      . logit low age lwt i.race smoke ptl ht ui, difficult  iterate(30000)
      
      Iteration 0:   log likelihood =   -117.336  
      Iteration 1:   log likelihood = -101.28644  
      Iteration 2:   log likelihood = -100.72617  
      Iteration 3:   log likelihood =   -100.724  
      Iteration 4:   log likelihood =   -100.724  
      
      Logistic regression                             Number of obs     =        189
                                                      LR chi2(8)        =      33.22
                                                      Prob > chi2       =     0.0001
      Log likelihood =   -100.724                     Pseudo R2         =     0.1416
      
      ------------------------------------------------------------------------------
               low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               age |  -.0271003   .0364504    -0.74   0.457    -.0985418    .0443412
               lwt |  -.0151508   .0069259    -2.19   0.029    -.0287253   -.0015763
                   |
              race |
            black  |   1.262647   .5264101     2.40   0.016     .2309024    2.294392
            other  |   .8620792   .4391532     1.96   0.050     .0013548    1.722804
                   |
             smoke |   .9233448   .4008266     2.30   0.021      .137739    1.708951
               ptl |   .5418366    .346249     1.56   0.118     -.136799    1.220472
                ht |   1.832518   .6916292     2.65   0.008     .4769494    3.188086
                ui |   .7585135   .4593768     1.65   0.099    -.1418484    1.658875
             _cons |   .4612239    1.20459     0.38   0.702    -1.899729    2.822176
      ------------------------------------------------------------------------------
      
      .

      Comment


      • #4
Originally posted by Joro Kolev
Amy, you're not providing enough detail to obtain useful suggestions.

Paste at least the exact command you executed and the error message Stata returned.

How did you enter the squared terms? Did you use factor-variable notation, or did you generate the variables yourself? If you generated them, did you demean them before squaring?

Do I understand correctly that the model fails to converge only when the square of a particular variable is added? Does the model converge when you estimate the same model excluding only the square of this particular variable? Can you provide detailed summary statistics of the problematic variable?
        Hi Joro,

Thanks for the reply. Yes, the model fails to converge only when the squared term for a particular independent variable is included, which I added using factor-variable notation. The message I get is

        Code:
        convergence not achieved
        This is followed by the regression output. I am just not sure I can interpret this given that convergence was not achieved.

        Summary statistics for the variable that when squared leads to non-convergence are:

        Code:
                        
      Percentiles      Smallest
 1%            7              1
 5%           36              1
10%           65              1       Obs             100,358
25%          153              1       Sum of Wgt.     100,358

50%          291                      Mean           304.1462
                        Largest       Std. Dev.      188.2457
75%          434           2809
90%          554           2852       Variance       35436.44
95%          623           2934       Skewness       .7562467
99%          760           3191       Kurtosis       6.167888

        Comment


        • #5
          Try dividing the problematic variable by 100. Sometimes rescaling variables helps with convergence problems.
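A minimal sketch of that rescaling; the variable names here are placeholders for your own:

Code:
. * y is the binary outcome, x the problematic continuous predictor
. generate double x100 = x/100
. logit y c.x100##c.x100 other_covariates

Rescaling changes only the units of the coefficients, not the fitted probabilities, so nothing is lost by trying it.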
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          EMAIL: [email protected]
          WWW: https://academicweb.nd.edu/~rwilliam/

          Comment


          • #6
Did you add the -difficult iterate(30000)- options as in #3? The problematic variable has "outliers": the 99th percentile is 760, but the four largest values are on the order of 3,000...
Do the parameter estimates of the variables other than the problematic square change a lot between the model without the problematic square and the model with it?

            Comment


            • #7
Originally posted by Richard Williams
              Try dividing the problematic variable by 100. Sometimes rescaling variables helps with convergence problems.
This is a good idea. Probably an even better idea would be to standardise the problematic variable: subtract the mean and divide by the standard deviation.
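A sketch of that standardisation, with placeholder variable names; -egen, std()- does the centring and scaling in one step:

Code:
. * y is the binary outcome, x the problematic continuous predictor
. egen double zx = std(x)
. logit y c.zx##c.zx other_covariates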

              Comment


              • #8
Originally posted by Joro Kolev

This is a good idea. Probably an even better idea would be to standardise the problematic variable: subtract the mean and divide by the standard deviation.
I have tried 30,000 iterations with the -difficult- option and convergence is still not achieved. I have tried dividing the variable by 100 and standardising it, but my model is still not converging. Should I just give up trying to introduce the squared term? I see no big change in the parameter estimates of the non-problematic variables when the square is included (although I do not think I can interpret these, as convergence is not achieved).

                Comment


                • #9
Amy, try increasing the tolerance to see whether that gets you convergence, like this:

                  Code:
                  . webuse lbw, clear
                  (Hosmer & Lemeshow data)
                  
                  . logit low age lwt i.race smoke ptl ht ui, difficult  iterate(30000) tolerance(0.001)

                  Comment


                  • #10
Did you try rescaling the problematic variable? Oops, I see you did.
                    -------------------------------------------
                    Richard Williams, Notre Dame Dept of Sociology
                    StataNow Version: 19.5 MP (2 processor)

                    EMAIL: [email protected]
                    WWW: https://academicweb.nd.edu/~rwilliam/

                    Comment


                    • #11
Originally posted by Joro Kolev
Amy, try increasing the tolerance to see whether that gets you convergence, like this:

                      Code:
                      . webuse lbw, clear
                      (Hosmer & Lemeshow data)
                      
                      . logit low age lwt i.race smoke ptl ht ui, difficult iterate(30000) tolerance(0.001)
Thanks for all the advice; unfortunately, this also did not work for me!

                      Comment


                      • #12
The last piece of advice I have for you: feed the parameter estimates from the model excluding the problematic square term as initial values, like this:

                        Code:
                        . webuse lbw, clear
                        (Hosmer & Lemeshow data)
                        
                        . qui logit low age lwt i.race smoke ptl ht
                        
                        . mat b = e(b)
                        
                        
                        
                        . logit low age lwt i.race smoke ptl ht ui, difficult iterate(30000) tolerance(0.001) from(b)
                        
                        Iteration 0:   log likelihood = -102.06354  
                        Iteration 1:   log likelihood = -100.73193  
                        Iteration 2:   log likelihood =   -100.724  
                        Iteration 3:   log likelihood =   -100.724  
                        
                        Logistic regression                               Number of obs   =        189
                                                                          LR chi2(8)      =      33.22
                                                                          Prob > chi2     =     0.0001
                        Log likelihood =   -100.724                       Pseudo R2       =     0.1416
                        
                        ------------------------------------------------------------------------------
                                 low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                                 age |  -.0271003   .0364504    -0.74   0.457    -.0985418    .0443412
                                 lwt |  -.0151508   .0069259    -2.19   0.029    -.0287253   -.0015763
                                     |
                                race |
                                  2  |   1.262647   .5264101     2.40   0.016     .2309024    2.294392
                                  3  |   .8620791   .4391532     1.96   0.050     .0013548    1.722804
                                     |
                               smoke |   .9233448   .4008266     2.30   0.021      .137739    1.708951
                                 ptl |   .5418366    .346249     1.56   0.118     -.136799    1.220472
                                  ht |   1.832518   .6916292     2.65   0.008     .4769494    3.188086
                                  ui |   .7585135   .4593768     1.65   0.099    -.1418484    1.658875
                               _cons |   .4612239    1.20459     0.38   0.702    -1.899729    2.822176
                        ------------------------------------------------------------------------------
And if this does not work, I have no more ideas.

                        Comment


                        • #13
                          I doubt this will work. But you can try another program that will do logit, e.g. glm or fracreg. You could also try using probit instead.

                          Also I don't think it is necessarily horrible to truncate or drop the most extreme values, especially if it appears the squared term should be in there. You of course want to explain what you did. I'd rather see you attempt a reasonable explanation with caveats than just throw up your hands.
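Sketches of those alternatives, with placeholder variable names; -glm- with a binomial family and logit link fits the same model as -logit- but with a different optimization routine, which sometimes succeeds where -logit- fails:

Code:
. * the same logit model fit via glm
. glm y c.x##c.x other_covariates, family(binomial) link(logit)

. * probit link instead of logit
. probit y c.x##c.x other_covariates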
                          -------------------------------------------
                          Richard Williams, Notre Dame Dept of Sociology
                          StataNow Version: 19.5 MP (2 processor)

                          EMAIL: [email protected]
                          WWW: https://academicweb.nd.edu/~rwilliam/

                          Comment


                          • #14
                            Along the lines of Richard's last suggestion, you could try the user-written (SSC) -firthlogit-. The penalization can help convergence. Be sure to center and re-scale your problematic predictor beforehand.
Code:
* centre at the sample mean (304.1462, from the summary statistics in #4) and rescale
generate double new_predictor = (problematic_predictor - 304.1462) / 100
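A usage sketch, assuming -firthlogit- is not yet installed; the variable names are placeholders, and you should check -help firthlogit- for whether your version accepts factor-variable notation:

Code:
. ssc install firthlogit
. firthlogit y c.new_predictor##c.new_predictor other_covariates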

                            Comment


                            • #15
Originally posted by Richard Williams
                              I doubt this will work. But you can try another program that will do logit, e.g. glm or fracreg. You could also try using probit instead.

                              Also I don't think it is necessarily horrible to truncate or drop the most extreme values, especially if it appears the squared term should be in there. You of course want to explain what you did. I'd rather see you attempt a reasonable explanation with caveats than just throw up your hands.
Thanks, that is useful advice. I understand that scaling continuous variables has an impact on convergence, but I do not think I understand very well why that is. Would you be able to expand on that?

                              Comment
