
  • Convergence not achieved - I've identified the culprit, now what do I do?

    Hello all,

    I run a fixed-effects logistic regression of self-rated health on local employment change over three waves, as follows (standard errors are clustered on baseline geographical location):

    Code:
    . clogit binary_health_y psum_unemployed_total_cont_y i.yrlycurrent_county_y1 i.year age_y i.maritalstatus_
    > y if has_y0_questionnaire==1 & has_y5_questionnaire==1 | has_y0_questionnaire==1 & has_y10_questionnaire=
    > =1 | has_y0_questionnaire==1 & has_y5_questionnaire==1 & has_y10_questionnaire==1 | has_y0_questionnaire=
    > =1 & cbmi_y5 !=. & has_y5_questionnaire==0 | has_y0_questionnaire==1 & cbmi_y10 !=. & has_y10_questionnai
    > re==0 | has_y0_questionnaire==1 & cbmi_y5 !=. & has_y5_questionnaire==0 & cbmi_y10 !=. & has_y10_question
    > naire==0 | has_y0_questionnaire==1 & cbmi_y5 !=. & has_y5_questionnaire==1 | has_y0_questionnaire==1 & cb
    > mi_y10 !=. & has_y10_questionnaire==1 | has_y0_questionnaire==1 & cbmi_y5 !=. & has_y5_questionnaire==1 &
    >  cbmi_y10 !=. & has_y10_questionnaire==1, group(id) cluster (current_county_y1) robust iterate(500) nolog
    note: multiple positive outcomes within groups encountered.
    note: 447 groups (1,057 obs) dropped because of all positive or
          all negative outcomes.
    note: 3.yrlycurrent_county_y1 omitted because of no within-group variance.
    note: 6.yrlycurrent_county_y1 omitted because of no within-group variance.
    note: 7.yrlycurrent_county_y1 omitted because of no within-group variance.
    note: 13.yrlycurrent_county_y1 omitted because of no within-group variance.
    note: 15.yrlycurrent_county_y1 omitted because of no within-group variance.
    note: 16.yrlycurrent_county_y1 omitted because of no within-group variance.
    note: 17.yrlycurrent_county_y1 omitted because of no within-group variance.
    note: 18.yrlycurrent_county_y1 omitted because of no within-group variance.
    note: 19.yrlycurrent_county_y1 omitted because of no within-group variance.
    note: 20.yrlycurrent_county_y1 omitted because of no within-group variance.
    note: 23.yrlycurrent_county_y1 omitted because of no within-group variance.
    note: 24.yrlycurrent_county_y1 omitted because of no within-group variance.
    note: 25.yrlycurrent_county_y1 omitted because of no within-group variance.
    note: 26.yrlycurrent_county_y1 omitted because of no within-group variance.
    note: 28.yrlycurrent_county_y1 omitted because of no within-group variance.
    note: 29.yrlycurrent_county_y1 omitted because of no within-group variance.
    note: 32.yrlycurrent_county_y1 omitted because of no within-group variance.
    convergence not achieved
    
    Conditional (fixed-effects) logistic regression
    
                                                    Number of obs     =        524
                                                    Wald chi2(16)     =          .
                                                    Prob > chi2       =          .
    Log pseudolikelihood = -178.37962               Pseudo R2         =     0.0592
    
                                         (Std. Err. adjusted for 20 clusters in current_county_y1)
    ----------------------------------------------------------------------------------------------
                                 |               Robust
                 binary_health_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -----------------------------+----------------------------------------------------------------
    psum_unemployed_total_cont_y |  -.0720918   .0388134    -1.86   0.063    -.1481648    .0039811
                                 |
           yrlycurrent_county_y1 |
                          Cavan  |          0  (empty)
                          Clare  |   3.508484   1.437552     2.44   0.015     .6909335    6.326035
                           Cork  |  -701.8535          .        .       .            .           .
                        Donegal  |          0  (empty)
                      Dublin 16  |          0  (empty)
                    Dublin City  |   5.209175   1.927489     2.70   0.007     1.431367    8.986984
             Dún Laoghaire-Rathdown  |   5.798566   1.803271     3.22   0.001     2.264218    9.332913
                         Fingal  |    5.31021   2.008511     2.64   0.008     1.373601    9.246819
                         Galway  |   .7730753   1.945584     0.40   0.691    -3.040199     4.58635
                    Galway City  |   .7289299          .        .       .            .           .
                          Kerry  |          0  (empty)
                        Kildare  |   .5227628   1.379929     0.38   0.705    -2.181848    3.227374
                       Kilkenny  |          0  (omitted)
                          Laois  |          0  (omitted)
                        Leitrim  |          0  (empty)
                       Limerick  |          0  (empty)
                       Longford  |          0  (empty)
                          Louth  |          0  (empty)
                           Mayo  |  -37.39307          .        .       .            .           .
                          Meath  |   5.840481   1.762588     3.31   0.001     2.385872    9.295089
                       Monaghan  |          0  (empty)
                         Offaly  |          0  (omitted)
                      Roscommon  |          0  (omitted)
                          Sligo  |          0  (empty)
                   South Dublin  |   5.426697   1.908334     2.84   0.004     1.686432    9.166962
                      Tipperary  |          0  (empty)
                Tipperary North  |          0  (empty)
                      Waterford  |  -559.1875          .        .       .            .           .
                      Westmeath  |   3.329881   1.503996     2.21   0.027      .382103    6.277658
                        Wexford  |          0  (omitted)
                        Wicklow  |  -97.28333          .        .       .            .           .
                                 |
                            year |
                              5  |  -.4421024   .2411563    -1.83   0.067    -.9147601    .0305553
                             10  |   .2906156          .        .       .            .           .
                                 |
                           age_y |   .0180895    .030657     0.59   0.555    -.0419972    .0781762
                                 |
                 maritalstatus_y |
                     Cohabiting  |   .2093029   .1423515     1.47   0.141     -.069701    .4883068
                      Separated  |  -.5001542   1.496891    -0.33   0.738    -3.434007    2.433699
                       Divorced  |  -1.252438   .6516711    -1.92   0.055    -2.529689    .0248143
                        Widowed  |   .5818359   1.804183     0.32   0.747    -2.954298    4.117969
           Single/Never married  |   -.025371   .4241849    -0.06   0.952    -.8567581    .8060161
    ----------------------------------------------------------------------------------------------
    Warning: convergence not achieved
    
    . margins, dydx(psum_unemployed_total_cont_y) post
    
    Average marginal effects                        Number of obs     =        524
    Model VCE    : Robust
    
    Expression   : Pr(binary_health_y|fixed effect is 0), predict(pu0)
    dy/dx w.r.t. : psum_unemployed_total_cont_y
    
    ----------------------------------------------------------------------------------------------
                                 |            Delta-method
                                 |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -----------------------------+----------------------------------------------------------------
    psum_unemployed_total_cont_y |  -.0083921   .0045424    -1.85   0.065    -.0172951    .0005108
    ----------------------------------------------------------------------------------------------

    However, convergence is not achieved. After reviewing the Statalist archives, I removed variables one by one until I identified the two culprits, respondents' age and geographical location, which are no longer included below:


    Code:
    . clogit binary_health_y psum_unemployed_total_cont_y i.year i.maritalstatus_y if has_y0_questionnaire==1 &
    >  has_y5_questionnaire==1 | has_y0_questionnaire==1 & has_y10_questionnaire==1 | has_y0_questionnaire==1 &
    >  has_y5_questionnaire==1 & has_y10_questionnaire==1 | has_y0_questionnaire==1 & cbmi_y5 !=. & has_y5_ques
    > tionnaire==0 | has_y0_questionnaire==1 & cbmi_y10 !=. & has_y10_questionnaire==0 | has_y0_questionnaire==
    > 1 & cbmi_y5 !=. & has_y5_questionnaire==0 & cbmi_y10 !=. & has_y10_questionnaire==0 | has_y0_questionnair
    > e==1 & cbmi_y5 !=. & has_y5_questionnaire==1 | has_y0_questionnaire==1 & cbmi_y10 !=. & has_y10_questionn
    > aire==1 | has_y0_questionnaire==1 & cbmi_y5 !=. & has_y5_questionnaire==1 & cbmi_y10 !=. & has_y10_questi
    > onnaire==1, group(id) cluster (current_county_y1) robust nolog
    note: multiple positive outcomes within groups encountered.
    note: 469 groups (1,106 obs) dropped because of all positive or
          all negative outcomes.
    
    Conditional (fixed-effects) logistic regression
    
                                                    Number of obs     =        547
                                                    Wald chi2(8)      =      86.50
                                                    Prob > chi2       =     0.0000
    Log pseudolikelihood = -194.59969               Pseudo R2         =     0.0172
    
                                         (Std. Err. adjusted for 20 clusters in current_county_y1)
    ----------------------------------------------------------------------------------------------
                                 |               Robust
                 binary_health_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -----------------------------+----------------------------------------------------------------
    psum_unemployed_total_cont_y |  -.0708697   .0309798    -2.29   0.022    -.1315889   -.0101504
                                 |
                            year |
                              5  |  -.3027965   .1272918    -2.38   0.017    -.5522839    -.053309
                             10  |   .4799054   .2673404     1.80   0.073    -.0440723    1.003883
                                 |
                 maritalstatus_y |
                     Cohabiting  |   .3540121   .2454215     1.44   0.149    -.1270052    .8350295
                      Separated  |  -.5968626   1.514624    -0.39   0.694    -3.565471    2.371746
                       Divorced  |  -1.213241   .6442842    -1.88   0.060    -2.476015    .0495326
                        Widowed  |  -.0610695   1.433941    -0.04   0.966    -2.871542    2.749404
           Single/Never married  |    .080062   .3159233     0.25   0.800    -.5391362    .6992603
    ----------------------------------------------------------------------------------------------
    
    . margins, dydx(psum_unemployed_total_cont_y) post
    
    Average marginal effects                        Number of obs     =        547
    Model VCE    : Robust
    
    Expression   : Pr(binary_health_y|fixed effect is 0), predict(pu0)
    dy/dx w.r.t. : psum_unemployed_total_cont_y
    
    ----------------------------------------------------------------------------------------------
                                 |            Delta-method
                                 |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -----------------------------+----------------------------------------------------------------
    psum_unemployed_total_cont_y |  -.0157168   .0055284    -2.84   0.004    -.0265523   -.0048812
    ----------------------------------------------------------------------------------------------
    
    . estimates store logitmod
    
    . estimates table logitmod, star stats(N r2 r2_a)
    
    ------------------------------
        Variable |   logitmod     
    -------------+----------------
    psum_unemp~y | -.01571676**   
    -------------+----------------
               N |        547     
              r2 |                
            r2_a |                
    ------------------------------
    legend: * p<0.05; ** p<0.01; *** p<0.001
    Some information on the troublemakers:

    Code:
    . des yrlycurrent_county_y
    
                  storage   display    value
    variable name   type    format     label      variable label
    -----------------------------------------------------------------------------------------------------------
    yrlycurrent_c~y str23   %23s                  
    
    . sum yrlycurrent_county_y
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
    yrlycurren~y |          0
    Code:
    . sum age_y
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
           age_y |      3,123    35.06398    7.199812       15.1       53.7
    
    . des age_y
    
                  storage   display    value
    variable name   type    format     label      variable label
    -----------------------------------------------------------------------------------------------------------
    age_y           float   %9.0g                 
    
    .
    Having identified the troublemakers, what can I do to fix this? I tried changing the data types and rounding the variables, as below, but that didn't change anything. Similarly, banding age into categories was not useful, as it reduced my number of observations to 176.

    I would like to keep these variables in my analysis so what can I do?

    Code:
    gen int rage_y = round(age_y)
    
    gen float fyrlycurrent_county_y1 = yrlycurrent_county_y1
    
    
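    * Round age, then band it. Bands must be exhaustive and non-overlapping,
    * otherwise unbanded ages stay missing and drop out of the estimation sample.
    replace age_y = round(age_y, 1)
    gen age_bands_y = .
    replace age_bands_y = 1 if age_y >= 15 & age_y < 25
    replace age_bands_y = 2 if age_y >= 25 & age_y < 35
    replace age_bands_y = 3 if age_y >= 35 & age_y < 45
    replace age_bands_y = 4 if age_y >= 45 & age_y <= 55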

    Thanks for any and all advice,


    John

  • #2
    I think this note is the most informative piece:
    "note: 447 groups (1,057 obs) dropped because of all positive or all negative outcomes"

    With regard to this topic, as well as collinearity (also an issue in the presented model), you may wish to look at example 4 of -clogit- in the Stata manual.
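
    For readers who want to see why those groups drop out, here is the standard conditional-logit likelihood contribution of group i, written as a sketch. The symbols (T_i waves, k_i positive outcomes, covariates x_it) are notation introduced here, not taken from the thread.

```latex
\Pr\!\left(y_{i1},\dots,y_{iT_i} \,\middle|\, \sum_{t=1}^{T_i} y_{it} = k_i\right)
  = \frac{\exp\!\left(\sum_{t=1}^{T_i} y_{it}\, x_{it}'\beta\right)}
         {\sum_{d \in S_i} \exp\!\left(\sum_{t=1}^{T_i} d_t\, x_{it}'\beta\right)},
\qquad
S_i = \Bigl\{ d \in \{0,1\}^{T_i} : \sum_{t=1}^{T_i} d_t = k_i \Bigr\}.
```

    If k_i = 0 or k_i = T_i, the set S_i contains only the observed sequence, so the ratio equals 1 for every value of beta and the group adds nothing to the log likelihood. By the same algebra, a covariate that takes the same value z_i in every wave contributes the factor exp(k_i z_i gamma) to the numerator and to every term of the denominator, so it cancels and its coefficient gamma is not identified from that group.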
    Last edited by Marcos Almeida; 12 Nov 2019, 13:38.
    Best regards,

    Marcos



    • #3
      Hi Marcos,

      Thank you for pointing me to the example on union membership. Is the idea that age is collinear with year? How does this extend to geographic location, though? I understand that about 40% of my observations are dropped, but many respondents' health status does not change over this period. Is this a problem?

      Very best,

      Jonathan



      • #4
        Yes, in the case you shared, the category is "omitted because of no within-group variance".
        Best regards,

        Marcos



        • #5
          I'm sorry, Marcos, I must admit I'm lost. I understand that my model is not converging, and that this is due to age and geographic location. I think you are saying that age and year are collinear. If so, will I have to remove age from the regression, or is there any way to keep it in? Why is location causing a problem, and is there any way to fix this? Likewise, how large a problem are my dropped observations?



          • #6
            Coming back to this, I think I have identified the problem Carlos is indicating:

            Age is collinear with year, which stops the model from converging. I think I also suffer from small numbers for regional location, which likewise stops my logistic regression from converging.

            Leaving aside age to focus on location: I include location fixed effects because I feel that these regional indicators account for time-invariant, region-specific effects that could affect health. However, migration between regions is low in my dataset (around 5%), so when I run a logistic regression I hit the non-convergence issue above.

            This leaves me in a quandary...

            My paper is under revise and resubmit with a journal that has asked why I chose a linear probability model (LPM) over a logit model, and whether the logit results would match my LPM.

            I had never considered the logit route: my sample size is quite small and I felt a logit would be too punishing on the observations, so my core analysis uses an LPM. Now I don't know what to do. I could re-estimate without county fixed effects to get convergence, but that is a different model and the results differ slightly. Alternatively, I could report my logit with county fixed effects included, but it has not converged.

            I thought I could use this incompatibility to support my LPM, i.e. argue that a logistic regression does not allow me to account for county-specific effects which may influence health and is therefore not suited to my analysis. But I really do not know what would be best here and could use your advice,

            All the very best,

            John



            • #7
              John:
              you probably meant Marcos.
              The evidence that age and year move in the same direction (and are hence collinear) is not surprising at all. I would kick -age- or -year- out of the set of my predictors.
              As far as location is concerned: have you already checked what happens to convergence when you reorganize location into broader geographical categories (North, South...)?
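
              Both suggestions can be sketched on a small made-up panel so the commands run on their own. All values, the variable `region`, and the county-to-region mapping are hypothetical; with the real data, the county codes and a sensible regional grouping would have to come from the value labels.

```stata
* Toy panel: 4 people observed at years 0, 5 and 10 (made-up values)
clear
input id year age_y county
1  0 30.2 1
1  5 35.1 1
1 10 40.3 2
2  0 25.0 2
2  5 30.2 2
2 10 34.9 2
3  0 41.5 3
3  5 46.4 3
3 10 51.6 4
4  0 33.3 4
4  5 38.3 4
4 10 43.2 4
end

* 1) How much of the within-person variation in age do the year dummies absorb?
*    A within R-squared near 1 means age adds almost nothing once i.year is in.
xtset id
xtreg age_y i.year, fe
display "within R-squared of age on year dummies: " e(r2_w)

* 2) Collapse counties into broader regions (hypothetical mapping)
recode county (1 2 = 1 "East") (3 4 = 2 "West"), generate(region)
tabulate region
```

              If the within R-squared is close to 1 in the real data, dropping either -age- or -year- loses little information. The regrouped variable can then replace the county dummies in the -clogit- call as `i.region`, although a region still needs people who move between regions to be identified.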
              Kind regards,
              Carlo
              (Stata 16.0 SE)



              • #8
                Just an additional note: please beware this model has, well, several "issues". 1) Collinearity; 2) Perfect prediction; 3) Lack of within-group variance.

                I don't want to be pessimistic, but there is (probably) no "wild card" solution for all these problems. You may need to tackle them one by one.
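
                As a starting point for tackling them one by one, the following sketch flags which groups and counties carry any identifying variation. It uses a made-up toy panel, and the variable names `outcome_varies` and `mover` are illustrative, not from the original data.

```stata
* Toy panel with made-up values:
* id 1 changes county, id 2 never changes outcome, id 3 never changes county
clear
input id year binary_health county
1  0 0 1
1  5 1 1
1 10 1 2
2  0 1 2
2  5 1 2
2 10 1 2
3  0 0 3
3  5 1 3
3 10 0 3
end

* Groups whose outcome never varies are the ones clogit drops
bysort id: egen byte y_min = min(binary_health)
bysort id: egen byte y_max = max(binary_health)
generate byte outcome_varies = y_min != y_max

* County dummies are identified only by people who change county
bysort id: egen county_min = min(county)
bysort id: egen county_max = max(county)
generate byte mover = county_min != county_max

* Cross-tabulate counties against movers within the usable groups:
* a county with no movers has no within-group variance, and a county
* with very few movers is a candidate for perfect prediction
tabulate county mover if outcome_varies
```

                Counties that show up only in the non-mover column would be omitted or labelled empty by -clogit-, and counties with a handful of movers are plausible sources of the very large coefficients with missing standard errors seen in the first model.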
                Best regards,

                Marcos



                • #9
                  OK, I'll work on these issues in order. Thanks so much for your advice,

                  John
