
  • Question about Interaction term & collinearity in Logistic regression.

    Hi. I have a question about interaction terms in logistic regression.
    I want to identify the relationship between the blood pressure components (systolic blood pressure, SBP; diastolic blood pressure, DBP) and the outcome.
    So I set up the dependent variable, independent variables, and models as below.

    <Dependent variable>
    outcome (0: negative, 1: positive)

    <Independent variable>
    (main predictor)
    gr_sbp5 (5 level categorical variable, ref=1)
    gr_dbp5 (5 level categorical variable, ref=1)
    Because SBP showed a non-linear (U-shaped) relationship with the outcome, I converted both SBP and DBP into categorical variables.

    (covariate)
    gr_bmi (5 level categorical variable, ref=1)
    uob, dz_cvd, dz_dm (binary categorical variable, ref=0)
    wbc, hb, glu10, chol10, gfr10, u_ph (continuous variable)

    <Multivariate model>
    model1 : base model + SBP
    model2 : base model + DBP
    model3 : base model + SBP + DBP
    model4 : base model + SBP + DBP + Interaction term(SBP*DBP)



    Code:
    . logistic outcome i.gr_bmi uob dz_cvd dz_dm wbc hb glu10 chol10 gfr10 u_ph i.gr_sbp5
    
    Logistic regression                             Number of obs     =    307,996
                                                    LR chi2(17)       =    2818.92
                                                    Prob > chi2       =     0.0000
    Log likelihood =  -19670.63                     Pseudo R2         =     0.0669
    
    --------------------------------------------------------------------------------
           outcome | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
            gr_bmi |
        18.5-22.9  |   .2274956   .0089222   -37.75   0.000     .2106636    .2456724
          23-24.9  |   .1281581   .0080337   -32.77   0.000     .1133413     .144912
          25-29.9  |   .1386081   .0080557   -34.00   0.000     .1236853    .1553315
              >30  |   .2342154   .0169318   -20.08   0.000     .2032735    .2698672
                   |
               uob |   4.106947   .2508892    23.12   0.000     3.643511     4.62933
            dz_cvd |   1.542221    .310151     2.15   0.031     1.039835    2.287331
             dz_dm |   11.24978   1.953465    13.94   0.000     8.004562    15.81069
               wbc |   1.038053   .0095519     4.06   0.000     1.019499    1.056944
                hb |   1.063652   .0198393     3.31   0.001     1.025469    1.103255
             glu10 |   .9808376   .0096815    -1.96   0.050     .9620446    .9999978
            chol10 |    1.02705   .0061042     4.49   0.000     1.015155    1.039084
             gfr10 |   .8396113   .0108823   -13.49   0.000      .818551    .8612135
              u_ph |   .8203162   .0242938    -6.69   0.000     .7740568    .8693401
                   |
           gr_sbp5 |
     2nd(115-123)  |    .920193   .0441731    -1.73   0.083     .8375634    1.010974
    3rd (123-130)  |   .9555914   .0479349    -0.91   0.365     .8661115    1.054316
    4th (130-136)  |   .8897213   .0479283    -2.17   0.030     .8005726    .9887973
       5th (>136)  |   .9576887   .0531013    -0.78   0.436     .8590679    1.067631
                   |
             _cons |   .2932171   .1202545    -2.99   0.003     .1312483    .6550658
    --------------------------------------------------------------------------------
    
    . testparm i.gr_sbp5
    -(omitted)-
               chi2(  4) =    5.84
             Prob > chi2 =    0.2117
    
    .
    . logistic outcome i.gr_bmi uob dz_cvd dz_dm wbc hb glu10 chol10 gfr10 u_ph i.gr_dbp5
    
    Logistic regression                             Number of obs     =    307,996
                                                    LR chi2(17)       =    2875.08
                                                    Prob > chi2       =     0.0000
    Log likelihood = -19642.551                     Pseudo R2         =     0.0682
    
    ------------------------------------------------------------------------------
         outcome | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          gr_bmi |
      18.5-22.9  |   .2246587   .0087555   -38.31   0.000     .2081373    .2424915
        23-24.9  |   .1232036   .0076423   -33.76   0.000     .1090997    .1391308
        25-29.9  |   .1283074   .0072706   -36.24   0.000     .1148201    .1433791
            >30  |   .2004536   .0140431   -22.94   0.000     .1747357    .2299567
                 |
             uob |   4.106937   .2509981    23.11   0.000     3.643311     4.62956
          dz_cvd |   1.543388   .3104973     2.16   0.031     1.040474    2.289386
           dz_dm |   11.95432   2.058213    14.41   0.000     8.530423    16.75247
             wbc |    1.03261   .0095297     3.48   0.001     1.014101    1.051458
              hb |   1.042993   .0194805     2.25   0.024     1.005503    1.081882
           glu10 |    .973856   .0098721    -2.61   0.009      .954698    .9933984
          chol10 |   1.023836   .0060904     3.96   0.000     1.011968    1.035843
           gfr10 |     .83826   .0108643   -13.61   0.000     .8172345    .8598264
            u_ph |   .8177969   .0242136    -6.79   0.000     .7716899    .8666587
                 |
         gr_dbp5 |
    2nd (67-72)  |    1.08831   .0574541     1.60   0.109     .9813323     1.20695
    3rd (72-77)  |   1.191713   .0636803     3.28   0.001     1.073216    1.323295
    4th (77-82)  |   1.122692   .0625467     2.08   0.038     1.006559    1.252225
      5th (>82)  |   1.493472   .0821815     7.29   0.000     1.340781    1.663552
                 |
           _cons |   .4072293   .1672724    -2.19   0.029     .1820547    .9109117
    ------------------------------------------------------------------------------
    
    . testparm i.gr_dbp5
    -(omitted)-
               chi2(  4) =   63.82
             Prob > chi2 =    0.0000
    
    .
    . logistic outcome i.gr_bmi uob dz_cvd dz_dm wbc hb glu10 chol10 gfr10 u_ph i.gr_sbp5 i.gr_dbp5
    
    Logistic regression                             Number of obs     =    307,996
                                                    LR chi2(21)       =    2926.78
                                                    Prob > chi2       =     0.0000
    Log likelihood = -19616.696                     Pseudo R2         =     0.0694
    
    --------------------------------------------------------------------------------
           outcome | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
            gr_bmi |
        18.5-22.9  |   .2326459   .0091386   -37.12   0.000     .2154066    .2512648
          23-24.9  |   .1316977   .0082627   -32.31   0.000     .1164593    .1489302
          25-29.9  |    .140246   .0081583   -33.77   0.000     .1251339    .1571832
              >30  |   .2230054   .0162042   -20.65   0.000     .1934037    .2571378
                   |
               uob |   4.109485   .2512613    23.12   0.000     3.645385     4.63267
            dz_cvd |     1.5523   .3124149     2.18   0.029     1.046317    2.302968
             dz_dm |   11.64676   2.016673    14.18   0.000     8.295009    16.35284
               wbc |   1.033418   .0095374     3.56   0.000     1.014893    1.052281
                hb |   1.045843   .0195443     2.40   0.016      1.00823    1.084859
             glu10 |   .9761361   .0098164    -2.40   0.016     .9570847    .9955667
            chol10 |   1.024662   .0060943     4.10   0.000     1.012786    1.036676
             gfr10 |   .8390002   .0108785   -13.54   0.000     .8179475    .8605948
              u_ph |   .8232253   .0243883    -6.57   0.000     .7767865    .8724405
                   |
           gr_sbp5 |
     2nd(115-123)  |   .8176541   .0417093    -3.95   0.000      .739859    .9036291
    3rd (123-130)  |     .77717    .043948    -4.46   0.000     .6956354    .8682611
    4th (130-136)  |   .6656185   .0421356    -6.43   0.000     .5879518    .7535446
       5th (>136)  |   .6369883   .0439094    -6.54   0.000     .5564879    .7291336
                   |
           gr_dbp5 |
      2nd (67-72)  |   1.179404   .0642708     3.03   0.002      1.05993    1.312346
      3rd (72-77)  |   1.411342   .0839404     5.79   0.000     1.256049    1.585835
      4th (77-82)  |   1.398934   .0897421     5.23   0.000     1.233651    1.586361
        5th (>82)  |   1.981014   .1352353    10.01   0.000     1.732924     2.26462
                   |
             _cons |    .373708   .1536299    -2.39   0.017     .1669578    .8364846
    --------------------------------------------------------------------------------
    
    .
    . logistic outcome i.gr_bmi uob dz_cvd dz_dm wbc hb glu10 chol10 gfr10 u_ph i.gr_sbp5##i.gr_dbp5
    
    Logistic regression                             Number of obs     =    307,996
                                                    LR chi2(37)       =    2941.57
                                                    Prob > chi2       =     0.0000
    Log likelihood = -19609.303                     Pseudo R2         =     0.0698
    
    --------------------------------------------------------------------------------------------
                       outcome | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------------------+----------------------------------------------------------------
                        gr_bmi |
                    18.5-22.9  |   .2325644   .0091394   -37.12   0.000      .215324    .2511851
                      23-24.9  |   .1316118   .0082597   -32.31   0.000     .1163792    .1488382
                      25-29.9  |    .140364   .0081682   -33.74   0.000     .1252338    .1573221
                          >30  |   .2224026    .016214   -20.62   0.000     .1927898    .2565639
                               |
                           uob |   4.106376   .2511678    23.09   0.000     3.642459    4.629379
                        dz_cvd |   1.556121   .3132275     2.20   0.028     1.048835    2.308764
                         dz_dm |    11.6862   2.024792    14.19   0.000     8.321299    16.41177
                           wbc |   1.033174   .0095377     3.54   0.000     1.014649    1.052038
                            hb |   1.045368   .0195416     2.37   0.018      1.00776    1.084379
                         glu10 |   .9760597   .0098105    -2.41   0.016     .9570196    .9954787
                        chol10 |   1.024647   .0060968     4.09   0.000     1.012767    1.036666
                         gfr10 |   .8390364   .0108798   -13.53   0.000     .8179811    .8606337
                          u_ph |   .8235468   .0244002    -6.55   0.000     .7770854    .8727861
                               |
                       gr_sbp5 |
                 2nd(115-123)  |   .8260038   .0802381    -1.97   0.049     .6828043    .9992355
                3rd (123-130)  |     .85602   .1312613    -1.01   0.311     .6338135    1.156129
                4th (130-136)  |   .9636551   .2558051    -0.14   0.889     .5727534    1.621346
                   5th (>136)  |   .6518757   .3799761    -0.73   0.463     .2079708    2.043277
                               |
                       gr_dbp5 |
                  2nd (67-72)  |   1.157018   .0853214     1.98   0.048     1.001315    1.336934
                  3rd (72-77)  |   1.525007   .1551635     4.15   0.000     1.249296    1.861565
                  4th (77-82)  |   1.778231   .2553033     4.01   0.000     1.342084    2.356115
                    5th (>82)  |    2.00592   .4725639     2.95   0.003     1.264106    3.183051
                               |
               gr_sbp5#gr_dbp5 |
     2nd(115-123)#2nd (67-72)  |   1.033055   .1328648     0.25   0.800     .8028741    1.329228
     2nd(115-123)#3rd (72-77)  |   .9158564   .1363203    -0.59   0.555     .6841174    1.226095
     2nd(115-123)#4th (77-82)  |   .7906024   .1473929    -1.26   0.208     .5486149    1.139327
       2nd(115-123)#5th (>82)  |   1.041386   .2948189     0.14   0.886     .5979081    1.813798
    3rd (123-130)#2nd (67-72)  |   .9068627   .1664451    -0.53   0.594     .6328671    1.299483
    3rd (123-130)#3rd (72-77)  |   .8529421   .1616351    -0.84   0.401      .588321    1.236587
    3rd (123-130)#4th (77-82)  |   .7350782   .1601667    -1.41   0.158      .479584    1.126685
      3rd (123-130)#5th (>82)  |   .9090021   .2643235    -0.33   0.743     .5141015    1.607241
    4th (130-136)#2nd (67-72)  |    .885609   .2623133    -0.41   0.682     .4955872    1.582574
    4th (130-136)#3rd (72-77)  |   .6956298   .2030264    -1.24   0.214     .3925967    1.232565
    4th (130-136)#4th (77-82)  |   .5237542   .1607292    -2.11   0.035     .2870195    .9557482
      4th (130-136)#5th (>82)  |   .6361281   .2281251    -1.26   0.207     .3149858     1.28469
       5th (>136)#2nd (67-72)  |   1.055758   .6622867     0.09   0.931     .3087432    3.610198
       5th (>136)#3rd (72-77)  |   .7535714   .4546771    -0.47   0.639     .2309622    2.458714
       5th (>136)#4th (77-82)  |   .7626797   .4613183    -0.45   0.654     .2330666    2.495768
         5th (>136)#5th (>82)  |   1.012943   .6364825     0.02   0.984     .2956191    3.470862
                               |
                         _cons |   .3704338   .1525055    -2.41   0.016      .165301    .8301289
    --------------------------------------------------------------------------------------------
    
    . testparm i.gr_sbp5
    -(omitted)-
               chi2(  4) =    4.81
             Prob > chi2 =    0.3073
    
    
    . testparm i.gr_sbp5#i.gr_dbp5
    -(omitted)-
               chi2( 16) =   15.15
             Prob > chi2 =    0.5140
    As you can see in the results above:
    In model 1, SBP was not significant except for the 4th (130-136) group, and the overall effect of SBP was also non-significant (testparm result).
    In model 2, DBP was significant except for the 2nd (67-72) group.
    In model 3, SBP became significant in all groups when it was entered together with DBP.
    In model 4, SBP was again not significant except for the 2nd (115-123) group, and the overall effect of SBP was also non-significant (testparm result).

    I suspected collinearity between SBP and DBP,
    so I checked the collinearity among the independent variables.


    Code:
    . collin gr_bmi uob dz_cvd dz_dm wbc hb glu10 chol10 gfr10 u_ph gr_dbp5 gr_sbp5
    (obs=307,996)
    
      Collinearity Diagnostics
    
                            SQRT                   R-
      Variable      VIF     VIF    Tolerance    Squared
    ----------------------------------------------------
        gr_bmi      1.29    1.14    0.7729      0.2271
           uob      1.00    1.00    0.9956      0.0044
        dz_cvd      1.00    1.00    0.9996      0.0004
         dz_dm      1.26    1.12    0.7938      0.2062
           wbc      1.07    1.03    0.9352      0.0648
            hb      1.08    1.04    0.9242      0.0758
         glu10      1.31    1.14    0.7646      0.2354
        chol10      1.13    1.06    0.8886      0.1114
         gfr10      1.03    1.01    0.9714      0.0286
          u_ph      1.01    1.01    0.9863      0.0137
       gr_dbp5      1.89    1.37    0.5303      0.4697
       gr_sbp5      2.00    1.41    0.5002      0.4998
    ----------------------------------------------------
      Mean VIF      1.26
    
                               Cond
            Eigenval          Index
    ---------------------------------
        1     9.2823          1.0000
        2     1.0005          3.0460
        3     0.9954          3.0538
        4     0.9757          3.0844
        5     0.4311          4.6403
        6     0.1192          8.8239
        7     0.0938          9.9464
        8     0.0481         13.8984
        9     0.0240         19.6677
        10     0.0142         25.5434
        11     0.0090         32.1367
        12     0.0057         40.4496
        13     0.0011         92.5860
    ---------------------------------
     Condition Number        92.5860
     Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
     Det(correlation matrix)    0.2703
    gr_sbp5 and gr_dbp5 showed the highest VIFs among the variables (2.00 and 1.89), but both were below 10.
    However, the condition number was 92.59, larger than the cutoff of 30.

    To deal with the collinearity, I tried the following.
    When I added the interaction term between SBP and DBP, both the VIF and the condition index increased.
    When I removed gr_sbp5, the VIF of gr_dbp5 decreased to 1.17, but the condition index was still 88.68.

    Now, my questions:

    1) How should I treat the collinearity indicated by the condition index? Can I ignore the condition index because the VIFs were below 10?
    I wonder why the condition index is so high...
    To reduce the condition index to 30 or less in my data, I would have to keep only about four variables.

    2) In model 4, the interaction term was not significant, but it changed the coefficients (odds ratios) and p-values of SBP.
    Which model should I select and report: 1 & 2, 3, or 4?

    Do I have to use the margins command? If so, at what level should I fix each blood pressure group?
    Last edited by Heebyung Koh; 10 Aug 2019, 11:03.

  • #2
    Please somebody help me......



    • #3
      You didn't get a quick answer because your post is way too long and complex. Also, look at the FAQ on asking questions.

      While there are guidelines for collinearity, they are just suggestions. The issue is that collinearity makes it hard to differentiate the effects of different variables, but with 300,000 observations you have lots of power. [Read Goldberger's econometrics text on collinearity.] You often get high levels of reported collinearity with interactions.
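
      On the condition index specifically: the collin output says it is computed from the scaled raw sscp matrix with the intercept, so variables whose means are large relative to their spread can inflate it even when the VIFs are small. A minimal sketch for checking this, assuming the user-written collin command is installed and using hypothetical names (the _c suffix) for mean-centered copies of the continuous covariates:

      ```stata
      * Sketch only: the *_c variables are hypothetical centered copies,
      * not part of the original dataset.
      foreach v of varlist wbc hb glu10 chol10 gfr10 u_ph {
          quietly summarize `v', meanonly
          generate double `v'_c = `v' - r(mean)
      }

      * Re-run the diagnostics on the centered versions
      collin gr_bmi uob dz_cvd dz_dm wbc_c hb_c glu10_c chol10_c gfr10_c u_ph_c gr_dbp5 gr_sbp5
      ```

      If the condition number drops sharply after centering, the high value probably reflected scaling relative to the constant more than overlap between SBP and DBP. Note also that collin treats gr_sbp5 and gr_dbp5 as numeric scores here, not as the indicator variables used in the logistic models.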

      However, it looks like most of your interactions don't make much difference. I suspect you can't reject the H that the interactions all have the same value on the main variable.
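
      One way to check this formally is a likelihood-ratio test of model 3 against model 4. From the log likelihoods already posted, the statistic works out to 2 × (19616.696 − 19609.303) = 14.79 on 37 − 21 = 16 degrees of freedom, below the 5% critical value of about 26.30 and in line with the testparm result of 15.15. A sketch of how it could be run, where m3 and m4 are just hypothetical names for the stored estimates:

      ```stata
      * Model 3: main effects of SBP and DBP only
      quietly logistic outcome i.gr_bmi uob dz_cvd dz_dm wbc hb glu10 chol10 gfr10 u_ph i.gr_sbp5 i.gr_dbp5
      estimates store m3

      * Model 4: main effects plus all SBP-by-DBP interaction terms
      quietly logistic outcome i.gr_bmi uob dz_cvd dz_dm wbc hb glu10 chol10 gfr10 u_ph i.gr_sbp5##i.gr_dbp5
      estimates store m4

      * Joint test of the 16 interaction terms (m3 is nested in m4)
      lrtest m3 m4

      * Same p-value by hand from the posted log likelihoods
      display chi2tail(16, 2*(19616.696 - 19609.303))
      ```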

      With non-linear models like logit, you will do best to use margins to look at the change in predicted probability for changes in the x's. You can look at the influence across the values of the interacting variables.
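
      A rough sketch of what that could look like after the interaction model, using only the variable names from the original post. It is meant as a starting point, not the only way to set it up, and with this many observations the margins calls may take a while:

      ```stata
      * Refit the interaction model (model 4)
      quietly logistic outcome i.gr_bmi uob dz_cvd dz_dm wbc hb glu10 chol10 gfr10 u_ph i.gr_sbp5##i.gr_dbp5

      * Average predicted probability for every SBP-by-DBP combination,
      * with the other covariates left at their observed values
      margins gr_sbp5#gr_dbp5
      marginsplot

      * Difference in predicted probability for each SBP group versus the
      * SBP reference group, evaluated separately at each DBP level
      margins gr_dbp5, dydx(gr_sbp5)
      ```

      Because the first margins call averages over the observed covariates, there is no need to fix the blood pressure groups at one chosen level; each combination gets its own predicted probability.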





      • #4
        Thank you very much for the answer.

        Do you mean that because there are about 300,000 observations, I don't have to worry about multicollinearity?

        You said "I suspect you can't reject the H that the interactions all have the same value on the main variable."
        But adding the interaction term makes the main SBP variable non-significant in model 4.

        So I can't decide whether to include the interaction term in the model or not.

        Best.

