
  • Interpretation of coefficients in logistic regression

    I have 5 possible independent categorical variables in a logistic regression. When m, the dependent variable, is regressed on each variable separately, each one is significantly related to m. However, when I include them all in the regression together, none of the variables is significant. If they were continuous variables I would look for problems with collinearity, but I am not sure how to proceed, or how to interpret the results, with categorical variables.
    I would be grateful for any pointers.

    Thank you.
    Eddy
    Code:
    . logistic m i.a
    . testparm i.a
               chi2(  3) =   18.01
             Prob > chi2 =    0.0004
    
    . logistic m i.b
    . testparm i.b
             chi2(  2) =    8.96
             Prob > chi2 =    0.0113
    . logistic m i.c
    . testparm i.c
               chi2(  3) =   22.19
             Prob > chi2 =    0.0001
    . logistic m i.d
    . testparm i.d
               chi2(  3) =   17.31
             Prob > chi2 =    0.0006
    . logistic m i.e
    
    
    . logistic m i.a i.b i.c i.d i.e
    note: 3.d != 0 predicts failure perfectly;
          3.d omitted and 3 obs not used.
    . testparm i.a
               chi2(  3) =    2.60
             Prob > chi2 =    0.4573
    . testparm i.b
               chi2(  2) =    3.31
             Prob > chi2 =    0.1914
    . testparm i.c
               chi2(  3) =    5.09
             Prob > chi2 =    0.1654
    . testparm i.d
               chi2(  3) =    2.61
             Prob > chi2 =    0.4554
    . testparm i.e
               chi2(  2) =    0.11
             Prob > chi2 =    0.9454
    Code:
    . dataex

    ----------------------- copy starting from the next line -----------------------
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(id rater m sex a b c d e f)
     1 1 0 2 1 1 2 1 1 0
     2 1 1 2 4 1 3 5 1 0
     3 1 1 2 2 3 3 4 1 1
     4 1 1 2 3 3 3 5 1 0
     5 1 1 2 2 1 2 4 1 0
     6 1 1 2 4 3 3 5 1 0
     7 1 0 2 1 1 2 1 2 0
     8 1 0 2 1 1 2 2 2 0
     9 1 0 2 1 2 3 3 1 1
    10 1 1 2 4 2 4 5 1 0
    11 1 1 2 3 3 3 4 1 1
    12 1 1 2 4 1 3 5 1 0
    13 1 0 2 3 2 2 2 1 0
    14 1 1 2 2 2 3 4 1 0
    15 1 1 2 4 1 3 5 1 0
    16 1 1 2 4 2 4 5 1 0
    17 1 1 2 4 1 3 5 3 0
    18 1 0 2 1 1 2 2 2 0
    19 1 0 2 2 1 2 2 2 0
    20 1 1 2 4 2 3 5 3 0
    21 1 1 1 4 3 2 5 2 0
    22 1 0 2 1 1 2 2 1 0
    23 1 0 2 1 1 2 2 1 0
    24 1 1 2 4 2 3 4 1 0
    25 1 1 1 4 1 4 4 1 1
    26 1 0 1 1 1 2 1 1 0
    27 1 0 1 1 1 1 2 1 0
    28 1 1 1 1 1 1 2 3 0
    29 1 1 2 3 2 3 4 3 0
    30 1 1 1 4 2 4 5 1 0
    31 1 1 1 4 2 4 5 1 0
    32 1 0 1 1 1 2 1 1 0
    33 1 0 2 1 1 1 1 3 0
    34 1 1 2 1 1 1 2 1 0
    35 1 0 2 1 1 2 1 3 0
    36 1 0 2 1 1 2 1 3 0
    37 1 1 2 3 2 2 1 1 0
    38 1 1 2 4 2 4 5 1 0
    39 1 0 1 1 1 1 1 1 0
    40 1 1 2 4 1 4 5 1 1
    41 1 0 2 3 2 2 3 1 0
    42 1 0 2 3 1 2 4 1 0
    43 1 1 1 4 2 4 5 1 0
    44 1 1 2 2 3 3 5 1 0
    45 1 1 1 4 2 3 4 1 0
    46 1 0 2 4 2 2 3 3 0
    47 1 1 2 3 1 3 5 1 0
    48 1 1 1 4 2 3 5 1 0
    49 1 0 2 1 1 2 1 1 0
    50 1 1 1 4 2 4 4 1 0
    51 1 1 1 4 2 3 5 1 0
    52 1 1 1 4 3 3 5 1 0
    53 1 0 1 1 3 2 4 1 0
    54 1 0 2 2 1 2 4 3 0
    55 1 1 2 4 2 4 5 1 1
    56 1 0 1 2 2 3 2 1 0
    57 1 0 2 2 1 3 4 1 0
    58 1 1 1 1 3 3 2 1 0
    59 1 0 2 2 1 3 1 3 1
    60 1 1 1 4 2 4 5 1 0
    61 1 1 2 3 1 3 4 1 0
    62 1 0 2 1 1 1 2 1 0
    63 1 0 2 4 2 4 5 1 0
    64 1 0 1 4 2 3 4 1 0
    65 1 1 1 2 3 2 4 1 0
    66 1 1 1 4 1 3 5 1 0
    67 1 1 1 3 2 3 4 1 0
    68 1 1 2 4 2 3 4 2 0
    69 1 1 2 4 3 4 5 3 1
    70 1 1 1 1 3 3 4 1 0
    71 1 1 2 2 3 3 1 1 0
    72 1 1 1 4 1 3 5 3 0
    73 1 1 1 1 1 1 2 1 0
    74 1 1 2 1 1 3 1 1 1
    75 1 1 2 4 1 4 5 1 0
    76 1 1 2 2 2 2 2 1 0
    77 1 0 2 4 2 3 4 1 0
    end
    ------------------ copy up to and including the previous line ------------------


  • #2
    Code:
    . logistic m i.(a-f)
    note: 3.d != 0 predicts failure perfectly;
          3.d omitted and 3 obs not used.
    
    
    Logistic regression                                     Number of obs =     74
                                                            LR chi2(14)   =  46.85
                                                            Prob > chi2   = 0.0000
    Log likelihood = -24.548482                             Pseudo R2     = 0.4883
    
    ------------------------------------------------------------------------------
               m | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
               a |
              2  |   4.068051   6.044492     0.94   0.345     .2211226    74.84102
              3  |   15.02915   27.47834     1.48   0.138     .4174959    541.0245
              4  |    3.44991   7.452382     0.57   0.566     .0500115    237.9827
                 |
               b |
              2  |   2.656785   3.286937     0.79   0.430      .235105    30.02278
              3  |   26.13275   43.07203     1.98   0.048     1.033317    660.9013
                 |
               c |
              2  |   .0321004   .0600293    -1.84   0.066     .0008217    1.254003
              3  |   .1206235   .2463161    -1.04   0.300     .0022042    6.600944
              4  |   .0666723   .1828007    -0.99   0.323     .0003091    14.37981
                 |
               d |
              2  |   2.296501   2.909765     0.66   0.512     .1916718    27.51536
              3  |          1  (empty)
              4  |   4.390681   6.503782     1.00   0.318     .2408092    80.05542
              5  |   79.95313   192.0209     1.82   0.068     .7219904    8853.999
                 |
               e |
              2  |   1.277315   2.251591     0.14   0.890     .0403496    40.43489
              3  |   .5525804   .6354498    -0.52   0.606     .0580151    5.263201
                 |
             1.f |   15.31741   32.37936     1.29   0.197     .2431289    965.0148
           _cons |   .4770873   .6240354    -0.57   0.572     .0367466    6.194107
    ------------------------------------------------------------------------------
    Note: _cons estimates baseline odds.
    This output shows that your independent variables a through f are jointly quite strongly associated with m. And, in fact, some of the odds ratios are really huge--huge enough to make me suspicious. If you are interested in significance, you can also see from the omnibus chi-square test that the joint significance of the variables is quite strong. But significance is not the most important part of this story.
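    If you want that joint test spelled out explicitly, a Wald test of all the predictor terms after the full model would look something like this (a sketch; it assumes the -logistic- fit above is still the active estimation results):
    Code:
    * joint Wald test of all predictor coefficients after the full model
    testparm i.a i.b i.c i.d i.e i.f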

    Seriously, some of those odds ratios really do look too high to be true. When working with discrete variables, odds ratios > 4 (or < 0.25) are uncommon in real life, and odds ratios > 10 (or < 0.10) are truly rare. Notice also how very wide the confidence intervals around them are. I think what we are seeing here is overfitting of the data: you have 14 predictor degrees of freedom and only 74 observations--that is far too many variables for that few observations. With 74 observations, you really shouldn't be using more than two or, at a stretch, three predictor variables. (Some people would even say you only have enough data for one, but I'm not quite that stringent.)
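    If you want to see where that limit comes from, count the outcome events and non-events; the smaller of the two counts is what constrains the model. (The usual rule of thumb of roughly 10 to 15 events per predictor degree of freedom is an assumption I am adding here, not anything computed in this thread.)
    Code:
    * count events and non-events; the smaller count limits how many
    * predictor degrees of freedom the model can realistically support
    tab m
    count if m == 1
    count if m == 0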

    Concerning your question about collinearity: with nominal-level categorical variables there is nothing simple that is strictly analogous to a correlation coefficient. But you can see that these variables are very much related to each other by running:
    Code:
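    * pairwise chi-squared tests of association among all the candidate predictors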
    unab ivs: a-f
    local vcount: word count `ivs'
    forvalues i = 1/`vcount' {
        forvalues j = `=`i'+1'/`vcount' {
            local v: word `i' of `ivs'
            local w: word `j' of `ivs'
            tab `v' `w', chi2
        }
    }
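    If you would also like something closer to an effect-size measure for those pairwise associations, -tabulate-'s V option reports Cramér's V alongside the chi-squared test. For one pair it looks like this (a sketch; you could equally add V to the tab command inside the loop above):
    Code:
    * chi-squared test plus Cramér's V, a rough analogue of a
    * correlation coefficient for a pair of nominal variables
    tab a b, chi2 V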
    But that's neither here nor there. The overfitting is the big problem here. Even if all of your predictors came out "significant" in the joint model, the model itself is just not appropriate for this scanty data set and its results are not credible.



    • #3
      You've only got 77 cases, 3 of which drop out of your full model. With that few cases, it is very hard to justify more than one or two variables in the model.

      My guess is that there is some collinearity among your variables. Try to find where it is. Maybe some variables measure very similar things. Maybe combine some categories of an independent variable. For example, if you run frequencies on your variables, you will see that category 3 of d has only 3 cases, which isn't much.
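      In Stata that might look something like this (a sketch only; which categories to merge, and the name d2, are illustrative assumptions rather than a recommendation based on your data):
      Code:
      * one-way frequencies for all the candidate predictors
      tab1 a b c d e f

      * illustrative recode: fold the sparse category 3 of d into category 2,
      * creating a new variable d2 so the original d is left untouched
      recode d (3 = 2), generate(d2)
      tab d2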

      In short, with so few cases, I don't think you are going to have much luck with a model that has 13 independent variables. Even if all 13 make theoretical sense, you don't have enough statistical power to detect all their effects. Some will likely either have to be dropped completely or else you may have to combine some categories.
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 19.5 MP (2 processor)

      EMAIL: [email protected]
      WWW: https://www3.nd.edu/~rwilliam



      • #4
        Thank you both for your helpful and incisive comments. I will tell the surgeon who was hoping to use this as an aid to treatment that it is not realistic and to treat a similar paper with scepticism.
        Thank you.
        Eddy.
