  • Multicollinearity in binary logistic regression

    Dear Statalist Forum,

    I'm running a binary logistic regression (the independent variables are dichotomous and continuous) and want to test the independent variables for multicollinearity. Given that I cannot use VIF, I have read that the collin command is useful for logistic regression. When I type collin followed by all the independent variables, I get very low VIFs (maximum 2.45). Is this sufficient to show that multicollinearity is very low in my model?

    Thank you very much in advance!
    Maria

  • #2
    I think even people who believe in looking at VIF would agree that 2.45 is sufficiently low.

    That said, VIF is a waste of time. In fact, worrying about multicollinearity is almost always a waste of time. It is the most overrated "problem" in statistics, in my opinion. There are basically two different situations with multicollinearity:

    1. There is some multicollinearity among variables that have been included, not because they are of interest in their own right, but because you want to adjust for their effects. Crucially, the key variables you are concerned about are not involved. In this case, it doesn't matter how collinear those variables are. Including them will adequately adjust for their effects, regardless of the collinearity, and the collinearity does not in any way adversely affect your estimates for the uninvolved variables that you actually care about. So this kind of collinearity is completely irrelevant.

    2. There is multicollinearity that does involve one or more of the variables you are actually interested in. This may, indeed, be a problem. But if it is a problem, it is one that, for practical purposes, has no solution. To tell whether it is a problem, all you have to do is look at the standard errors (or, equivalently, the 95% CIs) of the coefficient estimate(s) for the variable(s) of interest. If the standard error is small enough (or the CI is narrow enough) that you have a sufficiently precise estimate of the effects of your key variables for the purposes at hand, then there is no problem. After all, the only thing collinearity does is inflate the standard errors of the involved variables: it makes their effect estimates less precise. But if your results are precise enough for your purposes, then there is nothing more to say.

    On the other hand, if you are left with a gaping confidence interval (large standard error) and your estimate(s) of your key effect(s) are so imprecise that they are not useful, then you have a problem. Unfortunately, there is no practical way to solve that problem. You cannot simply omit the variables that are collinear, because you will likely be left with severe omitted variable bias. You could solve the problem with a larger sample, but in this circumstance the required sample size is typically much, much larger than the sample you have--and presumably if it were easy/affordable to get more data, you would have done so in the first place. So usually enlarging the sample is not feasible. The other approach is to scrap everything and start over with a new study design that breaks the collinearity among these variables--typically one involving matching or stratified sampling. But that is an entirely new study.

    So, bottom line: forget about VIF. Just look at the standard errors (confidence intervals) for the key variables in your study. If you have adequate precision, you're fine, end of story. If you don't, you're sunk, end of story.
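
    For instance, here is a minimal sketch of what that looks like in practice, using purely hypothetical variable names (outcome y, key predictor xkey, adjustment variables c1-c3):
    Code:
    * hypothetical variables: y (outcome), xkey (key predictor), c1-c3 (adjustments)
    logit y xkey c1 c2 c3
    * replay in the odds-ratio metric, which is often easier to judge for precision
    logit, or
    The standard error and 95% CI reported for xkey are all you need to look at; any collinearity among c1-c3 is beside the point.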
    Last edited by Clyde Schechter; 22 Jun 2017, 09:23.



    • #3
      Maria: I agree 100% with Clyde, whose arguments are compelling. If you are interested in additional reading on this topic, see this piece on Art Goldberger and his ideas on multicollinearity and "micronumerosity."

      http://davegiles.blogspot.com/2011/0...umerosity.html



      • #4
        Paul Allison has a good blog entry on this. But like Clyde, I would be even less concerned than Allison is:

        https://statisticalhorizons.com/multicollinearity

        Some more thoughts are at

        http://www3.nd.edu/~rwilliam/stats2/l11.pdf
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        Stata Version: 17.0 MP (2 processor)

        EMAIL: [email protected]
        WWW: https://www3.nd.edu/~rwilliam



        • #5
          Thank you so much! That was all I was looking for! I just have one question left: How exactly should I look at the standard errors? What exactly do you mean by "adequate precision"? Is there an exact value for interpretation?
          Regards, Maria



          • #6
            The exact value for interpretation depends on your research goals. Here's how I would look at it. You are running these analyses for some reason. You want to estimate some effect(s), and somebody might take certain actions based on the results. (It might be some immediate action, or it might be something as remote as planning a different study in the future, or something in between.) The results of your study are there to guide those actions. Examine the confidence intervals and ask yourself: would it make any practical difference in the real world if the true value were at the lower end of the confidence interval rather than the upper end? Would anybody do anything differently? If not, then you have adequate precision. If people might act differently depending on where in the interval the truth lies, then precision is insufficient.

            Now, sometimes we do analyses for purely theoretical reasons and we are basically just curious about the magnitude of some effect(s), with no actions contingent on them. In that case, any degree of precision is acceptable: you just report the result with your confidence interval and say, in effect, "this is what we know, with this degree of uncertainty, based on this study."
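
            To make that concrete with purely hypothetical numbers: if the 95% CI for the odds ratio on your key variable runs from 1.05 to 1.15, the conclusion is the same at either end of the interval (a modest increase in the odds), so the precision is adequate. If instead it runs from 0.60 to 2.50, the data are consistent with the variable being protective, irrelevant, or harmful--anyone acting on the result might act differently depending on where in that interval the truth lies, so the precision is insufficient.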



            • #7
              Dear Statalist users,
              I am regressing a binary variable on a set of continuous variables using a logit model. I realised that 2 of my main independent variables are correlated (0.5 correlation). When one is used alone, it has the expected sign. However, when I add the other variable, the sign on the first one changes.
              I do not want to drop any of my variables.
              Is there a simple way to solve this?
              Thank you



              • #8
                What is the command for checking multicollinearity in binary logistic regression?



                • #9
                  Hello Kensley Ndovi. You could use Phil Ender's collin package.
                  Code:
                  net describe collin, from(https://stats.oarc.ucla.edu/stat/stata/ado/analysis)
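                  * -net describe- only displays the package description; to install it, run:
                  * net install collin, from(https://stats.oarc.ucla.edu/stat/stata/ado/analysis)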
                  But be careful to use only the estimation sample. E.g.,
                  Code:
                  logit foreign weight mpg price
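                  * e(sample) flags the observations used by the preceding -logit-,
                  * so -collin- runs on the same estimation sample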
                  collin weight mpg price if e(sample)
                  Output from -collin-:
                  Code:
                  collin weight mpg price if e(sample)
                  (obs=74)
                  
                    Collinearity Diagnostics
                  
                                          SQRT                   R-
                    Variable      VIF     VIF    Tolerance    Squared
                  ----------------------------------------------------
                      weight      3.17    1.78    0.3155      0.6845
                         mpg      2.88    1.70    0.3469      0.6531
                       price      1.42    1.19    0.7066      0.2934
                  ----------------------------------------------------
                    Mean VIF      2.49
                  
                                             Cond
                          Eigenval          Index
                  ---------------------------------
                      1     3.7380          1.0000
                      2     0.1988          4.3362
                      3     0.0589          7.9693
                      4     0.0043         29.4270
                  ---------------------------------
                   Condition Number        29.4270
                   Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
                   Det(correlation matrix)    0.2462
                  --
                  Bruce Weaver
                  Email: [email protected]
                  Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
                  Version: Stata/MP 18.0 (Windows)



                  • #10
                    Kensley: Why are you checking for multicollinearity? Are the estimates you care about too imprecise to be useful?

