  • multicollinearity

    Hello,

    When I run my regression, two variables are omitted because of multicollinearity. I am now wondering what the threshold is; in other words, when does Stata decide that a variable should be dropped? Is it when the VIF reaches a certain value? And for the variables that are still included in the model, is the multicollinearity considered unproblematic, so that additional tests are unnecessary?

    Kind regards,
    Timea De Wispelaere

  • #2
    I believe, though I may be wrong or out of date on this, that variables are dropped for collinearity when Stata determines that the rank of the X'X matrix is less than full. There is some degree of numerical imprecision in that assessment, but for practical purposes you can think of this as absolute (exact) collinearity, not simply some highly correlated variables.
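
    To make the rank idea concrete, here is a minimal Python sketch (not Stata's actual routine). The data, the helper names `matrix_rank` and `cross_product`, and the tolerance `1e-10` are all illustrative assumptions. It shows that X'X loses rank only when one column is an exact linear combination of the others, not when columns are merely highly correlated.

```python
def cross_product(X):
    """Return X'X for a design matrix X given as a list of rows."""
    k = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(k)]
            for i in range(k)]


def matrix_rank(rows, tol=1e-10):
    """Numerical rank via Gaussian elimination with partial pivoting.

    The tolerance is an arbitrary illustrative choice, not Stata's rule.
    """
    m = [[float(v) for v in r] for r in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # no usable pivot: this column adds no rank
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


# Hypothetical data: a constant, x1, and two candidate versions of x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2_exact = [2.0 * v + 1.0 for v in x1]   # exact function of constant and x1
x2_near = [3.0, 5.0, 7.1, 9.0, 11.0]     # highly correlated, but not exact

X_exact = [[1.0, a, b] for a, b in zip(x1, x2_exact)]
X_near = [[1.0, a, b] for a, b in zip(x1, x2_near)]

rank_exact = matrix_rank(cross_product(X_exact))  # rank deficient
rank_near = matrix_rank(cross_product(X_near))    # full rank

print(rank_exact, rank_near)
```

    In the first design a variable would have to be dropped because the three columns carry only two columns' worth of information. In the second design nothing is dropped, even though x1 and x2 are almost perfectly correlated.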

    So the fact that variables are retained does not imply that there is no problem with "multicollinearity." That said, I will take this opportunity to again proclaim that multicollinearity is one of the most overrated problems in statistics. Arthur Goldberger's econometrics textbook (see https://www.hup.harvard.edu/catalog....ontent=reviews) does a wonderful job of demolishing the whole concept. And I think anybody who has read it will agree that tests like VIF are a waste of time.

    The only problem that multicollinearity causes is imprecision in the estimates of the coefficients of the variables that are involved in the multicollinearity. Most of the time, those variables are not really of interest anyway but are just included to adjust for ("control") their effects on the outcome. Multicollinearity does not impair that adjustment. So unless one (or more) of the multicollinear variables is a variable that is the focus of your study, you can ignore multicollinearity altogether. If you do have a focal variable involved in multicollinearity, then the only thing you need to do is look at the standard error of its coefficient and the confidence interval. If the precision is sufficient for your purposes, then there is nothing more to say. If the precision is poor and you cannot from a practical perspective answer your research question as a result, then you have a problem. But it is not a problem you can really solve without the difficult, often altogether infeasible, resort to getting a (much) larger data set or starting over with a different design (such as matched pairs) that breaks the multicollinearity.
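
    The precision check described above can be sketched in a few lines of Python. Everything here is an illustrative assumption: the coefficient and standard-error values are made up, the helper names `confidence_interval` and `precise_enough` are hypothetical, and the interval uses a normal approximation rather than the t distribution that a regression table would normally report.

```python
from statistics import NormalDist


def confidence_interval(coef, se, level=0.95):
    """Normal-approximation confidence interval for a coefficient."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return coef - z * se, coef + z * se


def precise_enough(coef, se, max_half_width, level=0.95):
    """True if the interval's half-width is within the precision you need.

    max_half_width is something you choose from your research question;
    it is not a statistical constant.
    """
    lo, hi = confidence_interval(coef, se, level)
    return (hi - lo) / 2.0 <= max_half_width


# Hypothetical focal coefficient estimated with two different standard errors.
ok_tight = precise_enough(coef=0.40, se=0.05, max_half_width=0.15)
ok_wide = precise_enough(coef=0.40, se=0.20, max_half_width=0.15)

print(confidence_interval(0.40, 0.05), ok_tight)
print(confidence_interval(0.40, 0.20), ok_wide)
```

    The point of the sketch is that the decision rests on the standard error and the precision your question requires, whatever the source of the imprecision.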


    • #3
      Thank you for the very clear answer, Clyde! This helped me a lot.
