  • General Question

    Dear all,

    I would kindly like to ask: can I keep dummy variables in my regression even if they have a VIF higher than 10?

    Thank you very much!

    Q&A User

  • #2
    Yes. When you have indicator ("dummy") variables for a discrete variable, it is quite common for them to be strongly correlated with each other. For example, if you have a variable with 3 levels and the "base" category is relatively uncommon, then the indicators for the other two levels will add up to 1 in most of the observations, so they will be nearly collinear. What you need to determine is not whether you have multicollinearity, but whether you have a multicollinearity problem. Most multicollinearity is innocuous.
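
A minimal sketch of that situation in Python with NumPy (not something from the thread; the 1% share for the base category, the sample size, and the seed are arbitrary assumptions). The hypothetical helper `vif` computes each variance inflation factor as 1/(1 - R²) from an auxiliary regression of one column on the others plus a constant. With a base category this rare, both indicators come out with VIFs well above 10 even though nothing is wrong with them.

```python
import numpy as np

def vif(X):
    """VIF of each column of X (n x k, no constant column), via auxiliary OLS regressions."""
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])          # other regressors plus a constant
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
n = 20_000
# Assumed three-level variable: level 0 is the rare base category (1% of observations)
level = rng.choice(3, size=n, p=[0.01, 0.495, 0.495])
d1 = (level == 1).astype(float)   # indicator for level 1
d2 = (level == 2).astype(float)   # indicator for level 2; d1 + d2 == 1 except in the base level
vifs = vif(np.column_stack([d1, d2]))
print(vifs)
```

With only two indicators the two VIFs are identical, because each auxiliary regression has the same R²: the squared correlation between the indicators.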

    The important issue is the role this discrete variable plays in the model. If it is included simply as a covariate ("control variable"), and the variables whose effects must be estimated to answer the research questions do not participate in the near-collinearity, then it doesn't matter how collinear these indicators are with each other. True, they will degrade the precision of the estimates of their own coefficients, but they will not affect the estimation of the key variables of interest. So this kind of multicollinearity should be ignored.
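
To illustrate the point about control variables, here is a rough simulation sketch in Python/NumPy under assumed values (the outcome coefficients, a key variable `x` drawn independently of the three-level control, n = 20,000, the seed). The hypothetical helpers `ols_se` and `simulate` fit the same model twice, once with a rare base category and once with three equally common levels, and compare standard errors.

```python
import numpy as np

def ols_se(y, X):
    """Standard errors of an OLS fit with an intercept (intercept first)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = resid @ resid / (n - A.shape[1])            # residual variance estimate
    return np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))

def simulate(p_base, n=20_000, seed=1):
    """Return (SE of key slope, SE of first indicator) for a given base-category share."""
    rng = np.random.default_rng(seed)
    p_other = (1.0 - p_base) / 2
    level = rng.choice(3, size=n, p=[p_base, p_other, p_other])  # control variable
    d1 = (level == 1).astype(float)
    d2 = (level == 2).astype(float)
    x = rng.normal(size=n)                                        # key variable, unrelated to level
    y = 1.0 + 2.0 * x + 0.5 * d1 + 1.0 * d2 + rng.normal(size=n)
    se = ols_se(y, np.column_stack([x, d1, d2]))
    return se[1], se[2]

se_x_rare, se_d1_rare = simulate(p_base=0.01)      # indicators nearly collinear
se_x_common, se_d1_common = simulate(p_base=1/3)   # indicators only mildly correlated
print(f"SE of key slope:    rare base {se_x_rare:.4f}, common base {se_x_common:.4f}")
print(f"SE of indicator d1: rare base {se_d1_rare:.4f}, common base {se_d1_common:.4f}")
```

In runs like this, the standard error of the key slope should be about the same in both designs, while the standard error of the indicator's own coefficient is several times larger when the base category is rare.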

    One circumstance in which it could be a problem is if the discrete variable is itself a key variable in the model, one whose effect is to be estimated as a goal of the research. Another is if the discrete variable is not a key variable but the near-collinearity that VIF is detecting also involves a key variable. In either case, the estimate of the key variable's coefficient will be noisy and could be unstable. The way to know whether there is a real problem is to look at the confidence interval around the key variable's coefficient. If that interval is so wide that you cannot answer your research question(s) from the estimate, then you have a multicollinearity problem; otherwise you don't.
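
A sketch of the confidence-interval check, again in Python/NumPy with simulated data whose settings (coefficients, the 0.1 noise scale, n = 500, the seed) are assumptions for illustration. The hypothetical helper `ci_for_first_slope` uses the normal critical value 1.96 instead of the exact t quantile, a close approximation at this sample size. It compares the interval for a key variable `x` when a control is nearly collinear with it versus when the control is unrelated.

```python
import numpy as np

def ci_for_first_slope(y, X, crit=1.96):
    """Approximate 95% CI for the first non-intercept coefficient of an OLS fit."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = resid @ resid / (n - A.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(A.T @ A)[1, 1])
    return coef[1] - crit * se, coef[1] + crit * se

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)                      # key variable
z_near = x + 0.1 * rng.normal(size=n)       # control nearly collinear with x
z_indep = rng.normal(size=n)                # control unrelated to x

y_near = 1.0 + 0.5 * x + 0.5 * z_near + rng.normal(size=n)
y_indep = 1.0 + 0.5 * x + 0.5 * z_indep + rng.normal(size=n)

lo_n, hi_n = ci_for_first_slope(y_near, np.column_stack([x, z_near]))
lo_i, hi_i = ci_for_first_slope(y_indep, np.column_stack([x, z_indep]))
print(f"x with collinear control:   [{lo_n:.2f}, {hi_n:.2f}]")
print(f"x with independent control: [{lo_i:.2f}, {hi_i:.2f}]")
```

Only you can say whether an interval like the first one is too wide to answer the research question; the code just makes the comparison visible.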

    The bad news is that in the (unlikely) event that you actually do have a multicollinearity problem, there isn't much you can do about it. In particular, removing the offending variable "solves" that problem at the expense of introducing omitted-variable bias, which is, I think, at best just as bad, and often worse. The only real solutions are to get a much, much larger data set, or to scrap the project and start over with a different data-gathering design that breaks the near-collinearity involving the key variable.
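
A small simulation in Python/NumPy (all values are assumptions) of what dropping the offending variable does. Here `z` is a genuine cause of `y` that is strongly correlated with the key variable `x`. Keeping `z` gives a noisier but unbiased estimate of the true effect 0.5; dropping it shrinks the standard error but shifts the estimate toward 0.5 + 0.5 * Cov(x, z) / Var(x), which is about 1.0 with these settings. The helper `ols` is hypothetical.

```python
import numpy as np

def ols(y, X):
    """OLS with an intercept; returns (slopes, standard errors), intercept excluded."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = resid @ resid / (n - A.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return coef[1:], se[1:]

rng = np.random.default_rng(3)
n = 2_000
x = rng.normal(size=n)                      # key variable
z = x + 0.3 * rng.normal(size=n)            # confounder, strongly correlated with x (VIF about 12)
y = 1.0 + 0.5 * x + 0.5 * z + rng.normal(size=n)   # true effect of x is 0.5

b_full, se_full = ols(y, np.column_stack([x, z]))  # keeps z: unbiased but noisier
b_short, se_short = ols(y, x[:, None])             # drops z: precise but biased
print(f"with z:    {b_full[0]:.3f} (se {se_full[0]:.3f})")
print(f"without z: {b_short[0]:.3f} (se {se_short[0]:.3f})")
```

The short regression looks "better" by its standard error, but it is precisely estimating the wrong quantity.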

    Notice that I have barely mentioned VIF so far. VIF really has no role in determining whether you have a multicollinearity problem. Computing VIFs is a vastly overused procedure. The only real use for it is when you run your regression and find unsatisfactory results due to unexpectedly wide confidence intervals around the key coefficient(s). One possible cause is a multicollinearity involving the key variable(s). In that situation, VIF can help you find out whether the key variable participates in a multicollinear relationship. But the important thing is not whether any variables have a high VIF; you must first check whether the key variable(s) have a high VIF. If they do not, then multicollinearity is not a problem for you. If they do, then seeing which other variables have high VIFs can help you identify the variables that are damaging your estimation of the key coefficient(s) through a near-collinearity.
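
For reference, the textbook OLS formula (assuming homoskedastic errors) shows how VIF enters the precision of a single coefficient. Here R_j² is the R² from regressing regressor x_j on all the other regressors (including a constant), and σ² is the error variance.

```latex
\[
\operatorname{Var}(\hat\beta_j)
  = \frac{\sigma^2}{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2} \cdot \frac{1}{1 - R_j^2},
\qquad
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\]
```

Because the standard error scales with the square root of VIF_j, a VIF of 10 makes the key coefficient's confidence interval about √10 ≈ 3.2 times as wide as it would be with uncorrelated regressors, holding σ² and the spread of x_j fixed. Only the key variable's own R_j² appears: high VIFs among the indicators of a control variable do not enter this formula unless the key variable is part of the near-collinearity.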

    I imagine that a great deal of productive work could have been accomplished with all the time and effort the world has needlessly spent fretting about high VIFs. I highly recommend that you get hold of a copy of Arthur Goldberger's econometrics textbook. It contains a highly informative and highly entertaining chapter about multicollinearity, which he refers to by a much more appropriate name: micronumerosity. He thoroughly takes apart the entire concept of multicollinearity there. His points are essentially what I have outlined here, but written with great humor and clarity.

    Added: In the future, please give your threads informative titles. These threads are not just dialogs between a questioner and a responder. Other people come to search for answers to questions here, believing (often correctly) that their question has been asked and answered previously. When a thread's title does not reasonably describe what is being discussed, those others will either fail to find an answer that should be available to them, or will waste their time reading something that is irrelevant to their needs.
    Last edited by Clyde Schechter; 30 Dec 2023, 11:28.

    • #3
      Thank you so much, Clyde!! I will definitely get hold of Arthur Goldberger's econometrics textbook!!
