  • cut off for correlation matrix for variables in regression

    One way to check that a regression model is correctly specified is to look at the correlation matrix for the variables in the regression, in Stata using vce, corr. What is the correlation cutoff between two variables that indicates likely misspecification, such that one of the variables should be dropped?

  • #2
    There is no such cutoff, and, in my opinion, you should not do this anyway. Searching for multicolinearity, whether through a correlation matrix or VIFs, etc. accomplishes nothing useful.

    There are two different kinds of predictor variables in a regression model. There are those that are the focus of your research question: you want to estimate their effects on the outcome with reasonable precision. The other kind are those which are included because their effects must be controlled for to reduce confounding (omitted variable) bias; they are of no interest in their own right and are just nuisance variables.

    If all of the variables in a near colinear relationship are of the second type, then it doesn't matter that they are nearly colinear. That colinearity does not in any way affect your ability to properly estimate the effects of the uninvolved variables that you are actually interested in. It does impair your ability to estimate the effects of these nuisance variables--but you have no need to do that anyway. The colinearity does not impair the ability of the regression model to adjust for the confounding effects and reduce bias. So this type of colinearity can be ignored. It only affects things you don't need to estimate anyway.
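    A small simulation can illustrate this point. What follows is only a sketch in plain Python (not Stata), and everything in it is an assumption made up for illustration: the data-generating process, the sample size, the variable names x, z1 and z2, and the helper functions invert and ols. Here x is the variable of interest, and z1 and z2 are two nearly colinear nuisance variables that confound the effect of x.

```python
import random

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    aug = [list(m[i]) + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]

def ols(rows, y):
    """Ordinary least squares with an intercept, via the normal equations.
    Returns (coefficients, standard errors), intercept first."""
    X = [[1.0] + list(r) for r in rows]
    n, k = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y[i] - sum(X[i][a] * beta[a] for a in range(k)) for i in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual variance
    se = [(s2 * inv[a][a]) ** 0.5 for a in range(k)]
    return beta, se

# Hypothetical data: z2 is almost a copy of z1; x depends on z1 (confounding).
random.seed(12345)
n = 500
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [a + 0.01 * random.gauss(0, 1) for a in z1]
x = [0.5 * a + random.gauss(0, 1) for a in z1]
# True effect of x is 2; z1 and z2 each have a true coefficient of 1.
y = [1 + 2 * xi + a + b + random.gauss(0, 1) for xi, a, b in zip(x, z1, z2)]

beta, se = ols(list(zip(x, z1, z2)), y)
for name, b, s in zip(["_cons", "x", "z1", "z2"], beta, se):
    print(f"{name:>6}  coef = {b:8.3f}   se = {s:7.3f}")
```

    In a run of this kind the standard errors on z1 and z2 should come out very large, while the standard error on x stays small and its coefficient lands near the true value of 2. The individual nuisance coefficients are poorly determined, but their sum is not, which is why the adjustment for confounding still works.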

    If one or more of the variables whose effects you do need to estimate precisely is involved in a near colinear relationship, then you may have a problem, even a serious one. But there is nothing you can do about it if this is the case. All that is needed is to know that you have the problem so that you do not rely on unreliable estimates.

    How can you know if you have a colinearity problem? It is quite simple. If the colinearity is close to a perfectly linear relationship, so close that the regression coefficients cannot be calculated due to a singular matrix, Stata will give you a warning message when you run the regression and it will drop one of the involved variables. If the colinearity is not severe enough that Stata is forced to do that, you will recognize the problem because the standard error(s) of the coefficient(s) of the involved variable(s) will be very large, and the confidence intervals correspondingly will be very wide. So if you get an estimated regression coefficient with an unreasonably large standard error, that is a sign of a colinear relationship involving that variable.

    There is, however, nothing you can do to fix the problem. There are transformations of those variables that can make the colinearity go away, but then you are no longer estimating the effect of the variable you said you were interested in--you are estimating something else that may or may not be useful. You can drop one of the colinear variables: but you don't want to drop a variable that is the focus of your research, and if you drop one of the nuisance variables, you may be reintroducing confounding bias into your estimates--so that's not good either.

    The only solutions that leave you with valid results require getting a different data set altogether. One approach is to get a much larger data set--but it typically has to be a great deal larger, so that this approach may be impractical. Another approach is to scrap your existing study and gather new data with a different data collection design that breaks the colinear relationship, for example by the use of matching.
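    To give a rough sense of how much larger "a great deal larger" can be: in ordinary least squares, the sampling variance of a coefficient is proportional to 1 / (n * (1 - R2)), where R2 is the R-squared from regressing that variable on the other predictors. So, other things being equal, the sample size has to grow by about 1 / (1 - R2) to get back the precision you would have had without the colinearity. The Python sketch below just does that arithmetic. The function name sample_size_multiplier is made up for illustration, and the calculation assumes the error variance and the spread of the variable of interest stay the same in the larger sample.

```python
def sample_size_multiplier(r_squared):
    """Approximate factor by which n must grow to offset colinearity.

    r_squared is the R-squared from regressing the variable of interest
    on the other predictors. Assumes the error variance and the variance
    of the variable of interest are unchanged in the larger sample.
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)

# With a single other predictor, R-squared is just the squared correlation.
for r in (0.5, 0.9, 0.99, 0.999):
    factor = sample_size_multiplier(r * r)
    print(f"correlation {r}: about {factor:.1f} times the sample size")
```

    With a correlation of 0.99 between the variable of interest and one other predictor, the factor is about 50, and at 0.999 it is about 500, which is why simply collecting more data is often not a practical remedy.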

    So my advice is to skip the testing for colinearity and just look at your regression results. If the confidence intervals on the variables you are interested in are narrow enough to be useful, then you are in good shape regardless of whether some variables are involved in colinearity or not. If they are not, you cannot accomplish your original research goals with your existing data and you have to accept that as a limitation of the current study.



