Dear listers,
I am working on a paper on economics (industrial organization) where the empirical strategy relies on a probit.
My model is
P(y=1 | X) = Φ(β1.x1 + β2.x2 + β3.x1*x2 + β4.x3 + β5.x1*x3 + β6.x4 + β7.x1*x4 + ψ + u)
For information:
x1 is binary
x2, x3 and x4 are continuous.
x2 is approximately normally distributed because it was log-transformed, whereas x3 and x4 are left-skewed.
I was worried about potential collinearity arising from the (relatively) strong correlation between some of the variables, for example between x2 and one of the controls included in ψ (let's call it c1).
I was then unsure whether I should run a collinearity test or just ignore the issue, since collinearity would arise anyway because of the three interaction terms.
Code:
             |        y       x1       x2       c1       x3       x4
-------------+------------------------------------------------------
           y |   1.0000
             |
             |
          x1 |   0.4694   1.0000
             |   0.0000
             |
          x2 |   0.4252   0.7958   1.0000
             |   0.0000   0.0000
             |
          c1 |   0.1646   0.2233   0.4109   1.0000
             |   0.0000   0.0000   0.0000
             |
          x3 |   0.1820   0.1661   0.2712   0.2887   1.0000
             |   0.0000   0.0000   0.0000   0.0000
             |
          x4 |  -0.1412  -0.0592  -0.0333  -0.0135   0.0469   1.0000
             |   0.0000   0.0771   0.3203   0.7204   0.1615
I had the impression that I should have a look at it anyway, so I ran -collin- and of course obtained high VIFs for some interaction terms and their respective main variables (particularly x2 and x1, which are highly correlated).
Code:
                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
----------------------------------------------------
        x2     16.68    4.08    0.0600      0.9400
        x1     58.58    7.65    0.0171      0.9829
     x2*x1     30.47    5.52    0.0328      0.9672
        x3      1.40    1.18    0.7132      0.2868
     x1*x3      1.38    1.18    0.7238      0.2762
        x4      1.56    1.25    0.6404      0.3596
     x1*x4      2.53    1.59    0.3946      0.6054
        c1      1.33    1.15    0.7514      0.2486
        c2      1.03    1.01    0.9708      0.0292
----------------------------------------------------
  Mean VIF     12.77
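For readers less familiar with the -collin- columns, the relationship between them can be sketched in a few lines of Python. This is only an illustration: the function name `collin_columns` is hypothetical, and the R-squared values are copied from the table above. It uses the standard definitions Tolerance = 1 - R-squared and VIF = 1 / Tolerance, where R-squared comes from regressing each variable on all the others.

```python
from math import sqrt

def collin_columns(r_squared):
    """Return (VIF, sqrt(VIF), tolerance) for one auxiliary-regression R-squared.

    Hypothetical helper for illustration; not part of -collin- itself.
    """
    tolerance = 1.0 - r_squared
    vif = 1.0 / tolerance
    return vif, sqrt(vif), tolerance

# R-squared values and reported VIFs taken from the first -collin- table.
reported = {"x2": (0.9400, 16.68), "x1": (0.9829, 58.58), "x2*x1": (0.9672, 30.47)}

for name, (r2, table_vif) in reported.items():
    vif, sqrt_vif, tol = collin_columns(r2)
    # Small gaps versus the table come from the 4-decimal rounding of R-squared.
    print(f"{name:>6}: VIF={vif:6.2f} (table {table_vif:6.2f})  tolerance={tol:.4f}")
```

The recomputed values differ slightly from the table only because the printed R-squared is rounded to four decimals.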
What I thought I could do:
1) Just ignore it.
2) Center the continuous variables at zero and rerun the test.
3) Run the test without the interaction terms (with the continuous variables centered at zero). My idea was to check whether the collinearity diagnostics would still point to something worrisome if there were no interaction terms.
I did 2 and 3.
Code:
                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
----------------------------------------------------
        x2     16.68    4.08    0.0600      0.9400
        x1      3.42    1.85    0.2923      0.7077
     x2*x1     11.74    3.43    0.0852      0.9148
        x3      1.40    1.18    0.7132      0.2868
     x1*x3      1.38    1.18    0.7238      0.2762
        x4      1.56    1.25    0.6404      0.3596
     x1*x4      2.53    1.59    0.3946      0.6054
        c1      1.33    1.15    0.7514      0.2486
        c2      1.03    1.01    0.9708      0.0292
----------------------------------------------------
  Mean VIF      4.56
Code:
                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
----------------------------------------------------
        x2      2.43    1.56    0.4110      0.5890
        x1      2.25    1.50    0.4448      0.5552
        x3      1.11    1.05    0.9013      0.0987
        x4      1.02    1.01    0.9835      0.0165
        c1      1.22    1.10    0.8207      0.1793
        c2      1.03    1.01    0.9739      0.0261
----------------------------------------------------
  Mean VIF      1.51
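To illustrate why centering lowers the VIFs of x1 and the interaction but leaves the VIF of x2 untouched, here is a minimal sketch on simulated data (not the data from this post). Everything in it is an assumption made for illustration: the sample size, the seed, the data-generating values, and the helper names `corr` and `vifs3`. It only considers three regressors (x1, x2 and x1*x2) and uses the fact that the VIFs are the diagonal of the inverse correlation matrix.

```python
import random
from math import sqrt

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / sqrt(saa * sbb)

def vifs3(a, b, c):
    """VIFs of three regressors: diagonal of the inverse 3x3 correlation matrix."""
    rab, rac, rbc = corr(a, b), corr(a, c), corr(b, c)
    det = 1 + 2 * rab * rac * rbc - rab ** 2 - rac ** 2 - rbc ** 2
    return ((1 - rbc ** 2) / det, (1 - rac ** 2) / det, (1 - rab ** 2) / det)

# Simulated data (assumed values): binary x1, continuous x2 with a mean far
# from zero and positively correlated with x1.
rng = random.Random(0)
n = 2000
x1 = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
x2 = [10 + 1.5 * d + rng.gauss(0, 1) for d in x1]

# Center x2 at its sample mean before building the interaction.
mean_x2 = sum(x2) / n
x2c = [v - mean_x2 for v in x2]

raw = vifs3(x1, x2, [d * v for d, v in zip(x1, x2)])
centered = vifs3(x1, x2c, [d * v for d, v in zip(x1, x2c)])

for label, r, c in zip(("x1", "x2", "x1*x2"), raw, centered):
    print(f"{label:>6}: VIF raw={r:8.2f}  centered={c:8.2f}")
```

In this sketch the VIF of x2 is the same before and after centering, because x1 and x1*x2 span the same space (together with the constant) whether or not x2 is centered. That matches the unchanged 16.68 for x2 in the tables above, while the VIFs of x1 and the interaction fall.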
My question is: is this approach correct? Should I do something else? Or just accept the fact that interaction terms will make collinearity arise and ignore everything?
Thank you in advance for your help.
Best,
Jo