When not to be concerned about collinearity?

Miguel A. Duran

Join Date: Apr 2014

Posts: 47
#1

When not to be concerned about collinearity?

08 Apr 2015, 06:08

In a recent post (http://www.statalist.org/forums/foru...coeff-of-logit), Richard Williams provided a link to P. Allison's answer to the question "When can you safely ignore multicollinearity?" (http://statisticalhorizons.com/multicollinearity). However, I have the following concern about this answer.
My basic model (excluding control variables) is as follows:
y = c0 + c1 · xm + c2 · xx + c3 · mm, where y is the dependent variable, and xm, xx and mm are dummy variables (I use OLS).
I would also like to analyze the effect of the interaction terms between another continuous and centered variable (zs) and xm, xx and mm. The resulting model is:
(1) y = c0 + c1 · xm + c2 · xx + c3 · mm + c4 · zs + c5 · xm · zs + c6 · xx · zs + c7 · mm · zs.
In this case, the command -collin xm xx mm zs xm·zs xx·zs mm·zs- indicates that zs has a VIF equal to 15.01. Although this value is above a threshold of 10, according to P. Allison, ignoring a potential multicollinearity problem would be safe in this case.
However, I could perform my analysis in a different way. Specifically, I could use a separate regression equation for each interaction effect. For instance, for the interaction effect corresponding to xm, the model would now be:
(2) y = c0 + c1 · xm + c2 · xx + c3 · mm + c4 · zs + c5 · xm · zs
In this case, although I am using interaction terms, collin indicates that there is no variable with a VIF above 10. Indeed, the VIF associated with zs is 3.37.
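For reference, the two VIFs quoted above can be translated into the R-squared of the auxiliary regression of zs on the other regressors, using the usual definition of the VIF. The block below is only that arithmetic, worked for the two values reported here; it says nothing about which model is appropriate.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\quad\Longrightarrow\quad
R_j^2 = 1 - \frac{1}{\mathrm{VIF}_j}
\\[1ex]
\text{(1): } R_{zs}^2 = 1 - \frac{1}{15.01} \approx 0.933
\qquad
\text{(2): } R_{zs}^2 = 1 - \frac{1}{3.37} \approx 0.703
```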
My question is whether I should stick to (1). Or is the VIF of zs in (1) a good enough reason to use (2) instead of (1)?
Thanks in advance.

Last edited by Miguel A. Duran; 08 Apr 2015, 06:11.
Tags: None
daniel klein

Join Date: Mar 2014

Posts: 3850
#2

08 Apr 2015, 06:32

However, I could perform my analysis in a different way. Specifically, I could use different regression equations for each interaction effect.

No, you cannot. Or, better stated, you can, but the two approaches are not equivalent. In (1) the effect of zs depends on xm, xx and mm, whereas in (2) it only depends on xm. In (3) and (4) - both not shown - it depends on xx or mm, respectively, but not on xm. This means that the coefficients in (1) have a different interpretation from those in (2) (and (3) and (4)).

Let me add that VIFs are, in my view, not a good reason to decide which model(s) to estimate. Substantive theory is.

Best
Daniel

Last edited by daniel klein; 08 Apr 2015, 06:37.
Comment
Miguel A. Duran

Join Date: Apr 2014

Posts: 47
#3

08 Apr 2015, 06:58

Thanks for your answer, Daniel. I did not state in my first post that xx, xm and mm are mutually exclusive dummies, i.e., if xm, for instance, is 1, both xx and mm are 0. Therefore, if I am not mistaken, zs would have the same interpretations in (1) and (2-4).
Nevertheless, even if zs had different interpretations, this would not be too relevant to my analysis, because my interest lies in the interaction terms. My concern is whether I should analyze these interaction terms in a single equation or in three different equations. In particular, is the VIF of zs in (1) a reason to use (2-4) instead of (1)?
Comment
daniel klein

Join Date: Mar 2014

Posts: 3850
#4

08 Apr 2015, 07:11

What does it mean that xm, xx and mm are mutually exclusive? Do they represent one categorical variable? I think you need to tell us more about the substantive content of your analyses. Note that interacting a continuous variable with a categorical one, using only selected levels of the latter in the multiplicative terms, invalidates the whole approach.

Best
Daniel
Comment
Miguel A. Duran

Join Date: Apr 2014

Posts: 47
#5

08 Apr 2015, 07:31

I am sorry for not having been clear enough. I am studying the effect on the use of credit cards of four different types of contracts: xm, xx, mm and a fourth one, uu (which is omitted to avoid the dummy variable trap). Thus, if the contract is of type xm, the variable xm is equal to 1 (otherwise it is 0). What I mean by "mutually exclusive" is that a contract cannot be of two different types at the same time (i.e., if xm = 1, then xx, mm and uu are equal to 0). The variable zs measures individuals' financial distress. When I said before that my main interest lies in the interaction terms, I meant that the key question of my research goes like this: what is the effect of a given contract if financial distress increases? And my concern, as I stated in my previous posts, is whether I should use a single regression equation. If you need me to clarify anything else, please do not hesitate to let me know. Thanks in advance.
Comment
daniel klein

Join Date: Mar 2014

Posts: 3850
#6

08 Apr 2015, 07:53

OK, so you have the case of a categorical variable (type of contract) interacted with a continuous one (financial distress). In that case the answer is simple: you cannot use the second approach.

As explained above, this is because in your original post, in (1) c3 represents the expected difference in y between contract type xx and uu if financial distress equals 0 (which corresponds to its sample mean after centering), whereas in (2) c3 represents the expected difference in y between contract type xx and uu irrespective of the level of financial distress. The same is, of course, true for c4. Therefore the models are not equivalent and (2) is not a valid way of representing the interaction effect.
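The same point can be written out for the slope on zs. The sketch below assumes only what the thread states (xm, xx and mm are mutually exclusive dummies with uu as the omitted type) and, as a labelling assumption, writes c7 for the coefficient on mm · zs in (1).

```latex
\text{Model (1):}\quad
\frac{\partial y}{\partial zs} = c_4 + c_5\,xm + c_6\,xx + c_7\,mm =
\begin{cases}
c_4 & \text{contract type } uu \\
c_4 + c_5 & \text{contract type } xm \\
c_4 + c_6 & \text{contract type } xx \\
c_4 + c_7 & \text{contract type } mm
\end{cases}
\\[1ex]
\text{Model (2):}\quad
\frac{\partial y}{\partial zs} = c_4 + c_5\,xm =
\begin{cases}
c_4 + c_5 & \text{contract type } xm \\
c_4 & \text{contract types } uu,\ xx,\ mm \text{ (one common slope)}
\end{cases}
```

So even with mutually exclusive dummies, c4 in (1) is the slope for uu alone, while c4 in (2) is a slope pooled over uu, xx and mm, and c5 is measured against a different reference in each model.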

Best
Daniel

Last edited by daniel klein; 08 Apr 2015, 07:56.
Comment
Miguel A. Duran

Join Date: Apr 2014

Posts: 47
#7

08 Apr 2015, 08:00

Thanks, Daniel, for your help.
Comment
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2167
#8

08 Apr 2015, 09:34

I can't say I enjoyed scanning Paul Allison's blog entry and the questions that were asked. It was pretty depressing, actually. I like a lot of what he does, but in the article about multicollinearity about 10% of his suggestions/interpretations are correct, 10% are ambiguous, and 80% are misleading or wrong. I saw almost no mention of a standard error; the focus is on the VIFs. I didn't see a clear discussion of how centering before constructing interactions changes the effect on the levels to be interpreted as average marginal effects.
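To spell out the centering point in generic notation (the symbols x, z and the b's below are illustrative and not taken from the blog entry or from Miguel's model), compare an interaction built from the raw z with one built from z minus its sample mean:

```latex
\text{Uncentered:}\quad y = a_0 + a_1 x + a_2 z + a_3\,x z + u,
\qquad \frac{\partial y}{\partial x} = a_1 + a_3 z
\quad (a_1 \text{ is the effect of } x \text{ at } z = 0)
\\[1ex]
\text{Centered:}\quad y = b_0 + b_1 x + b_2 z + b_3\,x\,(z - \bar z) + u,
\qquad \frac{\partial y}{\partial x} = b_1 + b_3 (z - \bar z)
\\[1ex]
\frac{1}{N}\sum_{i=1}^{N} \bigl[\, b_1 + b_3 (z_i - \bar z) \,\bigr] = b_1,
\qquad b_1 = a_1 + a_3 \bar z, \quad b_3 = a_3
```

The interaction coefficient is unchanged; centering only moves the point at which the level coefficient is evaluated, so that it becomes the effect averaged over the sample distribution of z.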

To make things worse, he references my introductory book -- as if one can go there to corroborate his advice. Rather than emphasize VIFs as a useful tool, I discuss why they aren't very useful. I agree with Daniel.

As for Miguel's specific problem, here are some thoughts. (1) What is the main purpose of your regression? Is it to estimate the effects of the dummy variables and the continuous variable? (2) What kinds of standard errors are you getting on the variables of interest? (3) If you estimate the model with the full set of interactions and can conclude that one or more are statistically insignificant, you could justify dropping them. But it makes no sense to include them one at a time.

How large are the coefficient and standard error on zs? Its coefficient measures the effect for the base group, uu. It could be that you can't estimate that particular effect very precisely given the data, but you can estimate the effect for other types of contracts more precisely. So be it. You start with a general model where each contract has a level effect and the effect of financial distress can depend on the type of contract. You've done everything right. Some effects are easier to estimate than others. Looking at VIFs is not going to change that.
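A minimal sketch of why the VIF on zs is tied to the base group, using simulated data rather than Miguel's. The group sizes, the seed and the helper names (`ols_rss`, `vif_zs`) are assumptions made for illustration, and no control variables are included. With mutually exclusive dummies, zs is the sum of four group-specific pieces, three of which are already regressors in (1), so the only variation in zs left unexplained is the variation within uu. The VIF of zs in (1) is therefore the total variation in zs divided by its variation inside uu, which is large whenever uu is a small share of the sample.

```python
import random

def ols_rss(X, y):
    """Residual sum of squares from OLS of y on the columns of X."""
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, yi in zip(X, y))

def vif_zs(data, interacted):
    """VIF of zs: regress zs on a constant, the dummies and the listed interactions."""
    z = [zi for _, zi in data]
    zbar = sum(z) / len(z)
    tss = sum((zi - zbar) ** 2 for zi in z)
    X = [[1.0] + [float(g == t) for t in ("xm", "xx", "mm")]
         + [zi * float(g == t) for t in interacted] for g, zi in data]
    return tss / ols_rss(X, z)

rng = random.Random(0)
sizes = {"xm": 300, "xx": 300, "mm": 300, "uu": 60}  # hypothetical group sizes
raw = [(g, rng.gauss(0.0, 1.0)) for g, n in sizes.items() for _ in range(n)]
mean = sum(zi for _, zi in raw) / len(raw)
data = [(g, zi - mean) for g, zi in raw]  # centre zs on the full sample

vif_full = vif_zs(data, ["xm", "xx", "mm"])  # regressors of model (1)
vif_single = vif_zs(data, ["xm"])            # regressors of model (2)

# Closed form for model (1): total variation in zs over its variation within uu.
z_all = [zi for _, zi in data]
z_uu = [zi for g, zi in data if g == "uu"]
m_all = sum(z_all) / len(z_all)
m_uu = sum(z_uu) / len(z_uu)
closed_form = (sum((zi - m_all) ** 2 for zi in z_all)
               / sum((zi - m_uu) ** 2 for zi in z_uu))

print("VIF of zs, model (1):", vif_full)
print("closed form TSS / within-uu SS:", closed_form)
print("VIF of zs, model (2):", vif_single)
```

In this setup the first two printed numbers agree up to rounding error, and the third is much smaller simply because model (2) forces uu, xx and mm to share one slope. The lower VIF comes from imposing that restriction, not from any gain in information about the uu slope.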

If you want further comments/help, you need to show Stata output.

JW
Comment

Miguel A. Duran

Join Date: Apr 2014

Posts: 47
#9

08 Apr 2015, 10:40

Thank you very much, Jeff. Your comments and Daniel's point in the same direction, so I will stick to my initial model.
This is the Stata output of my main analysis (xm_zs, xx_zs and mm_zs are the interaction terms):

Code:

------------------------------------------------------------
                    types             (0)             (1)   
------------------------------------------------------------
_cons               0.508**         0.546**         0.536**
                  (0.199)         (0.219)         (0.212)   

xm                 0.0290          0.0329          0.0629   
                 (0.0505)        (0.0515)        (0.0536)   

xx                 -0.111**        -0.109**        -0.112**
                 (0.0464)        (0.0479)        (0.0476)   

mm                -0.0432         -0.0519         -0.0461   
                 (0.0408)        (0.0425)        (0.0416)   

zs                                 0.0112          0.0300**
                                (0.00813)        (0.0143)   

xm_zs                                             -0.0390**
                                                 (0.0184)   

xx_zs                                             -0.0184   
                                                 (0.0146)   

mm_zs                                             0.00112   
                                                 (0.0169)   

cv                    Yes             Yes             Yes   

quarter               Yes             Yes             Yes   

sic                   Yes             Yes             Yes   
------------------------------------------------------------
N                    2042            1953            1953   
R-sq                0.339           0.349           0.361   
adj. R-sq           0.321           0.331           0.342   
------------------------------------------------------------

Comment

Jeff Wooldridge

Join Date: Apr 2014

Posts: 2167
#10

08 Apr 2015, 17:08

Looking at those results, I'm not even sure why you would check for collinearity. The coefficients on the insignificant interactions are substantially smaller than on zs and the interaction between xm and zs.
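As a quick check on magnitudes, the slope of y in zs implied for each contract type can be read off column (1) of the table above by adding each interaction coefficient to the coefficient on zs. The snippet below only does that arithmetic on the posted point estimates; the variable names are illustrative. Standard errors for the sums are not computed here because they need the coefficient covariances, which the table does not show (in Stata, `lincom zs + xm_zs` after the regression would give them).

```python
# Point estimates copied from column (1) of the table above.
b_zs = 0.0300
interactions = {"xm": -0.0390, "xx": -0.0184, "mm": 0.00112}

# Slope of y in zs for each contract type; uu is the omitted base category.
slopes = {"uu": b_zs}
for contract, b in interactions.items():
    slopes[contract] = b_zs + b

for contract, slope in slopes.items():
    print(f"{contract}: {slope:.5f}")
```

The implied slopes are roughly 0.030 for uu, -0.009 for xm, 0.012 for xx and 0.031 for mm.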
Comment