CLOGIT including an interactive term (between two dummy variables), but one of the dummy variables is being dropped due to multicollinearity

Thiago Silva

#1

24 Jan 2017, 09:18

I have a question about how to interpret the estimates of a -clogit- model that includes an interactive term (between two dummy variables), where one of the dummy variables (its constitutive term) is dropped due to multicollinearity.

I have the following variables:

Dependent variable:
PresidentsParty = categorical variable with 8 categories
Independent variables:
Vote of no confidence = dummy (1 = yes; 0 = no)
Multiparty system = dummy (1 = yes; 0 = no)

I am trying to estimate the following model:
PresidentsParty = B1*Vote of no confidence + B2*Multiparty System + B3*(Vote of no confidence*Multiparty system)
I am using Stata's -clogit- command to estimate the model. The problem is that B2*Multiparty System is dropped due to multicollinearity. Everything else runs. Therefore, my questions are:
1. Considering that "B2*Multiparty System" is being dropped, can I go ahead and run the model as specified?

2. How can I know/interpret the effect of "Multiparty System" (alone) on my dependent variable "PresidentsParty", given that "B2*Multiparty System" is dropped?

3. Considering that "B2*Multiparty System" is being dropped, do I still need the constitutive term "B1*Vote of no confidence" in my model, or should I include only the interactive term [i.e., PresidentsParty = B1*(Vote of no confidence*Multiparty system)]? In that case, how do I interpret my coefficients? And question 2 remains: how can I know the effect of "Multiparty System" alone on my dependent variable "PresidentsParty"?
Tags: categorical, clogit, interaction, Multicollinearity, regression
Clyde Schechter

#2

24 Jan 2017, 12:57

In general, whenever a model includes an interaction term, it should also include the "main effects" (what you refer to as "constitutive terms"). There is one clear case where this rule can (indeed must) be violated: in a fixed-effects model where the information carried by a constitutive term is already represented by the fixed effect. This is your situation here. Each country either is or is not a multiparty system throughout the period covered by your data. So that information is already represented by the fixed effect, and the multiparty system indicator ("dummy") variable is not needed. In fact, if you try to include it, as you have seen, Stata omits it anyway. So the answer to question 1 is yes.
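To see why the fixed effect absorbs Multiparty, here is a minimal sketch in Python with NumPy (not Stata), using a made-up panel of 3 hypothetical countries observed over 4 periods; the data are purely illustrative. With one dummy per country standing in for the fixed effect, the Multiparty column is exactly the sum of the dummies of the multiparty countries, so adding it leaves the rank of the design matrix unchanged. (-clogit- conditions the fixed effect out of the likelihood rather than estimating country dummies, but a covariate that is constant within a group cancels out of the conditional likelihood in the same way.)

```python
import numpy as np

# Hypothetical panel: 3 countries observed over 4 periods (illustrative data only).
country = np.repeat([0, 1, 2], 4)
# Multiparty is time-invariant: countries 1 and 2 are multiparty, country 0 is not.
multiparty = np.isin(country, [1, 2]).astype(float)
# Vote of no confidence varies within countries over time.
vote = np.array([0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1], dtype=float)

# Country fixed effects represented as one dummy column per country.
fe = (country[:, None] == np.arange(3)).astype(float)

base = np.column_stack([fe, vote, vote * multiparty])
with_mp = np.column_stack([base, multiparty])

print(np.linalg.matrix_rank(base), base.shape[1])        # 5 5 -> full column rank
print(np.linalg.matrix_rank(with_mp), with_mp.shape[1])  # 5 6 -> Multiparty adds nothing
# Multiparty is exactly the sum of the fixed-effect dummies of countries 1 and 2.
print(np.allclose(multiparty, fe[:, 1] + fe[:, 2]))      # True
```

Vote and the interaction keep the matrix at full rank because both vary within at least one country.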

The effect of multiparty system on your dependent variable cannot be estimated in a fixed-effects model, again because of its collinearity with the fixed effect itself. Only the effects of country attributes that vary over time can be estimated with fixed-effects models. If your research goals require an estimate of the effect of having a multiparty system, then you cannot use a fixed-effects model for your analysis. Look into random-effects models. Or, if such a thing is possible, find a data set with countries that switched between multiparty and non-multiparty systems during the period it covers. (Of course, it may well be that such switches are themselves endogenous, so this solution might not work.) So the answer to question 2 is: it cannot be done with this analysis.
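The same point can be seen through the within (demeaning) transformation used by linear fixed-effects estimators. A minimal Python/NumPy sketch with hypothetical data: a country-constant Multiparty indicator is wiped out entirely, while a hypothetical switch in one country leaves within-country variation that comes from that country alone. The helper `within_demean` is illustrative, not a Stata or library function.

```python
import numpy as np

def within_demean(x, group):
    """Subtract each group's mean from x (the 'within' transformation of linear FE)."""
    out = x.astype(float).copy()
    for g in np.unique(group):
        mask = group == g
        out[mask] -= out[mask].mean()
    return out

# Hypothetical panel: 3 countries x 4 periods.
country = np.repeat([0, 1, 2], 4)

# Case A: Multiparty never changes within a country.
mp_fixed = np.isin(country, [1, 2]).astype(float)
# Case B (hypothetical): country 0 becomes multiparty from its third period on.
mp_switch = mp_fixed.copy()
mp_switch[[2, 3]] = 1.0

print(np.abs(within_demean(mp_fixed, country)).max())   # 0.0 -> no within variation left
print(np.abs(within_demean(mp_switch, country)).max())  # 0.5 -> variation from country 0 only
```

In case B, only country 0's observations carry information about the Multiparty coefficient, which is why the endogeneity of such switches matters so much.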

The general rule remains that the constitutive terms (I like that phrase!) must be included, except when they can't be. But there is no reason why Vote cannot be included: it does vary from one observation to the next within the same country. Since it can be included, it must be. So the answer to question 3 is: yes, you still must include Vote.
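For interpreting the coefficients of the model that keeps Vote, a worked equation may help. Writing country as i, time as t, and the country fixed effect as alpha_i (notation introduced here, not in the thread), and using the B1 and B3 from the question, the model's linear index (the log odds, in a binary-outcome logit) is as follows; B2*Multiparty_i does not appear because it is absorbed into alpha_i:

```latex
\begin{aligned}
\eta_{it} &= \alpha_i + B_1\,\mathrm{Vote}_{it} + B_3\,(\mathrm{Vote}_{it}\times\mathrm{Multiparty}_i) \\
\eta_{it}\big|_{\mathrm{Vote}_{it}=1} - \eta_{it}\big|_{\mathrm{Vote}_{it}=0}
  &= B_1 + B_3\,\mathrm{Multiparty}_i
   = \begin{cases} B_1 & \text{if } \mathrm{Multiparty}_i = 0,\\ B_1 + B_3 & \text{if } \mathrm{Multiparty}_i = 1. \end{cases}
\end{aligned}
```

So B1 is the within-country effect of a vote of no confidence in non-multiparty countries, and B3 is how much that effect differs in multiparty countries. Neither coefficient is the main effect of Multiparty, which is exactly the part the fixed effect absorbs.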

Finally, the easy way to code these models is with factor variable notation. See -help fvvarlist- for details. In this case, all you have to code is:

Code:

xtreg PresidentsParty Vote##Multiparty, fe

Stata will unpack that and include the Vote and the Vote X Multiparty interaction terms. It will also "start to include" Multiparty, but on recognizing its collinearity with the fixed effects, it will automatically omit it for you. You can't go wrong this way. Stata always handles these correctly.
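Conceptually, Vote##Multiparty expands into Vote, Multiparty and their product, and the estimator then omits any term that is a linear combination of terms already in the model. The Python/NumPy sketch below imitates that idea with explicit country dummies and a greedy rank check. The function names, the hypothetical data and the term labels are illustrative only; Stata's own rules for naming terms and choosing which one to omit are not reproduced here.

```python
import numpy as np

def expand_interaction(a, b, a_name, b_name):
    """Expand two 0/1 indicators the way a##b does conceptually: a, b, and a*b."""
    return {a_name: a, b_name: b, f"{a_name}#{b_name}": a * b}

def drop_collinear(columns):
    """Greedily keep columns that raise the rank; return kept columns and omitted names."""
    kept, omitted = {}, []
    for name, col in columns.items():
        trial = np.column_stack(list(kept.values()) + [col])
        if np.linalg.matrix_rank(trial) > len(kept):
            kept[name] = col
        else:
            omitted.append(name)
    return kept, omitted

# Hypothetical panel: 3 countries x 4 periods; countries 1 and 2 are multiparty.
country = np.repeat([0, 1, 2], 4)
multiparty = np.isin(country, [1, 2]).astype(float)
vote = np.array([0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1], dtype=float)

# Country dummies first (the fixed effects), then the expanded factor terms.
cols = {f"country{g}": (country == g).astype(float) for g in range(3)}
cols.update(expand_interaction(vote, multiparty, "Vote", "Multiparty"))

kept, omitted = drop_collinear(cols)
print(list(kept))  # ['country0', 'country1', 'country2', 'Vote', 'Vote#Multiparty']
print(omitted)     # ['Multiparty']
```

Because the fixed effects enter first, Multiparty is the term flagged as redundant, while Vote and the interaction survive.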
Thiago Silva

#3

24 Jan 2017, 13:54

Thank you so much, Clyde Schechter!
Your answer was really helpful.

Do you have any idea what method I could use to estimate the effect of "Multiparty" on my dependent variable?

I noticed that when I run -mlogit- (instead of -clogit-), "Multiparty" is not dropped, and the direction and statistical significance of the estimates are the same as I found using -clogit-. But I do not know whether this is always the case (i.e., whether this is correct, and what the shortcomings of applying -mlogit- to data that actually call for -clogit- are).

I also thought about using an approach similar to the one used for the "separation" problem, i.e., a penalized-likelihood correction, but I don't know whether I am doing this correctly. Any idea?

Thanks again!
Clyde Schechter

#4

24 Jan 2017, 14:23

Do you have any idea what method I could use to estimate the effect of "Multiparty" on my dependent variable?

I would use a random effects model, -xtlogit, re-, instead of -clogit-. But you should discuss this with a colleague in your field. Random effects models are predicated on assumptions that are not always correct and seldom verifiable. In particular, they do not automatically remove "omitted variable bias" for all time-independent attributes. They also rely on distributional assumptions that may be incorrect. Some disciplines take a rather skeptical view of random effects models as a result. Other disciplines accept them easily, or after some scrutiny. (Actually, even econometrics accepts random effects models if they pass a Hausman test.) So find out what is considered acceptable in yours. That said, it is in principle impossible to construct a model that automatically guarantees removal of all omitted variable bias for time-independent attributes but also estimates the effect of a time-independent attribute. You just can't have it both ways. Anything you do to estimate this effect will represent a compromise here.
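The Hausman test mentioned above compares the fixed-effects and random-effects estimates of the coefficients both models can estimate (here Vote and the interaction, not Multiparty). Its statistic is H = (b_FE - b_RE)' (V_FE - V_RE)^-1 (b_FE - b_RE), referred to a chi-squared distribution with degrees of freedom equal to the number of coefficients compared. Stata's -hausman- command computes this for you; the Python/NumPy sketch below only shows the arithmetic, the helper name is hypothetical, and the numbers in the example are made up. It uses a generalized inverse because V_FE - V_RE need not be invertible in finite samples.

```python
import numpy as np

def hausman_statistic(b_fe, b_re, v_fe, v_re):
    """Hausman statistic comparing a consistent (FE) and an efficient (RE) estimator.

    b_fe, b_re: coefficient vectors for the SAME regressors, in the same order.
    v_fe, v_re: their estimated covariance matrices.
    Returns (H, number of coefficients compared).
    """
    diff = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    v_diff = np.asarray(v_fe, dtype=float) - np.asarray(v_re, dtype=float)
    # Generalized inverse: V_FE - V_RE may be singular or not positive definite.
    h = float(diff @ np.linalg.pinv(v_diff) @ diff)
    return h, diff.size

# Made-up single-coefficient example: 0.5 vs 0.3, variances 0.02 and 0.01.
h, df = hausman_statistic([0.5], [0.3], [[0.02]], [[0.01]])
print(round(h, 6), df)  # 4.0 1
```

In this made-up case, 4.0 exceeds 3.84 (the 5% critical value of chi-squared with 1 degree of freedom), so the test would reject the hypothesis that the random-effects estimates are consistent, pointing back toward fixed effects.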

As for -mlogit-, no, it cannot be substituted for -clogit-. I suspect your data set is rather unusual if you got the same results from both of them; that should seldom happen. And -mlogit- fails to take into account the repeated-measures nature of your data: it treats all observations as if they were independent, which in your case they clearly are not.
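For a rough sense of what treating the observations as independent costs: under simplifying assumptions (a linear model, the same number of observations per country, and a regressor that is constant within country, such as Multiparty), Moulton's formula says the naive variance of that coefficient is too small by the factor 1 + (m - 1)*rho, where m is observations per country and rho is the within-country correlation of the errors. The logit models in this thread do not satisfy these assumptions exactly, so treat this Python sketch, with entirely hypothetical numbers, as an order-of-magnitude illustration only.

```python
def moulton_factor(obs_per_country, within_corr):
    """Variance inflation for a country-level regressor with equal cluster sizes (Moulton)."""
    return 1 + (obs_per_country - 1) * within_corr

# Hypothetical: 30 countries x 20 years, within-country error correlation 0.5.
n_countries, years, rho = 30, 20, 0.5
factor = moulton_factor(years, rho)
print(factor)                                  # 10.5
print(round(n_countries * years / factor, 1))  # 57.1 roughly "independent" observations out of 600
print(round(factor ** 0.5, 2))                 # 3.24 naive standard errors too small by this factor
```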