Problem of collinearity once district fixed effects are added

Kerstin Schmidt

Join Date: Apr 2017

Posts: 125
#1

23 Apr 2017, 12:41

Dear Statalist,

I am running a logit regression in Stata 14.2 to test how x1 (continuous and centered), x2 (continuous and centered) and the interaction term x1*x2 affect the probability of y. The following commands give no evidence of collinearity among the independent variables, with VIF values around 1.0 (see picture 1_ and 2_):

Code:

logit y c.centered_x1##c.centered_x2
vif, uncentered

However, if I add district fixed effects (39 districts) to the model with i.ubigeo (see picture 3_), x1 and x2 become highly collinear (VIF values of 22.74 and 28.64 when excluding the interaction term; see picture 4_) and the interaction term is eventually omitted in the regression due to collinearity. This poses a problem as it is one of my variables of interest.
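
To put those VIF values in perspective, here is a back-of-the-envelope reading. It assumes the usual definition of the VIF in terms of the R-squared from the auxiliary regression of a regressor x_j on all other regressors (with -vif, uncentered- this would be the uncentered R-squared); the rounded figures are my own arithmetic, not Stata output:

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\quad\Longrightarrow\quad
R_j^2 = 1 - \frac{1}{\mathrm{VIF}_j}
\\[6pt]
\mathrm{VIF} = 22.74 \;\Rightarrow\; R^2 = 1 - \frac{1}{22.74} \approx 0.956,
\qquad
\mathrm{VIF} = 28.64 \;\Rightarrow\; R^2 = 1 - \frac{1}{28.64} \approx 0.965
```

Under that definition, roughly 96% of each variable would be accounted for by the other regressors, district dummies included.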

I wonder if this is only because I add 38 dummy variables or if there is another reason for this to happen?

Any help is very much appreciated!!

PS: I added pictures of the Stata outputs for better clarity. Further, I am very much aware that the Pseudo R2 is quite low - this is only a version without control variables, which would not make any difference in the problem of collinearity.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17748
#2

23 Apr 2017, 14:50

Kerstin:
I would test whether -i.ubigeo- is worth keeping via -testparm-.
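
A minimal sketch of that check follows. It reuses the variable names from your post, but the data are simulated stand-ins (the sample size, the 100 observations per district and the coefficients are all assumptions), so only the sequence of commands is meant to carry over:

```stata
* Hypothetical simulated data standing in for the real dataset
* (sample size, district sizes and coefficients are assumptions)
clear
set seed 2017
set obs 3900
generate ubigeo = ceil(_n/100)          // 39 districts, 100 observations each
generate centered_x1 = rnormal()
generate centered_x2 = rnormal()
generate y = runiform() < invlogit(0.3*centered_x1 - 0.2*centered_x2 ///
    + 0.1*centered_x1*centered_x2)

* Logit with the interaction term and district fixed effects
logit y c.centered_x1##c.centered_x2 i.ubigeo

* Joint Wald test that all district coefficients are zero
testparm i.ubigeo
display "chi2(" r(df) ") = " %6.2f r(chi2) "   Prob > chi2 = " %6.4f r(p)
```

After -logit-, -testparm- reports a chi-squared statistic; a small p-value would suggest that the district dummies jointly matter.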

Kind regards,
Carlo
(Stata 19.0)
Kerstin Schmidt

Join Date: Apr 2017

Posts: 125
#3

24 Apr 2017, 01:39

Dear Carlo, thanks for your reply!
Running testparm i.ubigeo after the logit regression gives me a Prob>F of 0.0000. Therefore, we reject the null that the coefficients for all districts are jointly equal to zero and district fixed effects are needed in this case. What do you think?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17748
#4

25 Apr 2017, 11:36

Kerstin:
thanks for providing further clarifications.
My guess is that your regression model can't support that specification.
With so many observations (by the way, from your posts I assume that they are all independent, i.e., that you're not dealing with a panel dataset), the evidence that -i.ubigeo- is significant might be due to sample size.

Kind regards,
Carlo
(Stata 19.0)