  • Problem of collinearity once district fixed effects are added

    Dear Statalist,

    I am running a logit regression in Stata 14.2 to test how x1 (continuous and centered), x2 (continuous and centered), and their interaction x1*x2 affect the probability of y. The following commands give no evidence of collinearity among the independent variables, with VIF values around 1.0 (see pictures 1_ and 2_):
    Code:
    logit y c.centered_x1##c.centered_x2
    vif, uncentered
    However, if I add district fixed effects (39 districts) to the model with i.ubigeo (see picture 3_), x1 and x2 become highly collinear (VIF values of 22.74 and 28.64 when the interaction term is excluded; see picture 4_), and the interaction term is eventually omitted from the regression due to collinearity. This is a problem because the interaction is one of my variables of interest.

    Is this only because I am adding 38 dummy variables, or is there another reason for it?

    Any help is very much appreciated!!

    PS: I added pictures of the Stata outputs for clarity. I am also aware that the pseudo-R2 is quite low; this version omits the control variables, which would make no difference to the collinearity problem.

  • #2
    Kerstin:
    I would test whether -i.ubigeo- is worth keeping via -testparm-.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Dear Carlo, thanks for your reply!
      Running testparm i.ubigeo after the logit regression gives me a Prob>F of 0.0000. We therefore reject the null hypothesis that the district coefficients are jointly equal to zero, so district fixed effects appear to be needed in this case. What do you think?
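
The joint test described above can be sketched as follows. This is a minimal illustration on simulated data, not the poster's dataset: the number of observations, the coefficients, and the variable district_effect are assumptions made up for the example; only y, centered_x1, centered_x2, and ubigeo are names taken from the thread. After a plain logit fit, testparm performs a Wald test and leaves its results in r(chi2), r(df), and r(p).

```stata
* Simulated data only: 39 hypothetical districts with 100 observations each
clear
set seed 12345
set obs 3900
generate ubigeo = ceil(_n/100)
generate centered_x1 = rnormal()
generate centered_x2 = rnormal()

* One random effect per district (an assumption of this sketch)
bysort ubigeo: generate district_effect = rnormal() if _n == 1
by ubigeo: replace district_effect = district_effect[1]

* Binary outcome drawn from a logit model with an interaction (made-up coefficients)
generate xb = 0.5*centered_x1 + 0.3*centered_x2 ///
    + 0.2*centered_x1*centered_x2 + 0.5*district_effect
generate y = runiform() < invlogit(xb)

* Logit with the interaction and district fixed effects
logit y c.centered_x1##c.centered_x2 i.ubigeo

* Joint Wald test that all district coefficients equal zero
testparm i.ubigeo
display "chi2 = " r(chi2) "  df = " r(df) "  p = " r(p)
```

With 39 districts and one base category, the test should have 38 degrees of freedom, provided no district is dropped from the estimation.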



      • #4
        Kerstin:
        Thanks for the further clarification.
        My guess is that your regression model can't support that specification.
        With so many observations (by the way, from your posts I assume that they are all independent, i.e., that you're not dealing with a panel dataset), the statistical significance of -i.ubigeo- might simply reflect the sample size.
        Kind regards,
        Carlo
        (Stata 19.0)

