  • Multicollinearity in GLM (Gamma) with binary predictors

    I am running a GLM with a Gamma family and log link, as my dependent variable is a strictly positive continuous variable with a heavily right-skewed distribution.

    My independent variables are all categorical (binary):

    X1 (2 categories): main effect of interest
    X2 (binary): hypothesized moderator of X1
    X3 (binary): hypothesized moderator of X1

    My model is:
    glm Y i.X1##i.X2 i.X1##i.X3, family(gamma) link(log) vce(robust)

    Since Stata does not support vif after glm, I ran an equivalent OLS model and found that X3 and its interaction term i.X1#i.X3 both have VIF > 10, suggesting multicollinearity. Further inspection revealed that X3 is heavily imbalanced across categories of X1 (0.6% vs 15.2%).

    My questions are:
    1. Given that all my independent variables are binary, is VIF still a valid diagnostic for multicollinearity, or should I use GVIF instead?
    2. In the context of GLM (Gamma family), is multicollinearity a concern that requires correction?
    3. If correction is needed, what is the most appropriate approach, given that my theoretical hypothesis requires the X1 × X3 interaction term?
    Any guidance on how to handle this situation would be greatly appreciated!
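
    The diagnostic described in this post can be sketched roughly as follows. This is only an illustration on simulated stand-in data, not the poster's data: the sample size, the coefficients, and the direction of the X3 imbalance (rare when X1 is 0, more common when X1 is 1) are all assumptions made so the example runs on its own.

    ```stata
    * Simulated stand-in data; every number below is an assumption for illustration
    clear
    set seed 12345
    set obs 5000
    generate byte X1 = runiform() < 0.5
    generate byte X2 = runiform() < 0.4
    * X3 is rare when X1==0 and more common when X1==1 (about 0.6% vs 15.2%)
    generate byte X3 = runiform() < cond(X1 == 1, 0.152, 0.006)
    * Strictly positive, right-skewed outcome: gamma with a log-linear mean
    generate double mu = exp(1 + 0.3*X1 + 0.2*X2 + 0.1*X3 + 0.2*X1*X3)
    generate double Y  = rgamma(2, mu/2)

    * How unevenly is X3 spread across the categories of X1?
    tabulate X1 X3, row

    * VIFs from an auxiliary OLS fit with the same right-hand side as the glm
    regress Y i.X1##i.X2 i.X1##i.X3
    estat vif
    ```

    The VIFs depend only on the right-hand-side variables, so the outcome used in the auxiliary regression does not affect them.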

  • #2
    1. Yes
    2. Multicollinearity is never a concern; it just describes a state of your data. You need to accurately describe your data, but once the data are collected, there is nothing you can do to fix them. Any "fix" just hides the problem, but won't make it go away. Hiding characteristics of your data is often a bad idea, and since you are substantively interested in that interaction term, there is really nothing you can do about it.

    This means you are more likely to get non-significant results. That is not fun, but that accurately represents the amount of information present in your data.
    3. A correction is not needed and not possible.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      Since you are using a quasi-likelihood model (because you used the vce(robust) option), you will get more robust results when you use the family(poisson) option.

      • #4
        Thank you, Maarten!

        Two quick follow-ups:
        1. Would dropping the main effect of X3 while retaining the interaction term be problematic, both for hiding the data structure and for violating the hierarchy principle?
        2. Is your recommendation simply to keep the original model as-is, but acknowledge the collinearity and its implications (larger standard errors, reduced power) when reporting?


        • #5
          1. Do not drop the main effects! Leaving the main effect out really changes the model such that it no longer answers the question you want to answer. That is really really bad. There are ways to remove the main effects and still get a valid model, but those introduce the multicollinearity back in again, and you really need to know what you are doing when you try that. So it won't solve anything and there is a large chance you will make a mistake and make everything worse. In short, don't do that.
          2. Yes!

          • #6
            I agree with Maarten: I wouldn't drop the main effects. A couple of additional points:

            1. Using the gamma family is the same as using the exponential distribution, but with standard errors that are robust. Because the exponential distribution is in the linear exponential family, it is just as robust for estimating the conditional mean function as the Poisson. I would lean towards gamma in this case because the "working" variance assumption for the gamma (variance proportional to the square of the mean) is probably better than that for the Poisson (variance proportional to the mean). It's a matter of efficiency only, though. You can try both with robust standard errors.
            2. You haven't shown us any output to see whether the results are statistically significant. These questions can't be answered in a vacuum. What is the t statistic on X1*X3? Is the effect practically large? If the answer is yes to both, why are you even looking at VIFs?
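
            The two points above could be checked along these lines. This is a minimal sketch on simulated stand-in data, since no output was posted: the sample size, coefficients, imbalance direction, and the stored-estimate names m_gamma and m_poisson are assumptions for illustration only.

            ```stata
            * Simulated stand-in data; every number below is an assumption for illustration
            clear
            set seed 12345
            set obs 5000
            generate byte X1 = runiform() < 0.5
            generate byte X2 = runiform() < 0.4
            generate byte X3 = runiform() < cond(X1 == 1, 0.152, 0.006)
            generate double mu = exp(1 + 0.3*X1 + 0.2*X2 + 0.1*X3 + 0.2*X1*X3)
            generate double Y  = rgamma(2, mu/2)

            * Same conditional mean, two working variances, robust standard errors
            glm Y i.X1##i.X2 i.X1##i.X3, family(gamma) link(log) vce(robust)
            estimates store m_gamma
            glm Y i.X1##i.X2 i.X1##i.X3, family(poisson) link(log) vce(robust)
            estimates store m_poisson
            estimates table m_gamma m_poisson, b(%9.3f) se(%9.3f)

            * Significance and size of the X1 x X3 interaction in the gamma fit
            estimates restore m_gamma
            testparm i.X1#i.X3
            * Exponentiated coefficient: a ratio of ratios of the conditional mean
            lincom 1.X1#1.X3, eform
            ```

            If the two fits give similar coefficients and the interaction is both statistically significant and practically large, the VIFs add little to the story.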
