  • Questions regarding multicollinearity and centering of a control variable

    Dear all,

    I am running a Generalized Estimating Equations (GEE) model and want to check whether there are any multicollinearity issues, but I am not sure how to do that in Stata. I read that to check for multicollinearity one should fit the model with regress and then run estat vif. For example, this is the code I am thinking of using to obtain the VIF values:

    . xtset id year

    . xtgee mobility revMM_new n_teams mgr_new1 teamexp_yr c.teamexp_yr#c.teamexp_yr i.teamid i.year if year < 2008 & hlevel >2 & hlevel < 9 & retire~=1 & expansion_mob~=1, family (binomial 1) link(logit) corr(ar1) vce(robust)

    . regress mobility revMM_new n_teams mgr_new1 teamexp_yr c.teamexp_yr#c.teamexp_yr i.teamid i.year if year < 2008 & hlevel >2 & hlevel < 9 & retire~=1 & expansion_mob~=1

    . estat vif


    (a) Is the above procedure correct?

    (b) Also, revMM_new is a non-binary control variable (it records revenue values). Assuming that revMM_new is likely to have a high VIF value, would it be OK to use mean-centered values of this variable in the model?

    Kindly let me know. Your help will be greatly appreciated.

    Thanks.

  • #2
    a) That is fine. Remember that multicollinearity is a property of the explanatory variables alone, so which model you fit to them is irrelevant.

    b) If by mean centering you mean removing the overall mean, then that will do exactly nothing, as it is just a linear transformation. If you mean removing the mean of each panel, then you change the meaning of the variable (OK, I think I am being mean, with all those means, but I think you know what I mean, right?). Whether that is a good or a bad thing is up to you, as long as you interpret the results correctly.
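    A minimal sketch of the two kinds of centering, using Stata's bundled auto data rather than the data in this thread. Here weight is only a stand-in for revMM_new and foreign is only a stand-in for the panel identifier; both choices are assumptions made for illustration.

    ```stata
    * Illustration only: auto.dta ships with Stata; weight stands in for revMM_new
    sysuse auto, clear

    * VIFs with the raw variable
    regress price mpg weight
    estat vif

    * Subtract the overall mean
    summarize weight, meanonly
    generate weight_c = weight - r(mean)

    * The VIFs are unchanged; only the constant of the regression differs
    regress price mpg weight_c
    estat vif

    * Subtract group means instead (foreign stands in for the panel id)
    bysort foreign: egen weight_gm = mean(weight)
    generate weight_w = weight - weight_gm

    * This is a different variable (deviation from the group mean),
    * so both its coefficient and its VIF can change
    regress price mpg weight_w
    estat vif
    ```

    The first two estat vif tables should match, while the third need not, because the group-centered variable no longer measures the same thing.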

    Remember that multicollinearity is not in itself a problem that needs to be fixed. Your model will still be correct with multicollinearity present. It just means that you don't have the statistical power you would hope to get based on the number of observations alone. However, that loss of power is an accurate representation of the information available to you in the data, so it is unfortunate (we always want more power), but not a problem.
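    The loss of power can be made concrete with the textbook identity for ordinary least squares with homoskedastic errors, which is the setting estat vif works in after regress. This is a standard result stated here for illustration, not something derived from the GEE model above:

    ```latex
    \operatorname{Var}(\hat\beta_j)
      = \frac{\sigma^2}{\sum_i (x_{ij} - \bar{x}_j)^2} \cdot \mathrm{VIF}_j ,
    \qquad
    \mathrm{VIF}_j = \frac{1}{1 - R_j^2}
    ```

    Here R_j^2 is the R-squared from regressing x_j on all the other explanatory variables. A VIF of 10 therefore corresponds to R_j^2 = 0.9, and a standard error about 3.2 times (the square root of 10) larger than it would be if x_j were uncorrelated with the other explanatory variables. Subtracting a constant from x_j changes neither R_j^2 nor the sum of squared deviations, which is why removing the overall mean leaves the VIF untouched.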
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Many thanks, Maarten. By mean centering, I did "mean" removing the overall mean.

      I wanted to mean center the control variable because its VIF value did turn out to be slightly greater than 10 (around 11.08) and I previously read that this problem can potentially be solved via mean centering. However, after reading other posts in this forum, it seems that I should not be too worried about high VIF values as long as the variable in question is a control in the model. The good news is, my main variables of interest have low VIF values.
