  • How to measure multicollinearity

    Hello,

I am trying to test for multicollinearity in multilevel longitudinal logistic regression models. I am using gllamm, a user-contributed command for fitting multilevel models.

    I am not having success using vif after my gllamm command.

    I have tried running my regression model and then trying various forms of the vif syntax, getting the following error messages:


    Code:
    . xi: gllamm Garden_Active_ i.Year LCommunity_Garden LMarket_Garden LPickups_ i.r_L_volunteer_3_max LUR_Curr_Yr_or_Prior_ , i(Garden_ID) family(binomial) link(logit) nip(10) adapt
    . vif
    not appropriate after regress, nocons;
    use option uncentered to get uncentered VIFs
    r(301);
    
    . vif, uncentered
    variable _cons not found
    r(111);
    
    . estat vif
    subcommand estat vif is unrecognized
    r(321);
    Is there a way that I can get VIF to work with gllamm? Or is there another method that I can use to test for multicollinearity?

    Thank you,
    alyssa

  • #2
-vif- can only be used after -regress-. But multicollinearity does not depend on the nature of the model; it depends only on the covariance matrix of the predictor variables. So just run -regress- with the same variables you are using in your multilevel model, and then run -vif-.
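
    In outline (a minimal sketch using the placeholder names y for the outcome and x1-x3 for the predictors; substitute the variables from your own model), that amounts to:

    Code:
    * refit the same specification with -regress-, purely to obtain the VIFs
    regress y x1 x2 x3
    vif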

    That said, if you search this Forum you will find many posts pointing out that testing for multicollinearity is usually a waste of time. If you can get your hands on Goldberger's A Course in Econometrics, there is an entertainingly written chapter there explaining in great detail why. The short version is that multicollinearity usually isn't a problem even when present, you don't need programs like -vif- to determine whether it is a problem, and when it is a problem, there's usually nothing you can do about it anyway.

    • #3
      Alyssa, there is an article Multicollinearity: What about correlated explanatory variables in a default model? that pulled together quotes on multicollinearity from a number of econometric textbooks. Several of them mentioned the Goldberger parody that Clyde references. I've quoted a few of them below.

      Textbooks where the quotes came from:
      Goldberger, Arthur S. A Course in Econometrics, Harvard University Press, 1991.
      Hansen, Bruce E. Econometrics, University of Wisconsin, January 15, 2015.
      Maddala, G. S. Introduction to Econometrics, third edition, John Wiley & Sons, 2005.
      Studenmund, A. H. Using Econometrics: A Practical Guide, Addison-Wesley, 1997.


      From Hansen, pages 105-107:
      “Some earlier textbooks overemphasized a concern about multicollinearity. A very amusing parody of these texts appeared in Chapter 23.3 of Goldberger’s A Course in Econometrics (1991), which is reprinted below.”
      [The parody itself is reprinted on page 107 of Hansen, but it was not included in the article's list of quotes.]
      From Maddala, page 267:
      “…Multicollinearity is one of the most misunderstood problems in multiple regression…there have been several measures for multicollinearity suggested in the literature (variance-inflation factors VIF, condition numbers, etc.). This chapter argues that all these are useless and misleading. They all depend on the correlation structure of the explanatory variables only…high inter-correlations among the explanatory variables are neither necessary nor sufficient to cause the multicollinearity problem. The best indicators of the problem are the t-ratios of the individual coefficients. This chapter also discusses the solutions offered for the multicollinearity problem, such as ridge regression, principal component regression, dropping of variables, and so on, and shows they are ad hoc and do not help. The only solutions are to get more data or to seek prior information.”
      From Studenmund, page 264:
      “The major consequences of multicollinearity are
      1. Estimates will remain unbiased…
      2. The variances and standard errors of the estimates will increase…
      3. The computed t-scores will fall…
      4. Estimates will become very sensitive to changes in specification…
      5. The overall fit of the equation and the estimation of non-multicollinear variables will be largely unaffected…”
      Finally, they quote from a Professor Robert Jarrow (but don't list the source of the quote):
      “The only concern with multicollinearity in a regression is that the standard errors of the independent variables in the set of correlated variables will be large, so that the independent variables may not appear to be significant, when, in fact, they are.”

      • #4
        Hello,

        Thank you both for your responses. I have a few follow-up questions for Clyde Schechter:
        1. I have read some of the posts on this forum saying that multicollinearity is rarely a problem, and that when it is a problem you can't do anything about it. But isn't it still important to know that you have it? Aren't you violating assumptions by having multicollinearity? I suspect I may have a multicollinearity problem because I am seeing model instability: there is a huge change in p-values for some variables depending on what other variables are in the model (and by huge I mean P=0.000 when these variables are by themselves, changing to P=0.98 with many other variables in the model). Are you aware of any other issues that may cause this model instability?
        2. As for measuring multicollinearity using simple regression: I have read an article on this which says to regress each variable of concern on the others. For example, if your variables of concern are a, b, and c, you run the following regressions to determine if there is collinearity: regress a b c, regress c a b, and regress b a c. However, most of my variables of concern are either binary or ordinal, so is this method still valid with non-continuous variables?
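
        (For concreteness, the procedure described in point 2, written out with the hypothetical variables a, b, and c, would look something like the sketch below; each regression's R-squared shows how well that variable is predicted by the other two.)

        Code:
        * regress each variable of concern on the others; a high R-squared flags collinearity
        regress a b c
        regress b a c
        regress c a b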

        Many thanks,
        Alyssa

        • #5
          Well, changes in results when additional variables are added to the model don't really have much to do with multicollinearity. They can happen as a result of adding a variable that confounds the original association--and the fact that they did can be a very important finding, one that should be reported as part of your results, not dealt with as a problem. Also, I wouldn't look at the p-value change as an indication of problems: do the coefficients change appreciably? The p-value change could be due to loss of sample size caused by missing values in the added variables. Another possible issue is the ratio of the sample size to the number of predictors getting too close to 1.
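
          One quick way to check the missing-values explanation (a minimal sketch with the hypothetical variable names y, x1, and x2, where x2 is the added variable, and shown with -regress- for brevity) is to re-fit the smaller model on the same observations the larger model used:

          Code:
          * fit the larger model and flag its estimation sample
          regress y x1 x2
          generate byte insample = e(sample)

          * re-fit the smaller model on exactly those observations
          regress y x1 if insample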

          You can tell you have a multicollinearity problem if the standard error(s) of the key variable(s) (i.e. the ones whose effects you are interested in, not the ones that are included to adjust for confounding) are too large. You don't say whether you have this situation or not. But let's say for the sake of discussion that you do. What can you do?

          Well, you can remove one or more of the variables involved in the multicollinearity from the model. But clearly you don't want to remove one of the key variables of interest, because then you cannot possibly attain your research goals. So you would remove only non-key variables. But why were they there in the first place? If your analyses are well designed, they were included because they are important confounders. So removing them just means providing an analysis that fails to adjust for an important confounder and is therefore invalid.

          What else might you do? You can get more data: a large enough data set will, indeed, solve the problem. (Goldberger even says that multicollinearity should really be called micronumerosity.) The problem is that in most of these situations the data set would have to be massively larger than what you have, and in practice it is rarely possible, and if possible, seldom feasible, to get that much more data.

          What else might you do? You can start over from scratch with a new design that will break the confounding, such as stratified or matched sampling. You might even be able to use some of your existing data for that, but you will undoubtedly need to supplement it with much new data. And the use of stratified or matched sampling makes data collection more onerous, because you will have to turn away otherwise eligible participants/firms/units of analysis who don't fit the proposed matching/stratification scheme and await the accrual of those who do. So this is typically feasible, but very difficult, and what it really amounts to is abandoning the project and starting a new one.

          If you really want to measure collinearity and you want to use vif, you don't have to do a bunch of regressions. One will suffice. Just run exactly what you did in -gllamm-, but with -regress-. That is,
          Code:
          regress Garden_Active_ i.Year LCommunity_Garden LMarket_Garden ///
              LPickups_ i.r_L_volunteer_3_max LUR_Curr_Yr_or_Prior_
          estat vif
          It is perfectly OK that some of these variables are categorical. However, in interpreting the vif results, you have to bear in mind that high levels of multicollinearity among the indicators of the levels of a polytomous variable are expected, normal, and not problematic.


          • #6
            Thank you Clyde and David for your responses!
