  • Multinomial logistic model with imputed data - Determine variance inflation factor (VIF) or other measure of collinearity

    Dear Statalisters,

    I have the following issue that I hope you can help me with.
    I'm using Stata 15.1.
    The model is a multinomial logistic model with multiply imputed data.
    The data cover 398 regions over 15 years, thus 5,970 observations.

    I want to get the variance inflation factor (VIF)

    mi estimate: mlogit CLUBS LFPR EMPL_AQ EAST URBAN GDPCIN GDPDENSA, base(1)
    estat vif

    However, estat vif is not valid after mlogit, because the model is not linear.


    The second option I tried was the user-written collin package (an ado-file),
    which does not require regression results to determine the VIF (amongst other diagnostics):

    collin LFPR EMPL_AQ EAST URBAN GDPCIN GDPDENSA

    The problem I encountered here was that the number of observations increased to 68,058, which seems incorrect to me.
    collin also does not work in combination with mi estimate or with mi estimate, cmdok.

    Is it possible to compute the VIF for multiply imputed variables with the correct number of observations, or can I use the VIFs estimated from the inflated number of observations?

    Thanks in advance
    Last edited by Jantje Beton; 07 Jan 2018, 04:59.

  • #2
    Jantje:
    -estat vif- would not work after -mlogit- anyhow, as you can see from the following toy example:
    Code:
    . use http://www.stata-press.com/data/r15/sysdsn1.dta
    (Health insurance data)
    
    . mlogit insure age i.nonwhite
    
    Iteration 0:   log likelihood = -555.85446
    Iteration 1:   log likelihood =  -549.9403
    Iteration 2:   log likelihood =  -549.9329
    Iteration 3:   log likelihood =  -549.9329
    
    Multinomial logistic regression                 Number of obs     =        615
                                                    LR chi2(4)        =      11.84
                                                    Prob > chi2       =     0.0186
    Log likelihood =  -549.9329                     Pseudo R2         =     0.0107
    
    ------------------------------------------------------------------------------
          insure |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    Indemnity    |  (base outcome)
    -------------+----------------------------------------------------------------
    Prepaid      |
             age |  -.0091542   .0060024    -1.53   0.127    -.0209188    .0026103
      1.nonwhite |    .671617   .2164655     3.10   0.002     .2473523    1.095882
           _cons |   .2198639   .2803975     0.78   0.433     -.329705    .7694328
    -------------+----------------------------------------------------------------
    Uninsure     |
             age |  -.0040487   .0112954    -0.36   0.720    -.0261873    .0180899
      1.nonwhite |   .3804865   .4080687     0.93   0.351    -.4193135    1.180287
           _cons |  -1.757286   .5313508    -3.31   0.001    -2.798715   -.7158579
    ------------------------------------------------------------------------------
    
    . estat vif
    estat vif not valid
    r(321);
    In these instances, it's advisable to look at the confidence intervals and make a judgement call about their width.
    As an aside, please note that towering members of this forum have repeatedly warned that quasi-extreme multicollinearity is often oversold (see also chapter 23 in http://www.hup.harvard.edu/catalog.p...40&content=toc), mainly because there is nothing you can do about it other than change your model specification (if feasible).
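    If a VIF is nonetheless wanted, one possible workaround is sketched below; it is not a definitive recipe. It relies on the fact that the VIF depends only on the predictors, so an auxiliary -regress- with the same right-hand side can stand in for -mlogit-, and on -mi xeq- running a command on each imputed dataset separately, so each run uses a single dataset rather than the stacked imputations. The sketch assumes the data are already -mi set- and reuses the variable names from the original post.

    ```stata
    * Sketch only: the OLS coefficients are not of interest here;
    * -regress- is fitted solely so that -estat vif- becomes available.
    mi query                      // stores the number of imputations in r(M)
    local M = r(M)

    * Run the auxiliary regression and the VIFs once per imputation (m = 1..M).
    mi xeq 1/`M': regress CLUBS LFPR EMPL_AQ EAST URBAN GDPCIN GDPDENSA ; estat vif
    ```

    This yields one set of VIFs per imputation; the sets can then be compared to see whether they stay broadly stable across imputations.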
    Kind regards,
    Carlo
    (Stata 19.0)
