
  • Variance Inflation Factors (or other collinearity measures) with Survival Analysis

    Hi All,

    I am performing a survival analysis on a group of patients with a few measures derived from imaging to see if those measures predict outcomes.

    I am concerned that many of these measures may be collinear, so I do not want to include them in the same Cox proportional hazards model. I have previously used the variance inflation factor (VIF) as a rough guide to collinearity. Is it possible to compute the VIF, or an equivalent collinearity measure, for Cox PH regression?

    Would I be better off simply checking whether the variables are correlated using a scatter plot? Do you have any pointers on whether it is appropriate to include two potentially collinear variables in the same multivariable model?

    Cheers!

  • #2
    Rahul:
    extreme multicollinearity is dealt with automatically by Stata (perfectly collinear variables are omitted from the model).
    Quasi-extreme multicollinearity after -stcox- can be investigated via -estat vce, corr-:
    Code:
    . use "http://www.stata-press.com/data/r14/drugtr.dta", clear
    (Patient Survival in Drug Trial)
    
    . stcox i.drug age
    
             failure _d:  died
       analysis time _t:  studytime
    
    Iteration 0:   log likelihood = -99.911448
    Iteration 1:   log likelihood = -83.551879
    Iteration 2:   log likelihood = -83.324009
    Iteration 3:   log likelihood = -83.323546
    Refining estimates:
    Iteration 0:   log likelihood = -83.323546
    
    Cox regression -- Breslow method for ties
    
    No. of subjects =           48                  Number of obs    =          48
    No. of failures =           31
    Time at risk    =          744
                                                    LR chi2(2)       =       33.18
    Log likelihood  =   -83.323546                  Prob > chi2      =      0.0000
    
    ------------------------------------------------------------------------------
              _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          1.drug |   .1048772   .0477017    -4.96   0.000     .0430057    .2557622
             age |   1.120325   .0417711     3.05   0.002     1.041375     1.20526
    ------------------------------------------------------------------------------
    
    
    . estat vce, corr
    
    Correlation matrix of coefficients of cox model
    
                 |        1.         
            e(V) |     drug       age
    -------------+--------------------
          1.drug |   1.0000          
             age |  -0.2281    1.0000
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)
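
    For the VIF part of the original question, one workaround that is sometimes suggested can be sketched as follows. It rests on the assumption that VIFs depend only on the covariates and not on the outcome, so a linear regression on the same covariates is fitted purely to obtain them. The choice of studytime as a placeholder outcome is arbitrary, and the result is only a rough guide for the Cox model, whose estimates are built from risk sets rather than from all observations at once.

    ```stata
    * Load the same example data used in the post above
    use "http://www.stata-press.com/data/r14/drugtr.dta", clear

    * Pairwise correlation between the covariates themselves
    * (not between their estimated coefficients)
    pwcorr drug age, sig

    * Fit a linear regression on the same covariates only to obtain VIFs.
    * The outcome (studytime) is a placeholder: its coefficients are ignored.
    regress studytime drug age
    estat vif
    ```

    With only two covariates both VIFs are identical, so the approach is more informative once several imaging measures enter the model together.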



    • #3
      Hi Carlo,

      Thanks for the informative answer. In interpreting the correlation coefficients for the variables in that model, does it come down to a personal threshold for identifying variables as collinear? E.g., should I set a threshold of 0.80, above which I become concerned about collinearity?

      Alternatively, is there a consensus for when collinearity is likely to be a problem?



      • #4
        Rahul:
        not to my knowledge, at least.
        Personally, 0.75 triggers a warning chime.
        Kind regards,
        Carlo
        (Stata 19.0)
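
        To put the thresholds mentioned in this thread (0.75 and 0.80) on the VIF scale, the two-covariate case can be worked by hand. This is a sketch that assumes a linear model with exactly two covariates whose correlation is r. In that case the correlation between the two coefficient estimates equals -r, whereas for a Cox model the correspondence between -estat vce, corr- and the covariate correlation is only approximate.

        ```latex
        \mathrm{VIF} = \frac{1}{1 - r^{2}}
        \qquad
        r = 0.75 \;\Rightarrow\; \mathrm{VIF} = \frac{1}{1 - 0.5625} \approx 2.29
        \qquad
        r = 0.80 \;\Rightarrow\; \mathrm{VIF} = \frac{1}{1 - 0.64} \approx 2.78
        ```

        On this scale both thresholds sit well below the VIF of 10 that is often quoted as a rule of thumb, which corresponds to |r| of roughly 0.95 with two covariates.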



        • #5
          I don't think there is any simple answer to that question. If you drop variables, you run the risk of omitted variable bias. If the sample is small, multicollinearity may cause more problems than when it is large. See the following, esp. p. 4, for a discussion of multicollinearity and what to do about it.

          http://www3.nd.edu/~rwilliam/stats2/l11.pdf
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          EMAIL: [email protected]
          WWW: https://www3.nd.edu/~rwilliam
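
          How sample size interacts with collinearity can be seen from the textbook variance formula for a linear regression coefficient. This is a sketch for ordinary least squares, not a result specific to Cox regression. The symbols are introduced here for illustration: \sigma^2 is the error variance, n the sample size, s_j^2 the sample variance of covariate j, and R_j^2 the R-squared from regressing covariate j on the other covariates.

          ```latex
          \operatorname{Var}(\hat{\beta}_j)
            = \frac{\sigma^{2}}{(n - 1)\, s_j^{2}} \cdot \frac{1}{1 - R_j^{2}}
          ```

          The second factor is the VIF for covariate j. The first factor shrinks as n grows, so a given VIF inflates the variance from a smaller baseline in a larger sample.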

