  • stcox Enormous Hazard Ratios (& SE)?

    Dear Statalist community,

    I would appreciate feedback on why I am getting such a large hazard ratio (and corresponding robust standard error) for the variable "V2" (significant at the 0.1 level) in the output below (subject-year observations starting in 2000 and ending in 2014):


    Code:
    stcox V1 V2 V3 V4 V5 V6 V7 V8 V9, cluster(stateid)
             failure _d:  CU_intro_censor
       analysis time _t:  CU_intro_durat
                     id:  stateid
    
    Iteration 0:   log pseudolikelihood = -38.498425
    Cox regression -- Breslow method for ties
    No. of subjects      =           32                Number of obs   =       409
    No. of failures      =           14
    Time at risk         =          409
                                                       Wald chi2(9)    =     57.10
    Log pseudolikelihood =   -38.498425                Prob > chi2     =    0.0000
                                   (Std. Err. adjusted for 32 clusters in stateid)
    ------------------------------------------------------------------------------
                 |               Robust
              _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              V1 |    1.31167   .1280843     2.78   0.005      1.08319    1.588342
              V2 |   29249.42   164285.9     1.83   0.067     .4843432    1.77e+09
              V3 |   1.266819   .2825774     1.06   0.289     .8181731    1.961481
              V4 |   2.542591   1.442277     1.65   0.100     .8364406    7.728905
              V5 |   3.529885   3.106127     1.43   0.152     .6291362    19.80507
              V6 |   .0101404   .0452379    -1.03   0.303     1.62e-06     63.5887
              V7 |   1.010953   .0321171     0.34   0.732     .9499248    1.075903
              V8 |   1.000334   .0001594     2.10   0.036     1.000022    1.000647
              V9 |   1.053155   .0487628     1.12   0.263     .9617902    1.153199
    ------------------------------------------------------------------------------

    Previously, in #2 of http://www.statalist.org/forums/foru...=1463052080277, it was suggested that after stsetting one run:
    Code:
    sts graph, strata(V2) risktable
    dotplot _t if _d, over(V2)
    But in Stata 13 when I type that, I get the following error:
    Code:
     strata() requires adjustfor(); perhaps you mean by()
    r(198);
    In http://www.stata.com/statalist/archi.../msg01087.html it is suggested that the ratio of the number of failures to the number of predictors should be no more than 5:1; in my case it is 14:9. However, I have seen worse ratios in previous (social science) studies (models with more covariates and fewer failures).

    By the way, neither the global nor the covariate-specific proportional-hazards test is violated:

    Code:
    estat phtest, detail
    
          Test of proportional-hazards assumption
    
          Time:  Time
          ----------------------------------------------------------------
                      |       rho            chi2       df       Prob>chi2
          ------------+---------------------------------------------------
          V1          |     -0.07008         0.09        1         0.7685
          V2          |      0.04479         0.03        1         0.8714
          V3          |      0.10258         0.12        1         0.7336
          V4          |     -0.17487         0.35        1         0.5554
          V5          |     -0.03148         0.03        1         0.8608
          V6          |      0.00600         0.00        1         0.9779
          V7          |      0.01503         0.01        1         0.9382
          V8          |      0.13468         0.24        1         0.6227
          V9          |      0.04130         0.05        1         0.8272
          ------------+---------------------------------------------------
          global test |                      1.02        9         0.9994
    Only V4 & V5 are dummies; the rest are continuous variables.
    V2 was linearly interpolated (I only had data for three points: the 2000 and 2010 censuses and the 2005 intercensal estimate). V7 & V9 were interpolated similarly.

    Question also available here http://stackoverflow.com/questions/3...zard-ratios-se
    Last edited by Victor Cruz; 12 May 2016, 07:13.
    Victor Cruz

  • #2
    I apologize for omitting the adjustfor() option: it is required when sts graph includes the strata() option.
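
    For example, either of the following forms should run (V2 here stands in for whatever grouping variable you want to plot, and the covariate passed to adjustfor() is only a placeholder):

    Code:
    * unadjusted Kaplan-Meier curves by group, with an at-risk table
    sts graph, by(V2) risktable
    * adjusted survivor curves: strata() must be combined with adjustfor()
    sts graph, strata(V2) adjustfor(V1)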

    However, you have misread the statement in my previous post: it was that the ratio of failures to parameters should be at least 5:1, not no more than 5:1. The simulations that suggested that guideline were for an analysis of one primary predictor of interest, with the others being confounders whose coefficients are unimportant. In other situations the ratio should be at least 10:1.
    Last edited by Steve Samuels; 12 May 2016, 10:23.
    Steve Samuels
    Statistical Consulting
    [email protected]

    Stata 14.2

    • #3
      My educated guess is that the large ostensible magnitude of your hazard ratio for V2 reflects the relative magnitude of V2. You need to understand that the hazard ratio for V2 is the ratio of the hazard when V2 is increased by 1 to the hazard at its current value. A little math shows that the hazard ratio for a variable is just e raised to the power of its coefficient estimate.

      Run the following example and review the output. You will see that dividing the load by 10 multiplies the coefficient estimate by 10, which produces an enormous increase in the hazard ratio. What this tells us about your case is that V2 is probably small enough that an increase of 1 in its value is unrealistic, and this yields an unrealistic hazard ratio.

      Code:
      clear
      webuse kva                  // example dataset with failtime, load, and bearings
      generate load10 = load/10   // rescale: one unit of load10 = ten units of load
      stset failtime
      stcox load bearings         // hazard ratios on the original load scale
      stcox, nohr                 // redisplay as coefficients: HR = exp(coefficient)
      stcox load10 bearings       // coefficient on load10 is 10 times that on load
      stcox, nohr
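      To see the exp(coefficient) relationship directly, here is a quick check you can run after the last model above (load10 is the rescaled variable generated in the example):

      Code:
      display exp(_b[load10])      // reproduces the hazard ratio stcox reports for load10
      display exp(_b[load10]/10)   // recovers the per-unit hazard ratio for the original load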
      Added after further reflection: it's not so much the magnitude of V2 as its range: if a difference of 1 between two values of V2 is unrealistically large, then the hazard ratio will be unrealistically large. So if the values of V2 were all between 9.99 and 10.01, it seems likely that its hazard ratio as reported by stcox would be very large.
      Last edited by William Lisowski; 12 May 2016, 12:09.

      • #4
        The first post you linked to said
        The number of failures per variable should be at least 5,
        However, in the thread in which the second post appeared, I did originally say "no more than", but at the top of that post was this correction:

        "but I would say that the ratio of the number of failures to the number of predictors should be no more than 5:1"

        That should be "no less than 5:1"
        Last edited by Steve Samuels; 12 May 2016, 13:47.
        Steve Samuels
        Statistical Consulting
        [email protected]

        Stata 14.2

        • #5
          Originally posted by Steve Samuels View Post
          I apologize for omitting the adjustfor() option: it is required when sts graph includes the strata() option.

          However, you have misread the statement in my previous post: it was that the ratio of failures to parameters should be at least 5:1, not no more than 5:1. The simulations that suggested that guideline were for an analysis of one primary predictor of interest, with the others being confounders whose coefficients are unimportant. In other situations the ratio should be at least 10:1.
          Thanks, Steve Samuels.

          So what would be the commands to produce the graph that the user you responded to uploaded in that post? What should one focus on/look for in such a graph?
          Victor Cruz

          • #6
            Originally posted by William Lisowski View Post
            My educated guess is that the large ostensible magnitude of your hazard ratio for V2 reflects the relative magnitude of V2. You need to understand that the hazard ratio for V2 is the ratio of the hazard when V2 is increased by 1 to the hazard at its current value. A little math shows that the hazard ratio for a variable is just e raised to the power of its coefficient estimate.

            Run the following example and review the output. You will see that dividing the load by 10 multiplies the coefficient estimate by 10, which produces an enormous increase in the hazard ratio. What this tells us about your case is that V2 is probably small enough that an increase of 1 in its value is unrealistic, and this yields an unrealistic hazard ratio.

            Code:
            clear
            webuse kva                  // example dataset with failtime, load, and bearings
            generate load10 = load/10   // rescale: one unit of load10 = ten units of load
            stset failtime
            stcox load bearings         // hazard ratios on the original load scale
            stcox, nohr                 // redisplay as coefficients: HR = exp(coefficient)
            stcox load10 bearings       // coefficient on load10 is 10 times that on load
            stcox, nohr
            Added after further reflection: it's not so much the magnitude of V2 as its range: if a difference of 1 between two values of V2 is unrealistically large, then the hazard ratio will be unrealistically large. So if the values of V2 were all between 9.99 and 10.01, it seems likely that its hazard ratio as reported by stcox would be very large.
            Thanks William Lisowski.

            I contacted my co-author about this, and she pointed out that she had made a mistake in the description of V2 that she shared with me: she didn't mention that V2 was expressed in decimals, e.g., 0.XX instead of XX.XX (percentage format), whereas the rest of the variables were in percentage format. Once I corrected this (multiplying V2 by 100), the hazard ratio produced made sense.
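
            In Stata the fix amounts to something along these lines (illustrative only; V2_pct is a new name, generated rather than overwriting the original variable):

            Code:
            generate V2_pct = V2*100   // express V2 in percentage points, like the other covariates
            stcox V1 V2_pct V3 V4 V5 V6 V7 V8 V9, cluster(stateid)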
            Victor Cruz

            • #7
              You are asking for a command that you have already quoted (and which was given in the linked thread).
              Code:
               dotplot _t if _d, over(V2)
              Look for "interesting" patterns, if any. These will differ from problem to problem. In that case, two things stood out: 1) the small number of distinct event times; 2) for the poster, a surprise as to the treatment group in which the earliest failures occurred. dotplot produces side-by-side histograms, so you can compare not only the general shape of the plotted distributions, but also the location of extreme points, modes, and gaps.
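
              Since your V2 is continuous, over() needs a categorical grouping variable; one way to get one is to cut V2 into a few groups first (the quartile grouping below is purely illustrative):

              Code:
              xtile V2_grp = V2, nq(4)          // quartile groups of V2 (illustrative choice)
              dotplot _t if _d, over(V2_grp)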

              It's good that you were able to correct the erroneously scaled variable. That the hazard ratios now "make sense" doesn't change the fact that they are unreliable due to severe overfitting.
              Last edited by Steve Samuels; 17 May 2016, 04:56.
              Steve Samuels
              Statistical Consulting
              [email protected]

              Stata 14.2

              • #8
                Thanks, Steve Samuels!
                Victor Cruz
