  • stcox Enormous Hazard Ratios (& SE)?

    Dear Statalist community,

    I would appreciate feedback on why I am getting such a large hazard ratio (and corresponding robust standard error) for the variable "V2" (significant at the 0.1 level) in the output below (subject-year observations starting in 2000 and ending in 2014):


    Code:
    stcox V1 V2 V3 V4 V5 V6 V7 V8 V9, cluster(stateid)
             failure _d:  CU_intro_censor
       analysis time _t:  CU_intro_durat
                     id:  stateid
    
    Iteration 0:   log pseudolikelihood = -38.498425
    Cox regression -- Breslow method for ties
    No. of subjects      =           32                Number of obs   =       409
    No. of failures      =           14
    Time at risk         =          409
                                                       Wald chi2(9)    =     57.10
    Log pseudolikelihood =   -38.498425                Prob > chi2     =    0.0000
                                   (Std. Err. adjusted for 32 clusters in stateid)
    ------------------------------------------------------------------------------
                 |               Robust
              _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              V1 |    1.31167   .1280843     2.78   0.005      1.08319    1.588342
              V2 |   29249.42   164285.9     1.83   0.067     .4843432    1.77e+09
              V3 |   1.266819   .2825774     1.06   0.289     .8181731    1.961481
              V4 |   2.542591   1.442277     1.65   0.100     .8364406    7.728905
              V5 |   3.529885   3.106127     1.43   0.152     .6291362    19.80507
              V6 |   .0101404   .0452379    -1.03   0.303     1.62e-06     63.5887
              V7 |   1.010953   .0321171     0.34   0.732     .9499248    1.075903
              V8 |   1.000334   .0001594     2.10   0.036     1.000022    1.000647
              V9 |   1.053155   .0487628     1.12   0.263     .9617902    1.153199
    ------------------------------------------------------------------------------

    Previously, in #2 of http://www.statalist.org/forums/foru...=1463052080277, it was suggested that after stsetting one run:
    Code:
    sts graph, strata(V2) risktable
    dotplot _t if _d, over(V2)
    But in Stata 13 when I type that, I get the following error:
    Code:
     strata() requires adjustfor(); perhaps you mean by()
    r(198);
    In http://www.stata.com/statalist/archi.../msg01087.html it is suggested that the ratio of the number of failures to the number of predictors should be no more than 5:1; in my case it is 14:9. However, I have seen worse ratios in previous (social science) studies (models with more covariates and fewer failures).

    By the way, neither the global nor the covariate-specific proportional-hazards test is violated:

    Code:
    estat phtest, detail
    
          Test of proportional-hazards assumption
    
          Time:  Time
          ----------------------------------------------------------------
                      |       rho            chi2       df       Prob>chi2
          ------------+---------------------------------------------------
          V1          |     -0.07008         0.09        1         0.7685
          V2          |      0.04479         0.03        1         0.8714
          V3          |      0.10258         0.12        1         0.7336
          V4          |     -0.17487         0.35        1         0.5554
          V5          |     -0.03148         0.03        1         0.8608
          V6          |      0.00600         0.00        1         0.9779
          V7          |      0.01503         0.01        1         0.9382
          V8          |      0.13468         0.24        1         0.6227
          V9          |      0.04130         0.05        1         0.8272
          ------------+---------------------------------------------------
          global test |                      1.02        9         0.9994
    Only V4 & V5 are dummies; the rest are continuous variables.
    V2 was linearly interpolated (I only had data for three points: the 2000 and 2010 censuses and the 2005 intercensal estimate). V7 & V9 were interpolated similarly.

    Question also available here http://stackoverflow.com/questions/3...zard-ratios-se
    Last edited by Victor Cruz; 12 May 2016, 07:13.
    Victor Cruz

  • #2
    I apologize for omitting the adjustfor() option: it is required when sts graph includes the strata() option.
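
    For example, either of the following forms should run (V2 here stands in for whatever grouping variable you want to plot, and the covariate passed to adjustfor() is only a placeholder):

    Code:
    * unadjusted Kaplan-Meier curves by group, with an at-risk table
    sts graph, by(V2) risktable
    * adjusted survivor curves: strata() must be combined with adjustfor()
    sts graph, strata(V2) adjustfor(V1)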

    However, you have misread the statement in my previous post: it was that the ratio of failures to parameters should be at least 5:1, not no more than 5:1. The simulations that suggested that guideline were for an analysis of one primary predictor of interest, with the others being confounders whose coefficients are unimportant. In other situations the ratio should be at least 10:1.
    Last edited by Steve Samuels; 12 May 2016, 10:23.
    Steve Samuels
    Statistical Consulting
    [email protected]

    Stata 14.2

    • #3
      My educated guess is that the large ostensible magnitude of your hazard ratio for V2 reflects the relative magnitude of V2. You need to understand that the hazard ratio for V2 is the ratio of the hazard when V2 is increased by 1 to the hazard at its current value. A little math shows that the hazard ratio for a variable is just e raised to the power of its coefficient estimate.

      Run the following example and review the output. You will see that dividing the load by 10 multiplies the coefficient estimate by 10, which produces an enormous increase in the hazard ratio. What this tells us about your case is that V2 is probably small enough that an increase of 1 in its value is unrealistic, and this yields an unrealistic hazard ratio.

      Code:
      clear
      webuse kva                  // example dataset with failtime, load, and bearings
      generate load10 = load/10   // rescale: one unit of load10 = ten units of load
      stset failtime
      stcox load bearings         // hazard ratios on the original load scale
      stcox, nohr                 // redisplay as coefficients: HR = exp(coefficient)
      stcox load10 bearings       // coefficient on load10 is 10 times that on load
      stcox, nohr
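      To see the exp(coefficient) relationship directly, here is a quick check you can run after the last model above (load10 is the rescaled variable generated in the example):

      Code:
      display exp(_b[load10])      // reproduces the hazard ratio stcox reports for load10
      display exp(_b[load10]/10)   // recovers the per-unit hazard ratio for the original load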
      Added after further reflection: it's not so much the magnitude of V2 as its range: if a difference of 1 between two values of V2 is unrealistically large, then the hazard ratio will be unrealistically large. So if the values of V2 were all between 9.99 and 10.01, it seems likely that its hazard ratio as reported by stcox would be very large.
      Last edited by William Lisowski; 12 May 2016, 12:09.

      • #4
        The first post you linked to said
        The number of failures per variable should be at least 5,
        However, in the thread in which the second post appeared, I did originally say "no more than", but at the top of that post was this correction:

        "but I would say that the ratio of the number of failures to the number of predictors should be no more than 5:1"

        That should be "no less than 5:1"
        Last edited by Steve Samuels; 12 May 2016, 13:47.
        Steve Samuels
        Statistical Consulting
        [email protected]

        Stata 14.2

        • #5
          Originally posted by Steve Samuels View Post
          I apologize for omitting the adjustfor() option: it is required when sts graph includes the strata() option.

          However, you have misread the statement in my previous post: it was that the ratio of failures to parameters should be at least 5:1, not no more than 5:1. The simulations that suggested that guideline were for an analysis of one primary predictor of interest, with the others being confounders whose coefficients are unimportant. In other situations the ratio should be at least 10:1.
          Thanks, Steve Samuels.

          So what would be the commands to produce the graph that the user you responded to uploaded in that post? What should one focus on/look for in such a graph?
          Victor Cruz

          • #6
            Originally posted by William Lisowski View Post
            My educated guess is that the large ostensible magnitude of your hazard ratio for V2 reflects the relative magnitude of V2. You need to understand that the hazard ratio for V2 is the ratio of the hazard when V2 is increased by 1 to the hazard at its current value. A little math shows that the hazard ratio for a variable is just e raised to the power of its coefficient estimate.

            Run the following example and review the output. You will see that dividing the load by 10 multiplies the coefficient estimate by 10, which produces an enormous increase in the hazard ratio. What this tells us about your case is that V2 is probably small enough that an increase of 1 in its value is unrealistic, and this yields an unrealistic hazard ratio.

            Code:
            clear
            webuse kva                  // example dataset with failtime, load, and bearings
            generate load10 = load/10   // rescale: one unit of load10 = ten units of load
            stset failtime
            stcox load bearings         // hazard ratios on the original load scale
            stcox, nohr                 // redisplay as coefficients: HR = exp(coefficient)
            stcox load10 bearings       // coefficient on load10 is 10 times that on load
            stcox, nohr
            Added after further reflection: it's not so much the magnitude of V2 as its range: if a difference of 1 between two values of V2 is unrealistically large, then the hazard ratio will be unrealistically large. So if the values of V2 were all between 9.99 and 10.01, it seems likely that its hazard ratio as reported by stcox would be very large.
            Thanks William Lisowski.

            I contacted my co-author about this, and she pointed out that she had made a mistake in the description of V2 that she shared with me: she didn't mention that V2 was expressed in decimals, e.g., 0.XX instead of XX.XX (percentage format), whereas the rest of the variables were in percentage format. Once I corrected this (multiplying V2 by 100), the hazard ratio produced made sense.
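
            In Stata the fix amounts to something along these lines (illustrative only; V2_pct is a new name, generated rather than overwriting the original variable):

            Code:
            generate V2_pct = V2*100   // express V2 in percentage points, like the other covariates
            stcox V1 V2_pct V3 V4 V5 V6 V7 V8 V9, cluster(stateid)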
            Victor Cruz

            • #7
              You are asking for a command that you have already quoted (and which was given in the linked thread).
              Code:
               dotplot _t if _d, over(V2)
              Look for "interesting" patterns, if any. These will differ from problem to problem. In that case, two things stood out: 1) the small number of distinct event times; 2) for the poster, a surprise as to the treatment group in which the earliest failures occurred. dotplot produces side-by-side histograms, so you can compare not only the general shape of the plotted distributions, but also the location of extreme points, modes, and gaps.
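
              Since your V2 is continuous, over() needs a categorical grouping variable; one way to get one is to cut V2 into a few groups first (the quartile grouping below is purely illustrative):

              Code:
              xtile V2_grp = V2, nq(4)          // quartile groups of V2 (illustrative choice)
              dotplot _t if _d, over(V2_grp)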

              It's good that you were able to correct the erroneously scaled variable. That the hazard ratios now "make sense" doesn't change the fact that they are unreliable due to severe overfitting.
              Last edited by Steve Samuels; 17 May 2016, 04:56.
              Steve Samuels
              Statistical Consulting
              [email protected]

              Stata 14.2

              • #8
                Thanks, Steve Samuels!
                Victor Cruz
