Cox PH regression: outcome with HR too small or too large.

eric some



21 Sep 2015, 07:59

I ran a Cox model with time-varying covariates (tvc()) in Stata 13. The outcome variable is "EBFPLUS" (exclusive and predominant breastfeeding) and the main exposure variable is "rand" (trial arm). Some of the hazard ratios are extremely large or extremely small. Is that normal? If not, what does it imply for my model?
Please see command and output below.
Thanks for your help!
Best Regards
Eric

Code:

use NCOX1_W22, clear

. drop if EBFPLUS_exit>=.
(90 observations deleted)

. keep if country==3
(5628 observations deleted)

. stset EBFPLUS_exit, failure(EBFPLUS_dth) origin(date_csdi) enter(date_csdi) scale(7)

     failure event:  EBFPLUS_dth != 0 & EBFPLUS_dth < .
obs. time interval:  (origin, EBFPLUS_exit]
 enter on or after:  time date_csdi
 exit on or before:  failure
    t for analysis:  (time-origin)/7
            origin:  time date_csdi

------------------------------------------------------------------------------
     1632  total observations
        0  exclusions
------------------------------------------------------------------------------
     1632  observations remaining, representing
       96  failures in single-record/single-failure data
 30039.43  total analysis time at risk and under observation
                                              at risk from t =         0
                                   earliest observed entry t =         0
                                        last observed exit t =  22.71429

. 
.  stcox i.rand i.tertile i.agegroup i.educ i.marital i.occup i.parity  i.deliv i.bfinitime,  ///
>  tvc(rand tertile agegroup educ marital occup parity deliv bfinitime)  texp( ln(_t) )

         failure _d:  EBFPLUS_dth
   analysis time _t:  (EBFPLUS_exit-origin)/7
             origin:  time date_csdi
  enter on or after:  time date_csdi

Iteration 0:   log likelihood = -657.09909
Iteration 1:   log likelihood = -647.11305
Iteration 2:   log likelihood = -618.76367
Iteration 3:   log likelihood = -587.38412
Iteration 4:   log likelihood =  -583.3244
Iteration 5:   log likelihood = -559.65771
Iteration 6:   log likelihood = -557.18193
Iteration 7:   log likelihood = -556.15539
Iteration 8:   log likelihood = -555.82087
Iteration 9:   log likelihood = -555.77453
Iteration 10:  log likelihood = -555.77145
Iteration 11:  log likelihood = -555.77108
Iteration 12:  log likelihood = -555.77095
Iteration 13:  log likelihood =  -555.7709
Iteration 14:  log likelihood = -555.77088
Iteration 15:  log likelihood = -555.77087
Iteration 16:  log likelihood = -555.77087
Iteration 17:  log likelihood = -555.77087
Iteration 18:  log likelihood = -555.77087
Iteration 19:  log likelihood = -555.77087
Iteration 20:  log likelihood = -555.77087
Iteration 21:  log likelihood = -555.77087
Iteration 22:  log likelihood = -555.77087
Iteration 23:  log likelihood = -555.77087
Iteration 24:  log likelihood = -555.77087
Iteration 25:  log likelihood = -555.77087
Iteration 26:  log likelihood = -555.77087
Iteration 27:  log likelihood = -555.77087
Iteration 28:  log likelihood = -555.77087
Iteration 29:  log likelihood = -555.77087
Iteration 30:  log likelihood = -555.77087
Iteration 31:  log likelihood = -555.77087
Iteration 32:  log likelihood = -555.77087
Refining estimates:
Iteration 0:   log likelihood = -555.77087
Iteration 1:   log likelihood = -555.77087

Cox regression -- Breslow method for ties

No. of subjects =         1632                     Number of obs   =      1632
No. of failures =           96
Time at risk    =  30039.42857
                                                   LR chi2(22)     =    202.66
Log likelihood  =   -555.77087                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
main         |
      1.rand |   .0000863   .0002745    -2.94   0.003     1.70e-07    .0439029
             |
     tertile |
          2  |    10935.6    30435.3     3.34   0.001      46.7552     2557734
          3  |   1.96e+09   1.05e+10     3.98   0.000     52146.87    7.37e+13
             |
    agegroup |
          3  |   25992.19   75550.56     3.50   0.000     87.23405     7744612
          5  |   8.15e+07   4.55e+08     3.26   0.001     1443.403    4.60e+12
             |


Steve Samuels

#2

21 Sep 2015, 16:16

I have two thoughts.

1. The very large or very small HRs arise because subjects in some groups tended to fail at much earlier times than those in other groups. You can examine this after running your stset statement, e.g. for the rand variable:

Code:

sts graph, by(rand) risktable
dotplot _t if _d, over(rand)

2. You have fit too many covariates, and you haven't even considered interactions or non-proportionality yet. The number of failures per variable should be at least 5, and more if some categories are very small (<10% of observations).

Reference: Vittinghoff, E., and C. E. McCulloch. 2007. Relaxing the rule of ten events per variable in logistic and Cox regression. American Journal of Epidemiology 165(6): 710-718.
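To make the events-per-variable check concrete, here is a small Python sketch (Python only so it runs anywhere). It divides the number of failures by the number of estimated coefficients, using the figures printed in post #1: 96 failures, and an LR chi2 with 22 degrees of freedom, which we take as the coefficient count. The function names and the category counts in the last line are hypothetical.

```python
def events_per_variable(n_failures, n_parameters):
    """Failures per estimated coefficient (the EPV ratio)."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    return n_failures / n_parameters


def max_parameters(n_failures, min_epv):
    """Largest number of coefficients that keeps EPV at or above min_epv."""
    return int(n_failures // min_epv)


def small_categories(counts, threshold=0.10):
    """Labels of categories holding less than `threshold` of all observations."""
    total = sum(counts.values())
    return [label for label, n in counts.items() if n / total < threshold]


# Figures taken from the output in post #1: 96 failures, LR chi2 with 22 df.
failures, parameters = 96, 22
print(f"EPV = {events_per_variable(failures, parameters):.2f}")
print("max coefficients for EPV >= 5: ", max_parameters(failures, 5))
print("max coefficients for EPV >= 10:", max_parameters(failures, 10))

# Hypothetical category counts (NOT from the thread) for a 3-level covariate.
print(small_categories({"level1": 900, "level2": 600, "level3": 132}))
```

With 96 failures and 22 coefficients the ratio is about 4.4, below even the relaxed threshold of 5, and that is before any interaction terms are added.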

Last edited by Steve Samuels; 21 Sep 2015, 16:25.

Steve Samuels
Statistical Consulting

Stata 14.2
eric some

#3

22 Sep 2015, 00:54

Dear Steven, thanks so much!
Attached is the plot I got after running

Code:

dotplot _t if _d, over(rand)

I am not sure about my interpretation. Do you think your first hypothesis still holds?
What if I decide to drop some covariates to simplify my model? Is there a strategy for selecting the variables to drop, or do I have to decide only according to what seems important to me and to the literature?

Attached Files

Graph.gph (6.7 KB)
Steve Samuels

#4

22 Sep 2015, 14:27

The only early failures are for rand=0, so that is consistent with the hypothesis. Can you explain that pattern?
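Why no early failures in one arm can produce a hazard ratio near 0: in the extreme case where, over the period a coefficient describes, only one group fails, the Cox partial likelihood has no finite maximum. The Python sketch below computes the Breslow partial log-likelihood for one binary covariate on a hypothetical early-period dataset (only x = 0 fails before week 3; everyone else is censored at week 3). It is a simplified illustration, not a reproduction of the tvc() model in post #1.

```python
import math


def breslow_loglik(beta, times, events, x):
    """Cox partial log-likelihood (Breslow ties) for a single covariate x."""
    loglik = 0.0
    for tj in sorted({t for t, d in zip(times, events) if d}):
        # covariate values of subjects failing at tj
        failed_x = [xi for t, d, xi in zip(times, events, x) if d and t == tj]
        # sum of relative risks over everyone still at risk at tj
        risk_sum = sum(math.exp(beta * xi) for t, xi in zip(times, x) if t >= tj)
        loglik += beta * sum(failed_x) - len(failed_x) * math.log(risk_sum)
    return loglik


# Hypothetical early-period data: only the x = 0 group fails before week 3;
# everyone else is censored at week 3.
times = [1, 2, 3, 3, 3, 3]
events = [1, 1, 0, 0, 0, 0]
x = [0, 0, 0, 1, 1, 1]

for beta in [0, -2, -5, -10, -20]:
    ll = breslow_loglik(beta, times, events, x)
    print(f"beta={beta:>4}  HR={math.exp(beta):.2e}  loglik={ll:.6f}")
```

The log likelihood keeps rising as beta goes toward minus infinity and only levels off near -ln 6, so the estimated HR drifts toward 0 instead of settling at a finite value. A long run of iterations with an essentially unchanged log likelihood, as in post #1, can be a symptom of this kind of flat, unbounded likelihood.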

However, there is another issue: it looks like there are only eight distinct failure times. Exactly how many are there? What accounts for this phenomenon? It is very troublesome.
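Counting distinct failure times and ties is quick. In Stata, `tabulate _t if _d` after stset lists the number of failures at each analysis time. For readers without Stata, an equivalent Python sketch on hypothetical exit times (in weeks) clustered on a few visit weeks:

```python
from collections import Counter


def failures_per_time(times, events):
    """Number of failures at each distinct failure time, in time order."""
    counts = Counter(t for t, d in zip(times, events) if d)
    return dict(sorted(counts.items()))


# Hypothetical exit times in weeks, clustered on a few visit weeks.
times = [1, 1, 2, 2, 2, 6, 6, 10, 10, 14]
events = [1, 1, 1, 1, 1, 1, 0, 1, 1, 0]
table = failures_per_time(times, events)
n_fail = sum(table.values())
n_tied = sum(n for n in table.values() if n > 1)
print("failures per time:", table)
print(f"{len(table)} distinct failure times; {n_tied} of {n_fail} failures are tied")
```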

Such heavily tied data invalidate the standard errors and p-values reported by stcox. You should try the exactm option in stcox.
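What the tie handling changes, for a single set of tied failures: as usually described, the exact marginal likelihood treats the tied subjects as having failed in some unknown order and sums the probability of every possible order. The Breslow approximation (stcox's default) instead keeps the whole risk set in every denominator. The Python sketch below follows that textbook description; it does not reproduce Stata's internal code, and the relative risks exp(x'b) are hypothetical.

```python
import math
from itertools import permutations


def breslow_term(tied, others):
    """Breslow likelihood term for one tied set: prod(r_i) / S**d,
    where S is the total relative risk of the whole risk set."""
    s = sum(tied) + sum(others)
    return math.prod(tied) / s ** len(tied)


def exact_marginal_term(tied, others):
    """Probability that all tied subjects fail before anyone else at risk,
    summed over every possible order in which the tied subjects failed."""
    s = sum(tied) + sum(others)
    total = 0.0
    for order in permutations(tied):
        remaining, p = s, 1.0
        for r in order:
            p *= r / remaining   # this subject fails next
            remaining -= r       # and leaves the risk set
        total += p
    return total


# Relative risks exp(x'b) for a hypothetical risk set of 4 with 2 tied failures.
tied, others = [1.0, 1.0], [1.0, 1.0]
print("Breslow:", breslow_term(tied, others))
print("exact marginal:", exact_marginal_term(tied, others))
```

With a single failure the two terms coincide. The more failures share a time, the more Breslow's full-risk-set denominators diverge from the exact calculation. The exact sum has d! terms for d tied failures, which is one reason exact methods become slow when many failures fall on the same few times.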

As to model selection, be guided by what is important to you and also by what the literature suggests. Start with simple models. It is unlikely that you can fit many interactions, because there will be many covariate combinations with no failures.
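A quick way to see the empty-cell problem before adding an interaction is to cross-tabulate subjects and failures by the covariates involved. In Stata, comparing `tabulate rand parity` with `tabulate rand parity if _d` shows this. The Python sketch below does the same on hypothetical records ("rand" and "parity" only echo variable names from post #1) and returns the combinations that have subjects but no failures. An interaction term resting on such a cell can have an estimate that runs off toward 0 or infinity.

```python
from collections import defaultdict


def empty_cells(records, covariates, event="failed"):
    """Covariate combinations that contain subjects but no failures."""
    n_subjects = defaultdict(int)
    n_failures = defaultdict(int)
    for rec in records:
        key = tuple(rec[c] for c in covariates)
        n_subjects[key] += 1
        n_failures[key] += rec[event]
    return sorted(k for k in n_subjects if n_failures[k] == 0)


# Hypothetical records; "rand" and "parity" echo variable names from post #1.
records = [
    {"rand": 0, "parity": 1, "failed": 1},
    {"rand": 0, "parity": 2, "failed": 0},
    {"rand": 1, "parity": 1, "failed": 0},
    {"rand": 1, "parity": 1, "failed": 1},
    {"rand": 1, "parity": 2, "failed": 0},
]
print(empty_cells(records, ["rand", "parity"]))
```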

eric some

#5

22 Sep 2015, 16:16

Thanks so much Steven! Very helpful. Actually, I am surprised by the pattern; I would have expected early failures for rand=1 instead. I need to think about it and check why.

There are actually 7 failure periods because this was a cohort study with visits on day 7 and at week 2 and monthly follow-up visits after that, and the end of the study (the cut-off date for the current analysis) was week 22. So I guess the phenomenon is understandable.

tvc() and exactm cannot be used simultaneously as options of stcox. If I fit the model with the exactm option, what happens, given that the hazards are not proportional in this dataset? Can the exactm option also handle the time-varying covariate issue? By the way, I have tried the exactm option without tvc(), and the results look more reasonable. However, is that valid?
Thanks!
Steve Samuels

#6

23 Sep 2015, 19:22

With only eight periods, you need a grouped (discrete-time) model, Eric, not a continuous-time model (stcox). See the materials on Stephen Jenkins's website https://www.iser.essex.ac.uk/resourc...sis-with-stata , especially lesson 6 on discrete-time models and the draft book manuscript.
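A discrete-time model is fitted to data with one record per subject for each period at risk. In Stata, this reshaping is usually done with `expand`, before fitting a binary model with period indicators. The Python sketch below shows the reshaping on its own. It assumes each subject's follow-up has already been mapped to a period number (for example, the visit at which the outcome was recorded); that mapping and the records are hypothetical.

```python
def person_period(subjects):
    """Expand (id, last_period, failed) records into one row per period at risk.

    Periods run from 1 to last_period; the event indicator is 1 only in the
    final period of a subject who failed, and 0 everywhere else."""
    rows = []
    for sid, last, failed in subjects:
        for period in range(1, last + 1):
            rows.append((sid, period, int(bool(failed) and period == last)))
    return rows


# Hypothetical subjects: id 1 fails in period 3, id 2 is censored after period 2.
for row in person_period([(1, 3, 1), (2, 2, 0)]):
    print(row)
```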

The discrete-time analog of the proportional hazards Cox model in Stata is cloglog. However, the hazards will not be proportional for the "rand" variable, as there are no early failures in one group.
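Why cloglog is the discrete analog: if the underlying continuous-time hazards are proportional with ratio HR, the interval hazards satisfy 1 - h1 = (1 - h0)^HR, so cloglog(h1) = cloglog(h0) + ln(HR) in every period. The Python sketch below checks that identity numerically. It also shows what a period with no failures in one group looks like: an estimated hazard of 0, which sits at minus infinity on the cloglog scale. This is the discrete-time counterpart of the extreme HRs for rand. All hazard values are hypothetical.

```python
import math


def cloglog(h):
    """Complementary log-log transform of a probability 0 < h < 1."""
    return math.log(-math.log(1.0 - h))


def grouped_ph_hazard(h0, hr):
    """Interval hazard for a group whose continuous-time hazard is hr times
    the baseline, when the baseline interval hazard is h0: 1 - (1 - h0)**hr."""
    return 1.0 - (1.0 - h0) ** hr


def log_hr_by_period(h_ref, h_cmp):
    """Per-period log hazard ratio on the cloglog scale; -inf where the
    comparison group has no failures (estimated hazard of 0)."""
    out = []
    for a, b in zip(h_ref, h_cmp):
        out.append(float("-inf") if b == 0.0 else cloglog(b) - cloglog(a))
    return out


# Under grouped proportional hazards, the cloglog difference equals log(HR)
# in every period, whatever the baseline hazard.
for h0 in (0.02, 0.10, 0.30):
    h1 = grouped_ph_hazard(h0, 2.0)
    print(f"h0={h0:.2f}  h1={h1:.4f}  cloglog diff={cloglog(h1) - cloglog(h0):.6f}")

# Hypothetical period hazards where the comparison group has no early failures.
print(log_hr_by_period([0.10, 0.10, 0.10], [0.0, 0.05, 0.20]))
```

If the per-period differences vary across periods, the hazards are not proportional, and a single coefficient for rand summarizes them poorly. Period-specific effects, i.e. interactions of rand with the period indicators, are one way to let them differ.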

There is another serious lesson here. You apparently started with stcox rather than examining the data descriptively, with sts graph, for example. A good analysis strategy is exactly the reverse.
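The descriptive first step is the Kaplan-Meier (product-limit) estimate, which `sts graph, by(rand)` plots and `sts list, by(rand)` lists. As a minimal sketch for readers without Stata, here is the same estimator in Python on hypothetical data in which group 0 fails early and group 1 only later. It follows the usual convention that subjects censored at a failure time still count as at risk at that time.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at each distinct failure time."""
    surv, curve = 1.0, []
    for tj in sorted({t for t, d in zip(times, events) if d}):
        at_risk = sum(1 for t in times if t >= tj)
        failed = sum(1 for t, d in zip(times, events) if d and t == tj)
        surv *= 1.0 - failed / at_risk
        curve.append((tj, surv))
    return curve


def kaplan_meier_by_group(times, events, groups):
    """Separate Kaplan-Meier curves for each value of a grouping variable."""
    curves = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        curves[g] = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
    return curves


# Hypothetical data: group 0 has early failures, group 1 only later ones.
times = [1, 2, 2, 3, 4, 3, 4, 4, 5, 5]
events = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]
groups = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
for g, curve in kaplan_meier_by_group(times, events, groups).items():
    print(g, [(t, round(s, 3)) for t, s in curve])
```

A curve for one group that stays at 1 through the early weeks, as group 1 does here, is exactly the pattern that later turns up as an extreme hazard ratio in a Cox model.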

Last edited by Steve Samuels; 23 Sep 2015, 19:47.

eric some

#7

24 Sep 2015, 09:01

Thanks so much Steven.
It has been so helpful!
Best Regards
Eric
Victor Cruz

#8

12 May 2016, 06:34

Originally posted by Steve Samuels


Code:

sts graph, strata(rand) risktable
dotplot _t if _d, over(rand)


Hi,
after stsetting my dataset, the code Steve Samuels provided above returns the following error in Stata 13:

Code:

strata() requires adjustfor(); perhaps you mean by()
r(198);

I am also getting a hazard ratio and SE for one variable (called V2 below) from a Cox model as large as 29,249.42 and 164,285.91, respectively, significant at the 0.1 level, and I have not found a plausible cause for this.

I wrote a post on this: http://www.statalist.org/forums/foru...zard-ratios-se and I would appreciate feedback on it.

Victor Cruz
