
  • Cox PH regression: outcome with HR too small or too large.

    I ran a Cox model with TVC in Stata 13. The outcome variable is "EBFPLUS" (exclusive and predominant breastfeeding) and the main exposure variable is "rand" (trial arm). The hazard ratios I got are either extremely large or extremely small. Is that normal? If not, what does it imply for my model?
    Please see command and output below.
    Thanks for your help!
    Best Regards
    Eric

    Code:
    use NCOX1_W22, clear
    
    . drop if EBFPLUS_exit>=.
    (90 observations deleted)
    
    . keep if country==3
    (5628 observations deleted)
    
    . stset EBFPLUS_exit, failure(EBFPLUS_dth) origin(date_csdi) enter(date_csdi) scale(7)
    
         failure event:  EBFPLUS_dth != 0 & EBFPLUS_dth < .
    obs. time interval:  (origin, EBFPLUS_exit]
     enter on or after:  time date_csdi
     exit on or before:  failure
        t for analysis:  (time-origin)/7
                origin:  time date_csdi
    
    ------------------------------------------------------------------------------
         1632  total observations
            0  exclusions
    ------------------------------------------------------------------------------
         1632  observations remaining, representing
           96  failures in single-record/single-failure data
     30039.43  total analysis time at risk and under observation
                                                  at risk from t =         0
                                       earliest observed entry t =         0
                                            last observed exit t =  22.71429
    
    . 
    .  stcox i.rand i.tertile i.agegroup i.educ i.marital i.occup i.parity  i.deliv i.bfinitime,  ///
    >  tvc(rand tertile agegroup educ marital occup parity deliv bfinitime)  texp( ln(_t) )
    
             failure _d:  EBFPLUS_dth
       analysis time _t:  (EBFPLUS_exit-origin)/7
                 origin:  time date_csdi
      enter on or after:  time date_csdi
    
    Iteration 0:   log likelihood = -657.09909
    Iteration 1:   log likelihood = -647.11305
    Iteration 2:   log likelihood = -618.76367
    Iteration 3:   log likelihood = -587.38412
    Iteration 4:   log likelihood =  -583.3244
    Iteration 5:   log likelihood = -559.65771
    Iteration 6:   log likelihood = -557.18193
    Iteration 7:   log likelihood = -556.15539
    Iteration 8:   log likelihood = -555.82087
    Iteration 9:   log likelihood = -555.77453
    Iteration 10:  log likelihood = -555.77145
    Iteration 11:  log likelihood = -555.77108
    Iteration 12:  log likelihood = -555.77095
    Iteration 13:  log likelihood =  -555.7709
    Iteration 14:  log likelihood = -555.77088
    Iteration 15:  log likelihood = -555.77087
    Iteration 16:  log likelihood = -555.77087
    Iteration 17:  log likelihood = -555.77087
    Iteration 18:  log likelihood = -555.77087
    Iteration 19:  log likelihood = -555.77087
    Iteration 20:  log likelihood = -555.77087
    Iteration 21:  log likelihood = -555.77087
    Iteration 22:  log likelihood = -555.77087
    Iteration 23:  log likelihood = -555.77087
    Iteration 24:  log likelihood = -555.77087
    Iteration 25:  log likelihood = -555.77087
    Iteration 26:  log likelihood = -555.77087
    Iteration 27:  log likelihood = -555.77087
    Iteration 28:  log likelihood = -555.77087
    Iteration 29:  log likelihood = -555.77087
    Iteration 30:  log likelihood = -555.77087
    Iteration 31:  log likelihood = -555.77087
    Iteration 32:  log likelihood = -555.77087
    Refining estimates:
    Iteration 0:   log likelihood = -555.77087
    Iteration 1:   log likelihood = -555.77087
    
    Cox regression -- Breslow method for ties
    
    No. of subjects =         1632                     Number of obs   =      1632
    No. of failures =           96
    Time at risk    =  30039.42857
                                                       LR chi2(22)     =    202.66
    Log likelihood  =   -555.77087                     Prob > chi2     =    0.0000
    
    ------------------------------------------------------------------------------
              _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    main         |
          1.rand |   .0000863   .0002745    -2.94   0.003     1.70e-07    .0439029
                 |
         tertile |
              2  |    10935.6    30435.3     3.34   0.001      46.7552     2557734
              3  |   1.96e+09   1.05e+10     3.98   0.000     52146.87    7.37e+13
                 |
        agegroup |
              3  |   25992.19   75550.56     3.50   0.000     87.23405     7744612
              5  |   8.15e+07   4.55e+08     3.26   0.001     1443.403    4.60e+12
                 |

  • #2
    I have two thoughts.

    1. The large or small HRs arise because subjects in some groups tended to die at much earlier times than those in other groups. You can examine this after running your stset statement e.g. for the rand variable
    Code:
    sts graph, strata(rand) risktable
    dotplot _t if _d, over(rand)
    2. You have fit too many covariates and you haven't even considered interactions and non-proportionality yet. The number of failures per variable should be at least 5, but if some categories are very small (<10% of observations), then more.


    Reference: Vittinghoff, Eric, and Charles E McCulloch. 2007. Relaxing the rule of ten events per variable in logistic and Cox regression. American journal of epidemiology 165, no. 6: 710-718.
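
    As a rough check of point 2 against the output in #1: the log reports 96 failures and LR chi2(22), which I read as 22 estimated parameters (main effects plus tvc terms). A minimal sketch of the arithmetic, with scalar names that are purely illustrative:

    ```stata
    * Events per estimated parameter, using figures from the stcox output in #1.
    * The scalar names are hypothetical.
    * n_fail: "No. of failures"; n_par: degrees of freedom of the LR chi2 test.
    scalar n_fail = 96
    scalar n_par  = 22
    scalar epv    = n_fail / n_par
    display "failures per parameter = " %4.1f epv
    ```

    That is roughly 4.4 failures per parameter, below the minimum of 5 mentioned above, before any small categories are taken into account.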

    Last edited by Steve Samuels; 21 Sep 2015, 16:25.
    Steve Samuels
    Statistical Consulting

    Stata 14.2



    • #3
      Dear Steven, thanks so much!
      Attached is the plot I got after running
      Code:
      dotplot _t if _d, over(rand)
      I am not sure about my interpretation. Do you think your first hypothesis still holds?
      What if I decide to drop some covariates to simplify my model? Is there a strategy for selecting the variables to drop, or do I have to decide based only on what seems important to me and on the literature?



      • #4
        The only early failures are for rand=0, so that is consistent with the hypothesis. Can you explain that pattern?

        However there is another issue: It looks like there are only eight distinct failure times. Exactly how many are there? What accounts for this phenomenon? It is very troublesome.

        Such heavily tied data invalidate the standard errors and p-values reported by stcox. You should try the exactm option in stcox.


        As to model selection, be guided by what is important to you and also what the literature suggests. Start with simple models. It is unlikely that you can fit many interactions because there will be many covariate combinations with no failures.
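
        A minimal sketch of both checks, assuming the data are still stset as in #1. The one-covariate model is only an illustration of "start simple", not a recommended specification:

        ```stata
        * Count the distinct failure times and how many failures are tied at each.
        tabulate _t if _d == 1

        * Refit a deliberately simple model using the exact marginal-likelihood
        * method for ties instead of the default Breslow approximation.
        stcox i.rand, exactm
        ```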

        Steve Samuels
        Statistical Consulting

        Stata 14.2



        • #5
          Thanks so much Steven! Very helpful. Actually, I am surprised by the pattern. I would have expected early failures for rand=1 instead. I need to think about it and check why.

          There are actually 7 failure periods, because this was a cohort study with follow-up visits every month after the first two visits on day 7 and week 2, and the end of the study (the cutoff date for the current analysis) was week 22. So I guess the phenomenon is understandable.

          tvc and exactm cannot be used simultaneously as options in stcox. If I fit the model with the exactm option, what will happen, given that the hazards are not proportional in this dataset? Can the exactm option also handle the time-varying covariate issue? By the way, I have tried the exactm option without tvc and the output looks nicer. However, is it valid?
          Thanks!



          • #6
            With only eight periods, you need a grouped/discrete model, Eric, not a continuous model (stcox). See the materials on Stephen Jenkins's website https://www.iser.essex.ac.uk/resourc...sis-with-stata , especially the Discrete model lesson 6 and the draft book manuscript.

            The discrete analog of the proportional hazards Cox model in Stata is cloglog. However, the hazards will not be proportional for the "rand" variable, as there are no early deaths in one group.
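
            A minimal sketch of such a grouped-time model, assuming one record per subject and the stset variables from #1. The variables id (subject identifier) and nper (number of visit intervals each subject was followed, ending in failure or censoring) are hypothetical and would have to be built from the visit schedule:

            ```stata
            * Hypothetical variables: id = subject identifier, nper = number of
            * visit intervals the subject was at risk (an integer >= 1).
            * Create one record per subject-interval.
            expand nper
            bysort id: generate int period = _n

            * The event indicator is 1 only in a subject's last interval, and only
            * if that subject failed (_d == 1 from stset); otherwise it is 0.
            bysort id: generate byte fail = (_n == _N) & (_d == 1)

            * Complementary log-log regression with one baseline term per interval;
            * eform reports exponentiated coefficients.
            cloglog fail i.rand i.period, eform
            ```

            The exponentiated coefficients can be read as hazard ratios only under a grouped proportional hazards assumption, which is doubtful for rand for the reason just given.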

            There is another serious lesson here. You apparently started with stcox rather than examining the data descriptively, with sts graph, for example. A good analysis strategy is exactly the reverse.
            Last edited by Steve Samuels; 23 Sep 2015, 19:47.
            Steve Samuels
            Statistical Consulting

            Stata 14.2



            • #7
              Thanks so much Steven.
              It has been so helpful!
              Best Regards
              Eric



              • #8
                Originally posted by Steve Samuels
                I have two thoughts.

                1. The large or small HRs arise because subjects in some groups tended to die at much earlier times than those in other groups. You can examine this after running your stset statement e.g. for the rand variable
                Code:
                sts graph, strata(rand) risktable
                dotplot _t if _d, over(rand)
                2. You have fit too many covariates and you haven't even considered interactions and non-proportionality yet. The number of failures per variable should be at least 5, but if some categories are very small (<10% of observations), then more.


                Reference: Vittinghoff, Eric, and Charles E McCulloch. 2007. Relaxing the rule of ten events per variable in logistic and Cox regression. American journal of epidemiology 165, no. 6: 710-718.
                Hi,
                After stsetting my dataset, the code Steve Samuels provided above returns the following error in Stata 13.

                Code:
                 strata() requires adjustfor(); perhaps you mean by()
                r(198);
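
                The message itself suggests by() in place of strata(). A sketch of the adjusted command, assuming the data are already stset, with rand standing in for whatever grouping variable is used:

                ```stata
                * Kaplan-Meier curves by group, with a number-at-risk table under the plot.
                sts graph, by(rand) risktable
                ```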
                I am also getting a hazard ratio and standard error for one variable (called V2 below) from a Cox model as large as 29,249.42 and 164,285.91, respectively, significant at the 0.1 level, and have not found a plausible cause for this.

                I wrote a post on this: http://www.statalist.org/forums/foru...zard-ratios-se and I would appreciate feedback on it.
                Victor Cruz
