
  • Terminating observation within matched pairs of subjects in a matched cohort analysis

    Hi there

    I am carrying out a matched cohort study in which each exposed individual is individually matched (on age, sex, GP practice and calendar time) to up to 10 unexposed individuals. The matched sets are identified by the variable setid. I am using a stratified Cox model (i.e. stcox exposed, strata(setid)).

    Does anybody know whether Stata terminates follow-up among members of a matched set once one member is no longer under observation? In other words, does Stata count events among the unexposed if they occur after the exposed member of their set has been censored?

    I get identical hazard ratios whether or not I manually censor the unexposed individuals on the date the exposed individual is censored (i.e. change their exit date). However, the number of failures and the time at risk reported by the two models are different. The output for both scenarios is below.

    Thanks in advance for your help.

    Best wishes
    Harriet

    -------------------------------------------------------------------------------------------------
    With unexposed individuals manually censored when exposed is censored:

    Failure _d: dementia==1
    Analysis time _t: (doexit-origin)/365.25
    Origin: time doentry
    Enter on or after: time doentry
    Exit on or before: time doexit
    ID variable: id

    Iteration 0: Log likelihood = -5123.2261
    Iteration 1: Log likelihood = -5123.2157
    Iteration 2: Log likelihood = -5123.2157
    Refining estimates:
    Iteration 0: Log likelihood = -5123.2157

    Stratified Cox regression with no ties
    Strata variable: setid

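    No. of subjects =     121,369                         Number of obs = 121,369
    No. of failures =       2,534
    Time at risk    = 464,617.632
                                                          LR chi2(1)    =    0.02
    Log likelihood = -5123.2157                           Prob > chi2   =  0.8850

    ------------------------------------------------------------------------------
              _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
         exposed |   1.008399   .0582552     0.14   0.885      .900448    1.129293
    ------------------------------------------------------------------------------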

    -----------------------------------
    Using all available data:

    Failure _d: dementia==1
    Analysis time _t: (doexit-origin)/365.25
    Origin: time doentry
    Enter on or after: time doentry
    Exit on or before: time doexit
    ID variable: id

    Iteration 0: Log likelihood = -5123.2261
    Iteration 1: Log likelihood = -5123.2157
    Iteration 2: Log likelihood = -5123.2157
    Refining estimates:
    Iteration 0: Log likelihood = -5123.2157
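    Stratified Cox regression with no ties
    Strata variable: setid

    No. of subjects =     121,369                         Number of obs = 121,369
    No. of failures =       2,534
    Time at risk    = 464,617.632
                                                          LR chi2(1)    =    0.02
    Log likelihood = -5123.2157                           Prob > chi2   =  0.8850

    ------------------------------------------------------------------------------
              _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
         exposed |   1.008399   .0582552     0.14   0.885      .900448    1.129293
    ------------------------------------------------------------------------------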





  • #2
    I posted the wrong output for the "all available data" analysis above.

    This is the correct output:

    Failure _d: dementia==1
    Analysis time _t: (doexit-origin)/365.25
    Origin: time doentry
    Enter on or after: time doentry
    Exit on or before: time doexit
    ID variable: id

    Iteration 0: Log likelihood = -10896.591
    Iteration 1: Log likelihood = -10896.58
    Iteration 2: Log likelihood = -10896.58
    Refining estimates:
    Iteration 0: Log likelihood = -10896.58

    Stratified Cox regression with Breslow method for ties
    Strata variable: setid

    No. of subjects =     121,369                         Number of obs = 121,369
    No. of failures =       6,349
    Time at risk    = 820,479.765
                                                          LR chi2(1)    =    0.02
    Log likelihood = -10896.58                            Prob > chi2   =  0.8850

    ------------------------------------------------------------------------------
              _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
         exposed |   1.008399   .0582552     0.14   0.885      .900448    1.129293
    ------------------------------------------------------------------------------



    • #3
      Does anybody know whether Stata terminates follow-up among members of a matched set once one member is no longer under observation?
      No.

      But look at how the partial likelihood is defined for a stratified Cox model: it is the product of the stratum-specific partial likelihoods; see https://web.njit.edu/~wguo/Math%2065...xt_Book%5D.pdf

      Intuitively, once the only exposed observation in a given stratum has left the risk set (because of an event or censoring), the remaining observations in that stratum, all unexposed, carry no further information on the _hazard ratio_ for exposed vs unexposed observations. So you might as well censor them: it won't change the HR estimate. Note, however, that it would change other quantities, for example the estimated survival functions (see the toy example below).
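A sketch of why this holds, in standard notation (chosen for illustration, not copied from the linked book; tied failure times are ignored for simplicity): write $x_i$ for the exposure indicator, $D_s$ for the failures in stratum $s$, and $R_s(t)$ for the stratum-$s$ risk set at time $t$. The stratified partial likelihood is

$$
L(\beta) = \prod_{s=1}^{S} \prod_{i \in D_s} \frac{\exp(\beta x_i)}{\sum_{j \in R_s(t_i)} \exp(\beta x_j)}
$$

If every member of $R_s(t_i)$ is unexposed ($x_j = 0$), the factor for that failure is $1/|R_s(t_i)|$, which does not involve $\beta$. Censoring the unexposed at the exposed member's exit removes exactly these factors and leaves the earlier risk sets unchanged, so, with $D_s^{*}$ denoting the failures in stratum $s$ after its exposed member has left,

$$
\log L_{\text{all}}(\beta) = \log L_{\text{cens}}(\beta) - \sum_{s=1}^{S} \sum_{i \in D_s^{*}} \log |R_s(t_i)|
$$

The two log likelihoods differ by a constant, so they have the same maximiser, score and information: same HR, standard error and LR chi2, but a different log likelihood. This is consistent with the outputs in #1 and #2, whose log likelihoods differ by about 5773.36 both at iteration 0 (-10896.591 vs -5123.2261) and at convergence (-10896.58 vs -5123.2157).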

      By contrast, if you ignore the strata you'll get different HRs from the two versions of the data, of course (see the toy example below).

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(sid x t d t2 d2)
      1 1 5.5 1 5.5 1
      1 0   1 1   1 1
      1 0   2 1   2 1
      1 0   3 1   3 1
      1 0   4 1   4 1
      1 0   5 1   5 1
      1 0   6 1 5.5 0
      1 0   7 1 5.5 0
      1 0   8 1 5.5 0
      1 0   9 1 5.5 0
      2 1 2.5 1 2.5 1
      2 0 1.2 1 1.2 1
      2 0 2.2 1 2.2 1
      2 0 3.2 1 2.5 0
      2 0 4.2 1 2.5 0
      2 0 5.2 1 2.5 0
      2 0 6.2 1 2.5 0
      2 0 7.2 1 2.5 0
      2 0 8.2 1 2.5 0
      2 0 9.2 1 2.5 0
      end
      
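      * Version 1: all available follow-up (t, d)
      stset t, fail(d)
      stcox x, strata(sid)
      stcox x
      sts, by(sid) name(g1, replace) xlabel(0/10)
      
      * Version 2: unexposed censored when the exposed member of their set exits (t2, d2)
      stset t2, fail(d2)
      stcox x, strata(sid)
      stcox x
      sts, by(sid) name(g2, replace) xlabel(0/10)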
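A possible extension of the toy example (a sketch, not part of the original reply): rather than typing t2 and d2 by hand, they can be derived from t and d, and the two stratified fits compared directly. The names t_exp, t_new, d_new, b_all, ll_all, b_cens and ll_cens are illustrative, and the code assumes exactly one exposed member (x == 1) per matched set, as in the toy data.

Code:
```stata
* Sketch: rebuild t2/d2 from t/d and compare the two stratified fits.
* Assumes exactly one exposed member (x == 1) per matched set (sid).
clear
input float(sid x t d t2 d2)
1 1 5.5 1 5.5 1
1 0   1 1   1 1
1 0   2 1   2 1
1 0   3 1   3 1
1 0   4 1   4 1
1 0   5 1   5 1
1 0   6 1 5.5 0
1 0   7 1 5.5 0
1 0   8 1 5.5 0
1 0   9 1 5.5 0
2 1 2.5 1 2.5 1
2 0 1.2 1 1.2 1
2 0 2.2 1 2.2 1
2 0 3.2 1 2.5 0
2 0 4.2 1 2.5 0
2 0 5.2 1 2.5 0
2 0 6.2 1 2.5 0
2 0 7.2 1 2.5 0
2 0 8.2 1 2.5 0
2 0 9.2 1 2.5 0
end

* Exit time of the exposed member of each matched set
bysort sid: egen double t_exp = max(cond(x == 1, t, .))

* Stop follow-up at the exposed member's exit; later events become censorings
generate double t_new = min(t, t_exp)
generate byte d_new = d * (t <= t_exp)

* The derived variables should reproduce the hand-coded t2 and d2
assert t_new == t2
assert d_new == d2

* Stratified Cox model on all available follow-up
quietly stset t, failure(d)
quietly stcox x, strata(sid)
scalar b_all  = _b[x]
scalar ll_all = e(ll)

* Stratified Cox model after censoring at the exposed member's exit
quietly stset t_new, failure(d_new)
quietly stcox x, strata(sid)
scalar b_cens  = _b[x]
scalar ll_cens = e(ll)

display "HR, all follow-up:   " exp(scalar(b_all))
display "HR, censored:        " exp(scalar(b_cens))
display "ll(all) - ll(cens):  " scalar(ll_all) - scalar(ll_cens)
display "-ln(4! * 7!):        " -ln(24 * 5040)
```

If the reasoning above is right, the two HRs agree up to numerical tolerance and the log likelihood of the all-data fit is lower by ln(4! × 7!) = ln(120960), about 11.70: the unexposed failures occurring after the exposed member has left do so in risk sets of size 4, 3, 2, 1 (set 1) and 7, 6, ..., 1 (set 2).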



      • #4
        Thank you Andrea. This is very helpful. It seems to me, then, that the number of failures and the person-time at risk reported by the stratified Cox model are wrong unless you manually edit the end dates. Would you agree?



        • #5
          I am not sure I agree. Stata is simply summarising whatever outcome data are in your dataset, as the total number of events and the sum of the person-time at risk. When the outcome data change (by censoring the unexposed observations), those two descriptive statistics change as well.
          Because of the specific data structure (matched-cohort data with one exposed subject per matched set) and model (stratified proportional-hazards Cox model), the HR estimate is the same for the two versions of the outcome data. But, again, those descriptive statistics are neither right nor wrong; they simply follow from the data you decided to use.
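To illustrate this point, a minimal sketch using a small hypothetical dataset (two matched sets with invented follow-up times in years from entry, so no origin() or enter() is needed; these are not the thread's data). It computes the number of failures and the summed time at risk from the stset variables _d, _t and _t0, the totals the stcox header summarises, before and after censoring the unexposed at the exposed member's exit. The helper names exit_exp, doexit_c, dementia_c and fu are illustrative.

Code:
```stata
* Sketch: how the failure count and time at risk change with the censoring.
* Hypothetical data: two matched sets, one exposed member each, follow-up
* time in years from entry.
clear
input byte(setid exposed) float(doexit) byte(dementia)
1 1 4 0
1 0 2 1
1 0 6 1
1 0 8 0
2 1 5 1
2 0 3 0
2 0 7 1
end

* Exit time of the exposed member of each set, and the censored outcome
bysort setid: egen double exit_exp = max(cond(exposed == 1, doexit, .))
generate double doexit_c = min(doexit, exit_exp)
generate byte dementia_c = dementia * (doexit <= exit_exp)

* All available follow-up
quietly stset doexit, failure(dementia)
quietly count if _d == 1
scalar fail_all = r(N)
generate double fu = _t - _t0
quietly summarize fu
scalar risk_all = r(sum)
drop fu

* Follow-up censored at the exposed member's exit
quietly stset doexit_c, failure(dementia_c)
quietly count if _d == 1
scalar fail_cens = r(N)
generate double fu = _t - _t0
quietly summarize fu
scalar risk_cens = r(sum)
drop fu

display "Failures:     " scalar(fail_all) " (all) vs " scalar(fail_cens) " (censored)"
display "Time at risk: " scalar(risk_all) " (all) vs " scalar(risk_cens) " (censored)"
```

With these made-up values the totals should go from 4 failures and 35 years at risk to 2 failures and 27 years; only the outcome data changed, not the way Stata summarises them.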

