  • Survival Analysis: Setting up a conditional risk set model with multiple failures and a time-dependent covariate

    Dear Stata users

    I have a specific question about a survival analysis with multiple failures, a time-dependent covariate, and more than one level of clustering. I have read the manual entries for stsum, stset and stcox carefully, and I am now somewhat confused by two different examples of survival analysis with multiple failures that I found in the Stata manual and in the STB (see examples A and B below). I'm using Stata 13.

    Description of my data:
    I have multiple-failure data: 1,000 cows (IDs), each with several calving intervals (the time from one calving date to the next). I also have a time-dependent covariate 'period' (marked in blue), created by splitting observations on 01jul2012. Data are left censored on 01jan2010, and stcox will be run after the stset command.
    I'm particularly interested in whether calving intervals differ between the two periods, and whether they are affected by a few other predictor variables, x1-x5.

    This is an extract of two individuals from my data set. Start1 and Stop2 were the initial time variables; I transformed them into Start_gap and Stop_gap for the conditional risk set model described by Mario Cleves in STB-49, page 38, which measures time from the previous event.
    For the purpose of my research question it makes sense to measure time to each event from the time of the previous event.
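A minimal Stata sketch of how gap-time variables might be built from calving dates, assuming one record per calving interval with the dates stored as strings. Cow 55's dates come from this thread; cow 999 and the names calv_prev, calv_next and d_entry are hypothetical. The sketch codes an interval that is already under way on 01jan2010 as delayed entry (in standard survival terminology, left truncation): the gap-time clock still starts at the previous calving, and the cow enters the risk set at start_gap instead of at 0.

```stata
* Sketch: gap-time start/stop variables with delayed entry at 01jan2010.
* Cow 55's dates are from the thread; cow 999 is hypothetical.
clear
input int id str9 calv_prev str9 calv_next
 55 "11sep2009" "02sep2010"
999 "15mar2010" "20feb2011"
end
gen double d_prev = date(calv_prev, "DMY")   // calving that opens the interval
gen double d_next = date(calv_next, "DMY")   // calving that closes it (the failure)
* Follow-up starts at the later of the opening calving and 01jan2010
gen double d_entry = max(d_prev, td(01jan2010))
format d_prev d_next d_entry %td
* Gap time: the clock always starts at the opening calving
gen double start_gap = d_entry - d_prev      // > 0 means delayed entry
gen double stop_gap  = d_next  - d_prev
gen byte event = 1
list id d_prev d_entry d_next start_gap stop_gap, noobs
```

For cow 55 this gives start_gap = 112 and stop_gap = 356, so the 244 days mentioned for this cow later in the thread is the time observed at risk (356 - 112), not the length of the calving interval. In stset, start_gap would then be supplied as each record's entry time (for example through time0()).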

    [Image: statalist.JPG, extract of the data for two cows]


    A) Example 7 in the stcox manual, referring to an example in stsum:
    Following this example, I would have to use Start_gap and Stop_gap in my data set, and create nf and newid according to the example. In the stset command I would use id(newid) and adjust for clustering in the stcox command, using vce(cluster id).

    Code:
    stset t, id(newid) failure(d) time0(newt0) noshow
    stcox ...,vce(cluster id)
    Question 1: How can I account for the fact that IDs (cows) are also clustered within herds? Is it possible to add an additional level?
    Question 2: Because I include a time-dependent covariate, 'period', each split (expanded) calving interval becomes two observations. Would both observations have the same nf and newid?
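A hedged sketch of approach A for one hypothetical cow (id 7, invented dates), splitting the interval that crosses 01jul2012 by hand with expand rather than stsplit. The coding newid = id*100 + nf is an assumption (it only works with fewer than 100 intervals per cow). Both pieces of the split interval keep the same nf and newid; only period, the start/stop times and the failure flag differ.

```stata
* Sketch of approach A on hypothetical data: one record per calving
* interval, then a manual split at 01jul2012.
clear
input int id str9 calv_prev str9 calv_next
7 "10feb2012" "20jan2013"
7 "20jan2013" "05jan2014"
end
gen double d_prev = date(calv_prev, "DMY")
gen double d_next = date(calv_next, "DMY")
format d_prev d_next %td
bysort id (d_prev): gen int nf = _n         // calving-interval order
gen long newid = id*100 + nf                // one id per interval (hypothetical coding)
* Duplicate intervals that straddle the split date
local cut = td(01jul2012)
gen byte spans = d_prev < `cut' & d_next > `cut'
expand 2 if spans, generate(piece2)
gen double d_from = cond(piece2, `cut', d_prev)
gen double d_to   = cond(spans & !piece2, `cut', d_next)
gen byte period   = d_from >= `cut'         // 0 before, 1 from 01jul2012 on
gen byte event    = d_to == d_next          // failure only on the final piece
gen double start_gap = d_from - d_prev      // clock still starts at d_prev
gen double stop_gap  = d_to   - d_prev
sort newid d_from
stset stop_gap, id(newid) failure(event) time0(start_gap)
list id nf newid period start_gap stop_gap event, noobs sepby(newid)
* With the real data, something like: stcox period x1-x5, vce(cluster id)
```

Here newid 701 becomes two contiguous records, [0, 142] without failure and [142, 345] ending in the calving, both with nf = 1.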

    B) Example described by Mario Cleves on multiple failure-time data (STB-49, page 38, section 3.2.4):

    Code:
    stset time, fail(status) exit(futime) enter(time0)
    stcox...., strata(str) cluster(id)
    In this example, the clock is reset to zero after each event, which corresponds to the variables Start_gap and Stop_gap in my data set. Like the example above, it focuses on the time between two events.
    Questions: What exactly is the difference between approaches A and B? Example B includes the variable str, which as I understand it corresponds to the variable nf in example A, right? But the variable id is handled quite differently; what is the reason for this?
    As above, I want to include a time-dependent covariate. Does that also mean that 'strata' would contain the same number for the two split (expanded) observations?
    And again, would it be possible to add an additional level (herds) as well?
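For comparison, a hedged sketch of approach B on simulated data (every number is invented). Without id(), stset treats each record as a separate subject, so the only link between a cow's intervals is vce(cluster id) in stcox, which adjusts the standard errors for within-cow correlation; strata(nf) plays the role of str in Cleves's example and gives each calving order its own baseline hazard. Because the Cox partial likelihood depends only on who is at risk at each failure time, records that enter the risk set at the right time, with only the final piece of an interval flagged as a failure, give the same coefficient estimates whether or not they are linked through id(); the difference between A and B is mainly bookkeeping.

```stata
* Sketch of approach B on simulated data (all values hypothetical).
clear
set seed 2015
set obs 300                                 // 300 hypothetical cows
gen int id = _n
gen byte x1 = runiform() < 0.5              // hypothetical cow-level covariate
expand 3                                    // three calving intervals per cow
bysort id: gen byte nf = _n                 // calving order (str in Cleves's example)
* Exponential gap times, mean about 400 days, longer when x1 == 1
gen double stop_gap = ceil(-400*exp(0.3*x1)*ln(1 - runiform()))
gen byte event = runiform() < 0.9           // about 10% of intervals censored
* No id(): every record is its own "subject" to stset
stset stop_gap, failure(event)
* Separate baseline hazard per calving order; cow-clustered standard errors
stcox x1, strata(nf) vce(cluster id)
```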



    Many thanks for your help and inputs!

    Isabel

  • #2
    I can attempt to answer some of your questions, Isabel. I'm not sure what you meant by saying that "'strata' would contain the same number for 2 split (expanded) observations", so I won't address that.

    * The difference between the two sources:

    1. The strata() option in Cleves's FAQ allows a different distributional shape (i.e., survival curve, hazard function) for each failure order. The example in the stsum entry creates period as a numeric covariate.

    2. The different cluster options reflect changes in stcox between 1999, when Cleves first wrote his FAQ, and now. Cleves's version is actually still legal.

    Multilevel Herd effects

    1. Add shared(herd) as an option to stcox.

    2. gsem can fit multilevel mixed Poisson models, and there is an equivalence between the Poisson and exponential distributions. Therefore you can fit a piecewise exponential survival model with gsem. See: http://www.stata.com/statalist/archi.../msg00841.html. This is a very flexible model, since the hazard function is assumed constant only within short intervals.

    In either case, you could add herd characteristics to your model.
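The Poisson/exponential equivalence behind point 2 can be written out for a single record, assuming a constant hazard λ within the record's time band, exposure time t, and event indicator d (0 or 1):

```latex
\begin{aligned}
\ell_{\mathrm{exp}}(\lambda)  &= d\log\lambda - \lambda t,\\
\ell_{\mathrm{Pois}}(\lambda) &= d\log(\lambda t) - \lambda t - \log d!
  = \ell_{\mathrm{exp}}(\lambda) + \underbrace{d\log t - \log d!}_{\text{free of }\lambda}.
\end{aligned}
```

The two log-likelihoods differ only by terms that involve neither λ nor any random effect in log λ, so a Poisson model for d with exposure t (offset log t) gives the same estimates as the piecewise exponential model, and random herd effects in log λ carry over unchanged.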

    • Some comments/questions:

    1. You refer to "Data are left censored on 01jan2010." This doesn't look like censoring to me. You will have some intervals between calvings that cross the time that defines your group variable. Thus group membership will also be time-varying.

    2. Were there an calvings for each pair not in the data? If so, then "period" is not a measure of biological age.
    Last edited by Steve Samuels; 30 Apr 2015, 16:18.
    Steve Samuels
    Statistical Consulting

    Stata 14.2



    • #3
      The last question should have been "Were there prior calvings for each pair not in the data? If so, then "period" is not a measure of biological age. "
      Steve Samuels



      • #4
        I've misused the word "period", which defines your comparison variable. The stsum approach assigns calving-order, not your period variable, as the covariate.


        About your analysis: I'm not sure that a survival model is the only possible approach. One that comes to mind: analyze inter-calving intervals with a parametric multilevel regression model, dropping intervals that span both periods. You would need to model trends in calendar time in the first period and compare observations in the second period with those extrapolated from the first-period trends. Otherwise "period" would be confounded with calendar trends.
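A hedged sketch of this alternative on simulated data: intervals spanning 01jul2012 are dropped, a linear calendar trend in the log calving interval is fitted to period 1 with random herd and cow effects (mixed, available since Stata 13), and period-2 intervals are compared with the extrapolated trend. The log-normal form, the linear trend, the unchained intervals and all numbers are assumptions; the standard error reported by mean also ignores the uncertainty in the fitted trend.

```stata
* Sketch on simulated data (all values hypothetical).
clear
set seed 730
set obs 20                                   // 20 hypothetical herds
gen int herd = _n
gen double u_herd = rnormal(0, 0.08)         // herd effect, log scale
expand 30                                    // 30 cows per herd
gen int id = _n
gen double u_cow = rnormal(0, 0.06)          // cow effect, log scale
expand 3                                     // three intervals per cow (not chained, for simplicity)
gen double d_prev = td(01jan2010) + floor(1460*runiform())
format d_prev %td
gen byte period2 = d_prev >= td(01jul2012)
gen double caltime = (d_prev - td(01jan2010))/365.25   // years since 01jan2010
gen double ln_ci = ln(380) + 0.01*caltime + 0.05*period2 + u_herd + u_cow + rnormal(0, 0.1)
gen double d_next = d_prev + round(exp(ln_ci))
* Drop intervals that span both periods
drop if d_prev < td(01jul2012) & d_next > td(01jul2012)
* Period-1 model: calendar trend, cows nested in herds
mixed ln_ci caltime if !period2 || herd: || id:
* Extrapolate the fixed-part trend to every record, then compare period 2
predict double trend, xb
gen double dev = ln_ci - trend
mean dev if period2, vce(cluster herd)
```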
        Steve Samuels



        • #5
          Dear Steve,

          thank you so much for looking into this; I really appreciate your feedback.

          [Image: statalist2.JPG, updated example table with observation numbers and calving date 2]


          1) What I meant by " ‘strata’ would contain the same number for 2 split(expanded) observations":
          I updated my example table above with observation numbers. The question was whether it is correct to assign the same nfailures() number (as in example 7 in the stcox manual) to observations 3/4 or 5/6: each pair represents one calving interval but is expanded to two observations because of the split on 01jul2012 into two sub-periods. That is, observations 3 and 4 would both have the value 2 for the variable nf.


          2) Regarding the left censoring on 01jan2010:
          For some animals I have data before 01jan2010, for some I don't (e.g., younger animals born after 01jan2010, like animal number 143). I also added the variable 'calving date 2' to the example table, which should make the intervals clearer. Animal 55 calved on 11sep2009, but because of the censoring only the interval from 01jan2010 to the next calving date on 02sep2010 is considered (244 days in stop_gap).


          3) Alternative models:
          I was also thinking about a "simple" regression analysis. The problem is that I would have to run a multinomial or ordered logistic regression model with, e.g., shortened, normal and prolonged calving intervals, because the distribution of calving intervals is extremely skewed, and even with transformation I couldn't get it close to a normal distribution.
          I would also lose some information: in the survival analysis we actually intended to include 3 periods if possible, splitting the data on 01jul2012 as well as on 31dec2012. Period 2 (01jul2012 to 31dec2012) is actually the "really" interesting one, but it could only be assessed in a survival analysis and not in an ordinary regression, because of all the overlapping. But I might come back to this simplified approach if necessary.



          I mostly struggle with the code because my example combines several issues (censoring, time-dependent covariates, multiple failures), and the examples in the manuals usually address only one issue at a time.

          To sum up, my main interest is whether the length of calving intervals differs between period 1 and period 2 (or optionally period 3). I will also adjust for other factors x1 x2 x3 (such as breed, age, etc. of the individual cow).

          If I understood you correctly, I don't need the strata() option for my model, as I'm not interested in different distributions for each failure order.

          According to the discussion I would use the following code, referring to my example data table above:

          Code:
          stset stop_gap, enter(start_gap) exit(time .) fail(event) id(ID)
          stcox x1 x2 x3,  tvc(period) shared(herd) vce(cluster id)
          I'm still not 100% sure whether I specified the stset and stcox commands correctly, especially the tvc() and vce() options.
          I'd also be happy to share more information if additional details helped to clarify the correct use of the commands.
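Some points worth checking in the stset and stcox lines above, offered as a reading of the documentation rather than settled fact (help stset and help stcox are the authority):
1. enter() expects an entry event, as in enter(varname==numlist) or enter(time exp); given a bare variable name it is read as an indicator, so a per-record start time is more usually passed with time0().
2. tvc(period) multiplies period by a function of analysis time (texp(_t) by default), so it estimates a period-by-time interaction rather than the period effect; a covariate already made time-varying by stsplit can go in the main varlist.
3. Stata variable names are case-sensitive, so id(ID) and vce(cluster id) refer to different variables.
4. With gap times every interval restarts at 0, so one cow's records overlap on the analysis-time scale, which stset with id(ID) would flag; an interval-level identifier like newid in the manual's Example 7 avoids this.
5. As far as I know, stcox does not accept vce(cluster) together with shared().
The sketch applies these points to simulated data; the herd frailty, effect sizes and variable names are all hypothetical.

```stata
* Hedged sketch of a revised model on simulated data (all values hypothetical).
clear
set seed 4321
set obs 40                                  // 40 hypothetical herds
gen int herd = _n
gen double frail = rgamma(2, 0.5)           // herd frailty, mean 1, variance 0.5
expand 25                                   // 25 cows per herd
gen int id = _n                             // cow identifier
gen byte x1 = runiform() < 0.5              // hypothetical cow-level covariate
expand 3                                    // three calving intervals per cow
bysort id: gen byte nf = _n
gen long newid = id*10 + nf                 // one identifier per calving interval
gen byte period = runiform() < 0.5          // stands in for the stsplit period indicator
gen double start_gap = 0
gen double stop_gap = ceil(-380*exp(0.2*period + 0.1*x1)*ln(1 - runiform())/frail)
gen byte event = 1
* time0() gives each record's start on the gap-time scale
stset stop_gap, id(newid) failure(event) time0(start_gap)
* period as an ordinary covariate; herd as a gamma shared frailty
stcox x1 period, shared(herd)
```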









          • #6

            stsplit did not create "strata" in the sense used in a hazard analysis; it just assigned the "treatment" period, and there is no issue with the repetition of covariate values. I also don't agree with your use of the word "censoring", which refers to observations where the exact dates of events are not known, only whether the events occurred before or after certain dates, or between two dates. In any case, these issues don't seem to affect your analysis: your current stset and stcox statements look okay to me.

            However, your period comparison might still be confounded by calendar time trends. These would, in turn, be correlated with birth order. I suggest an auxiliary analysis in which calendar time, possibly with seasonal components, is added as a time-varying covariate. The question then becomes: is there an effect of period independent of the general time trend?
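A hedged sketch of this auxiliary analysis on simulated data: after stset, stsplit every(30) cuts each calving interval into pieces of at most 30 days of gap time, so a calendar date can be attached to each piece (d_prev + _t0), and a calendar trend plus one annual harmonic (sine and cosine of day of year) enter as time-varying covariates. The 30-day width, the single harmonic and all variable names are assumptions; coding period by the start of each piece approximates the 01jul2012 boundary to within 30 days.

```stata
* Sketch on simulated data (all values hypothetical).
clear
set seed 99
set obs 500                                  // 500 hypothetical calving intervals
gen long newid = _n
gen int id = ceil(newid/2)                   // two intervals per hypothetical cow
gen double d_prev = td(01jan2010) + floor(1200*runiform())
format d_prev %td
gen double stop_gap = ceil(-400*ln(1 - runiform()))
gen byte event = 1
stset stop_gap, id(newid) failure(event)
* Cut every interval into pieces of at most 30 days of gap time
stsplit piece, every(30)
* Calendar date at the start of each piece, and derived covariates
gen double d_piece = d_prev + _t0
gen double caltime = (d_piece - td(01jan2010))/365.25    // calendar trend, years
gen double season_s = sin(2*_pi*doy(d_piece)/365.25)    // annual harmonic
gen double season_c = cos(2*_pi*doy(d_piece)/365.25)
gen byte period = d_piece >= td(01jul2012)              // approximate period indicator
stcox period caltime season_s season_c, vce(cluster id)
```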
            Last edited by Steve Samuels; 03 May 2015, 21:10.
            Steve Samuels

