Dear Stata users
I have a specific question about a survival analysis with multiple failures, a time-dependent covariate, and multiple cluster levels. I have read the manual entries for stsum, stset and stcox carefully, and I am now somewhat confused by two different examples of survival analysis with multiple failures that I found in the Stata manual and the STB (see examples A and B below). I'm using Stata 13.
Description of my data:
My data have a multiple-failure structure: 1000 cows (IDs), each with several calving intervals (the time from one calving date to the next). I also have a time-dependent covariate 'period' (marked in blue), for which observations have been split on 01jul2012. Data are left censored on 01jan2010, and stcox will be run after the stset command.
I’m particularly interested in whether calving intervals differ between the two time periods, and whether they are affected by a few other predictor variables, x1-x5.
This is an extract of two individuals from my data set. Start1 and Stop2 were the initial time variables, transformed into Start_gap and Stop_gap for the conditional risk set model as described by Mario Cleves in STB-49, page 38, where time from the previous event is used.
For the purpose of my research question it makes sense to measure time to each event from the time of the previous event.
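A minimal sketch of the gap-time transformation described above. It assumes one record per cow and interval piece, that Start1 and Stop2 are Stata daily dates, and that a 0/1 variable calved marks records ending in a calving; calved, interval and origin are hypothetical names, not variables from the data extract. For a cow whose first record begins on the 01jan2010 cut-off, the origin computed here is the cut-off date rather than the true previous calving, which may need separate handling.

```stata
* Sketch only, under the assumptions stated above
sort ID Start1

* Interval number within cow: calvings completed before this record, plus 1
by ID: generate int interval = sum(calved) - calved + 1

* Origin of each interval = earliest Start1 among its (possibly split) records
bysort ID interval (Start1): generate double origin = Start1[1]

* Gap-time scale: the clock restarts at 0 at the beginning of every interval
generate double Start_gap = Start1 - origin
generate double Stop_gap  = Stop2  - origin
```

With this construction, a record created by the split on 01jul2012 keeps the origin of the interval it belongs to, so its Start_gap is greater than 0.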

A) Example 7 in the stcox manual, referring to an example in stsum:
Following this example, I would have to use Start_gap and Stop_gap in my data set, and create nf and newid according to the example. In the stset command I would use id(newid) and adjust for clustering in the stcox command, using vce(cluster id).
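A rough sketch of how the steps in example A might look with the variable names used here. This is a reconstruction under assumptions, not a quotation from the manual: it assumes the data were first stset on the original time scale with id(ID) and exit(time .), that calved is a hypothetical 0/1 failure indicator, that period is coded 0/1, and that Start_gap and Stop_gap already exist.

```stata
* Assumption: data currently stset on the original time scale, e.g.
* stset Stop2, id(ID) failure(calved) time0(Start1) exit(time .)

* Number of failures each cow has had as of the entry time of the record
stgen nf = nfailures()

* One pseudo-subject per cow and calving interval
egen newid = group(ID nf)

* Re-declare the data on the gap-time scale, then fit the Cox model with
* standard errors clustered on the original cow identifier
stset Stop_gap, id(newid) failure(calved) time0(Start_gap) noshow
stcox period x1-x5, vce(cluster ID)
```

Because nfailures() counts failures as of each record's entry time, the two records produced by splitting one interval would be expected to share nf and newid; listing a few cows after these steps is a simple way to check.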
Question 1 on this procedure: How can I account for the fact that IDs (cows) are also clustered within herds? Is it possible to add an additional level?
Question 2: I include a time-dependent covariate ‘period’, so for a split (or expanded) observation, would both records have the same nf and newid?
B) Example described by Mario Cleves on multiple failure-time data, STB-49, page 38 (3.2.4):
In this example, the clock is reset to zero after each failure, which also corresponds to the variables Start_gap and Stop_gap in my data set. Like the example above, it focuses on the time between two events.
Questions: What exactly are the differences between approaches A and B? In example B the variable str is included, which, to my understanding, corresponds to the variable nf in example A. Is that right? The variable id, however, is handled quite differently. What is the reason for this?
As above, I want to include a time-dependent covariate. Does that also mean that ‘strata’ would contain the same number for the two split (expanded) observations?
And again, would it be possible to add an additional level (herds) as well?
Code (example A):
stset t, id(newid) failure(d) time0(newt0) noshow
stcox ..., vce(cluster id)
Code (example B):
stset time, fail(status) exit(futime) enter(time0)
stcox ..., strata(str) cluster(id)
Many thanks for your help and inputs!
Isabel