Creating time-span data for survival analysis

Charlie Kenward

Join Date: Aug 2017

Posts: 7
#1

10 Aug 2017, 05:05

Dear Community

I have a cohort data set of patients (patid) who have a test (K) on a date (Ktestdate) between entry (indexdate) and exit (exit). Patients may have had no test, one test or multiple tests. I am trying to set the data up as survival data so that I can fit a Cox regression model to analyse exposure and covariate relationships with test frequency.

I have got as far as creating a time-span variable (Ktime0) to indicate the time between the date of the last test and the next failure (K).

What I want to end up with is a new observation which is the time between the last test (K) and the end of follow-up (exit). I.e. for a patient with only one failure event I want two observations, one from time of entry (indexdate) until failure (K) and a second from the failure (K) until the end of follow-up (exit).

I hope I have outlined the problem clearly and would massively appreciate any help on this matter.

Kind regards

Charlie
Public Health / Health Economics MSC student.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

10 Aug 2017, 06:34

Welcome to the Stata Forum / Statalist.

Please share data (real or mock, full or abridged, depending on the situation) within CODE delimiters, as recommended in the FAQ.

You may wish to use a toy example for that as well.

This is the best way to elicit helpful replies.

Best regards,

Marcos
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#3

10 Aug 2017, 08:21

Charlie:
as an aside to Marcos' helpful advice, I would take a look at the -stsplit- entry in the Stata .pdf manual.

Kind regards,
Carlo
(Stata 19.0)
Charlie Kenward

Join Date: Aug 2017

Posts: 7
#4

10 Aug 2017, 09:48

Thanks, Carlo and Marcos.

Stata 14, Windows

Data set has 230,000 subjects, 1.1m observations, representing long form multi-event data.

My example data shows the 3 possibilities: multiple, missing or single failure (K).

I want to model failure rates over the whole time exposed, using Poisson or Cox regression.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float id int indexdate byte gender double(Ktestdate K) float(dob exit)
1 18007 2              18018  4.199999809265137 -7123.5 19813
1 18007 2 18687.000000000004  4.199999809265137 -7123.5 19813
1 18007 2 19081.000000000004  4.300000190734863 -7123.5 19813
1 18007 2              18044                4.5 -7123.5 19813
1 18007 2              19374 3.9000000953674316 -7123.5 19813
2 19682 2              19813                  . -7853.5 19813
3 18084 2              18085  4.300000190734863 -4566.5 18112
end
format %td indexdate
format %d Ktestdate
format %td dob
format %td exit

When I use: "stset Ktestdate, failure(K) id(id) origin(dob) enter(indexdate) exit(exit) scale(365.25)" this does not include time between the last failure and exit from the study, resulting in lower total times exposed and at risk than I would like.

I tried:

snapspan id Ktestdate indexdate-gender K-exit, generate(Ktime0)
gen start=max(indexdate, Ktime0)
format start %td
stset exit, failure(K) origin(start) time0(start) exit(time .) scale(365.25) id(id)

This results in all time between indexdate and exit being captured, which I want, but omits some of the observations:

------------------------------------------------------------------------------
7 total observations
4 overlapping records (exit[_n-1]>start) PROBABLE ERROR
------------------------------------------------------------------------------
3 observations remaining, representing
3 subjects
2 failures in single-failure-per-subject data
5.38 total analysis time at risk and under observation
at risk from t = 0
earliest observed entry t = 0
last observed exit t = 4.944559

What I want to produce is the data-set laid out in http://www.stata.com/support/faqs/st...re-time-data/; "for each patient there must be one observation per event or time interval"

+------------------------------------------------------+
| id     group   time0   time   status   number   size |
|------------------------------------------------------|
|  1   placebo       0      1        0        1      3 |
|  2   placebo       0      4        0        2      0 |
|  3   placebo       0      7        0        1      0 |
|  4   placebo       0     10        0        5      0 |
|  5   placebo       0      6        1        4      0 |
|------------------------------------------------------|
|  5   placebo       6     10        0        4      0 |
|  6   placebo       0     14        0        1      0 |
|  7   placebo       0     18        0        1      0 |
|  8   placebo       0      5        1        1      3 |
|  8   placebo       5     18        0        1      3 |
|------------------------------------------------------|
|  9   placebo       0     12        1        1      1 |
|  9   placebo      12     16        1        1      1 |
|  9   placebo      16     18        0        1      1 |
+------------------------------------------------------+

Thanks

Charlie Kenward

Join Date: Aug 2017
Posts: 7
#5

10 Aug 2017, 09:49

Sorry, the last table should be:

Code:

id     group   time0   time   status   number   size
1   placebo       0      1        0        1      3
2   placebo       0      4        0        2      0
3   placebo       0      7        0        1      0
4   placebo       0     10        0        5      0
5   placebo       0      6        1        4      0
5   placebo       6     10        0        4      0  
6   placebo       0     14        0        1      0 
7   placebo       0     18        0        1      0
8   placebo       0      5        1        1      3
8   placebo       5     18        0        1      3 
9   placebo       0     12        1        1      1
9   placebo      12     16        1        1      1
9   placebo      16     18        0        1      1
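
For data already in this layout, a minimal sketch of how it might be declared to -stset- is given below. It assumes the variable names shown in the table (id, time0, time, status) and that each record's interval is (time0, time]; it is an illustration, not code posted in this thread.

```stata
* Sketch only: assumes id, time0, time and status as in the table above.
* time0() supplies the start of each record's interval; exit(time .) keeps
* a subject at risk after a failure, so later failures are still counted.
stset time, id(id) failure(status) time0(time0) exit(time .)

* Inspect the intervals that -stset- has recorded for each subject.
list id _t0 _t _d, sepby(id)
```

Listing _t0, _t and _d after -stset- is a quick way to check that every subject's records cover the full follow-up without gaps or overlaps.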

Last edited by Charlie Kenward; 10 Aug 2017, 10:42.


Charlie Kenward

Join Date: Aug 2017

Posts: 7
#6

10 Aug 2017, 09:50

Sorry, it's clear in the link http://www.stata.com/support/faqs/st...re-time-data/
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#7

10 Aug 2017, 10:29

Charlie:
you may want a database like the one reported under Example 10, -stcox- entry, Stata .pdf manual.
Your contribution #5 is unreadable: please always use CODE delimiters.
I'm also not clear whether you used the multiple-failures option provided by -stset-.

Kind regards,
Carlo
(Stata 19.0)
Charlie Kenward

Join Date: Aug 2017

Posts: 7
#8

10 Aug 2017, 10:54

Carlo,

Apologies for the table, I've corrected it.

By multiple failures option do you mean exit(time .)?

I am really after a database like the one now corrected in post #5 so that for each id I have observations containing time between failures with the last observation being the time between the last failure and the end of the study (exit).

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17707
#9

10 Aug 2017, 11:24

Charlie:
-dataex- is a useful way to share an example or excerpt of your dataset with other listers (type -search dataex- from within Stata to install it). Thanks.
That said, you may want something like the following example (which elaborates a bit on yours):

Code:

. stset risk_time, id(id) failure( status ) exit(time)

                id:  id
     failure event:  status != 0 & status < .
obs. time interval:  (risk_time[_n-1], risk_time]
 exit on or before:  time time

------------------------------------------------------------------------------
         13  total observations
          1  observation begins on or after exit
------------------------------------------------------------------------------
         12  observations remaining, representing
          9  subjects
          4  failures in multiple-failure-per-subject data
         77  total analysis time at risk and under observation
                                                at risk from t =         0
                                     earliest observed entry t =         0
                                          last observed exit t =        18

where the added variable -risk_time- is obtained as follows:

Code:

g risk_time= time- time0

Kind regards,
Carlo
(Stata 19.0)


Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#10

10 Aug 2017, 12:23

I fail to understand the model because, according to #4, the variable selected to flag the event ("failure", i.e. K) is continuous. It should be binary if the model is an "ordinary" survival analysis, or categorical if there are multiple failures.
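
A binary event indicator of this kind could be derived from K roughly as follows. This is only a sketch: it assumes that K is missing exactly when no test took place, and it borrows the name Kevent from post #11.

```stata
* Sketch only: assumes a non-missing K means a test (failure) occurred.
gen byte Kevent = !missing(K)

* Check the split between event rows (1) and non-event rows (0).
tabulate Kevent, missing
```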

Best regards,

Marcos

Charlie Kenward

Join Date: Aug 2017
Posts: 7

#11

10 Aug 2017, 13:21

Carlo and Marcos

This is a sample of my data set, which I am trying to arrange into the format in post #5.

I have created a binary failure variable, Kevent.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float id int indexdate byte gender double(Ktestdate K) float(dob exit) double Ktime0 float Kevent
1 18007 2              18018  4.199999809265137 -7123.5 19813                  . 1
1 18007 2              18044                4.5 -7123.5 19813              18018 1
1 18007 2 18687.000000000004  4.199999809265137 -7123.5 19813              18044 1
1 18007 2 19081.000000000004  4.300000190734863 -7123.5 19813 18687.000000000004 1
1 18007 2              19374 3.9000000953674316 -7123.5 19813 19081.000000000004 1
2 19682 2              19813                  . -7853.5 19813                  . 0
3 18084 2              18085  4.300000190734863 -4566.5 18112                  . 1
end
format %td indexdate
format %d Ktestdate
format %td dob
format %td exit
format %d Ktime0

I used the command: "snapspan id Ktestdate indexdate gender K dob exit Kevent, generate(Ktime0)" to try to create span data.

I can't work out how to create an observation for the time between the last failure event and the exit time.

Thank you

Charlie


Charlie Kenward

Join Date: Aug 2017

Posts: 7
#12

11 Aug 2017, 08:03

Thanks for all your help, Carlo and Marcos. I have now solved the problem:
I created a dummy observation at the end of each patient's records.
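
One way such a closing dummy observation might be built is sketched below. This is not the code actually used, which was not posted. It assumes the variables from the -dataex- listing in #11 (id, indexdate, Ktestdate, K, exit, Kevent). The names lastrow, dummy, t0 and t1 are hypothetical, and the choice of indexdate as the time origin is also an assumption (the command in #4 used dob).

```stata
* Sketch only: assumes the variables from the -dataex- listing in #11.
sort id Ktestdate
by id: gen byte lastrow = (_n == _N)

* Duplicate each patient's final record when it is a test before exit,
* then turn the copy into a censored record that ends at exit.
expand 2 if lastrow & Kevent == 1 & Ktestdate < exit, generate(dummy)
replace Ktestdate = exit if dummy
replace Kevent    = 0    if dummy
replace K         = .    if dummy

* Each record now spans (previous record's date, own date];
* a patient's first span starts at entry.
sort id Ktestdate
by id: gen double t0 = cond(_n == 1, indexdate, Ktestdate[_n-1])
gen double t1 = Ktestdate
format t0 t1 %td

* Multiple-failure setup: exit(time .) keeps patients at risk after a test.
stset t1, id(id) failure(Kevent) time0(t0) origin(time indexdate) exit(time .) scale(365.25)
```

With the spans laid out this way, each patient's records should cover the whole interval from indexdate to exit, which is the time at risk missing from the first -stset- attempt in #4.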