Setting up time-to-first-event variables

Nate Fitzpatrick

Join Date: Sep 2021
Posts: 20

16 Nov 2021, 10:38

Hello,

I am new to Stata and am looking for an efficient way to set up time-to-first-event variables under competing risks: the time to the first event and a multi-level event-type variable. These will be used to graph cumulative incidence functions (CIFs) with death as the competing event.

My question is whether there is an efficient way to create these variables from long-format data when the outcomes are stored in 3 separate variables and time is measured in days from the index date.

I was able to clumsily work my way through this with a long series of "replace z = x if x<=y" commands, but I am hesitant to trust the result.

My data is set up something like this (made up on the spot):

Code:

id  desired_outcome  outc_date  death_date  censor_date
 1                0          7         242          100
 1                0         10         242          100
 1                1         15         242          100
 1                1         20         242          100
 2                1         44           .           30
 2                1         55           .           30
 3                1         52           .           30
 3                0          5          25           30
 3                0         10          25           30
 3                0         15          25           30
 3                0         20          25           30


Example code:
bysort id: egen outc_1 = min(outc_date)
replace outc_1 = censor_date if missing(outc_1)
*series of "replace if less than" lines*

gen outc_1_type = .
replace outc_1_type = 0 if outc_1 == censor_date
replace outc_1_type = 1 if outc_1 == outc_date
replace outc_1_type = 2 if outc_1 == death_date

I think the structure I want would be like this after dropping anything other than the first outcome:

Code:

id  desired_outcome  outc_date  death_date  censor_date  outc_1  outc_1_type
 1                1         15         242          100      15            1
 2                1         44           .           30      30            0
 3                0         20          25           30      25            2
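
Once outc_1 and outc_1_type exist with one row per id, a possible next step is to declare the data as survival-time data. The following is a minimal sketch, assuming the three rows of the desired table above as input; the value label name evtype is made up for illustration, and on the long data one would first reduce to one row per id.

```stata
* Sketch only: input copies id, outc_1 and outc_1_type from the desired table.
clear
input id outc_1 outc_1_type
1 15 1
2 30 0
3 25 2
end

* Hypothetical value label (evtype) to make the status codes readable
label define evtype 0 "censored" 1 "outcome" 2 "death"
label values outc_1_type evtype

* Stop with an error if id does not uniquely identify rows
isid id

* Declare survival data: time is days from index date,
* failure is the outcome of interest only (outc_1_type == 1)
stset outc_1, failure(outc_1_type == 1) id(id)

* Inspect the analysis time and failure indicator that stset created
list id _t _d outc_1_type, noobs
```

From there, a competing-risks command such as stcrreg with the option compete(outc_1_type == 2) could treat death as the competing event; that step has not been checked on this data.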

I think it works this way, but I'm hesitant to trust this coding, and I'm sure there's an easier way.

Thank you so much.

Nick Cox

Join Date: Mar 2014

Posts: 35715
#2

16 Nov 2021, 10:51

I don't follow what you want here. For identifier 1 why is 15 chosen as outcome date and not the other values? Same question for identifier 2 and 44 and identifier 3 and 20.
Nate Fitzpatrick

Join Date: Sep 2021

Posts: 20
#3

16 Nov 2021, 12:16

Originally posted by Nick Cox

I don't follow what you want here. For identifier 1 why is 15 chosen as outcome date and not the other values? Same question for identifier 2 and 44 and identifier 3 and 20.

To clarify, the data are a merge of multiple datasets taken from administrative health records, with each variable essentially coming from a different dataset.

I am looking to set this up as a pair of time-to-first-event variables (outc_1 [time], outc_1_type [status]) to use for a CIF. In this example, the primary outcome of interest is desired_outcome (desired_outcome==1), so the first event is the earliest outcome date (outc_date) where desired_outcome==1.

The competing risk in this case is death, since it precludes the outcome of interest. So in any case where the subject has died before the outcome of interest has been observed (no earlier record with desired_outcome==1), the time-to-first-event variable would be the time to death_date.

If the subject no longer meets eligibility criteria, they will be censored at the last known "good" time point (censor_date) and no longer contribute person-time to the study.

So for time-to-first-event, the example works out as follows:

Id 1: The min value for desired_outcome==1 is 15 (outc_date). Records indicate they were not censored until day 100 (censor_date) and didn't die until day 242.

Id 2: Min date for desired_outcome==1 is 44, but they were censored on day 30 (censor_date). The outcome did not occur until after they stopped contributing exposed person-time, so they do not contribute an event.

Id 3: Desired outcome is not observed (desired_outcome!=1) but the subject died (death_date==25) while still within the study (death_date<censor_date). So they contribute a competing event as their first exposed event (outc_1_type==2).

Of course this is just one component of the question with relevant sensitivity analyses set up, but I'm looking for a better way to run this section.
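
The rule described above can be sketched in Stata with per-id minimums instead of a chain of replace statements. This is a minimal sketch, not a validated solution. It assumes that censor_date is never missing and is constant within id, that a missing death_date means no recorded death, and that ties are resolved in favour of the outcome of interest, then death, then censoring (a choice made here, not something stated in the thread). first_outc and first_death are helper names introduced for illustration.

```stata
* Sketch only: the example data from the first post, entered as-is.
clear
input id desired_outcome outc_date death_date censor_date
1 0  7 242 100
1 0 10 242 100
1 1 15 242 100
1 1 20 242 100
2 1 44   .  30
2 1 55   .  30
3 1 52   .  30
3 0  5  25  30
3 0 10  25  30
3 0 15  25  30
3 0 20  25  30
end

* Earliest date with the outcome of interest per id (missing if never observed)
bysort id: egen first_outc = min(cond(desired_outcome == 1, outc_date, .))

* Earliest non-missing death date per id (missing if no death recorded)
bysort id: egen first_death = min(death_date)

* Time to first event: min() ignores missing arguments
gen outc_1 = min(first_outc, first_death, censor_date)

* Event type: 0 = censored, 1 = outcome of interest, 2 = death
* The order of the replace lines sets tie priority (assumption: outcome wins)
gen byte outc_1_type = 0
replace outc_1_type = 2 if first_death == outc_1
replace outc_1_type = 1 if first_outc  == outc_1

* Keep one row per subject
bysort id: keep if _n == 1
keep id outc_1 outc_1_type
list, noobs
```

On the example data this gives outc_1 = 15, 30, 25 and outc_1_type = 1, 0, 2 for ids 1, 2 and 3, which matches the desired table in the first post.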
Nick Cox

Join Date: Mar 2014

Posts: 35715
#4

16 Nov 2021, 13:25

Thanks very much for the details. I think it is going to be more prudent if I leave this to the biostatisticians or medical statisticians familiar with this territory.