
  • Setting up time-to-first-event variables

    Hello,

    I am new to Stata and am looking for an efficient way to set up time-to-first-event variables that account for competing risks: a time variable and a multi-level event (status) variable. These will be used to graph cumulative incidence functions (CIFs) with death as the competing event.

    My question is whether there is an efficient way to create these variables from long-format data when the outcomes are held in three separate variables and time is measured in days from the index date.

    I was able to clumsily work my way through this with a long series of "replace z = x if x<=y" commands, but I am hesitant to trust the resulting data.

    My data are set up something like this (made up on the spot):

    Code:
    Example 
    id desired_outcome outc_date death_date censor_date
    1 0 7 242 100
    1 0 10 242 100
    1 1 15 242 100
    1 1 20 242 100
    2 1 44 . 30
    2 1 55 . 30
    3 1 52 . 30
    3 0 5 25 30
    3 0 10 25 30
    3 0 15 25 30
    3 0 20 25 30

    Example code:
    bysort id: egen outc_1 = min(outc_date)
    replace outc_1 = censor_date if missing(outc_1)
    * series of "replace if less than" lines
    gen outc_1_type = .
    replace outc_1_type = 0 if outc_1 == censor_date
    replace outc_1_type = 1 if outc_1 == outc_date
    replace outc_1_type = 2 if outc_1 == death_date

    I think the structure I want would be like this after dropping anything other than the first outcome:

    Code:
      
    id desired_outcome outc_date death_date censor_date outc_1 outc_1_type
    1 1 15 242 100 15 1
    2 1 44 . 30 30 0
    3 0 20 25 30 25 2
    I think it works this way, but I'm hesitant to trust this coding. I'm sure there's an easier way.

    Thank you so much.

  • #2
    I don't follow what you want here. For identifier 1 why is 15 chosen as outcome date and not the other values? Same question for identifier 2 and 44 and identifier 3 and 20.



    • #3
      Originally posted by Nick Cox
      I don't follow what you want here. For identifier 1 why is 15 chosen as outcome date and not the other values? Same question for identifier 2 and 44 and identifier 3 and 20.

      To clarify, the data are a merge of multiple datasets taken from administrative health records, with each variable essentially coming from a different dataset.

      Since I am looking to set this up as time-to-first-event variables (outc_1 [time], outc_1_type [status]) to use for a CIF, the primary outcome of interest in this example is desired_outcome==1. So the first event is the earliest outcome date (outc_date) where desired_outcome==1.

      The competing risk in this case is death, since it precludes the outcome of interest. So in any case where the outcome of interest (desired_outcome==1) has not yet been observed and the subject has died, the time-to-first-event variable would be the time to death_date.

      If the subject no longer meets the eligibility criteria, they are censored at the last known "good" time point (censor_date) and no longer contribute person-time to the study.

      So for time-to-first-event, as set up in the example:

      Id 1: The minimum outc_date with desired_outcome==1 is 15. Records indicate they were not censored until day 100 (censor_date) and didn't die until day 242. So they contribute the outcome of interest at day 15 (outc_1_type==1).

      Id 2: The minimum outc_date with desired_outcome==1 is 44, but they were censored on day 30 (censor_date). The outcome did not occur until after they stopped contributing exposed person-time, so they do not contribute an event (outc_1_type==0).

      Id 3: The desired outcome is not observed (desired_outcome!=1) before the subject died (death_date==25) while still within the study (death_date<censor_date). So they contribute a competing event as their first exposed event (outc_1_type==2).

      Of course this is just one component of the question with relevant sensitivity analyses set up, but I'm looking for a better way to run this section.
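
      The rules above could be sketched in Stata roughly as follows. This is a minimal sketch under stated assumptions, not a vetted solution. The helper names first_outc, death and censor are hypothetical. It assumes death_date and censor_date are meant to be constant within id (the within-id minimum is used to fill rows where they are missing), and it assumes ties are resolved in favour of the outcome of interest, then death, then censoring.

```stata
clear
input id desired_outcome outc_date death_date censor_date
1 0 7 242 100
1 0 10 242 100
1 1 15 242 100
1 1 20 242 100
2 1 44 . 30
2 1 55 . 30
3 1 52 . 30
3 0 5 25 30
3 0 10 25 30
3 0 15 25 30
3 0 20 25 30
end

* Earliest date of the outcome of interest per subject (missing if never observed)
bysort id: egen first_outc = min(cond(desired_outcome == 1, outc_date, .))

* One death date and one censoring date per subject (assumed constant within id)
bysort id: egen death  = min(death_date)
bysort id: egen censor = min(censor_date)

* Time to first event: min() ignores missing arguments unless all are missing
gen outc_1 = min(first_outc, death, censor)

* Status: 0 = censored, 1 = outcome of interest, 2 = death (competing event)
* Assumed tie-breaking: outcome beats death, death beats censoring
gen byte outc_1_type = 0
replace outc_1_type = 2 if death == outc_1 & !missing(death)
replace outc_1_type = 1 if first_outc == outc_1 & !missing(first_outc)

* Keep one row per subject
bysort id: keep if _n == 1
keep id outc_1 outc_1_type
list, noobs
```

      On the made-up example this gives outc_1 = 15, 30, 25 and outc_1_type = 1, 0, 2 for ids 1 to 3, which matches the structure described in the first post. Because the minimum and the status are each computed in one step, there is no chain of "replace ... if x<=y" lines to get out of order.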



      • #4
        Thanks very much for the details. I think it is going to be more prudent if I leave this to the biostatisticians or medical statisticians familiar with this territory.
