  • Creating time-span data for survival analysis

    Dear Community

    I have a cohort dataset of patients (patid) who undergo a test (K) on a date (Ktestdate) between entry (indexdate) and exit (exit). Patients may have had no test, one test, or multiple tests. I am trying to set the data up as survival data so that I can fit a Cox regression model to analyse the relationships of exposure and covariates with test frequency.

    I have got as far as creating a time-span variable (Ktime0) to indicate the time between the date of the last test and the next failure (K).

    What I want to end up with is an additional observation covering the time between the last test (K) and the end of follow-up (exit). For example, for a patient with only one failure event I want two observations: one from entry (indexdate) until the failure (K), and a second from the failure (K) until the end of follow-up (exit).
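A minimal sketch of that target layout for a single-test patient, assuming the dates of patient 3 from the example data posted later in the thread (indexdate 18084, test on 18085, exit 18112). The variable names start, stop and event are hypothetical, not taken from the original data.

```stata
* Sketch: target two-record layout for one patient with a single test.
* Dates borrowed from patient 3 in the thread's example data; start,
* stop and event are hypothetical variable names.
clear
input float id int(indexdate start stop) byte event
3 18084 18084 18085 1
3 18084 18085 18112 0
end
format %td indexdate start stop

* Each record covers (start, stop]; event = 1 means a test occurred at stop.
* exit(time .) keeps the patient at risk after the test (multiple failures).
stset stop, id(id) failure(event) time0(start) origin(time indexdate) exit(time .)
list id start stop _t0 _t _d, noobs
```

Analysis time here is in days since indexdate (0 to 1, then 1 to 28); adding scale(365.25) would express it in years, as in the later posts.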

    I hope I have outlined the problem clearly and would massively appreciate any help on this matter.

    Kind regards

    Charlie
    Public Health / Health Economics MSc student.

  • #2
    Welcome to the Stata Forum / Statalist.

    Please share data (real or mock, full or abridged, depending on the situation) within CODE delimiters, as recommended in the FAQ.

    You may wish to use a toy example for that as well.

    This is the best way to get helpful replies.
    Best regards,

    Marcos



    • #3
      Charlie:
      as an aside to Marcos' helpful advice, I would take a look at the -stsplit- entry in the Stata .pdf manual.
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        Thanks, Carlo and Marcos.

        Stata 14, Windows

        The dataset has 230,000 subjects and 1.1 million observations, in long-form multiple-event data.

        My example data show the three cases: multiple, missing, or single failures (K).

        I want to model failure rates over the whole time exposed, using Poisson or Cox regression.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float id int indexdate byte gender double(Ktestdate K) float(dob exit)
        1 18007 2              18018  4.199999809265137 -7123.5 19813
        1 18007 2 18687.000000000004  4.199999809265137 -7123.5 19813
        1 18007 2 19081.000000000004  4.300000190734863 -7123.5 19813
        1 18007 2              18044                4.5 -7123.5 19813
        1 18007 2              19374 3.9000000953674316 -7123.5 19813
        2 19682 2              19813                  . -7853.5 19813
        3 18084 2              18085  4.300000190734863 -4566.5 18112
        end
        format %td indexdate
        format %d Ktestdate
        format %td dob
        format %td exit
        When I use "stset Ktestdate, failure(K) id(id) origin(dob) enter(indexdate) exit(exit) scale(365.25)", the time between the last failure and exit from the study is not included, so the total time exposed and at risk is lower than it should be.
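To see how much time is lost, a sketch using a trimmed copy of the example data above (lasttest and lost_days are hypothetical names) that computes, for each patient, the days between the last actual test and exit. With Ktestdate as the time variable, each patient's last record ends at that test, so this stretch is never at risk. Separately, as we read the -stset- syntax, a bare variable name in enter(), exit() or origin() is treated as an event indicator rather than a date; the date forms are enter(time indexdate), exit(time exit) and origin(time dob), which is worth checking in the -stset- help.

```stata
* Sketch on a trimmed copy of the example data; lasttest and lost_days
* are illustrative names, not part of the original dataset.
clear
input float id double(Ktestdate K) float exit
1 18018 4.2 19813
1 18687 4.2 19813
1 19081 4.3 19813
1 18044 4.5 19813
1 19374 3.9 19813
2 19813   . 19813
3 18085 4.3 18112
end
format %td Ktestdate exit

* Date of each patient's last actual test (rows with K missing are not tests)
bysort id: egen double lasttest = max(cond(missing(K), ., Ktestdate))
format %td lasttest

* Days of follow-up after the last test that stset on Ktestdate never covers
gen lost_days = exit - lasttest

* One line per patient
egen byte first = tag(id)
list id lasttest exit lost_days if first, noobs
```

Patient 2 has no test, so lost_days is missing; for such patients the whole span from indexdate to exit should become a single censored record.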

        I tried:

        Code:
        snapspan id Ktestdate indexdate-gender K-exit, generate(Ktime0)
        gen start=max(indexdate, Ktime0)
        format start %td
        stset exit, failure(K) origin(start) time0(start) exit(time .) scale(365.25) id(id)

        This results in all time between indexdate and exit being captured, which I want, but omits some of the observations:

        ------------------------------------------------------------------------------
        7 total observations
        4 overlapping records (exit[_n-1]>start) PROBABLE ERROR
        ------------------------------------------------------------------------------
        3 observations remaining, representing
        3 subjects
        2 failures in single-failure-per-subject data
        5.38 total analysis time at risk and under observation
        at risk from t = 0
        earliest observed entry t = 0
        last observed exit t = 4.944559
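A sketch of why those records are flagged, assuming the data already contain Ktime0 as in the later example in #11. Every record uses exit as its end time, so all of patient 1's records end on the same day, and each one after the first starts before the previous one ends. The check below mirrors the exit[_n-1]>start condition in the stset report; overlap is a hypothetical variable name.

```stata
* Sketch: reproduce the overlap check on data shaped like #11.
clear
input float id int indexdate double(Ktestdate Ktime0) float exit
1 18007 18018     . 19813
1 18007 18044 18018 19813
1 18007 18687 18044 19813
1 18007 19081 18687 19813
1 18007 19374 19081 19813
2 19682 19813     . 19813
3 18084 18085     . 18112
end
format %td indexdate Ktestdate Ktime0 exit

* Start of each record, as in the attempt above (max() ignores missing values)
gen double start = max(indexdate, Ktime0)

* Every record ends at exit, so a later record starts before the previous one ends
bysort id (start): gen byte overlap = _n > 1 & exit[_n-1] > start
count if overlap
```

Ending each record at its own Ktestdate, and adding one extra record per patient from the last test to exit, removes the overlap.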

        What I want to produce is the layout described in http://www.stata.com/support/faqs/st...re-time-data/: "for each patient there must be one observation per event or time interval".

        Code:
        id     group   time0   time   status   number   size
        1   placebo       0      1        0        1      3
        2   placebo       0      4        0        2      0
        3   placebo       0      7        0        1      0
        4   placebo       0     10        0        5      0
        5   placebo       0      6        1        4      0
        5   placebo       6     10        0        4      0
        6   placebo       0     14        0        1      0
        7   placebo       0     18        0        1      0
        8   placebo       0      5        1        1      3
        8   placebo       5     18        0        1      3
        9   placebo       0     12        1        1      1
        9   placebo      12     16        1        1      1
        9   placebo      16     18        0        1      1

        Thanks






        • #5
          Sorry, the last table should be:

          Code:
          id     group   time0   time   status   number   size
          1   placebo       0      1        0        1      3
          2   placebo       0      4        0        2      0
          3   placebo       0      7        0        1      0
          4   placebo       0     10        0        5      0
          5   placebo       0      6        1        4      0
          5   placebo       6     10        0        4      0  
          6   placebo       0     14        0        1      0 
          7   placebo       0     18        0        1      0
          8   placebo       0      5        1        1      3
          8   placebo       5     18        0        1      3 
          9   placebo       0     12        1        1      1
          9   placebo      12     16        1        1      1
          9   placebo      16     18        0        1      1
          Last edited by Charlie Kenward; 10 Aug 2017, 10:42.



          • #6
            Sorry, the table is shown clearly at the link http://www.stata.com/support/faqs/st...re-time-data/



            • #7
              Charlie:
              you may want a dataset like the one reported under Example 10 in the -stcox- entry of the Stata .pdf manual.
              Your contribution #5 is unreadable: please always use CODE delimiters.
              I'm also not clear whether you used the multiple-failures option provided by -stset-.
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                Carlo,

                Apologies for the table, I've corrected it.

                By the multiple-failures option, do you mean exit(time .)?

                I am really after a dataset like the one now corrected in post #5, so that for each id I have observations containing the time between failures, with the last observation covering the time between the last failure and the end of the study (exit).







                • #9
                  Charlie:
                  -dataex- is a useful way to share an example or excerpt of your dataset with other listers (type -search dataex- from within Stata to install it. Thanks).
                  That said, you may want something like the following example (which elaborates a bit on yours):
                  Code:
                  . stset risk_time, id(id) failure( status ) exit(time)
                  
                                  id:  id
                       failure event:  status != 0 & status < .
                  obs. time interval:  (risk_time[_n-1], risk_time]
                   exit on or before:  time time
                  
                  ------------------------------------------------------------------------------
                           13  total observations
                            1  observation begins on or after exit
                  ------------------------------------------------------------------------------
                           12  observations remaining, representing
                            9  subjects
                            4  failures in multiple-failure-per-subject data
                           77  total analysis time at risk and under observation
                                                                  at risk from t =         0
                                                       earliest observed entry t =         0
                                                            last observed exit t =        18
                  where the added variable -risk_time- is obtained as follows:
                  Code:
                  g risk_time= time- time0
                  Kind regards,
                  Carlo
                  (Stata 19.0)
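For comparison, a minimal sketch of a common multiple-failure setup for the post #5 layout, assuming time0 and time are the start and end of each interval: pass time0() to -stset- directly instead of differencing the two. The variable at_risk is a hypothetical name added only to total the time at risk.

```stata
* Sketch: post #5 layout, records are (time0, time]
clear
input id str7 group time0 time status number size
1 placebo  0  1 0 1 3
2 placebo  0  4 0 2 0
3 placebo  0  7 0 1 0
4 placebo  0 10 0 5 0
5 placebo  0  6 1 4 0
5 placebo  6 10 0 4 0
6 placebo  0 14 0 1 0
7 placebo  0 18 0 1 0
8 placebo  0  5 1 1 3
8 placebo  5 18 0 1 3
9 placebo  0 12 1 1 1
9 placebo 12 16 1 1 1
9 placebo 16 18 0 1 1
end

* exit(time .) keeps each subject under observation after a failure
stset time, id(id) failure(status) time0(time0) exit(time .)

* Time at risk contributed by each record (illustrative variable)
gen at_risk = _t - _t0
quietly summarize at_risk
display "total time at risk: " r(sum)
```

On these 13 records this keeps every observation, with 4 failures and 100 time units at risk.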



                  • #10
                    I fail to understand the model because, according to #4, the variable used to flag the event ("failure", or K) is continuous, whereas it should be binary if the model is an "ordinary" survival analysis, or categorical if there are multiple failures.
                    Best regards,

                    Marcos
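One way to address this, sketched under the assumption that any recorded test result counts as an event: -stset-'s failure(varname) with no numlist treats every nonzero, nonmissing value as a failure (as the line "failure event: status != 0 & status < ." in the output above shows), so failure(K) with a continuous K flags every recorded test unless K happens to be 0. An explicit 0/1 indicator makes this visible, and the continuous K can then be used as a covariate if wanted. Kevent is the name used later in the thread; the four rows below are a toy excerpt.

```stata
* Toy excerpt: K is the continuous test result, missing when no test
clear
input float id double K
1 4.2
1 4.5
2 .
3 4.3
end

* Binary failure indicator: a recorded result marks a test event
gen byte Kevent = !missing(K)
list, noobs
```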



                    • #11
                      Carlo and Marcos

                      This is a sample of my dataset, which I am trying to arrange into the format in post #5.

                      I have created a binary failure variable, Kevent:

                      Code:
                      * Example generated by -dataex-. To install: ssc install dataex
                      clear
                      input float id int indexdate byte gender double(Ktestdate K) float(dob exit) double Ktime0 float Kevent
                      1 18007 2              18018  4.199999809265137 -7123.5 19813                  . 1
                      1 18007 2              18044                4.5 -7123.5 19813              18018 1
                      1 18007 2 18687.000000000004  4.199999809265137 -7123.5 19813              18044 1
                      1 18007 2 19081.000000000004  4.300000190734863 -7123.5 19813 18687.000000000004 1
                      1 18007 2              19374 3.9000000953674316 -7123.5 19813 19081.000000000004 1
                      2 19682 2              19813                  . -7853.5 19813                  . 0
                      3 18084 2              18085  4.300000190734863 -4566.5 18112                  . 1
                      end
                      format %td indexdate
                      format %d Ktestdate
                      format %td dob
                      format %td exit
                      format %d Ktime0

                      I used the command: "snapspan id Ktestdate indexdate gender K dob exit Kevent, generate(Ktime0)" to try to create span data.

                      I can't work out how to create an observation for the time between the last failure event and the exit time.

                      Thank you

                      Charlie



                      • #12
                        Thanks for all your help, Carlo and Marcos; I have now solved the problem.
                        I created a dummy observation at the end of each patient's records.
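The code behind this is not shown, so the following is only a sketch of one way to add such a dummy record, assuming the variable names of the example data in #11 (stop, start, last and dummy are hypothetical names). Each patient gets one extra censored record ending at exit; each record then starts at the previous test date, or at indexdate for the first.

```stata
* Sketch, assuming the variable names of the example data in #11
clear
input float id int indexdate double(Ktestdate K) float exit
1 18007 18018 4.2 19813
1 18007 18044 4.5 19813
1 18007 18687 4.2 19813
1 18007 19081 4.3 19813
1 18007 19374 3.9 19813
2 19682 19813   . 19813
3 18084 18085 4.3 18112
end
format %td indexdate Ktestdate exit

* Binary failure indicator: a recorded test result is an event
gen byte Kevent = !missing(K)

* Add one dummy record per patient (copied from one of their rows)
bysort id: gen byte last = _n == _N
expand 2 if last, generate(dummy)

* The dummy record ends at exit and is censored
gen double stop = cond(dummy, exit, Ktestdate)
replace K = . if dummy
replace Kevent = 0 if dummy

* Untested patients: drop the placeholder row; their dummy record remains
drop if Kevent == 0 & !dummy

* Drop a zero-length dummy record (last test on the exit date)
sort id stop dummy
by id: drop if dummy & _n > 1 & stop[_n-1] >= stop

* Each record starts at the previous test, or at indexdate for the first
by id: gen double start = cond(_n == 1, indexdate, stop[_n-1])
format %td stop start

* Multiple-failure setup; the time keyword makes indexdate a date
stset stop, id(id) failure(Kevent) time0(start) origin(time indexdate) exit(time .) scale(365.25)
list id start stop Kevent _t0 _t _d, sepby(id) noobs
```

On the example data this gives 9 records and 6 failures, with 1,965 days (about 5.38 years) at risk, so no test is dropped. Two tests on the same date would still produce a zero-length record that -stset- excludes, so ties are worth checking in the full dataset.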
