  • survival analysis with overlapping records probable error

    Dear Stata team,
    Hi again,
    I am trying to run a survival analysis with multiple records per subject. Participants come to the clinic and a specimen is taken to measure the outcome, which takes many days to be detected. Sometimes a participant comes to the next visit, and a second sample is collected, before the result of the first sample is available.
    This overlap between visit 2 and the outcome of visit 1 should not matter: the samples should be treated as independent, so it is irrelevant whether the result of visit 1 arrives before the sampling at visit 2. What matters is the duration we observe for each sample until detection.
    Is there a way to get rid of this overlap? Does it affect the analysis?

    I have these variables:
    date of sampling
    date of detection
    detection result (0 for positive and 1 for negative)

    I used the following:
    stset date_of_detection, id(id) failure(detection_result==1) time0(date_of_sampling)

    Any help on this?

    thank you very much

    best regards
    Umama

  • #2
    Hard to say exactly from that information, but my first thought would be to ignore all the visit dates and focus on time to first positive test.

    I'd think carefully about whether you are truly interested in the times to each failure. In my experience, multiple failures greatly complicate the analysis without always adding much or having a ready interpretation (though this obviously depends heavily on your specific disease states).
    ____________________________________________________
    Assistant Professor, Department of Biostatistics and Epidemiology
    School of Public Health and Health Sciences
    University of Massachusetts- Amherst

    • #3
      If I understood correctly (and I'm not sure), the date of the outcome should be the date it was identified, not the date the results reached the office. If so, there won't be any overlap.

      On the other hand, should the event be recurrent, you may consider a different approach, such as stratifying the Cox model and applying a counting process (CP), marginal, or gap-time formulation.
      Best regards,

      Marcos

      • #4
        Hi Marcos,
        To illustrate, I drew up this table:

        id  Date start of treatment  Date of specimen collection  Date of culture detection  Detection result
        1   1 Jan 13                 4 Jan 13                     12 Jan 13                  1
        1                            7 Jan 13                     13 Jan 13                  2
        1                            14 Jan 13                    19 Jan 13                  1
        2   2 Feb 13                 10 Feb 13                    15 Feb 13                  1
        2                            14 Feb 13                    19 Feb 13                  1
        2                            18 Feb 13                    25 Feb 13                  2
        2                            1 Mar 13                     10 Mar 13                  2
        3
        3
        4
        4

        So the overlap can occur between the detection date of the previous visit and the sampling date of the following visit (for example, for subject 1, the first detection on 12 Jan falls after the second sampling on 7 Jan).
        What I want to analyse is the duration (the time from sampling until detection), but when I run the stset command Stata reports a probable error for overlapping records, and I don't know whether I should ignore it or do something about it.

        Also, thank you for the suggestion above; I will look into it further. If there is a command for CP, would you kindly suggest how I can get it?

        thank you very much
        your help is highly appreciated
        best regards
        Umama
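
        One possible way to look at the overlap described in this post is sketched below. It is only an illustration under stated assumptions: the variable names (collect_s, detect_s, collect, detect, days, overlap) are hypothetical, the dates are copied from the illustrative table above, and treating result == 1 as the event is an assumption. The sketch flags specimens collected before the previous specimen's detection date, then sets up one record per specimen with the clock starting at collection, so records are not chained along a single calendar time line.

        ```stata
        * Hypothetical variable names; dates follow the illustrative table in this post
        clear
        input id str9 collect_s str9 detect_s result
        1 "4jan2013"  "12jan2013" 1
        1 "7jan2013"  "13jan2013" 2
        1 "14jan2013" "19jan2013" 1
        2 "10feb2013" "15feb2013" 1
        2 "14feb2013" "19feb2013" 1
        2 "18feb2013" "25feb2013" 2
        2 "1mar2013"  "10mar2013" 2
        end
        gen collect = date(collect_s, "DMY")
        gen detect  = date(detect_s,  "DMY")
        format collect detect %td

        * Flag specimens collected before the previous specimen's detection date
        bysort id (collect): gen byte overlap = _n > 1 & collect < detect[_n-1]

        * Per-specimen analysis time: days from collection to detection
        gen days = detect - collect

        * One record per specimen, no id(): each specimen has its own clock.
        * Assumption: result == 1 marks the event of interest.
        stset days, failure(result == 1)
        ```

        Because specimens from the same participant are unlikely to be independent, a model fitted on data set up this way would normally request cluster-robust standard errors, for example with vce(cluster id) in stcox. Whether this per-specimen setup answers the research question is a separate judgment.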

        • #5
          Hi Andrew,
          thank you for the comment. I was thinking that I could ignore the time of sampling, but the problem is that there are gaps between the sampling dates. Is there any way to get around that?

          many thanks
          best regards

          • #6
            "If there is a command for CP, would you kindly suggest how I can get it?"
            I fear you should preferably take a close look at CP models rather than just applying a command.

            That said, in a few words, you need an "interval" variable, plus the starting time and the stopping time.

            Also, should the PH assumption be violated, you may need to use a stratified CP approach.

            All in all, I strongly recommend that you search further in the literature.

            These are "complex" types of survival analysis, and the commands and their interpretation, let alone the selection of the most appropriate model, deserve attention and care.

            Best regards,

            Marcos
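
            A minimal sketch of the interval layout described in this post, under stated assumptions: the variables t0 and t1 (start and stop of each at-risk interval, in days since treatment start), event, and the covariate x are hypothetical, and the toy data are invented for illustration rather than taken from the thread. The intervals within a subject must not overlap for this setup to be valid.

            ```stata
            * Hypothetical toy data: one row per at-risk interval (t0, t1]
            clear
            input id t0 t1 event x
            1  0  8 1 0
            1  8 15 0 0
            2  0  5 1 1
            2  5 12 1 1
            2 12 20 0 1
            3  0  9 0 0
            4  0  6 1 1
            4  6 14 0 1
            end

            * exit(time .) keeps a subject at risk after a first event (recurrent events)
            stset t1, id(id) failure(event == 1) time0(t0) exit(time .)

            * Counting-process (Andersen-Gill style) Cox model, SEs clustered on subject
            stcox x, efron vce(cluster id)
            ```

            Without exit(time .), stset would drop a subject's records after the first failure, which is the usual single-failure behaviour. A stratified variant would add strata() with an event-order variable, which is not shown here.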

            • #7
              Echoing Marcos's advice, the data setup for multiple failures takes some very careful thought. There is a FAQ here:

              http://www.stata.com/support/faqs/st...ure-time-data/

              and some other helpful links here:

              http://www1.udel.edu/ASA/Therneau_slides_for_packet.pdf

              https://stat.ethz.ch/education/semes...ntation_10.pdf




              • #8
                Thank you for the informative sites.
                So, given that my data are longitudinal and the measurement (the time to detection) is repeated over visits, I should consider CP or a stratified Cox model. From the sites I understood that I may use a conditional model for ordered events.
                For each subject I have a variable for entering the time interval and a variable for exiting it. You mentioned that I need an interval variable: my understanding is that it should be a variable that codes each stratum? For example, in my case it could be the visit number (each visit represents the sampling time, i.e. the entry).
                So I am guessing that the table should look something like this:

                id  Visit#  Date start of treatment  Date of specimen collection  Date of culture detection  Detection result  X
                1   1       1 Jan 13                 4 Jan 13                     12 Jan 13                  1                 1
                1   2                                7 Jan 13                     13 Jan 13                  2                 1
                1   3                                14 Jan 13                    19 Jan 13                  1                 1
                2   1       2 Feb 13                 10 Feb 13                    15 Feb 13                  1                 2
                2   2                                14 Feb 13                    19 Feb 13                  1                 2
                2   3                                18 Feb 13                    25 Feb 13                  2                 2
                2   4                                1 Mar 13                     10 Mar 13                  2                 2
                3
                3
                4
                4

                But first I should check the PH assumption for the exposure of interest, using log-log plots or the test from estat phtest, and if it is violated I need to stratify.
                But do I stratify on the exposure of interest or on Visit#?

                From the site that Andrew provided
                http://www.stata.com/support/faqs/st...ure-time-data/

                this modelling approach sounds reasonable: 3.2.3 The conditional risk set model (time from entry)

                But I need to consider that there are gaps between the visits, so the end time of one visit is not necessarily the entry time for the next visit.

                Also, I still don't know how to deal with the overlapping records problem.

                thank you very much

                • #9
                  Hello Umama,


                  I recommend that you present the command, output, and problem together. For example, for the PH assumption, I kindly suggest posting the command and sharing the results; then we can discuss the alternatives, should there be a violation. In short, you may "extend" the model by using the tvc() option, or you may use the "culprit" variable as a stratum.

                  Please read the FAQ, mainly the section on sharing commands and output.

                  You may use CODE delimiters or install dataex from SSC.

                  Thanks.
                  Best regards,

                  Marcos
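
                  The two remedies mentioned in this post can be sketched as follows. This is an illustration only: it uses the cancer dataset that ships with Stata, not the data discussed in this thread, and treating drug as the "culprit" variable is an assumption made purely for the example.

                  ```stata
                  sysuse cancer, clear
                  stset studytime, failure(died)

                  * Fit the model, then test proportional hazards (scaled Schoenfeld residuals)
                  stcox age drug
                  estat phtest, detail

                  * Remedy 1: let the effect of the assumed culprit vary with analysis time
                  stcox age drug, tvc(drug) texp(_t)

                  * Remedy 2: stratify on the assumed culprit instead of estimating its effect
                  stcox age, strata(drug)
                  ```

                  With strata(), each stratum gets its own baseline hazard, so no hazard ratio is reported for the stratifying variable. That makes stratification a poor fit when the variable violating PH is the main exposure of interest.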

                  • #10
                    That is a great idea, Marcos; thank you for the suggestion.
                    I will check the site you suggested and post the commands and the output on this page.
                    many thanks
                    best regards
                    Umama

                    • #11
                      Hi Marcos,
                      So I tried the tvc() option in the Cox model that I built, but I am not sure what would be a good post-estimation diagnostic for checking the validity of the model.

                      My outcome is the time to culture negative.
                      My time is the time from treatment.
                      I used the following:

                      stset time, id(id) failure(negative==1)
                      stcox X1 X2, efron vce(robust) tvc(X3)

                      X3 is the time-varying covariate, coded as 1 until 4 months of treatment and 2 from then until the end of follow-up. X1 is the main exposure; I have it as a continuous variable, but I am also going to try it as a binary category.

                      Before I used tvc() I was relying on the AIC and the residual plots.
                      For the residuals, this is what I used:

                      predict mg, mgale
                      predict dr, deviance
                      predict xb, xb
                      scatter mg xb
                      scatter dr xb

                      But after I added tvc() to include the time-varying covariate, Stata didn't accept these commands.

                      plenty of thanks!
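
                      As far as the Stata documentation describes, predict does not offer martingale or deviance residuals after stcox with tvc(), which is consistent with the behaviour reported above. One commonly suggested alternative is to split the records at the time where the effect is thought to change and build the interaction by hand. The sketch below assumes this approach and uses the cancer dataset shipped with Stata, not the data from this thread; the cutpoint of 10 and the variables id, treated, treated_late, mg, dr, and xbhat are hypothetical.

                      ```stata
                      sysuse cancer, clear
                      gen long id = _n
                      gen byte treated = drug > 1          // hypothetical binary exposure
                      stset studytime, failure(died) id(id)

                      * Split each subject's follow-up at an assumed cutpoint (t = 10)
                      stsplit period, at(10)

                      * Allow the exposure effect to differ after the cutpoint
                      gen byte treated_late = treated * (period == 10)
                      stcox age treated treated_late, efron

                      * Residual diagnostics are available because tvc() was not used
                      predict mg, mgale
                      predict dr, deviance
                      predict xbhat, xb
                      scatter mg xbhat
                      scatter dr xbhat
                      ```

                      A covariate that takes the same value for everyone at a given analysis time, such as a pure period indicator, drops out of the Cox partial likelihood. That is why the sketch interacts the period with the exposure instead of entering the period on its own.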
