  • Restricting the survival analysis to the first five years since diagnosis

    Hi Everyone,


    Below is an example of the data used in my survival analysis.

    Total follow-up extends to more than 18 years. However, I want to restrict my survival analysis to the first five years since diagnosis.

    datediag dead dateexit id perdiag agegrp
    4429 1 4706 1570110 0 5
    4533 1 6754 1570111 0 5
    4387 1 4650 1570112 1 5
    4669 1 4856 1570113 0 4
    4123 1 4236 1570114 0 3
    4883 0 4896 1570115 0 2
    4324 1 4432 1570116 0 5
    4129 1 4231 1349552 0 5
    4114 1 4155 1349553 0 5
    4220 1 4343 1349554 1 5
    4201 1 4220 1349555 0 5
    5332 1 5449 1349556 0 5


    Could anyone suggest the best way to restrict my analysis to the first five years since diagnosis?

    I am unsure which of these two methods to use:

    * Generating duration variable
    gen duration = dateexit - datediag


    * We only want first five years after diagnosis
    drop if duration > 1825


    or using the stsplit command to split follow-up at 5 years and then dropping the person-time after 5 years:

    stsplit fiveyears, at(5)
    drop if fiveyears==5


  • #2
    The first approach is incorrect and will induce a bias that is likely to be considerable. That approach excludes the individuals who were still alive and under follow-up at 5 years. You want to censor their times at 5 years, not exclude them. I don't recommend it, but the code you want with that approach would be something like:

    Code:
    replace duration=1825 if duration > 1825
    replace dead=0 if duration > 1825
    stset duration, fail(dead)
    The approach using stsplit (which requires the data to have been stset with the id() option first) will work, and is preferable to the first approach, but I think the easiest approach is to impose the censoring when you stset by using the exit() option.

    Code:
    stset dateexit, failure(dead) scale(365.25) origin(datediag) exit(time datediag + 5*365.25)
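    This gives the same analysis as stsplit, except that it is done in one step. Note that stset does not split records: each individual keeps a single observation, with _t capped at 5 years and deaths after 5 years recoded as censored (_d==0). With stsplit, by contrast, each individual who survives more than 5 years has 2 observations, one for the first 5 years and one for the person-time after 5 years (the one you then drop).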
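A minimal do-file sketch that checks the equivalence on the 12 example records from the question. The scalar names and the id(id) option in the second stset are my additions (stsplit needs id()); origin(time datediag) is the explicit form of origin(datediag). It runs the exit() approach and the stsplit-and-drop approach and compares the deaths and person-years at risk within the first 5 years:

```stata
* The 12 example records from the question (dates are Stata daily dates)
clear
input datediag dead dateexit long id perdiag agegrp
4429 1 4706 1570110 0 5
4533 1 6754 1570111 0 5
4387 1 4650 1570112 1 5
4669 1 4856 1570113 0 4
4123 1 4236 1570114 0 3
4883 0 4896 1570115 0 2
4324 1 4432 1570116 0 5
4129 1 4231 1349552 0 5
4114 1 4155 1349553 0 5
4220 1 4343 1349554 1 5
4201 1 4220 1349555 0 5
5332 1 5449 1349556 0 5
end

* Approach 1: censor at 5 years with exit() when stset-ing
stset dateexit, failure(dead) scale(365.25) origin(time datediag) ///
    exit(time datediag + 5*365.25)
generate double pt = _t - _t0 if _st == 1
quietly summarize pt
scalar pt_exit = r(sum)              // person-years at risk in years 0-5
quietly count if _d == 1 & _st == 1
scalar d_exit = r(N)                 // deaths within 5 years
drop pt

* Approach 2: split follow-up at 5 years, then drop the time beyond 5 years
stset dateexit, failure(dead) scale(365.25) origin(time datediag) id(id)
stsplit fiveyears, at(5)
drop if fiveyears == 5
generate double pt = _t - _t0 if _st == 1
quietly summarize pt
scalar pt_split = r(sum)
quietly count if _d == 1 & _st == 1
scalar d_split = r(N)

display "exit():  deaths = " scalar(d_exit) ", person-years = " scalar(pt_exit)
display "stsplit: deaths = " scalar(d_split) ", person-years = " scalar(pt_split)
```

With these records both approaches should report 10 deaths: the one death after 5 years (id 1570111, about 6.1 years after diagnosis) is counted as censored at 5 years rather than dropped.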




    • #3
      Dear Prof. Paul

      Thank you very much for your input. I found your suggestions very helpful.

      I need one more piece of help from you. I am fitting a piecewise exponential regression model to my data.
      I would be very grateful if you could suggest a resource that would help me understand streg output containing interaction terms.

      streg ib0.years ib1.perdiag ib1.agegrp ib0.years#ib1.perdiag, distribution (e) cformat(%4.3f)


      I have attached a screenshot of the output.



      • #4
        The following page contains a video lecture, copy of the lecture notes, code to reproduce the analyses in the lectures, and a link to a document on understanding interactions in Stata.

        http://pauldickman.com/video/interactions/

        The lecture notes use R, but there is a do file with Stata code to reproduce the models (the : operator in R corresponds to # in Stata, and * in R corresponds to ## in Stata).

        There's a lot of math, but I think math is unavoidable if you are to understand interactions. The examples use Cox regression, but the concept is the same with Poisson regression (they are both models for the log hazard).

        As applied to your data:

        The estimated hazard ratio for period (comparing 71-80 to 81-90), holding age constant, *during the first year* is 1.186.
        The estimated hazard ratio for period (comparing 71-80 to 81-90), holding age constant, *during the second year* is 1.186 * 0.854.
        The estimated hazard ratio for period (comparing 71-80 to 81-90), holding age constant, *during the third year* is 1.186 * 0.808.
        etc.
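Spelled out, each of these hazard ratios is a product of exponentiated coefficients, because the model is additive on the log-hazard scale. The notation here is mine, for illustration: β is the coefficient for perdiag == 0 (reference perdiag == 1), γ_j is the coefficient for the years == j by perdiag == 0 interaction, years == 0 is the first year, and γ_0 = 0. The numbers are the rounded values quoted above:

```latex
\mathrm{HR}(\text{years}=j) = \exp(\beta + \gamma_j) = e^{\beta}\, e^{\gamma_j}, \qquad \gamma_0 = 0

\mathrm{HR}(0) = 1.186, \qquad
\mathrm{HR}(1) = 1.186 \times 0.854 \approx 1.013, \qquad
\mathrm{HR}(2) = 1.186 \times 0.808 \approx 0.958
```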

        You could get the hazard ratios, with confidence intervals, for years 2 and 3 using the -lincom- command.
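A sketch of the -lincom- calls. The original data are not available here, so to make it runnable on its own it simulates a hypothetical dataset (seed, sample size, baseline hazard and period effect are arbitrary assumptions, not the poster's data). The coefficient names assume perdiag is coded 0/1 with 1 as the base (ib1.) and years is coded 0 to 4 with 0 as the base (ib0.), as in the model in #3:

```stata
* HYPOTHETICAL simulated data, only so that the example runs on its own
clear
set seed 12345
set obs 5000
generate long id      = _n
generate byte perdiag = runiform() < 0.5          // 0/1 period indicator
generate byte agegrp  = 1 + floor(4*runiform())   // age groups 1-4
* constant hazard of 0.2 per year, about 20% higher when perdiag == 0
generate double t     = -ln(runiform()) / (0.2*exp(0.18*(perdiag == 0)))
generate byte   dead  = t <= 5
replace t = 5 if t > 5                            // censor at 5 years

stset t, failure(dead) id(id)
stsplit years, at(1(1)4)                          // years = 0,1,2,3,4

streg ib0.years ib1.perdiag ib1.agegrp ib0.years#ib1.perdiag, ///
    distribution(exponential) nolog

* Period hazard ratio (perdiag 0 vs 1) in the second (years==1) and third (years==2) year
lincom _b[0.perdiag] + _b[1.years#0.perdiag], hr
lincom _b[0.perdiag] + _b[2.years#0.perdiag], hr
```

With the real data, only the two lincom lines would be needed, run directly after the streg model from #3; each reports the hazard ratio with its confidence interval.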

        You could also get the 5 hazard ratios for the effect of period (one for each of the 5 years) by using the following syntax:

        Code:
        streg ib1.perdiag ib1.agegrp ib0.years#ib1.perdiag, distribution (e) cformat(%4.3f)
        The reasons are explained in my lecture notes linked above.

        I'm assuming the variable years is the result of stsplit (or similar), so fitting the interaction effectively relaxes the proportional hazards assumption. You're probably already aware of that, but I mention it to ensure we're on the same page. A joint test of the 4 interaction parameters can be viewed as a test of proportional hazards.
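The same point in symbols, as a sketch (notation is mine): λ_j is the log baseline hazard in year interval j = 0 to 4, P = 1 for perdiag == 0 and 0 for the reference period, β is the period main effect, and γ_j is the interaction for interval j. The period hazard ratio is the same in every interval exactly when all 4 interaction parameters are zero, which is what a 4-df Wald test (for example -test- or -testparm- on the interaction terms after streg) assesses:

```latex
\log h(t \mid \text{years}=j,\ P,\ \text{age}) = \lambda_j + \beta P + \gamma_j P + (\text{age terms}), \qquad \gamma_0 = 0

H_0:\ \gamma_1 = \gamma_2 = \gamma_3 = \gamma_4 = 0
\quad\Longleftrightarrow\quad
\mathrm{HR}(\text{years}=j) = e^{\beta} \ \text{for every } j
```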

        As an aside, the page with video lectures is new (it's not linked from anywhere and this is the first time I've advertised it publicly). All comments and suggestions are welcome.



        • #5
          Dear Prof. Paul,

          Thank you for the elaborate response. Let me go through the material. I will get back to you if I have any questions.

          Once again, thank you very much for your time.

          Regards
          Pavan



          • #6
            Quoting Paul Dickman (#2):
            The first approach is incorrect and will induce a bias that is likely to be considerable. That approach will exclude the individuals that did not die within 5 years. You want to censor their times at 5 years, not exclude them. I don't recommend it, but the code you want with that approach would be something like:

            Code:
            replace duration=1825 if duration > 1825
            replace dead=0 if duration > 1825
            stset duration, fail(dead)

            Dear Paul and Pavan,

            Thank you for the above approach; I had been looking for this for quite some time! I would just like to add that the second step in the above code (setting the failure variable to 0, i.e. censored, for follow-up longer than 5 years) has to be done first, before all follow-up times longer than 5 years are replaced by 5 years. Once those times have been replaced, Stata can no longer find any observation with more than 5 years of follow-up, so the failure variable is never changed for them. The code would then be:

            replace dead=0 if duration > 1825
            replace duration=1825 if duration > 1825
            stset duration, fail(dead)

            Best,
            Nitya
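A minimal check of why the order matters, using the first three records from the question (the second record died about 6.1 years after diagnosis). The variables dur_wrong and dead_wrong are hypothetical, added only to show the original order side by side. Note that 1825 days is 5 × 365, whereas the exit() example in #2 uses 5 × 365.25 = 1826.25 days; that difference does not affect these records:

```stata
* First three records from the question
clear
input datediag dead dateexit
4429 1 4706
4533 1 6754
4387 1 4650
end
generate duration = dateexit - datediag

* Original order: time is capped first, so the censoring line changes nothing
generate dur_wrong  = duration
generate dead_wrong = dead
replace dur_wrong  = 1825 if dur_wrong > 1825
replace dead_wrong = 0    if dur_wrong > 1825

* Corrected order: censor the death first, then cap follow-up at 1825 days
replace dead     = 0    if duration > 1825
replace duration = 1825 if duration > 1825

list duration dead dur_wrong dead_wrong
stset duration, failure(dead)
```

In the listing, the second record should show dead == 0 with the corrected order but dead_wrong == 1 with the original order, i.e. a death after 5 years wrongly counted at exactly 1825 days.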

