  • Interval censored survival using aggregate data

    Hello,

    I have data on vials of Drosophila infected with a virus. At each time point, I record how many flies are still alive. I would like to do an interval-censored survival analysis to determine how many are alive at the end.


    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(STRAIN infected startingsample time1 time2 time3 time4 time5 time6) str1 gender byte age
    1 1 20 19 19 18 18 18 15 "F" 2
    2 0 15 14 14 14 14 12 12 "M" 4
    3 1 30 29 29 29 29 29 29 "F" 3
    4 1 15 15 15 15 13 12 10 "F" 3
    5 0 10 10  9  9  8  7  7 "F" 3
    6 1 20 17 16 16 16 14 14 "M" 4
    7 0  9  8  5  3  0  0  0 "F" 2
    8 0 12 10 10 10 10  9  7 "M" 3
    end
    ------------------ copy up to and including the previous line ------------------


    I used the code below:

    Code:
    stintreg i.infected gender , interval(t6 startingsample) distribution(weibull)


    This code doesn't allow me to enter all the timepoints in the interval. Do I only need to enter the last sample recorded regardless of other time points and the starting sample? Thank you in advance.

  • #2
    Well, the code you show will fail, if only for the silly reason that gender is a string variable and so cannot be used in the regression. More substantively, you have the wrong data structure here. What you need is to go to long layout (consecutive time periods in separate observations, not as separate variables in a single observation), and calculate the beginning and end points of the interval each observation corresponds to, along with the number that died in that interval. (The survivors in the final interval, from time6 to infinity, will all be right censored.) Then you can use -stintreg-. So, like this:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(STRAIN infected startingsample time1 time2 time3 time4 time5 time6) str1 gender byte age
    1 1 20 19 19 18 18 18 15 "F" 2
    2 0 15 14 14 14 14 12 12 "M" 4
    3 1 30 29 29 29 29 29 29 "F" 3
    4 1 15 15 15 15 13 12 10 "F" 3
    5 0 10 10  9  9  8  7  7 "F" 3
    6 1 20 17 16 16 16 14 14 "M" 4
    7 0  9  8  5  3  0  0  0 "F" 2
    8 0 12 10 10 10 10  9  7 "M" 3
    end
    
    rename time* n_alive*
    reshape long n_alive, i(STRAIN) j(begin)
    by STRAIN (begin): gen end = begin[_n+1], after(begin)
    by STRAIN (begin): gen n_died = cond(_n == _N, n_alive, n_alive - n_alive[_n+1])
    
    encode gender, gen(sex)
    
    stintreg i.infected i.sex [fweight = n_died], interval(begin end) distribution(weibull)
    Added: I'm not sure it makes sense to use a Weibull, or any other continuous distribution, unless the time periods 1, 2, 3, 4, 5, 6 are actually equally spaced in real-world time (e.g., if they are consecutive days). If they are irregularly spaced, then I think fitting a continuous distribution in this way is going to be, at best, a distortion, if not completely meaningless. What I would do, instead, is replace the values 1, 2, 3, 4, 5, and 6 by the actual amount of elapsed time from the start of the experiment and then proceed. (You have to do that between the reshape and the immediately following command that creates variable end.)
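
    As a minimal sketch of that replacement step, suppose (purely hypothetically, since the thread never gives the real schedule) that the six counts were taken on days 1, 2, 4, 7, 10 and 14. The recode sits between the reshape and the command that creates end. Only two strains from the example data are used here to keep the listing short.

    ```stata
    * Sketch only: the elapsed-day values are hypothetical, not from the thread
    clear
    input byte(STRAIN infected startingsample time1 time2 time3 time4 time5 time6) str1 gender
    1 1 20 19 19 18 18 18 15 "F"
    7 0  9  8  5  3  0  0  0 "F"
    end

    rename time* n_alive*
    reshape long n_alive, i(STRAIN) j(begin)

    * map period index to assumed elapsed days; periods 1 and 2 stay as they are
    recode begin (3 = 4) (4 = 7) (5 = 10) (6 = 14)

    * each interval ends where the next one begins; the last end is missing
    by STRAIN (begin), sort: gen end = begin[_n+1], after(begin)

    list STRAIN begin end n_alive, sepby(STRAIN)
    ```

    With the real elapsed times in place of these assumed ones, the rest of the code proceeds unchanged.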
    Last edited by Clyde Schechter; 29 Aug 2022, 18:31.

    • #3
      On further reflection, if you are fitting a parametric distribution, it looks to me as though what you are calling time1 really corresponds to time = 0. That is, no deaths happened, nor could have happened, before that time, within the scope of your study. If that is correct, then the code in #2 needs to be amended:
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte(STRAIN infected startingsample time1 time2 time3 time4 time5 time6) str1 gender byte age
      1 1 20 19 19 18 18 18 15 "F" 2
      2 0 15 14 14 14 14 12 12 "M" 4
      3 1 30 29 29 29 29 29 29 "F" 3
      4 1 15 15 15 15 13 12 10 "F" 3
      5 0 10 10  9  9  8  7  7 "F" 3
      6 1 20 17 16 16 16 14 14 "M" 4
      7 0  9  8  5  3  0  0  0 "F" 2
      8 0 12 10 10 10 10  9  7 "M" 3
      end
      
      rename time* n_alive*
      reshape long n_alive, i(STRAIN) j(begin)
      replace begin = begin - 1
      by STRAIN (begin), sort: gen end = begin[_n+1], after(begin)
      by STRAIN (begin): gen n_died = cond(_n == _N, n_alive, n_alive - n_alive[_n+1])
      
      encode gender, gen(sex)
      
      stintreg i.infected i.sex [fweight = n_died], interval(begin end) distribution(weibull)
      Without that change, you will be trying to fit a Weibull distribution to a data set where everything is immortal between time 0 and 1, and then deaths begin occurring thereafter. It will take a considerable contortion of the parameters of the Weibull distribution to fit that and I wouldn't trust the results to be even approximately reflective of the real process here.
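
      Before fitting anything, it may be worth checking that the reshaped data account for every fly exactly once. The following is a sketch of one such check, using two strains from the example data; the variable n_total is a name introduced here for illustration only.

      ```stata
      * Sketch only: a sanity check on the interval structure, not part of the model
      clear
      input byte(STRAIN time1 time2 time3 time4 time5 time6)
      1 19 19 18 18 18 15
      7  8  5  3  0  0  0
      end

      rename time* n_alive*
      reshape long n_alive, i(STRAIN) j(begin)
      replace begin = begin - 1
      by STRAIN (begin), sort: gen end = begin[_n+1], after(begin)
      by STRAIN (begin): gen n_died = cond(_n == _N, n_alive, n_alive - n_alive[_n+1])

      * deaths per interval plus final survivors should equal the first count
      by STRAIN (begin): egen n_total = total(n_died)
      by STRAIN (begin): assert n_total == n_alive[1]

      * the first interval should now start at time 0
      summarize begin
      assert r(min) == 0

      list STRAIN begin end n_alive n_died, sepby(STRAIN)
      ```

      If either assert fails, the counts are not monotone or the time shift was not applied, and the weights passed to -stintreg- would not mean what they are supposed to.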

      • #4
        Hi Clyde, thank you very much. Regarding the distribution, the stintreg command requires specification of a distribution, which I think is odd. The "starting sample" is the sample without any deaths. At time1, I have recorded some deaths and the total N goes down. Should I be trying to fit a non-parametric model instead of using stintreg, as my real data actually have some missing time points for some strains? The Stata manual mentions the stintcox command, but I cannot find it in my version 17 installation.

        • #5
          Regarding the distribution, the stintreg command requires specification of a distribution which I think is odd.
          I would parse stintreg as st = survival time command, int = interval censored, reg = regression. So it's a generalization of -streg-, which fits parametric survival models to right-censored data, extended to interval-censored data. So it doesn't surprise me that it requires a distribution.

          The "starting sample" is the sample without any deaths.
          I hadn't noticed that in the original post. So starting sample is the equivalent of time 0. So I would change the code to:
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input byte(STRAIN infected startingsample time1 time2 time3 time4 time5 time6) str1 gender byte age
          1 1 20 19 19 18 18 18 15 "F" 2
          2 0 15 14 14 14 14 12 12 "M" 4
          3 1 30 29 29 29 29 29 29 "F" 3
          4 1 15 15 15 15 13 12 10 "F" 3
          5 0 10 10  9  9  8  7  7 "F" 3
          6 1 20 17 16 16 16 14 14 "M" 4
          7 0  9  8  5  3  0  0  0 "F" 2
          8 0 12 10 10 10 10  9  7 "M" 3
          end
          
          rename startingsample time0
          rename time* n_alive*
          reshape long n_alive, i(STRAIN) j(begin)
          by STRAIN (begin): gen end = begin[_n+1], after(begin)
          by STRAIN (begin): gen n_died = cond(_n == _N, n_alive, n_alive - n_alive[_n+1])
          
          encode gender, gen(sex)
          
          stintreg i.infected i.sex [fweight = n_died], interval(begin end) distribution(weibull)
          But given that you weren't expecting to choose a distribution, I worry that you chose Weibull for no particular reason. It may, indeed, make more sense then to do a Cox proportional hazards model using -stintcox-. I do not know why you cannot find it in your Stata 17 installation. It is definitely a part of official Stata in version 17. If I were you, I would run -update all- to make sure your Stata installation is completely up to date. If it still doesn't appear, I would uninstall and reinstall Stata from scratch. And if that doesn't resolve it, I would contact Stata Technical Support.
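
          A quick way to see whether the command is visible to your installation, before going as far as reinstalling, might look like the sketch below. It relies only on -about- and -which-; the messages displayed are my own wording.

          ```stata
          * Report the installed Stata version and revision date
          about

          * -which- returns a nonzero return code if the command cannot be found
          capture which stintcox
          if _rc != 0 {
              display as error "stintcox not found: try -update all-, then reinstall if needed"
          }
          else {
              display as text "stintcox is available"
          }
          ```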

          If you do move to -stintcox- for this, bear in mind that -stintcox-, unlike -stintreg-, does not support weights. So just before the -encode gender- command you will need to:
          Code:
          drop if n_died == 0
          expand n_died
          and then you can run -stintcox-.
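
          Put together with the data preparation above, the whole sequence could be sketched as follows. This is an illustration under the assumptions already made in this thread (starting sample treated as time 0, periods treated as equally spaced). Whether -stintcox- converges comfortably on a data set this small is something you would need to check.

          ```stata
          * Sketch only: one observation per fly, then a Cox model for interval-censored data
          clear
          input byte(STRAIN infected startingsample time1 time2 time3 time4 time5 time6) str1 gender byte age
          1 1 20 19 19 18 18 18 15 "F" 2
          2 0 15 14 14 14 14 12 12 "M" 4
          3 1 30 29 29 29 29 29 29 "F" 3
          4 1 15 15 15 15 13 12 10 "F" 3
          5 0 10 10  9  9  8  7  7 "F" 3
          6 1 20 17 16 16 16 14 14 "M" 4
          7 0  9  8  5  3  0  0  0 "F" 2
          8 0 12 10 10 10 10  9  7 "M" 3
          end

          rename startingsample time0
          rename time* n_alive*
          reshape long n_alive, i(STRAIN) j(begin)
          by STRAIN (begin), sort: gen end = begin[_n+1], after(begin)
          by STRAIN (begin): gen n_died = cond(_n == _N, n_alive, n_alive - n_alive[_n+1])

          * replace frequency weights with physical copies of each observation
          drop if n_died == 0
          expand n_died

          * the result should have one row per fly in the starting samples
          count

          encode gender, gen(sex)

          stintcox i.infected i.sex, interval(begin end)
          ```

          The count after -expand- should match the sum of the starting samples, which is a cheap check that no flies were lost or duplicated along the way.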

          • #6
            Wonderful explanation as always Clyde. Thank you. This worked exactly as I wanted.
