Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Stlist not creating new episodes.

    Dear All,

    I am trying to study how COVID-19 infections accelerates onset of chronic pain conditions. I am new to survival analysis in Stata and struggling with the stsplit command. Here is the dataex of my data:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(grpatidtreat d t0 failtime origin maxhadCOVID) byte(Asian White Black Hispanic age6579 age80plus Female Male) str1 race float(agecat sexcat timeCOVID)
     130 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
     194 0 1 18 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
     379 1 1 28 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
     598 1 1 20 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
     633 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
     686 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
     707 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
     836 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
     894 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
     900 1 1 25 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
     977 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
    1027 0 1 31 690 0 1 0 0 0 1 0 1 0 "A" 1 1 .
    1028 1 1 10 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
    1116 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
    1224 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    1283 0 1 31 690 0 0 0 0 1 1 0 0 1 "H" 1 2 .
    1323 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
    1423 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    1438 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    1599 1 1 13 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
    1639 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
    1879 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
    1903 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
    1955 0 1  7 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
    1986 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
    1988 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    2112 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    2159 0 1 31 690 0 0 0 1 0 1 0 0 1 "B" 1 2 .
    2307 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
    2369 1 1 17 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
    2389 0 1 31 690 0 0 0 0 1 1 0 0 1 "H" 1 2 .
    2423 0 1 31 690 0 0 0 0 0 1 0 1 0 ""  1 1 .
    2537 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
    2539 0 1 31 690 0 0 0 0 1 1 0 1 0 "H" 1 1 .
    2588 1 1  9 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    2689 0 1 31 690 0 0 0 0 1 1 0 0 1 "H" 1 2 .
    2705 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
    2771 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
    2779 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    2859 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
    2865 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    3039 0 1 31 690 0 1 0 0 0 0 1 1 0 "A" 2 1 .
    3118 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
    3122 1 1 19 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
    3204 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    3460 1 1 17 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    3505 0 1 31 690 0 0 0 0 0 0 1 1 0 ""  2 1 .
    3571 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    3578 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    3647 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
    3833 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
    3876 1 1  2 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
    3935 1 1 15 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
    3995 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
    4088 0 1 31 690 0 0 0 0 0 1 0 1 0 ""  1 1 .
    4129 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
    4219 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
    4263 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
    4279 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
    4643 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
    4669 0 1 31 690 0 1 0 0 0 1 0 0 1 "A" 1 2 .
    4691 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    4754 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
    4934 1 1 18 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
    4964 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
    5079 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
    5166 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
    5231 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
    5344 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    5408 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    5627 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    6065 0 1 31 690 0 0 0 1 0 1 0 1 0 "B" 1 1 .
    6138 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    6151 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
    6261 1 1  7 690 0 0 0 1 0 1 0 0 1 "B" 1 2 .
    6714 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    6753 0 1 24 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    6788 1 1  6 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
    6926 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
    6935 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
    6972 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
    7034 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
    7116 0 1 31 690 0 0 0 0 1 1 0 1 0 "H" 1 1 .
    7127 0 1 31 690 0 0 0 1 0 1 0 0 1 "B" 1 2 .
    7159 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
    7162 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    7183 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
    7252 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    7384 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
    7461 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
    7505 1 1 25 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    7634 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
    7646 1 1 31 690 0 1 0 0 0 1 0 1 0 "A" 1 1 .
    7866 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
    7886 0 1 31 690 0 0 0 0 1 1 0 1 0 "H" 1 1 .
    7906 1 1 12 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
    7953 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    8133 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    8230 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    8270 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
    end
    format %tm origin
    label values maxhadCOVID maxhadCOVID
    label def maxhadCOVID 0 "No confirmed COVID-19", modify
    label values agecat agecat
    label def agecat 1 "Age 65-79y", modify
    label def agecat 2 "Age 80+y", modify
    I tried to stset it. Next I want to split the spell at the point of COVID-19 infection, but not all individuals have COVID infections. So, when I try to stsplit i get the following error:

    Code:
    . stset failtime, id(grpatidtreat) enter(t0) failure(d)
    
    Survival-time data settings
    
               ID variable: grpatidtreat
             Failure event: d!=0 & d<.
    Observed time interval: (failtime[_n-1], failtime]
         Enter on or after: time t0
         Exit on or before: failure
    
    --------------------------------------------------------------------------
         49,963  total observations
          1,476  observations end on or before enter()
    --------------------------------------------------------------------------
         48,487  observations remaining, representing
         48,487  subjects
          6,997  failures in single-failure-per-subject data
      1,270,542  total analysis time at risk and under observation
                                                    At risk from t =         0
                                         Earliest observed entry t =         1
                                              Last observed exit t =        31
    
    . local split=timeCOVID
    
    . stsplit spell, at(`split')
    invalid numlist has missing values
    option at() must contain nonnegative values
    r(127);
    
    end of do-file
    
    r(127);
    To overcome this problem, I try to replace the timeCOVID variable with the failtime for individuals who don't have COVID-19, but then no new episodes are created, even though there are many instances where a COVID-19 infection is observed prior to failure (new chronic pain diagnosis), and where I expect new episodes to be created.

    Code:
    Code:
    . replace timeCOVID=_t if timeCOVID==.|(timeCOVID>failtime)
    (45,989 real changes made, 114 to missing)
    
    . local split=timeCOVID
    
    . stsplit spell, at(`split')
    (no new episodes generated)
    
    . 
    end of do-file
    
    . do "C:\Users\SUMEDH~1\AppData\Local\Temp\STD737c_000000.tmp"
    
    . dataex grpatidtreat d t0 failtime origin maxhadCOVID Asian White Black Hispanic age6579 age80plus Female Male ra
    > ce agecat sexcat timeCOVID _st _d _t _t0 spell 
    
    ----------------------- copy starting from the next line -----------------------
    
    
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(grpatidtreat d t0 failtime origin maxhadCOVID) byte(Asian White Black Hispanic age6579 age80plus Female Male) str1 race float(agecat sexcat timeCOVID) byte(_st _d _t _t0 spell)
     130 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 31 1 0 31 1 0
     194 0 1 18 690 0 0 1 0 0 1 0 0 1 "W" 1 2 18 1 0 18 1 0
     379 1 1 28 690 0 0 1 0 0 0 1 1 0 "W" 2 1 28 1 1 28 1 0
     598 1 1 20 690 0 0 1 0 0 0 1 0 1 "W" 2 2 20 1 1 20 1 0
     633 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
     686 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
     707 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
     836 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
     894 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 31 1 0 31 1 0
     900 1 1 25 690 0 0 1 0 0 1 0 0 1 "W" 1 2 25 1 1 25 1 0
     977 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
    1027 0 1 31 690 0 1 0 0 0 1 0 1 0 "A" 1 1 31 1 0 31 1 0
    1028 1 1 10 690 0 0 1 0 0 0 1 1 0 "W" 2 1 10 1 1 10 1 0
    1116 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 31 1 0 31 1 0
    1224 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
    1283 0 1 31 690 0 0 0 0 1 1 0 0 1 "H" 1 2 31 1 0 31 1 0
    1323 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
    1423 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
    1438 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
    1599 1 1 13 690 0 0 1 0 0 0 1 0 1 "W" 2 2 13 1 1 13 1 0
    1639 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
    1879 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
    1903 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
    1955 0 1  7 690 0 0 1 0 0 0 1 0 1 "W" 2 2  7 1 0  7 1 0
    1986 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
    1988 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
    2112 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
    2159 0 1 31 690 0 0 0 1 0 1 0 0 1 "B" 1 2 31 1 0 31 1 0
    2307 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
    2369 1 1 17 690 0 0 1 0 0 0 1 1 0 "W" 2 1 17 1 1 17 1 0
    2389 0 1 31 690 0 0 0 0 1 1 0 0 1 "H" 1 2 31 1 0 31 1 0
    2423 0 1 31 690 0 0 0 0 0 1 0 1 0 ""  1 1 31 1 0 31 1 0
    2537 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
    2539 0 1 31 690 0 0 0 0 1 1 0 1 0 "H" 1 1 31 1 0 31 1 0
    2588 1 1  9 690 0 0 1 0 0 1 0 1 0 "W" 1 1  9 1 1  9 1 0
    2689 0 1 31 690 0 0 0 0 1 1 0 0 1 "H" 1 2 31 1 0 31 1 0
    2705 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
    2771 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
    2779 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
    2859 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
    2865 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
    3039 0 1 31 690 0 1 0 0 0 0 1 1 0 "A" 2 1 31 1 0 31 1 0
    3118 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
    3122 1 1 19 690 0 0 1 0 0 1 0 0 1 "W" 1 2 19 1 1 19 1 0
    3204 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
    3460 1 1 17 690 0 0 1 0 0 1 0 1 0 "W" 1 1 17 1 1 17 1 0
    3505 0 1 31 690 0 0 0 0 0 0 1 1 0 ""  2 1 31 1 0 31 1 0
    3571 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
    3578 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
    3647 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 31 1 0 31 1 0
    3833 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
    3876 1 1  2 690 0 0 1 0 0 0 1 0 1 "W" 2 2  2 1 1  2 1 0
    3935 1 1 15 690 0 0 1 0 0 0 1 1 0 "W" 2 1 15 1 1 15 1 0
    3995 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
    4088 0 1 31 690 0 0 0 0 0 1 0 1 0 ""  1 1 31 1 0 31 1 0
    4129 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
    4219 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
    4263 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
    4279 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
    4643 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
    4669 0 1 31 690 0 1 0 0 0 1 0 0 1 "A" 1 2 31 1 0 31 1 0
    4691 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
    4754 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 31 1 0 31 1 0
    4934 1 1 18 690 0 0 1 0 0 0 1 1 0 "W" 2 1 18 1 1 18 1 0
    4964 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
    5079 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 31 1 0 31 1 0
    5166 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
    5231 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
    5344 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
    5408 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
    5627 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
    6065 0 1 31 690 0 0 0 1 0 1 0 1 0 "B" 1 1 31 1 0 31 1 0
    6138 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
    6151 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
    6261 1 1  7 690 0 0 0 1 0 1 0 0 1 "B" 1 2  7 1 1  7 1 0
    6714 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
    6753 0 1 24 690 0 0 1 0 0 1 0 1 0 "W" 1 1 24 1 0 24 1 0
    6788 1 1  6 690 0 0 1 0 0 1 0 0 1 "W" 1 2  6 1 1  6 1 0
    6926 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
    6935 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
    6972 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
    7034 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 31 1 0 31 1 0
    7116 0 1 31 690 0 0 0 0 1 1 0 1 0 "H" 1 1 31 1 0 31 1 0
    7127 0 1 31 690 0 0 0 1 0 1 0 0 1 "B" 1 2 31 1 0 31 1 0
    7159 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
    7162 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
    7183 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
    7252 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
    7384 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
    7461 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 31 1 0 31 1 0
    7505 1 1 25 690 0 0 1 0 0 1 0 1 0 "W" 1 1 25 1 1 25 1 0
    7634 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
    7646 1 1 31 690 0 1 0 0 0 1 0 1 0 "A" 1 1 31 1 1 31 1 0
    7866 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
    7886 0 1 31 690 0 0 0 0 1 1 0 1 0 "H" 1 1 31 1 0 31 1 0
    7906 1 1 12 690 0 0 1 0 0 0 1 1 0 "W" 2 1 12 1 1 12 1 0
    7953 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
    8133 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
    8230 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
    8270 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
    end
    format %tm origin
    label values maxhadCOVID maxhadCOVID
    label def maxhadCOVID 0 "No confirmed COVID-19", modify
    label values agecat agecat
    label def agecat 1 "Age 65-79y", modify
    label def agecat 2 "Age 80+y", modify
    ------------------ copy up to and including the previous line ------------------

    What am I doing wrong? Any help would be much appreciated.
    Gratefully,
    Sumedha




  • #2
    After replacing the missing values of timeCOVID in the way you describe, I think what you need to do with -stsplit- is
    Code:
    stsplit spell, at(0) after(timeCOVID)
    Actually, you don't even have to modify the missing values of timeCOVID first. -stsplit- will interpret the missing values appropriately.
    Last edited by Clyde Schechter; 19 Sep 2023, 22:14.

    Comment


    • #3
      Dear Prof. Schechter,
      As always, your advice was on point. Thank you so much.

      I do have some follow up questions though (and, of course, please let me know if it would be better to start a new post instead). Here is the dataex of a few individuals in my sample:

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input float(grpatidtreat d t0 failtime origin maxhadCOVID) byte(Asian White Black Hispanic age6579 age80plus Female Male) float(agecat sexcat timeCOVID pain dead)
         2588 1 1  9 690 0 0 1 0 0 1 0 1 0 1 1  . 1 0
         3122 1 1 19 690 0 0 1 0 0 1 0 0 1 1 2  . 1 0
      2344926 0 1 31 714 0 0 1 0 0 1 0 0 1 1 2  . 0 0
      2346371 0 1 31 714 1 0 1 0 0 1 0 1 0 1 1 28 0 0
      3734179 0 1 17 714 1 0 1 0 0 0 1 1 0 2 1 16 0 1
      4972079 1 1 28 714 1 0 1 0 0 0 1 1 0 2 1 17 1 0
      end
      format %tm origin
      label values maxhadCOVID maxhadCOVID
      label def maxhadCOVID 0 "No confirmed COVID-19", modify
      label def maxhadCOVID 1 "Confirmed COVID-19", modify
      label values agecat agecat
      label def agecat 1 "Age 65-79y", modify
      label def agecat 2 "Age 80+y", modify
      Following your helpful advice, I stsplit and created indicators for 'time' and the patient having Covid in the second spell (created using stsplit) as follows:

      Code:
      . stsplit spell, at(0) after(timeCOVID)
      (2,274 observations (episodes) created)
      
      
      . gen time=spell+1
      (1,476 missing values generated)
      
      . 
      . gen C19=0
      
      . replace C19=1 if time==1
      (47,741 real changes made)
      
      . gen stime=_t /*=failtime*/
      (1,476 missing values generated)
      
      . 
      . dataex grpatidtreat d t0 failtime origin maxhadCOVID Asian White Black Hispanic age6579 age80plus Female Male ag
      > ecat sexcat timeCOVID pain dead time C19 if (grpatidtreat==2346371| ///
      > grpatidtreat==2344926|grpatidtreat==2588|grpatidtreat==3122 |grpatidtreat==3734179|grpatidtreat==4972079)
      
      ----------------------- copy starting from the next line -----------------------
      
      
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input float(grpatidtreat d t0 failtime origin maxhadCOVID) byte(Asian White Black Hispanic age6579 age80plus Female Male) float(agecat sexcat timeCOVID pain dead time C19)
         2588 1 1  9 690 0 0 1 0 0 1 0 1 0 1 1  . 1 0 1 1
         3122 1 1 19 690 0 0 1 0 0 1 0 0 1 1 2  . 1 0 1 1
      2344926 0 1 31 714 0 0 1 0 0 1 0 0 1 1 2  . 0 0 1 1
      2346371 . 1 28 714 1 0 1 0 0 1 0 1 0 1 1 28 0 0 0 0
      2346371 0 1 31 714 1 0 1 0 0 1 0 1 0 1 1 28 0 0 1 1
      3734179 . 1 16 714 1 0 1 0 0 0 1 1 0 2 1 16 0 1 0 0
      3734179 0 1 17 714 1 0 1 0 0 0 1 1 0 2 1 16 0 1 1 1
      4972079 . 1 17 714 1 0 1 0 0 0 1 1 0 2 1 17 1 0 0 0
      4972079 1 1 28 714 1 0 1 0 0 0 1 1 0 2 1 17 1 0 1 1
      end
      format %tm origin
      label values maxhadCOVID maxhadCOVID
      label def maxhadCOVID 0 "No confirmed COVID-19", modify
      label def maxhadCOVID 1 "Confirmed COVID-19", modify
      label values agecat agecat
      label def agecat 1 "Age 65-79y", modify
      label def agecat 2 "Age 80+y", modify
      ------------------ copy up to and including the previous line ------------------
      Now I run simple regressions, to compare outcome from streg and merlin as follows:

      Code:
      . streg Female age80plus time, distribution(weibull) nohr
      
               Failure _d: d
         Analysis time _t: failtime
        Enter on or after: time t0
              ID variable: grpatidtreat
      
      Fitting constant-only model:
      Iteration 0:  Log likelihood =  -26989.72
      Iteration 1:  Log likelihood = -26871.984
      Iteration 2:  Log likelihood = -26871.868
      Iteration 3:  Log likelihood = -26871.868
      
      Fitting full model:
      Iteration 0:  Log likelihood = -26871.868  
      Iteration 1:  Log likelihood = -25941.045  
      Iteration 2:  Log likelihood = -25669.942  
      Iteration 3:  Log likelihood = -25669.154  
      Iteration 4:  Log likelihood = -25669.154  
      
      Weibull PH regression
      
      No. of subjects =    48,485                            Number of obs =  50,759
      No. of failures =     6,997
      Time at risk    = 1,270,527
                                                             LR chi2(3)    = 2405.43
      Log likelihood = -25669.154                            Prob > chi2   =  0.0000
      
      ------------------------------------------------------------------------------
                _t | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
            Female |   .1099982    .024407     4.51   0.000     .0621613     .157835
         age80plus |   1.163013    .023975    48.51   0.000     1.116023    1.210003
              time |  -.4915093   .0455772   -10.78   0.000     -.580839   -.4021796
             _cons |  -4.607462   .0669945   -68.77   0.000    -4.738769   -4.476155
      -------------+----------------------------------------------------------------
             /ln_p |  -.1977041   .0171623   -11.52   0.000    -.2313417   -.1640665
      -------------+----------------------------------------------------------------
                 p |   .8206126   .0140836                      .7934683    .8486856
               1/p |   1.218602   .0209141                      1.178293     1.26029
      ------------------------------------------------------------------------------
      
      
      
      . merlin  (failtime Female age80plus time, family(weibull, failure(d)) timevar(failtime)) 
      
      Fitting full model:
      
      Iteration 0:  Log likelihood =   -1319012  
      Iteration 1:  Log likelihood = -44116.307  
      Iteration 2:  Log likelihood =  -42621.37  
      Iteration 3:  Log likelihood = -41924.584  
      Iteration 4:  Log likelihood =  -41921.72  
      Iteration 5:  Log likelihood = -41921.719  
      
      Fixed effects regression model                          Number of obs = 48,485
      Log likelihood = -41921.719
      ------------------------------------------------------------------------------
                   | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
      failtime:    |            
            Female |   .1177565   .0244106     4.82   0.000     .0699127    .1656003
         age80plus |   1.127935   .0240828    46.84   0.000     1.080734    1.175137
              time |  -1.974778    .045855   -43.07   0.000    -2.064652   -1.884904
             _cons |  -3.907423   .0599555   -65.17   0.000    -4.024933   -3.789912
        log(gamma) |   .0263636   .0113976     2.31   0.021     .0040248    .0487024
      ------------------------------------------------------------------------------
      My first question is why are these slightly different? It seems it could be because of log(gamma), but not sure what exactly that is...

      Second, when I try to extend the above model by including the indicator for the patient having COVID-19 (C19 created above), the model breaks down (just showing a few lines here but I let it run for longer, but it never converges):

      Code:
      . merlin  (failtime Female age80plus time C19, family(weibull, failure(d)) timevar(failtime)) 
      
      Fitting full model:
      
      Iteration 0:  Log likelihood =   -1319012  (not concave)
      Iteration 1:  Log likelihood = -46381.473  (not concave)
      Iteration 2:  Log likelihood = -43490.659  (not concave)
      Iteration 3:  Log likelihood = -42887.386  (not concave)
      Iteration 4:  Log likelihood = -42564.573  (not concave)
      Iteration 5:  Log likelihood = -42418.566  (not concave)
      Iteration 6:  Log likelihood = -42332.731  (not concave)
      Iteration 7:  Log likelihood = -42262.363  (not concave)
      Iteration 8:  Log likelihood = -42198.327  (not concave)
      Iteration 9:  Log likelihood = -42154.329  (not concave)
      Iteration 10: Log likelihood = -42118.697  (not concave)
      --Break--
      r(1);
      I tried to check if streg would give me any clues, and it seems it could be because C19 dummy gets omitted (may be because its perfectly collinear with time, as second wave only gets created when C19==1)?

      Code:
      . streg Female age80plus time C19, distribution(weibull) nohr
      
               Failure _d: d
         Analysis time _t: failtime
        Enter on or after: time t0
              ID variable: grpatidtreat
      note: C19 omitted because of collinearity.
      
      Fitting constant-only model:
      Iteration 0:  Log likelihood =  -26989.72
      Iteration 1:  Log likelihood = -26871.984
      Iteration 2:  Log likelihood = -26871.868
      Iteration 3:  Log likelihood = -26871.868
      
      Fitting full model:
      Iteration 0:  Log likelihood = -26871.868  
      Iteration 1:  Log likelihood = -25941.045  
      Iteration 2:  Log likelihood = -25669.942  
      Iteration 3:  Log likelihood = -25669.154  
      Iteration 4:  Log likelihood = -25669.154  
      
      Weibull PH regression
      
      No. of subjects =    48,485                            Number of obs =  50,759
      No. of failures =     6,997
      Time at risk    = 1,270,527
                                                             LR chi2(3)    = 2405.43
      Log likelihood = -25669.154                            Prob > chi2   =  0.0000
      
      ------------------------------------------------------------------------------
                _t | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
            Female |   .1099982    .024407     4.51   0.000     .0621613     .157835
         age80plus |   1.163013    .023975    48.51   0.000     1.116023    1.210003
              time |  -.4915093   .0455772   -10.78   0.000     -.580839   -.4021796
               C19 |          0  (omitted)
             _cons |  -4.607462   .0669945   -68.77   0.000    -4.738769   -4.476155
      -------------+----------------------------------------------------------------
             /ln_p |  -.1977041   .0171623   -11.52   0.000    -.2313417   -.1640665
      -------------+----------------------------------------------------------------
                 p |   .8206126   .0140836                      .7934683    .8486856
               1/p |   1.218602   .0209141                      1.178293     1.26029
      ------------------------------------------------------------------------------
      
      . 
      end of do-file
      
      .
      I also tried omitting time to check if it makes a difference, and it did (C19 is no longer omitted):

      Code:
      . streg Female age80plus C19, distribution(weibull) nohr
      
               Failure _d: d
         Analysis time _t: failtime
        Enter on or after: time t0
              ID variable: grpatidtreat
      
      Fitting constant-only model:
      Iteration 0:  Log likelihood =  -26989.72
      Iteration 1:  Log likelihood = -26871.984
      Iteration 2:  Log likelihood = -26871.868
      Iteration 3:  Log likelihood = -26871.868
      
      Fitting full model:
      Iteration 0:  Log likelihood = -26871.868  
      Iteration 1:  Log likelihood = -25941.045  
      Iteration 2:  Log likelihood = -25669.942  
      Iteration 3:  Log likelihood = -25669.154  
      Iteration 4:  Log likelihood = -25669.154  
      
      Weibull PH regression
      
      No. of subjects =    48,485                            Number of obs =  50,759
      No. of failures =     6,997
      Time at risk    = 1,270,527
                                                             LR chi2(3)    = 2405.43
      Log likelihood = -25669.154                            Prob > chi2   =  0.0000
      
      ------------------------------------------------------------------------------
                _t | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
            Female |   .1099982    .024407     4.51   0.000     .0621613     .157835
         age80plus |   1.163013    .023975    48.51   0.000     1.116023    1.210003
               C19 |  -.4915093   .0455772   -10.78   0.000     -.580839   -.4021796
             _cons |  -4.607462   .0669945   -68.77   0.000    -4.738769   -4.476155
      -------------+----------------------------------------------------------------
             /ln_p |  -.1977041   .0171623   -11.52   0.000    -.2313417   -.1640665
      -------------+----------------------------------------------------------------
                 p |   .8206126   .0140836                      .7934683    .8486856
               1/p |   1.218602   .0209141                      1.178293     1.26029
      ------------------------------------------------------------------------------
      
      . 
      end of do-file
      But I could not replicate this with merlin:

      Code:
      . merlin  (failtime Female age80plus C19, family(weibull, failure(d)) timevar(failtime)) 
      
      Fitting full model:
      
      initial values not feasible
      
      -> Starting values failed - trying zero vector
      
      initial values not feasible
      r(1400);
      
      end of do-file
      
      r(1400);
      Why might the outcome be now different for streg vs merlin?

      Sorry for such a long post, but I wanted to provide code and results to explain every step. Many thanks in advance for all your guidance.
      Gratefully,
      Sumedha

      Comment


      • #4
        -merlin- is a user written command, and my familiarity with it is limited to knowing that it estimates parametric multi-level models. I can imagine that -merlin- and -streg- use different algorithms to maximize the likelihood. But, probably more important, I notice that the sample sizes shown are different. -streg- has an estimation sample with N = 50,759, whereas for -merlin- it was 48,485. I wonder if perhaps -merlin- is omitting the censored observations from the analysis. If the number of censored observations with -streg- is 50759-48485, that is unlikely to be a coincidence. If it's not that, then I don't have any good ideas about what may be going wrong--but I think that before you pursue any other ideas you must figure out why -merlin- is not using the full -streg- estimation sample: you cannot expect the results to match when the samples are different.

        You are correct in noting that your C19 variable is colinear with time, and you cannot have both in the model together. When you put them together, -streg-, follows the process of all official Stata estimation commands, and checks for colinear predictor variables before proceeding, and, on finding them, eliminates as many as necessary to break the colinearity. I gather that -merlin- does not check first. What happens to -merlin- then is a doomed attempt to maximize a likelihood that, by virtue of the colinear variables, is not concave and has no unique maximum, leading to the endless iteration you see. It is precisely to avoid this that official Stata commands check for colinearity first.

        If you are trying to explore whether and to what extent the effect on failure time of some other variable in the model differs between the covid era and the pre-covid era, you should do that by including the interaction of C19 with that variable, but not C19 itself.

        Hope this helps.

        Comment


        • #5
          Thank you, Prof. Schechter. I agree with you that probably -merlin- handles censoring differently, very helpful. I am trying to figure this out...and will post back once I know for others too... In the meantime, in case you have any suggestions on alternative official Stata commands to jointly model a longitudinal binary endogenous regressor and a time-to-event I would love to learn about it.

          Gratefully,
          Sumedha

          Comment


          • #6
            In the meantime, in case you have any suggestions on alternative official Stata commands to jointly model a longitudinal binary endogenous regressor and a time-to-event I would love to learn about it.
            I wish I did, but I don't know of any. I don't know of any other user-written commands that can do this either. I'm eager to see what you ultimately come up with using -merlin- so I can learn how to do this.

            Comment


            • #7
              Will do Prof. Schechter.

              Continuing to dig deeper into how -merlin- is handling the endogeneity, I am wondering if the control function approach for binary endogenous regressor (as Prof. Jeff Wooldridge suggests in #12 in ivpoisson with panel-data fixed effects - Statalist) can be implemented for survival outcome data? The binary endogenous regressor and the time-to-event outcome variable seems to create several problems but the above post suggests it can be implemented but not how...

              Comment

              Working...
              X