Stlist not creating new episodes.

Sumedha Gupta

Join Date: May 2016
Posts: 289

Stlist not creating new episodes.

19 Sep 2023, 21:05

Dear All,

I am trying to study how COVID-19 infections accelerates onset of chronic pain conditions. I am new to survival analysis in Stata and struggling with the stsplit command. Here is the dataex of my data:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(grpatidtreat d t0 failtime origin maxhadCOVID) byte(Asian White Black Hispanic age6579 age80plus Female Male) str1 race float(agecat sexcat timeCOVID)
 130 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
 194 0 1 18 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
 379 1 1 28 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
 598 1 1 20 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
 633 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
 686 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
 707 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
 836 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
 894 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
 900 1 1 25 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
 977 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
1027 0 1 31 690 0 1 0 0 0 1 0 1 0 "A" 1 1 .
1028 1 1 10 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
1116 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
1224 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
1283 0 1 31 690 0 0 0 0 1 1 0 0 1 "H" 1 2 .
1323 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
1423 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
1438 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
1599 1 1 13 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
1639 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
1879 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
1903 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
1955 0 1  7 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
1986 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
1988 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
2112 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
2159 0 1 31 690 0 0 0 1 0 1 0 0 1 "B" 1 2 .
2307 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
2369 1 1 17 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
2389 0 1 31 690 0 0 0 0 1 1 0 0 1 "H" 1 2 .
2423 0 1 31 690 0 0 0 0 0 1 0 1 0 ""  1 1 .
2537 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
2539 0 1 31 690 0 0 0 0 1 1 0 1 0 "H" 1 1 .
2588 1 1  9 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
2689 0 1 31 690 0 0 0 0 1 1 0 0 1 "H" 1 2 .
2705 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
2771 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
2779 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
2859 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
2865 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
3039 0 1 31 690 0 1 0 0 0 0 1 1 0 "A" 2 1 .
3118 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
3122 1 1 19 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
3204 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
3460 1 1 17 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
3505 0 1 31 690 0 0 0 0 0 0 1 1 0 ""  2 1 .
3571 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
3578 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
3647 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
3833 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
3876 1 1  2 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
3935 1 1 15 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
3995 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
4088 0 1 31 690 0 0 0 0 0 1 0 1 0 ""  1 1 .
4129 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
4219 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
4263 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
4279 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
4643 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
4669 0 1 31 690 0 1 0 0 0 1 0 0 1 "A" 1 2 .
4691 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
4754 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
4934 1 1 18 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
4964 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
5079 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
5166 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
5231 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
5344 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
5408 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
5627 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
6065 0 1 31 690 0 0 0 1 0 1 0 1 0 "B" 1 1 .
6138 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
6151 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
6261 1 1  7 690 0 0 0 1 0 1 0 0 1 "B" 1 2 .
6714 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
6753 0 1 24 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
6788 1 1  6 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
6926 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
6935 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
6972 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 .
7034 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
7116 0 1 31 690 0 0 0 0 1 1 0 1 0 "H" 1 1 .
7127 0 1 31 690 0 0 0 1 0 1 0 0 1 "B" 1 2 .
7159 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
7162 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
7183 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
7252 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
7384 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
7461 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
7505 1 1 25 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
7634 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
7646 1 1 31 690 0 1 0 0 0 1 0 1 0 "A" 1 1 .
7866 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 .
7886 0 1 31 690 0 0 0 0 1 1 0 1 0 "H" 1 1 .
7906 1 1 12 690 0 0 1 0 0 0 1 1 0 "W" 2 1 .
7953 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
8133 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
8230 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
8270 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 .
end
format %tm origin
label values maxhadCOVID maxhadCOVID
label def maxhadCOVID 0 "No confirmed COVID-19", modify
label values agecat agecat
label def agecat 1 "Age 65-79y", modify
label def agecat 2 "Age 80+y", modify

I tried to stset it. Next I want to split the spell at the point of COVID-19 infection, but not all individuals have COVID infections. So, when I try to stsplit i get the following error:

Code:

. stset failtime, id(grpatidtreat) enter(t0) failure(d)

Survival-time data settings

           ID variable: grpatidtreat
         Failure event: d!=0 & d<.
Observed time interval: (failtime[_n-1], failtime]
     Enter on or after: time t0
     Exit on or before: failure

--------------------------------------------------------------------------
     49,963  total observations
      1,476  observations end on or before enter()
--------------------------------------------------------------------------
     48,487  observations remaining, representing
     48,487  subjects
      6,997  failures in single-failure-per-subject data
  1,270,542  total analysis time at risk and under observation
                                                At risk from t =         0
                                     Earliest observed entry t =         1
                                          Last observed exit t =        31

. local split=timeCOVID

. stsplit spell, at(`split')
invalid numlist has missing values
option at() must contain nonnegative values
r(127);

end of do-file

r(127);

To overcome this problem, I try to replace the timeCOVID variable with the failtime for individuals who don't have COVID-19, but then no new episodes are created, even though there are many instances where a COVID-19 infection is observed prior to failure (new chronic pain diagnosis), and where I expect new episodes to be created.

Code:


	Code:
	. replace timeCOVID=_t if timeCOVID==.|(timeCOVID>failtime)
(45,989 real changes made, 114 to missing)

. local split=timeCOVID

. stsplit spell, at(`split')
(no new episodes generated)

. 
end of do-file

. do "C:\Users\SUMEDH~1\AppData\Local\Temp\STD737c_000000.tmp"

. dataex grpatidtreat d t0 failtime origin maxhadCOVID Asian White Black Hispanic age6579 age80plus Female Male ra
> ce agecat sexcat timeCOVID _st _d _t _t0 spell 

----------------------- copy starting from the next line -----------------------


	Code:
	* Example generated by -dataex-. For more info, type help dataex
clear
input float(grpatidtreat d t0 failtime origin maxhadCOVID) byte(Asian White Black Hispanic age6579 age80plus Female Male) str1 race float(agecat sexcat timeCOVID) byte(_st _d _t _t0 spell)
 130 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 31 1 0 31 1 0
 194 0 1 18 690 0 0 1 0 0 1 0 0 1 "W" 1 2 18 1 0 18 1 0
 379 1 1 28 690 0 0 1 0 0 0 1 1 0 "W" 2 1 28 1 1 28 1 0
 598 1 1 20 690 0 0 1 0 0 0 1 0 1 "W" 2 2 20 1 1 20 1 0
 633 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
 686 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
 707 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
 836 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
 894 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 31 1 0 31 1 0
 900 1 1 25 690 0 0 1 0 0 1 0 0 1 "W" 1 2 25 1 1 25 1 0
 977 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
1027 0 1 31 690 0 1 0 0 0 1 0 1 0 "A" 1 1 31 1 0 31 1 0
1028 1 1 10 690 0 0 1 0 0 0 1 1 0 "W" 2 1 10 1 1 10 1 0
1116 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 31 1 0 31 1 0
1224 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
1283 0 1 31 690 0 0 0 0 1 1 0 0 1 "H" 1 2 31 1 0 31 1 0
1323 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
1423 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
1438 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
1599 1 1 13 690 0 0 1 0 0 0 1 0 1 "W" 2 2 13 1 1 13 1 0
1639 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
1879 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
1903 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
1955 0 1  7 690 0 0 1 0 0 0 1 0 1 "W" 2 2  7 1 0  7 1 0
1986 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
1988 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
2112 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
2159 0 1 31 690 0 0 0 1 0 1 0 0 1 "B" 1 2 31 1 0 31 1 0
2307 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
2369 1 1 17 690 0 0 1 0 0 0 1 1 0 "W" 2 1 17 1 1 17 1 0
2389 0 1 31 690 0 0 0 0 1 1 0 0 1 "H" 1 2 31 1 0 31 1 0
2423 0 1 31 690 0 0 0 0 0 1 0 1 0 ""  1 1 31 1 0 31 1 0
2537 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
2539 0 1 31 690 0 0 0 0 1 1 0 1 0 "H" 1 1 31 1 0 31 1 0
2588 1 1  9 690 0 0 1 0 0 1 0 1 0 "W" 1 1  9 1 1  9 1 0
2689 0 1 31 690 0 0 0 0 1 1 0 0 1 "H" 1 2 31 1 0 31 1 0
2705 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
2771 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
2779 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
2859 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
2865 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
3039 0 1 31 690 0 1 0 0 0 0 1 1 0 "A" 2 1 31 1 0 31 1 0
3118 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
3122 1 1 19 690 0 0 1 0 0 1 0 0 1 "W" 1 2 19 1 1 19 1 0
3204 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
3460 1 1 17 690 0 0 1 0 0 1 0 1 0 "W" 1 1 17 1 1 17 1 0
3505 0 1 31 690 0 0 0 0 0 0 1 1 0 ""  2 1 31 1 0 31 1 0
3571 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
3578 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
3647 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 31 1 0 31 1 0
3833 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
3876 1 1  2 690 0 0 1 0 0 0 1 0 1 "W" 2 2  2 1 1  2 1 0
3935 1 1 15 690 0 0 1 0 0 0 1 1 0 "W" 2 1 15 1 1 15 1 0
3995 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
4088 0 1 31 690 0 0 0 0 0 1 0 1 0 ""  1 1 31 1 0 31 1 0
4129 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
4219 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
4263 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
4279 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
4643 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
4669 0 1 31 690 0 1 0 0 0 1 0 0 1 "A" 1 2 31 1 0 31 1 0
4691 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
4754 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 31 1 0 31 1 0
4934 1 1 18 690 0 0 1 0 0 0 1 1 0 "W" 2 1 18 1 1 18 1 0
4964 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
5079 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 31 1 0 31 1 0
5166 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
5231 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
5344 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
5408 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
5627 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
6065 0 1 31 690 0 0 0 1 0 1 0 1 0 "B" 1 1 31 1 0 31 1 0
6138 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
6151 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
6261 1 1  7 690 0 0 0 1 0 1 0 0 1 "B" 1 2  7 1 1  7 1 0
6714 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
6753 0 1 24 690 0 0 1 0 0 1 0 1 0 "W" 1 1 24 1 0 24 1 0
6788 1 1  6 690 0 0 1 0 0 1 0 0 1 "W" 1 2  6 1 1  6 1 0
6926 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
6935 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
6972 0 1 31 690 0 0 1 0 0 0 1 0 1 "W" 2 2 31 1 0 31 1 0
7034 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 31 1 0 31 1 0
7116 0 1 31 690 0 0 0 0 1 1 0 1 0 "H" 1 1 31 1 0 31 1 0
7127 0 1 31 690 0 0 0 1 0 1 0 0 1 "B" 1 2 31 1 0 31 1 0
7159 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
7162 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
7183 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
7252 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
7384 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
7461 0 1 31 690 0 0 1 0 0 0 1 1 0 "W" 2 1 31 1 0 31 1 0
7505 1 1 25 690 0 0 1 0 0 1 0 1 0 "W" 1 1 25 1 1 25 1 0
7634 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
7646 1 1 31 690 0 1 0 0 0 1 0 1 0 "A" 1 1 31 1 1 31 1 0
7866 0 1 31 690 0 0 1 0 0 1 0 0 1 "W" 1 2 31 1 0 31 1 0
7886 0 1 31 690 0 0 0 0 1 1 0 1 0 "H" 1 1 31 1 0 31 1 0
7906 1 1 12 690 0 0 1 0 0 0 1 1 0 "W" 2 1 12 1 1 12 1 0
7953 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
8133 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
8230 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
8270 0 1 31 690 0 0 1 0 0 1 0 1 0 "W" 1 1 31 1 0 31 1 0
end
format %tm origin
label values maxhadCOVID maxhadCOVID
label def maxhadCOVID 0 "No confirmed COVID-19", modify
label values agecat agecat
label def agecat 1 "Age 65-79y", modify
label def agecat 2 "Age 80+y", modify
------------------ copy up to and including the previous line ------------------

What am I doing wrong? Any help would be much appreciated.
Gratefully,
Sumedha

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#2

19 Sep 2023, 22:12

After replacing the missing values of timeCOVID in the way you describe, I think what you need to do with -stsplit- is

Code:

stsplit spell, at(0) after(timeCOVID)

Actually, you don't even have to modify the missing values of timeCOVID first. -stsplit- will interpret the missing values appropriately.

Last edited by Clyde Schechter; 19 Sep 2023, 22:14.
Comment

Sumedha Gupta

Join Date: May 2016
Posts: 289

20 Sep 2023, 09:17

Dear Prof. Schechter,
As always, your advice was on point. Thank you so much.

I do have some follow up questions though (and, of course, please let me know if it would be better to start a new post instead). Here is the dataex of a few individuals in my sample:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(grpatidtreat d t0 failtime origin maxhadCOVID) byte(Asian White Black Hispanic age6579 age80plus Female Male) float(agecat sexcat timeCOVID pain dead)
   2588 1 1  9 690 0 0 1 0 0 1 0 1 0 1 1  . 1 0
   3122 1 1 19 690 0 0 1 0 0 1 0 0 1 1 2  . 1 0
2344926 0 1 31 714 0 0 1 0 0 1 0 0 1 1 2  . 0 0
2346371 0 1 31 714 1 0 1 0 0 1 0 1 0 1 1 28 0 0
3734179 0 1 17 714 1 0 1 0 0 0 1 1 0 2 1 16 0 1
4972079 1 1 28 714 1 0 1 0 0 0 1 1 0 2 1 17 1 0
end
format %tm origin
label values maxhadCOVID maxhadCOVID
label def maxhadCOVID 0 "No confirmed COVID-19", modify
label def maxhadCOVID 1 "Confirmed COVID-19", modify
label values agecat agecat
label def agecat 1 "Age 65-79y", modify
label def agecat 2 "Age 80+y", modify

Following your helpful advice, I stsplit and created indicators for 'time' and the patient having Covid in the second spell (created using stsplit) as follows:

Code:

. stsplit spell, at(0) after(timeCOVID)
(2,274 observations (episodes) created)


. gen time=spell+1
(1,476 missing values generated)

. 
. gen C19=0

. replace C19=1 if time==1
(47,741 real changes made)

. gen stime=_t /*=failtime*/
(1,476 missing values generated)

. 
. dataex grpatidtreat d t0 failtime origin maxhadCOVID Asian White Black Hispanic age6579 age80plus Female Male ag
> ecat sexcat timeCOVID pain dead time C19 if (grpatidtreat==2346371| ///
> grpatidtreat==2344926|grpatidtreat==2588|grpatidtreat==3122 |grpatidtreat==3734179|grpatidtreat==4972079)

----------------------- copy starting from the next line -----------------------


	Code:
	* Example generated by -dataex-. For more info, type help dataex
clear
input float(grpatidtreat d t0 failtime origin maxhadCOVID) byte(Asian White Black Hispanic age6579 age80plus Female Male) float(agecat sexcat timeCOVID pain dead time C19)
   2588 1 1  9 690 0 0 1 0 0 1 0 1 0 1 1  . 1 0 1 1
   3122 1 1 19 690 0 0 1 0 0 1 0 0 1 1 2  . 1 0 1 1
2344926 0 1 31 714 0 0 1 0 0 1 0 0 1 1 2  . 0 0 1 1
2346371 . 1 28 714 1 0 1 0 0 1 0 1 0 1 1 28 0 0 0 0
2346371 0 1 31 714 1 0 1 0 0 1 0 1 0 1 1 28 0 0 1 1
3734179 . 1 16 714 1 0 1 0 0 0 1 1 0 2 1 16 0 1 0 0
3734179 0 1 17 714 1 0 1 0 0 0 1 1 0 2 1 16 0 1 1 1
4972079 . 1 17 714 1 0 1 0 0 0 1 1 0 2 1 17 1 0 0 0
4972079 1 1 28 714 1 0 1 0 0 0 1 1 0 2 1 17 1 0 1 1
end
format %tm origin
label values maxhadCOVID maxhadCOVID
label def maxhadCOVID 0 "No confirmed COVID-19", modify
label def maxhadCOVID 1 "Confirmed COVID-19", modify
label values agecat agecat
label def agecat 1 "Age 65-79y", modify
label def agecat 2 "Age 80+y", modify
------------------ copy up to and including the previous line ------------------

Now I run simple regressions, to compare outcome from streg and merlin as follows:

Code:

. streg Female age80plus time, distribution(weibull) nohr

         Failure _d: d
   Analysis time _t: failtime
  Enter on or after: time t0
        ID variable: grpatidtreat

Fitting constant-only model:
Iteration 0:  Log likelihood =  -26989.72
Iteration 1:  Log likelihood = -26871.984
Iteration 2:  Log likelihood = -26871.868
Iteration 3:  Log likelihood = -26871.868

Fitting full model:
Iteration 0:  Log likelihood = -26871.868  
Iteration 1:  Log likelihood = -25941.045  
Iteration 2:  Log likelihood = -25669.942  
Iteration 3:  Log likelihood = -25669.154  
Iteration 4:  Log likelihood = -25669.154  

Weibull PH regression

No. of subjects =    48,485                            Number of obs =  50,759
No. of failures =     6,997
Time at risk    = 1,270,527
                                                       LR chi2(3)    = 2405.43
Log likelihood = -25669.154                            Prob > chi2   =  0.0000

------------------------------------------------------------------------------
          _t | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      Female |   .1099982    .024407     4.51   0.000     .0621613     .157835
   age80plus |   1.163013    .023975    48.51   0.000     1.116023    1.210003
        time |  -.4915093   .0455772   -10.78   0.000     -.580839   -.4021796
       _cons |  -4.607462   .0669945   -68.77   0.000    -4.738769   -4.476155
-------------+----------------------------------------------------------------
       /ln_p |  -.1977041   .0171623   -11.52   0.000    -.2313417   -.1640665
-------------+----------------------------------------------------------------
           p |   .8206126   .0140836                      .7934683    .8486856
         1/p |   1.218602   .0209141                      1.178293     1.26029
------------------------------------------------------------------------------



. merlin  (failtime Female age80plus time, family(weibull, failure(d)) timevar(failtime)) 

Fitting full model:

Iteration 0:  Log likelihood =   -1319012  
Iteration 1:  Log likelihood = -44116.307  
Iteration 2:  Log likelihood =  -42621.37  
Iteration 3:  Log likelihood = -41924.584  
Iteration 4:  Log likelihood =  -41921.72  
Iteration 5:  Log likelihood = -41921.719  

Fixed effects regression model                          Number of obs = 48,485
Log likelihood = -41921.719
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
failtime:    |            
      Female |   .1177565   .0244106     4.82   0.000     .0699127    .1656003
   age80plus |   1.127935   .0240828    46.84   0.000     1.080734    1.175137
        time |  -1.974778    .045855   -43.07   0.000    -2.064652   -1.884904
       _cons |  -3.907423   .0599555   -65.17   0.000    -4.024933   -3.789912
  log(gamma) |   .0263636   .0113976     2.31   0.021     .0040248    .0487024
------------------------------------------------------------------------------

My first question is why are these slightly different? It seems it could be because of log(gamma), but not sure what exactly that is...

Second, when I try to extend the above model by including the indicator for the patient having COVID-19 (C19 created above), the model breaks down (just showing a few lines here but I let it run for longer, but it never converges):

Code:

. merlin  (failtime Female age80plus time C19, family(weibull, failure(d)) timevar(failtime)) 

Fitting full model:

Iteration 0:  Log likelihood =   -1319012  (not concave)
Iteration 1:  Log likelihood = -46381.473  (not concave)
Iteration 2:  Log likelihood = -43490.659  (not concave)
Iteration 3:  Log likelihood = -42887.386  (not concave)
Iteration 4:  Log likelihood = -42564.573  (not concave)
Iteration 5:  Log likelihood = -42418.566  (not concave)
Iteration 6:  Log likelihood = -42332.731  (not concave)
Iteration 7:  Log likelihood = -42262.363  (not concave)
Iteration 8:  Log likelihood = -42198.327  (not concave)
Iteration 9:  Log likelihood = -42154.329  (not concave)
Iteration 10: Log likelihood = -42118.697  (not concave)
--Break--
r(1);

I tried to check if streg would give me any clues, and it seems it could be because C19 dummy gets omitted (may be because its perfectly collinear with time, as second wave only gets created when C19==1)?

Code:

. streg Female age80plus time C19, distribution(weibull) nohr

         Failure _d: d
   Analysis time _t: failtime
  Enter on or after: time t0
        ID variable: grpatidtreat
note: C19 omitted because of collinearity.

Fitting constant-only model:
Iteration 0:  Log likelihood =  -26989.72
Iteration 1:  Log likelihood = -26871.984
Iteration 2:  Log likelihood = -26871.868
Iteration 3:  Log likelihood = -26871.868

Fitting full model:
Iteration 0:  Log likelihood = -26871.868  
Iteration 1:  Log likelihood = -25941.045  
Iteration 2:  Log likelihood = -25669.942  
Iteration 3:  Log likelihood = -25669.154  
Iteration 4:  Log likelihood = -25669.154  

Weibull PH regression

No. of subjects =    48,485                            Number of obs =  50,759
No. of failures =     6,997
Time at risk    = 1,270,527
                                                       LR chi2(3)    = 2405.43
Log likelihood = -25669.154                            Prob > chi2   =  0.0000

------------------------------------------------------------------------------
          _t | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      Female |   .1099982    .024407     4.51   0.000     .0621613     .157835
   age80plus |   1.163013    .023975    48.51   0.000     1.116023    1.210003
        time |  -.4915093   .0455772   -10.78   0.000     -.580839   -.4021796
         C19 |          0  (omitted)
       _cons |  -4.607462   .0669945   -68.77   0.000    -4.738769   -4.476155
-------------+----------------------------------------------------------------
       /ln_p |  -.1977041   .0171623   -11.52   0.000    -.2313417   -.1640665
-------------+----------------------------------------------------------------
           p |   .8206126   .0140836                      .7934683    .8486856
         1/p |   1.218602   .0209141                      1.178293     1.26029
------------------------------------------------------------------------------

. 
end of do-file

.

I also tried omitting time to check if it makes a difference, and it did (C19 is no longer omitted):

Code:

. streg Female age80plus C19, distribution(weibull) nohr

         Failure _d: d
   Analysis time _t: failtime
  Enter on or after: time t0
        ID variable: grpatidtreat

Fitting constant-only model:
Iteration 0:  Log likelihood =  -26989.72
Iteration 1:  Log likelihood = -26871.984
Iteration 2:  Log likelihood = -26871.868
Iteration 3:  Log likelihood = -26871.868

Fitting full model:
Iteration 0:  Log likelihood = -26871.868  
Iteration 1:  Log likelihood = -25941.045  
Iteration 2:  Log likelihood = -25669.942  
Iteration 3:  Log likelihood = -25669.154  
Iteration 4:  Log likelihood = -25669.154  

Weibull PH regression

No. of subjects =    48,485                            Number of obs =  50,759
No. of failures =     6,997
Time at risk    = 1,270,527
                                                       LR chi2(3)    = 2405.43
Log likelihood = -25669.154                            Prob > chi2   =  0.0000

------------------------------------------------------------------------------
          _t | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      Female |   .1099982    .024407     4.51   0.000     .0621613     .157835
   age80plus |   1.163013    .023975    48.51   0.000     1.116023    1.210003
         C19 |  -.4915093   .0455772   -10.78   0.000     -.580839   -.4021796
       _cons |  -4.607462   .0669945   -68.77   0.000    -4.738769   -4.476155
-------------+----------------------------------------------------------------
       /ln_p |  -.1977041   .0171623   -11.52   0.000    -.2313417   -.1640665
-------------+----------------------------------------------------------------
           p |   .8206126   .0140836                      .7934683    .8486856
         1/p |   1.218602   .0209141                      1.178293     1.26029
------------------------------------------------------------------------------

. 
end of do-file

But I could not replicate this with merlin:

Code:

. merlin  (failtime Female age80plus C19, family(weibull, failure(d)) timevar(failtime)) 

Fitting full model:

initial values not feasible

-> Starting values failed - trying zero vector

initial values not feasible
r(1400);

end of do-file

r(1400);

Why might the outcome be now different for streg vs merlin?

Sorry for such a long post, but I wanted to provide code and results to explain every step. Many thanks in advance for all your guidance.
Gratefully,
Sumedha

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#4

20 Sep 2023, 09:45

-merlin- is a user written command, and my familiarity with it is limited to knowing that it estimates parametric multi-level models. I can imagine that -merlin- and -streg- use different algorithms to maximize the likelihood. But, probably more important, I notice that the sample sizes shown are different. -streg- has an estimation sample with N = 50,759, whereas for -merlin- it was 48,485. I wonder if perhaps -merlin- is omitting the censored observations from the analysis. If the number of censored observations with -streg- is 50759-48485, that is unlikely to be a coincidence. If it's not that, then I don't have any good ideas about what may be going wrong--but I think that before you pursue any other ideas you must figure out why -merlin- is not using the full -streg- estimation sample: you cannot expect the results to match when the samples are different.

You are correct in noting that your C19 variable is colinear with time, and you cannot have both in the model together. When you put them together, -streg-, follows the process of all official Stata estimation commands, and checks for colinear predictor variables before proceeding, and, on finding them, eliminates as many as necessary to break the colinearity. I gather that -merlin- does not check first. What happens to -merlin- then is a doomed attempt to maximize a likelihood that, by virtue of the colinear variables, is not concave and has no unique maximum, leading to the endless iteration you see. It is precisely to avoid this that official Stata commands check for colinearity first.

If you are trying to explore whether and to what extent the effect on failure time of some other variable in the model differs between the covid era and the pre-covid era, you should do that by including the interaction of C19 with that variable, but not C19 itself.

Hope this helps.
Comment
Sumedha Gupta

Join Date: May 2016

Posts: 289
#5

20 Sep 2023, 20:57

Thank you, Prof. Schechter. I agree with you that probably -merlin- handles censoring differently, very helpful. I am trying to figure this out...and will post back once I know for others too... In the meantime, in case you have any suggestions on alternative official Stata commands to jointly model a longitudinal binary endogenous regressor and a time-to-event I would love to learn about it.

Gratefully,
Sumedha
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#6

20 Sep 2023, 21:58

In the meantime, in case you have any suggestions on alternative official Stata commands to jointly model a longitudinal binary endogenous regressor and a time-to-event I would love to learn about it.

I wish I did, but I don't know of any. I don't know of any other user-written commands that can do this either. I'm eager to see what you ultimately come up with using -merlin- so I can learn how to do this.
Comment
Sumedha Gupta

Join Date: May 2016

Posts: 289
#7

21 Sep 2023, 07:00

Will do Prof. Schechter.

Continuing to dig deeper into how -merlin- is handling the endogeneity, I am wondering if the control function approach for binary endogenous regressor (as Prof. Jeff Wooldridge suggests in #12 in ivpoisson with panel-data fixed effects - Statalist) can be implemented for survival outcome data? The binary endogenous regressor and the time-to-event outcome variable seems to create several problems but the above post suggests it can be implemented but not how...
Comment

Announcement

Stlist not creating new episodes.

Comment

Comment

Comment

Comment

Comment

Comment