Dear All
I simulated data from a joint longitudinal and survival model (random intercept and slope model for the longitudinal outcome and an exponential survival model for the time to event outcome.
You can find the code below. I assumed in the simulation that the longitudinal and survival models are not associated. I tried to use gsem to obtain the estimates of these models. Theoretically, the estimates of the gsem should be very similar or identical to the separate mixed model and exponential survival model. Although the estimates of the mixed model match almost exactly the gsem command, the survival model estimates differ. I wonder whether I have done something wrong related with the data format??
Please could you help me?
Also please could you clarify what @0 syntax in gsem means?
Thanks
This is the output fromthe exponential survival model
This is the output from the random intercept and slope model
This is the output from the joint survival longitudinal model using gsem
the log hazard ratio for the survival model in gsem is -.937202 which is a bit different from the streg command which is -.8181656. I assumed independence between the longitudinal and survival models so I don't understand why there is this discrepancy. I tried to simulate a larger sample size number and I still get a discrepancy.
Please could you let me know what I am doing wrong?
Thanks
A
I simulated data from a joint longitudinal and survival model (random intercept and slope model for the longitudinal outcome and an exponential survival model for the time to event outcome.
You can find the code below. I assumed in the simulation that the longitudinal and survival models are not associated. I tried to use gsem to obtain the estimates of these models. Theoretically, the estimates of the gsem should be very similar or identical to the separate mixed model and exponential survival model. Although the estimates of the mixed model match almost exactly the gsem command, the survival model estimates differ. I wonder whether I have done something wrong related with the data format??
Please could you help me?
Also please could you clarify what @0 syntax in gsem means?
Thanks
Code:
clear
set seed 6765
set obs 500
gen id = _n
gen trt = rbinomial(1,0.5)
display ln(0.50)
//#########################
preserve
keep id
duplicates drop
matrix a = (1, 0.5 \ 0.5, 1)
drawnorm u1 u0, means(0 0) corr(a) sds(0.05 20)
sum u1 u0
corr u1 u0
tempfile rc
save "`rc'"
restore
merge 1:1 id using "`rc'", update
drop _merge
sum u1 u0
gen double b0 = 70 + u0
sum b0
gen double b1=-0.05+u1
sum b1
corr b0 b1
//Define the association between the biomarker and survival
local alpha = 0
survsim stime died, chazard(0.05 :* {t} :+ `alpha' :* (b0 :+ b1 :* {t})) cov(trt -.69314718) maxtime(48)
stset stime, fail(died)
stcox i.trt,nohr
streg i.trt, dist(exponential) nohr
gsem (stime <- i.trt, family(exponential, failure(died)))
expand 48
bys id: gen double meastime = _n-1
//Remove observations after event or censoring time
bys id: drop if meastime>=stime
//Generate observed biomarker values incorporating measurement error
generate e_ij = rnormal(0,15)
gen float response = b0 + b1*meastime + -5*trt + e_ij
sort id meastime
keep id trt stime died meastime response
bysort id: gen stoptime=meastime[_n+1]
replace stoptime=stime if stoptime==.
rename meastime starttime
mixed response c.starttime i.trt || id: starttime, cov(unstruct)
bysort id: gen N=_N
bysort id: gen n=_n
gen event_long=died if n==N
tab event_long
replace event_long=0 if event_long==.
gsem (response <- i.trt c.starttime U1[id]@1 c.starttime#U2[id]@1) ///
(stime <- i.trt U1[id]@0 U2[id]@0 , family(exponential, failure(died)))
This is the output fromthe exponential survival model
HTML Code:
. streg i.trt, dist(exponential) nohr
Failure _d: died
Analysis time _t: stime
Iteration 0: log likelihood = -771.07598
Iteration 1: log likelihood = -739.02466
Iteration 2: log likelihood = -737.17115
Iteration 3: log likelihood = -737.16669
Iteration 4: log likelihood = -737.16669
Exponential PH regression
No. of subjects = 500 Number of obs = 500
No. of failures = 409
Time at risk = 11,500.9626
LR chi2(1) = 67.82
Log likelihood = -737.16669 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
_t | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
1.trt | -.8181656 .0998001 -8.20 0.000 -1.01377 -.622561
_cons | -2.899491 .0656532 -44.16 0.000 -3.028169 -2.770813
------------------------------------------------------------------------------
This is the output from the random intercept and slope model
HTML Code:
. mixed response c.starttime i.trt || id: starttime, cov(unstruct)
Performing EM optimization ...
Performing gradient-based optimization:
Iteration 0: log likelihood = -49228.39
Iteration 1: log likelihood = -49223.86
Iteration 2: log likelihood = -49223.858
Computing standard errors ...
Mixed-effects ML regression Number of obs = 11,698
Group variable: id Number of groups = 500
Obs per group:
min = 1
avg = 23.4
max = 48
Wald chi2(2) = 45.57
Log likelihood = -49223.858 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
response | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
starttime | -.0635027 .0138042 -4.60 0.000 -.0905585 -.0364469
1.trt | -9.411631 1.872736 -5.03 0.000 -13.08213 -5.741135
_cons | 71.85274 1.352029 53.14 0.000 69.20281 74.50267
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects parameters | Estimate Std. err. [95% conf. interval]
-----------------------------+------------------------------------------------
id: Unstructured |
var(starttime) | .0047512 .0038289 .0009792 .0230549
var(_cons) | 403.1105 27.88933 351.9926 461.6521
cov(starttime,_cons) | .7260532 .2937639 .1502865 1.30182
-----------------------------+------------------------------------------------
var(Residual) | 227.9082 3.073675 221.9628 234.0128
------------------------------------------------------------------------------
LR test vs. linear model: chi2(3) = 10193.23 Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.
This is the output from the joint survival longitudinal model using gsem
HTML Code:
. gsem (response <- i.trt c.starttime U1[id]@1 c.starttime#U2[id]@1) /// > (stime <- i.trt U1[id]@0 U2[id]@0 , family(exponential, failure(died))) Fitting fixed-effects model: Iteration 0: log likelihood = -15547395 Iteration 1: log likelihood = -95269.418 Iteration 2: log likelihood = -93877.901 Iteration 3: log likelihood = -90338.928 Iteration 4: log likelihood = -90246.428 Iteration 5: log likelihood = -90246.302 Iteration 6: log likelihood = -90246.302 Refining starting values: Grid node 0: log likelihood = -88216.813 Fitting full model: Iteration 0: log likelihood = -88216.813 (not concave) Iteration 1: log likelihood = -86663.765 (not concave) Iteration 2: log likelihood = -86431.117 Iteration 3: log likelihood = -86399.618 (backed up) Iteration 4: log likelihood = -86341.081 Iteration 5: log likelihood = -86287.597 Iteration 6: log likelihood = -86237.541 Iteration 7: log likelihood = -86213.525 Iteration 8: log likelihood = -86190.121 Iteration 9: log likelihood = -86187.228 (backed up) Iteration 10: log likelihood = -86186.506 (backed up) Iteration 11: log likelihood = -86186.145 (backed up) Iteration 12: log likelihood = -86185.965 (backed up) Iteration 13: log likelihood = -86185.942 (backed up) Iteration 14: log likelihood = -86185.931 (backed up) Iteration 15: log likelihood = -86185.925 (backed up) Iteration 16: log likelihood = -86185.924 (backed up) Iteration 17: log likelihood = -86185.924 (backed up) Iteration 18: log likelihood = -86185.923 (backed up) Iteration 19: log likelihood = -86185.923 (backed up) Iteration 20: log likelihood = -86185.923 (backed up) Iteration 21: log likelihood = -86185.923 (backed up) Iteration 22: log likelihood = -86185.923 (backed up) Iteration 23: log likelihood = -86185.923 (not concave) Iteration 24: log likelihood = -85399.166 (not concave) Iteration 25: log likelihood = -85319.271 (not concave) Iteration 26: log likelihood = -85222.571 Iteration 27: log likelihood = -85204.678 Iteration 28: log likelihood = -85178.789 Iteration 29: log likelihood = -85150.303 Iteration 30: log likelihood = -85149.688 Iteration 31: log likelihood = -85149.686 Generalized structural equation model Number of obs = 11,698 Response: response Family: Gaussian Link: Identity Response: stime No. of failures = 7,330 Family: Exponential Time at risk = 403,509.31 Form: Proportional hazards Link: Log Log likelihood = -85149.686 ( 1) [stime]U1[id] = 0 ( 2) [stime]U2[id] = 0 ( 3) [response]U1[id] = 1 ( 4) [response]c.starttime#U2[id] = 1 ------------------------------------------------------------------------------------ | Coefficient Std. err. z P>|z| [95% conf. interval] -------------------+---------------------------------------------------------------- response | 1.trt | -9.411655 1.873263 -5.02 0.000 -13.08318 -5.740126 starttime | -.0635024 .0138293 -4.59 0.000 -.0906074 -.0363974 | U1[id] | 1 (constrained) | c.starttime#U2[id] | 1 (constrained) | _cons | 71.85275 1.352239 53.14 0.000 69.20241 74.50309 -------------------+---------------------------------------------------------------- stime | 1.trt | -.937202 .0233759 -40.09 0.000 -.983018 -.8913861 | U1[id] | 0 (omitted) U2[id] | 0 (omitted) | _cons | -3.449736 .0162243 -212.63 0.000 -3.481535 -3.417937 -------------------+---------------------------------------------------------------- var(U1[id])| 403.1091 27.88905 351.9916 461.65 var(U2[id])| .0047526 .0038287 .00098 .0230492 -------------------+---------------------------------------------------------------- cov(U1[id],U2[id])| .7259616 .293765 2.47 0.013 .1501928 1.30173 -------------------+---------------------------------------------------------------- var(e.response)| 227.9077 3.073659 221.9624 234.0123 ------------------------------------------------------------------------------------
the log hazard ratio for the survival model in gsem is -.937202 which is a bit different from the streg command which is -.8181656. I assumed independence between the longitudinal and survival models so I don't understand why there is this discrepancy. I tried to simulate a larger sample size number and I still get a discrepancy.
Please could you let me know what I am doing wrong?
Thanks
A
