Dear All
I simulated data from a joint longitudinal and survival model (random intercept and slope model for the longitudinal outcome and an exponential survival model for the time to event outcome.
You can find the code below. I assumed in the simulation that the longitudinal and survival models are not associated. I tried to use gsem to obtain the estimates of these models. Theoretically, the estimates of the gsem should be very similar or identical to the separate mixed model and exponential survival model. Although the estimates of the mixed model match almost exactly the gsem command, the survival model estimates differ. I wonder whether I have done something wrong related with the data format??
Please could you help me?
Also please could you clarify what @0 syntax in gsem means?
Thanks
This is the output fromthe exponential survival model
This is the output from the random intercept and slope model
This is the output from the joint survival longitudinal model using gsem
the log hazard ratio for the survival model in gsem is -.937202 which is a bit different from the streg command which is -.8181656. I assumed independence between the longitudinal and survival models so I don't understand why there is this discrepancy. I tried to simulate a larger sample size number and I still get a discrepancy.
Please could you let me know what I am doing wrong?
Thanks
A
I simulated data from a joint longitudinal and survival model (random intercept and slope model for the longitudinal outcome and an exponential survival model for the time to event outcome.
You can find the code below. I assumed in the simulation that the longitudinal and survival models are not associated. I tried to use gsem to obtain the estimates of these models. Theoretically, the estimates of the gsem should be very similar or identical to the separate mixed model and exponential survival model. Although the estimates of the mixed model match almost exactly the gsem command, the survival model estimates differ. I wonder whether I have done something wrong related with the data format??
Please could you help me?
Also please could you clarify what @0 syntax in gsem means?
Thanks
Code:
clear set seed 6765 set obs 500 gen id = _n gen trt = rbinomial(1,0.5) display ln(0.50) //######################### preserve keep id duplicates drop matrix a = (1, 0.5 \ 0.5, 1) drawnorm u1 u0, means(0 0) corr(a) sds(0.05 20) sum u1 u0 corr u1 u0 tempfile rc save "`rc'" restore merge 1:1 id using "`rc'", update drop _merge sum u1 u0 gen double b0 = 70 + u0 sum b0 gen double b1=-0.05+u1 sum b1 corr b0 b1 //Define the association between the biomarker and survival local alpha = 0 survsim stime died, chazard(0.05 :* {t} :+ `alpha' :* (b0 :+ b1 :* {t})) cov(trt -.69314718) maxtime(48) stset stime, fail(died) stcox i.trt,nohr streg i.trt, dist(exponential) nohr gsem (stime <- i.trt, family(exponential, failure(died))) expand 48 bys id: gen double meastime = _n-1 //Remove observations after event or censoring time bys id: drop if meastime>=stime //Generate observed biomarker values incorporating measurement error generate e_ij = rnormal(0,15) gen float response = b0 + b1*meastime + -5*trt + e_ij sort id meastime keep id trt stime died meastime response bysort id: gen stoptime=meastime[_n+1] replace stoptime=stime if stoptime==. rename meastime starttime mixed response c.starttime i.trt || id: starttime, cov(unstruct) bysort id: gen N=_N bysort id: gen n=_n gen event_long=died if n==N tab event_long replace event_long=0 if event_long==. gsem (response <- i.trt c.starttime U1[id]@1 c.starttime#U2[id]@1) /// (stime <- i.trt U1[id]@0 U2[id]@0 , family(exponential, failure(died)))
This is the output fromthe exponential survival model
HTML Code:
. streg i.trt, dist(exponential) nohr Failure _d: died Analysis time _t: stime Iteration 0: log likelihood = -771.07598 Iteration 1: log likelihood = -739.02466 Iteration 2: log likelihood = -737.17115 Iteration 3: log likelihood = -737.16669 Iteration 4: log likelihood = -737.16669 Exponential PH regression No. of subjects = 500 Number of obs = 500 No. of failures = 409 Time at risk = 11,500.9626 LR chi2(1) = 67.82 Log likelihood = -737.16669 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ _t | Coefficient Std. err. z P>|z| [95% conf. interval] -------------+---------------------------------------------------------------- 1.trt | -.8181656 .0998001 -8.20 0.000 -1.01377 -.622561 _cons | -2.899491 .0656532 -44.16 0.000 -3.028169 -2.770813 ------------------------------------------------------------------------------
This is the output from the random intercept and slope model
HTML Code:
. mixed response c.starttime i.trt || id: starttime, cov(unstruct) Performing EM optimization ... Performing gradient-based optimization: Iteration 0: log likelihood = -49228.39 Iteration 1: log likelihood = -49223.86 Iteration 2: log likelihood = -49223.858 Computing standard errors ... Mixed-effects ML regression Number of obs = 11,698 Group variable: id Number of groups = 500 Obs per group: min = 1 avg = 23.4 max = 48 Wald chi2(2) = 45.57 Log likelihood = -49223.858 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ response | Coefficient Std. err. z P>|z| [95% conf. interval] -------------+---------------------------------------------------------------- starttime | -.0635027 .0138042 -4.60 0.000 -.0905585 -.0364469 1.trt | -9.411631 1.872736 -5.03 0.000 -13.08213 -5.741135 _cons | 71.85274 1.352029 53.14 0.000 69.20281 74.50267 ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ Random-effects parameters | Estimate Std. err. [95% conf. interval] -----------------------------+------------------------------------------------ id: Unstructured | var(starttime) | .0047512 .0038289 .0009792 .0230549 var(_cons) | 403.1105 27.88933 351.9926 461.6521 cov(starttime,_cons) | .7260532 .2937639 .1502865 1.30182 -----------------------------+------------------------------------------------ var(Residual) | 227.9082 3.073675 221.9628 234.0128 ------------------------------------------------------------------------------ LR test vs. linear model: chi2(3) = 10193.23 Prob > chi2 = 0.0000 Note: LR test is conservative and provided only for reference.
This is the output from the joint survival longitudinal model using gsem
HTML Code:
. gsem (response <- i.trt c.starttime U1[id]@1 c.starttime#U2[id]@1) /// > (stime <- i.trt U1[id]@0 U2[id]@0 , family(exponential, failure(died))) Fitting fixed-effects model: Iteration 0: log likelihood = -15547395 Iteration 1: log likelihood = -95269.418 Iteration 2: log likelihood = -93877.901 Iteration 3: log likelihood = -90338.928 Iteration 4: log likelihood = -90246.428 Iteration 5: log likelihood = -90246.302 Iteration 6: log likelihood = -90246.302 Refining starting values: Grid node 0: log likelihood = -88216.813 Fitting full model: Iteration 0: log likelihood = -88216.813 (not concave) Iteration 1: log likelihood = -86663.765 (not concave) Iteration 2: log likelihood = -86431.117 Iteration 3: log likelihood = -86399.618 (backed up) Iteration 4: log likelihood = -86341.081 Iteration 5: log likelihood = -86287.597 Iteration 6: log likelihood = -86237.541 Iteration 7: log likelihood = -86213.525 Iteration 8: log likelihood = -86190.121 Iteration 9: log likelihood = -86187.228 (backed up) Iteration 10: log likelihood = -86186.506 (backed up) Iteration 11: log likelihood = -86186.145 (backed up) Iteration 12: log likelihood = -86185.965 (backed up) Iteration 13: log likelihood = -86185.942 (backed up) Iteration 14: log likelihood = -86185.931 (backed up) Iteration 15: log likelihood = -86185.925 (backed up) Iteration 16: log likelihood = -86185.924 (backed up) Iteration 17: log likelihood = -86185.924 (backed up) Iteration 18: log likelihood = -86185.923 (backed up) Iteration 19: log likelihood = -86185.923 (backed up) Iteration 20: log likelihood = -86185.923 (backed up) Iteration 21: log likelihood = -86185.923 (backed up) Iteration 22: log likelihood = -86185.923 (backed up) Iteration 23: log likelihood = -86185.923 (not concave) Iteration 24: log likelihood = -85399.166 (not concave) Iteration 25: log likelihood = -85319.271 (not concave) Iteration 26: log likelihood = -85222.571 Iteration 27: log likelihood = -85204.678 Iteration 28: log likelihood = -85178.789 Iteration 29: log likelihood = -85150.303 Iteration 30: log likelihood = -85149.688 Iteration 31: log likelihood = -85149.686 Generalized structural equation model Number of obs = 11,698 Response: response Family: Gaussian Link: Identity Response: stime No. of failures = 7,330 Family: Exponential Time at risk = 403,509.31 Form: Proportional hazards Link: Log Log likelihood = -85149.686 ( 1) [stime]U1[id] = 0 ( 2) [stime]U2[id] = 0 ( 3) [response]U1[id] = 1 ( 4) [response]c.starttime#U2[id] = 1 ------------------------------------------------------------------------------------ | Coefficient Std. err. z P>|z| [95% conf. interval] -------------------+---------------------------------------------------------------- response | 1.trt | -9.411655 1.873263 -5.02 0.000 -13.08318 -5.740126 starttime | -.0635024 .0138293 -4.59 0.000 -.0906074 -.0363974 | U1[id] | 1 (constrained) | c.starttime#U2[id] | 1 (constrained) | _cons | 71.85275 1.352239 53.14 0.000 69.20241 74.50309 -------------------+---------------------------------------------------------------- stime | 1.trt | -.937202 .0233759 -40.09 0.000 -.983018 -.8913861 | U1[id] | 0 (omitted) U2[id] | 0 (omitted) | _cons | -3.449736 .0162243 -212.63 0.000 -3.481535 -3.417937 -------------------+---------------------------------------------------------------- var(U1[id])| 403.1091 27.88905 351.9916 461.65 var(U2[id])| .0047526 .0038287 .00098 .0230492 -------------------+---------------------------------------------------------------- cov(U1[id],U2[id])| .7259616 .293765 2.47 0.013 .1501928 1.30173 -------------------+---------------------------------------------------------------- var(e.response)| 227.9077 3.073659 221.9624 234.0123 ------------------------------------------------------------------------------------
the log hazard ratio for the survival model in gsem is -.937202 which is a bit different from the streg command which is -.8181656. I assumed independence between the longitudinal and survival models so I don't understand why there is this discrepancy. I tried to simulate a larger sample size number and I still get a discrepancy.
Please could you let me know what I am doing wrong?
Thanks
A