gsem and joint survival longitudinal modelling

Andrew Xavier

Join Date: Jul 2018
Posts: 57

gsem and joint survival longitudinal modelling

17 Aug 2022, 08:49

Dear All

I simulated data from a joint longitudinal and survival model (random intercept and slope model for the longitudinal outcome and an exponential survival model for the time to event outcome.

You can find the code below. I assumed in the simulation that the longitudinal and survival models are not associated. I tried to use gsem to obtain the estimates of these models. Theoretically, the estimates of the gsem should be very similar or identical to the separate mixed model and exponential survival model. Although the estimates of the mixed model match almost exactly the gsem command, the survival model estimates differ. I wonder whether I have done something wrong related with the data format??

Please could you help me?

Also please could you clarify what @0 syntax in gsem means?

Thanks

Code:

        clear
        set seed 6765
        set obs 500

        gen id = _n

        gen trt = rbinomial(1,0.5)

        display ln(0.50)

            //#########################
            preserve
                keep id
                duplicates drop
                matrix a = (1, 0.5 \ 0.5,  1)
                drawnorm u1 u0, means(0 0) corr(a) sds(0.05 20)
                sum u1 u0
                corr u1 u0
                tempfile rc
                save "`rc'"
            restore
            merge 1:1 id using "`rc'", update
            drop _merge
            
        sum u1 u0
        
        gen double b0 = 70 + u0
        sum b0
            
        gen double b1=-0.05+u1
        sum b1
        
        corr b0 b1
            
        //Define the association between the biomarker and survival
        local alpha = 0
        
        survsim stime died, chazard(0.05 :* {t} :+ `alpha' :* (b0 :+ b1 :* {t})) cov(trt -.69314718) maxtime(48)
        
        stset stime, fail(died)

        stcox i.trt,nohr
        streg i.trt, dist(exponential) nohr

        gsem (stime <- i.trt, family(exponential, failure(died)))
        
        expand 48
        bys id: gen double meastime = _n-1
        //Remove observations after event or censoring time
        bys id: drop if meastime>=stime

        //Generate observed biomarker values incorporating measurement error
        generate e_ij = rnormal(0,15)
        gen float response = b0 + b1*meastime + -5*trt + e_ij
        
        sort id meastime
        
        keep id trt stime died meastime response

        bysort id: gen stoptime=meastime[_n+1]
        replace stoptime=stime if stoptime==.
        
        rename meastime starttime
        
        mixed response c.starttime i.trt || id: starttime, cov(unstruct)

        bysort id: gen N=_N
        
        bysort id: gen n=_n
        
        gen event_long=died if n==N
        tab event_long
        replace event_long=0 if event_long==.

        gsem (response <- i.trt c.starttime U1[id]@1 c.starttime#U2[id]@1) ///
        (stime <- i.trt U1[id]@0 U2[id]@0 , family(exponential, failure(died)))

This is the output fromthe exponential survival model

HTML Code:

.                 streg i.trt, dist(exponential) nohr

        Failure _d: died
  Analysis time _t: stime

Iteration 0:   log likelihood = -771.07598  
Iteration 1:   log likelihood = -739.02466  
Iteration 2:   log likelihood = -737.17115  
Iteration 3:   log likelihood = -737.16669  
Iteration 4:   log likelihood = -737.16669  

Exponential PH regression

No. of subjects =         500                           Number of obs =    500
No. of failures =         409
Time at risk    = 11,500.9626
                                                        LR chi2(1)    =  67.82
Log likelihood = -737.16669                             Prob > chi2   = 0.0000

------------------------------------------------------------------------------
          _t | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       1.trt |  -.8181656   .0998001    -8.20   0.000     -1.01377    -.622561
       _cons |  -2.899491   .0656532   -44.16   0.000    -3.028169   -2.770813
------------------------------------------------------------------------------

This is the output from the random intercept and slope model

HTML Code:

.                 mixed response c.starttime i.trt || id: starttime, cov(unstruct)                

Performing EM optimization ...

Performing gradient-based optimization: 
Iteration 0:   log likelihood =  -49228.39  
Iteration 1:   log likelihood =  -49223.86  
Iteration 2:   log likelihood = -49223.858  

Computing standard errors ...

Mixed-effects ML regression                     Number of obs     =     11,698
Group variable: id                              Number of groups  =        500
                                                Obs per group:
                                                              min =          1
                                                              avg =       23.4
                                                              max =         48
                                                Wald chi2(2)      =      45.57
Log likelihood = -49223.858                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
    response | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   starttime |  -.0635027   .0138042    -4.60   0.000    -.0905585   -.0364469
       1.trt |  -9.411631   1.872736    -5.03   0.000    -13.08213   -5.741135
       _cons |   71.85274   1.352029    53.14   0.000     69.20281    74.50267
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
-----------------------------+------------------------------------------------
id: Unstructured             |
              var(starttime) |   .0047512   .0038289      .0009792    .0230549
                  var(_cons) |   403.1105   27.88933      351.9926    461.6521
        cov(starttime,_cons) |   .7260532   .2937639      .1502865     1.30182
-----------------------------+------------------------------------------------
               var(Residual) |   227.9082   3.073675      221.9628    234.0128
------------------------------------------------------------------------------
LR test vs. linear model: chi2(3) = 10193.23              Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

This is the output from the joint survival longitudinal model using gsem

HTML Code:

.                 gsem (response <- i.trt c.starttime U1[id]@1 c.starttime#U2[id]@1) ///
>                 (stime <- i.trt U1[id]@0 U2[id]@0 , family(exponential, failure(died)))         

Fitting fixed-effects model:

Iteration 0:   log likelihood =  -15547395  
Iteration 1:   log likelihood = -95269.418  
Iteration 2:   log likelihood = -93877.901  
Iteration 3:   log likelihood = -90338.928  
Iteration 4:   log likelihood = -90246.428  
Iteration 5:   log likelihood = -90246.302  
Iteration 6:   log likelihood = -90246.302  

Refining starting values:

Grid node 0:   log likelihood = -88216.813

Fitting full model:

Iteration 0:   log likelihood = -88216.813  (not concave)
Iteration 1:   log likelihood = -86663.765  (not concave)
Iteration 2:   log likelihood = -86431.117  
Iteration 3:   log likelihood = -86399.618  (backed up)
Iteration 4:   log likelihood = -86341.081  
Iteration 5:   log likelihood = -86287.597  
Iteration 6:   log likelihood = -86237.541  
Iteration 7:   log likelihood = -86213.525  
Iteration 8:   log likelihood = -86190.121  
Iteration 9:   log likelihood = -86187.228  (backed up)
Iteration 10:  log likelihood = -86186.506  (backed up)
Iteration 11:  log likelihood = -86186.145  (backed up)
Iteration 12:  log likelihood = -86185.965  (backed up)
Iteration 13:  log likelihood = -86185.942  (backed up)
Iteration 14:  log likelihood = -86185.931  (backed up)
Iteration 15:  log likelihood = -86185.925  (backed up)
Iteration 16:  log likelihood = -86185.924  (backed up)
Iteration 17:  log likelihood = -86185.924  (backed up)
Iteration 18:  log likelihood = -86185.923  (backed up)
Iteration 19:  log likelihood = -86185.923  (backed up)
Iteration 20:  log likelihood = -86185.923  (backed up)
Iteration 21:  log likelihood = -86185.923  (backed up)
Iteration 22:  log likelihood = -86185.923  (backed up)
Iteration 23:  log likelihood = -86185.923  (not concave)
Iteration 24:  log likelihood = -85399.166  (not concave)
Iteration 25:  log likelihood = -85319.271  (not concave)
Iteration 26:  log likelihood = -85222.571  
Iteration 27:  log likelihood = -85204.678  
Iteration 28:  log likelihood = -85178.789  
Iteration 29:  log likelihood = -85150.303  
Iteration 30:  log likelihood = -85149.688  
Iteration 31:  log likelihood = -85149.686  

Generalized structural equation model             Number of obs   =     11,698

Response: response            
Family:   Gaussian            
Link:     Identity            

Response: stime                                   No. of failures =      7,330
Family:   Exponential                             Time at risk    = 403,509.31
Form:     Proportional hazards
Link:     Log                 

Log likelihood = -85149.686

 ( 1)  [stime]U1[id] = 0
 ( 2)  [stime]U2[id] = 0
 ( 3)  [response]U1[id] = 1
 ( 4)  [response]c.starttime#U2[id] = 1
------------------------------------------------------------------------------------
                   | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------------+----------------------------------------------------------------
response           |
             1.trt |  -9.411655   1.873263    -5.02   0.000    -13.08318   -5.740126
         starttime |  -.0635024   .0138293    -4.59   0.000    -.0906074   -.0363974
                   |
            U1[id] |          1  (constrained)
                   |
c.starttime#U2[id] |          1  (constrained)
                   |
             _cons |   71.85275   1.352239    53.14   0.000     69.20241    74.50309
-------------------+----------------------------------------------------------------
stime              |
             1.trt |   -.937202   .0233759   -40.09   0.000     -.983018   -.8913861
                   |
            U1[id] |          0  (omitted)
            U2[id] |          0  (omitted)
                   |
             _cons |  -3.449736   .0162243  -212.63   0.000    -3.481535   -3.417937
-------------------+----------------------------------------------------------------
        var(U1[id])|   403.1091   27.88905                      351.9916      461.65
        var(U2[id])|   .0047526   .0038287                        .00098    .0230492
-------------------+----------------------------------------------------------------
 cov(U1[id],U2[id])|   .7259616    .293765     2.47   0.013     .1501928     1.30173
-------------------+----------------------------------------------------------------
    var(e.response)|   227.9077   3.073659                      221.9624    234.0123
------------------------------------------------------------------------------------

the log hazard ratio for the survival model in gsem is -.937202 which is a bit different from the streg command which is -.8181656. I assumed independence between the longitudinal and survival models so I don't understand why there is this discrepancy. I tried to simulate a larger sample size number and I still get a discrepancy.

Please could you let me know what I am doing wrong?

Thanks

A

Tags: None

Announcement

gsem and joint survival longitudinal modelling