Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • gsem and joint survival longitudinal modelling

    Dear All

    I simulated data from a joint longitudinal and survival model (random intercept and slope model for the longitudinal outcome and an exponential survival model for the time to event outcome.

    You can find the code below. I assumed in the simulation that the longitudinal and survival models are not associated. I tried to use gsem to obtain the estimates of these models. Theoretically, the estimates of the gsem should be very similar or identical to the separate mixed model and exponential survival model. Although the estimates of the mixed model match almost exactly the gsem command, the survival model estimates differ. I wonder whether I have done something wrong related with the data format??

    Please could you help me?

    Also please could you clarify what @0 syntax in gsem means?

    Thanks


    Code:
            clear
            set seed 6765
            set obs 500
    
            gen id = _n
    
            gen trt = rbinomial(1,0.5)
    
            display ln(0.50)
    
                //#########################
                preserve
                    keep id
                    duplicates drop
                    matrix a = (1, 0.5 \ 0.5,  1)
                    drawnorm u1 u0, means(0 0) corr(a) sds(0.05 20)
                    sum u1 u0
                    corr u1 u0
                    tempfile rc
                    save "`rc'"
                restore
                merge 1:1 id using "`rc'", update
                drop _merge
                
            sum u1 u0
            
            gen double b0 = 70 + u0
            sum b0
                
            gen double b1=-0.05+u1
            sum b1
            
            corr b0 b1
                
            //Define the association between the biomarker and survival
            local alpha = 0
            
            survsim stime died, chazard(0.05 :* {t} :+ `alpha' :* (b0 :+ b1 :* {t})) cov(trt -.69314718) maxtime(48)
            
            stset stime, fail(died)
    
            stcox i.trt,nohr
            streg i.trt, dist(exponential) nohr
    
            gsem (stime <- i.trt, family(exponential, failure(died)))
            
            expand 48
            bys id: gen double meastime = _n-1
            //Remove observations after event or censoring time
            bys id: drop if meastime>=stime
    
            //Generate observed biomarker values incorporating measurement error
            generate e_ij = rnormal(0,15)
            gen float response = b0 + b1*meastime + -5*trt + e_ij
            
            sort id meastime
            
            keep id trt stime died meastime response
    
            bysort id: gen stoptime=meastime[_n+1]
            replace stoptime=stime if stoptime==.
            
            rename meastime starttime
            
            mixed response c.starttime i.trt || id: starttime, cov(unstruct)
    
            bysort id: gen N=_N
            
            bysort id: gen n=_n
            
            gen event_long=died if n==N
            tab event_long
            replace event_long=0 if event_long==.
    
            gsem (response <- i.trt c.starttime U1[id]@1 c.starttime#U2[id]@1) ///
            (stime <- i.trt U1[id]@0 U2[id]@0 , family(exponential, failure(died)))

    This is the output fromthe exponential survival model

    HTML Code:
    .                 streg i.trt, dist(exponential) nohr
    
            Failure _d: died
      Analysis time _t: stime
    
    Iteration 0:   log likelihood = -771.07598  
    Iteration 1:   log likelihood = -739.02466  
    Iteration 2:   log likelihood = -737.17115  
    Iteration 3:   log likelihood = -737.16669  
    Iteration 4:   log likelihood = -737.16669  
    
    Exponential PH regression
    
    No. of subjects =         500                           Number of obs =    500
    No. of failures =         409
    Time at risk    = 11,500.9626
                                                            LR chi2(1)    =  67.82
    Log likelihood = -737.16669                             Prob > chi2   = 0.0000
    
    ------------------------------------------------------------------------------
              _t | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           1.trt |  -.8181656   .0998001    -8.20   0.000     -1.01377    -.622561
           _cons |  -2.899491   .0656532   -44.16   0.000    -3.028169   -2.770813
    ------------------------------------------------------------------------------

    This is the output from the random intercept and slope model
    HTML Code:
    .                 mixed response c.starttime i.trt || id: starttime, cov(unstruct)                
    
    Performing EM optimization ...
    
    Performing gradient-based optimization: 
    Iteration 0:   log likelihood =  -49228.39  
    Iteration 1:   log likelihood =  -49223.86  
    Iteration 2:   log likelihood = -49223.858  
    
    Computing standard errors ...
    
    Mixed-effects ML regression                     Number of obs     =     11,698
    Group variable: id                              Number of groups  =        500
                                                    Obs per group:
                                                                  min =          1
                                                                  avg =       23.4
                                                                  max =         48
                                                    Wald chi2(2)      =      45.57
    Log likelihood = -49223.858                     Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
        response | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
       starttime |  -.0635027   .0138042    -4.60   0.000    -.0905585   -.0364469
           1.trt |  -9.411631   1.872736    -5.03   0.000    -13.08213   -5.741135
           _cons |   71.85274   1.352029    53.14   0.000     69.20281    74.50267
    ------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
      Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
    -----------------------------+------------------------------------------------
    id: Unstructured             |
                  var(starttime) |   .0047512   .0038289      .0009792    .0230549
                      var(_cons) |   403.1105   27.88933      351.9926    461.6521
            cov(starttime,_cons) |   .7260532   .2937639      .1502865     1.30182
    -----------------------------+------------------------------------------------
                   var(Residual) |   227.9082   3.073675      221.9628    234.0128
    ------------------------------------------------------------------------------
    LR test vs. linear model: chi2(3) = 10193.23              Prob > chi2 = 0.0000
    
    Note: LR test is conservative and provided only for reference.

    This is the output from the joint survival longitudinal model using gsem

    HTML Code:
    .                 gsem (response <- i.trt c.starttime U1[id]@1 c.starttime#U2[id]@1) ///
    >                 (stime <- i.trt U1[id]@0 U2[id]@0 , family(exponential, failure(died)))         
    
    Fitting fixed-effects model:
    
    Iteration 0:   log likelihood =  -15547395  
    Iteration 1:   log likelihood = -95269.418  
    Iteration 2:   log likelihood = -93877.901  
    Iteration 3:   log likelihood = -90338.928  
    Iteration 4:   log likelihood = -90246.428  
    Iteration 5:   log likelihood = -90246.302  
    Iteration 6:   log likelihood = -90246.302  
    
    Refining starting values:
    
    Grid node 0:   log likelihood = -88216.813
    
    Fitting full model:
    
    Iteration 0:   log likelihood = -88216.813  (not concave)
    Iteration 1:   log likelihood = -86663.765  (not concave)
    Iteration 2:   log likelihood = -86431.117  
    Iteration 3:   log likelihood = -86399.618  (backed up)
    Iteration 4:   log likelihood = -86341.081  
    Iteration 5:   log likelihood = -86287.597  
    Iteration 6:   log likelihood = -86237.541  
    Iteration 7:   log likelihood = -86213.525  
    Iteration 8:   log likelihood = -86190.121  
    Iteration 9:   log likelihood = -86187.228  (backed up)
    Iteration 10:  log likelihood = -86186.506  (backed up)
    Iteration 11:  log likelihood = -86186.145  (backed up)
    Iteration 12:  log likelihood = -86185.965  (backed up)
    Iteration 13:  log likelihood = -86185.942  (backed up)
    Iteration 14:  log likelihood = -86185.931  (backed up)
    Iteration 15:  log likelihood = -86185.925  (backed up)
    Iteration 16:  log likelihood = -86185.924  (backed up)
    Iteration 17:  log likelihood = -86185.924  (backed up)
    Iteration 18:  log likelihood = -86185.923  (backed up)
    Iteration 19:  log likelihood = -86185.923  (backed up)
    Iteration 20:  log likelihood = -86185.923  (backed up)
    Iteration 21:  log likelihood = -86185.923  (backed up)
    Iteration 22:  log likelihood = -86185.923  (backed up)
    Iteration 23:  log likelihood = -86185.923  (not concave)
    Iteration 24:  log likelihood = -85399.166  (not concave)
    Iteration 25:  log likelihood = -85319.271  (not concave)
    Iteration 26:  log likelihood = -85222.571  
    Iteration 27:  log likelihood = -85204.678  
    Iteration 28:  log likelihood = -85178.789  
    Iteration 29:  log likelihood = -85150.303  
    Iteration 30:  log likelihood = -85149.688  
    Iteration 31:  log likelihood = -85149.686  
    
    Generalized structural equation model             Number of obs   =     11,698
    
    Response: response            
    Family:   Gaussian            
    Link:     Identity            
    
    Response: stime                                   No. of failures =      7,330
    Family:   Exponential                             Time at risk    = 403,509.31
    Form:     Proportional hazards
    Link:     Log                 
    
    Log likelihood = -85149.686
    
     ( 1)  [stime]U1[id] = 0
     ( 2)  [stime]U2[id] = 0
     ( 3)  [response]U1[id] = 1
     ( 4)  [response]c.starttime#U2[id] = 1
    ------------------------------------------------------------------------------------
                       | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------------+----------------------------------------------------------------
    response           |
                 1.trt |  -9.411655   1.873263    -5.02   0.000    -13.08318   -5.740126
             starttime |  -.0635024   .0138293    -4.59   0.000    -.0906074   -.0363974
                       |
                U1[id] |          1  (constrained)
                       |
    c.starttime#U2[id] |          1  (constrained)
                       |
                 _cons |   71.85275   1.352239    53.14   0.000     69.20241    74.50309
    -------------------+----------------------------------------------------------------
    stime              |
                 1.trt |   -.937202   .0233759   -40.09   0.000     -.983018   -.8913861
                       |
                U1[id] |          0  (omitted)
                U2[id] |          0  (omitted)
                       |
                 _cons |  -3.449736   .0162243  -212.63   0.000    -3.481535   -3.417937
    -------------------+----------------------------------------------------------------
            var(U1[id])|   403.1091   27.88905                      351.9916      461.65
            var(U2[id])|   .0047526   .0038287                        .00098    .0230492
    -------------------+----------------------------------------------------------------
     cov(U1[id],U2[id])|   .7259616    .293765     2.47   0.013     .1501928     1.30173
    -------------------+----------------------------------------------------------------
        var(e.response)|   227.9077   3.073659                      221.9624    234.0123
    ------------------------------------------------------------------------------------

    the log hazard ratio for the survival model in gsem is -.937202 which is a bit different from the streg command which is -.8181656. I assumed independence between the longitudinal and survival models so I don't understand why there is this discrepancy. I tried to simulate a larger sample size number and I still get a discrepancy.

    Please could you let me know what I am doing wrong?

    Thanks

    A
Working...
X