  • Likelihood ratio interpretation Accelerated Failure Time Survival Analysis

    Hi there,

    In my research on the survival of strategies within private equity, I'm using an accelerated failure time model and need to determine which distribution fits my data best.
    My output is presented below. Unfortunately, I do not know how to proceed in choosing the best distribution. I hope someone can help me interpret the output.

    Code:
    stset E_Date, failure(Successful==1) id(Strategy_Number) enter(time P_Date) origin(time P_Date)
    
                    id:  Strategy_Number
         failure event:  Successful == 1
    obs. time interval:  (E_Date[_n-1], E_Date]
     enter on or after:  time P_Date
     exit on or before:  failure
        t for analysis:  (time-origin)
                origin:  time P_Date
    
    ------------------------------------------------------------------------------
          1,197  total observations
              0  exclusions
    ------------------------------------------------------------------------------
          1,197  observations remaining, representing
          1,197  subjects
            251  failures in single-failure-per-subject data
      3,031,231  total analysis time at risk and under observation
                                                    at risk from t =         0
                                         earliest observed entry t =         0
                                              last observed exit t =     8,216
    Why do the numbers of subjects and failures in the model output below differ from those that stset reports for the whole sample?

    Code:
    Weibull AFT regression
    
    No. of subjects =          917                  Number of obs    =         917
    No. of failures =          171
    Time at risk    =      2162758
                                                    LR chi2(26)      =      428.44
    Log likelihood  =   -236.11205                  Prob > chi2      =      0.0000
    The output above is from the Weibull model only; the other models report the same numbers of subjects and failures.

    Since the Log-Normal, Exponential and Weibull models are nested within the Gamma model, I have to use likelihood ratio tests to compare them, as shown below.

    Model 1 = Gamma
    Model 2 = Weibull
    Model 3 = Exponential
    Model 4 = Log-Normal
    Model 5 = Log-Logistic

    Code:
    . lrtest (Model2)(Model3), force
    
    Likelihood-ratio test                                 LR chi2(2)  =    309.66
    (Assumption: Model3 nested in Model2)                 Prob > chi2 =    0.0000
    
    . lrtest (Model1)(Model2), force
    
    Likelihood-ratio test                                 LR chi2(0)  =   -217.14
    (Assumption: Model2 nested in Model1)                 Prob > chi2 =         .
    
    . lrtest (Model1)(Model4), force
    
    Likelihood-ratio test                                 LR chi2(0)  =   -232.65
    (Assumption: Model4 nested in Model1)                 Prob > chi2 =         .
    
    . lrtest (Model1)(Model3), force
    
    Likelihood-ratio test                                 LR chi2(2)  =     92.52
    (Assumption: Model3 nested in Model1)                 Prob > chi2 =    0.0000
    For the non-nested models I need to compare the AIC values.

    Code:
    . estimates stats _all
    
    Akaike's information criterion and Bayesian information criterion
    
    -----------------------------------------------------------------------------
           Model |        Obs  ll(null)  ll(model)      df         AIC        BIC
    -------------+---------------------------------------------------------------
          Model1 |        917         .  -344.6812      28    745.3624   880.3534
          Model2 |        917 -450.3315  -236.1121      28    528.2241   663.2151
          Model3 |        917         .  -390.9409      26    833.8817   959.2305
          Model4 |        917 -439.1325  -228.3551      28    512.7102   647.7012
          Model5 |        917         .  -357.0475      27    768.0951    898.265
    -----------------------------------------------------------------------------
                   Note: N=Obs used in calculating BIC; see [R] BIC note.
    Again, the number of observations differs from what the stset output indicated.

    Based on the output above, is it correct that I should opt for the log-normal model (Model 4)?

    I hope someone could explain my outputs, thanks in advance.

    Kind regards,

    Michael

  • #2
    The difference in sample size is probably due to missing data on covariates included in your models. You could try running the models with no covariates to confirm that the sample sizes are the same as from stset.
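
    A minimal sketch of that check, assuming the data are already stset as in the first post. The names x1, x2 and x3 are hypothetical placeholders for whatever covariates are in your models (the thread does not list them), and nmiss is just an illustrative variable name.

    ```stata
    * Sketch only: x1 x2 x3 stand in for the real covariates (hypothetical names).

    * 1. Fit a model with no covariates; it should use every subject stset kept.
    streg, distribution(weibull) time

    * 2. See which covariates have missing values.
    misstable summarize x1 x2 x3

    * 3. Count observations that streg would drop because at least one
    *    covariate is missing (_st == 1 marks observations used by stset).
    egen nmiss = rowmiss(x1 x2 x3)
    count if nmiss > 0 & _st == 1
    ```

    If the count in step 3 equals 1,197 - 917 = 280, missing covariate values account for the whole difference.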

    For distribution selection, if you're just following Example 6 here (https://www.stata.com/manuals13/ststreg.pdf) and basing it on lowest AIC values, then you have your answer. But there may be other distribution selection criteria based on diagnostic plots, comparing predicted curves from the parametric models with non-parametric curves, etc.
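
    As a rough sketch of the AIC comparison, assuming the same hypothetical covariates x1, x2 and x3 and illustrative names for the stored estimates (m_ggamma and so on), something along these lines fits every distribution on the same sample and tabulates the results. Running all five fits in one session right before the comparison also helps make sure each stored name points to the model you think it does.

    ```stata
    * Sketch only: x1 x2 x3 and the m_* names are hypothetical.
    * distribution(ggamma) is the Stata 14+ spelling; the Stata 13 manual
    * linked above writes it as distribution(gamma).
    quietly streg x1 x2 x3, distribution(ggamma)
    estimates store m_ggamma

    * time requests the accelerated failure-time metric for these two.
    quietly streg x1 x2 x3, distribution(weibull) time
    estimates store m_weibull

    quietly streg x1 x2 x3, distribution(exponential) time
    estimates store m_exp

    quietly streg x1 x2 x3, distribution(lognormal)
    estimates store m_lnormal

    quietly streg x1 x2 x3, distribution(loglogistic)
    estimates store m_llogistic

    * One table with log likelihood, df, AIC and BIC for all five models.
    estimates stats m_ggamma m_weibull m_exp m_lnormal m_llogistic
    ```

    The AIC column is -2*ll(model) + 2*df, so it can be checked by hand: for the log-normal fit in the first post, -2*(-228.3551) + 2*28 = 512.7102.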


    • #3
      Dear Jenny,

      Thank you very much for your response. Indeed, the reduced number of subjects was due to missing covariate data; I'm sorry for this mistake. You are right, I was following Example 6 from the manual. However, as I am new to survival analysis, I'm not sure whether my output makes sense. To statisticians more experienced with survival analysis, it might indicate that I did something wrong. Many thanks for your reply. For now I will continue with these findings.

      However, when I ran the streg commands separately for the model testing, instead of from the do-file, I got different results, as can be seen below. That is quite strange, right?

      Code:
      . lrtest (Model2)(Model3), force
      
      Likelihood-ratio test                                 LR chi2(1)  =    188.65
      (Assumption: Model3 nested in Model2)                 Prob > chi2 =    0.0000
      
      . lrtest (Model1)(Model2), force
      
      Likelihood-ratio test                                 LR chi2(1)  =     16.04
      (Assumption: Model2 nested in Model1)                 Prob > chi2 =    0.0001
      
      . lrtest (Model1)(Model4), force
      
      Likelihood-ratio test                                 LR chi2(1)  =      0.52
      (Assumption: Model4 nested in Model1)                 Prob > chi2 =    0.4700
      
      . lrtest (Model1)(Model3), force
      
      Likelihood-ratio test                                 LR chi2(2)  =    204.68
      (Assumption: Model3 nested in Model1)                 Prob > chi2 =    0.0000

      Code:
      Akaike's information criterion and Bayesian information criterion
      
      -----------------------------------------------------------------------------
             Model |        Obs  ll(null)  ll(model)      df         AIC        BIC
      -------------+---------------------------------------------------------------
            Model1 |        917 -438.2742  -228.0941      29    514.1882   654.0003
            Model2 |        917 -450.3315  -236.1121      28    528.2241   663.2151
            Model3 |        917  -487.744  -330.4353      27    714.8707   845.0406
            Model4 |        917 -439.1325  -228.3551      28    512.7102   647.7012
            Model5 |        917  -444.326  -227.2248      28    510.4496   645.4406
      -----------------------------------------------------------------------------
                     Note: N=Obs used in calculating BIC; see [R] BIC note.

      Kind regards,

      Michael
