Likelihood ratio interpretation Accelerated Failure Time Survival Analysis

Michael Jefferson

Join Date: Feb 2019
Posts: 36


11 Oct 2019, 06:45

Hi there,

In my research on the survival of strategies within private equity, I'm using an accelerated failure time (AFT) model and need to determine which distribution fits my data best.
My output is presented below. Unfortunately, I do not know how to proceed in choosing the best distribution. I hope someone can help me interpret the output.

Code:

stset E_Date, failure(Successful==1) id(Strategy_Number) enter(time P_Date) origin(time P_Date)

                id:  Strategy_Number
     failure event:  Successful == 1
obs. time interval:  (E_Date[_n-1], E_Date]
 enter on or after:  time P_Date
 exit on or before:  failure
    t for analysis:  (time-origin)
            origin:  time P_Date

------------------------------------------------------------------------------
      1,197  total observations
          0  exclusions
------------------------------------------------------------------------------
      1,197  observations remaining, representing
      1,197  subjects
        251  failures in single-failure-per-subject data
  3,031,231  total analysis time at risk and under observation
                                                at risk from t =         0
                                     earliest observed entry t =         0
                                          last observed exit t =     8,216

Why do the numbers of subjects and failures in the model output below differ from those reported by stset for the whole sample?

Code:

Weibull AFT regression

No. of subjects =          917                  Number of obs    =         917
No. of failures =          171
Time at risk    =      2162758
                                                LR chi2(26)      =      428.44
Log likelihood  =   -236.11205                  Prob > chi2      =      0.0000

The output above is from the Weibull model only; the other models report the same numbers of subjects and failures.

Since the exponential, Weibull and log-normal models are nested in the (generalized) gamma model, and the exponential is also nested in the Weibull, I have to compare them with likelihood-ratio tests, as shown below.

Model 1 = Gamma
Model 2 = Weibull
Model 3 = Exponential
Model 4 = Log-Normal
Model 5 = Log-Logistic
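For readers following along, the five models listed above could be fitted and stored roughly as follows. This is only a sketch under assumptions: the covariate list `x1 x2 x3` is a placeholder (the actual 26 covariates are not shown in this post), and `distribution(ggamma)` is how recent Stata releases name the generalized gamma, while older manuals document it as `distribution(gamma)`.

```stata
* Placeholder covariate list (hypothetical); replace with the real covariates
local covars x1 x2 x3

* Weibull and exponential default to the proportional-hazards metric,
* so the time option requests the AFT parameterization for them
streg `covars', distribution(ggamma)
estimates store Model1
streg `covars', distribution(weibull) time
estimates store Model2
streg `covars', distribution(exponential) time
estimates store Model3
streg `covars', distribution(lognormal)
estimates store Model4
streg `covars', distribution(loglogistic)
estimates store Model5

* Nested comparisons; force overrides lrtest's check that both
* fits come from the same estimator
lrtest Model2 Model3, force
lrtest Model1 Model2, force
lrtest Model1 Model4, force
```

Fitting all five models in one pass on the same data keeps the estimation samples identical, which the likelihood-ratio and AIC comparisons both rely on.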

Code:

. lrtest (Model2)(Model3), force

Likelihood-ratio test                                 LR chi2(2)  =    309.66
(Assumption: Model3 nested in Model2)                 Prob > chi2 =    0.0000

. lrtest (Model1)(Model2), force

Likelihood-ratio test                                 LR chi2(0)  =   -217.14
(Assumption: Model2 nested in Model1)                 Prob > chi2 =         .

. lrtest (Model1)(Model4), force

Likelihood-ratio test                                 LR chi2(0)  =   -232.65
(Assumption: Model4 nested in Model1)                 Prob > chi2 =         .

. lrtest (Model1)(Model3), force

Likelihood-ratio test                                 LR chi2(2)  =     92.52
(Assumption: Model3 nested in Model1)                 Prob > chi2 =    0.0000

For the non-nested models I need to compare the AIC values.

Code:

. estimates stats _all

Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |        Obs  ll(null)  ll(model)      df         AIC        BIC
-------------+---------------------------------------------------------------
      Model1 |        917         .  -344.6812      28    745.3624   880.3534
      Model2 |        917 -450.3315  -236.1121      28    528.2241   663.2151
      Model3 |        917         .  -390.9409      26    833.8817   959.2305
      Model4 |        917 -439.1325  -228.3551      28    512.7102   647.7012
      Model5 |        917         .  -357.0475      27    768.0951    898.265
-----------------------------------------------------------------------------
               Note: N=Obs used in calculating BIC; see [R] BIC note.
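As a sanity check, the AIC column and the LR statistics can be reproduced by hand from the ll(model) and df columns, using AIC = -2*ll + 2*df and LR = 2*(ll of the larger model - ll of the nested model). The lines below are a small sketch that only uses `display` and the numbers in the table above.

```stata
* AIC = -2*ll(model) + 2*df, Model2 (Weibull) row
display -2*(-236.1121) + 2*28
* AIC, Model4 (log-normal) row
display -2*(-228.3551) + 2*28

* LR statistic, Model3 (exponential) nested in Model2 (Weibull)
display 2*(-236.1121 - (-390.9409))
* LR statistic, Model2 (Weibull) nested in Model1 (gamma)
display 2*(-344.6812 - (-236.1121))
```

Up to rounding, these give 528.22, 512.71, 309.66 and -217.14, matching the reported values. The last one is negative because the gamma fit has a lower log likelihood than the Weibull fit it is supposed to contain. That cannot happen when both models are fitted to convergence on the same sample, so the gamma result in this run looks worth rechecking.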

Again, the number of observations differs from what the stset output indicated.

According to the output above, is it correct that I should opt for the log-normal model (Model 4)?

I hope someone can explain my output. Thanks in advance.

Kind regards,

Michael


Jenny Williams

Join Date: Sep 2017

Posts: 35
#2

11 Oct 2019, 08:51

The difference in sample size is probably due to missing data on covariates included in your models. You could try running the models with no covariates to confirm that the sample sizes are the same as from stset.
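A rough sketch of that check, assuming the data are already stset as in the first post. The covariate list `x1 x2 x3` is a placeholder, since the actual covariates are not shown in the thread.

```stata
* Placeholder covariate list (hypothetical); replace with the real covariates
local covars x1 x2 x3

* Count missing values per covariate
misstable summarize `covars'

* Fit with no covariates: the subject count should match the stset summary
streg, distribution(weibull) time

* Refit with covariates and count the observations actually used
streg `covars', distribution(weibull) time
count if e(sample)
```

If the covariate-free fit reports 1,197 subjects and the second count is 917, the gap is explained by observations dropped for missing covariate values.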

For distribution selection, if you're just following Example 6 here (https://www.stata.com/manuals13/ststreg.pdf) and basing it on lowest AIC values, then you have your answer. But there may be other distribution selection criteria based on diagnostic plots, comparing predicted curves from the parametric models with non-parametric curves, etc.
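One such graphical check could look like the sketch below, which overlays a Kaplan-Meier curve on the survivor function predicted by a covariate-free log-normal fit. The variable names `S_km` and `S_ln` are made up for illustration, and the same steps could be repeated for the other distributions.

```stata
* Kaplan-Meier estimate of the survivor function
sts generate S_km = s

* Covariate-free log-normal fit and its predicted survivor function
streg, distribution(lognormal)
predict double S_ln, surv

* Overlay the two curves against analysis time _t
twoway (line S_km _t, sort connect(stairstep)) (line S_ln _t, sort), ///
    legend(order(1 "Kaplan-Meier" 2 "Log-normal")) ytitle("Survival probability")
```

A parametric curve that tracks the Kaplan-Meier steps closely over the whole follow-up period supports that distribution; systematic departures early or late in follow-up argue against it.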

Michael Jefferson

Join Date: Feb 2019
Posts: 36

11 Oct 2019, 11:05

Dear Jenny,

Thank you very much for your response. Indeed, the reduced number of subjects was due to missing covariate data; I'm sorry for this mistake. You are right, I was following Example 6 from the manual. However, as I am new to survival analysis, I'm not sure whether my output makes sense; to statisticians more experienced with survival analysis, it might indicate that I did something wrong. Many thanks for your reply; for now I will continue with these findings. One thing puzzles me, though: when I ran the streg commands for the model testing separately, instead of from the do-file, I got different results, as can be seen below. That is quite strange, right?

Code:

. lrtest (Model2)(Model3), force

Likelihood-ratio test                                 LR chi2(1)  =    188.65
(Assumption: Model3 nested in Model2)                 Prob > chi2 =    0.0000

. lrtest (Model1)(Model2), force

Likelihood-ratio test                                 LR chi2(1)  =     16.04
(Assumption: Model2 nested in Model1)                 Prob > chi2 =    0.0001

. lrtest (Model1)(Model4), force

Likelihood-ratio test                                 LR chi2(1)  =      0.52
(Assumption: Model4 nested in Model1)                 Prob > chi2 =    0.4700

. lrtest (Model1)(Model3), force

Likelihood-ratio test                                 LR chi2(2)  =    204.68
(Assumption: Model3 nested in Model1)                 Prob > chi2 =    0.0000
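The reported p-values are upper-tail chi-squared probabilities of the LR statistics, which can be checked directly with `chi2tail(df, x)`. A short sketch using the statistics printed above:

```stata
* Model4 (log-normal) nested in Model1 (gamma), 1 df
display chi2tail(1, 0.52)
* Model2 (Weibull) nested in Model1 (gamma), 1 df
display chi2tail(1, 16.04)
* Model3 (exponential) nested in Model1 (gamma), 2 df
display chi2tail(2, 204.68)
```

A small p-value rejects the nested model in favour of the gamma model, whereas a large one, as for the log-normal here, means the data do not distinguish the two.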

Code:

Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |        Obs  ll(null)  ll(model)      df         AIC        BIC
-------------+---------------------------------------------------------------
      Model1 |        917 -438.2742  -228.0941      29    514.1882   654.0003
      Model2 |        917 -450.3315  -236.1121      28    528.2241   663.2151
      Model3 |        917  -487.744  -330.4353      27    714.8707   845.0406
      Model4 |        917 -439.1325  -228.3551      28    512.7102   647.7012
      Model5 |        917  -444.326  -227.2248      28    510.4496   645.4406
-----------------------------------------------------------------------------
               Note: N=Obs used in calculating BIC; see [R] BIC note.
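One way to look into why the two runs differ would be to inspect what each stored result actually contains. The loop below is only a sketch; it assumes the models are still stored under the names Model1 to Model5 and does not by itself identify the cause.

```stata
* For each stored fit, show the command that produced it, whether the
* optimizer reported convergence, the sample size and the log likelihood
foreach m in Model1 Model2 Model3 Model4 Model5 {
    quietly estimates restore `m'
    display "`m': `e(cmdline)'"
    display "  converged = " e(converged) "  N = " e(N) "  ll = " e(ll)
}
```

Running this after the do-file and again after the separate commands would show whether the same specification was fitted in both cases and whether every fit converged.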

Kind regards,

Michael
