Problem with validity test for survival model with Weibull distribution using generalized gamma model

Pedro Castro

Join Date: Jul 2023
Posts: 1

30 Jul 2023, 04:04

Hello,

I am doing a survival analysis of when a vacant plot becomes developed, i.e., failure = development. I'm using a parametric model instead of a Cox proportional hazards model because the proportional hazards assumption is not met. Among the parametric models, I chose the Weibull because it fits better than the exponential by AIC and BIC (the Gompertz is slightly better still, but I don't know how to assess its validity).

To test the validity of the Weibull model, I fit a generalized gamma model and test the hypothesis that k=0 (test for the appropriateness of the lognormal) and then the hypothesis that k=1 (test for the appropriateness of the Weibull). This is what the Stata manual suggests in the streg entry (Parametric survival models). However, when I run the generalized gamma model, Stata takes too long to process the command. I have already left it running for two hours and nothing happens. This is strange, because all the other models (other distributions) run smoothly. The only difference from the other models is that I add the nolog option (see code below), so that may have something to do with it.
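Stated a bit more formally, this is how I understand the two tests. It is a sketch in my own notation: I write the shape parameter k as \kappa, and I am assuming the statistic reported by test is the usual single-parameter Wald statistic, with \kappa_0 the hypothesized value.

```latex
\begin{align*}
H_0^{\text{lognormal}} &: \kappa = 0, \\
H_0^{\text{Weibull}}   &: \kappa = 1, \\
W &= \frac{(\hat{\kappa} - \kappa_0)^2}{\widehat{\operatorname{Var}}(\hat{\kappa})}
     \;\sim\; \chi^2_1 \quad \text{asymptotically under } H_0 .
\end{align*}
```

If the test of \kappa = 1 does not reject, I would take that as support for the Weibull.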

Therefore my questions are:

1. What could be the reason it is taking so long? Could having added the nolog option be the reason? In cases like this in which Stata takes forever, is there any point in waiting for it to respond?

2. Is there an alternative (faster) way of testing the validity of the Weibull model?

3. Is there a way to test the validity of the Gompertz model?

I'm using Stata 17. Please see below for details about the data, code, and output.

First, details about the data:

Code:

input float(YEAR_TAXROLL ln_JST_VAL_W_I_P FAILURE ln_HEAT_AR_W Bedrooms Restrooms Stories ln_SQFT_W HH_INCOME DIST_CBD RESDU_3_VAR interact_DIST_UNCERT3) int ZCTA
2016 11.173612 0 7.305188 2 0 1  9.677214 47536  103287.6  .05121636  5290.015 32220
2016  11.37649 0 7.021084 3 0 1  9.408371 47536 103184.12  .04664454  4812.976 32220
2016  10.97931 0 6.927558 3 0 1  9.525151 47536 103415.26 .017120913 1770.5636 32220
2016  11.49621 0 7.825245 3 0 2  9.525151 47536 103515.03 .027632317   2860.36 32220
2016 11.246445 0 7.459915 3 0 1  9.525151 47536  103614.8  .02384571  2470.769 32220
2016  11.26819 0 7.389564 3 0 1  9.525151 47536 103624.84 .006640271   688.097 32220
2016  10.69538 0 6.579251 1 0 1  9.525151 47536 103525.06  .03760462  3893.021 32220
2016  11.23799 0 7.266828 3 0 1  9.525151 47536  103425.3  .03764908  3893.867 32220
2016  11.57281 0  7.53583 3 0 1 10.011175 47536  103841.3  .07288396  7568.366 32220
2016 10.996836 0 7.313221 2 0 1  9.262268 47536  103538.8  .15010385 15541.572 32220
end

. summarize ln_JST_VAL_W_I_P FAILURE ln_HEAT_AR_W Bedrooms Restrooms Stories ln_SQFT_W
> HH_INCOME DIST_CBD RESDU_3_VAR interact_DIST_UNCERT3 ZCTA

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
ln_JST_VAL~P |  2,635,969     11.6731     .726165   8.597553   16.58054
     FAILURE |  4,639,075    .0035779    .0597082          0          1
ln_HEAT_AR_W |  3,511,930    7.394051    .3701783    6.52503   8.383662
    Bedrooms |  4,200,561    3.034418    1.363163          0        201
   Restrooms |  4,200,561    .1051492    1.586688          0        269
-------------+---------------------------------------------------------
     Stories |  4,200,561    1.235384     34.9962          0      41408
   ln_SQFT_W |  4,639,057    9.317953    .9546204    4.60517   13.02817
   HH_INCOME |  4,639,075     51708.1    16154.66      15279      95819
    DIST_CBD |  4,639,075    43541.89    20941.55   86.80797   133622.5
 RESDU_3_VAR |  1,756,326    .0190573    .0393086   4.14e-14   1.832363
-------------+---------------------------------------------------------
interact_D~3 |  1,756,326    609.4353    1231.284   1.66e-09   70178.39
        ZCTA |  4,639,075    32226.56    18.96891      32205      32277

Weibull Model:

Code:

. streg ln_HEAT_AR_W Bedrooms Restrooms Stories ln_SQFT_W HH_INCOME DIST_CBD RESDU_3_VA
> R interact_DIST_UNCERT3 i.ZCTA, dist (weibull)

        Failure _d: FAILURE
  Analysis time _t: YEAR_TAXROLL

Fitting constant-only model:
Iteration 0:   log likelihood = -4096.4745
Iteration 1:   log likelihood = -3654.7215
Iteration 2:   log likelihood = -3211.7027
Iteration 3:   log likelihood = -2765.4254
Iteration 4:   log likelihood = -2311.6977
Iteration 5:   log likelihood = -1848.9143
Iteration 6:   log likelihood =   -1438.38
Iteration 7:   log likelihood = -1284.9652
Iteration 8:   log likelihood = -1277.2977
Iteration 9:   log likelihood = -1277.2794
Iteration 10:   log likelihood = -1277.2794

Fitting full model:
Iteration 0:   log likelihood = -1277.2794  
Iteration 1:   log likelihood =  -948.8248  
Iteration 2:   log likelihood = -642.79445  
Iteration 3:   log likelihood = -504.70937  
Iteration 4:   log likelihood = -499.37099  
Iteration 5:   log likelihood = -499.15653  
Iteration 6:   log likelihood = -499.11293  
Iteration 7:   log likelihood = -499.10213  
Iteration 8:   log likelihood = -499.09989  
Iteration 9:   log likelihood = -499.09941  
Iteration 10:  log likelihood =  -499.0993  
Iteration 11:  log likelihood = -499.09927  

Weibull PH regression

No. of subjects =  1,756,326                         Number of obs = 1,756,326
No. of failures =        441
Time at risk    = 3541672827
                                                     LR chi2(35)   =   1556.36
Log likelihood = -499.09927                          Prob > chi2   =    0.0000

--------------------------------------------------------------------------------------
                  _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
---------------------+----------------------------------------------------------------
        ln_HEAT_AR_W |   8.614132   1.565881    11.85   0.000     6.032256    12.30108
            Bedrooms |   1.323708   .0438235     8.47   0.000     1.240543    1.412449
           Restrooms |   .0010985   1.637346    -0.00   0.996            0           .
             Stories |   1.024752   .1179899     0.21   0.832     .8177327     1.28418
           ln_SQFT_W |   .4915562   .0500188    -6.98   0.000     .4026784    .6000507
           HH_INCOME |   .9998837   8.58e-06   -13.55   0.000     .9998669    .9999005
            DIST_CBD |   .9999556   8.21e-06    -5.41   0.000     .9999395    .9999717
         RESDU_3_VAR |   20.80433   8.271109     7.63   0.000     9.544302    45.34852
interact_DIST_UNCE~3 |   1.000138   .0000112    12.24   0.000     1.000116     1.00016
                     |
                ZCTA |
              32206  |   .0359272   .0187434    -6.38   0.000     .0129225    .0998849
              32207  |   1.109801   .3035395     0.38   0.703     .6492838    1.896948
              32208  |    .455503   .1390679    -2.58   0.010     .2503883    .8286449
              32209  |   .1186568   .0391461    -6.46   0.000     .0621545    .2265233
              32210  |    1.75124   .4617911     2.12   0.034     1.044454    2.936312
              32211  |   .5077825   .1761033    -1.95   0.051     .2573201    1.002032
              32216  |   .8141964   .4028421    -0.42   0.678     .3087293    2.147239
              32217  |   .5059987   .2037725    -1.69   0.091     .2298048     1.11414
              32218  |   7.461173   2.960943     5.06   0.000      3.42776    16.24066
              32219  |   6.19e-06   .0030522    -0.02   0.981            0           .
              32220  |    33.8469   26.96108     4.42   0.000     7.103724    161.2693
              32221  |   10.33397   6.930979     3.48   0.000     2.775669    38.47396
              32222  |   .0000875   .0649352    -0.01   0.990            0           .
              32223  |   88.55096   53.39508     7.44   0.000        27.16    288.7066
              32224  |   48.09658   28.90272     6.45   0.000     14.81156    156.1807
              32225  |   26.39208   13.05314     6.62   0.000      10.0111    69.57696
              32226  |   535.6981   326.4417    10.31   0.000     162.2624     1768.57
              32233  |   270.8516   142.8178    10.62   0.000     96.36071    761.3123
              32244  |    1.76229   1.045261     0.96   0.339     .5510707    5.635695
              32246  |   8.378658   3.517927     5.06   0.000     3.679447    19.07947
              32250  |   2057.785   1188.296    13.21   0.000     663.5322    6381.723
              32254  |   .2503607   .1082887    -3.20   0.001     .1072495    .5844361
              32256  |   8.877869   5.157508     3.76   0.000     2.843229    27.72079
              32257  |   9.945448   4.675599     4.89   0.000     3.957798    24.99166
              32258  |   452.1762   294.3879     9.39   0.000     126.2221    1619.869
              32277  |   .3610453   .1779629    -2.07   0.039     .1374029    .9486971
                     |
               _cons |          0          0   -29.08   0.000            0           0
---------------------+----------------------------------------------------------------
               /ln_p |    7.03125   .0344609   204.04   0.000     6.963708    7.098792
---------------------+----------------------------------------------------------------
                   p |   1131.444   38.99061                      1057.547    1210.504
                 1/p |   .0008838   .0000305                      .0008261    .0009456
--------------------------------------------------------------------------------------
Note: _cons estimates baseline hazard.

. //assess the fit of model with AIC
. estat ic

Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |          N   ll(null)  ll(model)      df        AIC        BIC
-------------+---------------------------------------------------------------
           . |  1,756,326  -1277.279  -499.0993      37   1072.199   1530.212
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] BIC note.

And lastly, the generalized gamma model (this is the point where Stata stops responding, or at least takes too long to process) and the Wald test:

Code:

streg ln_HEAT_AR_W Bedrooms Restrooms Stories ln_SQFT_W ///
    HH_INCOME DIST_CBD RESDU_3_VAR interact_DIST_UNCERT3 i.ZCTA, dist (ggamma) nolog
test [kappa]_cons = 1
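For reference, the full procedure described at the top of this post, with both Wald tests, would look roughly like the sketch below. It assumes the data are already stset as in the Weibull run and that the shape parameter can be addressed as [kappa]_cons, the notation I use in this post. I leave out nolog here only so that the iteration log stays visible while the model runs.

```stata
* Sketch only: assumes the data are already stset as for the Weibull run
* (failure variable FAILURE, analysis time YEAR_TAXROLL).
* nolog is omitted so the iteration log is displayed during fitting.
streg ln_HEAT_AR_W Bedrooms Restrooms Stories ln_SQFT_W ///
    HH_INCOME DIST_CBD RESDU_3_VAR interact_DIST_UNCERT3 i.ZCTA, ///
    distribution(ggamma)

* Wald test of kappa = 0 (appropriateness of the lognormal)
test [kappa]_cons = 0

* Wald test of kappa = 1 (appropriateness of the Weibull)
test [kappa]_cons = 1
```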

Again, my questions are:

1. What could be the reason it is taking so long? Could having added the nolog option be the reason? In cases like this in which Stata takes forever, is there any point in waiting for it to respond?

2. Is there an alternative (faster) way of testing the validity of the Weibull model?

3. Is there a way to test the validity of the Gompertz model?

Thank you in advance!

Last edited by Pedro Castro; 30 Jul 2023, 04:11.

Tags: Gompertz, runtime, survival analysis, Wald Test, weibull
