Hello,
I am running a survival analysis of when a vacant plot becomes developed, i.e., failure = development. I'm using a parametric model instead of a Cox proportional hazards model because the proportional hazards assumption is not met. Among the parametric models, I chose Weibull because it fits better (by AIC and BIC) than the exponential. Gompertz fits slightly better, but I don't know how to assess its validity.
To test the validity of the Weibull model, I fit a generalized gamma model and test the hypothesis that kappa = 0 (appropriateness of the lognormal) and then the hypothesis that kappa = 1 (appropriateness of the Weibull). This is what the Stata manual suggests in the streg (Parametric survival models) entry.
However, when I run the generalized gamma model, Stata takes too long to process the command: I have left it running for two hours with no result. This is strange, because all the other models (other distributions) run smoothly. The only difference from the other models is that I added the nolog option (see code below), so that may have something to do with it.
Therefore my questions are:
1. What could be the reason it is taking so long? Could having added the nolog option be the reason? In cases like this in which Stata takes forever, is there any point in waiting for it to respond?
2. Is there an alternative (faster) way of testing the validity of the Weibull model?
3. Is there a way to test the validity of the Gompertz model?
I'm using Stata 17. Details about the data, code, and output follow.
First, details about the data:
Code:
input float(YEAR_TAXROLL ln_JST_VAL_W_I_P FAILURE ln_HEAT_AR_W Bedrooms Restrooms Stories ln_SQFT_W HH_INCOME DIST_CBD RESDU_3_VAR interact_DIST_UNCERT3) int ZCTA
2016 11.173612 0 7.305188 2 0 1 9.677214 47536 103287.6 .05121636 5290.015 32220
2016 11.37649 0 7.021084 3 0 1 9.408371 47536 103184.12 .04664454 4812.976 32220
2016 10.97931 0 6.927558 3 0 1 9.525151 47536 103415.26 .017120913 1770.5636 32220
2016 11.49621 0 7.825245 3 0 2 9.525151 47536 103515.03 .027632317 2860.36 32220
2016 11.246445 0 7.459915 3 0 1 9.525151 47536 103614.8 .02384571 2470.769 32220
2016 11.26819 0 7.389564 3 0 1 9.525151 47536 103624.84 .006640271 688.097 32220
2016 10.69538 0 6.579251 1 0 1 9.525151 47536 103525.06 .03760462 3893.021 32220
2016 11.23799 0 7.266828 3 0 1 9.525151 47536 103425.3 .03764908 3893.867 32220
2016 11.57281 0 7.53583 3 0 1 10.011175 47536 103841.3 .07288396 7568.366 32220
2016 10.996836 0 7.313221 2 0 1 9.262268 47536 103538.8 .15010385 15541.572 32220
end
. summarize ln_JST_VAL_W_I_P FAILURE ln_HEAT_AR_W Bedrooms Restrooms Stories ln_SQFT_W
> HH_INCOME DIST_CBD RESDU_3_VAR interact_DIST_UNCERT3 ZCTA

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
ln_JST_VAL~P |  2,635,969     11.6731      .726165  8.597553   16.58054
     FAILURE |  4,639,075    .0035779     .0597082         0          1
ln_HEAT_AR_W |  3,511,930    7.394051     .3701783   6.52503   8.383662
    Bedrooms |  4,200,561    3.034418     1.363163         0        201
   Restrooms |  4,200,561    .1051492     1.586688         0        269
-------------+---------------------------------------------------------
     Stories |  4,200,561    1.235384      34.9962         0      41408
   ln_SQFT_W |  4,639,057    9.317953     .9546204   4.60517   13.02817
   HH_INCOME |  4,639,075     51708.1     16154.66     15279      95819
    DIST_CBD |  4,639,075    43541.89     20941.55  86.80797   133622.5
 RESDU_3_VAR |  1,756,326    .0190573     .0393086  4.14e-14   1.832363
-------------+---------------------------------------------------------
interact_D~3 |  1,756,326    609.4353     1231.284  1.66e-09   70178.39
        ZCTA |  4,639,075    32226.56     18.96891     32205      32277
Weibull Model:
Code:
. streg ln_HEAT_AR_W Bedrooms Restrooms Stories ln_SQFT_W HH_INCOME DIST_CBD RESDU_3_VA
> R interact_DIST_UNCERT3 i.ZCTA, dist (weibull)
Failure _d: FAILURE
Analysis time _t: YEAR_TAXROLL
Fitting constant-only model:
Iteration 0: log likelihood = -4096.4745
Iteration 1: log likelihood = -3654.7215
Iteration 2: log likelihood = -3211.7027
Iteration 3: log likelihood = -2765.4254
Iteration 4: log likelihood = -2311.6977
Iteration 5: log likelihood = -1848.9143
Iteration 6: log likelihood = -1438.38
Iteration 7: log likelihood = -1284.9652
Iteration 8: log likelihood = -1277.2977
Iteration 9: log likelihood = -1277.2794
Iteration 10: log likelihood = -1277.2794
Fitting full model:
Iteration 0: log likelihood = -1277.2794
Iteration 1: log likelihood = -948.8248
Iteration 2: log likelihood = -642.79445
Iteration 3: log likelihood = -504.70937
Iteration 4: log likelihood = -499.37099
Iteration 5: log likelihood = -499.15653
Iteration 6: log likelihood = -499.11293
Iteration 7: log likelihood = -499.10213
Iteration 8: log likelihood = -499.09989
Iteration 9: log likelihood = -499.09941
Iteration 10: log likelihood = -499.0993
Iteration 11: log likelihood = -499.09927
Weibull PH regression

No. of subjects = 1,756,326                             Number of obs = 1,756,326
No. of failures =       441
Time at risk    = 3541672827
                                                        LR chi2(35)   =   1556.36
Log likelihood = -499.09927                             Prob > chi2   =    0.0000

--------------------------------------------------------------------------------------
                  _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
---------------------+----------------------------------------------------------------
        ln_HEAT_AR_W |   8.614132   1.565881    11.85   0.000     6.032256    12.30108
            Bedrooms |   1.323708   .0438235     8.47   0.000     1.240543    1.412449
           Restrooms |   .0010985   1.637346    -0.00   0.996            0           .
             Stories |   1.024752   .1179899     0.21   0.832     .8177327     1.28418
           ln_SQFT_W |   .4915562   .0500188    -6.98   0.000     .4026784    .6000507
           HH_INCOME |   .9998837   8.58e-06   -13.55   0.000     .9998669    .9999005
            DIST_CBD |   .9999556   8.21e-06    -5.41   0.000     .9999395    .9999717
         RESDU_3_VAR |   20.80433   8.271109     7.63   0.000     9.544302    45.34852
interact_DIST_UNCE~3 |   1.000138   .0000112    12.24   0.000     1.000116     1.00016
                     |
                ZCTA |
               32206 |   .0359272   .0187434    -6.38   0.000     .0129225    .0998849
               32207 |   1.109801   .3035395     0.38   0.703     .6492838    1.896948
               32208 |    .455503   .1390679    -2.58   0.010     .2503883    .8286449
               32209 |   .1186568   .0391461    -6.46   0.000     .0621545    .2265233
               32210 |    1.75124   .4617911     2.12   0.034     1.044454    2.936312
               32211 |   .5077825   .1761033    -1.95   0.051     .2573201    1.002032
               32216 |   .8141964   .4028421    -0.42   0.678     .3087293    2.147239
               32217 |   .5059987   .2037725    -1.69   0.091     .2298048     1.11414
               32218 |   7.461173   2.960943     5.06   0.000      3.42776    16.24066
               32219 |   6.19e-06   .0030522    -0.02   0.981            0           .
               32220 |    33.8469   26.96108     4.42   0.000     7.103724    161.2693
               32221 |   10.33397   6.930979     3.48   0.000     2.775669    38.47396
               32222 |   .0000875   .0649352    -0.01   0.990            0           .
               32223 |   88.55096   53.39508     7.44   0.000        27.16    288.7066
               32224 |   48.09658   28.90272     6.45   0.000     14.81156    156.1807
               32225 |   26.39208   13.05314     6.62   0.000      10.0111    69.57696
               32226 |   535.6981   326.4417    10.31   0.000     162.2624     1768.57
               32233 |   270.8516   142.8178    10.62   0.000     96.36071    761.3123
               32244 |    1.76229   1.045261     0.96   0.339     .5510707    5.635695
               32246 |   8.378658   3.517927     5.06   0.000     3.679447    19.07947
               32250 |   2057.785   1188.296    13.21   0.000     663.5322    6381.723
               32254 |   .2503607   .1082887    -3.20   0.001     .1072495    .5844361
               32256 |   8.877869   5.157508     3.76   0.000     2.843229    27.72079
               32257 |   9.945448   4.675599     4.89   0.000     3.957798    24.99166
               32258 |   452.1762   294.3879     9.39   0.000     126.2221    1619.869
               32277 |   .3610453   .1779629    -2.07   0.039     .1374029    .9486971
                     |
               _cons |          0          0   -29.08   0.000            0           0
---------------------+----------------------------------------------------------------
               /ln_p |    7.03125   .0344609   204.04   0.000     6.963708    7.098792
---------------------+----------------------------------------------------------------
                   p |   1131.444   38.99061                      1057.547    1210.504
                 1/p |   .0008838   .0000305                      .0008261    .0009456
--------------------------------------------------------------------------------------
Note: _cons estimates baseline hazard.
. //assess the fit of model with AIC
. estat ic
Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |          N   ll(null)  ll(model)      df        AIC        BIC
-------------+---------------------------------------------------------------
           . |  1,756,326  -1277.279  -499.0993      37   1072.199   1530.212
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] BIC note.
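As a sanity check on the information criteria reported above, both values can be reproduced by hand from the log likelihood, using df = 37 estimated parameters and N = 1,756,326 observations as shown in the table. The symbols k and N below are just my shorthand for those two quantities:

```latex
\begin{aligned}
\mathrm{AIC} &= -2\,\ln L + 2k
              = -2(-499.0993) + 2(37)
              = 998.1986 + 74
              \approx 1072.199 \\
\mathrm{BIC} &= -2\,\ln L + k \ln N
              = 998.1986 + 37 \ln(1{,}756{,}326)
              \approx 998.1986 + 37(14.3787)
              \approx 1530.21
\end{aligned}
```

Both agree with the estat ic output, so the AIC/BIC comparison across distributions uses the same df and N conventions as long as each model is fit on the same estimation sample.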
And lastly, the generalized gamma model (this is the point where Stata stops working or at least takes too long processing) and Wald test:
Code:
streg ln_HEAT_AR_W Bedrooms Restrooms Stories ln_SQFT_W ///
    HH_INCOME DIST_CBD RESDU_3_VAR interact_DIST_UNCERT3 i.ZCTA, dist (ggamma) nolog
test [kappa]_cons = 1
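For concreteness, here is a minimal do-file sketch of the full two-step check described in the text (kappa = 0 and kappa = 1). It assumes the same covariate list as the Weibull model. The local macro name xvars and the iterate(50) cap are illustrative choices of mine, not tuned values, and nolog is left off here so that the iteration log shows whether the likelihood is still changing:

```stata
* Sketch only: covariates assumed identical to the Weibull model above.
* xvars is a hypothetical macro name used for readability.
local xvars ln_HEAT_AR_W Bedrooms Restrooms Stories ln_SQFT_W ///
    HH_INCOME DIST_CBD RESDU_3_VAR interact_DIST_UNCERT3 i.ZCTA

* Generalized gamma fit. The iteration log is displayed (no nolog), and
* iterate(50) is an arbitrary cap so the command stops instead of running
* for hours; if the cap is hit, Stata reports that convergence was not achieved.
streg `xvars', dist(ggamma) iterate(50)

* Wald tests on the shape parameter; only meaningful if the fit converged.
test [kappa]_cons = 0   // H0: lognormal is appropriate
test [kappa]_cons = 1   // H0: Weibull is appropriate
```

If the fit stops at the cap without converging, the Wald tests should not be interpreted, but the displayed log at least indicates whether Stata is making progress or stuck.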
Again, my questions are:
1. What could be the reason it is taking so long? Could having added the nolog option be the reason? In cases like this in which Stata takes forever, is there any point in waiting for it to respond?
2. Is there an alternative (faster) way of testing the validity of the Weibull model?
3. Is there a way to test the validity of the Gompertz model?
Thank you in advance!
