Hello,
I am doing a survival analysis of when a vacant plot becomes developed, i.e., failure = development. I'm using a parametric model instead of a Cox proportional hazards model because the proportional hazards assumption is not met. Among the parametric models, I chose Weibull because it fits better, by AIC and BIC, than Exponential (Gompertz is slightly better still, but I don't know how to assess its validity). To test the validity of the Weibull model, I fit a generalized gamma model and test the hypothesis that kappa = 0 (a test for the appropriateness of the lognormal) and then the hypothesis that kappa = 1 (a test for the appropriateness of the Weibull). This is what the Stata manual suggests in the streg (Parametric survival models) entry. However, when I run the generalized gamma model, Stata takes too long to process the command: I have left it running for two hours and nothing happens. This is strange, because all the other models (other distributions) run smoothly. The only difference from the other models is that I add the nolog option (see code below), so that may have something to do with it.
Therefore my questions are:
1. What could be the reason it is taking so long? Could adding the nolog option be the cause? When Stata takes this long, is there any point in waiting for it to respond?
2. Is there an alternative (faster) way of testing the validity of the Weibull model?
3. Is there a way to test the validity of the Gompertz model?
I'm using Stata 17. Details about the data, the code, and the output are below:
First, details about the data:
Code:
input float(YEAR_TAXROLL ln_JST_VAL_W_I_P FAILURE ln_HEAT_AR_W Bedrooms Restrooms Stories ln_SQFT_W HH_INCOME DIST_CBD RESDU_3_VAR interact_DIST_UNCERT3) int ZCTA
2016 11.173612 0 7.305188 2 0 1  9.677214 47536  103287.6   .05121636  5290.015 32220
2016  11.37649 0 7.021084 3 0 1  9.408371 47536 103184.12   .04664454  4812.976 32220
2016  10.97931 0 6.927558 3 0 1  9.525151 47536 103415.26  .017120913 1770.5636 32220
2016  11.49621 0 7.825245 3 0 2  9.525151 47536 103515.03  .027632317   2860.36 32220
2016 11.246445 0 7.459915 3 0 1  9.525151 47536  103614.8   .02384571  2470.769 32220
2016  11.26819 0 7.389564 3 0 1  9.525151 47536 103624.84  .006640271   688.097 32220
2016  10.69538 0 6.579251 1 0 1  9.525151 47536 103525.06   .03760462  3893.021 32220
2016  11.23799 0 7.266828 3 0 1  9.525151 47536  103425.3   .03764908  3893.867 32220
2016  11.57281 0  7.53583 3 0 1 10.011175 47536  103841.3   .07288396  7568.366 32220
2016 10.996836 0 7.313221 2 0 1  9.262268 47536  103538.8   .15010385 15541.572 32220
end

. summarize ln_JST_VAL_W_I_P FAILURE ln_HEAT_AR_W Bedrooms Restrooms Stories ln_SQFT_W
> HH_INCOME DIST_CBD RESDU_3_VAR interact_DIST_UNCERT3 ZCTA

    Variable |        Obs        Mean   Std. dev.        Min        Max
-------------+---------------------------------------------------------
ln_JST_VAL~P |  2,635,969     11.6731     .726165   8.597553   16.58054
     FAILURE |  4,639,075    .0035779    .0597082          0          1
ln_HEAT_AR_W |  3,511,930    7.394051    .3701783    6.52503   8.383662
    Bedrooms |  4,200,561    3.034418    1.363163          0        201
   Restrooms |  4,200,561    .1051492    1.586688          0        269
-------------+---------------------------------------------------------
     Stories |  4,200,561    1.235384     34.9962          0      41408
   ln_SQFT_W |  4,639,057    9.317953    .9546204    4.60517   13.02817
   HH_INCOME |  4,639,075     51708.1    16154.66      15279      95819
    DIST_CBD |  4,639,075    43541.89    20941.55   86.80797   133622.5
 RESDU_3_VAR |  1,756,326    .0190573    .0393086   4.14e-14   1.832363
-------------+---------------------------------------------------------
interact_D~3 |  1,756,326    609.4353    1231.284   1.66e-09   70178.39
        ZCTA |  4,639,075    32226.56    18.96891      32205      32277
Weibull Model:
Code:
. streg ln_HEAT_AR_W Bedrooms Restrooms Stories ln_SQFT_W HH_INCOME DIST_CBD RESDU_3_VA
> R interact_DIST_UNCERT3 i.ZCTA, dist (weibull)

        Failure _d: FAILURE
  Analysis time _t: YEAR_TAXROLL

Fitting constant-only model:
Iteration 0:   log likelihood = -4096.4745
Iteration 1:   log likelihood = -3654.7215
Iteration 2:   log likelihood = -3211.7027
Iteration 3:   log likelihood = -2765.4254
Iteration 4:   log likelihood = -2311.6977
Iteration 5:   log likelihood = -1848.9143
Iteration 6:   log likelihood =   -1438.38
Iteration 7:   log likelihood = -1284.9652
Iteration 8:   log likelihood = -1277.2977
Iteration 9:   log likelihood = -1277.2794
Iteration 10:  log likelihood = -1277.2794

Fitting full model:
Iteration 0:   log likelihood = -1277.2794
Iteration 1:   log likelihood =  -948.8248
Iteration 2:   log likelihood = -642.79445
Iteration 3:   log likelihood = -504.70937
Iteration 4:   log likelihood = -499.37099
Iteration 5:   log likelihood = -499.15653
Iteration 6:   log likelihood = -499.11293
Iteration 7:   log likelihood = -499.10213
Iteration 8:   log likelihood = -499.09989
Iteration 9:   log likelihood = -499.09941
Iteration 10:  log likelihood =  -499.0993
Iteration 11:  log likelihood = -499.09927

Weibull PH regression

No. of subjects =  1,756,326                          Number of obs = 1,756,326
No. of failures =        441
Time at risk    = 3541672827
                                                      LR chi2(35)   =   1556.36
Log likelihood = -499.09927                           Prob > chi2   =    0.0000

--------------------------------------------------------------------------------------
                  _t | Haz. ratio  Std. err.        z   P>|z|     [95% conf. interval]
---------------------+----------------------------------------------------------------
        ln_HEAT_AR_W |   8.614132   1.565881    11.85   0.000     6.032256    12.30108
            Bedrooms |   1.323708   .0438235     8.47   0.000     1.240543    1.412449
           Restrooms |   .0010985   1.637346    -0.00   0.996            0           .
             Stories |   1.024752   .1179899     0.21   0.832     .8177327     1.28418
           ln_SQFT_W |   .4915562   .0500188    -6.98   0.000     .4026784    .6000507
           HH_INCOME |   .9998837   8.58e-06   -13.55   0.000     .9998669    .9999005
            DIST_CBD |   .9999556   8.21e-06    -5.41   0.000     .9999395    .9999717
         RESDU_3_VAR |   20.80433   8.271109     7.63   0.000     9.544302    45.34852
interact_DIST_UNCE~3 |   1.000138   .0000112    12.24   0.000     1.000116     1.00016
                     |
                ZCTA |
               32206 |   .0359272   .0187434    -6.38   0.000     .0129225    .0998849
               32207 |   1.109801   .3035395     0.38   0.703     .6492838    1.896948
               32208 |    .455503   .1390679    -2.58   0.010     .2503883    .8286449
               32209 |   .1186568   .0391461    -6.46   0.000     .0621545    .2265233
               32210 |    1.75124   .4617911     2.12   0.034     1.044454    2.936312
               32211 |   .5077825   .1761033    -1.95   0.051     .2573201    1.002032
               32216 |   .8141964   .4028421    -0.42   0.678     .3087293    2.147239
               32217 |   .5059987   .2037725    -1.69   0.091     .2298048     1.11414
               32218 |   7.461173   2.960943     5.06   0.000      3.42776    16.24066
               32219 |   6.19e-06   .0030522    -0.02   0.981            0           .
               32220 |    33.8469   26.96108     4.42   0.000     7.103724    161.2693
               32221 |   10.33397   6.930979     3.48   0.000     2.775669    38.47396
               32222 |   .0000875   .0649352    -0.01   0.990            0           .
               32223 |   88.55096   53.39508     7.44   0.000        27.16    288.7066
               32224 |   48.09658   28.90272     6.45   0.000     14.81156    156.1807
               32225 |   26.39208   13.05314     6.62   0.000      10.0111    69.57696
               32226 |   535.6981   326.4417    10.31   0.000     162.2624     1768.57
               32233 |   270.8516   142.8178    10.62   0.000     96.36071    761.3123
               32244 |    1.76229   1.045261     0.96   0.339     .5510707    5.635695
               32246 |   8.378658   3.517927     5.06   0.000     3.679447    19.07947
               32250 |   2057.785   1188.296    13.21   0.000     663.5322    6381.723
               32254 |   .2503607   .1082887    -3.20   0.001     .1072495    .5844361
               32256 |   8.877869   5.157508     3.76   0.000     2.843229    27.72079
               32257 |   9.945448   4.675599     4.89   0.000     3.957798    24.99166
               32258 |   452.1762   294.3879     9.39   0.000     126.2221    1619.869
               32277 |   .3610453   .1779629    -2.07   0.039     .1374029    .9486971
                     |
               _cons |          0          0   -29.08   0.000            0           0
---------------------+----------------------------------------------------------------
               /ln_p |    7.03125   .0344609   204.04   0.000     6.963708    7.098792
---------------------+----------------------------------------------------------------
                   p |   1131.444   38.99061                      1057.547    1210.504
                 1/p |   .0008838   .0000305                      .0008261    .0009456
--------------------------------------------------------------------------------------
Note: _cons estimates baseline hazard.

. //assess the fit of model with AIC
. estat ic

Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |          N   ll(null)  ll(model)      df        AIC        BIC
-------------+---------------------------------------------------------------
           . |  1,756,326  -1277.279  -499.0993      37   1072.199   1530.212
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] BIC note.
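For reference, the AIC/BIC comparison across distributions mentioned earlier can be gathered into a single table. The following is only a minimal sketch, not the exact commands that were run: the local macro name covars and the stored-estimate names m_exponential, m_weibull and m_gompertz are made up for illustration, and it assumes the data are already stset as in the output above and that the code is run from a do-file (the /// line continuation does not work interactively).

```stata
* Hypothetical helper macro holding the covariate list of the Weibull model
local covars ln_HEAT_AR_W Bedrooms Restrooms Stories ln_SQFT_W HH_INCOME ///
    DIST_CBD RESDU_3_VAR interact_DIST_UNCERT3 i.ZCTA

* Fit each candidate distribution quietly and store the results
foreach d in exponential weibull gompertz {
    quietly streg `covars', distribution(`d')
    estimates store m_`d'
}

* One table with log likelihood, df, AIC and BIC for each stored model
estimates stats m_exponential m_weibull m_gompertz
```

Lower AIC/BIC values in the resulting table indicate the better-fitting distribution among those compared.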
And lastly, the generalized gamma model (this is the point where Stata stops working, or at least takes too long to process) and the Wald test:
Code:
streg ln_HEAT_AR_W Bedrooms Restrooms Stories ln_SQFT_W HH_INCOME DIST_CBD RESDU_3_VAR interact_DIST_UNCERT3 i.ZCTA, dist (ggamma) nolog
test [kappa]_cons = 1
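To see whether the optimizer is making any progress, the same model could be rerun without nolog, so that the iteration log is printed, and with a cap on the number of iterations. This is a hedged sketch rather than something that has been run on these data: it assumes the same covariates as the Weibull model, the cap of 50 iterations is an arbitrary value chosen for illustration, and the [kappa]_cons notation is simply the one used in the Wald test above. It is meant for a do-file, since /// continues a line only there.

```stata
* Generalized gamma with the iteration log visible (no nolog).
* iterate(50) is an arbitrary illustrative cap, not a recommended value.
streg ln_HEAT_AR_W Bedrooms Restrooms Stories ln_SQFT_W HH_INCOME DIST_CBD ///
    RESDU_3_VAR interact_DIST_UNCERT3 i.ZCTA, dist(ggamma) iterate(50)

* e(converged) should be 1 if the optimizer reported convergence, 0 otherwise
display "converged = " e(converged)

* Wald tests described in the text:
* kappa = 0 for the lognormal, kappa = 1 for the Weibull
test [kappa]_cons = 0
test [kappa]_cons = 1
```

If the model stops at the cap without converging, the Wald tests are based on non-converged estimates and should not be interpreted.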
Again, my questions are:
1. What could be the reason it is taking so long? Could adding the nolog option be the cause? When Stata takes this long, is there any point in waiting for it to respond?
2. Is there an alternative (faster) way of testing the validity of the Weibull model?
3. Is there a way to test the validity of the Gompertz model?
Thank you in advance!