Hello,
I implemented two separate two-part models to analyze the variable oopdental_costs, which represents total dental expenditures. For the first model, I used a probit model to account for the excess zeros, followed by a gamma regression. For the second model, I used a logit model for the excess zeros, followed by a Poisson regression.
To choose between these two models, are AIC and BIC the only model selection criteria available, or are there other metrics I should consider? I have attached the output with the estimation results for both models. Any advice you could provide would be greatly appreciated.
Thank you!
Code:
. svyset raehsamp [pweight=new_weight], strata (raestrat) singleunit(centered)

Sampling weights: new_weight
             VCE: linearized
     Single unit: centered
        Strata 1: raestrat
 Sampling unit 1: raehsamp
           FPC 1: <zero>

.
. svy:twopm oopdental_costs i.inc_d i.endentulism i.race i.age_cat i.male i.education i.veteran i.mothered
> i.dentalinsurance_wave1 ///
> i.QuantHI_wave1 i.Quant_wealth_wave1 ///
> i.smoke_now c.chronicdisease_wave1 i.dentistvisit_wave1, firstpart(probit) secondpart(glm, family(gamma)
> link(log))
(running twopm on estimation sample)

Survey data analysis

Number of strata =  56                           Number of obs   =     12,388
Number of PSUs   = 112                           Population size = 74,493,528
                                                 Design df       =         56
                                                 F(25, 32)       =     122.19
                                                 Prob > F        =     0.0000

-----------------------------------------------------------------------------------------
                        |             Linearized
        oopdental_costs | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
------------------------+----------------------------------------------------------------
probit                  |
                  inc_d |
                   Yes  |   .0645439   .0739965     0.87   0.387    -.0836888    .2127766
                        |
            endentulism |
                   Yes  |  -.6190539   .0541305   -11.44   0.000    -.7274904   -.5106174
                        |
                   race |
                 Black  |  -.2975578   .0570491    -5.22   0.000     -.411841   -.1832747
              Hispanic  |  -.0687343   .0779752    -0.88   0.382    -.2249375    .0874688
                 Other  |  -.2058506   .0875006    -2.35   0.022    -.3811354   -.0305658
                        |
                age_cat |
                 60-69  |   .0013122   .0438793     0.03   0.976    -.0865887    .0892131
                 70-79  |   .1733067   .0488945     3.54   0.001     .0753593    .2712542
                   80+  |   .1706571   .0554985     3.07   0.003     .0594802     .281834
                        |
                   male |
                  Male  |  -.1026983   .0412219    -2.49   0.016    -.1852756    -.020121
                        |
              education |
                 2.ged  |   .1715042    .098151     1.75   0.086    -.0251158    .3681243
3.high-school graduate  |   .1273984   .0727823     1.75   0.086    -.0184021    .2731989
        4.some college  |   .2038584   .0836009     2.44   0.018     .0363857    .3713312
   5.college and above  |   .2661808   .0814309     3.27   0.002     .1030552    .4293064
                        |
                veteran |
                   Yes  |  -.1071293   .0422574    -2.54   0.014    -.1917809   -.0224776
                        |
               mothered |
 High School or Higher  |   .0434551   .0344728     1.26   0.213    -.0256021    .1125123
                        |
  dentalinsurance_wave1 |
                   Yes  |   -.313593   .0420056    -7.47   0.000    -.3977403   -.2294457
                        |
          QuantHI_wave1 |
                     2  |   .1684179   .0575311     2.93   0.005     .0531692    .2836665
                     3  |   .2413506   .0550388     4.39   0.000     .1310947    .3516066
                     4  |    .248139   .0621368     3.99   0.000      .123664    .3726141
                        |
     Quant_wealth_wave1 |
                     2  |   .1423688   .0512211     2.78   0.007     .0397606     .244977
                     3  |   .2690436   .0517679     5.20   0.000     .1653401    .3727471
                     4  |   .3289222     .06456     5.09   0.000     .1995928    .4582515
                        |
              smoke_now |
      Currently Smokes  |   .0865485   .0592402     1.46   0.150    -.0321239    .2052208
   chronicdisease_wave1 |   .0373084   .0158548     2.35   0.022     .0055474    .0690693
                        |
     dentistvisit_wave1 |
                 1.yes  |   1.840132    .039369    46.74   0.000     1.761266    1.918997
                  _cons |  -1.487437   .0863191   -17.23   0.000    -1.660355   -1.314519
------------------------+----------------------------------------------------------------
glm                     |
                  inc_d |
                   Yes  |  -.0607203   .0955611    -0.64   0.528    -.2521522    .1307116
                        |
            endentulism |
                   Yes  |   .5369626   .1633298     3.29   0.002     .2097738    .8641514
                        |
                   race |
                 Black  |   .0966711   .0974487     0.99   0.325    -.0985421    .2918842
              Hispanic  |   .4307845   .1092938     3.94   0.000     .2118428    .6497263
                 Other  |   .2761951   .1443239     1.91   0.061    -.0129205    .5653107
                        |
                age_cat |
                 60-69  |   .1501501   .0721103     2.08   0.042     .0056957    .2946045
                 70-79  |   .0623833   .0750332     0.83   0.409    -.0879263    .2126928
                   80+  |   .1482708   .0998309     1.49   0.143    -.0517146    .3482562
                        |
                   male |
                  Male  |  -.1250631   .0561875    -2.23   0.030    -.2376202    -.012506
                        |
              education |
                 2.ged  |   .1059467   .1488747     0.71   0.480    -.1922851    .4041784
3.high-school graduate  |  -.0370282   .1000928    -0.37   0.713    -.2375382    .1634819
        4.some college  |   .0022214   .0941311     0.02   0.981    -.1863458    .1907886
   5.college and above  |   .0358856   .0949888     0.38   0.707    -.1543997     .226171
                        |
                veteran |
                   Yes  |   .1139162   .0707427     1.61   0.113    -.0277984    .2556309
                        |
               mothered |
 High School or Higher  |   .1328508   .0449431     2.96   0.005     .0428189    .2228827
                        |
  dentalinsurance_wave1 |
                   Yes  |  -.3424696   .0439136    -7.80   0.000     -.430439   -.2545001
                        |
          QuantHI_wave1 |
                     2  |   .2043584   .0795473     2.57   0.013     .0450061    .3637107
                     3  |   .2008274   .0784675     2.56   0.013     .0436382    .3580167
                     4  |   .2995638   .0819811     3.65   0.001     .1353359    .4637918
                        |
     Quant_wealth_wave1 |
                     2  |   .1517353   .0831079     1.83   0.073    -.0147499    .3182204
                     3  |   .1427968   .0843859     1.69   0.096    -.0262486    .3118421
                     4  |    .279117   .0640514     4.36   0.000     .1508066    .4074273
                        |
              smoke_now |
      Currently Smokes  |   .2160437   .0992925     2.18   0.034      .017137    .4149504
   chronicdisease_wave1 |   .0082414   .0233858     0.35   0.726     -.038606    .0550888
                        |
     dentistvisit_wave1 |
                 1.yes  |    .192387   .0747353     2.57   0.013     .0426742    .3420998
                  _cons |   6.352637   .1614073    39.36   0.000     6.029299    6.675974
-----------------------------------------------------------------------------------------

.
end of do-file

. ereturn list

scalars:
    e(N_glm) = 6334
    e(k_glm) = 39
    e(k_eq_glm) = 1
    e(k_eq_model_glm) = 0
    e(k_dv_glm) = 1
    e(k_autoCns_glm) = 13
    e(df_m_glm) = 25
    e(df_glm) = 6308
    e(phi_glm) = 16822.68582098085
    e(aic_glm) = 107364.1783727158
    e(bic_glm) = 66219638.84230246
    e(ll_glm) = -340022326.9063911
    e(chi2_glm) = 189.6021185551432
    e(p_glm) = 3.02700642579e-27
    e(deviance_glm) = 66274857.101331
    e(deviance_s_glm) = 3939.612128919069
    e(deviance_p_glm) = 106117502.1587472
    e(deviance_ps_glm) = 6308
    e(dispers_glm) = 10506.47702938031
    e(df_r) = 56
    e(rank) = 52
    e(p) = 3.18571719565e-25
    e(F) = 122.1897826820111
    e(df_m) = 25
    e(k_eq) = 2
    e(census) = 0
    e(singleton) = 0
    e(N_strata_omit) = 0
    e(N_psu) = 112
    e(N_strata) = 56
    e(N_pop) = 74493528.00646973
    e(N) = 12388
    e(stages) = 1
    e(dispers_p_glm) = 16822.68582098085
    e(dispers_ps_glm) = 1
    e(nbml_glm) = 0
    e(vf_glm) = 1
    e(power_glm) = 0
    e(rank_glm) = 26
    e(ic_glm) = 4
    e(rc_glm) = 0
    e(converged_glm) = 1
    e(df_r_glm) = 55
    e(N_probit) = 11495
    e(N_cds_probit) = 0
    e(N_cdf_probit) = 0
    e(k_probit) = 39
    e(k_eq_probit) = 1
    e(k_eq_model_probit) = 1
    e(k_dv_probit) = 1
    e(k_autoCns_probit) = 13
    e(df_m_probit) = 25
    e(r2_p_probit) = .3393799296638733
    e(ll_probit) = -33581250.83605153
    e(ll_0_probit) = -50832925.52550701
    e(chi2_probit) = 34503349.37891096
    e(p_probit) = 0
    e(rank_probit) = 26
    e(ic_probit) = 4
    e(rc_probit) = 0
    e(converged_probit) = 1
    e(df_r_probit) = 55
    e(dispers_s_glm) = .6245421891120908

macros:
    e(cmd) : "twopm"
    e(cmdline) : "svy :twopm oopdental_costs i.inc_d i.endentulism i.race i.age_cat i.male i.e.."
    e(prefix) : "svy"
    e(cmdname) : "twopm"
    e(command) : "twopm oopdental_costs i.inc_d i.endentulism i.race i.age_cat i.male i.educat.."
    e(wexp) : "= new_weight"
    e(wtype) : "pweight"
    e(estat_cmd) : "svy_estat"
    e(vce) : "linearized"
    e(vcetype) : "Linearized"
    e(title) : "Survey data analysis"
    e(wvar) : "new_weight"
    e(singleunit) : "centered"
    e(su1) : "raehsamp"
    e(strata1) : "raestrat"
    e(properties) : "b V"
    e(depvar) : "oopdental_costs"
    e(predict) : "twopm_p"
    e(eqnames) : "probit glm"
    e(marginsok) : "default normal duan"
    e(chi2type_probit) : "LR"
    e(opt_probit) : "moptimize"
    e(which_probit) : "max"
    e(ml_method_probit) : "d2"
    e(user_probit) : "mopt__probit_d2()"
    e(technique_probit) : "nr"
    e(singularHmethod_probit) : "m-marquardt"
    e(crittype_probit) : "log likelihood"
    e(varfunc_glm) : "glim_v4"
    e(varfunct_glm) : "Gamma"
    e(varfuncf_glm) : "u^2"
    e(link_glm) : "glim_l03"
    e(linkt_glm) : "Log"
    e(linkf_glm) : "ln(u)"
    e(m_glm) : "1"
    e(chi2type_glm) : "Wald"
    e(hac_lag_glm) : "6332"
    e(opt_glm) : "moptimize"
    e(opt1_glm) : "ML"
    e(which_glm) : "max"
    e(ml_method_glm) : "e2"
    e(user_glm) : "glim_lf"
    e(technique_glm) : "nr"
    e(singularHmethod_glm) : "m-marquardt"
    e(crittype_glm) : "log likelihood"
    e(properties_glm) : "b V"
    e(predict_glm) : "glim_p"

matrices:
    e(b) : 1 x 78
    e(V) : 78 x 78
    e(V_modelbased) : 78 x 78
    e(V_srs) : 78 x 78
    e(_N_strata_certain) : 1 x 1
    e(_N_strata_single) : 1 x 1
    e(_N_strata) : 1 x 1

functions:
    e(sample)
Code:
.
. *********Poisson model without transforming the outcome***********
. svy:twopm oopdental_costs i.inc_d i.endentulism i.race i.age_cat i.male i.education i.veteran i.mothered
> i.dentalinsurance_wave1 ///
> i.QuantHI_wave1 i.Quant_wealth_wave1, firstpart(logit) secondpart(glm, family(poisson) link(log))
(running twopm on estimation sample)

Survey data analysis

Number of strata =  56                           Number of obs   =     12,428
Number of PSUs   = 112                           Population size = 74,668,884
                                                 Design df       =         56
                                                 F(22, 35)       =      51.71
                                                 Prob > F        =     0.0000

-----------------------------------------------------------------------------------------
                        |             Linearized
        oopdental_costs | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
------------------------+----------------------------------------------------------------
logit                   |
                  inc_d |
                   Yes  |  -.0324669   .0870609    -0.37   0.711    -.2068708     .141937
                        |
            endentulism |
                   Yes  |  -1.647131   .0945394   -17.42   0.000    -1.836517   -1.457746
                        |
                   race |
                 Black  |  -.5998833   .0830412    -7.22   0.000    -.7662348   -.4335318
              Hispanic  |  -.1958862   .1317737    -1.49   0.143    -.4598607    .0680883
                 Other  |  -.2581573   .1349011    -1.91   0.061    -.5283967    .0120821
                        |
                age_cat |
                 60-69  |   .0637076   .0674241     0.94   0.349    -.0713591    .1987743
                 70-79  |   .3546848   .0845814     4.19   0.000     .1852479    .5241217
                   80+  |   .4868778   .0904655     5.38   0.000     .3056535     .668102
                        |
                   male |
                  Male  |  -.3350102   .0633724    -5.29   0.000    -.4619603     -.20806
                        |
              education |
                 2.ged  |   .2863622   .1089153     2.63   0.011     .0681785    .5045458
3.high-school graduate  |   .4090348   .0828033     4.94   0.000     .2431598    .5749098
        4.some college  |   .5503176   .1036014     5.31   0.000      .342779    .7578562
   5.college and above  |   .8449879    .101279     8.34   0.000     .6421018    1.047874
                        |
                veteran |
                   Yes  |  -.0829022   .0743479    -1.12   0.270    -.2318389    .0660345
                        |
               mothered |
 High School or Higher  |   .0747755   .0477941     1.56   0.123    -.0209675    .1705185
                        |
  dentalinsurance_wave1 |
                   Yes  |   .0780876   .0632878     1.23   0.222     -.048693    .2048682
                        |
          QuantHI_wave1 |
                     2  |    .274353    .091446     3.00   0.004     .0911647    .4575414
                     3  |   .5051813   .0857925     5.89   0.000     .3333181    .6770444
                     4  |   .6032085   .0997957     6.04   0.000     .4032937    .8031233
                        |
     Quant_wealth_wave1 |
                     2  |   .3379783   .0809602     4.17   0.000     .1757954    .5001611
                     3  |   .7228379   .0822469     8.79   0.000     .5580776    .8875982
                     4  |   .9272186   .1037654     8.94   0.000     .7193515    1.135086
                        |
                  _cons |  -.9422722   .1142121    -8.25   0.000    -1.171067   -.7134779
------------------------+----------------------------------------------------------------
glm                     |
                  inc_d |
                   Yes  |  -.0423142   .1006743    -0.42   0.676     -.243989    .1593607
                        |
            endentulism |
                   Yes  |   .6211031   .2320901     2.68   0.010     .1561708    1.086035
                        |
                   race |
                 Black  |   .0925277   .1192009     0.78   0.441    -.1462604    .3313159
              Hispanic  |   .4345403   .1278235     3.40   0.001     .1784791    .6906015
                 Other  |   .2232119    .137481     1.62   0.110    -.0521955    .4986194
                        |
                age_cat |
                 60-69  |   .1127104   .0841904     1.34   0.186    -.0559433    .2813641
                 70-79  |   .0125822   .0844499     0.15   0.882    -.1565913    .1817556
                   80+  |     .13267   .1118596     1.19   0.241    -.0914116    .3567517
                        |
                   male |
                  Male  |  -.0801105   .0722479    -1.11   0.272    -.2248405    .0646195
                        |
              education |
                 2.ged  |   .1624472   .1446892     1.12   0.266    -.1274001    .4522944
3.high-school graduate  |    .028098   .1042183     0.27   0.788    -.1806764    .2368724
        4.some college  |   .0900582   .0977156     0.92   0.361    -.1056896    .2858061
   5.college and above  |   .1608487   .1147416     1.40   0.166    -.0690063    .3907036
                        |
                veteran |
                   Yes  |   .0874433   .0781621     1.12   0.268    -.0691342    .2440208
                        |
               mothered |
 High School or Higher  |   .1135479   .0485165     2.34   0.023     .0163577    .2107381
                        |
  dentalinsurance_wave1 |
                   Yes  |  -.3408696   .0607332    -5.61   0.000    -.4625328   -.2192064
                        |
          QuantHI_wave1 |
                     2  |    .206534   .0895251     2.31   0.025     .0271937    .3858744
                     3  |   .1986546   .0928171     2.14   0.037     .0127196    .3845896
                     4  |   .3119243   .1018714     3.06   0.003     .1078514    .5159972
                        |
     Quant_wealth_wave1 |
                     2  |    .209604   .1095054     1.91   0.061    -.0097616    .4289697
                     3  |   .1411713   .0853292     1.65   0.104    -.0297636    .3121062
                     4  |   .2905005   .0688465     4.22   0.000     .1525844    .4284166
                        |
                  _cons |   6.473532   .1593284    40.63   0.000     6.154359    6.792706
-----------------------------------------------------------------------------------------

.
end of do-file

. ereturn list

scalars:
    e(N_glm) = 6356
    e(k_glm) = 34
    e(k_eq_glm) = 1
    e(k_eq_model_glm) = 0
    e(k_dv_glm) = 1
    e(k_autoCns_glm) = 11
    e(df_m_glm) = 22
    e(df_glm) = 6333
    e(phi_glm) = 1
    e(aic_glm) = 10979155.92527878
    e(bic_glm) = 69440000177.63506
    e(ll_glm) = -34891757507.53598
    e(chi2_glm) = 3780683391.037862
    e(p_glm) = 0
    e(deviance_glm) = 69440055636.69467
    e(deviance_s_glm) = 69440055636.69467
    e(deviance_p_glm) = 125197877938.4511
    e(deviance_ps_glm) = 125197877938.4511
    e(dispers_glm) = 10964796.40560472
    e(df_r) = 56
    e(rank) = 46
    e(p) = 1.61865195513e-20
    e(F) = 51.71442009163804
    e(df_m) = 22
    e(k_eq) = 2
    e(census) = 0
    e(singleton) = 0
    e(N_strata_omit) = 0
    e(N_psu) = 112
    e(N_strata) = 56
    e(N_pop) = 74668884.21630859
    e(N) = 12428
    e(stages) = 1
    e(dispers_p_glm) = 19769126.47062231
    e(dispers_ps_glm) = 19769126.47062231
    e(nbml_glm) = 0
    e(vf_glm) = 1
    e(power_glm) = 0
    e(rank_glm) = 23
    e(ic_glm) = 4
    e(rc_glm) = 0
    e(converged_glm) = 1
    e(df_r_glm) = 55
    e(N_logit) = 11535
    e(N_cds_logit) = 0
    e(N_cdf_logit) = 0
    e(k_logit) = 34
    e(k_eq_logit) = 1
    e(k_eq_model_logit) = 1
    e(k_dv_logit) = 1
    e(k_autoCns_logit) = 11
    e(df_m_logit) = 22
    e(r2_p_logit) = .1457996861128672
    e(ll_logit) = -43527249.33444504
    e(ll_0_logit) = -50956723.64760613
    e(chi2_logit) = 14858948.62632218
    e(p_logit) = 0
    e(rank_logit) = 23
    e(ic_logit) = 4
    e(rc_logit) = 0
    e(converged_logit) = 1
    e(df_r_logit) = 55
    e(dispers_s_glm) = 10964796.40560472

macros:
    e(cmd) : "twopm"
    e(cmdline) : "svy :twopm oopdental_costs i.inc_d i.endentulism i.race i.age_cat i.male i.e.."
    e(prefix) : "svy"
    e(cmdname) : "twopm"
    e(command) : "twopm oopdental_costs i.inc_d i.endentulism i.race i.age_cat i.male i.educat.."
    e(wexp) : "= new_weight"
    e(wtype) : "pweight"
    e(estat_cmd) : "svy_estat"
    e(vce) : "linearized"
    e(vcetype) : "Linearized"
    e(title) : "Survey data analysis"
    e(wvar) : "new_weight"
    e(singleunit) : "centered"
    e(su1) : "raehsamp"
    e(strata1) : "raestrat"
    e(properties) : "b V"
    e(depvar) : "oopdental_costs"
    e(predict) : "twopm_p"
    e(eqnames) : "logit glm"
    e(marginsok) : "default normal duan"
    e(chi2type_logit) : "LR"
    e(opt_logit) : "moptimize"
    e(which_logit) : "max"
    e(ml_method_logit) : "d2"
    e(user_logit) : "mopt__logit_d2()"
    e(technique_logit) : "nr"
    e(singularHmethod_logit) : "m-marquardt"
    e(crittype_logit) : "log likelihood"
    e(varfunc_glm) : "glim_v3"
    e(varfunct_glm) : "Poisson"
    e(varfuncf_glm) : "u"
    e(link_glm) : "glim_l03"
    e(linkt_glm) : "Log"
    e(linkf_glm) : "ln(u)"
    e(m_glm) : "1"
    e(chi2type_glm) : "Wald"
    e(hac_lag_glm) : "6354"
    e(opt_glm) : "moptimize"
    e(opt1_glm) : "ML"
    e(which_glm) : "max"
    e(ml_method_glm) : "e2"
    e(user_glm) : "glim_lf"
    e(technique_glm) : "nr"
    e(singularHmethod_glm) : "m-marquardt"
    e(crittype_glm) : "log likelihood"
    e(properties_glm) : "b V"
    e(predict_glm) : "glim_p"

matrices:
    e(b) : 1 x 68
    e(V) : 68 x 68
    e(V_modelbased) : 68 x 68
    e(V_srs) : 68 x 68
    e(_N_strata_certain) : 1 x 1
    e(_N_strata_single) : 1 x 1
    e(_N_strata) : 1 x 1

functions:
    e(sample)