  • Model selection for two-part models

    Hello,

    I implemented two separate two-part models to analyze the variable oopdental_costs, which represents total dental expenditures. For the first model, I used a probit model to account for the excess zeros, followed by a gamma regression. For the second model, I used a logit model for the excess zeros, followed by a Poisson regression.

    To choose between these two models, are AIC and BIC the only model selection criteria available, or are there other metrics I should consider? I have attached the output with the estimation results for both models. Any advice you could provide would be greatly appreciated.

    Thank you!

    Code:
    
    . svyset raehsamp [pweight=new_weight], strata (raestrat) singleunit(centered)
    
    Sampling weights: new_weight
                 VCE: linearized
         Single unit: centered
            Strata 1: raestrat
     Sampling unit 1: raehsamp
               FPC 1: <zero>
    
    . 
    . svy:twopm oopdental_costs i.inc_d i.endentulism i.race i.age_cat i.male i.education i.veteran i.mothered
    >  i.dentalinsurance_wave1 ///
    >  i.QuantHI_wave1 i.Quant_wealth_wave1 /// 
    > i.smoke_now c.chronicdisease_wave1 i.dentistvisit_wave1, firstpart(probit) secondpart(glm, family(gamma)
    >  link(log))
    (running twopm on estimation sample)
    
    Survey data analysis
    
    Number of strata =  56                            Number of obs   =     12,388
    Number of PSUs   = 112                            Population size = 74,493,528
                                                      Design df       =         56
                                                      F(25, 32)       =     122.19
                                                      Prob > F        =     0.0000
    
    -----------------------------------------------------------------------------------------
                            |             Linearized
            oopdental_costs | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    ------------------------+----------------------------------------------------------------
    probit                  |
                      inc_d |
                       Yes  |   .0645439   .0739965     0.87   0.387    -.0836888    .2127766
                            |
                endentulism |
                       Yes  |  -.6190539   .0541305   -11.44   0.000    -.7274904   -.5106174
                            |
                       race |
                     Black  |  -.2975578   .0570491    -5.22   0.000     -.411841   -.1832747
                  Hispanic  |  -.0687343   .0779752    -0.88   0.382    -.2249375    .0874688
                     Other  |  -.2058506   .0875006    -2.35   0.022    -.3811354   -.0305658
                            |
                    age_cat |
                     60-69  |   .0013122   .0438793     0.03   0.976    -.0865887    .0892131
                     70-79  |   .1733067   .0488945     3.54   0.001     .0753593    .2712542
                       80+  |   .1706571   .0554985     3.07   0.003     .0594802     .281834
                            |
                       male |
                      Male  |  -.1026983   .0412219    -2.49   0.016    -.1852756    -.020121
                            |
                  education |
                     2.ged  |   .1715042    .098151     1.75   0.086    -.0251158    .3681243
    3.high-school graduate  |   .1273984   .0727823     1.75   0.086    -.0184021    .2731989
            4.some college  |   .2038584   .0836009     2.44   0.018     .0363857    .3713312
       5.college and above  |   .2661808   .0814309     3.27   0.002     .1030552    .4293064
                            |
                    veteran |
                       Yes  |  -.1071293   .0422574    -2.54   0.014    -.1917809   -.0224776
                            |
                   mothered |
     High School or Higher  |   .0434551   .0344728     1.26   0.213    -.0256021    .1125123
                            |
      dentalinsurance_wave1 |
                       Yes  |   -.313593   .0420056    -7.47   0.000    -.3977403   -.2294457
                            |
              QuantHI_wave1 |
                         2  |   .1684179   .0575311     2.93   0.005     .0531692    .2836665
                         3  |   .2413506   .0550388     4.39   0.000     .1310947    .3516066
                         4  |    .248139   .0621368     3.99   0.000      .123664    .3726141
                            |
         Quant_wealth_wave1 |
                         2  |   .1423688   .0512211     2.78   0.007     .0397606     .244977
                         3  |   .2690436   .0517679     5.20   0.000     .1653401    .3727471
                         4  |   .3289222     .06456     5.09   0.000     .1995928    .4582515
                            |
                  smoke_now |
          Currently Smokes  |   .0865485   .0592402     1.46   0.150    -.0321239    .2052208
       chronicdisease_wave1 |   .0373084   .0158548     2.35   0.022     .0055474    .0690693
                            |
         dentistvisit_wave1 |
                     1.yes  |   1.840132    .039369    46.74   0.000     1.761266    1.918997
                      _cons |  -1.487437   .0863191   -17.23   0.000    -1.660355   -1.314519
    ------------------------+----------------------------------------------------------------
    glm                     |
                      inc_d |
                       Yes  |  -.0607203   .0955611    -0.64   0.528    -.2521522    .1307116
                            |
                endentulism |
                       Yes  |   .5369626   .1633298     3.29   0.002     .2097738    .8641514
                            |
                       race |
                     Black  |   .0966711   .0974487     0.99   0.325    -.0985421    .2918842
                  Hispanic  |   .4307845   .1092938     3.94   0.000     .2118428    .6497263
                     Other  |   .2761951   .1443239     1.91   0.061    -.0129205    .5653107
                            |
                    age_cat |
                     60-69  |   .1501501   .0721103     2.08   0.042     .0056957    .2946045
                     70-79  |   .0623833   .0750332     0.83   0.409    -.0879263    .2126928
                       80+  |   .1482708   .0998309     1.49   0.143    -.0517146    .3482562
                            |
                       male |
                      Male  |  -.1250631   .0561875    -2.23   0.030    -.2376202    -.012506
                            |
                  education |
                     2.ged  |   .1059467   .1488747     0.71   0.480    -.1922851    .4041784
    3.high-school graduate  |  -.0370282   .1000928    -0.37   0.713    -.2375382    .1634819
            4.some college  |   .0022214   .0941311     0.02   0.981    -.1863458    .1907886
       5.college and above  |   .0358856   .0949888     0.38   0.707    -.1543997     .226171
                            |
                    veteran |
                       Yes  |   .1139162   .0707427     1.61   0.113    -.0277984    .2556309
                            |
                   mothered |
     High School or Higher  |   .1328508   .0449431     2.96   0.005     .0428189    .2228827
                            |
      dentalinsurance_wave1 |
                       Yes  |  -.3424696   .0439136    -7.80   0.000     -.430439   -.2545001
                            |
              QuantHI_wave1 |
                         2  |   .2043584   .0795473     2.57   0.013     .0450061    .3637107
                         3  |   .2008274   .0784675     2.56   0.013     .0436382    .3580167
                         4  |   .2995638   .0819811     3.65   0.001     .1353359    .4637918
                            |
         Quant_wealth_wave1 |
                         2  |   .1517353   .0831079     1.83   0.073    -.0147499    .3182204
                         3  |   .1427968   .0843859     1.69   0.096    -.0262486    .3118421
                         4  |    .279117   .0640514     4.36   0.000     .1508066    .4074273
                            |
                  smoke_now |
          Currently Smokes  |   .2160437   .0992925     2.18   0.034      .017137    .4149504
       chronicdisease_wave1 |   .0082414   .0233858     0.35   0.726     -.038606    .0550888
                            |
         dentistvisit_wave1 |
                     1.yes  |    .192387   .0747353     2.57   0.013     .0426742    .3420998
                      _cons |   6.352637   .1614073    39.36   0.000     6.029299    6.675974
    -----------------------------------------------------------------------------------------
    
    . 
    end of do-file
    
    . ereturn list
    
    scalars:
                  e(N_glm) =  6334
                  e(k_glm) =  39
               e(k_eq_glm) =  1
         e(k_eq_model_glm) =  0
               e(k_dv_glm) =  1
          e(k_autoCns_glm) =  13
               e(df_m_glm) =  25
                 e(df_glm) =  6308
                e(phi_glm) =  16822.68582098085
                e(aic_glm) =  107364.1783727158
                e(bic_glm) =  66219638.84230246
                 e(ll_glm) =  -340022326.9063911
               e(chi2_glm) =  189.6021185551432
                  e(p_glm) =  3.02700642579e-27
           e(deviance_glm) =  66274857.101331
         e(deviance_s_glm) =  3939.612128919069
         e(deviance_p_glm) =  106117502.1587472
        e(deviance_ps_glm) =  6308
            e(dispers_glm) =  10506.47702938031
                   e(df_r) =  56
                   e(rank) =  52
                      e(p) =  3.18571719565e-25
                      e(F) =  122.1897826820111
                   e(df_m) =  25
                   e(k_eq) =  2
                 e(census) =  0
              e(singleton) =  0
          e(N_strata_omit) =  0
                  e(N_psu) =  112
               e(N_strata) =  56
                  e(N_pop) =  74493528.00646973
                      e(N) =  12388
                 e(stages) =  1
          e(dispers_p_glm) =  16822.68582098085
         e(dispers_ps_glm) =  1
               e(nbml_glm) =  0
                 e(vf_glm) =  1
              e(power_glm) =  0
               e(rank_glm) =  26
                 e(ic_glm) =  4
                 e(rc_glm) =  0
          e(converged_glm) =  1
               e(df_r_glm) =  55
               e(N_probit) =  11495
           e(N_cds_probit) =  0
           e(N_cdf_probit) =  0
               e(k_probit) =  39
            e(k_eq_probit) =  1
      e(k_eq_model_probit) =  1
            e(k_dv_probit) =  1
       e(k_autoCns_probit) =  13
            e(df_m_probit) =  25
            e(r2_p_probit) =  .3393799296638733
              e(ll_probit) =  -33581250.83605153
            e(ll_0_probit) =  -50832925.52550701
            e(chi2_probit) =  34503349.37891096
               e(p_probit) =  0
            e(rank_probit) =  26
              e(ic_probit) =  4
              e(rc_probit) =  0
       e(converged_probit) =  1
            e(df_r_probit) =  55
          e(dispers_s_glm) =  .6245421891120908
    
    macros:
                    e(cmd) : "twopm"
                e(cmdline) : "svy :twopm oopdental_costs i.inc_d i.endentulism i.race i.age_cat i.male i.e.."
                 e(prefix) : "svy"
                e(cmdname) : "twopm"
                e(command) : "twopm oopdental_costs i.inc_d i.endentulism i.race i.age_cat i.male i.educat.."
                   e(wexp) : "= new_weight"
                  e(wtype) : "pweight"
              e(estat_cmd) : "svy_estat"
                    e(vce) : "linearized"
                e(vcetype) : "Linearized"
                  e(title) : "Survey data analysis"
                   e(wvar) : "new_weight"
             e(singleunit) : "centered"
                    e(su1) : "raehsamp"
                e(strata1) : "raestrat"
             e(properties) : "b V"
                 e(depvar) : "oopdental_costs"
                e(predict) : "twopm_p"
                e(eqnames) : "probit glm"
              e(marginsok) : "default normal duan"
        e(chi2type_probit) : "LR"
             e(opt_probit) : "moptimize"
           e(which_probit) : "max"
       e(ml_method_probit) : "d2"
            e(user_probit) : "mopt__probit_d2()"
       e(technique_probit) : "nr"
      e(singularHmethod_p
        robit)             : "m-marquardt"
        e(crittype_probit) : "log likelihood"
            e(varfunc_glm) : "glim_v4"
           e(varfunct_glm) : "Gamma"
           e(varfuncf_glm) : "u^2"
               e(link_glm) : "glim_l03"
              e(linkt_glm) : "Log"
              e(linkf_glm) : "ln(u)"
                  e(m_glm) : "1"
           e(chi2type_glm) : "Wald"
            e(hac_lag_glm) : "6332"
                e(opt_glm) : "moptimize"
               e(opt1_glm) : "ML"
              e(which_glm) : "max"
          e(ml_method_glm) : "e2"
               e(user_glm) : "glim_lf"
          e(technique_glm) : "nr"
      e(singularHmethod_g
        lm)                : "m-marquardt"
           e(crittype_glm) : "log likelihood"
         e(properties_glm) : "b V"
            e(predict_glm) : "glim_p"
    
    matrices:
                      e(b) :  1 x 78
                      e(V) :  78 x 78
           e(V_modelbased) :  78 x 78
                  e(V_srs) :  78 x 78
      e(_N_strata_certain) :  1 x 1
       e(_N_strata_single) :  1 x 1
              e(_N_strata) :  1 x 1
    
    functions:
                 e(sample)


    Code:
    
    . 
    . *********Poisson model without transforming the outcome*********** 
    . svy:twopm oopdental_costs i.inc_d i.endentulism i.race i.age_cat i.male i.education i.veteran i.mothered
    >  i.dentalinsurance_wave1 ///
    >  i.QuantHI_wave1 i.Quant_wealth_wave1, firstpart(logit) secondpart(glm, family(poisson) link(log)) 
    (running twopm on estimation sample)
    
    Survey data analysis
    
    Number of strata =  56                            Number of obs   =     12,428
    Number of PSUs   = 112                            Population size = 74,668,884
                                                      Design df       =         56
                                                      F(22, 35)       =      51.71
                                                      Prob > F        =     0.0000
    
    -----------------------------------------------------------------------------------------
                            |             Linearized
            oopdental_costs | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    ------------------------+----------------------------------------------------------------
    logit                   |
                      inc_d |
                       Yes  |  -.0324669   .0870609    -0.37   0.711    -.2068708     .141937
                            |
                endentulism |
                       Yes  |  -1.647131   .0945394   -17.42   0.000    -1.836517   -1.457746
                            |
                       race |
                     Black  |  -.5998833   .0830412    -7.22   0.000    -.7662348   -.4335318
                  Hispanic  |  -.1958862   .1317737    -1.49   0.143    -.4598607    .0680883
                     Other  |  -.2581573   .1349011    -1.91   0.061    -.5283967    .0120821
                            |
                    age_cat |
                     60-69  |   .0637076   .0674241     0.94   0.349    -.0713591    .1987743
                     70-79  |   .3546848   .0845814     4.19   0.000     .1852479    .5241217
                       80+  |   .4868778   .0904655     5.38   0.000     .3056535     .668102
                            |
                       male |
                      Male  |  -.3350102   .0633724    -5.29   0.000    -.4619603     -.20806
                            |
                  education |
                     2.ged  |   .2863622   .1089153     2.63   0.011     .0681785    .5045458
    3.high-school graduate  |   .4090348   .0828033     4.94   0.000     .2431598    .5749098
            4.some college  |   .5503176   .1036014     5.31   0.000      .342779    .7578562
       5.college and above  |   .8449879    .101279     8.34   0.000     .6421018    1.047874
                            |
                    veteran |
                       Yes  |  -.0829022   .0743479    -1.12   0.270    -.2318389    .0660345
                            |
                   mothered |
     High School or Higher  |   .0747755   .0477941     1.56   0.123    -.0209675    .1705185
                            |
      dentalinsurance_wave1 |
                       Yes  |   .0780876   .0632878     1.23   0.222     -.048693    .2048682
                            |
              QuantHI_wave1 |
                         2  |    .274353    .091446     3.00   0.004     .0911647    .4575414
                         3  |   .5051813   .0857925     5.89   0.000     .3333181    .6770444
                         4  |   .6032085   .0997957     6.04   0.000     .4032937    .8031233
                            |
         Quant_wealth_wave1 |
                         2  |   .3379783   .0809602     4.17   0.000     .1757954    .5001611
                         3  |   .7228379   .0822469     8.79   0.000     .5580776    .8875982
                         4  |   .9272186   .1037654     8.94   0.000     .7193515    1.135086
                            |
                      _cons |  -.9422722   .1142121    -8.25   0.000    -1.171067   -.7134779
    ------------------------+----------------------------------------------------------------
    glm                     |
                      inc_d |
                       Yes  |  -.0423142   .1006743    -0.42   0.676     -.243989    .1593607
                            |
                endentulism |
                       Yes  |   .6211031   .2320901     2.68   0.010     .1561708    1.086035
                            |
                       race |
                     Black  |   .0925277   .1192009     0.78   0.441    -.1462604    .3313159
                  Hispanic  |   .4345403   .1278235     3.40   0.001     .1784791    .6906015
                     Other  |   .2232119    .137481     1.62   0.110    -.0521955    .4986194
                            |
                    age_cat |
                     60-69  |   .1127104   .0841904     1.34   0.186    -.0559433    .2813641
                     70-79  |   .0125822   .0844499     0.15   0.882    -.1565913    .1817556
                       80+  |     .13267   .1118596     1.19   0.241    -.0914116    .3567517
                            |
                       male |
                      Male  |  -.0801105   .0722479    -1.11   0.272    -.2248405    .0646195
                            |
                  education |
                     2.ged  |   .1624472   .1446892     1.12   0.266    -.1274001    .4522944
    3.high-school graduate  |    .028098   .1042183     0.27   0.788    -.1806764    .2368724
            4.some college  |   .0900582   .0977156     0.92   0.361    -.1056896    .2858061
       5.college and above  |   .1608487   .1147416     1.40   0.166    -.0690063    .3907036
                            |
                    veteran |
                       Yes  |   .0874433   .0781621     1.12   0.268    -.0691342    .2440208
                            |
                   mothered |
     High School or Higher  |   .1135479   .0485165     2.34   0.023     .0163577    .2107381
                            |
      dentalinsurance_wave1 |
                       Yes  |  -.3408696   .0607332    -5.61   0.000    -.4625328   -.2192064
                            |
              QuantHI_wave1 |
                         2  |    .206534   .0895251     2.31   0.025     .0271937    .3858744
                         3  |   .1986546   .0928171     2.14   0.037     .0127196    .3845896
                         4  |   .3119243   .1018714     3.06   0.003     .1078514    .5159972
                            |
         Quant_wealth_wave1 |
                         2  |    .209604   .1095054     1.91   0.061    -.0097616    .4289697
                         3  |   .1411713   .0853292     1.65   0.104    -.0297636    .3121062
                         4  |   .2905005   .0688465     4.22   0.000     .1525844    .4284166
                            |
                      _cons |   6.473532   .1593284    40.63   0.000     6.154359    6.792706
    -----------------------------------------------------------------------------------------
    
    . 
    end of do-file
    
    . ereturn list
    
    scalars:
                  e(N_glm) =  6356
                  e(k_glm) =  34
               e(k_eq_glm) =  1
         e(k_eq_model_glm) =  0
               e(k_dv_glm) =  1
          e(k_autoCns_glm) =  11
               e(df_m_glm) =  22
                 e(df_glm) =  6333
                e(phi_glm) =  1
                e(aic_glm) =  10979155.92527878
                e(bic_glm) =  69440000177.63506
                 e(ll_glm) =  -34891757507.53598
               e(chi2_glm) =  3780683391.037862
                  e(p_glm) =  0
           e(deviance_glm) =  69440055636.69467
         e(deviance_s_glm) =  69440055636.69467
         e(deviance_p_glm) =  125197877938.4511
        e(deviance_ps_glm) =  125197877938.4511
            e(dispers_glm) =  10964796.40560472
                   e(df_r) =  56
                   e(rank) =  46
                      e(p) =  1.61865195513e-20
                      e(F) =  51.71442009163804
                   e(df_m) =  22
                   e(k_eq) =  2
                 e(census) =  0
              e(singleton) =  0
          e(N_strata_omit) =  0
                  e(N_psu) =  112
               e(N_strata) =  56
                  e(N_pop) =  74668884.21630859
                      e(N) =  12428
                 e(stages) =  1
          e(dispers_p_glm) =  19769126.47062231
         e(dispers_ps_glm) =  19769126.47062231
               e(nbml_glm) =  0
                 e(vf_glm) =  1
              e(power_glm) =  0
               e(rank_glm) =  23
                 e(ic_glm) =  4
                 e(rc_glm) =  0
          e(converged_glm) =  1
               e(df_r_glm) =  55
                e(N_logit) =  11535
            e(N_cds_logit) =  0
            e(N_cdf_logit) =  0
                e(k_logit) =  34
             e(k_eq_logit) =  1
       e(k_eq_model_logit) =  1
             e(k_dv_logit) =  1
        e(k_autoCns_logit) =  11
             e(df_m_logit) =  22
             e(r2_p_logit) =  .1457996861128672
               e(ll_logit) =  -43527249.33444504
             e(ll_0_logit) =  -50956723.64760613
             e(chi2_logit) =  14858948.62632218
                e(p_logit) =  0
             e(rank_logit) =  23
               e(ic_logit) =  4
               e(rc_logit) =  0
        e(converged_logit) =  1
             e(df_r_logit) =  55
          e(dispers_s_glm) =  10964796.40560472
    
    macros:
                    e(cmd) : "twopm"
                e(cmdline) : "svy :twopm oopdental_costs i.inc_d i.endentulism i.race i.age_cat i.male i.e.."
                 e(prefix) : "svy"
                e(cmdname) : "twopm"
                e(command) : "twopm oopdental_costs i.inc_d i.endentulism i.race i.age_cat i.male i.educat.."
                   e(wexp) : "= new_weight"
                  e(wtype) : "pweight"
              e(estat_cmd) : "svy_estat"
                    e(vce) : "linearized"
                e(vcetype) : "Linearized"
                  e(title) : "Survey data analysis"
                   e(wvar) : "new_weight"
             e(singleunit) : "centered"
                    e(su1) : "raehsamp"
                e(strata1) : "raestrat"
             e(properties) : "b V"
                 e(depvar) : "oopdental_costs"
                e(predict) : "twopm_p"
                e(eqnames) : "logit glm"
              e(marginsok) : "default normal duan"
         e(chi2type_logit) : "LR"
              e(opt_logit) : "moptimize"
            e(which_logit) : "max"
        e(ml_method_logit) : "d2"
             e(user_logit) : "mopt__logit_d2()"
        e(technique_logit) : "nr"
      e(singularHmethod_l
        ogit)              : "m-marquardt"
         e(crittype_logit) : "log likelihood"
            e(varfunc_glm) : "glim_v3"
           e(varfunct_glm) : "Poisson"
           e(varfuncf_glm) : "u"
               e(link_glm) : "glim_l03"
              e(linkt_glm) : "Log"
              e(linkf_glm) : "ln(u)"
                  e(m_glm) : "1"
           e(chi2type_glm) : "Wald"
            e(hac_lag_glm) : "6354"
                e(opt_glm) : "moptimize"
               e(opt1_glm) : "ML"
              e(which_glm) : "max"
          e(ml_method_glm) : "e2"
               e(user_glm) : "glim_lf"
          e(technique_glm) : "nr"
      e(singularHmethod_g
        lm)                : "m-marquardt"
           e(crittype_glm) : "log likelihood"
         e(properties_glm) : "b V"
            e(predict_glm) : "glim_p"
    
    matrices:
                      e(b) :  1 x 68
                      e(V) :  68 x 68
           e(V_modelbased) :  68 x 68
                  e(V_srs) :  68 x 68
      e(_N_strata_certain) :  1 x 1
       e(_N_strata_single) :  1 x 1
              e(_N_strata) :  1 x 1
    
    functions:
                 e(sample)

  • #2
    If you are using svy, BIC and AIC are not options either.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    WWW: https://www3.nd.edu/~rwilliam


    • #3
      Richard Williams, interesting. Why not? Does svy: change the estimation in a way that means you can't compare models when using svyset?


      • #4
        See

        https://www3.nd.edu/~rwilliam/xsoc73...yCautionsX.pdf

        Basically, the assumptions required for BIC, AIC, and likelihood-ratio tests are violated with svy data. There are sometimes alternatives you can use instead.
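
        One way to see the problem in the output from post #1: with pweights, the stored e(ll_glm) is a weighted log pseudolikelihood rather than a true likelihood, and e(aic_glm) is computed directly from it. The sketch below is only an illustration, not part of the original analysis. It copies values from the two ereturn list outputs and assumes the formula AIC = (-2*ll + 2*k)/N with k taken as e(rank_glm). That formula is inferred from the numbers, which it reproduces, and the scalar names are made up for the example.

        ```stata
        * Rebuild the stored e(aic_glm) values from the log pseudolikelihoods.
        * All inputs are copied from the ereturn list outputs in post #1.
        * Assumed formula: AIC = (-2*ll + 2*k)/N, with k = e(rank_glm).

        * Model 1: probit first part, gamma second part
        scalar ll_gamma  = -340022326.9063911
        scalar k_gamma   = 26
        scalar n_gamma   = 6334
        scalar aic_gamma = (-2*ll_gamma + 2*k_gamma) / n_gamma

        * Model 2: logit first part, Poisson second part
        scalar ll_pois   = -34891757507.53598
        scalar k_pois    = 23
        scalar n_pois    = 6356
        scalar aic_pois  = (-2*ll_pois + 2*k_pois) / n_pois

        * Compare with e(aic_glm) in each ereturn list
        display %18.4f aic_gamma
        display %18.4f aic_pois
        ```

        Both results should agree with the stored e(aic_glm) values (107364.1783727158 and 10979155.92527878). A log pseudolikelihood of about -3.4e8 on 6,334 observations reflects the scale of the sampling weights, not the sample size. The two second-part models were also fit to different samples (6,334 versus 6,356 observations) with different covariates, so the two numbers would not be comparable even without the survey design.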

        This paper suggests how you can adapt things like BIC and AIC when using svyset data. I can't vouch for it or tell you if there is Stata code for it.

        https://www.stat.colostate.edu/grayb...ions/Scott.pdf

        Also see

        https://www.statalist.org/forums/for...arison-options

        https://www.stata-journal.com/sjpdf....iclenum=st0099

        The articles I am citing have been around for a while, but it doesn't appear to me that the methods have seen widespread use.
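
        As one example of the kind of alternative mentioned above, nested specifications can be compared after svy: with a design-adjusted Wald test instead of a likelihood-ratio test. The sketch below is an illustration under stated assumptions: the data are simulated, and every variable name (stratum, psu, wt, x1, grp, y) is hypothetical and unrelated to the data in this thread.

        ```stata
        * Simulated survey data: 4 strata, 40 PSUs nested in strata, unequal weights.
        clear
        set seed 12345
        set obs 2000
        generate stratum  = ceil(_n/500)
        generate psu      = ceil(_n/50)
        generate wt       = 1 + 9*runiform()
        generate x1       = rnormal()
        generate byte grp = 1 + mod(_n, 3)
        generate byte y   = runiform() < invlogit(-0.5 + 0.8*x1)

        * Declare the design, then fit the larger model with svy:
        svyset psu [pweight=wt], strata(stratum)
        svy: logit y x1 i.grp

        * Joint test that all grp coefficients are zero.
        * After svy: this is reported as an adjusted Wald F test.
        testparm i.grp
        ```

        This only addresses nested comparisons, such as whether smoke_now, chronicdisease_wave1, and dentistvisit_wave1 (present in the first model in post #1 but not the second) are jointly needed. It does not by itself choose between a gamma and a Poisson second part, and applying testparm to twopm's two-equation results has not been checked here.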


        • #5
          Here is another article (from 2015) that claims to have a BIC/AIC alternative for svy data:

          https://academic.oup.com/jssam/artic...t/3/1/1/915356