
  • Two part model

    Hi,
    I fitted a two-part model to my healthcare cost data using the following code:
    "twopm total_cost age i.age_grp i.sex i.comorb_cat ib2.health_insurance i.wealth_tertile i.facility1 i.level1 i.treatment1 i.flu1 ib2.sample_type_final ib2.Site, ///
    firstpart(logit, nolog) secondpart(glm, family(gamma) link(log) nolog)"

    but I get a different number of observations in the first part, and I don't understand why.

    Code:
    . ta total_cost if total_cost==0
    
     total_cost |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |        575      100.00      100.00
    ------------+-----------------------------------
          Total |        575      100.00
    
    . sum total_cost
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
      total_cost |      3,729    974.1922    1202.411          0   19366.55
    
     twopm total_cost age i.age_grp i.sex i.comorb_cat ib2.health_insurance i.wealth_tertile i.facility1 i.level1 i.treatment1 i.flu1
    >  ib2.sample_type_final ib2.Site, ///
    > firstpart(logit, nolog) secondpart(glm, family(gamma) link(log) nolog)
    
    Fitting logit regression for first part:
    note: 2.level1 != 0 predicts success perfectly
          2.level1 dropped and 117 obs not used
    
    note: 3.level1 != 0 predicts success perfectly
          3.level1 dropped and 53 obs not used
    
    note: 3.treatment1 != 0 predicts success perfectly
          3.treatment1 dropped and 30 obs not used
    
    
    Fitting glm regression for second part:
    
    Two-part model
    ------------------------------------------------------------------------------
    Log pseudolikelihood = -26187.565                 Number of obs   =       3529
    
    Part 1: logit
    ------------------------------------------------------------------------------
                                                      Number of obs   =       3529
                                                      LR chi2(17)     =     941.09
                                                      Prob > chi2     =     0.0000
    Log likelihood = -1098.1144                       Pseudo R2       =     0.3000
    
    Part 2: glm
    ------------------------------------------------------------------------------
                                                       Number of obs   =      3154
    Deviance         =  2317.199533                    (1/df) Deviance =  .7396104
    Pearson          =  2677.580772                    (1/df) Pearson  =   .854638
    
    Variance function: V(u) = u^2                      [Gamma]
    Link function    : g(u) = ln(u)                    [Log]
    
                                                       AIC             =  15.92292
    Log likelihood   = -25089.45093                    BIC             = -22923.59
    -----------------------------------------------------------------------------------
           total_cost |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ------------------+----------------------------------------------------------------
    logit             |
                  age |  -.0057328   .0036571    -1.57   0.117    -.0129006    .0014349
                      |
              age_grp |
               65-69  |  -.0207899   .1372475    -0.15   0.880      -.28979    .2482103
        70 and above  |   .0716346   .1354971     0.53   0.597    -.1939347     .337204
                      |
                  sex |
                   M  |  -.2404908   .1097937    -2.19   0.028    -.4556825    -.025299
                      |
           comorb_cat |
                 One  |   .1405211   .1296914     1.08   0.279    -.1136694    .3947116
       More than one  |   .2788304   .1397698     1.99   0.046     .0048866    .5527743
                      |
     health_insurance |
                 Yes  |  -.1972704   .1624971    -1.21   0.225    -.5157588    .1212181
                      |
       wealth_tertile |
                   2  |   .4299188   .1629887     2.64   0.008     .1104669    .7493707
                   3  |   .2415955   .1526641     1.58   0.114    -.0576207    .5408117
                      |
            facility1 |
             Private  |   3.134358   .6235388     5.03   0.000     1.912245    4.356472
                      |
               level1 |
             Primary  |   .1779982   .2299553     0.77   0.439    -.2727059    .6287023
                      |
           treatment1 |
          Ambulatory  |   3.532288   .7454025     4.74   0.000     2.071326     4.99325
                      |
                 flu1 |
             flu/RSV  |    .727569   .2962698     2.46   0.014     .1468909    1.308247
                      |
    sample_type_final |
                ALRI  |   .8066968   .1361144     5.93   0.000     .5399174    1.073476
                      |
                 Site |
             Chennai  |   1.463906   .3477832     4.21   0.000     .7822639    2.145549
             Kolkata  |   2.587106   .3871506     6.68   0.000     1.828305    3.345907
                Pune  |   1.377833   .3393943     4.06   0.000     .7126324    2.043033
                      |
                _cons |  -.1149095   .1850908    -0.62   0.535    -.4776808    .2478617
    ------------------+----------------------------------------------------------------
    glm               |
                  age |  -.0031744   .0010904    -2.91   0.004    -.0053116   -.0010372
                      |
              age_grp |
               65-69  |  -.0194222   .0405161    -0.48   0.632    -.0988323    .0599878
        70 and above  |   .1175043   .0417505     2.81   0.005     .0356748    .1993339
                      |
                  sex |
                   M  |   .0591213   .0345553     1.71   0.087    -.0086058    .1268483
                      |
           comorb_cat |
                 One  |   .1126224   .0449867     2.50   0.012       .02445    .2007947
       More than one  |   .2371152   .0446565     5.31   0.000     .1495901    .3246403
                      |
     health_insurance |
                 Yes  |  -.0276218   .0589608    -0.47   0.639    -.1431829    .0879393
                      |
       wealth_tertile |
                   2  |  -.0503335   .0443209    -1.14   0.256    -.1372007    .0365338
                   3  |   -.026519   .0531616    -0.50   0.618    -.1307139    .0776758
                      |
            facility1 |
             Private  |   .2387775    .054445     4.39   0.000     .1320672    .3454877
                      |
               level1 |
             Primary  |  -.2222971   .0740025    -3.00   0.003    -.3673394   -.0772548
           Secondary  |  -.1561688   .1202328    -1.30   0.194    -.3918207    .0794831
            Tertiary  |   .1886037   .1490505     1.27   0.206    -.1035298    .4807372
                      |
           treatment1 |
          Ambulatory  |    .610182   .0699323     8.73   0.000     .4731172    .7472468
       Emergency/IPD  |   1.130927   .1708076     6.62   0.000     .7961499    1.465703
                      |
                 flu1 |
             flu/RSV  |   .1072076   .0673063     1.59   0.111    -.0247104    .2391255
                      |
    sample_type_final |
                ALRI  |   .4263658    .037677    11.32   0.000     .3525203    .5002113
                      |
                 Site |
             Chennai  |    .217353    .079374     2.74   0.006     .0617827    .3729232
             Kolkata  |   -.237404   .0842827    -2.82   0.005    -.4025952   -.0722129
                Pune  |   .0250936   .0820798     0.31   0.760    -.1357799    .1859672
                      |
                _cons |   6.444186    .063974   100.73   0.000     6.318799    6.569573
    -----------------------------------------------------------------------------------
    
    .
    end of do-file

  • #2
    Kusum:
    Code:
    Fitting logit regression for first part:
    note: 2.level1 != 0 predicts success perfectly
          2.level1 dropped and 117 obs not used
    
    note: 3.level1 != 0 predicts success perfectly
          3.level1 dropped and 53 obs not used
    
    note: 3.treatment1 != 0 predicts success perfectly
          3.treatment1 dropped and 30 obs not used
    They sum to 200 observations (that is, the difference between the 3,729 observations in your data and the 3,529 observations used in the first part of your regression model).
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      The 200 observations dropped due to perfect prediction explain the reduction in sample size from 3,729 to 3,529 in part one.
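
      The accounting above can be checked directly in Stata. This is a minimal sketch that assumes the variable names from #1 and is run from a do-file; `any_cost` is a hypothetical indicator introduced here only for illustration. A level that predicts success perfectly should show an empty zero-cost cell in its row of the cross-tabulation.

      ```stata
      * Observations dropped from the logit because of perfect prediction
      display 117 + 53 + 30            // 200
      display 3729 - (117 + 53 + 30)   // 3529, the first-part sample size

      * any_cost is a hypothetical indicator: 1 if cost is positive, 0 if zero
      generate byte any_cost = total_cost > 0 if !missing(total_cost)

      * Rows with no observations under any_cost == 0 are the levels
      * that predict a positive cost perfectly
      tabulate level1 any_cost
      tabulate treatment1 any_cost
      ```

      Because the logit drops these levels one after another, the cell counts in the tables need not match the "obs not used" figures in the notes exactly.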



      • #4
        I ran the following postestimation commands,
        where "predict" gives me 3,729 observations but "margins" and "margins, dydx(*)" give 3,529 observations.
        So I have the following questions:
        1) The margins command gives the predicted mean (if I'm correct), so why does it differ from the mean after predict (estimated mean 977.46)? Since this is a two-part model, how do I explain the 200 missing observations if I use margins?
        2) How should I interpret dy/dx? An example would help.
        3) How do I get a combined beta coefficient for this model, and would that combined coefficient be interpreted on the log scale?
        Code:
         predict twopmhat
         sum twopmhat total_cost
        
            Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
            twopmhat |      3,729    977.4612    617.3496   176.1645   6136.087
          total_cost |      3,729    974.1922    1202.411          0   19366.55
        
        . margins
        Warning: cannot perform check for estimable functions.
        
        Predictive margins                              Number of obs     =      3,529
        
        Expression   : twopm combined expected values, predict()
        
        ------------------------------------------------------------------------------
                     |            Delta-method
                     |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               _cons |   920.0791   17.81262    51.65   0.000      885.167    954.9912
        ------------------------------------------------------------------------------
        
        . margins, dydx(*)
        Warning: cannot perform check for estimable functions.
        
        Average marginal effects                        Number of obs     =      3,529
        
        Expression   : twopm combined expected values, predict()
        dy/dx w.r.t. : age 2.age_grp 3.age_grp 2.sex 1.comorb_cat 2.comorb_cat 1.health_insurance 2.wealth_tertile 3.wealth_tertile
                       2.facility1 1.level1 2.level1 3.level1 2.treatment1 3.treatment1 2.flu1 1.sample_type_final 1.Site 3.Site 4.Site
        
        -----------------------------------------------------------------------------------
                          |            Delta-method
                          |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
        ------------------+----------------------------------------------------------------
                      age |  -3.394332   1.051936    -3.23   0.001    -5.456088   -1.332576
                          |
                  age_grp |
                   65-69  |  -18.74307   37.27306    -0.50   0.615    -91.79693    54.31079
            70 and above  |   117.2507   41.37961     2.83   0.005     36.14818    198.3533
                          |
                      sex |
                       M  |   34.42272   33.42079     1.03   0.303    -31.08082    99.92626
                          |
               comorb_cat |
                     One  |    105.404   38.92887     2.71   0.007     29.10485    181.7032
           More than one  |   235.9656   41.05419     5.75   0.000     155.5008    316.4303
                          |
         health_insurance |
                     Yes  |  -41.54754   54.18024    -0.77   0.443    -147.7389    64.64379
                          |
           wealth_tertile |
                       2  |  -10.24832    43.0106    -0.24   0.812    -94.54756    74.05091
                       3  |   -3.28981    50.6795    -0.06   0.948    -102.6198    96.04018
                          |
                facility1 |
                 Private  |    393.412   65.35237     6.02   0.000     265.3237    521.5003
                          |
                   level1 |
                 Primary  |  -192.9078   71.31992    -2.70   0.007    -332.6923   -53.12336
               Secondary  |   -148.369   109.9265    -1.35   0.177    -363.8209    67.08292
                Tertiary  |   212.9941   180.4417     1.18   0.238    -140.6651    566.6534
                          |
               treatment1 |
              Ambulatory  |   867.8743   102.3894     8.48   0.000     667.1948    1068.554
           Emergency/IPD  |   1561.687   384.6989     4.06   0.000     807.6912    2315.683
                          |
                     flu1 |
                 flu/RSV  |   160.5259    66.9739     2.40   0.017     29.25945    291.7923
                          |
        sample_type_final |
                    ALRI  |   494.7202   44.08234    11.22   0.000     408.3204      581.12
                          |
                     Site |
                 Chennai  |    369.941   88.66665     4.17   0.000     196.1576    543.7244
                 Kolkata  |  -37.47339   74.12326    -0.51   0.613    -182.7523    107.8055
                    Pune  |   154.0903   82.19243     1.87   0.061    -7.003907    315.1845
        -----------------------------------------------------------------------------------
        Note: dy/dx for factor levels is the discrete change from the base level.



        • #5
          On your point #1, predict and margins treat the estimation sample differently. predict uses all observations with the information needed to compute a prediction, unless you restrict it to the estimation sample, in which case you can use
          Code:
          predict twopmhat if e(sample)
          Conversely, margins uses only the estimation sample unless you instruct it otherwise:
          Code:
          margins, noesample
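
          To compare like with like, both quantities can be computed on the same sample. This is a minimal sketch, assuming the twopm results from #1 are still the active estimates; `twopmhat_es` is a hypothetical variable name.

          ```stata
          * How many observations are flagged as the estimation sample
          count if e(sample)

          * Combined prediction restricted to the estimation sample
          predict twopmhat_es if e(sample)
          summarize twopmhat_es

          * Average prediction over the estimation sample (the margins default)
          margins

          * Average prediction without the estimation-sample restriction
          margins, noesample
          ```

          If the restriction behaves as described above, the mean of `twopmhat_es` should be close to the margin from plain margins, and the result of margins, noesample should be close to the mean of the unrestricted prediction in #4.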



          • #6
            Hi,
            I was doing some exploratory analysis in R.
            As I said in #1, I used the twopm model for my cost data,
            and Carlo Lazzaro suggested that cost data follow a gamma distribution. But when I checked the distribution of my data excluding zeros, the lognormal fit better (I compared Weibull, gamma, and lognormal) based on AIC and Q-Q plots. I'm not sure what to make of this.
            Is it OK that my cost data follow a lognormal distribution?
            If so, what should I specify as the family in glm, instead of gamma, in Stata with the twopm command?



            • #7
              Kusum:
              as we know, omitting zeros (or any other value) is clearly arbitrary.
              As far as healthcare costs are concerned, you may have a (hopefully very small) fraction of your patients who passed away just after entering the study (zero costs).
              That said, sticking with -glm- you may want to explore -link(log)- with -family(gaussian)-.
              Kind regards,
              Carlo
              (Stata 19.0)
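
              As a sketch of the suggestion above, the second part from #1 can be refitted with a Gaussian family and log link. The covariate list is copied from #1; the local macro name `xvars` is a hypothetical convenience, and the block is meant to be run as a single do-file so the macro stays defined. The last command, an OLS second part on log cost, is included as an assumption about what may sit closer to a lognormal fit; check help twopm before relying on it.

              ```stata
              * Covariate list from #1, stored in a local macro
              local xvars age i.age_grp i.sex i.comorb_cat ib2.health_insurance ///
                  i.wealth_tertile i.facility1 i.level1 i.treatment1 i.flu1 ///
                  ib2.sample_type_final ib2.Site

              * Second part: GLM with Gaussian family and log link
              twopm total_cost `xvars', firstpart(logit, nolog) ///
                  secondpart(glm, family(gaussian) link(log) nolog)

              * Alternative second part: OLS on the log of positive costs
              twopm total_cost `xvars', firstpart(logit, nolog) ///
                  secondpart(regress, log)
              ```

              Note that a log-link GLM models the log of the expected cost, whereas a regression on log cost models the expected log cost, so the two second parts are not interchangeable.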



              • #8

                I tried including the zeros in R to check the distribution of my data, but the gamma and Weibull fits throw errors because of the zeros, so I had to exclude them.
                So I'll ask a very basic question now:
                How do I check the distribution and skewness of my data in Stata, with or without the zeros? I need to show the distribution on a graph, and identify the distribution type, to support my choice of glm family.
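
                One way to look at this in Stata, offered as a minimal sketch rather than a recommended workflow: summarize the cost variable with and without zeros, then plot the positive costs on the raw and log scales. `total_cost` is the variable from #1; `ln_cost` is a hypothetical variable created here for illustration.

                ```stata
                * Number of zero-cost observations
                count if total_cost == 0

                * Percentiles, skewness, and kurtosis, with and without zeros
                summarize total_cost, detail
                summarize total_cost if total_cost > 0, detail

                * Histogram of positive costs on the raw scale
                histogram total_cost if total_cost > 0, frequency

                * ln_cost is a hypothetical variable: log of positive costs
                generate double ln_cost = ln(total_cost) if total_cost > 0

                * If positive costs are roughly lognormal, ln_cost should look
                * roughly normal in the histogram and the normal Q-Q plot
                histogram ln_cost, normal
                qnorm ln_cost
                ```

                These plots describe the marginal distribution of cost only; the glm family concerns the distribution conditional on the covariates, so they are a starting point rather than a test of the family.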



                • #9
                  Whether you use glm or twopm it is strongly recommended that you specify some form of robust vce as an option, e.g. vce(robust). This would be especially true if you decided to use a one- or two-part glm model with a log link, as Carlo suggested in #7.
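
                  For illustration, the specification from #1 with a robust variance estimator would look like the sketch below. The only change is the added vce(robust) option; everything else is copied from #1.

                  ```stata
                  * Same specification as in #1, with robust standard errors
                  twopm total_cost age i.age_grp i.sex i.comorb_cat ib2.health_insurance ///
                      i.wealth_tertile i.facility1 i.level1 i.treatment1 i.flu1 ///
                      ib2.sample_type_final ib2.Site, ///
                      firstpart(logit, nolog) ///
                      secondpart(glm, family(gamma) link(log) nolog) vce(robust)
                  ```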
