  • Can I use a zero inflated negative binomial regression?

    Hello everyone,

    this is my first post, so please be kind and understanding if I don't meet the forum norms.

    So my regression equation was: reg amount12 ib1.lng_origins c.pca_generaltrst1 ib6.religion controls
    The outcome variable is amount12 = amount remitted in the past 12 months, lng_origins = language origins in SA (Sotho, Venda, Tsonga, etc.), and religion = Atheists, Christians, etc.

    So the very first problem that I have is that if I only look at the amount remitted by those who remit, I may have selection bias. I cannot use a Heckman model because my selection equation does not have a variable that is different from the second stage, so the exclusion restriction is violated.

    Then I talked to a professor and he said that I should simply recode the missing values in the amount remitted to zeros because those people are not remitting any amount. So I did that, and I also recoded to zero the missings in two other variables that I want to include as controls, because I figured that otherwise Stata only takes the non-missing values into account, but to account for selection bias it has to take all the observations into account, right? These are the controls that I recoded: (1) relationship to remittance receiver, (2) frequency of remittances.

    Now I can't use OLS because the error terms are not normally distributed and I have a lot of zeros, which is why I thought I may be able to use a zero inflated negative binomial regression. Then in inflate() I would plug in my logit regression (all variables & controls without the outcome variable) that estimated whether a person remits or not:

    zinb new_amount12 ib1.pop_lngorigins c.pca_generaltrst1 ib6.religion controls, inflate(ib1.pop_lngorigins c.pca_generaltrst1 ib6.religion other controls)

    Unfortunately, the inflate regression does not give me the same or similar results as the logit regression that I did already, why is that? Can I still use the coefficients that I get for amount remitted?

    Please note that this is a master's thesis and that it does not have to be perfect (I would like it to be, but I am pretty much new to these models, so I think it is very normal that it will not be perfect right away). Thank you so much for your help in advance!!



  • #2
    Multiple imputation should be taken into consideration when dealing with missing values. If I understood correctly, the DV conveys expenses, hence it is not a count variable. That being so, a glm model with gamma family and log link could be considered.
    Best regards,

    Marcos

    • #3
      Camila: Several things. First, it is a misconception that you cannot use a linear model estimated by OLS "because the errors are not normally distributed." Unless the sample size is small, estimation of a linear model by OLS is always a good starting point, with one important caveat: your Y variable must be the variable you actually want to explain. To me, this is the real issue.

      If a zero is a true zero then your professor is correct, and this seems to be the case for remittances. Set it to zero. But do not treat a true missing value as a zero. At this point, if you drop truly missing data, how large is your sample size? How many zeros? How many different nonzero outcomes does Y take on?

      My recommendation is to use a two-part model. If you answer the above question, I can continue to respond.

      JW

      • #4
        Dear JW,

        thank you so much for your help!

        So my total sample size is 28,464. And the non-zero observations for the amount12 are 1,887 (+4 observations are zero). A total of 26,573 are missing. So I don't know which of those missings are actually missings?

        I do know that in the survey they asked 22,706 people if they have sent remittances of any kind in the past 12 months and then if they said yes (2,169), they asked how much they sent. So probably 278 observations (2169-1891=278) are truly missing and the rest I can recode to zero?

        And I have also seen papers where they used a tobit model, I am just so confused what to use now...

        Thanks again, really appreciate any help!




        • #5
          Your plan for recoding the response variable makes sense to me. Be sure to document what you did in writing up your findings.
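
          The recoding plan from #4 could be sketched roughly as follows. This is only an illustration on made-up toy data: sent_any is a hypothetical name for the "sent remittances in the past 12 months" screening question (1 = yes, 0 = no, missing = not asked), and leaving the people who were never asked as missing is an assumption here, not something stated in the thread.

          ```stata
          * Toy data; sent_any is a hypothetical name and all values are made up
          clear
          input sent_any amount12
          1 5000
          1 .
          0 .
          0 .
          . .
          1 0
          end

          * Start from the reported amount so that genuine missings stay missing
          generate new_amount12 = amount12

          * Respondents who said they sent nothing are treated as true zeros
          replace new_amount12 = 0 if sent_any == 0

          * "Yes" without an amount, and anyone never asked (assumption), stay missing
          count if missing(new_amount12)
          tabulate sent_any, missing

          * The arithmetic from #4: "yes" answers minus reported amounts
          display 2169 - 1891
          ```

          Keeping the original amount12 untouched and building new_amount12 next to it makes it easy to document exactly which observations were recoded.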

          I would start with a linear regression and just include the zeros as they are. Then I would do Tobit. You need to be sure to compute the average marginal effects (which can be compared with OLS coefficients). Then Cragg's hurdle model using the -churdle linear- command, being sure to include all covariates in the selection and outcome equations.

          Jeff

          • #6
            Thank you so much for your suggestions. I was really frustrated yesterday and panicking that I would not be able to figure out what is best! It is very helpful.

            Just a few more questions:
            1) For which model would you then use the recoded amount12 variable? For the tobit and Cragg's hurdle only?
            2) For the tobit, do I have to put any upper or lower limit in the command? Or is it going to be just tobit amount12 ib1.lng_origins c.pca_generaltrst1 ib6.religion controls? I have run this command before (without specifying any lower or upper limit), but then it showed me that there were no censored values. Shouldn't the zeros be censored values?
            3) And what do you mean by being sure to include all covariates in the selection? What exactly are you concerned I might do incorrectly?

            Sorry for asking so many questions, I just want to make sure that I understand everything correctly.

            Thanks, Jeff!

            • #7
              No problem, Camila. Recode the variable the same for all approaches. Make sure a zero is, as best you can tell, a zero. If you’re not very sure, code it to missing. This seems like a fairly small fraction.

              In the Tobit and churdle, specify ll(0). What I mean in the churdle is don’t leave some variables out of the selection equation. I see this done fairly often. You have to specify the variables in each part. If you leave a variable out it generally causes inconsistency.
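
              As a rough illustration of the sequence suggested above (OLS with the zeros, Tobit with ll(0), then Cragg's hurdle with the same covariates in both parts), here is a minimal sketch on simulated data. The names y, x1 and x2 are hypothetical stand-ins for new_amount12 and its covariates, and vce(robust) is just one possible choice.

              ```stata
              * Simulated corner-solution outcome; y, x1, x2 are hypothetical names
              clear
              set seed 12345
              set obs 2000
              generate x1 = rnormal()
              generate x2 = runiform() < 0.5
              generate y  = max(0, 1 + 2*x1 - 1.5*x2 + rnormal(0, 3))

              * Linear model with the zeros included as they are
              regress y x1 i.x2, vce(robust)

              * Tobit with lower limit 0; without ll(0) no observation is treated as censored
              tobit y x1 i.x2, ll(0) vce(robust)
              * AMEs on the mean of the observed outcome, comparable with the OLS coefficients
              margins, dydx(*) predict(ystar(0,.))

              * Cragg hurdle: the same covariates appear in the outcome part and in select()
              churdle linear y x1 i.x2, select(x1 i.x2) ll(0) vce(robust)
              * The default prediction here should be the conditional mean of y;
              * see help churdle postestimation to confirm
              margins, dydx(*)
              ```

              Listing the covariates once in a local macro and reusing it in both the main equation and select() is one way to avoid accidentally dropping a variable from the selection part.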

              • #8
                Fantastic! I applied your suggestions but now I encountered new problems.

                While I was working on the tobit postestimation, when I want to estimate the AME for the censored sample using this command:
                margins, dydx(*) predict (ystar(0,.))
                Stata tells me:
                inconsistent estimation sample levels 1 and 2 of factor pop_lngorigins
                Do you have any idea what this may mean?

                when I use a normal margins, dydx(*) command it takes abnormally long - I am still waiting for the results.

                Regarding the churdle linear command, Stata tells me:
                invalid selection model;
                no observations


                And if I try it with fewer variables, just to see if maybe one variable is a problem, it shows me:
                initial values not feasible

                Thanks so much for your advice so far, it is very helpful. And even if I just get the tobit model to work, then I can compare that to the OLS results. I think that is already very helpful and adds value!

                • #9
                  Camila:

                  I probably have to see at least the output, if not also a data extract. Please show at least summary statistics of amount12 and the "selection" variable, along with the Tobit output. For churdle, are you sure selectvar is defined for all observations? It should be set to missing whenever amount12 is missing. It should be one when amount12 > 0 and zero if amount12 = 0.

                  If you just use dydx(*) with Tobit it should simply return the coefficients, so I'm puzzled by your finding. The command margins, dydx(*) predict (ystar(0,.)) is correct. Unfortunately, in my opinion, Stata has reversed the proper notation. ystar should refer to the underlying latent variable, but in the margins command it appears to mean y (the observed variable). I've checked this with "by hand" calculations.
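
                  To see the difference between these predictions side by side, a small sketch on simulated data may help. The variable names y, x1 and x2 are hypothetical; the predict() options shown are the standard ones available after tobit.

                  ```stata
                  * Simulated data; y, x1, x2 are hypothetical names
                  clear
                  set seed 2718
                  set obs 2000
                  generate x1 = rnormal()
                  generate x2 = rnormal()
                  generate y  = max(0, 0.5 + x1 - x2 + rnormal(0, 2))

                  tobit y x1 x2, ll(0)

                  * Default prediction is the linear index xb, so this reproduces the coefficients
                  margins, dydx(*)

                  * AME on the mean of the observed (censored at 0) outcome
                  margins, dydx(*) predict(ystar(0,.))

                  * AME on the mean among strictly positive outcomes
                  margins, dydx(*) predict(e(0,.))

                  * AME on the probability of a positive outcome
                  margins, dydx(*) predict(pr(0,.))
                  ```

                  With only continuous covariates, as here, the default margins call is fast; long run times are more likely with many factor variables and a large estimation sample.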

                  • #10
                    A couple more things. I just looked more closely at the churdle command, and I see that you don't put in a separate selection outcome. It is determined by the y variable. So it should be

                    Code:
                    churdle linear y x1 x2 ... xk, select(x1 x2 ... xk) ll(0)
                    When you use the tobit command, how many "censored" (unfortunate choice of word) observations does it report?

                    • #11
                      Dear Jeff,


                      1) So I guess I must go through each control and independent variable and recode it to missing if amount12 is missing, recode it to zero if amount12=0, and leave the value as it is if amount12>0?

                      And this is the information you asked me for. I hope this helps?

                      This is the tobit output:

                      Code:
                      Refining starting values:
                      
                      Grid node 0:   log likelihood =  -41906192
                      
                      Fitting full model:
                      
                      Iteration 0:   log pseudolikelihood =  -41906192  
                      Iteration 1:   log pseudolikelihood =  -34233897  
                      Iteration 2:   log pseudolikelihood =  -32270815  
                      Iteration 3:   log pseudolikelihood =  -31054566  
                      Iteration 4:   log pseudolikelihood =  -31021777  
                      Iteration 5:   log pseudolikelihood =  -31010035  
                      Iteration 6:   log pseudolikelihood =  -31009764  
                      Iteration 7:   log pseudolikelihood =  -31009734  
                      Iteration 8:   log pseudolikelihood =  -31009731  
                      Iteration 9:   log pseudolikelihood =  -31009731  
                      Iteration 10:  log pseudolikelihood =  -31009730  
                      Iteration 11:  log pseudolikelihood =  -31009730  
                      Iteration 12:  log pseudolikelihood =  -31009730  
                      Iteration 13:  log pseudolikelihood =  -31009730  
                      Iteration 14:  log pseudolikelihood =  -31009730  
                      Iteration 15:  log pseudolikelihood =  -31009730  
                      
                      Tobit regression                                Number of obs     =      8,353
                                                                         Uncensored     =      1,169
                      Limits: lower = 0                                  Left-censored  =      7,184
                              upper = +inf                               Right-censored =          0
                      
                                                                      F(  66,   8288)   =          .
                                                                      Prob > F          =          .
                      Log pseudolikelihood =  -31009730               Pseudo R2         =     0.1676
                      
                      -------------------------------------------------------------------------------------------
                                                |               Robust
                                   new_amount12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      --------------------------+----------------------------------------------------------------
                                 pop_lngorigins |
                      West Germanic - coloured  |  -1745.101   3950.441    -0.44   0.659    -9488.953    5998.752
                         West Germanic - black  |   20534.04   12802.14     1.60   0.109    -4561.352    45629.44
                         West Germanic - other  |  -2832.269   4096.955    -0.69   0.489    -10863.33    5198.789
                                         Nguni  |  -110.8592   3523.101    -0.03   0.975    -7017.018      6795.3
                                         Sotho  |   328.1406    3542.09     0.09   0.926    -6615.241    7271.523
                                         Venda  |   6955.608   4870.478     1.43   0.153    -2591.747    16502.96
                                        Tsonga  |   2160.247   4687.433     0.46   0.645    -7028.295    11348.79
                                                |
                               pca_generaltrst1 |   180.8257   377.5668     0.48   0.632    -559.2998    920.9512
                                                |
                                       religion |
                                1. No religion  |   3886.939    1872.01     2.08   0.038     217.3303    7556.548
                                  2. Christian  |   1589.692   1502.533     1.06   0.290    -1355.648    4535.032
                                     3. Jewish  |   183.1857   5191.352     0.04   0.972    -9993.163    10359.53
                                     4. Muslim  |   133.1594   6014.309     0.02   0.982    -11656.39    11922.71
                                      5. Hindu  |   85236.76   9503.989     8.97   0.000     66606.56      103867
                                      7. Other  |   21882.63    6133.22     3.57   0.000     9859.986    33905.28
                                                |
                                        brnprov |
                               2. Eastern Cape  |  -2653.259   2568.617    -1.03   0.302    -7688.392    2381.874
                              3. Northern Cape  |  -7056.392   2898.603    -2.43   0.015    -12738.38   -1374.405
                                 4. Free State  |    -6136.3    3197.62    -1.92   0.055    -12404.43    131.8354
                              5. KwaZulu-Natal  |  -4319.346   2698.866    -1.60   0.110      -9609.8    971.1073
                                 6. North West  |  -1142.944   3206.951    -0.36   0.722    -7429.372    5143.483
                                    7. Gauteng  |  -261.0968   3063.668    -0.09   0.932    -6266.653    5744.459
                                 8. Mpumalanga  |  -3854.059    2888.41    -1.33   0.182    -9516.065    1807.947
                                    9. Limpopo  |  -7429.577   3192.173    -2.33   0.020    -13687.04   -1172.118
                                                |
                                       best_gen |
                                     2. Female  |  -1699.624   938.9882    -1.81   0.070    -3540.276     141.028
                                                |
                                        edu_lev |
                            incomplete primary  |   5325.529   4031.587     1.32   0.187     -2577.39    13228.45
                             primary completed  |  -5443.667    2396.17    -2.27   0.023    -10140.76   -746.5746
                          incomplete secondary  |  -1236.809   1255.305    -0.99   0.325     -3697.52    1223.903
                                lower tertiary  |   516.3127   1173.086     0.44   0.660    -1783.229    2815.855
                               higher tertiary  |  -5258.348   5123.291    -1.03   0.305    -15301.28    4784.585
                                         other  |   3607.203   3962.133     0.91   0.363    -4159.568    11373.97
                                            25  |  -1803.203   2746.166    -0.66   0.511    -7186.375    3579.969
                                                |
                                      empl_stat |
                                      employed  |    244.643   1654.367     0.15   0.882    -2998.331    3487.617
                                                |
                                    best_marstt |
                        2. Living with Partner  |   -846.198   1314.147    -0.64   0.520    -3422.256     1729.86
                              3. Widow/Widower  |  -2935.213   2191.087    -1.34   0.180    -7230.293    1359.867
                      4. Divorced or Seperated  |  -2572.785   2863.814    -0.90   0.369    -8186.577    3041.007
                              5. Never married  |   833.9811   1238.124     0.67   0.501    -1593.051    3261.014
                                                |
                                  age_intervals |
                                      6. 20-24  |  -5369.367   4015.577    -1.34   0.181     -13240.9    2502.168
                                      7. 25-29  |  -4645.755   3936.643    -1.18   0.238    -12362.56     3071.05
                                      8. 30-34  |  -1777.543   4106.531    -0.43   0.665    -9827.371    6272.285
                                      9. 35-39  |  -617.2596   4122.129    -0.15   0.881    -8697.665    7463.145
                                     10. 40-44  |  -3296.241   4234.272    -0.78   0.436    -11596.47    5003.991
                                     11. 45-49  |  -1063.552   4551.222    -0.23   0.815    -9985.087    7857.983
                                     12. 50-54  |  -687.5889   5074.855    -0.14   0.892    -10635.57    9260.397
                                     13. 55-59  |   4087.078   4724.316     0.87   0.387    -5173.764    13347.92
                                     14. 60-64  |  -45.82361   5443.571    -0.01   0.993    -10716.58    10624.94
                                                |
                                         hhsize |  -862.4076   229.6415    -3.76   0.000    -1312.562   -412.2529
                                        tot_ass |   .0010023   .0004453     2.25   0.024     .0001294    .0018753
                                                |
                               new_rel_receiver |
                                             0  |  -434741.6     158323    -2.75   0.006    -745094.3   -124388.9
                                             4  |  -15984.62    3104.15    -5.15   0.000    -22069.53   -9899.709
                                             5  |  -20006.91   3409.704    -5.87   0.000    -26690.78   -13323.03
                                             6  |  -24278.85   3561.181    -6.82   0.000    -31259.66   -17298.05
                                             8  |  -15933.78   3026.305    -5.27   0.000     -21866.1   -10001.47
                                             9  |  -13713.71   4298.144    -3.19   0.001    -22139.15    -5288.27
                                            12  |  -18911.23   2979.554    -6.35   0.000    -24751.91   -13070.56
                                            13  |  -19981.52   14545.46    -1.37   0.170    -48494.27    8531.226
                                            14  |  -17180.63   3892.493    -4.41   0.000    -24810.89    -9550.37
                                            15  |  -25784.32   3970.853    -6.49   0.000    -33568.18   -18000.45
                                            16  |  -16750.65   5063.353    -3.31   0.001    -26676.09   -6825.206
                                            17  |  -19335.63   8281.317    -2.33   0.020    -35569.09    -3102.18
                                            18  |  -16162.21   3323.644    -4.86   0.000    -22677.38   -9647.037
                                            19  |  -16157.45   4917.914    -3.29   0.001    -25797.79   -6517.104
                                            20  |  -23051.83   3519.221    -6.55   0.000    -29950.38   -16153.28
                                            21  |  -12692.27   3236.184    -3.92   0.000       -19036   -6348.539
                                            25  |   -10808.1   6086.545    -1.78   0.076    -22739.25    1123.054
                                            26  |  -21748.88   3608.557    -6.03   0.000    -28822.55    -14675.2
                                            30  |  -12915.42   3478.922    -3.71   0.000    -19734.97   -6095.859
                                                |
                                      new_frq12 |    187.705    103.522     1.81   0.070     -15.2241    390.6341
                               new_inkind12_frq |  -254.8363   112.8775    -2.26   0.024    -476.1045    -33.5681
                                          _cons |   30518.05   6701.564     4.55   0.000     17381.31    43654.79
                      --------------------------+----------------------------------------------------------------
                             var(e.new_amount12)|   1.28e+08   1.76e+07                      9.79e+07    1.68e+08
                      -------------------------------------------------------------------------------------------
                      
                      
                      
                      
                      sum new_amount12, detail
                      
                            f3_7_1 - Total amount of remittance in money sent
                                           in past 12 months:1
                      -------------------------------------------------------------
                            Percentiles      Smallest
                       1%          200              0
                       5%          600              0
                      10%         1000              0       Obs               1,891
                      25%         2800              0       Sum of Wgt.       1,891
                      
                      50%         6000                      Mean           10043.85
                                              Largest       Std. Dev.      13972.29
                      75%        12000         150000
                      90%        24000         150000       Variance       1.95e+08
                      95%        30000         180000       Skewness       6.083797
                      99%        60000         240000       Kurtosis       69.18961
                      
                      
                                            pca_generaltrst1
                      -------------------------------------------------------------
                            Percentiles      Smallest
                       1%    -3.277674      -3.277674
                       5%    -2.927758      -3.277674
                      10%    -2.463739      -3.277674       Obs              22,458
                      25%    -1.468497      -3.277674       Sum of Wgt.      22,458
                      
                      50%    -.6174005                      Mean          -.6743963
                                              Largest       Std. Dev.      1.260248
                      75%     .2374917       2.521248
                      90%     .8311156       2.521248       Variance       1.588224
                      95%     1.190528       2.521248       Skewness      -.1316909
                      99%     2.255755       2.521248       Kurtosis       2.577096
                      
                                m8 - Religious affiliation of respondent
                      -------------------------------------------------------------
                            Percentiles      Smallest
                       1%            1              1
                       5%            1              1
                      10%            2              1       Obs              22,659
                      25%            2              1       Sum of Wgt.      22,659
                      
                      50%            2                      Mean           2.305486
                                              Largest       Std. Dev.      1.252439
                      75%            2              7
                      90%            4              7       Variance       1.568604
                      95%            6              7       Skewness       2.408399
                      99%            6              7       Kurtosis       7.722633
                      
                                   b11_3 - Province respondent born in
                      -------------------------------------------------------------
                            Percentiles      Smallest
                       1%            1              1
                       5%            1              1
                      10%            2              1       Obs               9,121
                      25%            3              1       Sum of Wgt.       9,121
                      
                      50%            5                      Mean           5.011622
                                              Largest       Std. Dev.      2.356839
                      75%            7              9
                      90%            9              9       Variance       5.554689
                      95%            9              9       Skewness       .0462005
                      99%            9              9       Kurtosis       2.091913
                      
                                               Best gender
                      -------------------------------------------------------------
                            Percentiles      Smallest
                       1%            1              1
                       5%            1              1
                      10%            1              1       Obs              28,445
                      25%            1              1       Sum of Wgt.      28,445
                      
                      50%            2                      Mean           1.547864
                                              Largest       Std. Dev.      .4977125
                      75%            2              2
                      90%            2              2       Variance       .2477177
                      95%            2              2       Skewness      -.1923405
                      99%            2              2       Kurtosis       1.036995
                      
                                                 edu_lev
                      -------------------------------------------------------------
                            Percentiles      Smallest
                       1%            1              1
                       5%            1              1
                      10%            2              1       Obs              25,044
                      25%            3              1       Sum of Wgt.      25,044
                      
                      50%            3                      Mean           4.281025
                                              Largest       Std. Dev.      4.466597
                      75%            4             25
                      90%            5             25       Variance       19.95049
                      95%            6             25       Skewness       4.098736
                      99%           25             25       Kurtosis       19.27936
                      
                                     Employment status - Adult only
                      -------------------------------------------------------------
                            Percentiles      Smallest
                       1%            0              0
                       5%            0              0
                      10%            0              0       Obs              22,721
                      25%            0              0       Sum of Wgt.      22,721
                      
                      50%            0                      Mean           .4150786
                                              Largest       Std. Dev.      .4927464
                      75%            1              1
                      90%            1              1       Variance        .242799
                      95%            1              1       Skewness       .3446938
                      99%            1              1       Kurtosis       1.118814
                      
                                           Best marital status
                      -------------------------------------------------------------
                            Percentiles      Smallest
                       1%            1              1
                       5%            1              1
                      10%            1              1       Obs              23,957
                      25%            2              1       Sum of Wgt.      23,957
                      
                      50%            5                      Mean           3.689819
                                              Largest       Std. Dev.      1.746261
                      75%            5              5
                      90%            5              5       Variance       3.049426
                      95%            5              5       Skewness      -.6991092
                      99%            5              5       Kurtosis        1.62883
                      
                                              Age Intervals
                      -------------------------------------------------------------
                            Percentiles      Smallest
                       1%            5              5
                       5%            5              5
                      10%            5              5       Obs              28,464
                      25%            6              5       Sum of Wgt.      28,464
                      
                      50%            8                      Mean           8.463673
                                              Largest       Std. Dev.      2.741433
                      75%           11             14
                      90%           13             14       Variance       7.515456
                      95%           14             14       Skewness       .4851433
                      99%           14             14       Kurtosis       2.083575
                      
                                      Number of household residents
                      -------------------------------------------------------------
                            Percentiles      Smallest
                       1%            1              1
                       5%            1              1
                      10%            2              1       Obs              23,900
                      25%            3              1       Sum of Wgt.      23,900
                      
                      50%            5                      Mean           5.261339
                                              Largest       Std. Dev.      3.355386
                      75%            7             30
                      90%           10             30       Variance       11.25862
                      95%           11             30       Skewness       1.538315
                      99%           16             30       Kurtosis       7.322676
                      
                                              Total Assets
                      -------------------------------------------------------------
                            Percentiles      Smallest
                       1%         2500            401
                       5%     7563.433            401
                      10%        14000            401       Obs              22,104
                      25%      39726.6            500       Sum of Wgt.      22,104
                      
                      50%     103280.7                      Mean           586326.6
                                              Largest       Std. Dev.       5004999
                      75%       304500       2.23e+08
                      90%       940700       2.23e+08       Variance       2.51e+13
                      95%      1818718       3.50e+08       Skewness       48.26468
                      99%      7372000       3.50e+08       Kurtosis       2877.197
                      
                                            new_rel_receiver
                      -------------------------------------------------------------
                            Percentiles      Smallest
                       1%            0              0
                       5%            0              0
                      10%            0              0       Obs              28,461
                      25%            0              0       Sum of Wgt.      28,461
                      
                      50%            0                      Mean           .7711957
                                              Largest       Std. Dev.      3.476242
                      75%            0             30
                      90%            0             30       Variance       12.08426
                      95%            4             30       Skewness       5.998841
                      99%           25             30       Kurtosis       43.04409
                      
                                                new_frq12
                      -------------------------------------------------------------
                            Percentiles      Smallest
                       1%            0              0
                       5%            0              0
                      10%            0              0       Obs              28,426
                      25%            0              0       Sum of Wgt.      28,426
                      
                      50%            0                      Mean           .6865897
                                              Largest       Std. Dev.      3.737786
                      75%            0            200
                      90%            0            200       Variance       13.97104
                      95%            7            200       Skewness       35.34731
                      99%           12            300       Kurtosis       2303.839
                      
                                            new_inkind12_frq
                      -------------------------------------------------------------
                            Percentiles      Smallest
                       1%            0              0
                       5%            0              0
                      10%            0              0       Obs              28,430
                      25%            0              0       Sum of Wgt.      28,430
                      
                      50%            0                      Mean           .1471333
                                              Largest       Std. Dev.       1.25058
                      75%            0             15
                      90%            0             20       Variance        1.56395
                      95%            0             48       Skewness       13.21677
                      99%            6             60       Kurtosis       327.3163

                      I hope you have a splendid Sunday evening! Thank you for being so helpful.
