Hello - I am working on an analysis of over-dispersed count data using zero-inflated negative binomial (ZINB) regression, and I am having difficulty figuring out which goodness-of-fit test(s) to use and how to select the predictors for the model. Here is an overview of my data:
- Dependent variable: "phys_hlth", the number of days in the last month on which the respondent's self-reported physical health was not good (0-30)
- Main predictor: "DOV_LGBT", LGBT identity (0=not LGBT-identified, 1=LGBT-identified)
- Other possible predictors/controls: age (continuous; "PPAGE" in the commands below) and recent experience of discrimination (binary; "discrim" in the commands below), as well as others such as race (5 categories), gender (binary), insurance status (binary), etc.
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
   phys_hlth |      1,727     4.10249     8.11829          0         30
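To make the over-dispersion concrete, here is a small sketch that squares the standard deviation reported above and divides by the mean. The numbers are typed in by hand from the summary table rather than taken from stored results; the ratio works out to roughly 16, where a Poisson variable would give a ratio near 1.

```stata
* Over-dispersion check using the summarize output above
* (values copied by hand from the table; a sketch only)
display "variance            = " 8.11829^2
display "variance/mean ratio = " 8.11829^2 / 4.10249
```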
Because the variance is so large relative to the mean, I moved from a ZIP model to ZINB. Putting some of the predictors mentioned above into the ZINB model gives a significant overall LR chi2, an alpha that is significantly different from zero, and a Vuong test that favors ZINB over the standard negative binomial, as well as a significant coefficient for my main predictor (DOV_LGBT):
. zinb phys_hlth DOV_LGBT discrim PPAGE, inflate (PPAGE DOV_LGBT) vuong zip
Zero-inflated negative binomial regression      Number of obs     =      1,554
                                                Nonzero obs       =        634
                                                Zero obs          =        920
Inflation model = logit                         LR chi2(3)        =      12.45
Log likelihood = -3113.399                      Prob > chi2       =     0.0060
------------------------------------------------------------------------------
   phys_hlth |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
phys_hlth    |
    DOV_LGBT |   .2159001    .096089     2.25   0.025     .0275691     .404231
     discrim |   .2479829   .1694438     1.46   0.143    -.0841209    .5800867
       PPAGE |   .0087431   .0030843     2.83   0.005     .0026981    .0147882
       _cons |   1.592725   .1875341     8.49   0.000     1.225165    1.960286
-------------+----------------------------------------------------------------
inflate      |
       PPAGE |   .0126378   .0039695     3.18   0.001     .0048578    .0204179
    DOV_LGBT |  -.4053969   .1249087    -3.25   0.001    -.6502134   -.1605805
       _cons |  -.4047859   .2509455    -1.61   0.107    -.8966301    .0870582
-------------+----------------------------------------------------------------
    /lnalpha |   .2658189    .104051     2.55   0.011     .0618826    .4697553
-------------+----------------------------------------------------------------
       alpha |   1.304499   .1357345                      1.063837    1.599603
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0: chibar2(01) = 3673.90 Pr>=chibar2 = 0.0000
Vuong test of zinb vs. standard negative binomial: z = 6.47 Pr>z = 0.0000
What I am struggling with is the following:
1) Are there other goodness-of-fit tests that I should be running to confirm that the ZINB model fits well? I used the margins command to estimate the predicted means after fitting the ZINB model with robust standard errors, and these estimates are close to the actual means (even without the analytic weight), but I want to make sure I'm not stumbling blindly into the ZINB model simply because I can't think of any other approaches:
. margins DOV_LGBT
Predictive margins                              Number of obs     =      1,554
Model VCE    : Robust
Expression   : Predicted number of events, predict()
------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    DOV_LGBT |
           0 |   3.432679   .2549347    13.46   0.000     2.933016    3.932342
        LGBT |   5.237526     .35971    14.56   0.000     4.532507    5.942545
------------------------------------------------------------------------------
Actual means:
. mean phys_hlth [aw=weight_1], over(DOV_LGBT)
Mean estimation                   Number of obs   =      1,727
    _subpop_1: DOV_LGBT = 0
         LGBT: DOV_LGBT = LGBT
--------------------------------------------------------------
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
phys_hlth    |
   _subpop_1 |   3.555177   .2547231      3.055579    4.054776
        LGBT |   4.984303   .3125431        4.3713    5.597306
--------------------------------------------------------------
2) Is there a method for selecting which variables to include in the inflation equation of the ZINB model?
3) Relatedly, what is the best method to use with ZINB regression to select which variables to include in the model at all? E.g., I have taken gender out because it was consistently nonsignificant whether or not I included it in the inflation equation, but is testing each of my ~15 possible independent variables one by one like that my only option? Can I use something like gvselect or forward/backward selection with a ZINB model, and if so, how?
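For question 1, the sort of extra check I have in mind is sketched below: comparing the observed share of zeros with the model's average predicted probability of a zero. I have not run this. It assumes the zinb model above is still the most recent estimation command and that predict after zinb supports the pr(n) option in my Stata version. The variable names p0 and obs0 are placeholders I made up.

```stata
* Assumes the zinb model shown above was the last estimation command

* Predicted probability of a zero count for each observation in the sample
predict double p0 if e(sample), pr(0)

* Indicator for an observed zero (920 of the 1,554 observations above)
generate byte obs0 = (phys_hlth == 0) if e(sample)

* Compare the observed share of zeros with the mean predicted probability
summarize obs0 p0
```

If the two means are far apart, I would take that as a sign the inflation equation is not capturing the zeros well, but I am not sure whether this is a standard diagnostic.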
Thank you!