Hello, I am trying to model four variables: Cshame_tot, Cshame_totchrt, Cshame_totbhv, and Cshame_totbody. Each was constructed by summing a battery of survey questions. Cshame_tot represents total experiences of shame (the sum of the three subscale variables), Cshame_totchrt represents characterological shame, Cshame_totbhv represents behavioral shame, and Cshame_totbody represents bodily shame.
The summary statistics for all four variables are shown below. The variables are right-skewed, and as shown beneath the summary table, I also created four dummy variables where 0 denotes participants who reported zero across the entire battery of questions and 1 denotes any non-zero response.
I am deciding between using a zero-inflated negative binomial model or a two-part model, where the first equation is a logit and the second is a gamma model. However, I am unsure which model is more appropriate. I attempted to use the Park test, but the health econometrics textbook only describes how to choose among Gaussian, Poisson, Gamma, or inverse Gaussian models. Since the negative binomial model is a dispersion-adjusted Poisson, can I use the same cutoff used for Poisson to justify a negative binomial specification?
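For reference, this is my understanding of the Park test regression and the cutoffs the textbook gives; the notation ($\hat{\mu}_i$ for the fitted mean, $\lambda_1$ for the slope) is my own shorthand, so please correct me if I have misread it.

```latex
% Park test: regress the log squared raw residual on the log fitted mean
\ln\!\big((y_i - \hat{\mu}_i)^2\big) = \lambda_0 + \lambda_1 \ln(\hat{\mu}_i) + v_i ,
\qquad \ln(\hat{\mu}_i) = x_i\hat{\beta} \ \text{(log link)}

% The slope indicates the variance function Var(y | x) \propto \mu^{\lambda_1}
\lambda_1 \approx 0 \ \text{(Gaussian)}, \quad
\lambda_1 \approx 1 \ \text{(Poisson)}, \quad
\lambda_1 \approx 2 \ \text{(gamma)}, \quad
\lambda_1 \approx 3 \ \text{(inverse Gaussian)}

% Negative binomial (NB2) variance, for comparison
\operatorname{Var}(y_i \mid x_i) = \mu_i + \alpha \mu_i^{2}
```

The negative binomial variance is not a single power of the mean, which is part of why I am unsure which cutoff, if any, applies to it.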
Additionally, because my data come from a survey, they include sampling weights. I conducted the Park test as shown below. Does this appropriately account for the survey design and weights? Any guidance would be greatly appreciated.
Code:
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
  Cshame_tot |      1,218    11.86289    13.40673          0         75
Cshame_tot~t |      1,218     4.83908    6.627825          0         36
Cshame_tot~v |      1,218    4.820197    5.300317          0         27
Cshame_tot~y |      1,218    2.203612    2.910258          0         12

-> tabulation of Cshame_tot_zero

Cshame_tot_ |
       zero |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        194       15.93       15.93
          1 |      1,024       84.07      100.00
------------+-----------------------------------
      Total |      1,218      100.00

-> tabulation of Cshame_totchrt_zero

Cshame_totc |
   hrt_zero |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        411       33.74       33.74
          1 |        807       66.26      100.00
------------+-----------------------------------
      Total |      1,218      100.00

-> tabulation of Cshame_totbhv_zero

Cshame_totb |
    hv_zero |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        287       23.56       23.56
          1 |        931       76.44      100.00
------------+-----------------------------------
      Total |      1,218      100.00

-> tabulation of Cshame_totbody_zero

Cshame_totb |
   ody_zero |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        526       43.19       43.19
          1 |        692       56.81      100.00
------------+-----------------------------------
      Total |      1,218      100.00
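For concreteness, here is a rough sketch of the two candidate specifications for Cshame_tot. I have not settled on either. It assumes the design can be declared with the sampling weights alone (no strata or PSU variables), and that the same covariates enter both parts of each model; both are assumptions on my part. The covariate list is the one I use in the Park test code further down.

```stata
* Sketch only: declares the weights as the entire survey design (assumption)
svyset [pweight=weights]

* Same covariate list as in the Park test code
local contrls i.gad_2 i.phq_2 i.lifetimeptsdcriteria_positive ///
    i.pastyeardud c.SF8MCS c.SF8PCS c.ppage i.race i.biosex ///
    i.education i.income i.maritalstatus i.employment ///
    i.anylifetimemhsa_treatment i.any_arrested

* Candidate 1: zero-inflated negative binomial,
* with the same covariates in the count and inflation equations
svy: zinb Cshame_tot `contrls', inflate(`contrls')

* Candidate 2, part 1: logit for any non-zero response
* (Cshame_tot_zero is 1 when the battery total is non-zero)
svy: logit Cshame_tot_zero `contrls'

* Candidate 2, part 2: gamma GLM with log link among positive totals,
* using subpop() rather than if so the full design is kept for variance estimation
svy, subpop(if Cshame_tot > 0): glm Cshame_tot `contrls', family(gamma) link(log)
```

The same pattern would apply to the three subscale variables by swapping in the corresponding outcome and dummy.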
Code:
**** Park test
* Covariate list shared across models
local contrls i.gad_2 i.phq_2 i.lifetimeptsdcriteria_positive ///
    i.pastyeardud c.SF8MCS c.SF8PCS c.ppage i.race i.biosex ///
    i.education i.income i.maritalstatus i.employment ///
    i.anylifetimemhsa_treatment i.any_arrested
* Weighted gamma GLM with log link, fitted to the positive totals only
glm Cshame_tot `contrls' [pw=weights] if Cshame_tot > 0, family(gamma) link(log)
**** Generate ln(raw residuals squared) and xbetahat for Park test
predict double rawresid, response
generate lnrawresid2 = ln(rawresid^2)
predict double xbetahat, xb
* Slope on xbetahat is the Park test coefficient
regress lnrawresid2 xbetahat
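One variant I am considering, in case the unweighted second-stage regression is the problem, is sketched below. It applies the same weights in the auxiliary regression and restricts the predictions to the observations used in the GLM, which the code above does not do explicitly. The variable names park_sample, mu_hat, xb_hat, and ln_resid2 are new names I made up to avoid clashing with the variables created above, and I am not certain this is the correct way to carry the weights through.

```stata
* Same covariate list as above
local contrls i.gad_2 i.phq_2 i.lifetimeptsdcriteria_positive ///
    i.pastyeardud c.SF8MCS c.SF8PCS c.ppage i.race i.biosex ///
    i.education i.income i.maritalstatus i.employment ///
    i.anylifetimemhsa_treatment i.any_arrested

* First stage: weighted gamma GLM on the positive totals
glm Cshame_tot `contrls' [pw=weights] if Cshame_tot > 0, family(gamma) link(log)

* Flag the estimation sample before another command overwrites e(sample)
generate byte park_sample = e(sample)

* Fitted mean and linear predictor, estimation sample only
predict double mu_hat if park_sample, mu
predict double xb_hat if park_sample, xb

* Log of the squared raw residual
generate double ln_resid2 = ln((Cshame_tot - mu_hat)^2) if park_sample

* Second stage: weighted auxiliary regression (pweights imply robust SEs)
regress ln_resid2 xb_hat [pw=weights] if park_sample

* Compare the slope with the Poisson (1) and gamma (2) cutoffs
test xb_hat = 1
test xb_hat = 2
```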
