Hello, good morning. I am modelling health care utilization with two count variables: the number of times health care was received in the last 12 months (g5) and the number of emergency room or hospital visits for one's own health (g6).
Below are the frequency distributions of both variables and their summary statistics:
```
g5r -- RECODE of g5 (Number of times received health care in last 12 months)
-----------------------------------------------------------
              |      Freq.    Percent      Valid       Cum.
--------------+--------------------------------------------
Valid   0     |        977      45.74      48.83      48.83
        1     |        284      13.30      14.19      63.02
        2     |        236      11.05      11.79      74.81
        3     |        141       6.60       7.05      81.86
        4     |         79       3.70       3.95      85.81
        5     |         61       2.86       3.05      88.86
        6     |         64       3.00       3.20      92.05
        7     |         12       0.56       0.60      92.65
        8     |         12       0.56       0.60      93.25
        10    |         43       2.01       2.15      95.40
        12    |         19       0.89       0.95      96.35
        15    |         14       0.66       0.70      97.05
        16    |          2       0.09       0.10      97.15
        20    |         24       1.12       1.20      98.35
        24    |          5       0.23       0.25      98.60
        30    |         12       0.56       0.60      99.20
        40    |          2       0.09       0.10      99.30
        45    |          1       0.05       0.05      99.35
        48    |          1       0.05       0.05      99.40
        50    |          8       0.37       0.40      99.80
        60    |          1       0.05       0.05      99.85
        80    |          1       0.05       0.05      99.90
        90    |          1       0.05       0.05      99.95
        96    |          1       0.05       0.05     100.00
        Total |       2001      93.68     100.00
Missing .     |        135       6.32
Total         |       2136     100.00
-----------------------------------------------------------

g6r -- RECODE of g6 (Number of times visited emergency room or hospital for own health)
-----------------------------------------------------------
              |      Freq.    Percent      Valid       Cum.
--------------+--------------------------------------------
Valid   0     |       1050      49.16      90.44      90.44
        1     |         82       3.84       7.06      97.50
        2     |         18       0.84       1.55      99.05
        3     |          5       0.23       0.43      99.48
        4     |          2       0.09       0.17      99.66
        5     |          1       0.05       0.09      99.74
        9     |          1       0.05       0.09      99.83
        12    |          1       0.05       0.09      99.91
        20    |          1       0.05       0.09     100.00
        Total |       1161      54.35     100.00
Missing .     |        975      45.65
Total         |       2136     100.00
-----------------------------------------------------------

. summarize g5r g6r, detail

    RECODE of g5 (Number of times received health care in last 12 months)
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs               2,001
25%            0              0       Sum of Wgt.       2,001

50%            1                      Mean            2.56022
                        Largest       Std. Dev.      6.505882
75%            3             60
90%            6             80       Variance        42.3265
95%           10             90       Skewness       6.804919
99%           30             96       Kurtosis       68.22297

    RECODE of g6 (Number of times visited emergency room or hospital for own health)
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs               1,161
25%            0              0       Sum of Wgt.       1,161

50%            0                      Mean            .161068
                        Largest       Std. Dev.      .8564563
75%            0              5
90%            0              9       Variance       .7335175
95%            1             12       Skewness       14.62739
99%            2             20       Kurtosis       292.1931
```
As you can see, there are a lot of "meaningful" zeros in the data, so taking logs would not be appropriate: log(0) is undefined, so every zero would simply become a missing value.
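A quick way to see how far these data are from a Poisson distribution is to compare the observed share of zeros with the share a Poisson distribution with the same mean would imply, exp(-mean), and to check the variance-to-mean ratio, which equals 1 under a Poisson distribution. The Python sketch below does this using only the means, variances and zero counts reported in the output above. It looks at the marginal distribution and ignores covariates, so treat it as a rough diagnostic, not a formal test; the function names are illustrative.

```python
import math

# Summary statistics copied from the `summarize, detail` output and the
# frequency tables in the post (valid observations only).
STATS = {
    # name: (mean, variance, observed share of zeros)
    "g5r": (2.56022, 42.3265, 977 / 2001),
    "g6r": (0.161068, 0.7335175, 1050 / 1161),
}


def poisson_zero_share(mean):
    """P(Y = 0) under a Poisson distribution with the given mean."""
    return math.exp(-mean)


def diagnostics(mean, variance, observed_zero_share):
    return {
        "observed_zeros": observed_zero_share,
        "poisson_zeros": poisson_zero_share(mean),
        # A Poisson variable has variance equal to its mean, so a ratio
        # well above 1 signals overdispersion.
        "variance_to_mean": variance / mean,
    }


for name, (mean, var, p0) in STATS.items():
    d = diagnostics(mean, var, p0)
    print(f"{name}: observed zeros {d['observed_zeros']:.3f}, "
          f"Poisson-implied zeros {d['poisson_zeros']:.3f}, "
          f"variance/mean {d['variance_to_mean']:.2f}")
```

With these figures, roughly 49% of g5r values are zero against about 8% implied by a Poisson with mean 2.56, and the variance is about 16.5 times the mean; for g6r the gap in zeros is smaller (about 90% versus 85%), but the variance is still about 4.6 times the mean. A Poisson regression with covariates lets each person have their own mean, which by itself produces more zeros than exp(-overall mean), so excess zeros in the raw data do not on their own prove that a zero-inflated model is needed.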
I am thinking of using either a Poisson regression model or a zero-inflated regression model. Is there a test I can use to decide which of these models fits the data better?
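One way to approach this is to fit both models by maximum likelihood and compare them. The Python sketch below does so for intercept-only models, assuming no covariates so that the fits are simple, using the frequency tables from the post. It reports two statistics: the uncorrected Vuong z statistic (positive values favour the zero-inflated Poisson), and a likelihood-ratio (LR) test of pi = 0, where pi is the probability of a structural zero. Because the Poisson model is the boundary case pi = 0 of the zero-inflated Poisson, the two models are not strictly non-nested, and using the Vuong test for this comparison has been criticised on that ground; the LR p-value here halves the chi-squared(1) tail probability to allow for the boundary. All helper names are illustrative. With covariates, the same comparison would be run on models fitted with, for example, Stata's `poisson` and `zip` commands.

```python
import math

# Frequency tables from the post (value -> count, valid observations only).
G5R_FREQ = {0: 977, 1: 284, 2: 236, 3: 141, 4: 79, 5: 61, 6: 64, 7: 12, 8: 12,
            10: 43, 12: 19, 15: 14, 16: 2, 20: 24, 24: 5, 30: 12, 40: 2,
            45: 1, 48: 1, 50: 8, 60: 1, 80: 1, 90: 1, 96: 1}
G6R_FREQ = {0: 1050, 1: 82, 2: 18, 3: 5, 4: 2, 5: 1, 9: 1, 12: 1, 20: 1}


def poisson_logpmf(y, lam):
    return -lam + y * math.log(lam) - math.lgamma(y + 1)


def zip_logpmf(y, pi, lam):
    """Zero-inflated Poisson: an extra zero with probability pi."""
    if y == 0:
        return math.log(pi + (1.0 - pi) * math.exp(-lam))
    return math.log(1.0 - pi) + poisson_logpmf(y, lam)


def fit_poisson(freq):
    """Intercept-only Poisson MLE: lambda is the sample mean."""
    n = sum(freq.values())
    return sum(y * c for y, c in freq.items()) / n


def fit_zip(freq, tol=1e-12):
    """Intercept-only ZIP MLE (no covariates).

    The likelihood then splits into a Bernoulli part for P(Y = 0) and a
    zero-truncated Poisson part for the positive counts, so lambda solves
    lambda / (1 - exp(-lambda)) = mean of the positive counts.
    """
    n = sum(freq.values())
    p0 = freq.get(0, 0) / n
    positives = {y: c for y, c in freq.items() if y > 0}
    mean_pos = sum(y * c for y, c in positives.items()) / sum(positives.values())
    if mean_pos <= 1.0:
        raise ValueError("need mean of positive counts > 1")
    # lam / (1 - exp(-lam)) increases from 1 (at lam -> 0), so bisect.
    lo, hi = 1e-12, mean_pos
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < mean_pos:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    # Match the fitted share of zeros to the observed share p0.
    pi = 1.0 - (1.0 - p0) / -math.expm1(-lam)
    if pi < 0.0:
        raise ValueError("no excess zeros relative to the fitted Poisson part")
    return pi, lam


def compare_zip_poisson(freq):
    lam_p = fit_poisson(freq)
    pi, lam_z = fit_zip(freq)
    n = sum(freq.values())
    # Per-observation log-likelihood differences, weighted by frequency.
    diffs = {y: zip_logpmf(y, pi, lam_z) - poisson_logpmf(y, lam_p) for y in freq}
    total = sum(c * diffs[y] for y, c in freq.items())
    mean_d = total / n
    var_d = sum(c * (diffs[y] - mean_d) ** 2 for y, c in freq.items()) / (n - 1)
    vuong_z = math.sqrt(n) * mean_d / math.sqrt(var_d)
    lr = 2.0 * total
    return {
        "pi": pi, "lam_zip": lam_z, "lam_poisson": lam_p,
        "vuong_z": vuong_z,
        # One-sided: a small p favours the ZIP over the Poisson.
        "vuong_p": 0.5 * math.erfc(vuong_z / math.sqrt(2.0)),
        "lr": lr,
        # pi = 0 is on the boundary: 50:50 mixture of chi2(0) and chi2(1).
        "lr_p": 0.5 * math.erfc(math.sqrt(max(lr, 0.0) / 2.0)),
    }


for name, freq in [("g5r", G5R_FREQ), ("g6r", G6R_FREQ)]:
    r = compare_zip_poisson(freq)
    print(f"{name}: pi={r['pi']:.3f}, Vuong z={r['vuong_z']:.2f} "
          f"(p={r['vuong_p']:.3g}), LR={r['lr']:.1f} (p={r['lr_p']:.3g})")
```

The script prints, for each variable, the estimated share of structural zeros, the Vuong statistic and the LR statistic. Because these intercept-only fits ignore covariates, they only indicate whether the raw distributions carry excess zeros, not whether a zero-inflated regression is needed once predictors are included.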
Thanks - cY