Choice of regression model, excess 0 and bounded data

Janet Hill

Join Date: Apr 2015

Posts: 38
#1

22 May 2022, 09:47

My data are the confidence ratings of 3 groups of clinicians in assessing a medical emergency; responses were measured on a 100 mm visual analogue scale (VAS). Thus the data are bounded on 0 to 100, there is an excess of 0s (41%), and the other scores have low frequencies.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long group byte(cmoe _freq)
1  0  5
1  6  3
1  9  1
1 11  1
1 12  1
1 14  1
1 18  1
1 24  1
1 26  2
1 27  1
1 34  1
1 73  1
2  0 12
2  5  3
2 10  1
2 17  1
2 18  1
2 20  2
2 25  1
2 30  1
2 31  1
2 32  1
2 38  1
2 40  1
3  0 12
3  8  1
3  9  1
3 14  1
3 18  1
3 19  1
3 22  1
3 23  2
3 24  1
3 29  1
3 32  1
3 34  1
3 44  1
3 62  1
end
label values group group
label def group 1 "A", modify
label def group 2 "B", modify
label def group 3 "C", modify

I am uncertain about the correct approach for analysing these data. I considered zero-inflated Poisson regression but am concerned that the data may not be considered count data.

Would the new zero-inflated ordered logit regression, ziologit, be appropriate, or are there other methods that could be recommended?

Thank you.
Janet
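
The share of zeros described above can be checked directly from the frequency-weighted data. This is only a minimal sketch: it assumes the dataex example has been run so that group, cmoe and _freq exist, and zero is a hypothetical helper variable introduced purely for illustration.

```stata
* Assumes the -dataex- example above is in memory (group, cmoe, _freq).
* zero is a hypothetical indicator variable, not part of the original data.
generate byte zero = (cmoe == 0)

* Row percentages give the share of zero scores within each group;
* the Total row gives the overall share.
tabulate group zero [fw=_freq], row
```

The frequency weights make each row count as _freq observations, so there is no need to expand the data first.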

Leonardo Guizzetti

Join Date: Jul 2016
Posts: 2400

22 May 2022, 10:18

Thank you for providing a reproducible data example and description of your data. I would go with simple linear regression (with optional use of robust sandwich variance estimation).

Code:

. reg cmoe i.group [fw=_freq]

      Source |       SS           df       MS      Number of obs   =        71
-------------+----------------------------------   F(2, 68)        =      0.36
       Model |  185.250128         2  92.6250642   Prob > F        =    0.6967
    Residual |  17333.2287        68  254.900423   R-squared       =    0.0106
-------------+----------------------------------   Adj R-squared   =   -0.0185
       Total |  17518.4789        70  250.263984   Root MSE        =    15.966

------------------------------------------------------------------------------
        cmoe | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       group |
          B  |  -3.983806    4.81868    -0.83   0.411    -13.59933     5.63172
          C  |  -1.483806    4.81868    -0.31   0.759    -11.09933     8.13172
             |
       _cons |   15.36842    3.66276     4.20   0.000     8.059497    22.67735
------------------------------------------------------------------------------

The above approach is reasonable because most people would regard a VAS as providing a reasonably continuous scale. Yes, it is bounded, but that won't cause problems here, because you have no other covariates that could push the predicted mean VAS score outside that range. Secondly, you are likely interested in mean differences between groups, and this model addresses that effect measure directly.

Ordinal models generally won't work because they tend not to perform well (or can't be fit) when the ordinal outcome has more than about 10 levels. Poisson models in the traditional sense do not apply because these are not count data; it is easy to relax that assumption, but with group as the only predictor you will end up with the same fitted group means as the regression model above.
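
For the optional robust sandwich variance mentioned above, a sketch along these lines may help. It assumes the dataex example from #1 is in memory; the choice of margins and testparm as follow-up commands is my own suggestion for illustration, not something stated in the post.

```stata
* Assumes the -dataex- example from #1 is in memory (group, cmoe, _freq).
* Same linear model as above, with robust (sandwich) standard errors.
regress cmoe i.group [fw=_freq], vce(robust)

* Predicted mean score for each group on the original 0-100 scale.
margins group

* Joint test that the group means do not differ.
testparm i.group
```

The coefficients are unchanged by vce(robust); only the standard errors, and therefore the tests and confidence intervals, differ from the default output.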


Janet Hill

Join Date: Apr 2015

Posts: 38
#3

22 May 2022, 11:10

Thank you for your advice, especially concerning the ordinal models. I will follow the regression route; for some reason I thought that the excess number of 0s would cause problems with the regression.

Janet
Joao Santos Silva

Join Date: Apr 2014

Posts: 3008
#4

22 May 2022, 11:29

Dear Janet Hill,

The mass point at zero could cause problems if you had some regressors with a large support, but just with dummies that is not an issue as Leonardo noted. In case you had other kinds of regressors, I would suggest Poisson regression because you do not have any observations near the upper bound. Finally, it is not correct to talk about excess zeros in this context; excess with respect to what?

Best wishes,

Joao
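
A Poisson regression of the kind suggested here might be sketched as follows. This assumes the dataex example from #1 is in memory and uses robust standard errors because the outcome is not a true count; the irr option and the margins call are illustrative choices rather than part of the advice above.

```stata
* Assumes the -dataex- example from #1 is in memory (group, cmoe, _freq).
* Poisson regression as a model for the conditional mean;
* vce(robust) is used because cmoe is not a genuine count.
poisson cmoe i.group [fw=_freq], vce(robust) irr

* Predicted mean score per group, on the original scale (exp(xb)).
margins group
```

With irr, the group coefficients are reported as ratios of mean scores relative to group A rather than as differences, which is the main practical change in interpretation compared with linear regression.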
Janet Hill

Join Date: Apr 2015

Posts: 38
#5

22 May 2022, 11:55

Thank you Joao. 'Excess' was sloppy terminology; I just meant that a large share of the results, ~40%, were 0, which was not unexpected given the level of training of the clinicians. If the project organisers decide to look at additional regressors then I will have to consider Poisson.

Janet
Nick Cox

Join Date: Mar 2014

Posts: 35671
#6

22 May 2022, 15:32

Here's a plot of the sample data: a quantile-box plot with an extra line for the means.

Just about any not too wild model does not find much difference here, including Poisson with robust standard errors.

Last edited by Nick Cox; 22 May 2022, 15:57.
Janet Hill

Join Date: Apr 2015

Posts: 38
#7

23 May 2022, 04:53

Thank you for the plot, I was wondering what the 'best' way to show the data and distribution was.

Janet

Nick Cox

Join Date: Mar 2014
Posts: 35671

23 May 2022, 09:34

Here's code for the graph in #6 and one similar. The box plots make sense once you see that in each group more than 25% of values are zero, so zero is reported as both the minimum and the lower quartile.

stripplot is from SSC.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long group byte(cmoe _freq)
1  0  5
1  6  3
1  9  1
1 11  1
1 12  1
1 14  1
1 18  1
1 24  1
1 26  2
1 27  1
1 34  1
1 73  1
2  0 12
2  5  3
2 10  1
2 17  1
2 18  1
2 20  2
2 25  1
2 30  1
2 31  1
2 32  1
2 38  1
2 40  1
3  0 12
3  8  1
3  9  1
3 14  1
3 18  1
3 19  1
3 22  1
3 23  2
3 24  1
3 29  1
3 32  1
3 34  1
3 44  1
3 62  1
end
label values group group
label def group 1 "A", modify
label def group 2 "B", modify
label def group 3 "C", modify

expand _freq 

set scheme s1color  

stripplot cmoe, over(group) vertical centre cumul cumprob box refline yla(, ang(h))

stripplot cmoe, over(group) vertical cumul cumprob box(barw(0.05)) pctile(0) boffset(-0.1) refline yla(, ang(h))
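
The point about the minimum and lower quartile both being zero can be checked numerically. This is a small sketch that assumes the code above has been run, including expand _freq, so that each row is one observation.

```stata
* Assumes the data above are in memory and -expand _freq- has been run.
* Minimum, quartiles, mean and maximum of cmoe within each group.
tabstat cmoe, by(group) statistics(n min p25 p50 mean max)
```

If the data have not been expanded, adding [fw=_freq] to the tabstat command should give the same summary.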


Adelio Antunes

Join Date: Jun 2022

Posts: 4
#9

23 Jan 2023, 05:44

Dear all,
Thanks for this interesting exchange. I am facing a similar issue and wondered if I could ask for your advice on the matter: https://www.statalist.org/forums/for...es#post1694672