  • Choice of regression model, excess 0 and bounded data

    My data are the confidence ratings of 3 groups of clinicians in assessing a medical emergency; responses were measured on a 100 mm visual analogue scale (VAS). Thus the data are bounded on 0 to 100, there is an excess of 0s (41%), and the other scores have low frequencies.
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long group byte(cmoe _freq)
    1  0  5
    1  6  3
    1  9  1
    1 11  1
    1 12  1
    1 14  1
    1 18  1
    1 24  1
    1 26  2
    1 27  1
    1 34  1
    1 73  1
    2  0 12
    2  5  3
    2 10  1
    2 17  1
    2 18  1
    2 20  2
    2 25  1
    2 30  1
    2 31  1
    2 32  1
    2 38  1
    2 40  1
    3  0 12
    3  8  1
    3  9  1
    3 14  1
    3 18  1
    3 19  1
    3 22  1
    3 23  2
    3 24  1
    3 29  1
    3 32  1
    3 34  1
    3 44  1
    3 62  1
    end
    label values group group
    label def group 1 "A", modify
    label def group 2 "B", modify
    label def group 3 "C", modify
    I am uncertain about the correct approach for analysing these data. I considered zero-inflated Poisson but am concerned that the data may not be considered count data.

    Would the new zero-inflated ordered logit regression, ziologit, be appropriate, or are there any other methods that could be recommended?


    Thank you.
    Janet

  • #2
    Thank you for providing a reproducible data example and description of your data. I would go with simple linear regression (with optional use of robust sandwich variance estimation).

    Code:
    . reg cmoe i.group [fw=_freq]
    
          Source |       SS           df       MS      Number of obs   =        71
    -------------+----------------------------------   F(2, 68)        =      0.36
           Model |  185.250128         2  92.6250642   Prob > F        =    0.6967
        Residual |  17333.2287        68  254.900423   R-squared       =    0.0106
    -------------+----------------------------------   Adj R-squared   =   -0.0185
           Total |  17518.4789        70  250.263984   Root MSE        =    15.966
    
    ------------------------------------------------------------------------------
            cmoe | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           group |
              B  |  -3.983806    4.81868    -0.83   0.411    -13.59933     5.63172
              C  |  -1.483806    4.81868    -0.31   0.759    -11.09933     8.13172
                 |
           _cons |   15.36842    3.66276     4.20   0.000     8.059497    22.67735
    ------------------------------------------------------------------------------
    The above approach is reasonable because most people would regard a VAS as providing a reasonably continuous scale. Yes, it is bounded, but you won't run into boundary issues here, as you don't have any other covariates that could push the predicted mean VAS score outside that range. Secondly, you are likely interested in mean differences between groups, and this model directly estimates that effect measure.
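
    As a quick check on that interpretation, the group means can be tabulated directly. This is only a sketch and assumes the frequency-weighted data from #1 are in memory; the constant in the regression should match the mean for group A, and the B and C coefficients should match the differences of their means from A.

    ```stata
    * Group sizes and mean VAS scores, weighting each row by its frequency
    tabstat cmoe [fw=_freq], by(group) statistics(n mean)
    ```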

    Ordinal models generally won't work because those models tend not to perform well (or can't be fit) with ordinal outcomes that have more than about 10 levels. Poisson models in the traditional sense fail as these are not count data, and even though it is easy to relax this assumption, you will end up with essentially the same conclusions as the regression model above, since both reproduce the observed group means.
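
    A minimal sketch of that comparison, again assuming the data from #1 are in memory: the linear model with robust standard errors next to a Poisson model with robust standard errors. With group as the only predictor, margins should return the same three group means after either model, although the Poisson coefficients themselves are on the log scale.

    ```stata
    * Linear regression with robust (sandwich) standard errors
    regress cmoe i.group [fw=_freq], vce(robust)
    margins group

    * Poisson regression with robust standard errors; coefficients are log ratios of means
    poisson cmoe i.group [fw=_freq], vce(robust)
    margins group
    ```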



    • #3
      Thank you for your advice, especially concerning the ordinal models. I will follow the regression route; for some reason I thought that the excess number of 0s would cause problems with the regression.

      Janet



      • #4
        Dear Janet Hill,

        The mass point at zero could cause problems if you had some regressors with a large support, but just with dummies that is not an issue as Leonardo noted. In case you had other kinds of regressors, I would suggest Poisson regression because you do not have any observations near the upper bound. Finally, it is not correct to talk about excess zeros in this context; excess with respect to what?

        Best wishes,

        Joao



        • #5
          Thank you Joao. 'Excess' was sloppy terminology; I just meant that a large share of the results, ~40%, were 0, which was not unexpected given the level of training of the clinicians. If the project organisers decide to look at additional regressors then I will have to consider Poisson.

          Janet



          • #6
            Here's a plot of the sample data: a quantile-box plot with an extra line for the means.

            Just about any not too wild model does not find much difference here, including Poisson with robust standard errors.

            [Image: cmoe.png]

            Last edited by Nick Cox; 22 May 2022, 15:57.



            • #7
              Thank you for the plot, I was wondering what the 'best' way to show the data and distribution was.

              Janet



              • #8
                Here's code for the graph in #6 and one similar. The box plots make sense once you see that in each case more than 25% of values are zero, so zero is reported for both the minimum and the lower quartile.

                stripplot is from SSC.


                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input long group byte(cmoe _freq)
                1  0  5
                1  6  3
                1  9  1
                1 11  1
                1 12  1
                1 14  1
                1 18  1
                1 24  1
                1 26  2
                1 27  1
                1 34  1
                1 73  1
                2  0 12
                2  5  3
                2 10  1
                2 17  1
                2 18  1
                2 20  2
                2 25  1
                2 30  1
                2 31  1
                2 32  1
                2 38  1
                2 40  1
                3  0 12
                3  8  1
                3  9  1
                3 14  1
                3 18  1
                3 19  1
                3 22  1
                3 23  2
                3 24  1
                3 29  1
                3 32  1
                3 34  1
                3 44  1
                3 62  1
                end
                label values group group
                label def group 1 "A", modify
                label def group 2 "B", modify
                label def group 3 "C", modify
                
                expand _freq 
                
                set scheme s1color  
                stripplot cmoe, over(group) vertical centre cumul cumprob box refline yla(, ang(h))
                stripplot cmoe, over(group) vertical cumul cumprob box(barw(0.05)) pctile(0) boffset(-0.1) refline yla(, ang(h))

                [Image: cmoe2.png]
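
                To see the quartiles behind the boxes, something like the following could be run after the expand step above (so no weights are needed). It is only a sketch, not part of the original graph code.

                ```stata
                * Five-number summary plus mean by group, on the expanded data
                tabstat cmoe, by(group) statistics(n min p25 p50 p75 max mean)
                ```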



                • #9
                  Dear all,
                  Thanks for this interesting exchange. I am facing a similar issue and wondered if I could ask for your advice on the matter: https://www.statalist.org/forums/for...es#post1694672
