  • [Help Request] Interpreting Zero-Truncated Negative Binomial Regression

    Stata Version: Stata/MP 15.1
    Operating System: Windows 7 Enterprise (SP1)

    Hi Statalist members,

    I am trying to build a zero-truncated negative binomial regression model, but am having difficulties interpreting the results. I am not experienced with this type of modelling, and have followed an online tutorial in building my model. However, the result I am getting is not making sense to me. Here is my problem:

    My outcome variable is the number of visits by a patient to a clinic in a year (count variable starting from 1; range 1-66, mean 4.2, SD 4.7, variance 22.5, skewness 4.6, kurtosis 37.8).
    My covariates are age (continuous from 0 to 100), gender (binary), and size of clinic (categorical; 4 categories).

    The syntax I use is
    Code:
     nbreg noofvisitspy age i.gender_v i.clinicsize , ll(0)
    followed by
    Code:
     margins gender_v, atmeans
    to obtain estimates.

    The model output is as follows:

    [Image: model_output.png]


    and the estimate output is as follows:

    [Image: estimates_output.png]



    0.75 visits per patient in a year for males (or 0.9 for females) doesn't make sense when the lowest number of visits is 1. Why am I getting less than 1? Is there a step that I am missing (changing distribution, etc.), or am I doing something completely wrong?


  • #2
    Originally posted by Jess Florian

    0.75 visits per patient in a year for males (or 0.9 for females) doesn't make sense when the lowest number of visits is 1. Why am I getting less than 1? Is there a step that I am missing (changing distribution, etc.), or am I doing something completely wrong?
    Those are the average number of visits at the means of all the variables. It's in the nature of averages that some people will have more than the average, some will have less, and in the end, if the model is close enough, things should work out such that everyone gets ... the average.

    But wait!!!!!, you say. This is a truncated negative binomial! Everyone has at least one visit. Well, perhaps an example will help. Two side notes first: it is better to put results in code delimiters, as outlined in my signature, and you report that you ran a truncated negative binomial, but you gave us syntax for a regular negative binomial regression.

    In any case, let's simulate a population whose rate of visits is Poisson distributed with means as given in your margins (rounded a bit because I'm lazy).

    Code:
    clear
    set obs 100000
    set seed 1092
    gen gender = rbinomial(1,0.5)
    gen visits  = .
    replace visits = rpoisson(0.9086) if gender == 0
    replace visits = rpoisson(0.7490) if gender == 1
    label define gender 0 "Female" 1 "Male"
    label values gender gender
    twoway histogram visits, discrete by(gender) percent
    mean visits, over(gender)
    So, even with mean visits as low as the Poisson parameters I used, some individuals can have a large number of visits. I don't get 66 visits in this example, but it's a much simplified one, and 4 people had 7 visits (the max). The mean command confirms that the group means match the Poisson parameters.

    Say you run a regular Poisson regression:

    Code:
    quietly poisson visits i.gender
    margins gender
    
    Adjusted predictions                            Number of obs     =    100,000
    Model VCE    : OIM
    
    Expression   : Predicted number of events, predict()
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          gender |
         Female  |    .918369   .0042929   213.93   0.000     .9099552    .9267828
           Male  |   .7477973   .0038609   193.69   0.000     .7402301    .7553645
    ------------------------------------------------------------------------------
    You get what you expect. Now, let's truncate the number of visits at 0, then run a Poisson regression:

    Code:
    gen trunc_visits = visits
    replace trunc_visits = . if visits == 0
    quietly poisson trunc_visits i.gender
    margins gender
    
    Adjusted predictions                            Number of obs     =     56,555
    Model VCE    : OIM
    
    Expression   : Predicted number of events, predict()
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          gender |
         Female  |   1.520314   .0071066   213.93   0.000     1.506385    1.534242
           Male  |   1.418191   .0073221   193.69   0.000      1.40384    1.432543
    ------------------------------------------------------------------------------
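    As a rough check, not part of the original simulation output: if counts are Poisson with mean lambda, the mean among those with at least one visit is lambda / (1 - exp(-lambda)), and the share with at least one visit is 1 - exp(-lambda). The sketch below plugs in the two simulation parameters; the scalar names are made up for illustration.

    Code:
    * Poisson means used in the simulation above
    scalar lambda_f = 0.9086
    scalar lambda_m = 0.7490
    * share of each group with at least one visit
    scalar p_f = 1 - exp(-lambda_f)
    scalar p_m = 1 - exp(-lambda_m)
    * mean visits among those with at least one visit
    scalar cm_f = lambda_f / p_f
    scalar cm_m = lambda_m / p_m
    display "Female: share observed = " %5.3f p_f "  conditional mean = " %5.3f cm_f
    display "Male:   share observed = " %5.3f p_m "  conditional mean = " %5.3f cm_m

    The conditional means come out at about 1.52 and 1.42, in line with the margins above, and the shares (about 0.60 and 0.53) are consistent with roughly 56% of the 100,000 simulated people remaining in the estimation sample.
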
    Now, the predicted mean visits are way off, but that's because we failed to account for truncation. Remember, truncation means that people can have 0 visits (in this example), but we do not observe them at all (note the number of obs in the Poisson regression on the truncated visit count). However, if you run the -tpoisson- command (whose lower limit defaults to 0 if you don't specify anything):

    Code:
    quietly tpoisson trunc_visits i.gender
    margins gender
    
    Adjusted predictions                            Number of obs     =     56,555
    Model VCE    : OIM
    
    Expression   : Predicted number of events, predict()
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          gender |
         Female  |   .9057349   .0068196   132.81   0.000     .8923686    .9191011
           Male  |   .7447777   .0067287   110.69   0.000     .7315897    .7579657
    ------------------------------------------------------------------------------
    Now you get the correct parameters, despite the fact that in this regression you didn't observe the people with 0 visits. This is a simplified version of what you have, but the same concept applies: you have assumed that 0 visits are possible but not in your dataset, that the number of visits has a negative binomial distribution with parameters to be estimated, and that the covariates you entered predict those parameters well enough. Even though you see visit counts ranging from 1 to 66, under those assumptions it makes sense that males could average about 0.75 visits (and females about 0.91).
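
    If the goal is a prediction on the scale of the observed data, that is, mean visits among people with at least one visit, -margins- can be asked for the conditional mean instead of the default. A minimal sketch, assuming the cm statistic listed in the -tpoisson- postestimation documentation; it re-creates a simulation like the one above in a single pass, so the draws (and hence the exact numbers) will differ from the output shown earlier.

    Code:
    clear
    set obs 100000
    set seed 1092
    * 0 = female, 1 = male
    gen gender = rbinomial(1, 0.5)
    gen visits = rpoisson(cond(gender == 1, 0.7490, 0.9086))
    * keep counts only for people with at least one visit
    gen trunc_visits = visits if visits > 0
    quietly tpoisson trunc_visits i.gender
    * default prediction: mean of the untruncated distribution (zeros allowed)
    margins gender
    * cm: mean conditional on being above the truncation point
    margins gender, predict(cm)

    With these simulation parameters, the conditional means should land near 1.52 and 1.42, while the default margins stay near 0.91 and 0.75. For the model in post #1, the analogous call after -tnbreg- would presumably be margins gender_v, atmeans predict(cm), which should return values above 1.
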
    Please use the code delimiters to show code and results - use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

    Please use the command -dataex- to show a representative sample of data; it is installed already if you have Stata 14.2 or 15.1, else you can install it by typing

    Code:
    ssc install dataex



    • #3
      Originally posted by Weiwen Ng

      Those are the average number of visits at the means of all the variables. It's in the nature of averages that some people will have more than the average, some will have less, and in the end, if the model is close enough, things should work out such that everyone gets ... the average.
      That makes sense, thanks for explaining.

      but you gave us syntax for a regular negative binomial regression.
      Oops, my fault. I meant to write tnbreg.

      you have assumed that 0 visits are possible but not in your dataset
      I want to make sure I have understood this. The model accounts for the possibility that there could be zero visits (which is true), but we don't observe that in our dataset (because we don't have the data for it). In which case, it makes sense that there could be <1 visits.


      Thanks for going over it in detail, I really appreciate it.




      • #4
        Originally posted by Jess Florian

        I want to make sure I have understood this. The model accounts for the possibility that there could be zero visits (which is true), but we don't observe that in our dataset (because we don't have the data for it). In which case, it makes sense that there could be <1 visits.

        Thanks for going over it in detail, I really appreciate it.
        That's correct, the model accounts for the fact that people could have had 0 visits, but they weren't observed or detected.


        • #5
          Originally posted by Weiwen Ng

          That's correct, the model accounts for the fact that people could have had 0 visits, but they weren't observed or detected.
          Thank you very much. My estimates make much more sense now.
