[Help Request] Interpreting Zero-Truncated Negative Binomial Regression

Jess Florian

Join Date: Oct 2015

Posts: 10
#1

17 May 2018, 21:04

Stata Version: Stata/MP 15.1
Operating System: Windows 7 Enterprise (SP1)

Hi Statalist members,

I am trying to build a zero-truncated negative binomial regression model, but am having difficulties interpreting the results. I am not experienced with this type of modelling, and have followed an online tutorial in building my model. However, the result I am getting is not making sense to me. Here is my problem:

My outcome variable is the number of visits by a patient to a clinic in a year (count variable starting from 1; range 1-66, mean 4.2, SD 4.7, variance 22.5, skewness 4.6, kurtosis 37.8).
My covariates are age (continuous from 0 to 100), gender (binary), and size of clinic (categorical; 4 categories).

The syntax I use is

Code:

nbreg noofvisitspy age i.gender_v i.clinicsize , ll(0)

followed by

Code:

margins gender_v, atmeans

to obtain estimates.

The model output is as follows:

and the estimate output is as follows:

0.75 visits per patient in a year for males (or 0.9 for females) doesn't make sense when the lowest number of visits is 1. Why am I getting less than 1? Is there a step that I am missing (changing distribution, etc.), or am I doing something completely wrong?
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#2

17 May 2018, 23:54

Originally posted by Jess Florian

0.75 visits per patient in a year for males (or 0.9 for females) doesn't make sense when the lowest number of visits is 1. Why am I getting less than 1? Is there a step that I am missing (changing distribution, etc.), or am I doing something completely wrong?

Those are the predicted mean numbers of visits with all covariates held at their means. It's in the nature of averages that some people will have more than the average and some will have less; if the model is close enough, the individual counts should average out to the predicted mean.

But wait, you say: this is a truncated negative binomial, so everyone has at least one visit! Well, perhaps an example will help. Two notes first: it is better to put results in code delimiters, as outlined in my signature, and although you report that you ran a truncated negative binomial, you gave us syntax for a regular negative binomial regression.

In any case, let's simulate a population whose visit counts are Poisson distributed with means as given in your margins output (rounded a bit because I'm lazy).

Code:

clear
set obs 100000
set seed 1092
gen gender = rbinomial(1,0.5)
gen visits = .
replace visits = rpoisson(0.9086) if gender == 0
replace visits = rpoisson(0.7490) if gender == 1
label define gender 0 "Female" 1 "Male"
label values gender gender
twoway histogram visits, discrete by(gender) percent
mean visits, over(gender)

So, even with mean visit counts as low as the Poisson parameters I used, some individuals can have a large number of visits. I don't get 66 visits in this example, but it's a much simplified one, and 4 people had 7 visits (the max). You can verify with the mean command that the group means are close to the Poisson parameters.

Say you run a regular Poisson regression:

Code:

quietly poisson visits i.gender
margins gender

Adjusted predictions                            Number of obs     =    100,000
Model VCE    : OIM

Expression   : Predicted number of events, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |
     Female  |    .918369   .0042929   213.93   0.000     .9099552    .9267828
       Male  |   .7477973   .0038609   193.69   0.000     .7402301    .7553645
------------------------------------------------------------------------------

You get what you expect. Now, let's truncate the number of visits at 0, then run a Poisson regression:

Code:

gen trunc_visits = visits
replace trunc_visits = . if visits == 0
quietly poisson trunc_visits i.gender
margins gender

Adjusted predictions                            Number of obs     =     56,555
Model VCE    : OIM

Expression   : Predicted number of events, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |
     Female  |   1.520314   .0071066   213.93   0.000     1.506385    1.534242
       Male  |   1.418191   .0073221   193.69   0.000      1.40384    1.432543
------------------------------------------------------------------------------

Now, the predicted mean visits are way off, but that's because we failed to account for truncation. Remember, truncation means that people can have 0 visits (in this example), but we do not observe them at all (note the number of obs in the Poisson regression on the truncated visit count). However, if you run the -tpoisson- command (whose lower limit defaults to 0 if you don't specify anything):

Code:

quietly tpoisson trunc_visits i.gender
margins gender

Adjusted predictions                            Number of obs     =     56,555
Model VCE    : OIM

Expression   : Predicted number of events, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |
     Female  |   .9057349   .0068196   132.81   0.000     .8923686    .9191011
       Male  |   .7447777   .0067287   110.69   0.000     .7315897    .7579657
------------------------------------------------------------------------------

Now you recover the correct parameters, despite the fact that in this regression you didn't observe the people with 0 visits. This is a simplified version of what you have, but the same concept applies: you have assumed that 0 visits are possible but not in your dataset, that the number of visits has a negative binomial distribution with parameters to be estimated, and that the covariates you entered predict those parameters well enough. Even though you see visit counts ranging from 1 to 66, under those assumptions it makes sense that males could have an average of 0.75 visits and females an average of 0.91.
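To see where the 1.52 and 1.42 from the naive Poisson fit come from, here is a minimal sketch. It assumes the simulated data above are still in memory. The formula used is the standard mean of a zero-truncated Poisson, E(y | y > 0) = lambda / (1 - exp(-lambda)), and predict(cm) is the conditional-mean statistic listed in the -tpoisson- postestimation help; check help tpoisson postestimation for your version.

```stata
* Mean of a Poisson count given that it is at least 1:
* lambda / (1 - exp(-lambda))
display 0.9086 / (1 - exp(-0.9086))   // Female
display 0.7490 / (1 - exp(-0.7490))   // Male

* Same quantity from the fitted truncated model:
* cm = conditional mean, E(y | y > truncation point)
quietly tpoisson trunc_visits i.gender
margins gender, predict(cm)
```

The two display lines give roughly 1.52 and 1.42, which is what the naive Poisson fit on the truncated counts reported (1.520 and 1.418). So the default margins after -tpoisson- describe the whole population, including the unobserved zeros, while predict(cm) describes only those with at least one visit.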

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Jess Florian

Join Date: Oct 2015

Posts: 10
#3

18 May 2018, 00:41

Originally posted by Weiwen Ng

Those are the average number of visits at the means of all the variables. It's in the nature of averages that some people will have more than the average, some will have less, and in the end, if the model is close enough, things should work out such that everyone gets ... the average.

That makes sense, thanks for explaining.

but you gave us syntax for a regular negative binomial regression.

Oops, my fault. I meant to write tnbreg.

you have assumed that 0 visits are possible but not in your dataset

I want to make sure I have understood this. The model accounts for the possibility that there could be zero visits (which is true), but we don't observe that in our dataset (because we don't have the data for it). In that case, it makes sense that the predicted mean could be below 1 visit.

Thanks for going over it in detail, I really appreciate it.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#4

18 May 2018, 06:40

Originally posted by Jess Florian

I want to make sure I have understood this. The model accounts for the possibility that there could be zero visits (which is true), but we don't observe that in our dataset (because we don't have the data for it). In which case, it makes sense that there could be <1 visits.

Thanks for going over it in detail, I really appreciate it.

That's correct: the model accounts for the fact that people could have had 0 visits, but they weren't observed or detected.
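If you also want predictions on the scale of the counts you actually observe, here is a minimal sketch. It assumes the variable names from post #1 and that the intended command was -tnbreg-. By default, margins after -tnbreg- reports the mean of the underlying untruncated distribution, which can fall below 1; predict(cm) asks for the mean conditional on exceeding the truncation point instead.

```stata
* Zero-truncated negative binomial; ll(0) is also the default
tnbreg noofvisitspy age i.gender_v i.clinicsize, ll(0)

* Mean of the underlying (untruncated) distribution: can be below 1
margins gender_v, atmeans

* Mean given at least one visit: cannot fall below 1
margins gender_v, atmeans predict(cm)
```

Which of the two to report depends on the question: the first describes everyone who could have visited, including those with no visits, and the second describes patients who appear in the clinic data.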

Jess Florian

Join Date: Oct 2015

Posts: 10
#5

20 May 2018, 22:27

Originally posted by Weiwen Ng

That's correct, the model accounts for the fact that people could have had 0 visits, but they weren't observed or detected.

Thank you very much. My estimates make much more sense now.