Analyzing "counts" containing decimals

Amy Mericle

Join Date: Feb 2017

Posts: 2
#1

22 Feb 2017, 14:46

Dear Stata Users, I am using Stata v14.1 to analyze a count variable, but some of the values for the counts are estimates and contain decimals. Stata allows me to run the model but produces the following note: "you are responsible for interpretation of non-count dep. variable". How is Stata treating the values with decimals, say the value 4.23333? Does it drop the decimal, round to the nearest whole number, or treat the value 4.23333 as a count in between 4 and 5? Where is the best place to look for something that would guide my interpretation of the model coefficients produced when the dependent variable contains non-integer values?
Rich Goldstein

Join Date: Mar 2014

Posts: 4494
#2

22 Feb 2017, 14:57

you don't say what command you are using; note, however, that -poisson- for example is not limited to count variables; for an example, see #3 of http://www.statalist.org/forums/foru...xplainable-low
Nick Cox

Join Date: Mar 2014

Posts: 35792
#3

22 Feb 2017, 15:19

Rich is quite right. In general, Stata won't force anything to integer unless so instructed.
Amy Mericle

Join Date: Feb 2017

Posts: 2
#4

22 Feb 2017, 15:35

I just noticed that I left an important bit of information out of the above post: I am using Stata v14.1 to analyze a count variable with negative binomial regression. The dependent variable is a count, but some of these counts are estimates and contain decimals. Hopefully that makes it easier to answer the question about what Stata does in the nbreg routines.
Nick Cox

Join Date: Mar 2014

Posts: 35792
#5

22 Feb 2017, 16:23

It is easy to try experiments like this:

1. Do negative binomial regression with a count response.

2. Generate a new response by adding 0.1 to that.

3. See that the results change. If Stata rounded the response, that would not happen.
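
A minimal sketch of the three steps above, using simulated data. The variable names (x, nu, y, y_shift), the seed, and the parameter values are assumptions made for illustration and are not from the thread.

```stata
* Sketch only: simulated data; all names and parameter values are hypothetical
clear
set seed 12345
set obs 500
generate x  = rnormal()
* gamma heterogeneity with mean 1, so the counts are overdispersed
generate nu = rgamma(2, 0.5)
generate y  = rpoisson(nu*exp(0.5 + 0.3*x))
* same response shifted by 0.1, so it is no longer an integer
generate y_shift = y + 0.1

* step 1: negative binomial regression with the count response
nbreg y x
scalar b_count = _b[x]

* steps 2-3: refit with the shifted response and compare
nbreg y_shift x
display "change in coefficient on x: " (_b[x] - b_count)
```

If the response were rounded before estimation, the displayed change would be zero, because y_shift rounds back to y.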
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2205
#6

22 Feb 2017, 22:04

Both -poisson- and -nbreg- leave the values unchanged, which is why Stata gives you the warning. Of course, an exponential functional form for E(y|x) still makes sense for non-count variables provided y >= 0, so that's not an issue. But I would opt for Poisson regression with robust standard errors because we know the Poisson estimator is fully robust to distributional misspecification. The negative binomial estimator is not, and since a non-integer y cannot have a negative binomial distribution, it is technically inconsistent. Having said that, Poisson and NegBin often give similar estimates of the mean parameters, as they should if the NegBin model is correct.
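
A rough sketch of that comparison on simulated data. The variable names (x, nu, y, y_est) and the way the non-integer "estimated counts" are generated are assumptions for illustration only, not anything taken from the original poster's data.

```stata
* Sketch only: simulated data; all names and parameter values are hypothetical
clear
set seed 2017
set obs 500
generate x  = rnormal()
generate nu = rgamma(2, 0.5)
generate y  = rpoisson(nu*exp(0.5 + 0.3*x))
* perturb the counts so they become nonnegative, non-integer "estimates"
generate y_est = y*(0.8 + 0.4*runiform())

* Poisson regression with robust (sandwich) standard errors
poisson y_est x, vce(robust)

* negative binomial regression on the same response, for comparison
nbreg y_est x
```

Both commands should print the note about a non-count dependent variable; comparing the two coefficients on x gives a sense of how close the mean-parameter estimates are in a given sample.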
Evan Sommer

Join Date: Jun 2015

Posts: 18
#7

11 Mar 2019, 12:35

I found the above to be very helpful! Does anyone happen to have a reference about using Poisson regression with non-integer "count-like" outcomes? The excerpt below from the Statalist archives is the best I could find, but I would like to be able to cite something formally published as well: https://www.stata.com/statalist/arch.../msg00213.html

I'm concerned that someone might interpret what David wrote to mean:

1. There may be practical problems using -poisson- to run
log-linear regressions, depending on whether the LHS variable
contains noninteger values.

2. There may be theoretical problems using -poisson- to run
log-linear regressions.

Neither would be true. My short-and-quick response is,

1. -poisson- can handle non-discrete (non-integer) data values.
Left-hand-side values do not have to be large to ameliorate any
problem.

2. The formulas in the blog are as intended and are correct.

Let me explain.

Concerning #1, -poisson- does not round values when run on noninteger
data. Instead, it gives the warning message "you are responsible for
interpretation of noncount dep. variable."

An implication of that is that the objective function with non-integer
data may not be a true likelihood function. Actually, I suspect that
it is, but that's irrelevant because we in the blog entry are doing M
estimation and I recommended you obtain standard errors using the
-vce(robust)- option.

When -poisson- calculates the likelihood value associated with a
noninteger value, it does that using the standard formulas, but
substituting the Gamma function for the factorial function. That is
appropriate for M estimation.

This generalization means that you can run -poisson- using a LHS
variable with noninteger values and there will be no problems. All
the values, in fact, can even be less than 1! Whether you run on y,
y/10, y/100, y/1000, ..., all that will change will be the intercept.
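
The rescaling claim in the excerpt can be checked with a short sketch on simulated data. The variable names (x, y, y1000) and the data-generating values are assumptions for illustration. Dividing y by 1000 should leave the coefficient on x unchanged, up to convergence tolerance, and shift the intercept by -ln(1000).

```stata
* Sketch only: simulated data; all names and parameter values are hypothetical
clear
set seed 98765
set obs 500
generate x = rnormal()
* continuous, strictly positive response with an exponential mean
generate y = exp(1 + 0.5*x)*rgamma(2, 0.5)
* rescaled response; most values are well below 1
generate y1000 = y/1000

poisson y x, vce(robust)
scalar b_x    = _b[x]
scalar b_cons = _b[_cons]

poisson y1000 x, vce(robust)
display "slope difference:     " (_b[x] - b_x)
display "intercept difference: " (_b[_cons] - b_cons)
display "minus ln(1000):       " (-ln(1000))
```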

There are a few posts on stackexchange criticizing the use of Poisson regression on non-integer count data, but they don't seem to be as well-thought-out as the approaches taken here.

Thank you!
Nick Cox

Join Date: Mar 2014

Posts: 35792
#8

11 Mar 2019, 12:46

With a nod to #6 why not look at https://mitpress.mit.edu/books/econo...second-edition ?
Evan Sommer

Join Date: Jun 2015

Posts: 18
#9

12 Mar 2019, 15:16

Just wanted to thank you for the reply, Nick. I've obtained the first edition of the book you linked and will post anything relevant for others to use in the future.
