  • Analyzing "counts" containing decimals

    Dear Stata users: I am using Stata v14.1 to analyze a count variable, but some of the counts are estimates and contain decimals. Stata allows me to run the model but produces the following note: "you are responsible for interpretation of non-count dep. variable". How is Stata treating the values with decimals, say 4.23333? Does it drop the decimal part, round to the nearest whole number, or treat 4.23333 as a count between 4 and 5? Where is the best place to look for guidance on interpreting model coefficients when the dependent variable contains non-integer values?

  • #2
    You don't say what command you are using. Note, however, that -poisson-, for example, is not limited to count variables; for an example, see #3 of http://www.statalist.org/forums/foru...xplainable-low


    • #3
      Rich is quite right. In general, Stata won't force anything to integer unless so instructed.


      • #4
        An important bit of information that I just noticed I left out of the above post: I am using Stata v14.1 to analyze a count variable with negative binomial regression. The dependent variable is a count, but some of these counts are estimates and contain decimals. Hopefully that makes it easier to answer the question about what Stata does in the nbreg routines.


        • #5
          It is easy to try experiments like this:

          1. Do negative binomial regression with a count response.

          2. Generate a new response by adding 0.1 to that.

          3. See that the results change. If Stata rounded the response, that would not happen.
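
          The three steps above can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated, and the variable names (x, y, y_shifted), the seed, and the parameter values are hypothetical, not taken from the original question.

          ```stata
          * Hypothetical simulated data; names and parameter values are illustrative only
          clear
          set seed 2718
          set obs 1000
          generate x = rnormal()
          * Poisson draw with gamma-distributed heterogeneity: an overdispersed count
          generate y = rpoisson(exp(1 + 0.5*x) * rgamma(2, 0.5))

          * Step 1: negative binomial regression on the integer count
          nbreg y x

          * Step 2: shift every count by 0.1 so the response is no longer an integer
          generate y_shifted = y + 0.1

          * Step 3: refit; Stata prints the noncount note, and the estimates
          * should differ from those in step 1
          nbreg y_shifted x
          ```

          If the response were rounded or truncated before fitting, the two sets of estimates would be identical.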


          • #6
            Both -poisson- and -nbreg- leave the values unchanged, which is why Stata gives you the warning. Of course, an exponential functional form for E(y|x) still makes sense for non-count variables provided y >= 0, so that's not an issue. But I would opt for Poisson regression with robust standard errors, because we know the Poisson estimator is fully robust to distributional misspecification. The negative binomial estimator is not, and since a y with non-integer values cannot have a negative binomial distribution, it is technically inconsistent. Having said that, Poisson and NegBin often give similar estimates of the mean parameters, as they should if the NegBin model is correct.
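
            As a rough sketch of that comparison, the code below fits both models to a nonnegative, non-integer outcome. Everything in it is an assumption made for illustration: the data are simulated, the names (x, mu, nu, y) are hypothetical, and the outcome is built as a count multiplied by a measurement error of up to 10 percent with mean one, so that E(y|x) is still exp(1 + 0.5*x).

            ```stata
            * Hypothetical simulated data: a nonnegative, non-integer "estimated count"
            clear
            set seed 2718
            set obs 1000
            generate x = rnormal()
            generate mu = exp(1 + 0.5*x)
            * Gamma heterogeneity with mean 1 makes the underlying count overdispersed
            generate nu = rgamma(2, 0.5)
            * Multiplicative error in [0.9, 1.1) with mean 1 produces decimals
            generate y = rpoisson(mu * nu) * (0.9 + 0.2*runiform())

            * Poisson regression with robust (sandwich) standard errors
            poisson y x, vce(robust)

            * Negative binomial fit of the same mean function, for comparison
            nbreg y x
            ```

            Both commands print the noncount note. The coefficient on x can then be compared across the two fits.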


            • #7
              I found the above to be very helpful! Does anyone happen to have a reference about using Poisson regression with non-integer "count-like" outcomes? The excerpt below from the Statalist archives is the best I could find, but I would like to be able to cite something formally published as well: https://www.stata.com/statalist/arch.../msg00213.html

              I'm concerned that someone might interpret what David wrote to mean:

              1. There may be practical problems using -poisson- to run
              log-linear regressions, depending on whether the LHS variable
              contains noninteger values.

              2. There may be theoretical problems using -poisson- to run
              log-linear regressions.

              Neither would be true. My short-and-quick response is,

              1. -poisson- can handle non-discrete (non-integer) data values.
              Left-hand-side values do not have to be large to ameliorate any
              problem.

              2. The formulas in the blog are as intended and are correct.

              Let me explain.

              Concerning #1, -poisson- does not round values when run on noninteger
              data. Instead, it gives the warning message "you are responsible for
              interpretation of noncount dep. variable."

              An implication of that is that the objective function with non-integer
              data may not be a true likelihood function. Actually, I suspect that
              it is, but that's irrelevant because we in the blog entry are doing M
              estimation and I recommended you obtain standard errors using the
              -vce(robust)- option.

              When -poisson- calculates the likelihood value associated with a
              noninteger value, it does that using the standard formulas, but
              substituting the Gamma function for the factorial function. That is
              appropriate for M estimation.
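
              As a hedged gloss on the paragraph above, the contribution of observation i to the objective function can be written with the Gamma function in place of the factorial. The notation (x_i for the covariates, beta for the coefficients) is an assumption made for illustration and does not appear in the quoted post.

              ```latex
              \ell_i(\beta) = y_i \, x_i'\beta - \exp(x_i'\beta) - \ln \Gamma(y_i + 1)
              ```

              Because \Gamma(y + 1) = y! for integer y, this agrees with the usual Poisson log likelihood on true counts. The last term does not involve \beta, so it has no effect on the coefficient estimates.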

              This generalization means that you can run -poisson- using a LHS
              variable with noninteger values and there will be no problems. All
              the values, in fact, can even be less than 1! Whether you run on y,
              y/10, y/100, y/1000, ..., all that will change will be the intercept.
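
              The rescaling claim is easy to check with a small sketch. The data below are simulated, and the variable names (x, y, y10, y100) and parameter values are hypothetical. If the claim holds, the coefficient on x should agree across the three fits up to convergence tolerance, while the intercepts drop by ln(10) and ln(100).

              ```stata
              * Hypothetical simulated data; names and parameter values are illustrative only
              clear
              set seed 2718
              set obs 1000
              generate x = rnormal()
              generate y = rpoisson(exp(1 + 0.5*x))

              * Rescaled copies of the response; most values are no longer integers
              generate y10 = y / 10
              generate y100 = y / 100

              poisson y x, vce(robust)
              poisson y10 x, vce(robust)
              poisson y100 x, vce(robust)

              * Expected shifts in the intercept relative to the first fit
              display ln(10)
              display ln(100)
              ```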

              There are a few posts on Stack Exchange criticizing the use of Poisson regression on non-integer count data, but they don't seem to be as well thought out as the approaches taken here.

              Thank you!


              • #8
                With a nod to #6, why not look at https://mitpress.mit.edu/books/econo...second-edition ?


                • #9
                  Just wanted to thank you for the reply, Nick. I've obtained the first edition of the book you linked and will post anything relevant for others to use in the future.
