  • GLM Fracreg or Count model

    Hello,
    I have received contradictory suggestions on how to analyze certain data and would love to get some feedback.
    My dependent variable is the number of bills initiated by each legislator that were approved by the chamber (my independent variables are a series of legislator traits and contextual features). I also have data on the total number of bills each legislator initiated.
    The suggestions I received were:
    1) The data appear to be grouped binary data. Use glm with family(binomial) and link(logit), with the number of approved bills as the dependent variable and the total number of bills initiated by the legislator as the number of trials.
    2) Use fractional logistic regression. Use the number of bills approved and the total number of bills initiated to create a share-approved variable, use that as the dependent variable, and run a fracreg logit regression.
    3) Use a count model. Use poisson or nbreg with the number of bills approved as the dependent variable and include the total number of bills initiated as an independent variable.
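
    For concreteness, the three suggestions could be written in Stata roughly as follows. This is only a sketch: the variable names approved, initiated, x1 and x2 are hypothetical placeholders (they are not from my dataset), and it assumes every legislator in the sample initiated at least one bill.

    ```stata
    * Hypothetical names: approved = bills approved, initiated = bills initiated,
    * x1 x2 = stand-ins for the legislator traits and contextual features.

    * 1) Grouped binomial logit: initiated is the number of trials
    glm approved x1 x2, family(binomial initiated) link(logit) vce(robust)

    * 2) Fractional logit on the share of initiated bills that were approved
    generate double share = approved / initiated
    fracreg logit share x1 x2, vce(robust)

    * 3) Count model with the number initiated as a regressor
    poisson approved initiated x1 x2, vce(robust)
    ```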

    What model appears to be more appropriate?
    I have a lot of zeros (half of the legislators who initiated bills had none approved). Should this affect my choice of model (e.g. zero-inflated count or beta regression)?

    Thank you,
    Eduardo

  • #2
    I would not go with 3 because it doesn't impose the logical upper bound. In theory, under a full distributional assumption, #1 is more efficient than #2 because the binomial uses information on the upper bound. The binomial and fractional response estimators are both robust to any kind of distributional misspecification as long as the conditional mean is correctly specified, something I discuss in Chapter 18 of my MIT Press book. Trying both would be good. If you want to settle on one, I'd go with the binomial quasi-MLE.

    • #3
      Thank you very much Professor Wooldridge.
      A clarification: by QMLE, do you mean glm with family(binomial) link(logit) robust AND the irls option?

      I was also wondering whether the number of zeros is a problem.
      Last edited by Eduardo Aleman; 12 Sep 2020, 22:15.

      • #4
        Eduardo: I believe that what we mostly want to do is estimate effects on means (and, increasingly, medians and quantiles, though that's harder with count data). If that is your main interest, I wouldn't worry about the zeros. These methods can be applied even if y is not a count variable. Because you have a natural upper bound, it seems sensible to use that information. If you don't want to condition on the upper bound (the number of bills initiated) and would rather model the pass rate directly, then I would use fractional logit.
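
        As a rough illustration of estimating effects on the mean with fractional logit, one might compute average partial effects after estimation. The variable names approved, initiated, x1 and x2 are hypothetical placeholders, not taken from the data described in this thread.

        ```stata
        * Hypothetical names; assumes initiated > 0 for every legislator in the sample
        generate double share = approved / initiated

        * Fractional logit for the conditional mean of the pass rate
        fracreg logit share x1 x2, vce(robust)

        * Average partial effects of x1 and x2 on the expected pass rate
        margins, dydx(x1 x2)
        ```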

        There is no reason to use anything other than the usual maximization of the log likelihood. The binomial distribution is in the linear exponential family, so it delivers consistent estimates of the conditional mean parameters regardless of the actual distribution, provided the mean is correctly specified.
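
        A short sketch of why only the mean matters. The notation is assumed here for illustration: $y_i$ is the number of bills approved, $n_i$ the number initiated, $x_i$ the covariate row vector, and $\Lambda(z) = e^z/(1+e^z)$ the logistic function.

        ```latex
        \begin{align*}
        \ell_i(\beta) &= y_i \log \Lambda(x_i\beta)
            + (n_i - y_i)\log\bigl[1 - \Lambda(x_i\beta)\bigr] + \text{const}, \\
        s_i(\beta) &= \frac{\partial \ell_i}{\partial \beta'}
            = x_i'\bigl[y_i - n_i\Lambda(x_i\beta)\bigr]
            = n_i\, x_i'\Bigl[\tfrac{y_i}{n_i} - \Lambda(x_i\beta)\Bigr], \\
        \mathrm{E}\bigl[s_i(\beta_0)\mid x_i, n_i\bigr] &= 0
            \quad\text{whenever}\quad
            \mathrm{E}[y_i \mid x_i, n_i] = n_i\Lambda(x_i\beta_0).
        \end{align*}
        ```

        The score has mean zero at the true parameter as soon as the conditional mean is right, whatever the rest of the distribution looks like (including many zeros). The last expression for $s_i$ also shows that the binomial score is the fractional logit score for the share $y_i/n_i$ weighted by $n_i$.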

        • #5
          Thank you very much.
