
  • Research Setup: Count Model (ZINB) or totally different approach?

    Hey there,

    I would be delighted to get your opinion on the following question:

    I am a little stuck at the moment (and did not find a similar case in the forum), and I am not sure whether the regression I chose is on track.

    - The dependent variable is the USD funding of companies. It is therefore either 0 or can be VERY high (up to USD 20 million). The zeros can also be split into two groups: companies that did not even apply for funding, and companies that applied but did not receive any.

    After doing some research, I thought the right setup must be in the count-regression family.
    -> Also, the dependent variable shows overdispersion (an indicator that a negative binomial model fits better).
    -> Furthermore, the data contains "excess zeros", which zero-inflated models are designed to account for.

    --> Hence, my initial conclusion was to use the ZINB model.
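    A quick way to look at the two features listed above is to compare the variance with the mean (they are equal under a Poisson model) and the observed share of zeros with the share a Poisson with the same mean would produce. The sketch below uses hypothetical toy numbers, not Dan's data, and the helper name dispersion_summary is made up. These unconditional checks are only a rough first look: overdispersion in a regression concerns the variance conditional on the covariates.

```python
import math
from statistics import mean, pvariance


def dispersion_summary(y):
    """Hypothetical helper: unconditional overdispersion and excess-zero check."""
    m = mean(y)
    ratio = pvariance(y) / m            # about 1 if the data were Poisson
    zero_share = sum(1 for v in y if v == 0) / len(y)
    poisson_zero_share = math.exp(-m)   # P(Y = 0) for a Poisson with mean m
    return ratio, zero_share, poisson_zero_share


# Hypothetical toy outcome: many zeros plus a few large values.
y = [0, 0, 0, 0, 0, 0, 1, 2, 5, 12]
ratio, zero_share, poisson_zero_share = dispersion_summary(y)
print(f"variance/mean = {ratio:.2f}")
print(f"share of zeros: observed {zero_share:.2f}, Poisson-implied {poisson_zero_share:.3f}")
```

    In this toy example the variance is 6.7 times the mean and 60% of the observations are zero, against about 14% under a Poisson with the same mean: the pattern that usually motivates a ZINB.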

    However, the dependent variable (USD) is not a classical count variable, correct? It does not count 1, 2, 3, 4, or 5 trials or separate draws. So I am wondering whether it is "okay" to treat USD funding as a count variable.
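    One way to see why a currency amount is awkward as a count: the variance-to-mean ratio used to diagnose overdispersion depends on the unit of measurement. Rescaling by a factor c multiplies the variance by c² but the mean only by c, so the same data look far more or less "overdispersed" depending on whether they are recorded in USD or in thousands of USD. A minimal sketch with hypothetical amounts (variance_to_mean is a made-up helper):

```python
from statistics import mean, pvariance


def variance_to_mean(values):
    """Hypothetical helper: variance-to-mean ratio (1 for a Poisson count)."""
    return pvariance(values) / mean(values)


# Hypothetical funding amounts in USD (illustrative only, not real data).
funding_usd = [0, 0, 0, 50_000, 250_000, 2_000_000]
# Exactly the same information, recorded in thousands of USD.
funding_kusd = [v / 1_000 for v in funding_usd]

ratio_usd = variance_to_mean(funding_usd)
ratio_kusd = variance_to_mean(funding_kusd)
print(ratio_usd / ratio_kusd)  # about 1000: the "dispersion" depends on the unit
```

    For a genuine count the unit is fixed by the data, so the ratio has a natural meaning; for dollar amounts it is partly an artifact of the chosen unit.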

    Any guidance, thoughts and comments are highly welcome.

    Thanks a lot in advance.
    Kind regards
    Dan

  • #2
    Dan: As I read it, your case is not really a classical count-data situation but rather what the literature has called a "double hurdle" model, in which the zeros can arise for two (possibly correlated) reasons. An early reference is this paper by Deaton and Irish, but if you search around on "double hurdle" I'm sure you'll find many more, and more recent, references.

    http://www.princeton.edu/~deaton/dow...penditures.pdf
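    To make the "two reasons for a zero" idea concrete, here is a small simulation of a hypothetical double-hurdle data-generating process: a first tier decides whether a firm applies at all, a second tier draws a latent amount that is censored at zero, and the observed funding is their product. The function name and all parameter values (apply probability 0.4, mean 0.5, standard deviation 1, in arbitrary units) are made up for illustration, and the first tier is a plain coin flip with no covariates.

```python
import random


def simulate_double_hurdle(n, p_apply, mu, sigma, seed=0):
    """Hypothetical double-hurdle DGP (no covariates, tiers independent).

    Tier 1: a firm applies with probability p_apply (y1 = 1) or not (y1 = 0).
    Tier 2: a latent amount y2* ~ Normal(mu, sigma) is censored at zero (y2).
    Observed funding is y = y1 * y2, so a zero can come from either tier.
    """
    rng = random.Random(seed)
    counts = {"did_not_apply": 0, "applied_got_zero": 0, "positive": 0}
    outcomes = []
    for _ in range(n):
        y1 = 1 if rng.random() < p_apply else 0
        y2 = max(0.0, rng.gauss(mu, sigma))
        outcomes.append(y1 * y2)
        if y1 == 0:
            counts["did_not_apply"] += 1      # "certain" zero from tier 1
        elif y2 == 0.0:
            counts["applied_got_zero"] += 1   # zero from tier 2
        else:
            counts["positive"] += 1
    return outcomes, counts


outcomes, counts = simulate_double_hurdle(n=10_000, p_apply=0.4, mu=0.5, sigma=1.0)
print(counts)
```

    With these made-up values roughly 60% of firms never apply and roughly 12% apply but end up with zero, so about 72% of the observations are zero for two different reasons.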



    • #3
      Hey John,

      thanks so much for this swift reply; very helpful food for thought! If I may, a follow-up question right away:

      - I did some research on the double-hurdle model, and in my understanding it works as follows:
      - a probit/logit model in the first tier and a truncated model (normal, or Poisson/binomial) in the second.

      Now, in my mind the DV (funding) cannot be treated as a zero-truncated variable in the second part of the model. Why? The data contains companies that did NOT apply for funding and are hence "certain zeros". The remaining companies applied for funding and either received it (non-zero values) or did not (sampling zeros). So, in my understanding, hurdle models are not applicable here?

      But I also agree that the data is not really suitable for a count model like ZINB.

      Would you share my view above, or did I overlook something?

      Many thanks and kind regards
      Dan



      • #4
        Dan: I may be misinterpreting, but here is my sense of your setup. First, there is a latent variable (y1*) describing the propensity to apply, which is manifested as one if an application takes place and zero otherwise (y1). Second, there is a latent variable (y2*) describing the magnitude of expenditure, which is manifested as its own value if positive and zero otherwise (y2). Often, although not necessarily, it is assumed that (y1*, y2*) follow a bivariate normal distribution conditional on x.

        I would emphasize that different types of "hurdle" models have appeared, but the one I have in mind is the one set out in the Deaton-Irish paper I mentioned in #2. In essence, the observed outcome is y1·y2, the product of the application indicator y1 and the amount y2.
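        For the setup described above, and under the simplifying assumption that y1* and y2* are independent given x (the correlated bivariate-normal version needs a bivariate normal CDF, which this sketch leaves out), one observation's log-likelihood contribution can be written as below. The names zg (first-tier index), xb (mean of y2*), sigma (standard deviation of y2*) and double_hurdle_loglik are hypothetical. A zero arises either because the first hurdle is not passed or because y2* ≤ 0, so P(y = 0) = 1 − Φ(zg)·Φ(xb/σ).

```python
import math


def norm_logpdf(x):
    """Log of the standard normal density (computed directly to avoid underflow)."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def double_hurdle_loglik(y, zg, xb, sigma):
    """Hypothetical helper: log-likelihood of one observation in an independent
    double hurdle (correlation fixed at 0). Zeros come from EITHER tier: not
    passing the first hurdle, or a latent amount y2* ~ N(xb, sigma^2) <= 0."""
    p_first = norm_cdf(zg)                    # P(first hurdle passed)
    if y == 0:
        return math.log(1.0 - p_first * norm_cdf(xb / sigma))
    # Positive y: first hurdle passed AND y2* = y (untruncated normal density).
    return math.log(p_first) + norm_logpdf((y - xb) / sigma) - math.log(sigma)


# Sanity check with made-up values: P(y = 0) plus the integral of the density
# over y > 0 should be 1 (midpoint rule on (0, 30)).
zg, xb, sigma = 0.3, 1.0, 2.0
p_zero = math.exp(double_hurdle_loglik(0.0, zg, xb, sigma))
n_steps = 30_000
step = 30.0 / n_steps
p_pos = step * sum(
    math.exp(double_hurdle_loglik(step * (k + 0.5), zg, xb, sigma))
    for k in range(n_steps)
)
print(round(p_zero + p_pos, 4))  # should be very close to 1
```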

        The "truncated normal" version of the hurdle model you refer to is probably a different "hurdle" characterization, sometimes called Craggit after a 1971 Econometrica paper by John Cragg.
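        For contrast, a sketch of one observation's log-likelihood in the truncated-normal (Cragg-style) hurdle, using hypothetical names zg (first-tier index), xb (location of the amount equation) and sigma (its scale). Here zeros come only from the first tier, and positive amounts follow a normal distribution truncated at zero, which is why the density is divided by Φ(xb/σ).

```python
import math


def norm_logpdf(x):
    """Log of the standard normal density (computed directly to avoid underflow)."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncated_hurdle_loglik(y, zg, xb, sigma):
    """Hypothetical helper: log-likelihood of one observation in a Cragg-style
    truncated-normal hurdle. Zeros come ONLY from the first (probit) tier;
    positive amounts follow N(xb, sigma^2) truncated to y > 0."""
    p_first = norm_cdf(zg)                     # P(first hurdle passed)
    if y == 0:
        return math.log(1.0 - p_first)
    return (math.log(p_first)
            + norm_logpdf((y - xb) / sigma) - math.log(sigma)
            - math.log(norm_cdf(xb / sigma)))  # renormalise for truncation at 0


# Sanity check with made-up values: P(y = 0) plus the integral of the density
# over y > 0 should be 1 (midpoint rule on (0, 30)).
zg, xb, sigma = 0.3, 1.0, 2.0
p_zero = math.exp(truncated_hurdle_loglik(0.0, zg, xb, sigma))
n_steps = 30_000
step = 30.0 / n_steps
p_pos = step * sum(
    math.exp(truncated_hurdle_loglik(step * (k + 0.5), zg, xb, sigma))
    for k in range(n_steps)
)
print(round(p_zero + p_pos, 4))  # should be very close to 1
```

        Here P(y = 0) = 1 − Φ(zg): the second tier cannot produce zeros, which is exactly the assumption Dan is worried about in #3.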

        I hope this is a reasonable interpretation of what you are considering.



        • #5
          Hey John,

          first of all, thank you so much for your detailed answers. These are the most helpful comments I have received on this question so far.

          You are one step ahead of me, then: your explanation of the data (research setup) is spot on and absolutely correct! I say "one step ahead" because, unfortunately, I do not quite understand how to set up a (double-)hurdle model that is NOT based on the truncation assumption of the Cragg model.

          I did read the paper you suggested (and yes, the theoretical approach sounds very good), but I was not able to find a concrete way to compute/implement it in Stata. Is there any source that clearly distinguishes between the two types of hurdle models: (A) the Craggit, and (B) the one for our purpose, which distinguishes excess zeros from sampling zeros?

          I am sorry to ask another follow-up question, but your advice is just fantastic (and very much appreciated).

          Kind regards
          Daniel



          • #6
            Dan: I actually haven't used Stata for hurdle modeling, although it looks like it should be straightforward: http://www.stata.com/new-in-stata/hurdle-models/



            • #7
              Hey John,

              thanks for the swift reply. I don't want to take up more of your time, but just to respond to that: exactly, but all the examples are Cragg versions / zero-truncated ones. I could not find one that replicates the theoretical setup of the Deaton-Irish paper, i.e. my research setup.

              Hence, from a practical perspective, it is not quite clear to me what the actual levers are for setting up a hurdle model either way.

              Kind regards
              Daniel


              • #8
                Dan: At this point it's above my pay grade. I'd probably google something like "stata hurdle model estimation". Should that not turn up anything, then programming the likelihood function directly and using ml (or some other Stata optimizer) might be the best bet.
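                The "program the likelihood yourself" route can be sketched outside Stata as well. The Python below (standard library only) fits an intercept-only independent double hurdle (no covariates, correlation between the tiers fixed at zero) to simulated data. The "true" parameter values, the log-sigma parametrization, and the function names loglik and compass_search are illustrative assumptions, and the compass search is only a crude derivative-free stand-in for a proper maximizer such as Stata's ml.

```python
import math
import random


def log_or_neg_inf(v):
    """math.log, but return -inf instead of raising for v <= 0."""
    return math.log(v) if v > 0 else float("-inf")


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def loglik(params, ys):
    """Hypothetical intercept-only independent double hurdle.
    params = (gamma, beta, log_sigma): first-tier index, mean of y2*, log sd."""
    gamma, beta, log_sigma = params
    sigma = math.exp(log_sigma)
    p_first = norm_cdf(gamma)
    total = 0.0
    for y in ys:
        if y == 0:   # zero from either tier
            total += log_or_neg_inf(1.0 - p_first * norm_cdf(beta / sigma))
        else:        # passed tier 1 and y2* = y
            z = (y - beta) / sigma
            total += (log_or_neg_inf(p_first) - 0.5 * z * z
                      - 0.5 * math.log(2.0 * math.pi) - log_sigma)
    return total


def compass_search(f, start, step=0.5, tol=1e-3):
    """Crude maximiser: try +/- step along each coordinate, keep any
    improvement, halve the step when a full sweep finds none."""
    x, fx = list(start), f(start)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x.copy()
                trial[i] += delta
                f_trial = f(trial)
                if f_trial > fx:
                    x, fx, improved = trial, f_trial, True
        if not improved:
            step /= 2.0
    return x, fx


# Simulated data from made-up "true" values: gamma = 0.3, beta = 1.0, sigma = 1.0.
rng = random.Random(42)
ys = []
for _ in range(1_000):
    applied = rng.random() < norm_cdf(0.3)
    amount = max(0.0, rng.gauss(1.0, 1.0))
    ys.append(amount if applied else 0.0)

start = [0.0, 0.0, 0.0]
theta_hat, ll_hat = compass_search(lambda p: loglik(p, ys), start)
gamma_hat, beta_hat, log_sigma_hat = theta_hat
print(f"gamma={gamma_hat:.2f}, beta={beta_hat:.2f}, sigma={math.exp(log_sigma_hat):.2f}")
```

                With covariates, gamma and beta would become linear indices in z and x; in Stata, this per-observation log-likelihood is what an ml evaluator program would return.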
