  • Assessing the Impact of too high Zero-Inflation on PPML Model

    Dear Statalist community,

    I am currently working on a research project that involves estimating a Poisson Pseudo-Maximum Likelihood (PPML) model in Stata. My dataset has a high proportion of zeros in the dependent variable because it is in a dyadic format (origin-to-destination firm migration).

    Is there a rule of thumb or guideline for assessing whether PPML is suitable when the dependent variable has a high proportion of zeros? What factors should I consider when evaluating the impact of zero inflation on the model's performance? For instance, is 90% zeros considered too high? Would 99% zeros cause severe problems?

    I am already aware that PPML is consistent and well-behaved with many zeros, but how big is "many"?

    Thank you for your help and insights.

    Kind Regards,
    Mauricio.

  • #2
    Do you think that there are two distinct populations within the zeros?

    Otherwise, the short answer is no, that is not a problem because these observations will not greatly influence the scores. The reference is Santos Silva and Tenreyro (2022).
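One way to read the remark about the scores, offered as a sketch rather than as a statement taken from the cited paper: assume the conditional mean is specified as exp(x_i'β), with x_i and β being notation introduced here for illustration. The PPML estimator then solves the first-order conditions below, and an observation with y_i = 0 enters only through its fitted mean.

```latex
% PPML first-order conditions (notation assumed for illustration)
\sum_{i=1}^{n} \bigl( y_i - \exp(x_i'\hat\beta) \bigr)\, x_i = 0
% Contribution of an observation with y_i = 0:
s_i(\hat\beta) = -\exp(x_i'\hat\beta)\, x_i
% which is close to zero whenever the fitted mean \exp(x_i'\hat\beta) is close to zero.
```

Under this reading, zeros that the model fits with a small mean add little to the estimating equations, however many of them there are.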


    • #3
      Hi Maxence,

      Thank you for your help.

      When you mention 'two distinct populations,' are you referring to two different Data Generating Processes (DGPs)?

      If you don't mind, could you explain in more detail why having 90% to 99% zeros is not a problem? I did read Santos Silva and Tenreyro (2022), as you mentioned, but the authors also refer to Santos Silva and Tenreyro (2011a), titled 'Further simulation...'. If I'm not mistaken, the highest share of zero observations in those simulation results is 83%. Is that correct?

      Perhaps Professor Joao Santos Silva could offer some guidance here?

      Thank you in advance!


      • #4
        Dear Mauricio Carvalho,

        Following up on Maxence's helpful advice, notice that a Poisson distribution is compatible with any proportion of zeros. So the percentage of zeros cannot be used to choose between PPML and a zero-inflated Poisson (ZIP) model. As Maxence noted, a zero-inflated (ZI) model only makes sense if, for some sub-population, the dependent variable is equal to zero with probability 1 for any value of the covariates. For example, if you are modelling how many times someone eats meat in a week, you may want to use a ZI model to account for vegetarians. In your context, it is difficult to justify using a ZI model unless it is impossible to migrate from some origins to some destinations.
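The point that a Poisson distribution is compatible with any proportion of zeros can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the original reply: it assumes a single common mean `mu` (in a PPML regression the mean varies with the covariates), and the function names are made up for this example. It uses the fact that a Poisson variable with mean `mu` equals zero with probability exp(-mu).

```python
import math


def poisson_zero_share(mu: float) -> float:
    """Probability that a Poisson variable with mean mu equals zero."""
    if mu < 0.0:
        raise ValueError("mu must be non-negative")
    return math.exp(-mu)


def mean_for_zero_share(p0: float) -> float:
    """Poisson mean that produces a share p0 of zeros (0 < p0 <= 1)."""
    if not 0.0 < p0 <= 1.0:
        raise ValueError("p0 must be in (0, 1]")
    return -math.log(p0)


if __name__ == "__main__":
    # Any target share of zeros corresponds to some valid Poisson mean.
    for p0 in (0.50, 0.90, 0.99):
        mu = mean_for_zero_share(p0)
        print(f"{p0:.0%} zeros <-> Poisson mean {mu:.4f}")
```

With these definitions, a mean of roughly 0.105 gives 90% zeros and a mean of roughly 0.010 gives 99% zeros, so a very high share of zeros may simply reflect small conditional means rather than a separate zero-generating process.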

        Moreover, ZI models have other issues: they are less robust than other approaches, are affected by the incidental parameter problem, and should not be used if the data are not counts, but these may not be relevant in your application.

        Best wishes,

        Joao


        • #5
          Dear Professor Santos Silva,

          Thank you very much for your help!

          Kind regards,
          Mauricio.
