  • Low value of estimated Pareto shape

    Dear STATA community,

    I am trying to estimate the parameters of a productivity distribution. My prior is that it follows a Pareto distribution with a shape parameter greater than 1 (so that the mean exists). However, the paretofit command yields a low shape estimate (around 0.3), and I am wondering what the problem is. I noticed that when I choose a higher scale parameter, the shape estimate moves closer to 1, but that requires dropping a large share of my observations. Any suggestions are greatly appreciated.
    Thanks,

    Tuan Luong

  • #2
    You should undoubtedly plot the data and the fitted distribution to see why you get puzzling results: the best kind of plot in my view is a quantile-quantile plot.

    The paretofit program (SSC; see http://www.statalist.org/forums/help#stata on giving references) does not come with any dedicated plotting programs, but the principles are explained in e.g. http://www.stata-journal.com/sjpdf.h...iclenum=gr0027
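
The quantile-quantile idea can also be sketched outside Stata. The Python snippet below is only an illustration under stated assumptions: the data are simulated lognormal draws standing in for "productivity", the shape is estimated with the textbook maximum-likelihood formula with the scale fixed at the sample minimum (not necessarily what paretofit does internally), and the function names are made up for this example. Each observed order statistic is paired with the fitted Pareto quantile at plotting position (i - 0.5)/n; on a good fit the pairs lie near the line of equality.

```python
import math
import random

def pareto_shape_mle(values, x0):
    """Textbook ML estimate of the Pareto shape, using observations >= x0."""
    tail = [v for v in values if v >= x0]
    return len(tail) / sum(math.log(v / x0) for v in tail)

def pareto_quantile(p, shape, x0):
    """Quantile function of a Pareto distribution with scale x0."""
    return x0 * (1.0 - p) ** (-1.0 / shape)

def qq_pairs(values, shape, x0):
    """Pairs (fitted quantile, observed value) at positions (i - 0.5) / n."""
    ordered = sorted(values)
    n = len(ordered)
    return [(pareto_quantile((i - 0.5) / n, shape, x0), obs)
            for i, obs in enumerate(ordered, start=1)]

# Simulated stand-in data (an assumption, not the poster's data).
rng = random.Random(2015)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]

x0 = min(sample)
shape = pareto_shape_mle(sample, x0)
pairs = qq_pairs(sample, shape, x0)

# Print a few pairs across the range; in practice, plot all of them on log scales.
for k in (0, 500, 1000, 1500, 1999):
    fitted, observed = pairs[k]
    print(f"fitted {fitted:14.4f}   observed {observed:10.4f}")
```

The list `pairs` can be passed to any plotting tool; systematic curvature away from the line of equality points to a poor fit.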

    In my experience

    1. The Pareto is a tricky distribution to fit.

    2. The applicability of the Pareto is wildly exaggerated by those who prefer to think of it as some kind of universal distribution. Otherwise put, it is often fitted to qualitatively different right-skewed distributions with a mode greater than the minimum. The parameter estimates may well be surprising in such cases.

    3. Omitting a subset to ensure that the distribution fits is a dubious undertaking. It is better to find a more appropriate distribution.
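
Points 2 and 3 can be illustrated with a small simulation. The sketch below is in Python rather than Stata, uses made-up lognormal data (a right-skewed distribution whose mode lies above the minimum), and applies the textbook maximum-likelihood formula for the shape, n / sum(log(x / x0)), to the observations at or above a threshold x0. The names and thresholds are assumptions chosen for the example. With x0 at the sample minimum the estimate is well below 1, and it rises as the threshold is raised and observations are discarded, which is the pattern described in the original question.

```python
import math
import random

def pareto_shape_mle(values, x0):
    """Return (shape estimate, number of observations used) for threshold x0."""
    tail = [v for v in values if v >= x0]
    log_excess = sum(math.log(v / x0) for v in tail)
    return len(tail) / log_excess, len(tail)

# Simulated lognormal data: NOT Pareto, and only a stand-in for real data.
rng = random.Random(42)
sample = sorted(rng.lognormvariate(0.0, 1.0) for _ in range(5000))

# Thresholds: sample minimum, median, and 90th percentile.
thresholds = {
    "minimum": sample[0],
    "median": sample[len(sample) // 2],
    "90th percentile": sample[int(0.9 * len(sample))],
}

estimates = {}
for label, x0 in thresholds.items():
    shape, n_used = pareto_shape_mle(sample, x0)
    estimates[label] = shape
    print(f"x0 = {label:16s} n used = {n_used:5d}  shape = {shape:.2f}")
```

A shape estimate that keeps changing with the threshold is itself a sign that the Pareto form does not describe the data over that range.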

    See also http://www.statalist.org/forums/help#spelling
    Last edited by Nick Cox; 15 Oct 2015, 02:07.



    • #3
      I second Nick's remarks (nothing new there). The Pareto distribution is mainly used to describe the right tails of skewed distributions, I believe, because it is "useful" rather than because it provides a very good fit. In addition to the diagnostics suggested by Nick, you could plot the log of the survivor curve against the log of "productivity" -- the so-called Zipf plot -- and see at which "productivity" threshold (if any) the curve is linear. A nice article relevant to this topic is: "Are your data really Pareto distributed?" by Pasquale Cirillo, Physica A, 392 (2013), 5947-5962.
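
A Zipf plot is straightforward to compute by hand. The sketch below, in Python and with simulated lognormal data standing in for "productivity" (both assumptions for the example, as are the function names), pairs log(x) with the log of the empirical survivor function and then fits a least-squares slope over all points and over the top 10% only. For Pareto data the curve is a straight line with slope equal to minus the shape; if the slope keeps steepening as the threshold is raised, the curve is not linear over that range.

```python
import math
import random

def zipf_points(values):
    """Return (log x, log survivor) pairs; survivor uses 1 - (i - 0.5)/n."""
    ordered = sorted(values)
    n = len(ordered)
    return [(math.log(x), math.log(1.0 - (i - 0.5) / n))
            for i, x in enumerate(ordered, start=1)]

def ols_slope(points):
    """Least-squares slope of the second coordinate on the first."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    sxx = sum((p[0] - mean_x) ** 2 for p in points)
    return sxy / sxx

# Simulated stand-in data (lognormal, so not Pareto).
rng = random.Random(7)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]
points = zipf_points(sample)

# Slope over all points versus the top 10% only.
slope_all = ols_slope(points)
slope_top = ols_slope(points[int(0.9 * len(points)):])
print(f"slope, all data: {slope_all:.2f}")
print(f"slope, top 10%:  {slope_top:.2f}")
```

Plotting `points` directly shows the same thing visually: look for the threshold, if any, above which the curve is approximately straight.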
