  • r square

    Dear all,

    I checked whether the cumulative distribution of a variable x is consistent with a power law or a log-normal distribution. The final result is presented in the figure below. Now I need to determine an R-square measure of fit. How can I do this in Stata? I am using Stata 12.1.

    Thanks in advance!

    Best regards,

    Liza Vieira



    Attached Files

  • #2
    I doubt that R-square is a good measure here. The correlation between observed and fitted quantiles is bound to be very high, as monotonicity alone guarantees a high value. Besides, correlation tells you about linearity, not agreement.
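
    A quick way to see this point is a simulation in which a deliberately wrong distribution is compared with the empirical CDF. This is only a sketch: the simulated data, the seed and the variable names (x, ecdf, wrongcdf) are made up for illustration and have nothing to do with the data in this thread.

    ```stata
    * Warning: -clear- drops the data in memory; run this in a fresh session
    clear
    set seed 12345
    set obs 1000
    * Simulated lognormal data (hypothetical, for illustration only)
    generate double x = exp(rnormal())
    * Empirical cumulative distribution function
    cumul x, generate(ecdf)
    * CDF of a normal with the sample mean and SD: deliberately the wrong model
    quietly summarize x
    generate double wrongcdf = normal((x - r(mean)) / r(sd))
    * Both variables increase with x, so the correlation should be high
    * even though the normal is a poor description of these data
    correlate ecdf wrongcdf
    ```

    If the correlation comes out high here too, that illustrates why it cannot discriminate between a good and a bad fit.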

    If this were my problem, I would fit both distributions by maximum likelihood and that would be my starting point for assessing fit numerically.
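
    A minimal sketch of such a comparison, using only built-in commands. It assumes that Staff (the variable named later in this thread) is strictly positive, that the power law is a Pareto type I with its lower bound fixed at the sample minimum, and that the scalar and variable names below (lnStaff, s_n and so on) are free to use; all of these are assumptions made for illustration.

    ```stata
    * Lognormal: the ML estimates are the mean and SD (divisor n) of ln(Staff)
    quietly generate double lnStaff = ln(Staff)
    quietly summarize lnStaff
    scalar s_n      = r(N)
    scalar s_sumln  = r(sum)
    scalar s_mu     = r(mean)
    scalar s_sigma  = sqrt(r(Var) * (s_n - 1) / s_n)
    * Maximized lognormal log-likelihood
    scalar ll_logn  = -s_sumln - s_n * ln(s_sigma) - (s_n / 2) * ln(2 * _pi) - s_n / 2

    * Pareto type I: lower bound x0 set to the sample minimum (an assumption),
    * shape alpha estimated by ML
    quietly summarize Staff
    scalar s_x0     = r(min)
    scalar s_alpha  = s_n / (s_sumln - s_n * ln(s_x0))
    * Maximized Pareto log-likelihood
    scalar ll_par   = s_n * ln(s_alpha) + s_n * s_alpha * ln(s_x0) - (s_alpha + 1) * s_sumln

    * AIC = -2 ll + 2k, with k = 2 for the lognormal and k = 1 for the Pareto
    * (treating x0 as fixed, which is itself debatable)
    display "lognormal: ll = " ll_logn "   AIC = " (-2 * ll_logn + 4)
    display "Pareto:    ll = " ll_par  "   AIC = " (-2 * ll_par + 2)
    ```

    Both log-likelihoods are for the density of Staff itself, so they are on the same scale. A higher log-likelihood (or lower AIC) is only a starting point for judging fit, not a verdict.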

    As it is, your graph implies that power law is a lousy fit and lognormal better. That's good to hear: power laws are vastly oversold and lognormals often neglected, so it matches my prejudices.


    • #3
      Dear Nick Cox,

      Thanks for your comments. I fitted both distributions by maximum likelihood (I used the commands lognfit and paretofit). How can I assess the fit numerically? Can you suggest some references?

      Best regards,

      Liza Vieira


      • #4
        Liza,
        If you want to compare cumulative distribution functions, please look at the Kolmogorov-Smirnov tests (ksmirnov in Stata).

        Hope this helps,


        • #5
          I don't think any kind of test matches what I assume to be the major research question.

          Furthermore, Kolmogorov-Smirnov is notoriously problematic when parameters are estimated from the data, precisely the case here (and usually!). What would the null hypothesis be?

          The two programs used to fit distributions are both based on maximum likelihood, so any intermediate mathematical statistics book focused on likelihood should give some ideas.

          More to the scientific point is where each distribution fits well and where it fits badly. Some ideas on how to show that are at http://www.stata-journal.com/sjpdf.h...iclenum=gr0027
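
          One simple way to look at this, sketched here and not necessarily what the linked article proposes, is to plot the difference between the empirical and fitted CDFs against the variable. The sketch assumes Staff is strictly positive, computes the fitted CDFs directly from the ML formulas (Pareto type I with lower bound at the sample minimum) and uses made-up names (ecdf, logStaff, d_mu and so on).

          ```stata
          * Run from a do-file, since /// line continuation is used below
          cumul Staff, generate(ecdf)
          quietly summarize Staff
          scalar d_x0 = r(min)
          quietly generate double logStaff = ln(Staff)
          quietly summarize logStaff
          * ML estimates: lognormal mu and sigma (divisor n), Pareto shape alpha
          scalar d_mu    = r(mean)
          scalar d_sigma = sqrt(r(Var) * (r(N) - 1) / r(N))
          scalar d_alpha = r(N) / (r(sum) - r(N) * ln(d_x0))
          * Empirical minus fitted CDF for each distribution
          generate double diff_logn = ecdf - normal((logStaff - d_mu) / d_sigma)
          generate double diff_par  = ecdf - (1 - (d_x0 / Staff)^d_alpha)
          * Departures from zero show where each distribution fits badly
          scatter diff_logn diff_par Staff, xscale(log) yline(0) ///
              msymbol(oh +) legend(order(1 "lognormal" 2 "Pareto")) ///
              ytitle("empirical minus fitted CDF")
          ```

          Systematic runs above or below zero in one part of the range point to where a distribution fails, which a single summary number would hide.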


          • #6
            Liza,

            Will you post the code you used to generate the empirical distribution function and then plot it along with the other two distribution functions?

            Thanks
            Richard


            • #7
              Dear Richard Hofler,

              I used the following code to generate the cumulative distribution functions:

              Empirical data:

              Code:
               cumul Staff, gen(cum)
              Log-normal fit:

              Code:
               lognfit Staff, cdf(cdf_logn) pdf(pdf_logn)
              Power distribution:

              Code:
               paretofit Staff, cdf(cdf_par) pdf(pdf_par)
              Then I plotted the cumulative distribution functions on the same graph, and I can easily see that the log-normal is the better approximation. The problem is that one of the referees of the paper is asking for a measure of fit. I would like to know which measure of fit is most suitable in this case.

              Best regards,

              Liza Vieira
