r square

Liza

Join Date: Nov 2014

Posts: 32
#1


29 Sep 2015, 11:01

Dear all,

I checked whether the cumulative distribution of a variable x is consistent with a power law or a log-normal distribution. The final result is presented in the figure below. Now I need to determine an R-square measure of fit. How can I do this in Stata? I am using Stata 12.1.

Thanks in advance!

Best regards,

Liza Vieira

Attached Files

Graph_Cumulative Distribution.gph (25.0 KB, 1 view)
Nick Cox

Join Date: Mar 2014

Posts: 35642
#2

29 Sep 2015, 11:35

I doubt that R-square is a good measure here. The correlation between observed and fitted quantiles is bound to be very high, as monotonicity alone guarantees a high value. Besides, correlation tells you about linearity, not agreement.
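A quick way to see the point about correlation, sketched with the commands that come up later in this thread (cumul, lognfit and paretofit; the last two are user-written and assumed to be installed from SSC). The same logic applies to observed and fitted cumulative probabilities as to quantiles. The variable names F_emp, F_logn and F_par are invented for illustration. Because the empirical and fitted CDFs are all monotone in Staff, the correlations would be expected to be high for both fits, including the visibly poorer one.

```stata
* Sketch only: assumes lognfit and paretofit are installed from SSC
* F_emp, F_logn and F_par are hypothetical variable names
cumul Staff, gen(F_emp)
lognfit Staff, cdf(F_logn)
paretofit Staff, cdf(F_par)

* Pearson correlations between empirical and fitted cumulative probabilities
correlate F_emp F_logn F_par

* Rank correlations: monotonicity alone pushes these to (or very near) 1
spearman F_emp F_logn F_par
```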

If this were my problem, I would fit both distributions by maximum likelihood and that would be my starting point for assessing fit numerically.

As it is, your graph implies that power law is a lousy fit and lognormal better. That's good to hear: power laws are vastly oversold and lognormals often neglected, so it matches my prejudices.
Liza

Join Date: Nov 2014

Posts: 32
#3

30 Sep 2015, 03:05

Dear Nick Cox,

Thanks for your comments. I fitted both distributions by maximum likelihood (I used the commands lognfit and paretofit). How can I assess the fit numerically? Can you suggest some references?

Best regards,

Liza Vieira
Charlie Joyez

Join Date: Dec 2014

Posts: 418
#4

30 Sep 2015, 06:05

Liza,
If you want to compare cumulative distribution functions, please look at the Kolmogorov-Smirnov tests (ksmirnov in Stata).

Hope this helps,
Nick Cox

Join Date: Mar 2014

Posts: 35642
#5

30 Sep 2015, 06:23

I don't think any kind of test matches what I assume to be the major research question.

Furthermore, Kolmogorov-Smirnov is notoriously problematic when parameters are estimated from the data, precisely the case here (and usually!). What would the null hypothesis be?

The two programs used to fit distributions are both based on maximum likelihood, so any intermediate mathematical statistics book focused on likelihood should give some ideas.
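As one possible starting point along these lines: since lognfit and paretofit are both fitted by maximum likelihood, each should leave a log likelihood in e(ll), so the two fits can be set side by side with official Stata's estimates store and estimates stats. This is a sketch under that assumption, and the stored names logn and pareto are made up. The two models are not nested, so this is a descriptive comparison and not a likelihood-ratio test.

```stata
* Sketch only: assumes both commands post e(ll) and e(N),
* as ml-based commands normally do
lognfit Staff
estimates store logn

paretofit Staff
estimates store pareto

* Table of log likelihood, AIC and BIC for each stored fit
estimates stats logn pareto
```

A higher log likelihood, or a lower AIC or BIC, points to the better-fitting model on the same observations; it is worth checking that both fits report the same N.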

More to the scientific point is where each distribution fits well and where it fits badly. Some ideas on how to show that are at http://www.stata-journal.com/sjpdf.h...iclenum=gr0027
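A rough sketch of one such display, offered only as an illustration and not necessarily what the linked article does: plot fitted minus empirical cumulative probability against Staff, so that departures from zero show where each distribution over- or under-predicts. The variable names are hypothetical, and lognfit and paretofit are assumed to be installed from SSC.

```stata
* Sketch only: hypothetical variable names; lognfit and paretofit assumed installed
cumul Staff, gen(ecdf)
lognfit Staff, cdf(cdf_logn)
paretofit Staff, cdf(cdf_par)

* Fitted minus empirical cumulative probability
gen d_logn = cdf_logn - ecdf
gen d_par  = cdf_par  - ecdf

* Discrepancy against Staff on a log scale; the zero line marks perfect agreement
twoway line d_logn d_par Staff, sort xscale(log) yline(0) ///
    ytitle("Fitted minus empirical CDF") legend(order(1 "Lognormal" 2 "Pareto"))
```

Run this from a do-file, since the /// line continuation is not accepted in the Command window.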
Richard Hofler

Join Date: Apr 2014

Posts: 12
#6

01 Oct 2015, 07:24

Liza,

Will you post the code you used to generate the empirical distribution function and then plot it along with the other two distribution functions?

Thanks
Richard
Liza

Join Date: Nov 2014

Posts: 32
#7

06 Oct 2015, 04:11

Dear Richard Hofler,

I used the following code to generate the cumulative distribution functions:

Empirical data:

Code:

cumul Staff, gen(cum)

Log-normal fit:

Code:

lognfit Staff, cdf(cdfname) pdf(pdfname)

Power distribution:

Code:

paretofit Staff, cdf(cdfname) pdf(pdfname)

Then I plotted the cumulative distribution functions on the same graph, and I can easily see that the log-normal gives the best approximation. The problem is that one of the referees of the paper is asking for a measure of fit. I would like to know which measure of fit is most suitable in this case.

Best regards,

Liza Vieira