I am interested in comparing the distribution of some observed counts with their theoretical distribution. One popular law is Benford's, and Ben Jann wrote a Stata Journal paper along with the command
mgof at http://www.stata-journal.com/sjpdf.h...iclenum=st0142
I am not sure whether it is appropriate to post this here, but I would like to check whether my understanding of the paper is right.
First, I was unable to get the digits.dta mentioned in the paper using the following:
Code:
mata mata mlib index
use digits, clear
Second, I ran the subsequent example from the paper (the result is reported on page 153):
Code:
mgof firstdigit = log10(1+1/firstdigit), cr percent
I tried to calculate the Pearson's X2 reported by mgof manually, using the formula on page 152.
But the manual calculation does not yield the reported result (2.000 compared with 6.2266). I take the observed frequency as f_j and the expected frequency as h_j, then apply the formula.
This formula works fine with the normal distribution and with all the examples listed in help mgof, but somehow it does not work for the example provided in this paper.
Have I done anything wrong?
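For reference, here is a minimal sketch of the textbook Pearson statistic, X2 = sum over j of (f_j - h_j)^2 / h_j, written in Python only for illustration. It is not mgof's own code, the counts are hypothetical (I do not have digits.dta), and the name pearson_x2 is made up for this sketch. It shows one thing that may be worth checking: feeding percentages instead of raw counts into the formula rescales the statistic by 100/n.

```python
import math

# Benford first-digit probabilities: P(d) = log10(1 + 1/d) for d = 1..9
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]

def pearson_x2(observed, expected):
    """Textbook Pearson statistic: sum of (f_j - h_j)^2 / h_j."""
    return sum((f - h) ** 2 / h for f, h in zip(observed, expected))

# Hypothetical first-digit counts (NOT the data from the paper)
counts = [310, 170, 120, 100, 80, 70, 55, 50, 45]
n = sum(counts)

# Version 1: observed counts against expected counts n * P(d)
expected_counts = [n * p for p in benford]
x2_counts = pearson_x2(counts, expected_counts)

# Version 2: the same data expressed as percentages
percents = [100 * f / n for f in counts]
expected_percents = [100 * p for p in benford]
x2_percents = pearson_x2(percents, expected_percents)

print("X2 from counts:     ", round(x2_counts, 4))
print("X2 from percentages:", round(x2_percents, 4))
print("ratio:              ", round(x2_percents / x2_counts, 4), "= 100/n =", 100 / n)
```

If the manual calculation used the percentages displayed by the percent option while the reported statistic is based on counts, the two numbers would differ by exactly this factor. That is only a possibility to rule out, not a confirmed explanation.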
My motivation for the manual calculation is that I feel the graph and the tests provided by mgof regarding Benford's distribution are not in line with each other. Using digdis, another command related to mgof, I produced this table of results:
I look at the observed and expected frequencies and reckon that they are pretty close; in fact, the MAD is small too. The graph also shows a close resemblance between the observed and expected frequencies. I expected that the Pearson's X2 test would not reject the null hypothesis that the data follow Benford's distribution.
Yet the reported Pearson's X2 is huge and the corresponding p-value is very small, so the test rejects the null hypothesis.
If I apply the formula from the Stata article, I cannot reproduce the reported Pearson's X2.
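A small sketch of why a close-looking graph and a huge X2 can coexist, again in Python and with made-up numbers: for fixed observed shares, the Pearson statistic grows in proportion to the sample size n, while the MAD of the shares does not change. The shares, the helper x2_for_n, and the sample sizes below are all assumptions for illustration; 15.507 is the 5% critical value of the chi-squared distribution with 8 degrees of freedom.

```python
import math

# Benford first-digit probabilities
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Hypothetical observed shares: Benford plus small deviations that sum to zero
shift = [0.004, -0.003, 0.002, -0.002, 0.001, -0.001, 0.001, -0.001, -0.001]
shares = [p + s for p, s in zip(benford, shift)]

# Mean absolute deviation between observed and expected shares (independent of n)
mad = sum(abs(o - p) for o, p in zip(shares, benford)) / len(benford)

def x2_for_n(n):
    """Pearson X2 when the same shares are observed in a sample of size n."""
    return sum((n * o - n * p) ** 2 / (n * p) for o, p in zip(shares, benford))

CRITICAL_5PCT_DF8 = 15.507  # chi-squared 95th percentile, 8 degrees of freedom

print("MAD of shares:", round(mad, 5))
for n in (1_000, 100_000, 1_000_000):
    x2 = x2_for_n(n)
    print(n, round(x2, 3), "reject" if x2 > CRITICAL_5PCT_DF8 else "do not reject")
```

With the same shares, the test does not reject at n = 1,000 but does at n = 100,000, so a large dataset could produce a very large X2 even when the bars in the graph look almost identical. Whether that is what happens in my data depends on its actual n.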
My questions are:
1. Can anyone please explain how the reported Pearson's X2 is calculated in mgof?
2. What should I conclude from my dataset? From the graph, it seems clear that the observed frequencies follow the theoretical distribution, yet the Pearson's X2 test concludes the opposite.
Many thanks in advance.