Testing for a uniform distribution

harry smith

#1

01 Dec 2017, 06:33

Hi,

I have a bar chart of some data which clearly visually demonstrates that the data is not uniform.

Is there a statistical test within Stata that can show this?

I have tried the following:

Code:

ksmirnov x = runiform()

However, I do not think this is correct.
Nick Cox

#2

01 Dec 2017, 06:40

I'd use

Code:

quantile

for this purpose unless someone in power obliges you to wave a P-value back at them.
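A minimal sketch of the quantile suggestion. It assumes the rolls are held in a variable called diceroll (the name used later in the thread); the simulated fair die and the seed are illustrative only. quantile plots the ordered values against the fraction of the data, so for a discrete variable each face appears as a flat step whose width equals that face's share of the rolls. Roughly equal step widths suggest uniformity, while unequal widths show which faces are over- or under-represented.

```stata
* Sketch: quantile plot of die rolls, using a simulated fair die for comparison.
* diceroll is the variable name used later in the thread; the seed is arbitrary.
clear
set seed 2017
set obs 100
generate byte diceroll = runiformint(1, 6)   // fair die, faces 1 to 6

* Ordered values against the fraction of the data: each face shows up as a
* flat step whose width equals its share of the rolls.
quantile diceroll
```

Running the same command on the real data and comparing the step widths with this simulated plot gives a quick visual check before any formal test.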
harry smith

#3

01 Dec 2017, 09:21

Thanks Nick. The quantile command seems the easiest way to show the non-uniformity. However, you're right: ideally I need a p-value.
Dave Airey

#4

01 Dec 2017, 09:59

You need ksmirnov x = x.

https://www.statalist.org/forums/for...-ksmirnov-test
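To unpack this: the one-sample form of ksmirnov takes, after the equals sign, an expression giving the hypothesized cumulative distribution function evaluated at the variable itself, not a fresh set of random draws such as runiform(). A minimal sketch, assuming a continuous variable that should be uniform on [0, 1] (so the CDF at x is just x) or on a hypothetical range [a, b] (CDF (x - a)/(b - a)); the data are simulated. As Nick points out below, this test is not the right tool for discrete values such as die rolls.

```stata
* Sketch: one-sample Kolmogorov-Smirnov test against a continuous uniform.
* x, z and the endpoints a and b are hypothetical; the data are simulated.
clear
set seed 2017
set obs 200
generate double x = runiform()          // values on [0, 1)

* Uniform on [0, 1]: the CDF at x is x itself.
ksmirnov x = x

* Uniform on [a, b]: the CDF at z is (z - a)/(b - a).
generate double z = 2 + 3 * x           // simulated values on [2, 5)
local a = 2
local b = 5
ksmirnov z = (z - `a') / (`b' - `a')
```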
harry smith

#5

01 Dec 2017, 10:26

Hi Dave, thanks for the response. Apologies, but I still don't quite understand. The variable I am trying to test for uniformity is "diceroll", the result of a standard dice with faces 1 to 6 (the proportion of 5's and 6's is far higher than that of 1's and 2's, so it is quite clearly non-uniform).

What would the command be here?

Nick Cox

01 Dec 2017, 10:54

I think in statistical circles most people would read "uniform" as implying "continuous uniform": the qualifier "discrete" is thus essential in your case.

Kolmogorov-Smirnov isn't (especially) appropriate for discrete uniforms. (It seems vastly oversold to me in any case, but we won't go there.)

A chi-square test on the frequencies should satisfy most devotees of the P-value here.
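For reference, here are the two statistics in the chitest output that follows, written for k categories with observed counts O_i and expected counts E_i = N/k. The Pearson statistic is referred to a chi-square distribution with k - 1 degrees of freedom, a large-sample approximation that needs expected counts that are not too small. The worked arithmetic uses the simulated counts in that example (19, 18, 19, 13, 10, 21) and reproduces its Pearson chi2(5) = 5.36 and likelihood-ratio 5.7589.

```latex
\begin{aligned}
X^2 &= \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \sum_{i=1}^{k} \frac{O_i^2}{E_i} - N,
  \qquad E_i = \frac{N}{k}, \qquad X^2 \approx \chi^2_{k-1} \text{ under } H_0 \\
G^2 &= 2 \sum_{i=1}^{k} O_i \ln\frac{O_i}{E_i} \\[4pt]
N = 100,\ k = 6:\quad \sum_{i} O_i^2 &= 19^2 + 18^2 + 19^2 + 13^2 + 10^2 + 21^2 = 1756 \\
X^2 &= \frac{1756 \times 6}{100} - 100 = 105.36 - 100 = 5.36, \qquad \text{df} = 5 \\
G^2 &\approx 5.759
\end{aligned}
```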

Oddly, official Stata seems to fall short in this territory, but community-contributed efforts can be found. Consider

Code:

. clear

. set obs 100
number of observations (_N) was 0, now 100

. gen y = runiformint(1, 6)

. tab y

          y |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         19       19.00       19.00
          2 |         18       18.00       37.00
          3 |         19       19.00       56.00
          4 |         13       13.00       69.00
          5 |         10       10.00       79.00
          6 |         21       21.00      100.00
------------+-----------------------------------
      Total |        100      100.00

. chitest y, count sep(0)

observed frequencies of y; expected frequencies equal

         Pearson chi2(5) =   5.3600   Pr =  0.374
likelihood-ratio chi2(5) =   5.7589   Pr =  0.330

  +-----------------------------------------------+
  | y   observed   expected   obs - exp   Pearson |
  |-----------------------------------------------|
  | 1         19     16.667       2.333     0.572 |
  | 2         18     16.667       1.333     0.327 |
  | 3         19     16.667       2.333     0.572 |
  | 4         13     16.667      -3.667    -0.898 |
  | 5         10     16.667      -6.667    -1.633 |
  | 6         21     16.667       4.333     1.061 |
  +-----------------------------------------------+

where chitest can be used after

Code:

ssc inst tab_chi
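For anyone who cannot install from SSC, here is a minimal sketch that reproduces the two statistics above using only official Stata. It assumes the data have already been reduced to one row per face with a count; the names face and freq are hypothetical, and the counts are those tabulated above. With raw rolls, contract would produce such a dataset, but any face that never occurs would then be absent and would need to be added by hand with a zero count.

```stata
* Sketch: Pearson and likelihood-ratio goodness-of-fit tests of a discrete
* uniform, official Stata only. face and freq are hypothetical names;
* the counts are those tabulated above (N = 100 rolls of a six-sided die).
clear
input byte face int freq
1 19
2 18
3 19
4 13
5 10
6 21
end

quietly summarize freq
local n = r(sum)                          // total number of rolls
local k = _N                              // number of faces (categories)
local df = `k' - 1

generate double expected = `n' / `k'      // equal expected frequencies
generate double pearson = (freq - expected) / sqrt(expected)   // Pearson residual
generate double chi2term = pearson^2
* A zero count makes lrterm missing; summarize skips missing values, which
* matches the convention that 0 * ln(0) contributes nothing.
generate double lrterm = 2 * freq * ln(freq / expected)

quietly summarize chi2term
scalar chi2_pearson = r(sum)
quietly summarize lrterm
scalar chi2_lr = r(sum)

format expected pearson %8.3f
list face freq expected pearson, noobs sep(0)
display "Pearson chi2(`df') = " %8.4f chi2_pearson "   Pr = " %5.3f chi2tail(`df', chi2_pearson)
display "likelihood-ratio chi2(`df') = " %8.4f chi2_lr "   Pr = " %5.3f chi2tail(`df', chi2_lr)
```

The residual column should match the Pearson column in the chitest table, and the sums should match its two test statistics.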


Dave Airey

#7

01 Dec 2017, 11:18

Oops, yes, thanks Nick.

Dave
Nick Cox

#8

01 Dec 2017, 11:24

Pedantry corner. One die, two dice.
harry smith

#9

01 Dec 2017, 11:24

Thanks Nick & Dave. I used chitest and duly got the p-value I needed.

Ayaz Zeynalov

#10

01 Aug 2018, 04:20

Originally posted by Nick Cox (quoting the chitest example above)

Dear Nick,

Is it possible to specify the "expected" values in your example as pre-defined values rather than equal ones? Thanks!

Regards,
Ayaz


Nick Cox

#11

01 Aug 2018, 04:26

Code:

help chitest

makes explicit that you can and gives a detailed worked example. See also chitesti, tabchi and tabchii in the same package.
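For comparison, a minimal sketch of the same idea in official Stata: the expected frequencies are computed from pre-specified proportions instead of N/k. The proportions below (1/8 for faces 1 to 4 and 1/4 for faces 5 and 6) are purely hypothetical, chosen only to illustrate the mechanics, as are the names face, freq and prop. The degrees of freedom stay at k - 1 because no parameters are estimated from the data.

```stata
* Sketch: Pearson goodness-of-fit test with pre-specified (unequal) expected
* frequencies, official Stata only. The proportions in prop are hypothetical.
clear
input byte face int freq double prop
1 19 .125
2 18 .125
3 19 .125
4 13 .125
5 10 .25
6 21 .25
end

quietly summarize prop
assert reldif(r(sum), 1) < 1e-12           // proportions must sum to 1

quietly summarize freq
local n = r(sum)                           // total number of rolls
local df = _N - 1                          // no parameters estimated from the data

generate double expected = `n' * prop
generate double chi2term = (freq - expected)^2 / expected
quietly summarize chi2term
scalar chi2_pearson = r(sum)

list face freq expected, noobs sep(0)
display "Pearson chi2(`df') = " %8.4f chi2_pearson "   Pr = " %6.4f chi2tail(`df', chi2_pearson)
```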
paulvonhippel

#12

27 May 2021, 11:20

I'm looking for a statistic that would quantify goodness of fit to the discrete uniform in a way that is independent of sample size. The chi-square test for uniformity will reject the uniform given small departures in large samples, but fail to reject it given large departures in small samples.

A couple of possibilities come to mind:
(1) Divide the chi-square statistic by N.
(2) See if the mean and SD of the empirical distribution are close to those predicted for the discrete uniform.

But maybe there are better options.
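A sketch of the two ideas, applied to the die counts tabulated earlier in the thread (the names face and freq are hypothetical). For (1), chi2/N is the square of Cohen's w. Dividing instead by N(k - 1), which is the largest value the statistic can take under equal expected frequencies (all observations in one category), rescales it to [0, 1]; whether this matches the measure in the Johnston, Berry and Mielke (2006) paper cited later in the thread is not checked here. For (2), a discrete uniform on 1, ..., k has mean (k + 1)/2 and SD sqrt((k^2 - 1)/12).

```stata
* Sketch: sample-size-free summaries of departure from a discrete uniform.
* face and freq are hypothetical names; counts as tabulated earlier.
clear
input byte face int freq
1 19
2 18
3 19
4 13
5 10
6 21
end

quietly summarize freq
local n = r(sum)                          // total number of rolls
local k = _N                              // number of categories
generate double expected = `n' / `k'
generate double chi2term = (freq - expected)^2 / expected
quietly summarize chi2term
scalar chi2stat = r(sum)

* Idea (1): divide by N; sqrt(chi2/N) is Cohen's w.
scalar phi2 = chi2stat / `n'
scalar cohen_w = sqrt(phi2)
* With equal expected counts the maximum of chi2 is N(k - 1), reached when
* all observations fall in one category, so this ratio lies in [0, 1].
scalar chi2_rel = chi2stat / (`n' * (`k' - 1))

* Idea (2): compare the frequency-weighted mean and SD with those of a
* discrete uniform on 1, ..., k. r(sd) uses the N - 1 divisor.
quietly summarize face [fweight = freq]
scalar mean_obs = r(mean)
scalar sd_obs = r(sd)
scalar mean_unif = (`k' + 1) / 2
scalar sd_unif = sqrt((`k'^2 - 1) / 12)

display "chi2/N = " %6.4f phi2 "   w = " %6.4f cohen_w "   chi2/(N(k-1)) = " %6.4f chi2_rel
display "mean: observed " %5.3f mean_obs ", uniform " %5.3f mean_unif
display "SD:   observed " %5.3f sd_obs ", uniform " %5.3f sd_unif
```

Note that idea (2) can miss departures that leave the mean and SD unchanged, for example a distribution symmetric about 3.5 with the right spread, whereas the chi-square based ratios respond to any difference in the face frequencies.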
Maarten Buis

#13

27 May 2021, 15:08

Maybe a *IC measure?
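One way to read this suggestion, offered as an interpretation rather than a stated method: compare the uniform model (no free parameters) with the saturated multinomial model (k - 1 free parameters) by AIC or BIC. The multinomial coefficient cancels, so the log-likelihood difference is half the likelihood-ratio statistic G^2 that chitest reports; a negative difference favours the uniform. For the die counts in the thread (G^2 = 5.7589, k = 6, N = 100) this gives roughly -4.24 for AIC and -17.27 for BIC. BIC's penalty still grows with N, so this does not fully remove the dependence on sample size.

```latex
\begin{aligned}
\ln L_{\text{sat}} - \ln L_{\text{unif}}
  &= \sum_{i=1}^{k} O_i \ln\frac{O_i}{N} - \sum_{i=1}^{k} O_i \ln\frac{1}{k}
   = \sum_{i=1}^{k} O_i \ln\frac{O_i}{E_i} = \tfrac{1}{2} G^2 \\
\Delta\mathrm{AIC} &= \mathrm{AIC}_{\text{unif}} - \mathrm{AIC}_{\text{sat}} = G^2 - 2(k - 1) \approx 5.7589 - 10 \approx -4.24 \\
\Delta\mathrm{BIC} &= \mathrm{BIC}_{\text{unif}} - \mathrm{BIC}_{\text{sat}} = G^2 - (k - 1)\ln N \approx 5.7589 - 23.026 \approx -17.27
\end{aligned}
```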

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Mike Lacy

#14

28 May 2021, 12:31

There is a published "effect size" measure for the chi-squared goodness-of-fit test, and I believe it fits what is wanted here:

Johnston, J.E., Berry, K.J. and Mielke Jr, P.W., 2006. Measures of effect size for chi-squared and likelihood-ratio goodness-of-fit tests. Perceptual and Motor Skills, 103(2), pp.412-414.
Felix Bittmann

#15

29 May 2021, 03:43

Another option that comes to mind is mgof, an ado by Ben Jann. I have never used it myself, but it has many options and may be helpful; see https://journals.sagepub.com/doi/10....867X0800800201

Best wishes

Stata 18.0 MP | ORCID | Google Scholar