  • Testing for a uniform distribution

    Hi,

    I have a bar chart of some data which clearly visually demonstrates that the data is not uniform.

    Is there a statistical test within Stata that can show this?

    I have tried the following:

    "ksmirnov x = runiform()"

    However, I do not think this is correct.

  • #2
    I'd use

    Code:
    quantile
    for this purpose unless someone in power obliges you to wave a P-value back at them.


    • #3
      Thanks Nick. Quantile seems the easiest method to show the non-uniformity. However, you're correct - ideally I need a p-value.


      • #4
        You need ksmirnov x = x.

        https://www.statalist.org/forums/for...-ksmirnov-test
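
        For a continuous variable, the expression after the equals sign in the one-sample form of ksmirnov is the hypothesised cumulative distribution function evaluated at each observation, which is why runiform() in #1 does not do the job. A minimal sketch, assuming x is continuous and the hypothesised range is [a, b]; the simulated data, the seed and the bounds 0 and 10 are made up for illustration:

        Code:
        * simulated data, for illustration only
        clear
        set seed 2023
        set obs 200
        gen x = 10 * runiform()

        * the CDF of a continuous uniform on [a, b] is (x - a) / (b - a)
        local a = 0
        local b = 10
        ksmirnov x = (x - `a') / (`b' - `a')

        With a = 0 and b = 1 the expression reduces to x, which gives ksmirnov x = x.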


        • #5
          Hi Dave, thanks for the response. Apologies, but I still don't quite understand. The variable I am trying to test for uniformity is "diceroll", which is a standard dice with values 1-6 (where the proportion of 5's and 6's is far higher than that of 1's and 2's, so it is quite clearly non-uniform).

          What would the command be here?


          • #6
            I think in statistical circles most people would read "uniform" as implying "continuous uniform": the qualifier "discrete" is thus essential in your case.

            Kolmogorov-Smirnov isn't (especially) appropriate for discrete uniforms. (It seems vastly oversold to me in any case, but we won't go there.)

            A chi-square test on the frequencies should satisfy most devotees of the P-value here.

            Oddly official Stata seems to fall short in this territory, but community-contributed efforts can be found. Consider

            Code:
            . clear
            
            . set obs 100
            number of observations (_N) was 0, now 100
            
            . gen y = runiformint(1, 6)
            
            . tab y
            
                      y |      Freq.     Percent        Cum.
            ------------+-----------------------------------
                      1 |         19       19.00       19.00
                      2 |         18       18.00       37.00
                      3 |         19       19.00       56.00
                      4 |         13       13.00       69.00
                      5 |         10       10.00       79.00
                      6 |         21       21.00      100.00
            ------------+-----------------------------------
                  Total |        100      100.00
            
            . chitest y, count sep(0)
            
            observed frequencies of y; expected frequencies equal
            
                     Pearson chi2(5) =   5.3600   Pr =  0.374
            likelihood-ratio chi2(5) =   5.7589   Pr =  0.330
            
              +-----------------------------------------------+
              | y   observed   expected   obs - exp   Pearson |
              |-----------------------------------------------|
              | 1         19     16.667       2.333     0.572 |
              | 2         18     16.667       1.333     0.327 |
              | 3         19     16.667       2.333     0.572 |
              | 4         13     16.667      -3.667    -0.898 |
              | 5         10     16.667      -6.667    -1.633 |
              | 6         21     16.667       4.333     1.061 |
              +-----------------------------------------------+
            where chitest can be used after

            Code:
            ssc inst tab_chi
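
            If installing a package is not an option, the Pearson statistic can also be reproduced with official commands only. A rough sketch on the same kind of simulated data; the seed and the variable names observed, n, expected, contrib and x2 are arbitrary choices for illustration. Note that contract drops any face that never occurs, so check that all six are present before trusting the degrees of freedom.

            Code:
            clear
            set seed 12345
            set obs 100
            gen y = runiformint(1, 6)

            * one row per face, holding its observed count
            contract y, freq(observed)

            * equal expected counts under the discrete uniform
            egen n = total(observed)
            gen expected = n / _N
            gen contrib = (observed - expected)^2 / expected
            egen x2 = total(contrib)

            * degrees of freedom = number of categories - 1
            scalar df = _N - 1
            display "Pearson chi2(" df ") = " %8.4f x2[1] "   Pr = " %5.3f chi2tail(df, x2[1])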


            • #7
              Oops, yes thanks Nick.

              Dave


              • #8
                Pedantry corner. One die, two dice.


                • #9
                  Thanks Nick & Dave. Utilised the 'chitest', and inevitably got the needed p-value.


                  • #10
                    Dear Nick,

                    Is it possible in your example to specify pre-defined "expected" values rather than equal ones? Thanks!

                    Regards,
                    Ayaz


                    • #11
                      Code:
                      help chitest 
                      makes explicit that you can and gives a detailed worked example. See also chitesti, tabchi and tabchii in the same package.
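
                      To show the idea without relying on any particular chitest syntax (the help file is the authority for that), the same statistic with unequal expected frequencies can be computed by hand using official commands. The probabilities below are invented for a hypothetical loaded die and must sum to 1; the seed and the variable names are likewise only for illustration.

                      Code:
                      clear
                      set seed 12345
                      set obs 100
                      gen y = runiformint(1, 6)

                      * one row per face, holding its observed count
                      contract y, freq(observed)

                      * hypothesised probabilities (made up): 0.125 for faces 1-4, 0.25 for faces 5-6
                      gen p = cond(y <= 4, 0.125, 0.25)

                      egen n = total(observed)
                      gen expected = n * p
                      gen contrib = (observed - expected)^2 / expected
                      egen x2 = total(contrib)

                      scalar df = _N - 1
                      display "Pearson chi2(" df ") = " %8.4f x2[1] "   Pr = " %5.3f chi2tail(df, x2[1])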


                      • #12
                        I'm looking for a statistic that would quantify the goodness of fit to the discrete uniform in a way that is independent of sample size. The chi-square test for uniformity will reject the uniform given small departures in large samples, but fail to reject it given large departures in small samples.

                        A couple of possibilities come to mind:
                        (1) Divide the chi-square statistic by N.
                        (2) See if the mean and SD of the empirical distribution are close to those predicted for the discrete uniform.

                        But maybe there are better options....
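
                        For concreteness, both possibilities can be sketched on simulated rolls of a six-sided die. The data, seed and variable names are made up for illustration. The square root of chi-square / N is the quantity usually called Cohen's w, and a discrete uniform on 1..k has mean (k + 1)/2 and variance (k^2 - 1)/12. Matching mean and SD is necessary but not sufficient for uniformity, so (2) is at best a rough check.

                        Code:
                        clear
                        set seed 12345
                        set obs 100
                        gen y = runiformint(1, 6)

                        * (2) sample mean and SD against the discrete uniform on 1..6
                        summarize y
                        scalar mu = (1 + 6) / 2
                        scalar sigma = sqrt((6^2 - 1) / 12)
                        display "theoretical mean = " %6.4f mu "   theoretical SD = " %6.4f sigma

                        * (1) Pearson chi-square divided by N
                        contract y, freq(observed)
                        egen n = total(observed)
                        gen expected = n / _N
                        gen contrib = (observed - expected)^2 / expected
                        egen x2 = total(contrib)
                        scalar ratio = x2[1] / n[1]
                        scalar w = sqrt(ratio)
                        display "chi2 / N = " %6.4f ratio "   w = " %6.4f w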


                        • #13
                          Maybe an *IC measure (for example AIC or BIC)?
                          ---------------------------------
                          Maarten L. Buis
                          University of Konstanz
                          Department of history and sociology
                          box 40
                          78457 Konstanz
                          Germany
                          http://www.maartenbuis.nl
                          ---------------------------------


                          • #14
                            There is a published "effect size" measure for the chi-squared goodness of fit test, and I believe this fits what is wanted here:

                            Johnston, J.E., Berry, K.J. and Mielke Jr, P.W., 2006. Measures of effect size for chi-squared and likelihood-ratio goodness-of-fit tests. Perceptual and Motor Skills, 103(2), pp. 412-414.


                            • #15
                              Another thing that comes to my mind is mgof, an ado by Ben Jann. Never used it myself but it has many options, maybe it is helpful, see https://journals.sagepub.com/doi/10....867X0800800201
                              Best wishes

                              (Stata 16.1 MP)
