  • interpretation of two sample Kolmogorov-Smirnov test results


    I would like to know how to interpret the results of the two-sample Kolmogorov-Smirnov test.
    I have two sample groups, 0 and 1. I entered the Stata command

    ksmirnov lpcry if a1==2006, by(group1)

    The test results are composed of three lines.

    Smaller group       D       P-value   Corrected
    0:                  0.1385    0.000
    1:                 -0.0041    0.955
    Combined K-S        0.1385    0.000     0.000

    I read the explanation of the KS test in the Stata manual. According to the manual, the first line tests whether the first group (group 0) is smaller than the second group (group 1), and the second line tests whether the first group is bigger than the second group.

    My question is what the null hypothesis is in the first, second, and third lines.
    Is the null hypothesis in the first line that the first group (group 0) is smaller than the second group (group 1)? And is the null hypothesis in the second line that the first group (group 0) is bigger than the second group (group 1)?

    The test results above suggest that we cannot accept the first null hypothesis and that we can accept the second. Is that right? If so, does the second group (group 1) first-order stochastically dominate the first group (group 0)?

    In addition, what is the null hypothesis in the third line?

    Please explain to me!

    Jaimin Lee

  • #2
    The null hypothesis for the final line is that the distributions are equal. It is a two-tailed test; the first two lines are the two uni-directional one-tailed tests.

    And, the absence of a statistically significant p-value does not endorse the null hypothesis. That can be done only if you have first ascertained you have enough power.
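
The two one-tailed lines and the combined line can be written in terms of the empirical distribution functions. This is a sketch in notation that is not from the thread: $F_0$ and $F_1$ are assumed to be the empirical CDFs of lpcry in groups 0 and 1, and the sign convention is inferred from the output in #1, where the combined D equals the larger absolute value of the two one-sided D values.

```latex
D^{+} = \max_x \{ F_0(x) - F_1(x) \}, \qquad
D^{-} = \min_x \{ F_0(x) - F_1(x) \}, \qquad
D = \max\left( |D^{+}|, |D^{-}| \right)
```

Under this reading, all three lines can be taken to share the null hypothesis of equal distributions ($F_0 = F_1$) and to differ only in the alternative. Line 0 is sensitive to $F_0$ lying above $F_1$ somewhere (group 0 containing smaller values), line 1 to $F_0$ lying below $F_1$ somewhere (group 0 containing larger values), and the combined line to either.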


    • #3
      I'm going to be more of a purist here: part of the DEFINITION of a p-value is "assuming that the null hypothesis is true", and thus it makes no sense to talk about "accepting" the null hypothesis; you can reject it or fail to reject it.


      • #4
        Dear Clyde and Rich

        Thank you for your kind comments.

        However, what are the null hypotheses of the first and second lines in the test?
        As Clyde told me, the null hypothesis of the last line is that the two distributions are equal. In my test, the p-value is almost 0.
        We can reject the null hypothesis, so these two distributions are not equal.
        I appreciate it, Clyde!

        The tests in the first and second lines are uni-directional, one-tailed tests.
        What are the null hypotheses in these tests?
        The first-line test has a low p-value, but the second-line test has a very large p-value.
        So we can reject the null hypothesis in the first-line test but cannot reject the null hypothesis in the second-line test.

        Is the null hypothesis in the first test that the first group (group 0 in my test) is smaller than the second group (group 1 in my test)?
        Is the null hypothesis in the second test that the second group (group 1 in my test) is smaller than the first group (group 0 in my test)?

        Please explain to me!

        I will appreciate your quick response.

        Jaimin


        • #5
          Hello Jaimin,

          Welcome to the Stata Forum.

          With regards to several of your questions, you can reap the "quick response" you wish, just by taking a look at the manual (http://www.stata.com/manuals14/rksmirnov.pdf).

          Below, you may read an extract which applies to your needs:

          We wish to use the two-sample Kolmogorov-Smirnov test to determine if there are any differences in the distribution of x for these two groups:

          . ksmirnov x, by(group)

          Two-sample Kolmogorov-Smirnov test for equality of distribution functions

          Smaller group       D       P-value
          1:                  0.5000    0.424
          2:                 -0.1667    0.909
          Combined K-S:       0.5000    0.785

          The first line tests the hypothesis that x for group 1 contains smaller values than for group 2. The largest difference between the distribution functions is 0.5. The approximate asymptotic p-value for this is 0.424, which is not significant. The second line tests the hypothesis that x for group 1 contains larger values than for group 2. The largest difference between the distribution functions in this direction is 0.1667. The approximate asymptotic p-value for this small difference is 0.909. Finally, the approximate asymptotic p-value for the combined test is 0.785. The approximate p-values ksmirnov calculates are based on the five-term approximation of the asymptotic distributions derived by Smirnov (1933). These approximations are not good for small samples (n < 50). They are too conservative.
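
The small-sample caveat in the extract can be acted on with the exact option of ksmirnov. A minimal sketch, assuming the variable names x and group from the manual's example:

```stata
* Default run: the three lines report asymptotic p-values
ksmirnov x, by(group)

* Ask for an exact p-value for the combined test as well.
* This is meant for small samples and may be slow for large ones.
ksmirnov x, by(group) exact

* List everything the last ksmirnov run stored in r(), so the
* statistics and p-values can be reused without retyping them
return list
```

The exact option applies to the combined test; the two one-sided lines still rely on the asymptotic approximation.
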
          Hopefully that helps.
          Best regards,

          Marcos


          • #6
            Dear Marcos,

            Thank you for your explanation!

            Jaimin


            • #7
              Orthogonally to this, these tests appear fairly useless in practice. Either there's a significant difference in which case it's vital to move to establish what it is, or there isn't, in which case it's likely that the sample size is not large enough. I exaggerate slightly, but only slightly.

              I'd proceed to looking at distributions directly.
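
One way to look at the distributions directly is to overlay the two empirical distribution functions. This is a sketch only, using the variable names from #1 (lpcry, a1, group1); the new variable names F0 and F1 are made up for illustration, and the `///` continuation assumes the commands are run from a do-file.

```stata
* Empirical CDF of lpcry within each group, 2006 observations only
cumul lpcry if a1 == 2006 & group1 == 0, generate(F0)
cumul lpcry if a1 == 2006 & group1 == 1, generate(F1)

* Overlay the two step functions; sort draws each line left to right
twoway (line F0 lpcry, sort connect(J))                    ///
       (line F1 lpcry, sort connect(J)),                   ///
       ytitle("Cumulative probability")                    ///
       legend(order(1 "group1 = 0" 2 "group1 = 1"))
```

The largest vertical gap between the two curves corresponds to the D reported by ksmirnov, and the plot also shows where along the scale of lpcry that gap occurs.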


              • #8
                I absolutely agree with the issue over such tests, Nick. Also, with the need to underline the distribution graphically. Anyway, when demanded (sometimes, hard-pressed), I tend to stick to Shapiro-Francia's instead of Kolmogorov-Smirnov's.
                Best regards,

                Marcos


                • #9
                  Originally posted by Marcos Almeida
                  Hello Jaimin,

                  Welcome to the Stata Forum.

                  With regards to several of your questions, you can reap the "quick response" you wish, just by taking a look at the manual (http://www.stata.com/manuals14/rksmirnov.pdf).

                  Below, you may read an extract which applies to your needs:



                  Hopefully that helps.
                  What if you get a small p-value in both cases? (Or can this scenario not happen?) I am a bit confused by this test now.


                  • #10
                    Two-sample Kolmogorov-Smirnov test for equality of distribution functions

                    Smaller group       D       P-value
                    -----------------------------------
                    1:                  0.0148    0.003
                    2:                 -0.0155    0.002
                    Combined K-S:       0.0155    0.004


                    • #11
                      You must read the manual; otherwise you risk asking about issues that are already well clarified.

                      Please also read the FAQ, particularly the topic about sharing commands, data, and output.

                      Also, please read this FAQ advice:

                      3. What should I do before I post?

                      Before posting, consider other ways of finding information:
                      • the online help for Stata
                      • Stata's search command, which can tell you about all built-in Stata commands, all ado-files published in the Stata Journal, all FAQs on the Stata website, www.stata.com, and user-written Stata programs available on the Internet (if you have Stata 12 or earlier, you can use findit to search all these sources at once)
                      • the manuals, accessible in .pdf form to all
                      • [...]
                      That being said, you don't need to be confused at all, for the very first example of the - ksmirnov - command in the manual fully clarifies the interpretation of the 3 p-values.

                      If this is not enough, in #5 there is a full explanation about this matter.

                      Hopefully that helps.
                      Last edited by Marcos Almeida; 23 Nov 2018, 17:05.
                      Best regards,

                      Marcos


                      • #12
                        Hello Marcos,
                        Thanks for your reply. I actually read the manual and post #5. Here is the question again: what if you get a small p-value in both cases? (Or can this scenario not happen?) See the result below again:

                        Two-sample Kolmogorov-Smirnov test for equality of distribution functions

                        Smaller group       D       P-value
                        -----------------------------------
                        1:                  0.0148    0.003
                        2:                 -0.0155    0.002
                        Combined K-S:       0.0155    0.004

                        I think the manual does not cover a case like this. If it does, please let me know. Thanks.


                        • #13
                          Surely the Manual explains:

                          We wish to use the two-sample Kolmogorov-Smirnov test to determine if there are any differences in the distribution of x for these two groups. [...] The first line tests the hypothesis that x for group 1 contains smaller values than for group 2. [...] The approximate asymptotic p-value for this is 0.424, which is not significant. The second line tests the hypothesis that x for group 1 contains larger values than for group 2. [...] The approximate asymptotic p-value for this small difference is 0.909. Finally, the approximate asymptotic p-value for the combined test is 0.785. The approximate p-values ksmirnov calculates are based on the five-term approximation of the asymptotic distributions derived by Smirnov (1933).
                          Best regards,

                          Marcos


                          • #14
                            Nothing will be as informative here as a plot of your two distribution functions -- or in my view preferably the quantile functions. Can you post the data? What is the sample size?
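
A quantile-quantile comparison of the two groups, together with the group sizes, could be produced along these lines. The variable names are assumptions, since the thread does not give them: x stands for the outcome and group for a grouping variable coded 1 and 2, as in the output in #10.

```stata
* Assumed names: x is the outcome, group takes the values 1 and 2

* Sample size in each group, counting only non-missing x
tabulate group if !missing(x)

* separate creates x1 and x2, one variable per value of group
separate x, by(group)

* Plot the quantiles of group 1 against the quantiles of group 2
qqplot x1 x2
```

Points close to the reference line throughout suggest similar distributions; points that lie on one side of the line at low values and on the other side at high values indicate that the two distribution functions cross.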


                            • #15
                              Considering we get p < 0.05 for all three lines, I believe the interpretation is as follows.

                              The first line tests the hypothesis that the variable X for group 1 "contains smaller values" (nota bene: it is not saying "is smaller", as you remarked in #1) than for group 2; a significant result indicates a between-group difference in the distribution in this respect. The second line does the same for larger values; a significant result indicates a between-group difference in the distribution in this respect as well. The third line concerns the combined test, whose p-value is also statistically significant. Statistically speaking, if we adopt this strategy (and not the graphical approach suggested by Nick in #14, which seems to be more informative), we can say that the distribution of the variable X differs between the two groups.
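
The situation where both one-sided lines are significant can be reproduced with simulated data in which the two distribution functions cross. The following is an illustration under made-up assumptions, not a reconstruction of the data behind #10: two normal samples with the same mean and different standard deviations, with arbitrary seed and sample sizes.

```stata
* Simulated example: same centre, different spread
clear
set seed 12345
set obs 2000

* First 1000 observations are group 1, the rest are group 2
generate byte group = 1 + (_n > 1000)

* Group 1 is N(0, 1); group 2 is N(0, 2), so it has heavier tails
generate x = cond(group == 1, rnormal(0, 1), rnormal(0, 2))

* The CDFs cross near zero, so neither group is uniformly "smaller"
ksmirnov x, by(group)
```

With samples this large, one would expect both one-sided p-values and the combined p-value to be small: group 2 has more mass in the lower tail and also more mass in the upper tail, so each one-sided statistic picks up a real gap.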

                              Hopefully that helps.
                              Best regards,

                              Marcos
