Interpretation of two-sample Kolmogorov-Smirnov test results

Million Tadesse

Join Date: Oct 2018

Posts: 13
#16

27 Nov 2018, 16:56

Hi Marcos and Nick,
Thank you very much for your responses. The number of observations is 128,745 (a relatively big sample, and hard to publish due to confidentiality issues, sorry). The kdensity graph looks OK (pasted as a new post due to space limitations), but visually it does not show a big difference between the two groups. I have pasted the table with its note below. It says "ties exist in combined dataset....". From the third line (combined test), it looks like the two groups differ, as you rightly stated, but the significant p-values in both line 1 and line 2 were not clear to me, since we are talking about the same groups (I guess). That is, if we reject the null in line 1 for group 1 being smaller than group 2, it does not seem correct to also reject the null in line 2 for group 1 being larger than group 2. Sorry if I am getting lost in the principle of this test.

Two-sample Kolmogorov-Smirnov test for equality of distribution functions

Smaller group       D       P-value
-----------------------------------
1:                0.0148     0.003
2:               -0.0155     0.002
Combined K-S:     0.0155     0.004

Note: Ties exist in combined dataset;
there are 15106 unique values out of 128745 observations.
Million Tadesse

Join Date: Oct 2018

Posts: 13
#17

27 Nov 2018, 17:12

Here is the graph as an attachment. I had to leave out the legend, but group 1 is the curve in red.
Attached Files

kdensity.pdf (177.8 KB, 1 view)
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#18

27 Nov 2018, 19:40

With a large sample size, it is well known that such tests can yield a "significant" p-value even when the two distributions look quite unremarkable, so to speak. That is the reason, I guess, why Nick asked you to present a graphical display of the data which, by the way, wouldn't pose any threat to confidentiality, since we'd have just a curve of, say, variable X.
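To make the sample-size point concrete, here is a minimal Python sketch (not Stata code). It uses the standard large-sample approximation for a one-sided two-sample K-S p-value, p ≈ exp(-2·m·D²) with m = n1·n2/(n1 + n2). The function name and the equal group sizes are hypothetical, chosen only for illustration. The actual group sizes were not reported in this thread, so the sketch is not expected to reproduce the p-values in the table.

```python
import math


def one_sided_ks_pvalue(d, n1, n2):
    """Asymptotic p-value for a one-sided two-sample K-S statistic d.

    Assumes the usual large-sample approximation exp(-2 * m * d^2),
    where m = n1 * n2 / (n1 + n2). Hypothetical helper for illustration.
    """
    m = n1 * n2 / (n1 + n2)
    return math.exp(-2.0 * m * d * d)


# Same small distance D (taken from line 1 of the table above),
# evaluated at increasingly large, purely hypothetical group sizes.
d = 0.0148
for n in (100, 1_000, 10_000, 50_000):
    p = one_sided_ks_pvalue(d, n, n)
    print(f"n1 = n2 = {n:>6}: p = {p:.6f}")
```

The distance D stays fixed while the p-value shrinks as the groups grow, which is why a visually negligible difference can still be "significant" with over a hundred thousand observations.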

The graph you shared as a PDF shows exactly this. (Please do read the FAQ and take some time to learn the best approach to sharing graphs in this forum, thanks.)

With regard to the 3 p-values, as I explained, the test is not saying that one group is bigger than the other, but that it contains smaller (or bigger) values when compared to the other. I really fail to see why you think this is a contradiction. Maybe the reason is that you keep thinking the hypothesis implies one group being smaller than the other, whereas the correct wording was explained a couple of times.
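As a rough illustration of why the two directional lines need not contradict each other, the Python sketch below (not Stata code) computes the largest positive and the largest negative difference between two empirical distribution functions. It assumes D is measured as F1(x) - F2(x), which matches the signs in the table (positive in line 1, negative in line 2). The function names and the toy data are hypothetical.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical distribution function of a sample."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def ks_directional(g1, g2):
    """Return (D+, D-, combined D) for F1 - F2 over all observed values."""
    f1, f2 = ecdf(g1), ecdf(g2)
    points = sorted(set(g1) | set(g2))
    diffs = [f1(x) - f2(x) for x in points]
    d_plus = max(diffs)    # where group 1 has accumulated more small values
    d_minus = min(diffs)   # where group 1 has accumulated fewer (larger values)
    combined = max(d_plus, -d_minus)
    return d_plus, d_minus, combined


# Toy data: group 1 is more spread out, so it holds both the
# smallest and the largest values, and the two ECDFs cross.
g1 = [1, 2, 9, 10]
g2 = [4, 5, 6, 7]
print(ks_directional(g1, g2))
```

With these toy values D+ is 0.5 and D- is -0.5: group 1 contains smaller values than group 2 in one part of the range and larger values in another, so both directional statistics are nonzero at once.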

Best regards,

Marcos
Million Tadesse

Join Date: Oct 2018

Posts: 13
#19

28 Nov 2018, 15:59

Thanks a lot, Marcos. I will handle attachments correctly in my posts next time. Meanwhile, when I say group 1 is smaller than group 2, I mean that the value in group 1 is smaller (line 1 test) or bigger (line 2 test) than the value in group 2. That means we are on the same page (I think).

Last edited by Million Tadesse; 28 Nov 2018, 16:10.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#20

28 Nov 2018, 19:05

I am afraid we may not yet be on the same page.

When it is said, quite appropriately, that "group 1 contains smaller values than for group 2", as we read in the Stata manual, we should translate it neither as group 1 being smaller than group 2 nor as "the value (what value?) in group" 1 being smaller than in group 2.

Last edited by Marcos Almeida; 28 Nov 2018, 19:20.

Best regards,

Marcos