fisher's exact test does not reproduce the results in the literature

Jasmine Xu

Join Date: Jul 2019

Posts: 33
#1

23 Dec 2019, 07:19

Hello,

I am trying to reproduce results from the literature using Fisher's exact test to compare the distributions of two independent samples.

Here is the data description:

Let's call the data from one paper "sample one" and the data from another paper "sample two".
Both papers measure the same thing.
In both samples, six types of subjects are identified: level 0, level 1, level 2, level 3, level 4, and unidentified.
In sample one, there are 116 subjects; the proportions of the types are 5.17%, 23.28%, 26.72%, 21.55%, 22.41%, and 0.86%, respectively.
In sample two, there are 179 subjects; the proportions of the types are 3.91%, 14.53%, 27.93%, 21.23%, 17.32%, and 15.08%, respectively.

The paper itself says "If the unidentified subjects are excluded, the Fisher's exact test comparing these two categorical distributions yields a p-value of 0.926, suggesting that they are statistically not different."

Thus, I assume that Fisher's exact test will reject the null when unidentified subjects are included, which I am able to get. However, when I exclude the unidentified subjects I do not get the p-value of 0.926 (that is, a failure to reject the null), so I suspect the command I am using is not right.

Here is the code I am using:

Code:

set obs 179
gen jin = 0 in 1/7
replace jin = 1 in 8/33
replace jin = 2 in 34/83
replace jin = 3 in 84/121
replace jin = 4 in 122/152
replace jin = -1 in 153/179 //unidentified
proportion jin
gen k = -1 in 1 //unidentified
replace k = 0 in 2/7
replace k = 1 in 8/34
replace k = 2 in 35/65
replace k = 3 in 66/90
replace k = 4 in 91/116
proportion k
tabulate jin k, all exact //reject the null
tabulate jin k if jin != -1 & k != -1, all exact //reject

My question is what the right way to reproduce the results is. I am also wondering whether sample size matters: if I don't use the -missing- option, the table it produces looks like the larger sample is truncated, and if I, for example, shuffle the data, the larger sample will be truncated in a different way. So should we account for missing values when the two samples are not balanced?

I also tried other tests to compare the two samples, which give different results:

Code:

set obs 295
gen group = 1 in 1/179
replace group = 0 in 180/295
gen jin_k = jin in 1/179
forvalues i = 1(1)116 {
    replace jin_k = k[`i'] if _n == `i'+179
}
ranksum jin_k, by(group) //not reject at 5%
median jin_k, by(group) exact //not reject
ksmirnov jin_k, by(group) exact //not reject

Further, I just realised from this topic that level 0, level 1, level 2, level 3, level 4 are likely to be ordered categories. (I am not sure, actually; the categories in the paper are like education taking the values high school, undergraduate, postgraduate.) So I am wondering: if they are indeed ordered categories and Fisher's exact test is therefore not appropriate, what about the other tests I have used?

Finally, I have my own data measuring the same thing with 157 subjects. When comparing my sample to either sample one or two, I cannot reject the null using Fisher's exact test, but I can reject the null using all the other tests: -ranksum-, -median-, -ksmirnov-, and -ttest-. It seems that all of these give different results from Fisher's exact test or the chi-square test, whether comparing sample one and two or comparing my sample with sample one or two. I am really confused by these different results.

Thanks for any help!!
Tags: None
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

23 Dec 2019, 08:00

I do not see how Fisher's Exact Test can be used to compare two different populations. Certainly as used in the tabulate command it compares two different measures within the same population.
Comment

Dave Airey

Join Date: Apr 2014
Posts: 398
#3

23 Dec 2019, 08:09

Notwithstanding William's comment, when I tried entering your data I got a p-value of 0.626...

Code:

clear
local study1 "5.17 23.28 26.72 21.55 22.41 0.86"
local study2 "3.91 14.53 27.93 21.23 17.32 15.08"
local study1_nv ""
foreach n of local study1 {
    local study1_n = round(116*`n'/100)
    local study1_nv = "`study1_nv'" + "`study1_n' "
}
local study2_nv ""
foreach n of local study2 {
    local study2_n = round(179*`n'/100)
    local study2_nv = "`study2_nv'" + "`study2_n' "
}
display "`study1_nv'"
display "`study2_nv'"
tabi 6 27 31 25 26 \ 7 26 50 38 31, exact

Comment

German Rodriguez

Join Date: Feb 2017

Posts: 169
#4

23 Dec 2019, 08:43

I can confirm the p-value of 0.626 that Dave Airey reports. I used a similar approach to enter the data, but then reshaped long and used -tabulate- with frequency weights. I get the exact same result using R's fisher.test.
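A minimal sketch of that reshape-and-weight approach, assuming the cell counts shown in the tabi command in post #3 (unidentified subjects excluded). The variable names level, n1, n2, and sample are made up for illustration, and this is not necessarily the exact code that was run:

```stata
clear
* one row per level; n1 = count in sample one, n2 = count in sample two
* (counts copied from the tabi command in post #3)
input level n1 n2
0  6  7
1 27 26
2 31 50
3 25 38
4 26 31
end

* go from wide (n1, n2) to long: one row per level-by-sample cell
reshape long n, i(level) j(sample)

* each row stands for n subjects, so use n as a frequency weight
tabulate level sample [fweight = n], exact
```

The point of the long layout is that the grouping variable (sample) and the outcome (level) sit in separate columns, so -tabulate- builds the 5 x 2 table of level by sample rather than pairing up unrelated observations.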

Regarding William Lisowski's comment, it turns out the important question is what you are conditioning on.

A two-way table may represent a cross-tabulation of two variables, in which case only the total is fixed, a multinomial distribution is appropriate, and one would usually test for independence.

It may also represent the distribution of one variable in two groups, in which case the appropriate model is a product multinomial distribution (product binomial when the variable has only two categories) and one would test for homogeneity.

But the chi-square tests of independence and homogeneity are exactly equivalent. Same test, different language.

Fisher's exact test, on the other hand, considers both margins fixed. The appropriate distribution in this case is hypergeometric. The test is really conditional on both margins fixed.
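To make the conditioning concrete, here is the standard textbook form of the hypergeometric probability for an r x c table, written in notation introduced only for this note: x_{ij} are the cell counts, x_{i.} and x_{.j} the row and column totals, and N the grand total.

```latex
% Probability of a table x given that both margins are fixed
\[
P(x \mid \text{margins})
  = \frac{\prod_{i} x_{i\cdot}! \; \prod_{j} x_{\cdot j}!}
         {N! \; \prod_{i,j} x_{ij}!}
\]

% Two-sided p-value: sum over all tables y with the same margins
% that are no more probable than the observed table
\[
p = \sum_{y \,:\, P(y \mid \text{margins}) \le P(x_{\text{obs}} \mid \text{margins})}
    P(y \mid \text{margins})
\]
```

Nothing in this expression depends on whether the rows are two variables or two groups; only the cell counts and their margins enter.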
1 like
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

23 Dec 2019, 09:12

German Rodriguez

Regarding my post #2, it was (too narrowly) based on the post #1 presentation using tabulate twoway on a dataset of observations of two variables representing the same measurement in two populations, with unequal numbers of observations for the variables. I stated my concern too narrowly; reorganizing the data (as was done in post #1 for ksmirnov) would have solved the problem for tabulate twoway as well, as you correctly point out, as would using tabi to input just the cell frequencies, as Dave Airey demonstrated.
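One possible way to do that reorganization, sketched under the assumption that the variables jin and k from the first code block in post #1 are still in memory. It uses -stack- instead of the loop from post #1, and level is a made-up variable name:

```stata
* assumes jin (179 obs) and k (116 obs) exist as created in post #1
* stack the two columns into one; _stack records the source:
*   _stack == 1 for values that came from jin, 2 for values from k
stack jin k, into(level) clear

* k was missing in rows 117-179, so those stacked rows carry no data
drop if missing(level)

* one row per subject: level is the outcome, _stack is the sample
tabulate level _stack, exact                 // unidentified included
tabulate level _stack if level != -1, exact  // unidentified excluded
```

In the wide layout of post #1, tabulate jin k pairs the i-th value of jin with the i-th value of k and silently drops rows where k is missing, which is why the larger sample looked truncated and why shuffling changed the table.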
Comment
Jasmine Xu

Join Date: Jul 2019

Posts: 33
#6

23 Dec 2019, 09:55

Thanks for all the replies!

Just to be clear: what I need to do when using Fisher's exact test is re-organise the data to have fixed margins, like posts #3 and #4 did?

There is another comparison in the same literature:
Treatment 1 has 80 subjects, with frequencies for levels 0 to 4 of 6, 16, 23, 19, 16.
Treatment 2 has 36 subjects, with frequencies for levels 0 to 4 of 1, 11, 8, 6, 10.

I used the method in #3 and got a p-value of 0.486, which is different from the p-value of 0.58 in the paper.

Any ideas why the p-values differ?
Comment
Mike Lacy

Join Date: Apr 2014

Posts: 2416
#7

23 Dec 2019, 13:26

No, "fixed margins" is not something that has to do with how you organize your data in Stata. "Fixed margins" refers to part of the theoretical model on which Fisher's Exact Test is based. I also get p = 0.486 for the data you show (see below). Could be a mistake in the original paper, or could be something different about the particular program (not Stata?) used in that paper. Fisher's Exact Test gets quite complicated beyond a 2 X 2 table.

Code:

input group score freq
1 0 6
1 1 16
1 2 23
1 3 19
1 4 16
2 0 1
2 1 11
2 2 8
2 3 6
2 4 10
end
expand freq
tab2 score group, exact
1 like
Comment
