
  • Mann Whitney Rank sum test

    Hi, I just wanted clarification if what I am doing is indeed correct.

    I have experimental data. The question of interest is whether the treatment has had an effect on the total number of safe choices a person makes. Thus I have a 2 by 2 table:

    Under the treatment label, 0 is the control group and 1 is the treatment group. Under "holt5", 0 is the total number of non-safe choices and 1 is the total number of safe choices.

    [Attached image: Capture.PNG, the 2×2 table of treatment by holt5]


    I can tell they are similar from the percentages (both approximately 50-55%); however, I need a p-value.

    I decided to use the Mann-Whitney rank-sum test, with a command along the lines of:

    ranksum holt5, by(treatment)

    Just wondering whether this is the correct procedure to use to obtain the p-value I need?
    Last edited by harry smith; 04 Dec 2017, 08:44.

  • #2
    A more conventional approach to this would be -tab choice_safety group, chi2- (substitute the real names of your variables, evidently.)
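    For anyone who wants to see what -tab, chi2- is computing under the hood, here is a minimal stdlib-Python sketch of the Pearson chi-square for a 2×2 table, using the counts transcribed in #7 (11 16 \ 13 22). The function names are just illustrative:

    ```python
    from math import erfc, sqrt

    def pearson_chi2_2x2(a, b, c, d):
        """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]],
        via the closed form n*(ad - bc)^2 / (r1 * r2 * c1 * c2)."""
        n = a + b + c + d
        return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    def chi2_sf_1df(x):
        """Upper-tail probability of a chi-square variate with 1 df."""
        return erfc(sqrt(x / 2))

    chi2 = pearson_chi2_2x2(11, 16, 13, 22)   # counts as shown in #7
    p = chi2_sf_1df(chi2)
    print(round(chi2, 4), round(p, 3))        # 0.0832 0.773
    ```

    This reproduces the chi2(1) = 0.0832, Pr = 0.773 that Stata reports for the same table.
    
    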

    Comment


    • #3
      I'd use Fisher's exact test here.
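    For readers curious how the two-sided Fisher exact p-value arises, here is a minimal stdlib-Python sketch (the function name is just illustrative). It enumerates every table with the same margins and, following Irwin's rule, sums the hypergeometric probabilities of tables as likely as, or less likely than, the observed one:

    ```python
    from math import comb

    def fisher_exact_2x2(a, b, c, d):
        """Two-sided Fisher-Irwin p-value for the 2x2 table [[a, b], [c, d]]."""
        r1, r2, c1 = a + b, c + d, a + c
        n = r1 + r2

        def prob(x):                # P(top-left cell = x | fixed margins)
            return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

        p_obs = prob(a)
        support = range(max(0, c1 - r2), min(r1, c1) + 1)
        # small tolerance guards against ties lost to floating-point rounding
        return sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-9))
    ```

    Applied to the counts transcribed in #7, `fisher_exact_2x2(11, 16, 13, 22)` agrees with the 0.798 that Stata's -tabi ..., exact- output shows there.
    
    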

      Comment


      • #4
        Notwithstanding the answers in #2 and #3 (which are sensible given the table you provide), I wonder whether this is your final table, given the following part of your text:
        Under the "holt5" 0 being the total number of non-safe choices. 1 being the total number of safe choices.
        I find your text completely confusing here. Is it the case that there are multiple observations per person? Or that there will eventually be actual counts? Or ...? If your quote actually means what it says, I would not use either chi-squared or Fisher's test, as they expect independent data and that is not what you appear to have; if your table is correct and your language is not, then fine.

        Comment


        • #5
          Thanks Clyde & Nick (again)! Rich, I have two independent samples (a between-subjects design). Holt5 is a choice between two lotteries: each subject chooses between a safe option (1) and a risky option (0). So I calculated the total number of safe/risky options in each experiment and am now trying to work out whether the treatment has an effect on the safe/risky options they select.

          Comment


          • #6
            Originally posted by Nick Cox View Post
            I'd use Fisher's exact test here.
            Why Fisher's exact test, Nick? The minimum expected frequency for that table is 10.5, so the Chi-square approximation should be quite good. Also, FET is notoriously conservative.
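            The minimum expected frequency quoted above can be checked in a few lines of stdlib Python (a quick illustration, with the counts transcribed from the tabi call in #7):

            ```python
            def expected_counts_2x2(a, b, c, d):
                """Expected cell counts under independence:
                row total * column total / grand total."""
                n = a + b + c + d
                row_tot, col_tot = (a + b, c + d), (a + c, b + d)
                return [[rt * ct / n for ct in col_tot] for rt in row_tot]

            exp = expected_counts_2x2(11, 16, 13, 22)
            print(round(min(min(row) for row in exp), 1))   # 10.5
            ```

            The smallest expected count is 27 * 24 / 62 ≈ 10.5, comfortably above the conventional threshold of 5.
            
            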

            If the expected counts were too low for Pearson's Chi-square, I would use the N-1 Chi-square (provided all expected counts were equal to or greater than 1). Anyone unfamiliar with the N-1 Chi-square can find more info here.

            Cheers,
            Bruce

            p.s. - I shared Rich's confusion about the wording in #1, but see that Harry clarified things in #5.
            --
            Bruce Weaver
            Email: [email protected]
            Version: Stata/MP 19.5 (Windows)

            Comment


            • #7
              If chi-square is as good as Fisher's test here, that works the other way too.

              Harry wants a P-value here, but the substantive conclusion is clear whatever you do.


              Code:
              . tabi 11 16 \ 13 22
              
                         |          col
                     row |         1          2 |     Total
              -----------+----------------------+----------
                       1 |        11         16 |        27 
                       2 |        13         22 |        35 
              -----------+----------------------+----------
                   Total |        24         38 |        62 
              
                         Fisher's exact =                 0.798
                 1-sided Fisher's exact =                 0.489
              
              
              . tabi 11 16 \ 13 22, chi
              
                         |          col
                     row |         1          2 |     Total
              -----------+----------------------+----------
                       1 |        11         16 |        27 
                       2 |        13         22 |        35 
              -----------+----------------------+----------
                   Total |        24         38 |        62 
              
                        Pearson chi2(1) =   0.0832   Pr = 0.773

              Comment


              • #8
                I've heard this a couple of times when attending very good courses on stats, so I started to adopt this strategy whenever possible, and I hope it won't be taken as heresy.

                Consider the following: the chi-square test depends on a theoretical asymptotic approximation, as well as on some sort of balance between the levels of the categorical variables, whereas Fisher's exact test does not; despite being time-consuming, modern PCs handle the task easily; it does not inflate the type I error; when estimating dozens of contingency tables, it can be mind-boggling to pick out the ones for which we present one test and the ones for which we present the other; and while chi-square performs best with large samples, Fisher's exact test performs well with both small and (not extremely) large samples. So, apart from situations where the sample is so large that the computer won't handle it without many complaints, I tend to use Fisher's test overall.
                Best regards,

                Marcos

                Comment


                • #9
                  What Marcos and Nick say about the Fisher exact test is true. But it leaves unsaid the fact that it is also conditional on the marginals of the table. If the study design that generated the data does not in fact constrain all four marginals (in the case of a 2x2 results table) to their observed values, then this assumption is not met. In a typical experiment, the total number of observations in each experimental group is usually fixed by design, but the total numbers of responses in each outcome category are usually not constrained by design and are random variables. In that case, the assumptions are not met and the sampling distribution of the Fisher exact p-value is not in fact a uniform distribution, because the sampling space is actually much larger than what the Fisher calculations assume.

                  The chi-square test is not conditional on the marginals and does not have this limitation.

                  (All of that said, when the expected cell counts are large, the two statistics usually provide very similar results, so I don't think this issue has much practical importance. And then there is my heresy, where I often question the usefulness of p-values from any of these analyses, depending on context. But that's a long discussion, for another day, and probably not even for this forum.)

                  Comment


                  • #10
                    As I noted earlier (in #6), I believe that for 2x2 tables where (all 4 of) the marginal totals are not fixed in advance, and where all expected counts are 1 or more, the N-1 Chi-square test is a better choice than Fisher's exact test. Ian Campbell's nice simulation study in Statistics in Medicine convinced me of this. Here's the citation:

                    Campbell, I. (2007), Chi-squared and Fisher–Irwin tests of two-by-two tables with small sample recommendations. Statist. Med., 26: 3661–3675. doi:10.1002/sim.2832

                    Here are some key excerpts from the article. Note that the test often called Fisher's exact test was also developed, independently of Fisher, by Joseph Oscar Irwin, another British statistician. Campbell tries to give Irwin his due by calling the test the Fisher-Irwin test.

                    On page 3662:

                    Barnard [2] was the first to observe that such 2 × 2 tables can arise through at least three distinct research designs. In one, usually termed a comparative trial, there are two populations (denoted here by A and not-A), and we take a sample of size m from the first population, and a sample of size n from the second population. We observe the numbers of B and not-B in the two samples, and the research question is whether the proportions of B in the two populations are the same (the common proportion being denoted here by π). In the second research design, termed cross-sectional or naturalistic [3], or the double dichotomy [2], a single sample of total size N is drawn from one population, and each member of the sample is classified according to two binary variables, A and B. Like comparative trials, the results can be displayed in the form of Table I(a), but the row totals, m and n, are not determined by the investigator. The research question is whether there is an association between the two binary variables. The proportions in the population of A and B will be denoted here by π₁ and π₂ respectively.

                    In the third research design, sometimes termed the 2 × 2 independence trial [2, 4], both sets of marginal totals are fixed by the investigator. Here, there is no dispute that the Fisher–Irwin test (or Yates’s approximation to it) should be used. This last research design is rarely used and will not be discussed in detail.
                    On page 3674:

                    The current recommendations on the restriction of the chi-squared test to tables with a minimum expected number of at least 5 date back to Cochran [14, 15] and before, but Cochran [14] noted that the number 5 appeared to have been arbitrarily chosen, and could require modification once new evidence became available. This paper provides such new evidence and allows Cochran's guidelines to be updated. The data and arguments presented here provide a compelling body of evidence that the best policy in the analysis of 2 × 2 tables from either comparative trials or cross-sectional studies is:

                    (1) Where all expected numbers are at least 1, analyse by the ‘N - 1’ chi-squared test (the K. Pearson chi-squared test but with N replaced by N-1).
                    (2) Otherwise, analyse by the Fisher–Irwin test, with two-sided tests carried out by Irwin’s rule (taking tables from either tail as likely, or less, as that observed).

                    This policy extends the use of the chi-squared test to smaller samples (where the current practice is to use the Fisher–Irwin test), with a resultant increase in the power to detect real differences.
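                    Campbell's recommended adjustment is simple enough to sketch in a few lines of stdlib Python (an illustration only, again using the counts transcribed in #7): the Pearson statistic with N replaced by N - 1, referred to a chi-square distribution with 1 df.

                    ```python
                    from math import erfc, sqrt

                    def n_minus_1_chi2_2x2(a, b, c, d):
                        """Campbell's 'N-1' chi-square for the 2x2 table [[a, b], [c, d]]:
                        the Pearson statistic with N replaced by N - 1."""
                        n = a + b + c + d
                        stat = (n - 1) * (a * d - b * c) ** 2 / (
                            (a + b) * (c + d) * (a + c) * (b + d))
                        return stat, erfc(sqrt(stat / 2))   # upper-tail p, 1 df

                    stat, p = n_minus_1_chi2_2x2(11, 16, 13, 22)
                    print(round(stat, 4), round(p, 3))      # 0.0818 0.775
                    ```

                    For this table the adjustment barely matters (0.0818 vs. the ordinary Pearson 0.0832); its value shows up in the small-sample settings Campbell studied.
                    
                    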
                    I apologize if I have strayed too far off-topic with this post. And I confess that I have become a bit of an evangelist for the N-1 Chi-square, because it seems that so few people have heard of it. I do hope that some members will find Campbell's article interesting and useful.

                    Cheers,
                    Bruce
                    --
                    Bruce Weaver
                    Email: [email protected]
                    Version: Stata/MP 19.5 (Windows)

                    Comment
