Using Fisher's exact test with 7 race categories?

Ellen Kiley

Join Date: Dec 2023

Posts: 25
#1

10 Jan 2024, 08:27

When I use Fisher's exact test to check the association between Race and Screening, I get an error message saying "exceeded memory limits" when it reaches the final enumeration of "sample-space combinations". The help says to try exact(2), which I did, and I got the same error message. Any ideas?

tab RaceEthnicityCoding ScreeningDone_Dummy, chi column exact(2)

+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

Enumerating sample-space combinations:
stage 7: enumerations = 1
stage 6: enumerations = 11
stage 5: enumerations = 321
stage 4: enumerations = 9730
stage 3: enumerations = 257651
stage 2: exceeding 1x10^6 enumerations
exceeded memory limits using exact(2); try again with larger #; see help tabulate for details

    Race + |
 Ethnicity | Screening Done_Dummy
    Coding |         0          1 |     Total
-----------+----------------------+----------
         1 |        36         44 |        80
           |     11.01       8.10 |      9.20
-----------+----------------------+----------
         2 |        19         38 |        57
           |      5.81       7.00 |      6.55
-----------+----------------------+----------
         3 |        17         18 |        35
           |      5.20       3.31 |      4.02
-----------+----------------------+----------
         4 |       131        300 |       431
           |     40.06      55.25 |     49.54
-----------+----------------------+----------
         5 |         5          5 |        10
           |      1.53       0.92 |      1.15
-----------+----------------------+----------
         6 |        76        116 |       192
           |     23.24      21.36 |     22.07
-----------+----------------------+----------
         7 |        43         22 |        65
           |     13.15       4.05 |      7.47
-----------+----------------------+----------
     Total |       327        543 |       870
           |    100.00     100.00 |    100.00

Pearson chi2(6) = 37.2129 Pr = 0.000
r(910);

Nick Cox

Join Date: Mar 2014
Posts: 35743

10 Jan 2024, 08:38

I doubt you're missing much -- except a different very small P-value.

The chi-square test is best taken forward to look at the pattern of discrepancies. Here's tabchii from tab_chi on SSC with extra Pearson residuals, namely (observed MINUS expected) / sqrt(expected). First off, the P-values are of the order of 1 in a million and so clear-cut. I would bet that the Fisher test wouldn't contradict that. Second, rows 4 and 7 are those most out of line with a null.

Code:

 tabchii 36 44 \ 19 38 \ 17 18 \ 131 300 \ 5 5 \ 76 116 \ 43 22, pearson

          observed frequency
          expected frequency
          Pearson residual

----------------------------
          |       col      
      row |       1        2
----------+-----------------
        1 |      36       44
          |  30.069   49.931
          |   1.082   -0.839
          |
        2 |      19       38
          |  21.424   35.576
          |  -0.524    0.406
          |
        3 |      17       18
          |  13.155   21.845
          |   1.060   -0.823
          |
        4 |     131      300
          | 161.997  269.003
          |  -2.435    1.890
          |
        5 |       5        5
          |   3.759    6.241
          |   0.640   -0.497
          |
        6 |      76      116
          |  72.166  119.834
          |   0.451   -0.350
          |
        7 |      43       22
          |  24.431   40.569
          |   3.757   -2.915
----------------------------

1 cell with expected frequency < 5

         Pearson chi2(6) =  37.2129   Pr = 0.000
likelihood-ratio chi2(6) =  36.4749   Pr = 0.000

. ret li

scalars:
                  r(N) =  870
                  r(r) =  7
                  r(c) =  2
               r(chi2) =  37.21292707219945
                  r(p) =  1.60034046607e-06
            r(chi2_lr) =  36.4749305559922
               r(p_lr) =  2.22847158301e-06
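
For anyone who wants to cross-check these numbers outside Stata, here is a minimal Python sketch of the same arithmetic: expected frequencies from the margins, Pearson residuals as (observed - expected) / sqrt(expected), the chi-square statistic, and its upper-tail P-value. The function names are illustrative and not part of tab_chi. The tail formula used is the closed form that holds only for even degrees of freedom, which covers the 6 d.f. here.

```python
import math

# Observed counts per race/ethnicity code: (screening = 0, screening = 1)
observed = [(36, 44), (19, 38), (17, 18), (131, 300), (5, 5), (76, 116), (43, 22)]


def pearson_table(observed):
    """Return expected frequencies, Pearson residuals and the chi-square statistic."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    residuals = [
        [(o - e) / math.sqrt(e) for o, e in zip(orow, erow)]
        for orow, erow in zip(observed, expected)
    ]
    chi2 = sum(r * r for row in residuals for r in row)
    return expected, residuals, chi2


def chi2_upper_tail_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


expected, residuals, chi2 = pearson_table(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = chi2_upper_tail_even_df(chi2, df)

for code, (erow, rrow) in enumerate(zip(expected, residuals), start=1):
    print(code, [round(e, 3) for e in erow], [round(r, 3) for r in rrow])
print(f"Pearson chi2({df}) = {chi2:.4f}, P = {p_value:.3e}")
```

The largest residuals in absolute value come out in rows 4 and 7, as in the tabchii listing above.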


Ellen Kiley

Join Date: Dec 2023

Posts: 25
#3

10 Jan 2024, 09:21

Thanks. I'm doing this for a med student who has to present the data at a conference this weekend. I can tell her not to worry about it, but I want to be able to explain to her what is happening and what to say if anyone asks her about it. Should she just report the P-value of .000 for Fisher's exact test and leave it at that? I don't want anyone to get called out for an error.
Nick Cox

Join Date: Mar 2014

Posts: 35743
#4

10 Jan 2024, 10:35

You don't have a P-value for Fisher's exact test. At most you have my guess that it would be very small.
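
One way to probe that guess without the full enumeration is a Monte Carlo approximation to the exact test, sketched below in Python. This is a swapped-in technique and not what Stata's exact() option does: it draws random tables with the observed margins and counts how often a drawn table is as probable as, or less probable than, the observed one. The names monte_carlo_fisher, reps and seed are illustrative assumptions. With a true P-value near 1 in a million, a few thousand draws will usually produce no hits, so the result is best read as an upper bound of roughly 1 / (reps + 1).

```python
import math
import random

# Observed counts per race/ethnicity code: (screening = 0, screening = 1)
rows = [(36, 44), (19, 38), (17, 18), (131, 300), (5, 5), (76, 116), (43, 22)]


def log_choose(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_weight(first_col, row_totals):
    """Log-probability of an r x 2 table with fixed margins, up to a constant."""
    return sum(log_choose(n, a) for a, n in zip(first_col, row_totals))


def monte_carlo_fisher(rows, reps=2000, seed=12345):
    """Monte Carlo estimate of the Fisher (Freeman-Halton) exact P-value."""
    rng = random.Random(seed)
    row_totals = [a + b for a, b in rows]
    n_first = sum(a for a, _ in rows)
    observed = log_weight([a for a, _ in rows], row_totals)
    # One label per subject: 1 = first column, 0 = second column.
    labels = [1] * n_first + [0] * (sum(row_totals) - n_first)
    hits = 0
    for _ in range(reps):
        rng.shuffle(labels)  # random table with the same margins
        start, simulated = 0, []
        for total in row_totals:
            simulated.append(sum(labels[start:start + total]))
            start += total
        if log_weight(simulated, row_totals) <= observed + 1e-9:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (reps + 1)


p_mc = monte_carlo_fisher(rows)
print(f"Monte Carlo P-value estimate: {p_mc:.4g}")
```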

I'd never report P-values of 0.000 because they are all too likely to be misunderstood. I would report the attained chi-square P-value, using 3 or 4 significant figures. Alternatively, you could say P < 0.0005.
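
As a small illustration of that reporting advice, the Python sketch below formats the attained chi-square P-value from the returned results, r(p), to a chosen number of significant figures and confirms the bound. The helper name format_p is hypothetical.

```python
def format_p(p, sig=3):
    """Format a P-value in scientific notation with `sig` significant figures."""
    return f"{p:.{sig - 1}e}"


p_chi2 = 1.60034046607e-06  # r(p) from the tabchii results

print("P =", format_p(p_chi2))          # 3 significant figures
print("P =", format_p(p_chi2, sig=4))   # 4 significant figures
print("P < 0.0005:", p_chi2 < 0.0005)
```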

It should be much more interesting to comment on why there is an apparent departure from null expectation and how far it is clinically interesting.

PS Not a medic. Not a medical statistician.
Nick Cox

Join Date: Mar 2014

Posts: 35743
#5

10 Jan 2024, 11:33

Ellen Kiley posted this in another thread:

Or I can just use Pearson's (which runs without error) because while 2 cells have only 5 observations, only 1 of the expected has <5 and that is way < 20%?

That's a rather ancient rule of thumb (Cochran 1952???). I tend to go with a simpler rule, which can be found in the work of Harold Jeffreys and more recently of Stephen Fienberg, which is to worry only if expected frequencies fall below 1.
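
Both rules of thumb are easy to check for this table. The Python sketch below recomputes the expected frequencies from the margins and counts the cells below 5 and below 1. The function name and the dictionary keys are illustrative.

```python
# Observed counts per race/ethnicity code: (screening = 0, screening = 1)
observed = [(36, 44), (19, 38), (17, 18), (131, 300), (5, 5), (76, 116), (43, 22)]


def small_expected_summary(observed):
    """Summarise how many expected frequencies fall below 5 and below 1."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    cells = [rt * ct / n for rt in row_totals for ct in col_totals]
    below_5 = sum(e < 5 for e in cells)
    return {
        "cells": len(cells),
        "below_5": below_5,
        "share_below_5": below_5 / len(cells),
        "below_1": sum(e < 1 for e in cells),
        "minimum": min(cells),
    }


summary = small_expected_summary(observed)
print(summary)
```

Here 1 of the 14 cells has an expected frequency below 5, which is under the 20% threshold in the older rule, and no cell falls below 1, so the table passes the simpler rule as well.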