
  • r(910) exceeded memory limits using exact(1); try again with larger #

    Hi,

    I am getting the r(910) error when attempting a contingency table with the exact option. From what I've learned, newer versions of Stata don't need you to set memory; it's managed on the fly, and the r(910) error suggests I'm out of memory (?).

    Any thoughts on how to fix this?

    This is the code I'm using:
    tab ic_full Tuboovarian, exact

    Most granular |
          view of |
           income |              Tuboovarian
         category | No access    Always Occasiona    Notdep |     Total
    --------------+----------------------------------------+----------
       Low Income |         5        60        98        40 |       203
       Low-Middle |         1         0         0         7 |         8
     Upper-Middle |        23        27        43        42 |       135
      High Income |        14        23        56        41 |       134
    --------------+----------------------------------------+----------
            Total |        43       110       197       130 |       480


    I have a number of other 4 x 4 tables, also with small cells, but this one is the only one that trips the error. Any ideas on what to do?

  • #2
    Well, as the help file for -tabulate twoway- tells you, you can try again specifying -exact(2)- or an even larger number. But I suspect this one is a lost cause. The memory required to calculate Fisher's exact is of the order of magnitude of the number of 4x4 tables having the same marginals as your table. With a 4x4 table that has 7 out of its 8 marginals in the hundreds, that is going to be some enormous number. It wouldn't surprise me if it were more than the number of particles in the universe. I'll be very surprised if you can get this to run. Fisher's exact is really only practical for tables with small marginals.
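
    For what it's worth, the retry would look like the line below; the # in exact(#) is a multiplier on the amount of memory the command is permitted to use (the default is 1). A sketch only -- with marginals this large it may still fail or run for a very long time:

    Code:
    . tab ic_full Tuboovarian, exact(2)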

    I'd just go with the usual chi square instead. If you really can't live with that due to the small cells in the second row, consider redefining the income variable by combining the Low-Middle category with either Low Income or Upper-Middle, depending on which seems more appropriate from a substantive perspective, and then doing a chi square analysis.
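
    A sketch of that recoding, assuming ic_full is coded 1 through 4 in the order the table shows them (adjust the codes and labels to your data):

    Code:
    . * fold Low-Middle (assumed code 2) into Low Income (assumed code 1)
    . recode ic_full (2 = 1), generate(ic_combined)
    . label define ic_comb 1 "Low/Low-Middle" 3 "Upper-Middle" 4 "High Income"
    . label values ic_combined ic_comb
    . tab ic_combined Tuboovarian, chi2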

    • #3
      The levels of income are ordered. Are the three rightmost categories of the tubo-ovarian variable also ordered? That is, are the outcomes either "No access" (unable to determine) or one of three degrees of frequency?

      If so then a chi-square test statistic from a nominal-by-nominal tabulation might not be the most effective way to assess association between the two measurements.
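
      (If both variables were fully ordered, official tabulate already offers ordinal association measures -- the sketch below shows gamma and Kendall's tau-b -- but the "No access" category is exactly what breaks the ordering here.)

      Code:
      . * only sensible if every category is ordered, which "No access" is not
      . tab ic_full Tuboovarian, gamma taub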

      Perhaps a two-part model would serve: the first part a logistic regression model having as outcome variable the ability to make a determination ("No access" versus the three other categories), and the second part an ordered-logistic regression model having as outcome variable the three rightmost categories (omitting the "No access" cases).
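
      A minimal sketch of that two-part setup, assuming Tuboovarian is coded 0 = "No access" with the three ordered categories above it (adjust to your actual coding):

      Code:
      . * part 1: can a determination be made at all? (assumed coding: 0 = "No access")
      . generate byte determined = (Tuboovarian != 0) if !missing(Tuboovarian)
      . logistic determined i.ic_full
      . * part 2: ordered outcome among those with a determination
      . ologit Tuboovarian i.ic_full if determined == 1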

      With regression models, you can take advantage of the contrast postestimation command and examine the first (i.e., linear) component of the set of orthogonal polynomial contrasts of the income predictor variable as a way to get at the strength of association. And even if the outcome categories are not strictly ordered, you can still use contrast after, say, mlogit Tuboovarian i.ic_full, although its use is a little more involved with multiple-equation models.
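
      After the ordered-logistic fit sketched above, that would be something like the following; p.ic_full requests the orthogonal polynomial contrasts, and the first row reported is the linear component:

      Code:
      . contrast p.ic_full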

      If you don't want to combine the first two income categories as Clyde suggests, then you could look into resampling statistics (help permute) or fitting a Bayesian model with a regularizing prior on the parameter for the "Low-Middle" income level.
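
      For the resampling route, a sketch with permute, shuffling one of the variables and recomputing the Pearson statistic each time (the number of replications here is an arbitrary choice):

      Code:
      . permute Tuboovarian chi2 = r(chi2), reps(10000) nodots: tabulate ic_full Tuboovarian, chi2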

      • #4
        Adding some details to the excellent replies from Clyde Schechter and Joseph Coveney:

        Although chi-square testing doesn't pay attention to category order, neither, if I understand correctly, does the Fisher exact test. The interesting questions here are not about P-values -- the chi-square alone implies that there is an association, and the P-value is overwhelming.

        Why are so few people categorised as Low-Middle? Is this about the researcher's categories or about people identifying themselves?

        tabchii from tab_chi on SSC offers some bells and whistles beyond official commands, e.g. Pearson residuals.

        Code:
        . tabchii 5 60 98 40 203 \ 1 0 0 7 8 \ 23 27 43 42 135 \ 14 23 56 41 134 , pearson
        
                  observed frequency
                  expected frequency
                  Pearson residual
        
        -------------------------------------------------------
                  |                     col                    
              row |       1        2        3        4        5
        ----------+--------------------------------------------
                1 |       5       60       98       40      203
                  |  18.185   46.521   83.315   54.979  203.000
                  |  -3.092    1.976    1.609   -2.020    0.000
                  | 
                2 |       1        0        0        7        8
                  |   0.717    1.833    3.283    2.167    8.000
                  |   0.335   -1.354   -1.812    3.284    0.000
                  | 
                3 |      23       27       43       42      135
                  |  12.094   30.938   55.406   36.562  135.000
                  |   3.136   -0.708   -1.667    0.899    0.000
                  | 
                4 |      14       23       56       41      134
                  |  12.004   30.708   54.996   36.292  134.000
                  |   0.576   -1.391    0.135    0.782    0.000
        -------------------------------------------------------
        
        4 cells with expected frequency < 5
        1 cell with expected frequency < 1
        
                 Pearson chi2(12) =  52.9650   Pr = 0.000
        likelihood-ratio chi2(12) =  56.1831   Pr = 0.000
        
        . ret li
        
        scalars:
                          r(N) =  960
                          r(r) =  4
                          r(c) =  5
                       r(chi2) =  52.96502689780613
                          r(p) =  4.17880035795e-07
                    r(chi2_lr) =  56.18306576039558
                       r(p_lr) =  1.10951184182e-07
        
        .
        In your case you could use tabchi from the same package.
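
        Roughly like this -- tabchi works on variables in memory, where tabchii takes typed-in frequencies (install the package once from SSC):

        Code:
        . ssc install tab_chi
        . tabchi ic_full Tuboovarian, pearson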

        • #5
          Thanks everyone for your excellent replies. I went with collapsing income categories, but was still intrigued why this one variable (of about 60 I was looking at) refused to work. And it was interesting: these tables with lots of exact tests specified took several minutes to run; I've never encountered that in my days as a Stata coder, so I was worried something was wrong. I appreciate now how computationally intense it is to ask my poor machine to run through dozens of exact tests!

          Thanks again!
