
  • Stata chi test - how to include continuity (Yates) correction

    Hi there. Please could I ask how to do a chi-square test with continuity correction in Stata?

    A colleague brought to my attention that entering "tab A B, chi" runs a chi-square test that does not include a continuity correction. I compared the Stata output with the chi-square test at
    http://graphpad.com/quickcalcs/contingency1.cfm
    and the results suggest that my colleague is right, i.e. the Stata test does not include the Yates/continuity correction.

    Is there a way to do chi-square test with continuity correction on Stata?

    I couldn't find the answer online or by searching this forum. Trying mhodds and similar commands didn't resolve this.
    Many thanks in advance for your help.

    Sui

  • #2
    Sui:
    as per Richard Williams' teaching notes (https://www3.nd.edu/~rwilliam/stats1...ical-Stata.pdf), there seems to be no way to do it in Stata. But Stata offers Fisher's exact test as a suitable alternative (especially for small samples).
    Kind regards,
    Carlo
    Last edited by Carlo Lazzaro; 16 Oct 2014, 00:20.
    (StataNow 18.5)
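
    A minimal sketch of the Fisher route mentioned in #2, assuming a 2x2 table. The variables foreign and highmpg (a made-up split of mpg) from Stata's bundled auto dataset stand in for A and B, and the counts typed into tabi are invented for illustration.

    ```stata
    * Stand-in data: the bundled auto dataset plus a hypothetical binary split of mpg
    sysuse auto, clear
    generate byte highmpg = mpg > 20

    * Pearson chi-square (no continuity correction) and Fisher's exact test together
    tabulate foreign highmpg, chi2 exact

    * The same two tests from cell counts typed in directly (counts are made up)
    tabi 12 5 \ 7 14, chi2 exact
    ```

    With real data, replace foreign and highmpg with the two variables of interest.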



    • #3
      The user-written exactcc claims to offer Yates's continuity correction for a test of odds ratio equal to 1. You can find it on the Stata Journal download site: at the command line in Stata, type search continuity and you'll find its hyperlink first among entries in the pop-up window (or just type search exactcc at the command line).
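
      For readers who want to see what the correction does before installing anything, here is a sketch that computes the Yates-corrected statistic by hand from the cell counts. It uses only official commands and functions (tabulate with matcell(), chi2tail()); the auto dataset and the highmpg split are stand-ins, and flooring the corrected difference at zero is one common convention, not something taken from this thread.

      ```stata
      sysuse auto, clear
      generate byte highmpg = mpg > 20

      * Save the 2x2 cell counts in matrix F; chi2 reports the uncorrected statistic
      tabulate foreign highmpg, chi2 matcell(F)

      local a = F[1,1]
      local b = F[1,2]
      local c = F[2,1]
      local d = F[2,2]
      local n = `a' + `b' + `c' + `d'

      * Yates: reduce |ad - bc| by n/2 (never below zero) before squaring
      local num = `n' * max(0, abs(`a'*`d' - `b'*`c') - `n'/2)^2
      local den = (`a'+`b') * (`c'+`d') * (`a'+`c') * (`b'+`d')
      local yates = `num' / `den'

      display "Pearson chi2(1) = " %8.4f r(chi2) "   P = " %6.4f r(p)
      display "Yates chi2(1)   = " %8.4f `yates' "   P = " %6.4f chi2tail(1, `yates')
      ```

      The corrected statistic should never exceed the uncorrected one, so its P value is never smaller; that gives a quick sanity check against any other calculator.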



      • #4
        Carlo remembers my notes better than I do. But heed the advice given on the graphpad page that was linked to:

        "There are three ways to compute a P value from a contingency table. Fisher's test is the best choice as it always gives the exact P value, while the chi-square test only calculates an approximate P value. Only choose chi-square if someone requires you to. The Yates' continuity correction is designed to make the chi-square approximation better. With large sample sizes, the Yates' correction makes little difference. With small sample sizes, chi-square is not accurate, with or without the correction."
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        Stata Version: 17.0 MP (2 processor)
        WWW: https://www3.nd.edu/~rwilliam



        • #5
          Richard's notes, I think, echo a widespread although not universal consensus in statistical science on this point. It was pretty much the consensus 30 years ago when Stata was first released. There is no illusion here: official commands do not support Yates's correction; if they did, that would certainly be documented.



          • #6
            "Only choose chi-square if someone requires you to."
            Fair enough in general. But if the contingency table has a large number of rows and columns, the calculation of Fisher's exact explodes in terms of compute time and memory requirements. When faced with large numbers of cells and small cell sizes, there are no really good options that I am aware of (short of combining levels of the variables if that's feasible). If anybody does know of a good solution for that situation, I'd be thrilled to learn of it.



            • #7
              To clarify, the graphpad page was only referring to 2 x 2 contingency tables. You wouldn't use Yates in a bigger table anyway, right? But yes, I don't know what you can do in the situations Clyde describes.


              • #8
                Thank you for all your responses. Sorry, I should have clarified that this was for a 2x2 table. I did indeed read the paper by Richard and found it helpful (thanks), so for my 2x2 contingency analysis I have been using Fisher's exact test in place of chi-square. The 2x2 contingency test was part of a preliminary analysis to select suitable variables for a multiple logistic test.

                However, as I write up my results for publication in a medical journal, I am conscious that I am stating that my methods use Fisher's exact test instead of the chi-square test (all studies similar to mine have used the chi-square test), and I wonder if it will just complicate the review process... I was thinking in passing of reanalysing all the 2x2 tables by chi-square using another program, e.g. the online GraphPad calculator. However, it'll be a lot of work...hmm.



                • #9
                  I don't know what discipline you work in and what its conventions and traditions are. But I can tell you that the Fisher exact test is widely known and understood. Certainly in health care/epidemiology it would not raise any review problems at all, assuming it is properly used.



                  • #10
                    Yes indeed, that's reassuring. I used to think (from medical school) that Fisher's exact test was reserved for 2x2 tables in which one cell is <5. However, after reading Richard's paper I am now better informed! I am working on a prognostic model to predict disease outcome.



                    • #11
                      If you really really really want/need the Yates correction, the exactcc program mentioned by Joseph earlier seems to work fine. I just updated my handout to reflect that.


                      • #12
                        thanks!



                        • #13
                          Originally posted by Sui Wong:
                          "I am working on a prognostic model to predict disease outcome."
                          Originally posted by Sui Wong:
                          "The 2x2 contingency test was part of a preliminary analysis to select suitable variables for a multiple logistic test."
                          I'm curious--why are you doing preliminary variable selection via fourfold tables when your objective is prediction via multiple logistic regression? For a purely predictive objective, I was always led to believe that you throw everything into the pot and stir. Is it because you have missing-data problems among the predictors and you need to do triage in order to minimize the overall listwise data attrition in the multiple regression model? If so, keep in mind that a series of independent univariate associations (fourfold tables) won't necessarily reflect the variables' joint predictive power . . .



                          • #14
                            Also, if you are going to do a bunch of 2 by 2 tables, I'm not sure that you wouldn't be about as well off just looking at the correlation matrix. It would certainly be easier.
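
                            A sketch of the correlation-matrix idea, assuming every variable is coded 0/1. For two binary variables the Pearson correlation is the phi coefficient, and N times phi squared equals the uncorrected Pearson chi-square of the 2x2 table. The auto dataset and the derived highmpg and heavy indicators are stand-ins, not the poster's data.

                            ```stata
                            sysuse auto, clear
                            generate byte highmpg = mpg > 20
                            generate byte heavy   = weight > 3000

                            * One pass over all pairs, with p-values under each correlation
                            pwcorr foreign highmpg heavy, sig

                            * For a single pair, N * phi^2 matches the uncorrected chi-square
                            correlate foreign highmpg
                            display "N * phi^2 = " r(N) * r(rho)^2
                            tabulate foreign highmpg, chi2
                            ```

                            Categorical variables with more than two levels would need to be split into indicators first for this shortcut to apply.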


                            • #15
                              Just managed to get internet access after 24 hours... noted further comments with thanks; apologies for delay in reply.

                              Richard - Correlation matrix? I am not familiar with that. I just did a quick search online and looked at my Stata course books; my understanding is that it is useful for linear regression with a continuous outcome (e.g. blood pressure). Is it useful for a binary outcome? My variables are categorical, and the outcome is binary. I also just installed the corrtable program after searching 'help correlation matrix' in Stata, and am going to read about it now to see if it's applicable to my work.

                              Joseph - from my reading of prognostic papers for (binary) disease outcomes (e.g. cardiovascular/cancer research), the methods described appear to be: univariate analysis of factors identified from a priori hypotheses, then selecting potentially useful predictors --> putting them all into a multivariable logistic model --> then 'backward' elimination (not sure if that is the correct technical term!) to get to the simplest model.

                              Steep learning curve...am learning as I work through my data... so much to read and absorb. Love the ease of Stata programme and learning coding on Stata.
                              Am particularly grateful for so many useful resources online and this forum.

                              Kind regards
                              Sui
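
                              The 'backward' step described above can be sketched with Stata's official stepwise prefix, where pr() sets the significance level for removal. This is an illustration only: the outcome and predictors come from the bundled auto dataset, and the 0.10 threshold is an arbitrary assumption, not a recommendation from this thread.

                              ```stata
                              sysuse auto, clear

                              * Backward elimination: fit the full model, then drop terms with p >= 0.10
                              stepwise, pr(0.10): logit foreign mpg weight price
                              ```

                              The caution raised in #13 still applies: screening or eliminating predictors one at a time need not reflect their joint predictive power.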
