  • Confidence Interval estimation using invchi2

    Hi,
    I have data on cancer cases and population size for a rare cancer, so the case counts are low. I have calculated the crude rate and want to estimate its 95% CI. For low counts, the exact Poisson method is recommended for CI estimation; the formulas I am using are:

    [formula image not preserved]

    To implement this in Stata, I wrote the following commands:

    Code:
    generate LB= (100000/populationsize)*0.5*invchi2(2*casesyear, .05/2 )
    
    generate UB= (100000/populationsize)*0.5*invchi2((2*casesyear)+1, 1-(.05/2))
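
    For reference, the exact (Garwood) limits usually quoted for an observed Poisson count x in a population of size N, expressed per 100,000, use the chi-squared quantile with 2x degrees of freedom for the lower bound and 2(x+1) = 2x+2 for the upper bound; the commands above use (2*casesyear)+1 for the upper bound. The interval is defined for an integer count x, and for x = 0 the lower limit is 0.

```latex
\[
\mathrm{LB} = \frac{100\,000}{N}\cdot\frac{1}{2}\,\chi^{2}_{\alpha/2}(2x),
\qquad
\mathrm{UB} = \frac{100\,000}{N}\cdot\frac{1}{2}\,\chi^{2}_{1-\alpha/2}(2x+2)
\]
```

    Here \chi^{2}_{p}(\nu) is the p quantile of the chi-squared distribution with \nu degrees of freedom, which is what Stata's invchi2(\nu, p) returns, and alpha = 0.05 gives a 95% interval.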


    Here is a sample of my data:
    _ID casesyear populationsize
    A 6.1 10712
    B 5.1 13393
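
    A minimal sketch that enters the two sample rows and computes the exact limits with 2x and 2(x+1) degrees of freedom. It assumes casesyear plays the role of the count x and populationsize of N; the variable name id (used instead of _ID) and the new variables rate, lb and ub are illustrative names, not part of the original data. Note that it clears the data in memory, and that casesyear is an annual average here rather than an integer count.

```stata
* Sketch: exact (Garwood) Poisson limits per 100,000 for the sample rows.
* id, rate, lb and ub are illustrative names (assumption, not the poster's).
clear
input str1 id double(casesyear populationsize)
A 6.1 10712
B 5.1 13393
end

local alpha = 0.05

* Crude rate per 100,000
generate double rate = 100000 * casesyear / populationsize

* Lower bound: chi-squared quantile with 2x degrees of freedom
generate double lb = (100000/populationsize) * 0.5 * invchi2(2*casesyear, `alpha'/2)

* Upper bound: chi-squared quantile with 2(x+1) degrees of freedom
generate double ub = (100000/populationsize) * 0.5 * invchi2(2*(casesyear+1), 1-`alpha'/2)

list id casesyear populationsize rate lb ub
```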



    My concern is that the estimated CIs do not match the ones I calculated in Excel for validation. In a thread about another function, I read that this may be because Stata uses an approximation rather than the exact method. How can I find out which method Stata uses for the invchi2() function, and is there an option to make it use the exact method?

    Thanks

  • #2
    I just looked at what Stata and Excel return for invchi2(20000, 0.975) and invchi2(20000, 0.025), and both returned the same number up to 10 significant digits.

    Notice that the order of the arguments differs between Stata and Excel. Stata expects the first argument to be the degrees of freedom and the second the probability, invchi2(df,p), whereas Excel expects the probability first and the degrees of freedom second, CHISQ.INV(p,df).
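
    The correspondence can be checked from within Stata. A short sketch; the Excel formulas in the comments are the counterparts with the arguments reversed, and invchi2tail() is Stata's upper-tail inverse:

```stata
* Lower-tail inverse: invchi2(df, p); Excel counterpart =CHISQ.INV(p, df)
display %18.10g invchi2(20000, 0.975)
display %18.10g invchi2(20000, 0.025)

* Upper-tail inverse: invchi2tail(df, p); Excel counterpart =CHISQ.INV.RT(p, df)
* invchi2(df, p) and invchi2tail(df, 1-p) refer to the same quantile
display %18.10g invchi2(20, 0.975)
display %18.10g invchi2tail(20, 0.025)
```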
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Thanks, Maarten, for addressing my query.

      I checked invchi2 in both Excel and Stata as you did, and they return the same value, as you noted. But when I use non-integer degrees of freedom, for example 20.13 (in my case the degrees of freedom depend on the average number of cases per year), the results start to differ. When these quantiles go into the full CI formula, where they are added to and multiplied by other values, the resulting limits are very different. For instance, the UB for one of the countries is 121.9 in Excel vs 116.4 in Stata. What do you suggest in this case?

      Thanks
      Josna



      • #4
        I would suggest performing the computations in Stata, as I trust its formulas much more than Excel's.



        • #5
          I would discuss this with Stata's tech support: https://www.stata.com/support/tech-support/contact/ . This is really about the internals of how this function was implemented, and they know that much better than anyone else. My experience with them is that they are very friendly and helpful.

          If you get an answer from them, it would be nice if you posted it (or a summary of it) here to close this thread. If someone in the future has the same problem and comes across this thread, I am sure they would appreciate that very much.
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------



          • #6
            First, I note a difference between the equation and calculation you show for the upper CI bound.

            Second, the documentation for the Excel functions is helpful here:
            * CHISQ.INV (inverse Chi2)
            * CHISQ.INV.RT (inverse Chi2 of right-tail)

            The first link states that non-integer degrees of freedom are truncated to integers. This right here is the reason for your discrepancy.
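
            The truncation can be mimicked in Stata by applying int() to the degrees of freedom. A sketch using 20.13, the value from #3; whether this reproduces the spreadsheet exactly depends on the Excel sheet passing the same arguments, which is an assumption here. UB_xl in the last comment is a hypothetical variable name.

```stata
* Compare Stata's quantile for non-integer df with an Excel-style truncated df
local df = 20.13
local p  = 0.975

display "invchi2(df, p), df = `df': " %10.6f invchi2(`df', `p')
display "invchi2(int(df), p):          " %10.6f invchi2(int(`df'), `p')

* The same idea applied to the upper-bound command in #1 (UB_xl is hypothetical):
*   generate UB_xl = (100000/populationsize)*0.5*invchi2(int((2*casesyear)+1), 1-(.05/2))
```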

            Further notes: The second link suggests that Excel uses an iterative search method, though nothing is specifically stated for the CHISQ.INV() function. Nor does Stata document the algorithm it uses.
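
            Whatever search method each program uses internally, a returned quantile can be checked by feeding it back into the cumulative distribution function, chi2(df, x) in Stata. A sketch for degrees of freedom discussed in this thread (12.2 and 14.2 correspond to 2x and 2x+2 for x = 6.1); the scalar name q is illustrative.

```stata
* Round-trip check: chi2(df, invchi2(df, p)) should give back p
foreach df in 12.2 14.2 20.13 {
    scalar q = invchi2(`df', 0.975)
    display "df = `df'  quantile = " %12.6f scalar(q) "  chi2(df, q) = " %10.8f chi2(`df', scalar(q))
}
```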

