
  • problem with sampsi to calculate sample size

    Hi,

    I'm having a problem reconciling the sample size calculations from sampsi, which seem to be a bit off.
    I'm using Stata 12 for Mac.

    I am reviewing a research protocol where Drug A and Drug B are being compared for chronic post-operative pain 1 year out. The expectation is that Drug A will result in pain 30% of the time and Drug B 10% of the time, with 80% power and a 95% confidence level.

    Thus :

    _____________________________________

    Code:
    . sampsi .30 0.10, power(0.8) alpha(0.05)
    
    Estimated sample size for two-sample comparison of proportions
    
    Test Ho: p1 = p2, where p1 is the proportion in population 1
                        and p2 is the proportion in population 2
    Assumptions:
    
             alpha =   0.0500  (two-sided)
             power =   0.8000
                p1 =   0.3000
                p2 =   0.1000
             n2/n1 =   1.00
    
    Estimated required sample sizes:
    
                n1 =       72
                n2 =       72
    ________________________________________________

    However, after doing some back-of-the-envelope calculations with n = 2 * (z_{α/2} + z_β)^2 * p*(1 − p*) / delta^2, I got n = 63 after some rounding.
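
For reference, the hand calculation above can be reproduced in a few lines. This is a minimal Python sketch using only the standard library, not Stata or R code. It assumes p* is the average of the two proportions and delta is their difference, which the post does not state explicitly, and the function name is made up for illustration.

```python
from math import ceil
from statistics import NormalDist


def simple_n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Back-of-the-envelope n per group: 2*(z_a/2 + z_b)^2 * p*(1-p*) / delta^2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # upper alpha/2 normal quantile
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    p_star = (p1 + p2) / 2              # assumed: average of the two proportions
    delta = p1 - p2                     # assumed: difference in proportions
    return 2 * (z_alpha + z_beta) ** 2 * p_star * (1 - p_star) / delta ** 2


n = simple_n_per_group(0.30, 0.10)
print(round(n, 2), ceil(n))
```

With unrounded normal quantiles the result is a little under 63, so it rounds up to the same 63 per group.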

    Confused by this discrepancy, I checked my math with R and got:

    Code:
    > power.prop.test(p1=0.3, p2=0.1, power=0.8)
    
         Two-sample comparison of proportions power calculation
    
                  n = 61.5988
                 p1 = 0.3
                 p2 = 0.1
          sig.level = 0.05
              power = 0.8
        alternative = two.sided
    
     NOTE: n is number in *each* group

    I cannot understand why the sample size estimate with sampsi is larger than either my hand calculations or R. I tried looking at the source code with viewsource but nothing obvious jumped out at me.

    I would appreciate it if anyone could point out where the error is. If I have done something very silly, please be kind.

    Thanks,

    Chris Labos

    Last edited by Chris Labos; 27 Aug 2014, 18:41.

  • #2
    I am guessing it has to do with the continuity correction. If you tell sampsi not to make the correction, you get 62 per group:

    Code:
    . sampsi .30 0.10, power(0.8) alpha(0.05) nocont
    
    Estimated sample size for two-sample comparison of proportions
    
    Test Ho: p1 = p2, where p1 is the proportion in population 1
                        and p2 is the proportion in population 2
    Assumptions:
    
             alpha =   0.0500  (two-sided)
             power =   0.8000
                p1 =   0.3000
                p2 =   0.1000
             n2/n1 =   1.00
    
    Estimated required sample sizes:
    
                n1 =       62
                n2 =       62
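
To see how much the correction matters here, both results can be reproduced by hand. The sketch is Python (standard library only), not Stata code. It assumes the uncorrected formula is the usual one with a pooled variance under the null, and that the correction is the Fleiss-style adjustment for equal group sizes. Both formulas are a reading of the documentation rather than something checked against the sampsi source, and the function names are made up.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_uncorrected(p1, p2, alpha=0.05, power=0.80):
    """n per group, equal groups, pooled variance under H0, no continuity correction."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    null_sd = sqrt(2 * p_bar * (1 - p_bar))       # variability assumed under H0
    alt_sd = sqrt(p1 * (1 - p1) + p2 * (1 - p2))  # variability assumed under Ha
    return (z_alpha * null_sd + z_beta * alt_sd) ** 2 / (p1 - p2) ** 2


def n_corrected(n, p1, p2):
    """Assumed Fleiss-style continuity correction for equal group sizes."""
    return n / 4 * (1 + sqrt(1 + 4 / (n * abs(p1 - p2)))) ** 2


n0 = n_uncorrected(0.30, 0.10)
n1 = n_corrected(n0, 0.30, 0.10)
print(ceil(n0), ceil(n1))
```

Under these assumptions the two values round up to 62 and 72, matching the sampsi outputs with and without nocont.
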
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    Stata Version: 16.0MP (2 processor)

    EMAIL: rwilliam@ND.Edu
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      Also, if you use the power command in Stata 13, nocont is the default, and continuity is the option. I assume the change in defaults reflects something about either common practice or what is now believed to be the best approach.

      Code:
      . power twoprop .30 0.10, power(0.8) alpha(0.05)
      
      Performing iteration ...
      
      Estimated sample sizes for a two-sample proportions test
      Pearson's chi-squared test 
      Ho: p2 = p1  versus  Ha: p2 != p1
      
      Study parameters:
      
              alpha =    0.0500
              power =    0.8000
              delta =   -0.2000  (difference)
                 p1 =    0.3000
                 p2 =    0.1000
      
      Estimated sample sizes:
      
                  N =       124
        N per group =        62
      
      . power twoprop .30 0.10, power(0.8) alpha(0.05) continuity
      
      Performing iteration ...
      
      Estimated sample sizes for a two-sample proportions test
      Pearson's chi-squared test 
      Ho: p2 = p1  versus  Ha: p2 != p1
      
      Study parameters:
      
              alpha =    0.0500
              power =    0.8000
              delta =   -0.2000  (difference)
                 p1 =    0.3000
                 p2 =    0.1000
      
      Estimated sample sizes:
      
                  N =       144
        N per group =        72
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      Stata Version: 16.0MP (2 processor)

      EMAIL: rwilliam@ND.Edu
      WWW: https://www3.nd.edu/~rwilliam



      • #4
        Thanks, I wouldn't have thought the continuity correction would make such a big difference. I guess the change was made to bring Stata more in line with how other software programs do their power calculations.

        I appreciate the rapid reply.



        • #5
          Dear all,

          I encountered the same problem using sampsi in Stata 12, but turning off the continuity correction did not resolve it:

          . sampsi 0.5 0.2, alpha(0.01) nocontinuity

          Estimated sample size for two-sample comparison of proportions

          Test Ho: p1 = p2, where p1 is the proportion in population 1
                              and p2 is the proportion in population 2
          Assumptions:

                   alpha =   0.0100  (two-sided)
                   power =   0.9000
                      p1 =   0.5000
                      p2 =   0.2000
                   n2/n1 =   1.00

          Estimated required sample sizes:

                      n1 =       73
                      n2 =       73


          Calculating with the formula given by M. Bland instead,

          . display (2.58+1.28)^2*((0.5*(1-0.5)+(0.2*(1-0.2))))/(0.5-0.2)^2
          67.875956

          which is approximately 68, not 73.

          An online calculator gives 68 as well:
          http://powerandsamplesize.com/Calcul...ample-Equality

          Where could the problem be?

          Thanks!



          • #6

            The standard two-sample test of proportions is equivalent to the Chi-square test in a 2x2 table. sampsi does a correct power calculation for these tests. The same formula is used by power twoprop in recent versions of Stata and can be found on p. 158 of the PSS manual (http://www.stata.com/manuals13/pss.pdf).
            Bland's formula applies to a slightly different test statistic. The \(\alpha\) level associated with that statistic is larger than the specified \(\alpha\) level. This accounts for the smaller sample size given by his formula.
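
The two numbers from #5 can be put side by side. The following is a rough Python sketch (standard library only), not Stata code. The "pooled" expression is assumed to be the formula sampsi uses for equal groups without continuity correction, and the variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

z = NormalDist()
p1, p2, alpha, power = 0.5, 0.2, 0.01, 0.90
z_alpha = z.inv_cdf(1 - alpha / 2)
z_beta = z.inv_cdf(power)

# Variance terms per observation pair (equal group sizes).
unpooled_var = p1 * (1 - p1) + p2 * (1 - p2)
p_bar = (p1 + p2) / 2
pooled_var = 2 * p_bar * (1 - p_bar)

# Simpler formula: the unpooled variance is used for both quantiles.
n_simple = (z_alpha + z_beta) ** 2 * unpooled_var / (p1 - p2) ** 2

# Assumed sampsi-style formula: pooled variance for the alpha term,
# unpooled variance for the power term.
n_pooled = (z_alpha * sqrt(pooled_var) + z_beta * sqrt(unpooled_var)) ** 2 / (p1 - p2) ** 2

print(ceil(n_simple), ceil(n_pooled))
```

Under these assumptions the sketch gives 68 for the simpler formula and 73 for the pooled one, so the whole gap comes from which variance multiplies the alpha quantile.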

            Some detail:

            The basic definition of the significance level \(\alpha\) is that it is the probability that a test statistic exceeds its critical value if the null hypothesis is true.

            For the two-sample case, the null hypothesis is \(H_0: p_1 = p_2 = p_0\), say.

            Under the null hypothesis, the difference
            \(\hat{p}_1-\hat{p}_2\) has variance
            \[
            p_0(1-p_0) \times (1/n_1 + 1/n_2)
            \]
            Of course \(p_0\) isn't known, but given observed proportions \(\hat{p}_1\) and \(\hat{p}_2 \), it is estimated by their average (what else?)
            \[\hat{p}_0 = \frac{\hat{p}_1+\hat{p}_2}{2}
            \]
            (sampsi uses the average of the hypothesized values of \(p_1\) and \(p_2\).)

            Then the test statistic for a two-sided test, ignoring the continuity correction, is:
            \[
            Z = \frac{\hat{p}_1-\hat{p}_2}{\sqrt{\hat{p}_0(1-\hat{p}_0) \times (1/n_1+1/n_2)}}
            \]
            and we reject if
            \[
            |Z| \geq z_{\alpha/2},
            \]
            where \(z_{\alpha/2}\) is the upper \(\alpha/2\) Normal quantile.

            Under the null hypothesis,
            \[
            P\left(|Z| \geq z_{\alpha/2}|H_0\right)=\alpha
            \]
            The ordinary Chi Square test statistic with one degree of freedom, as computed by tabulate twoway, for example, is \(Q= Z^2\). So the sampsi result will also apply to the Chi Square test.
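
The identity \(Q = Z^2\) is easy to check numerically. A small Python sketch (standard library only) with made-up counts and equal group sizes follows; the function names and the data are hypothetical and serve only as an illustration.

```python
from math import sqrt


def pooled_z(x1, x2, n):
    """Z statistic with the averaged proportion, equal group sizes n, no continuity correction."""
    p1_hat, p2_hat = x1 / n, x2 / n
    p0_hat = (p1_hat + p2_hat) / 2
    return (p1_hat - p2_hat) / sqrt(p0_hat * (1 - p0_hat) * (2 / n))


def pearson_chi2(x1, x2, n):
    """Pearson chi-squared for the 2x2 table, no continuity correction."""
    observed = [[x1, n - x1], [x2, n - x2]]
    col_totals = [x1 + x2, 2 * n - x1 - x2]
    chi2 = 0.0
    for row in observed:
        for obs, col_total in zip(row, col_totals):
            expected = n * col_total / (2 * n)  # row total * column total / grand total
            chi2 += (obs - expected) ** 2 / expected
    return chi2


# Hypothetical data: 19 of 62 events in group 1, 6 of 62 in group 2.
z_stat = pooled_z(19, 6, 62)
q_stat = pearson_chi2(19, 6, 62)
print(round(z_stat ** 2, 6), round(q_stat, 6))
```

The two printed values agree up to floating-point error.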

            Bland's calculation is simpler than sampsi's calculation, which is probably why he gave it. In effect, it is a calculation for a different test statistic:
            \[
            Z' = \frac{\hat{p}_1-\hat{p}_2}{\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}}
            \]
            The estimated standard error in the denominator of \(Z'\) can be recognized as an estimate of the standard error for \(\hat{p}_1-\hat{p}_2\), for any values of \(p_1\) and \(p_2\). It's the one you would use in a table or for a confidence interval. Because of this, you might occasionally find that the p-value for the \(Z\) or Chi square test is above 0.05, but that the CI does not include zero. This is uncomfortable.

            It can be proved that the denominator in \(Z'\) is less than the denominator in \(Z\), so that \(|Z'|>|Z|\) and
            \[
            P\left(|Z'| \geq z_{\alpha/2} |H_0\right) = \alpha' > \alpha
            \]
            In other words, the \(Z'\) test rejects too often under the null hypothesis. The \(n\)'s given by Bland's formula will always be too small; or, if \(n\) is fixed, and a power calculation is done (with the calculator you used, for example), the estimated power will be a little too large.
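
The size inflation can be examined without simulation by enumerating every possible pair of binomial outcomes. This is a minimal Python sketch (standard library only). The group size n = 30 and the common proportion p0 = 0.3 are arbitrary choices for illustration, and the function name is made up. With small samples the pooled test's exact size is only approximately \(\alpha\), since the normal critical value is itself an approximation.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_sizes(n, p0, alpha=0.05):
    """Exact null rejection probabilities of the pooled (Z) and unpooled (Z') tests."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    pmf = [comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(n + 1)]
    size_pooled = size_unpooled = 0.0
    for x1 in range(n + 1):
        for x2 in range(n + 1):
            diff = abs(x1 - x2) / n
            if diff == 0:
                continue  # no observed difference, neither test rejects
            p1_hat, p2_hat = x1 / n, x2 / n
            p0_hat = (p1_hat + p2_hat) / 2
            se_pooled = sqrt(p0_hat * (1 - p0_hat) * 2 / n)
            se_unpooled = sqrt((p1_hat * (1 - p1_hat) + p2_hat * (1 - p2_hat)) / n)
            prob = pmf[x1] * pmf[x2]
            # Compare diff with crit * se to avoid dividing by a zero standard error.
            if diff >= crit * se_pooled:
                size_pooled += prob
            if diff >= crit * se_unpooled:
                size_unpooled += prob
    return size_pooled, size_unpooled


size_z, size_z_prime = exact_sizes(n=30, p0=0.3)
print(round(size_z, 4), round(size_z_prime, 4))
```

Because the unpooled standard error is never larger than the pooled one when the groups are equal in size, every outcome rejected by \(Z\) is also rejected by \(Z'\), so the second printed value cannot be smaller than the first.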

            The take-home message: to assure the desired power for the specified \(\alpha\), use sampsi.
            Last edited by Steve Samuels; 19 Sep 2015, 14:44.
            Steve Samuels
            Statistical Consulting
            sjsamuels@gmail.com

            Stata 14.2



            • #7
              Dear Steve,

              Please accept my sincere gratitude for the comprehensive explanation.

              And good luck with your valuable work!
              Regards, T.
