
  • problem with sampsi to calculate sample size

    Hi,

    I'm having a problem reconciling the sample size calculations from sampsi, which seem to be a bit off.
    I'm using Stata 12 for Mac.

    I am reviewing a research protocol where Drug A and Drug B are being compared for chronic post-operative pain 1 year out. The expectation is that Drug A will result in pain 30% of the time and Drug B 10% of the time, with 80% power and a 95% confidence level.

    Thus :

    _____________________________________

    Code:
    . sampsi .30 0.10, power(0.8) alpha(0.05)
    
    Estimated sample size for two-sample comparison of proportions
    
    Test Ho: p1 = p2, where p1 is the proportion in population 1
                        and p2 is the proportion in population 2
    Assumptions:
    
             alpha =   0.0500  (two-sided)
             power =   0.8000
                p1 =   0.3000
                p2 =   0.1000
             n2/n1 =   1.00
    
    Estimated required sample sizes:
    
                n1 =       72
                n2 =       72
    ________________________________________________

    However, after doing some back-of-the-envelope calculations with \(n = 2(z_{\alpha/2} + z_{\beta})^2 \, p^*(1-p^*) / \delta^2\), I got \(n = 63\) after some rounding.
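
    Spelling that arithmetic out in Stata (taking \(p^*\) to be the average of the two proportions, 0.20; this is just my hand calculation, not anything sampsi does internally):

    Code:
    * back-of-the-envelope n per group, no continuity correction
    * invnormal(0.975) = z_alpha/2 for alpha = 0.05; invnormal(0.80) = z_beta for 80% power
    display 2*(invnormal(0.975) + invnormal(0.80))^2 * 0.20*(1 - 0.20) / (0.30 - 0.10)^2

    which comes out to about 62.8 and rounds up to 63.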

    Confused by this discrepancy, I checked my math with R and got:

    Code:
    > power.prop.test(p1=0.3, p2=0.1, power=0.8)
    
         Two-sample comparison of proportions power calculation
    
                  n = 61.5988
                 p1 = 0.3
                 p2 = 0.1
          sig.level = 0.05
              power = 0.8
        alternative = two.sided
    
     NOTE: n is number in *each* group
    I cannot understand why the sample size estimate from sampsi is larger than both my hand calculation and R's. I tried looking at the underlying code with viewsource, but nothing obvious jumped out at me.

    I would appreciate it if anyone could point out where the error is. If I have done something very silly, please be kind.

    Thanks,

    Chris Labos

    Last edited by Chris Labos; 27 Aug 2014, 18:41.

  • #2
    I am guessing it has to do with the continuity correction. If you tell sampsi not to make the correction:

    Code:
    . sampsi .30 0.10, power(0.8) alpha(0.05) nocont
    
    Estimated sample size for two-sample comparison of proportions
    
    Test Ho: p1 = p2, where p1 is the proportion in population 1
                        and p2 is the proportion in population 2
    Assumptions:
    
             alpha =   0.0500  (two-sided)
             power =   0.8000
                p1 =   0.3000
                p2 =   0.1000
             n2/n1 =   1.00
    
    Estimated required sample sizes:
    
                n1 =       62
                n2 =       62
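
    For what it's worth, I believe the corrected figure comes from applying the usual continuity-correction inflation (the Fleiss-type adjustment) to the uncorrected n; I have not traced this through the ado code, but the numbers line up:

    Code:
    * continuity-corrected n per group from the uncorrected n (about 61.6),
    * with difference in proportions |p1 - p2| = 0.20
    display (61.6/4) * (1 + sqrt(1 + 4/(61.6*0.20)))^2

    which gives roughly 71.2 and rounds up to the 72 that sampsi reports by default.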
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    Stata Version: 17.0 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam

    Comment


    • #3
      Also, if you use the power command in Stata 13, nocont is the default, and continuity is the option. I assume the change in defaults reflects something about either common practice or what is now believed to be the best approach.

      Code:
      . power twoprop .30 0.10, power(0.8) alpha(0.05)
      
      Performing iteration ...
      
      Estimated sample sizes for a two-sample proportions test
      Pearson's chi-squared test 
      Ho: p2 = p1  versus  Ha: p2 != p1
      
      Study parameters:
      
              alpha =    0.0500
              power =    0.8000
              delta =   -0.2000  (difference)
                 p1 =    0.3000
                 p2 =    0.1000
      
      Estimated sample sizes:
      
                  N =       124
        N per group =        62
      
      . power twoprop .30 0.10, power(0.8) alpha(0.05) continuity
      
      Performing iteration ...
      
      Estimated sample sizes for a two-sample proportions test
      Pearson's chi-squared test 
      Ho: p2 = p1  versus  Ha: p2 != p1
      
      Study parameters:
      
              alpha =    0.0500
              power =    0.8000
              delta =   -0.2000  (difference)
                 p1 =    0.3000
                 p2 =    0.1000
      
      Estimated sample sizes:
      
                  N =       144
        N per group =        72
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      Stata Version: 17.0 MP (2 processor)

      EMAIL: [email protected]
      WWW: https://www3.nd.edu/~rwilliam

      Comment


      • #4
        Thanks, I wouldn't have thought the continuity correction would make such a big difference. I guess the change was to make Stata more in line with how other software programs do their power calculations.

        I appreciate the rapid reply.

        Comment


        • #5
          Dear all,

          I encountered the same problem using sampsi in Stata 12, but it could not be resolved by turning off the continuity correction:

          Code:
          . sampsi 0.5 0.2, alpha(0.01) nocontinuity
          
          Estimated sample size for two-sample comparison of proportions
          
          Test Ho: p1 = p2, where p1 is the proportion in population 1
                              and p2 is the proportion in population 2
          Assumptions:
          
                   alpha =   0.0100  (two-sided)
                   power =   0.9000
                      p1 =   0.5000
                      p2 =   0.2000
                   n2/n1 =   1.00
          
          Estimated required sample sizes:
          
                      n1 =       73
                      n2 =       73


          And calculating with the formula from M. Bland:

          Code:
          . display (2.58+1.28)^2*((0.5*(1-0.5)+(0.2*(1-0.2))))/(0.5-0.2)^2
          67.875956

          which is approximately 68, not 73.

          An online calculator gives 68 as well:
          http://powerandsamplesize.com/Calcul...ample-Equality

          Where could the problem be?

          Thanks!

          Comment


          • #6

            The standard two-sample test of proportions is equivalent to the chi-square test in a 2x2 table. sampsi does a correct power calculation for these tests. The same formula is used by power twoprop in recent versions of Stata and can be found on p. 158 of the PSS manual (http://www.stata.com/manuals13/pss.pdf).
            Bland's formula applies to a slightly different test statistic. The \(\alpha\) level associated with that statistic is larger than the specified \(\alpha\) level. This accounts for the smaller sample size given by his formula.

            Some detail:

            The defining property of an \(\alpha\)-level test is that the probability that the test statistic exceeds its critical value is \(\alpha\) when the null hypothesis is true.

            For the two-sample case, the null hypothesis is \(H_0: p_1 = p_2 = p_0\), say.

            Under the null hypothesis, the difference
            \(\hat{p}_1-\hat{p}_2\) has variance
            \[
            p_0(1-p_0) \times (1/n_1 + 1/n_2)
            \]
            Of course \(p_0\) isn't known, but given observed proportions \(\hat{p}_1\) and \(\hat{p}_2 \), it is estimated by their average (what else?)
            \[\hat{p}_0 = \frac{\hat{p}_1+\hat{p}_2}{2}
            \]
            (sampsi uses the average of the hypothesized values of \(p_1\) and \(p_2\).)

            Then the test statistic for a two-sided test, ignoring the continuity correction, is:
            \[
            Z = \frac{\hat{p}_1-\hat{p}_2}{\sqrt{\hat{p}_0(1-\hat{p}_0) \times (1/n_1+1/n_2)}}
            \]
            and we reject if
            \[
            |Z| \geq z_{\alpha/2},
            \]
            where \(z_{\alpha/2}\) is the upper \(\alpha/2\) Normal quantile.

            Under the null hypothesis,
            \[
            P\left(|Z| \geq z_{\alpha/2}|H_0\right)=\alpha
            \]
            The ordinary Chi Square test statistic with one degree of freedom, as computed by tabulate twoway, for example, is \(Q= Z^2\). So the sampsi result will also apply to the Chi Square test.
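
            To tie this back to sample size: requiring power \(1-\beta\) against the alternative \((p_1, p_2)\) for this pooled-variance statistic, with \(n_1 = n_2 = n\) and \(z_{\beta}\) the upper \(\beta\) Normal quantile, leads (as I read the PSS manual formula mentioned above) to
            \[
            n = \frac{\left( z_{\alpha/2}\sqrt{2\,\bar{p}(1-\bar{p})} + z_{\beta}\sqrt{p_1(1-p_1) + p_2(1-p_2)} \right)^2}{(p_1-p_2)^2}, \qquad \bar{p} = \frac{p_1+p_2}{2}.
            \]
            A quick check against the numbers in #5 (my arithmetic only, not sampsi's code):

            Code:
            * pooled-variance n per group: p1 = 0.5, p2 = 0.2, alpha = 0.01 (two-sided), power = 0.90
            display ((invnormal(0.995)*sqrt(2*0.35*0.65) + invnormal(0.90)*sqrt(0.5*0.5 + 0.2*0.8))^2) / (0.5 - 0.2)^2

            This returns about 72.7, which rounds up to the 73 that sampsi reported there.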

            Bland's calculation is simpler than sampsi's, which is probably why he gave it. In effect, it is a calculation for a different test statistic:
            \[
            Z' = \frac{\hat{p}_1-\hat{p}_2}{\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}}
            \]
            The estimated standard error in the denominator of \(Z'\) can be recognized as an estimate of the standard error of \(\hat{p}_1-\hat{p}_2\) for any values of \(p_1\) and \(p_2\); it's the one you would use in a table or for a confidence interval. Because of this, you might occasionally find that the p-value for the \(Z\) or chi-square test is above 0.05 but that the CI does not include zero. This is uncomfortable.

            It can be proved that the denominator in \(Z'\) is less than the denominator in \(Z\), so that \(|Z'|>|Z|\) and
            \[
            P\left(|Z'| \geq z_{\alpha/2} |H_0\right) = \alpha' > \alpha
            \]
            In other words, the \(Z'\) test rejects too often under the null hypothesis. The \(n\)'s given by Bland's formula will always be too small; or, if \(n\) is fixed, and a power calculation is done (with the calculator you used, for example), the estimated power will be a little too large.
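
            For the proportions in #5, plugging the hypothesized values into the two variance terms makes the comparison concrete:

            Code:
            * null (pooled) vs. unpooled variance terms for p1 = 0.5, p2 = 0.2
            display "pooled:   " 2*0.35*(1 - 0.35)
            display "unpooled: " 0.5*(1 - 0.5) + 0.2*(1 - 0.2)

            The pooled term is 0.455 against 0.41 unpooled, which is why Bland's formula gave 68 there while sampsi requires 73.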

            The take-home message: to assure the desired power for the specified \(\alpha\), use sampsi.
            Last edited by Steve Samuels; 19 Sep 2015, 14:44.
            Steve Samuels
            Statistical Consulting
            [email protected]

            Stata 14.2

            Comment


            • #7
              Dear Steve,

              Please accept my sincere gratitude for the comprehensive explanation.

              And good luck with your valuable work!
              Regards, T.

              Comment
