power calculation for two binary treatments and their interaction in a regression setting

Dimitriy V. Masterov

Join Date: Mar 2014

Posts: 609
#1

power calculation for two binary treatments and their interaction in a regression setting

16 Aug 2017, 22:44

I will have data from an experiment with two binary treatments (X and Z) and their interaction. The unequal sample sizes are given in this table:
Z=1 Z=0

X=1 N=2,500 N=717,500

X=0 N=2,500 N=77,500

The baseline rate is Pr(Y=1 | X=0, Z=0) = 4/1000. I am interested in doing inference on the average marginal effects (possibly with some covariates), so I will estimate something like one of these models:

Code:

reg y i.x##i.z, robust probit y i.x##i.z margins, at(x = (0 1) z=(0 1)) contrast(atcontrast(r._at) wald)

Is there a way to calculate the effect sizes that I can detect with either the OLS or probit/margins, given some alpha and power?

I tried using -powersim- (version 1.0.1), but I could not get the the block22() option with my uneven split.

Last edited by Dimitriy V. Masterov; 16 Aug 2017, 22:45. Reason: power calculation
Tags: None

Joseph Coveney

Join Date: Apr 2014
Posts: 4410

17 Aug 2017, 00:13

Would something like the following help at least as a starting point?

Code:

version 15.0

clear *

// Create dataset with fixed cell counts
quietly set obs 2
generate x = _n - 1

quietly expand 2
bysort x: generate byte z = _n - 1

generate long count = 2500
quietly {
    replace count = 77500 in 1
    replace count = 717500 in 3
}
list, noobs sepby(x z)

tempfile tmpfil0
quietly save `tmpfil0'

// Simulations using the fixed dataset
program define simem
    version 15.0
    syntax , File(string) [Effect(real 0)]

    drop _all
    use `file'

    generate long count1 = rbinomial(count, 0.004)
    replace count1 = rbinomial(count, abs(`effect' - 0.004)) in l
    generate long count0 = count - count1
    drop count
    quietly reshape long count, i(x z) j(y)

    probit y i.x##i.z [fweight=count]
    testparm x#z
end

// Test size
local reps 1000
simulate p = r(p), reps(`reps') nodots nolegend: simem , file(`tmpfil0')
generate byte pos = p < 0.05
summarize pos, meanonly
display in smcl as text "Effect = " %5.3f `effect', "Power = " %4.2f r(mean), "Estimable = " %4.2f r(N) / `reps'

// Power
local reps 100
forvalues effect = 0.010(0.001)0.02 {
    simulate p = r(p), reps(100) nodots nolegend: simem , file(`tmpfil0') e(`effect')
    generate byte pos = p < 0.05
    summarize pos, meanonly
    display in smcl as text "Effect = " %5.3f `effect', "Power = " %4.2f r(mean), "Estimable = " %4.2f r(N) / `reps'
}

exit

Here, "effect" is defined such that the proportion in the cell diagonal to the baseline cell is "effect" minus the baseline rate. Substitute whatever code you need to make "effect" or "effect size" defined as desired. Also, the simulation program tests only the interaction term; substitute -margins- or whatever as desired. Be sure to look for coding errors.

Last edited by Joseph Coveney; 17 Aug 2017, 00:18.

Comment

Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#3

18 Aug 2017, 20:51

Originally posted by Joseph Coveney View Post

Be sure to look for coding errors.

Well, how about we start with a seed?

Code:

clear * set seed 1406723

And continue by making "Estimable" etc. meaningful

Code:

generate byte pos = p < 0.05 if !missing(p)

and make "effect" a little more intuitive.

Code:

replace count1 = rbinomial(count, `effect' + 0.004) in l

With that last one

Code:

forvalues effect = 0.002(0.002)0.012 {

hones in on the range where things are happening.
Comment
Dimitriy V. Masterov

Join Date: Mar 2014

Posts: 609
#4

28 Aug 2017, 19:35

Thank you for this great code. I think I've digested and adapted it to my -margins, pwcompare- case successfully, but I am uncertain what the estimable part is doing. I understand that there will be some simulations with no p-values, but I am not sure if I understand why that is happening here.
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#5

29 Aug 2017, 05:46

The rate (4/1000) is fairly low. If there is perchance a zero-count cell in the table, then you will get separation and the probit regression will fail. In such cases, I expected it to return a missing p-value; the estimable stuff was just to monitor whether it was happening. As it turns out, even with only 2500 total in the Z = 1 / X = 0 condition, it seem to be rare enough that it never happened with the small number of repetitions that I used.

You could always get around such a case with a combination of probit's asis option and the use of the likelihood-ratio test.
Comment
Dimitriy V. Masterov

Join Date: Mar 2014

Posts: 609
#6

29 Aug 2017, 08:35

Thank you again for a great explanation. Makes a lot of sense! I will post my version of the code when it gets cleaned up.
Comment

	Z=1	Z=0
X=1	N=2,500	N=717,500
X=0	N=2,500	N=77,500

Announcement