Converting power calculation simulation with known sample size into empirical power simulation across an array of sample sizes

Christopher Rowan

Join Date: Jan 2016

Posts: 15
#1

Converting power calculation simulation with known sample size into empirical power simulation across an array of sample sizes

13 Jul 2023, 17:57

Dear Statalist members,

I have Stata code I used during grad school to simulate power for my dissertation project. A simplified version of the code is pasted below. I included a simplified version since my objective is to estimate a different treatment effect for my current project. I tested the simplified version of the code today; it works fine. You should be able to copy/paste the code below into two separate .do files.

The code is divided into two separate .do files (referred to as outer and inner) . The outer .do file runs the inner .do file with the specified arguments. If there's a better/simpler approach, as opposed to using two separate .do files, I would be grateful if you would suggest a more elegant solution.

That said, my primary request for assistance is to convert the power calculation simulation, with a known sample size from a feasibility assessment, to a power calculation across an array of sample sizes from 1,000-10,000 patients in intervals of 1,000 patients. Ultimately, I intend to run 1,000 reps to estimate power at each sample size and ultimately plot the estimated empirical power across the array of sample sizes.

Note: I have not included the variance inflation factor I will be using for the final power calculations - since I have not yet figured out how to establish the VIF or design effect for this study using MSM with IPTW and IPCW. I intend to add this later once the method to run the simulation is established.

Your assistance is greatly appreciated! I hope this post is appropriate and meets community standards. If not, please let me know.

Chris

Outer code .do file "simpowerouter_v2"
clear all
cd "C:\Users\Chris"
pwd

do simpower_v2.do

capture log close
log using simpower_v2.log, replace

set seed 04301971

#delimit ;
set more off;
simul "simpower_v2 10000 -2.5 0.4"

* ba1 is the beta coefficient of interest
ba1=r(ba1)
var_ba1=r(var_ba1)
ep_ba1 = r(ep_ba1)
,
reps(1000);

di "simpower_v2";

summ ep*;
summ ba1;
summ var*;

#delimit cr;

capture log close
exit

Inner code .do file "simpower_v2"
capture program drop _all
program define simpower_v2, rclass
version 16.1
set more off

* the statement below calls on the arguments established in the outer .do file
args ss ycons yba1

drop _all

* gen # observations
set obs `ss'
gen id = _n

* gen person-years
gen pyears=rpoisson(2.2)
replace pyears=pyears+0.1

* gen exposure var - assuming 5% exposed 95% unexposed
capture drop a1
gen a1 = uniform()
replace a1 = 1 if a1<0.05
replace a1=0 if a1>=0.05 & a1!=1

* gen outcome var - assuming cumulative incidence of 10%
capture drop proby
gen proby = exp(`ycons' + `yba1'*a1)/(1 + exp(`ycons' + `yba1'*a1))
capture drop random3
gen random3 = uniform()
capture drop y
gen y = 1 if random3<= proby
replace y = 0 if y==.
drop random3
tab a1 y, row

* st set data and run cox model
stset pyears, id(id) failure(y==1)
stcox a1, nohr

* obtain beta coeficient of exposure parameter
matrix beta = get(_b)
scalar ba1 = beta[1,1]
return scalar ba1 = scalar(ba1)

* obtain variance of exposure parameter
matrix v0=(e(V))
scalar var_ba1=v0[1,1]
return scalar var_ba1 = scalar(var_ba1)

* gen 95% ci lcl and ucl
capture drop lcl_ba1 ucl_ba1
gen lcl_ba1=scalar(ba1) - (1.96*(sqrt(scalar(var_ba1))))
gen ucl_ba1=scalar(ba1) + (1.96*(sqrt(scalar(var_ba1))))

* gen empiric power based on the 95% CI
gen ep_ba1=0 if lcl_ba1<=0
replace ep_ba1=1 if ep_ba1==.
scalar ep_ba1=ep_ba1
return scalar ep_ba1 = scalar(ep_ba1)

end
exit

Last edited by Christopher Rowan; 13 Jul 2023, 17:59.
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#2

13 Jul 2023, 20:07

I would do this by creating a frame to hold all the simulation outputs and then just loop over sample sizes and reps to fill it in. This code is untested as there is no example data, but if it has errors, it at least points the way and should be easy to fix.

Code:

set seed 1234 // OR WHATEVER frame create results int (sample_size rep) float(ba1 var_ba1 ep_ba1) foreach n of numlist 1000 (1000) 10000 { forvalues i = 1/1000 { // REPS AT EACH SAMPLE SIZE simpower_v2 `n' -2.5 0.4 frame post results (`n') (`i') (r(ba1)) (r(var_ba1)) (r(ep_ba1)) } }

Program sim_power_v2 needs no modifications for this purpose. (Evidently you will need to modify it so that it simulates your new study design, not this older one. But whatever you would have done to sim_power_v2 in the "old regime," do exactly the same here.)

At the end of this code, frame results contains the results of all of the simulation runs: 10,000 in all, indexed by the sample size and the rep number. You can -frame change results- and then do whatever you want to summarize or analyze those results.

Last edited by Clyde Schechter; 13 Jul 2023, 20:10.
Comment
Christopher Rowan

Join Date: Jan 2016

Posts: 15
#3

15 Jul 2023, 13:20

Thank you Clyde! Your suggestion works perfectly. The updated code is below. Please let me know if you have any improvements.

Note: I converted the procedure to one .do file rather than two (inner and outer). To run the simulation code one may paste the code into one .do file and run. Of course the directory needs to be changed.

I will post a reply after I define and implement the VIF.

Chris

************************************************** ******************************
************************************************** ******************************
************************************************** ******************************

clear all

cd "C:\Users"
pwd

version 16.1
clear*

************************************************** ******************************
* define program for power simulations
* exposure prevalence = 5%; outcome incidence = 10%
************************************************** ******************************

capture program drop _all
program define simpower_v3, rclass
version 16.1
set more off

************************************************** ******************************
args ss ycons yba1
************************************************** ******************************

drop _all

* gen # observations
set obs `ss'
gen id = _n

* gen person-years
gen pyears=rpoisson(2.2)
replace pyears=pyears+0.1

* gen exposure var - assuming 5% exposed 95% unexposed
capture drop a1
gen a1 = uniform()
replace a1 = 1 if a1<0.05
replace a1=0 if a1>=0.05 & a1!=1

* gen outcome var - assuming cumulative incidence of ~10%
capture drop proby
gen proby = exp(`ycons' + `yba1'*a1)/(1 + exp(`ycons' + `yba1'*a1))
capture drop random3
gen random3 = uniform()
capture drop y
gen y = 1 if random3<= proby
replace y = 0 if y==.
drop random3
tab a1 y, row

* st set data and run cox model
stset pyears, id(id) failure(y==1)
stcox a1, nohr

* obtain beta coeficient of exposure parameter
matrix beta = get(_b)
scalar ba1 = beta[1,1]
return scalar ba1 = scalar(ba1)

* obtain variance of exposure parameter
matrix v0=(e(V))
scalar var_ba1=v0[1,1]
return scalar var_ba1 = scalar(var_ba1)

* obtain sample size for model
matrix nmat=(e(N))
scalar ss=nmat[1,1]
return scalar ss = scalar(ss)

* gen 95% ci lcl and ucl
capture drop lcl_ba1 ucl_ba1
gen lcl_ba1=scalar(ba1) - (1.96*(sqrt(scalar(var_ba1))))
gen ucl_ba1=scalar(ba1) + (1.96*(sqrt(scalar(var_ba1))))

* gen empiric power based on the 95% CI
gen ep_ba1=0 if lcl_ba1<=0
replace ep_ba1=1 if ep_ba1==.
scalar ep_ba1=ep_ba1
return scalar ep_ba1 = scalar(ep_ba1)

end

************************************************** ******************************
* power simulation for detecting HR=1.25
************************************************** ******************************

set seed 04301971

frame create hr_125 int (N rep) float(ba1 var_ba1 ep_ba1)
foreach n of numlist 500 (500) 10000 {
forvalues i = 1/3000 { // REPS AT EACH SAMPLE SIZE
simpower_v3 `n' -2.25 0.22314355
frame post hr_125 (`n') (`i') (r(ba1)) (r(var_ba1)) (r(ep_ba1))
}
}

cwf hr_125
capture drop dhr
gen dhr=1.25

save "C:\Users\hr_125.dta", replace

************************************************** ******************************
* power simulation for detecting HR=1.5
************************************************** ******************************

set seed 04301971

frame create hr_15 int (N rep) float(ba1 var_ba1 ep_ba1)
foreach n of numlist 500 (500) 10000 {
forvalues i = 1/3000 { // REPS AT EACH SAMPLE SIZE
simpower_v3 `n' -2.25 0.40546511
frame post hr_15 (`n') (`i') (r(ba1)) (r(var_ba1)) (r(ep_ba1))
}
}

cwf hr_15
capture drop dhr
gen dhr=1.5

save "C:\Users\hr_15.dta", replace

************************************************** ******************************
* power simulation for detecting HR=2.0
************************************************** ******************************

set seed 04301971

frame create hr_20 int (N rep) float(ba1 var_ba1 ep_ba1)
foreach n of numlist 500 (500) 10000 {
forvalues i = 1/3000 { // REPS AT EACH SAMPLE SIZE
simpower_v3 `n' -2.25 0.69314718
frame post hr_20 (`n') (`i') (r(ba1)) (r(var_ba1)) (r(ep_ba1))
}
}

cwf hr_20
capture drop dhr
gen dhr=2.0

save "C:\Users\hr_20.dta", replace

************************************************** ******************************
* Combine results from simulations for detecting HR = 1.25, 1.5, and 2.0
************************************************** ******************************

use hr_125, clear
append using hr_15
append using hr_20
format dhr %9.2f

************************************************** ******************************
* Generate the mean power for 3,000 reps by detectable HR
* keep one row per sample size and detectable HR
* drop vars
************************************************** ******************************

capture drop power
egen power = mean(ep_ba1), by(N dhr)

drop if rep!=3000

rename rep reps

drop ba1 var_ba1 ep_ba1

capture drop exp
gen exp=0.05
capture drop y
gen y=0.1

************************************************** ******************************
* plot results
************************************************** ******************************

#delimit ;
graph twoway
line power N if dhr==1.25,
sort lcolor(red)
yline(0.90, lcolor(black) lpattern(dash))
yline(0.80, lcolor(gs8) lpattern(dash))
yscale(range(0.0 1.0))
ylabel(0.0(0.1)1.0, format(%9.1f) labsize(vsmall) angle(0) nogrid)
ytitle(Power)
xscale(range(500 10000))
xlabel(500(500)10000, format() labsize(vsmall) angle(45))
xtitle(Sample Size)
xline(1000, lcolor(gs10) lpattern(dot))
xline(2000, lcolor(gs10) lpattern(dot))
xline(3000, lcolor(gs10) lpattern(dot))
xline(4000, lcolor(gs10) lpattern(dot))
xline(5000, lcolor(gs10) lpattern(dot))
xline(6000, lcolor(gs10) lpattern(dot))
xline(7000, lcolor(gs10) lpattern(dot))
xline(8000, lcolor(gs10) lpattern(dot))
xline(9000, lcolor(gs10) lpattern(dot))
xline(10000, lcolor(gs10) lpattern(dot))
graphregion(margin(4 4 4 4) fcolor(white) lcolor(black) lwidth(vthin))
title("Power Simulation: Exposure = 5%; Outcome = 10%", size(medium) margin(small) box bcolor(white) position(12) justification() )
legend(cols(3) position(6) ring(0) region(lcolor(white) margin(tiny)) size(small) symplacement(left) symysize(2) symxsize(8) forcesize rowgap(1) order(3 2 1) label(3 "dHR=2.0") label(2 "dHR=1.5") label(1 "dHR=1.25") textfirst)
note("Note: dHR = detectable Hazard Ratio", color(black) size(vsmall) span)
||
line power N if dhr==1.5, sort lcolor(gold)
||
line power N if dhr==2.0, sort lcolor(green)
; #delimit cr

graph save "Graph" "C:\Users\Power_simulation_exposure_0.05_Outco me_0 .1_dHR_1.25_1.5_2.0.gph", replace

exit

Power_simulation_exposure_0.05_Outcome_0.1_dHR_1.2 5_1.5_2.0.gph
Attached Files

Power_simulation_exposure_0.05_Outcome_0.1_dHR_1.25_1.5_2.0.gph (15.9 KB, 1 view)

Last edited by Christopher Rowan; 15 Jul 2023, 13:27.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#4

15 Jul 2023, 14:08

You can make this even simpler by looping over the values of HR, and saving everything in the single frame -results-. Like this:

Code:

frame create results int (N rep) float(hr ba1 var_ba1 ep_ba1) foreach hr of numlist 1.25 1.5 2.0 { foreach n of numlist 500 (500) 10000 { forvalues i = 1/3000 { // REPS AT EACH SAMPLE SIZE simpower_v3 `n' -2.25 `=-log(`hr')' frame post results (`n') (`i') (`hr') (r(ba1)) (r(var_ba1)) (r(ep_ba1)) } } }

No changes needed to simpower_v3 to do this. At the end, frame results contains all of the analyses, and there is a variable, hr, in addition to the others.
Comment

Announcement

Converting power calculation simulation with known sample size into empirical power simulation across an array of sample sizes

Comment

Comment

Comment