Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Converting power calculation simulation with known sample size into empirical power simulation across an array of sample sizes

    Dear Statalist members,

    I have Stata code I used during grad school to simulate power for my dissertation project. A simplified version of the code is pasted below. I included a simplified version since my objective is to estimate a different treatment effect for my current project. I tested the simplified version of the code today; it works fine. You should be able to copy/paste the code below into two separate .do files.

    The code is divided into two separate .do files (referred to as outer and inner) . The outer .do file runs the inner .do file with the specified arguments. If there's a better/simpler approach, as opposed to using two separate .do files, I would be grateful if you would suggest a more elegant solution.

    That said, my primary request for assistance is to convert the power calculation simulation, with a known sample size from a feasibility assessment, to a power calculation across an array of sample sizes from 1,000-10,000 patients in intervals of 1,000 patients. Ultimately, I intend to run 1,000 reps to estimate power at each sample size and ultimately plot the estimated empirical power across the array of sample sizes.

    Note: I have not included the variance inflation factor I will be using for the final power calculations - since I have not yet figured out how to establish the VIF or design effect for this study using MSM with IPTW and IPCW. I intend to add this later once the method to run the simulation is established.

    Your assistance is greatly appreciated! I hope this post is appropriate and meets community standards. If not, please let me know.

    Chris

    Outer code .do file "simpowerouter_v2"
    clear all
    cd "C:\Users\Chris"
    pwd

    do simpower_v2.do

    capture log close
    log using simpower_v2.log, replace

    set seed 04301971

    #delimit ;
    set more off;
    simul "simpower_v2 10000 -2.5 0.4"

    * ba1 is the beta coefficient of interest
    ba1=r(ba1)
    var_ba1=r(var_ba1)
    ep_ba1 = r(ep_ba1)
    ,
    reps(1000);

    di "simpower_v2";

    summ ep*;
    summ ba1;
    summ var*;

    #delimit cr;

    capture log close
    exit

    Inner code .do file "simpower_v2"
    capture program drop _all
    program define simpower_v2, rclass
    version 16.1
    set more off

    * the statement below calls on the arguments established in the outer .do file
    args ss ycons yba1

    drop _all

    * gen # observations
    set obs `ss'
    gen id = _n

    * gen person-years
    gen pyears=rpoisson(2.2)
    replace pyears=pyears+0.1

    * gen exposure var - assuming 5% exposed 95% unexposed
    capture drop a1
    gen a1 = uniform()
    replace a1 = 1 if a1<0.05
    replace a1=0 if a1>=0.05 & a1!=1

    * gen outcome var - assuming cumulative incidence of 10%
    capture drop proby
    gen proby = exp(`ycons' + `yba1'*a1)/(1 + exp(`ycons' + `yba1'*a1))
    capture drop random3
    gen random3 = uniform()
    capture drop y
    gen y = 1 if random3<= proby
    replace y = 0 if y==.
    drop random3
    tab a1 y, row

    * st set data and run cox model
    stset pyears, id(id) failure(y==1)
    stcox a1, nohr

    * obtain beta coeficient of exposure parameter
    matrix beta = get(_b)
    scalar ba1 = beta[1,1]
    return scalar ba1 = scalar(ba1)

    * obtain variance of exposure parameter
    matrix v0=(e(V))
    scalar var_ba1=v0[1,1]
    return scalar var_ba1 = scalar(var_ba1)

    * gen 95% ci lcl and ucl
    capture drop lcl_ba1 ucl_ba1
    gen lcl_ba1=scalar(ba1) - (1.96*(sqrt(scalar(var_ba1))))
    gen ucl_ba1=scalar(ba1) + (1.96*(sqrt(scalar(var_ba1))))

    * gen empiric power based on the 95% CI
    gen ep_ba1=0 if lcl_ba1<=0
    replace ep_ba1=1 if ep_ba1==.
    scalar ep_ba1=ep_ba1
    return scalar ep_ba1 = scalar(ep_ba1)

    end
    exit
    Last edited by Christopher Rowan; 13 Jul 2023, 17:59.

  • #2
    I would do this by creating a frame to hold all the simulation outputs and then just loop over sample sizes and reps to fill it in. This code is untested as there is no example data, but if it has errors, it at least points the way and should be easy to fix.

    Code:
    set seed 1234 // OR WHATEVER
    
    frame create results int (sample_size rep) float(ba1 var_ba1 ep_ba1)
    foreach n of numlist 1000 (1000) 10000 {
        forvalues i = 1/1000 { // REPS AT EACH SAMPLE SIZE
            simpower_v2 `n' -2.5 0.4
            frame post results (`n') (`i') (r(ba1)) (r(var_ba1)) (r(ep_ba1))
        }
    }
    Program sim_power_v2 needs no modifications for this purpose. (Evidently you will need to modify it so that it simulates your new study design, not this older one. But whatever you would have done to sim_power_v2 in the "old regime," do exactly the same here.)

    At the end of this code, frame results contains the results of all of the simulation runs: 10,000 in all, indexed by the sample size and the rep number. You can -frame change results- and then do whatever you want to summarize or analyze those results.
    Last edited by Clyde Schechter; 13 Jul 2023, 20:10.

    Comment


    • #3
      Thank you Clyde! Your suggestion works perfectly. The updated code is below. Please let me know if you have any improvements.

      Note: I converted the procedure to one .do file rather than two (inner and outer). To run the simulation code one may paste the code into one .do file and run. Of course the directory needs to be changed.

      I will post a reply after I define and implement the VIF.

      Chris

      ************************************************** ******************************
      ************************************************** ******************************
      ************************************************** ******************************

      clear all

      cd "C:\Users"
      pwd

      version 16.1
      clear*

      ************************************************** ******************************
      * define program for power simulations
      * exposure prevalence = 5%; outcome incidence = 10%
      ************************************************** ******************************

      capture program drop _all
      program define simpower_v3, rclass
      version 16.1
      set more off

      ************************************************** ******************************
      args ss ycons yba1
      ************************************************** ******************************

      drop _all

      * gen # observations
      set obs `ss'
      gen id = _n

      * gen person-years
      gen pyears=rpoisson(2.2)
      replace pyears=pyears+0.1

      * gen exposure var - assuming 5% exposed 95% unexposed
      capture drop a1
      gen a1 = uniform()
      replace a1 = 1 if a1<0.05
      replace a1=0 if a1>=0.05 & a1!=1

      * gen outcome var - assuming cumulative incidence of ~10%
      capture drop proby
      gen proby = exp(`ycons' + `yba1'*a1)/(1 + exp(`ycons' + `yba1'*a1))
      capture drop random3
      gen random3 = uniform()
      capture drop y
      gen y = 1 if random3<= proby
      replace y = 0 if y==.
      drop random3
      tab a1 y, row

      * st set data and run cox model
      stset pyears, id(id) failure(y==1)
      stcox a1, nohr

      * obtain beta coeficient of exposure parameter
      matrix beta = get(_b)
      scalar ba1 = beta[1,1]
      return scalar ba1 = scalar(ba1)

      * obtain variance of exposure parameter
      matrix v0=(e(V))
      scalar var_ba1=v0[1,1]
      return scalar var_ba1 = scalar(var_ba1)

      * obtain sample size for model
      matrix nmat=(e(N))
      scalar ss=nmat[1,1]
      return scalar ss = scalar(ss)

      * gen 95% ci lcl and ucl
      capture drop lcl_ba1 ucl_ba1
      gen lcl_ba1=scalar(ba1) - (1.96*(sqrt(scalar(var_ba1))))
      gen ucl_ba1=scalar(ba1) + (1.96*(sqrt(scalar(var_ba1))))

      * gen empiric power based on the 95% CI
      gen ep_ba1=0 if lcl_ba1<=0
      replace ep_ba1=1 if ep_ba1==.
      scalar ep_ba1=ep_ba1
      return scalar ep_ba1 = scalar(ep_ba1)

      end

      ************************************************** ******************************
      * power simulation for detecting HR=1.25
      ************************************************** ******************************

      set seed 04301971

      frame create hr_125 int (N rep) float(ba1 var_ba1 ep_ba1)
      foreach n of numlist 500 (500) 10000 {
      forvalues i = 1/3000 { // REPS AT EACH SAMPLE SIZE
      simpower_v3 `n' -2.25 0.22314355
      frame post hr_125 (`n') (`i') (r(ba1)) (r(var_ba1)) (r(ep_ba1))
      }
      }

      cwf hr_125
      capture drop dhr
      gen dhr=1.25

      save "C:\Users\hr_125.dta", replace

      ************************************************** ******************************
      * power simulation for detecting HR=1.5
      ************************************************** ******************************

      set seed 04301971

      frame create hr_15 int (N rep) float(ba1 var_ba1 ep_ba1)
      foreach n of numlist 500 (500) 10000 {
      forvalues i = 1/3000 { // REPS AT EACH SAMPLE SIZE
      simpower_v3 `n' -2.25 0.40546511
      frame post hr_15 (`n') (`i') (r(ba1)) (r(var_ba1)) (r(ep_ba1))
      }
      }

      cwf hr_15
      capture drop dhr
      gen dhr=1.5

      save "C:\Users\hr_15.dta", replace

      ************************************************** ******************************
      * power simulation for detecting HR=2.0
      ************************************************** ******************************

      set seed 04301971

      frame create hr_20 int (N rep) float(ba1 var_ba1 ep_ba1)
      foreach n of numlist 500 (500) 10000 {
      forvalues i = 1/3000 { // REPS AT EACH SAMPLE SIZE
      simpower_v3 `n' -2.25 0.69314718
      frame post hr_20 (`n') (`i') (r(ba1)) (r(var_ba1)) (r(ep_ba1))
      }
      }

      cwf hr_20
      capture drop dhr
      gen dhr=2.0

      save "C:\Users\hr_20.dta", replace

      ************************************************** ******************************
      * Combine results from simulations for detecting HR = 1.25, 1.5, and 2.0
      ************************************************** ******************************

      use hr_125, clear
      append using hr_15
      append using hr_20
      format dhr %9.2f

      ************************************************** ******************************
      * Generate the mean power for 3,000 reps by detectable HR
      * keep one row per sample size and detectable HR
      * drop vars
      ************************************************** ******************************

      capture drop power
      egen power = mean(ep_ba1), by(N dhr)

      drop if rep!=3000

      rename rep reps

      drop ba1 var_ba1 ep_ba1

      capture drop exp
      gen exp=0.05
      capture drop y
      gen y=0.1

      ************************************************** ******************************
      * plot results
      ************************************************** ******************************

      #delimit ;
      graph twoway
      line power N if dhr==1.25,
      sort lcolor(red)
      yline(0.90, lcolor(black) lpattern(dash))
      yline(0.80, lcolor(gs8) lpattern(dash))
      yscale(range(0.0 1.0))
      ylabel(0.0(0.1)1.0, format(%9.1f) labsize(vsmall) angle(0) nogrid)
      ytitle(Power)
      xscale(range(500 10000))
      xlabel(500(500)10000, format() labsize(vsmall) angle(45))
      xtitle(Sample Size)
      xline(1000, lcolor(gs10) lpattern(dot))
      xline(2000, lcolor(gs10) lpattern(dot))
      xline(3000, lcolor(gs10) lpattern(dot))
      xline(4000, lcolor(gs10) lpattern(dot))
      xline(5000, lcolor(gs10) lpattern(dot))
      xline(6000, lcolor(gs10) lpattern(dot))
      xline(7000, lcolor(gs10) lpattern(dot))
      xline(8000, lcolor(gs10) lpattern(dot))
      xline(9000, lcolor(gs10) lpattern(dot))
      xline(10000, lcolor(gs10) lpattern(dot))
      graphregion(margin(4 4 4 4) fcolor(white) lcolor(black) lwidth(vthin))
      title("Power Simulation: Exposure = 5%; Outcome = 10%", size(medium) margin(small) box bcolor(white) position(12) justification() )
      legend(cols(3) position(6) ring(0) region(lcolor(white) margin(tiny)) size(small) symplacement(left) symysize(2) symxsize(8) forcesize rowgap(1) order(3 2 1) label(3 "dHR=2.0") label(2 "dHR=1.5") label(1 "dHR=1.25") textfirst)
      note("Note: dHR = detectable Hazard Ratio", color(black) size(vsmall) span)
      ||
      line power N if dhr==1.5, sort lcolor(gold)
      ||
      line power N if dhr==2.0, sort lcolor(green)
      ; #delimit cr

      graph save "Graph" "C:\Users\Power_simulation_exposure_0.05_Outco me_0 .1_dHR_1.25_1.5_2.0.gph", replace


      exit


      Power_simulation_exposure_0.05_Outcome_0.1_dHR_1.2 5_1.5_2.0.gph
      Attached Files
      Last edited by Christopher Rowan; 15 Jul 2023, 13:27.

      Comment


      • #4
        You can make this even simpler by looping over the values of HR, and saving everything in the single frame -results-. Like this:
        Code:
        frame create results int (N rep) float(hr ba1 var_ba1 ep_ba1)
        foreach hr of numlist 1.25 1.5 2.0 {
            foreach n of numlist 500 (500) 10000 {
                forvalues i = 1/3000 { // REPS AT EACH SAMPLE SIZE
                simpower_v3 `n' -2.25 `=-log(`hr')'
                frame post results (`n') (`i') (`hr') (r(ba1)) (r(var_ba1)) (r(ep_ba1))
                }
            }
        }
        

        No changes needed to simpower_v3 to do this. At the end, frame results contains all of the analyses, and there is a variable, hr, in addition to the others.

        Comment

        Working...
        X