Power calculation for regression using simulation

Bryony Simmons

Join Date: Jan 2018

Posts: 37
#1

Power calculation for regression using simulation

14 Aug 2018, 01:02

Hi all,

I have been writing code for a power calculation based on the advice here & I have reached a bit of a standstill. I'm not sure I'm running it correctly & would really appreciate a sense check/new eyes on the code I have written.

The problem is as follows. An RCT is being planned to compare standard-of-care (DrugA) versus a new drug (DrugB) for the treatment of a disease to assess whether the DrugB causes unwanted weight gain. Weight will be measured at baseline and months 6 and 12. The primary endpoint is the treatment effect of the DrugB at month 12. This will be analysed using the following regression model:

Y_t - Y_t0 = b₀+ b₁X + b₂Y_t0

Where Y_t is the weight change from baseline, X = treatment group, b₁ = treatment effect and Y_t0is the baseline weight. In a previous exploratory study, mean baseline weight was 80.6 (sd 11). The newer drug gained on average 3.5kg over 12 months, and the SOC gained 0.6kg (meaning a difference of 1.3kg). The SD for change in weight was 1.7 for both groups.

Using the example code from the other thread as a basis, I have generated the following code to simulate & test various populations. I'm not convinced I have done this correctly - particularly the generation of the post variable & the error variables.

Code:

version 14.2 clear * set seed 123 program define simem // , rclass version 14.2 syntax , [n(integer 80) /// pre(real 80.6) presd(real 11) /// DElta(real 0) deltasd(real 1.7) /// Residual(real 1)] // Add input validation to test drop _all set obs `n' /* set number of observations */ generate int id = _n /* create an id variable */ egen trtgrp = cut(id), group(2) /* generate two treatment groups */ generate u = rnormal() /* random intercept for each individual */ generate e = rnormal(0,`residual') /* residual errors */ generate pre = rnormal(`pre', `presd') /* generate baseline weight variable */ summarize pre, meanonly generate c_pre = pre - r(mean) /* center baseline weight */ generate post = u + trtgrp*rnormal(`delta',`deltasd') + e /* generate post weight variable, accounting for variation in delta */ // Test to run reg post i.trtgrp c_pre test 1.trtgrp end postfile sim delta power lb ub using regression_powersim, replace foreach delta of numlist 1.3 1.7 2.1 2.5 2.9 { simulate p = r(p), reps(5) nolegend nodots: simem , de(`delta') generate byte pos = p < 0.05 qui ci means pos post sim (`delta') (r(mean)) (r(lb)) (r(ub)) display in smcl as text "Delta = " %4.2f `delta' " Power = " %4.2f r(mean) " Lower 95%CI = " %4.2f r(lb) " Upper 95%CI = " %4.2f r(ub) } postclose sim exit

Secondly, if the primary outcome was the change over time, I would need to include the other timepoints. I would need to generate a longitudinal panel & estimate the following regression model:

Y_t - Y_t0 = b₀ + b₁X + b₂Y_t0 + b₃time + b₄X*time

Where time is binary for month 6 and 12 (0/1). The treatment effect at month 12 is interpretable as b₁ + b₄. I will run this using either xtreg or the mixed command.

When constructing my inputs, I can expand the data such that each individual has two data points as follows:

Code:

expand 2 bysort id: generate byte time = _n-1

But I am unsure how to calculate the outcome variable that is correlated for each individual.

I really appreciate any help anyone can provide on this topic.

Best wishes,
Bryony
Tags: panel, Power, regression, residuals, simulation
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#2

14 Aug 2018, 19:52

I haven't looked at your code, but I have a couple of questions.

1. If Y_t is "the weight change from baseline" and Y_t0 is the baseline weight, then what is Y_t - Y_t0, which is what you're ostensibly modeling according to your equation? Is that a typo in your equation, or is Y_t the weight at time t and not the weight change from baseline? Moreover, if it's the weight and not weight change, then it would seem to me that you would be better off with a straightforward ANCOVA and not some kind of hybrid of an ANCOVA with a change-score model. Google change score harrell if you need more on this.

2. If you're having trouble believing your simulation results, then can't you get power and sample size for a conventional ANCOVA design from official Stata commands? You can use the latter as a sanity check for results that you're getting from the former. (I think that Stata's power & sample size command used to be called -sampsi- or something similar. It might still work under that name, or if not, then Stata's pop-up message will steer you to the command's current incarnation.)
Comment

Joseph Coveney

Join Date: Apr 2014
Posts: 4410

15 Aug 2018, 19:56

I think that there is some literature about when it is advantageous to model the data as ANCOVA and when to use repeated-measures ANOVA. It has to do with the intraclass correlation's being greater or less than 0.5, or something as I recall. The Google search that I mentioned might turn something up about that.

Toward that end, you might want to explore the two methods of modeling the study's results for which might tend toward greater efficiency in testing your hypothesis. Something like this might be a good starting point.

Code:

version 15.1

clear *
set seed `=strreverse("1458042")'

program define simem, rclass
    version 15.1
    syntax , [n(integer 80) delta1(real 3.5) delta0(real 0.6)]

    drop _all
    quietly set obs `n'

    generate byte trt = mod(_n, 2)
    generate double y0 = rnormal(80.6, 11)

    generate double y12 = y0 + ///
        cond(trt, rnormal(`delta1', 1.7), rnormal(`delta0', 1.7))

    // ANCOVA
    anova y12 trt c.y0
    testparm i.trt
    tempname ancova
    scalar define `ancova' = r(p)

    // Repeated-measures ANOVA
    generate int pid = _n
    quietly reshape long y, i(pid) j(tim)
    anova y trt / pid|trt tim trt#tim
    testparm i.trt#i.tim
    return scalar rmanova = r(p)
    return scalar ancova = `ancova'
end

forvalues n = 20(-2)12 {
    quietly simulate ancova = r(ancova) rmanova = r(rmanova), ///
        reps(1000) nodots: simem , n(`n')

    display in smcl as text _newline(1) "n = `n'"

    foreach var of varlist *a {
        generate byte p_`var' = `var' < 0.05
        summarize p_`var', meanonly
        display in smcl as text strupper("`var'"), "Power = " %04.2f r(mean)
    }

}

exit

You could readily extend it to include intermediate follow-up observations for secondary analysis, although at that point, you'd be leaving ANCOVA behind.

Announcement

Power calculation for regression using simulation

Comment

Comment