Hi all,
I have been writing code for a power calculation based on the advice here & I have reached a bit of a standstill. I'm not sure I'm running it correctly & would really appreciate a sense check/new eyes on the code I have written.
The problem is as follows. An RCT is being planned to compare standard-of-care (DrugA) versus a new drug (DrugB) for the treatment of a disease to assess whether the DrugB causes unwanted weight gain. Weight will be measured at baseline and months 6 and 12. The primary endpoint is the treatment effect of the DrugB at month 12. This will be analysed using the following regression model:
Yt - Yt0 = b0 + b1X + b2 Yt0
Where Yt is the weight change from baseline, X = treatment group, b1 = treatment effect and Yt0 is the baseline weight. In a previous exploratory study, mean baseline weight was 80.6 (sd 11). The newer drug gained on average 3.5kg over 12 months, and the SOC gained 0.6kg (meaning a difference of 1.3kg). The SD for change in weight was 1.7 for both groups.
Using the example code from the other thread as a basis, I have generated the following code to simulate & test various populations. I'm not convinced I have done this correctly - particularly the generation of the post variable & the error variables.
Secondly, if the primary outcome was the change over time, I would need to include the other timepoints. I would need to generate a longitudinal panel & estimate the following regression model:
Yt - Yt0 = b0 + b1X + b2Yt0 + b3time + b4X*time
Where time is binary for month 6 and 12 (0/1). The treatment effect at month 12 is interpretable as b1 + b4. I will run this using either xtreg or the mixed command.
When constructing my inputs, I can expand the data such that each individual has two data points as follows:
But I am unsure how to calculate the outcome variable that is correlated for each individual.
I really appreciate any help anyone can provide on this topic.
Best wishes,
Bryony
I have been writing code for a power calculation based on the advice here & I have reached a bit of a standstill. I'm not sure I'm running it correctly & would really appreciate a sense check/new eyes on the code I have written.
The problem is as follows. An RCT is being planned to compare standard-of-care (DrugA) versus a new drug (DrugB) for the treatment of a disease to assess whether the DrugB causes unwanted weight gain. Weight will be measured at baseline and months 6 and 12. The primary endpoint is the treatment effect of the DrugB at month 12. This will be analysed using the following regression model:
Yt - Yt0 = b0 + b1X + b2 Yt0
Where Yt is the weight change from baseline, X = treatment group, b1 = treatment effect and Yt0 is the baseline weight. In a previous exploratory study, mean baseline weight was 80.6 (sd 11). The newer drug gained on average 3.5kg over 12 months, and the SOC gained 0.6kg (meaning a difference of 1.3kg). The SD for change in weight was 1.7 for both groups.
Using the example code from the other thread as a basis, I have generated the following code to simulate & test various populations. I'm not convinced I have done this correctly - particularly the generation of the post variable & the error variables.
Code:
version 14.2
clear *
set seed 123
program define simem // , rclass
version 14.2
syntax , [n(integer 80) ///
pre(real 80.6) presd(real 11) ///
DElta(real 0) deltasd(real 1.7) ///
Residual(real 1)]
// Add input validation to test
drop _all
set obs `n' /* set number of observations */
generate int id = _n /* create an id variable */
egen trtgrp = cut(id), group(2) /* generate two treatment groups */
generate u = rnormal() /* random intercept for each individual */
generate e = rnormal(0,`residual') /* residual errors */
generate pre = rnormal(`pre', `presd') /* generate baseline weight variable */
summarize pre, meanonly
generate c_pre = pre - r(mean) /* center baseline weight */
generate post = u + trtgrp*rnormal(`delta',`deltasd') + e /* generate post weight variable, accounting for variation in delta */
// Test to run
reg post i.trtgrp c_pre
test 1.trtgrp
end
postfile sim delta power lb ub using regression_powersim, replace
foreach delta of numlist 1.3 1.7 2.1 2.5 2.9 {
simulate p = r(p), reps(5) nolegend nodots: simem , de(`delta')
generate byte pos = p < 0.05
qui ci means pos
post sim (`delta') (r(mean)) (r(lb)) (r(ub))
display in smcl as text "Delta = " %4.2f `delta' " Power = " %4.2f r(mean) " Lower 95%CI = " %4.2f r(lb) " Upper 95%CI = " %4.2f r(ub)
}
postclose sim
exit
Secondly, if the primary outcome was the change over time, I would need to include the other timepoints. I would need to generate a longitudinal panel & estimate the following regression model:
Yt - Yt0 = b0 + b1X + b2Yt0 + b3time + b4X*time
Where time is binary for month 6 and 12 (0/1). The treatment effect at month 12 is interpretable as b1 + b4. I will run this using either xtreg or the mixed command.
When constructing my inputs, I can expand the data such that each individual has two data points as follows:
Code:
expand 2 bysort id: generate byte time = _n-1
I really appreciate any help anyone can provide on this topic.
Best wishes,
Bryony

Comment