  • #16
    Hi Dr. Schechter,

    I have taken a look at a previous post of yours where you simulated multi-level data and looped over combinations of parameter values as follows:

    *Generate multi-level data
    capture program drop swcrt
    program define swcrt, rclass
    version 15.1
    preserve
    clear
    *Stepped-wedge specifications
    local num_clus 3 6 9 18 36
    local ICC_lvl3
    local ICC_lvl2
    local clussize 25
    *Model specifications
    local intercept
    local timecoeff
    local intcoeff
    local sigma_u3
    local sigma_u2
    local sigma_error

    local nrep 100
    local alpha 0.05

    args num_clus ICC_lvl3 ICC_lvl2 clussize intercept timecoeff intrvcoeff
    assert `num_clus' > 0 `ICC_lvl3' > 0 `ICC_lvl2' > 0 `clussize' > 0 `intercept' > 0 `timecoeff' > 0 `intrvcoeff' > 0 `u_3' > 0 `u_2' > 0 `error' > 0
    /*Generate simulated multi—level data*/
    qui
    set seed 12345
    clear
    set obs `num_clus'
    qui gen cluster = _n
    //Generate cluster-level errors //
    qui gen u_3 = rnormal(0,`sigma_u3')
    expand `clussize'
    bysort cluster: gen individual = _n
    //Generate patient-level errors //
    qui gen u_2 = rnormal(0,`sigma_u2')
    expand 6
    //Set up time//
    bysort cluster individual: gen time = _n-1
    //Set up intervention variable//
    gen intrv = (time>=cluster)
    //Generate residual errors
    qui gen error = rnormal(0,`sigma_error')
    //Generate outcome y
    qui gen y = `intercept' + `timecoeff'*time + `intrvcoeff'*intrv + u_3 + u_2 + error

    /*Fit multi-level model to simulated dataset*/
    mixed y intrv i.time ||cluster: ||individual:, covariance(unstructured) reml dfmethod(kroger)
    /*Return estimated effect size, p-value, and significance dichotomy*/
    tempname M
    matrix `M' = r(table)
    return scalar p = `M'r(p)
    return scalar p_= `M'(`r(p)' < 0.05)
    exit
    end swcrt

    *Postfile to store results
    tempfile powerresults
    capture postutil clear
    postfile step int num_clus intrv ICC using `results'
    *Loop over number of clusters, ICC
    foreach x of local num_clus{
    display as text "Number of clusters" as result "`c'"
    foreach x of local ICC{
    display as text "Intra-class correlation" as result `i'
    forvalue i = 1/`nrep'{
    display as text "Iterations" as result `nrep'
    quietly swcrt `c' `ICC' `intrvcoeff' `u_3' `u_2' `error'
    post step (`c') (`ICC') (`r(p)') (`r(p_)')
    }
    }
    }
    postclose step

    *Open results, calculate power
    use `powerresults', clear

    levelsof num_clus, local(num_clus)
    levelsof ICC, local(ICC)
    matrix drop _all
    *Loop over combinations of clusters and ICC
    *Add power results to matrix
    foreach x of local num_clus {
    foreach d of local ICC {
    quietly ci proportions sig if arm_size == `a' & float(delta) == float(`d') *Fix*
    matrix M = nullmat(M) \ (`a', `d', `r(proportion)', `r(lb)', `r(ub)') *Fix*
    }
    }

    *Display the matrix
    matrix colnames M = arm_size delta power power_lb power_ub

    matrix list M, noheader format(%3.2f)

    I was wondering if you can perhaps explain to me what these lines mean:

    quietly ci proportions sig if arm_size == `a' & float(delta) == float(`d') *Fix*
    matrix M = nullmat(M) \ (`a', `d', `r(proportion)', `r(lb)', `r(ub)') *Fix*

    What do they accomplish? How would I know what type of numeric variable (e.g. float) I should specify a variable as?

    More importantly, given that the type 1 error rate is defined as the proportion of simulations in which the p-value for the intervention effect is < 0.05 when the true treatment effect delta = 0, how would I amend the above code to determine it for each combination of num_clus and ICC? How would I do the same to calculate bias?

    Your help is much appreciated.

    Many thanks,
    Jack



    • #17
      I was wondering if you can perhaps explain to me what these lines mean:

      quietly ci proportions sig if arm_size == `a' & float(delta) == float(`d') *Fix*
      matrix M = nullmat(M) \ (`a', `d', `r(proportion)', `r(lb)', `r(ub)') *Fix*

      What do they accomplish? How would I know what type of numeric variable (e.g. float) I should specify a variable as?
      This code is taken from an earlier post in this thread. I see that you have made some modifications to the original--and some of those modifications are going to throw syntax errors. (In particular, the -return scalar- commands at the end of program swcrt are mangled.)

      But to answer your question:

      The -ci- command calculates point estimates and confidence intervals; with -ci proportions- it does so for a proportion. The variable sig is a 0/1 variable that indicates whether that particular simulation yielded a "statistically significant" result for the coefficient of interest in the model. So the mean of sig is the overall frequency with which a statistically significant result is obtained. Consequently, it represents the power of the study when implemented with the corresponding values of arm_size and delta.

      The reference to float() in the -if- condition is a precaution taken because conditioning anything on the exact equality of floating point numbers is hazardous due to problems of precision and rounding. So float(delta) == float(`d') is a weaker requirement: delta and `d' only have to agree to float precision in order to trigger inclusion in the calculations.

      The -matrix M = nullmat(M)\...- command simply adds the results of the -ci- calculations to a growing matrix of results; nullmat() allows that first append to work even when M does not yet exist. Each row of the matrix contains the arm size, the effect size (`d'), the power, and the confidence limits around that.
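      For instance, here is a minimal toy sketch, with made-up data that has nothing to do with your simulation, showing how the float() comparison and nullmat() behave:
      Code:
      * Toy illustration with made-up data -- not part of the simulation code
      clear
      capture matrix drop M
      set obs 100
      gen sig = runiform() < 0.8               // pretend 0/1 significance indicator
      gen delta = 0.3                          // stored in float precision (assuming the default storage type)
      count if delta == 0.3                    // 0: the double literal 0.3 differs from the stored float value
      count if float(delta) == float(0.3)      // 100: both sides rounded to float precision
      quietly ci proportions sig if float(delta) == float(0.3)
      matrix M = nullmat(M) \ (0.3, `r(proportion)', `r(lb)', `r(ub)')  // works even though M did not exist yet
      matrix list M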

      I'm not sure why I stored the results in a matrix. I usually don't do that. In fact, I usually advise others not to do it. Usually I do this kind of thing by building up a postfile of results. I imagine I had some specific reason for not doing that in this instance, but I cannot recall what it might be. Or maybe I just wasn't thinking clearly that day.
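      If you do want to go the postfile route instead, a minimal sketch would look like this (the handle and file names are placeholders, and it assumes the same -levelsof- locals and sig variable as above):
      Code:
      * Sketch of the -postfile- alternative; handle and file names are placeholders
      tempname handle
      tempfile powertable
      postfile `handle' num_clus ICC power power_lb power_ub using `powertable'
      foreach x of local num_clus {
          foreach d of local ICC {
              quietly ci proportions sig if num_clus == `x' & float(ICC) == float(`d')
              post `handle' (`x') (`d') (`r(proportion)') (`r(lb)') (`r(ub)')
          }
      }
      postclose `handle'
      use `powertable', clear
      list, noobs sep(0)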

      More importantly, given that the type 1 error rate is defined as the proportion of simulations in which the p-value for the intervention effect is < 0.05 when the true treatment effect delta = 0, how would I amend the above code to determine it for each combination of num_clus and ICC?
      I don't understand what you're asking. By using p < 0.05 to define the sig variable in program swcrt, we are setting the Type I error rate at 0.05: there is no type 1 error rate being simulated. The simulation simulates the power of the study (100% minus the Type 2 error rate). And that is already being recorded in that result matrix M. In order to include the number of clusters and the ICC, change the -matrix M = ...- statement to
      Code:
      quietly ci proportions sig if num_clus == `x'  & float(ICC) == float(`d')
      matrix M = nullmat(M) \ (`x', `d', `r(proportion)', `r(lb)', `r(ub)') 
      
      matrix colnames M = num_clus ICC power power_lb power_ub
      How would I do the same to calculate bias?
      So in program swcrt, you would have to add code to calculate the bias and post it to the results file. Then at the end you would have to apply -ci- to the bias statistic as well and add that to the matrix of results. Assuming that the bias you are interested in is with respect to the coefficient of intrv, it would look something like this:

      Code:
      *Generate multi-level data
      capture program drop swcrt
      program define swcrt, rclass
          //...   MANY LINES OMITTED HERE
          args num_clus ICC_lvl3 ICC_lvl2 clussize intercept timecoeff intrvcoeff
          assert `num_clus' > 0 `ICC_lvl3' > 0 `ICC_lvl2' > 0 `clussize' > 0 `intercept' > 0 `timecoeff' > 0 `intrvcoeff' > 0 `u_3' > 0 `u_2' > 0 `error' > 0  // MASSIVE SYNTAX ERROR -- FIX THIS
      
          /*Generate simulated multi—level data*/
          //... MANY LINES OMITTED HERE
          qui gen y = `intercept' + `timecoeff'*time + `intrvcoeff'*intrv + u_3 + u_2 + error
      
          /*Fit multi-level model to simulated dataset*/
          mixed y intrv i.time ||cluster: ||individual:, covariance(unstructured) reml dfmethod(kroger)
          /*Return estimated effect size, p-value, and significance dichotomy*/
          tempname M
          matrix `M' = r(table)
          return scalar bias = _b[intrv] - `intrvcoeff'
          return scalar p = `M'r(p) // SYNTAX ERROR -- FIX THIS
          return scalar p_= `M'(`r(p)' < 0.05)   // SYNTAX ERROR -- FIX THIS
          exit
      end swcrt
      
      *Postfile to store results
      tempfile powerresults
      capture postutil clear
      postfile step int num_clus intrv ICC bias using `results' // I THINK YOU MEAN `powerresults'; ALSO MAKE THIS VARIABLE LIST MATCH WHAT YOU -post- BELOW
      *Loop over number of clusters, ICC
      foreach x of local num_clus{
          display as text "Number of clusters" as result "`c'" // DON'T YOU MEAN `x', NOT `c'
          foreach x of local ICC{ // CAN'T RE-USE x HERE.  PICK A DIFFERENT NAME
              display as text "Intra-class correlation" as result `i'
              forvalue i = 1/`nrep'{
                  display as text "Iterations" as result `nrep'
                  quietly swcrt `c' `ICC' `intrvcoeff' `u_3' `u_2' `error' // CHECK CORRECT MACRO NAMES HERE
                  post step (`c') (`ICC') (`r(p)') (`r(p_)') (`r(bias)') // CHECK CORRECT MACRO NAMES HERE
              }
          }
      }
      postclose step
      and then at the end

      Code:
      *Open results, calculate power
      use `powerresults', clear
      
      levelsof num_clus, local(num_clus)
      levelsof ICC, local(ICC)
      matrix drop _all
      *Loop over combinations of clusters and ICC
      *Add power results to matrix
      foreach x of local num_clus {
          foreach d of local ICC {
              quietly ci proportions sig if num_clus == `x' & float(ICC) == float(`d')
              local power `r(proportion)'
              local power_lb `r(lb)'
              local power_ub `r(ub)'
              quietly ci mean bias if num_clus == `x' & float(ICC) == float(`d')
              matrix M = nullmat(M) \ (`x', `d', `power', `power_lb', `power_ub', ///
                  `r(mean)', `r(lb)', `r(ub)')
          }
      }
      
      *Display the matrix
      matrix colnames M = num_clus ICC power power_lb power_ub bias bias_lb bias_ub
      
      matrix list M, noheader format(%3.2f)
      The key changes are the -return scalar bias- line in swcrt, the added bias variable in the -postfile- statement, and the -ci mean bias- block at the end. I've also indicated a few places where I see syntax errors that you will need to fix. And, as I have not tested this, I may have added some errors of my own, though I have tried not to.






      • #18
        I have attached a paper that I am using as a reference.

        I suppose that when I am trying to simulate type 1 error, then you would not set a pre-defined alpha (e.g. 0.05), but you would specify a null intervention effect estimate (e.g. OR = 1 in the Barker study). Your type 1 error rate would then be influenced by the sample size?

        However, when you are simulating power, you would set the alpha (e.g. a = 0.05)? The required sample size would then be that which achieves a pre-specified power (e.g. 80%).

        Would that be correct?



        • #19
          I suppose that when I am trying to simulate type 1 error, then you would not set a pre-defined alpha (e.g. 0.05), but you would specify a null intervention effect estimate (e.g. OR = 1 in the Barker study). Your type 1 error rate would then be influenced by the sample size?
          Right on both counts.

          However, when you are simulating power, you would set the alpha (e.g. a = 0.05)? The required sample size would then be that which achieves a pre-specified power (e.g. 80%).
          Well, what you are simulating, in either case, is the study itself. To do sample size estimation, you simulate the study using a sample size that you think is approximately what you need to obtain the desired alpha and power, and you set the effect size to the value you want the study to be able to detect (rather than to the null value of diff = 0, or OR = 1 as the case may be). You apply a test of size alpha to the simulated outputs and observe how many of them reject the null (turn out "significant"); the proportion of results that do is the power of the test. Your initial guess at the sample size is likely to be somewhat off the mark, so you then re-do the simulation, adjusting the sample size up or down until you get a simulated power that is close enough for practical purposes to your desired pre-specified power.
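          In code, that adjust-and-re-simulate loop might look something like this sketch, where mysim stands in for whatever rclass simulation program returns r(reject) for one simulated study, and the 0.80 target, step size, and rep count are purely illustrative:
          Code:
          * Sketch only: mysim, the 0.80 target, and the step size of 20 are placeholders
          local n 80
          local power 0
          while `power' < 0.80 {
              local n = `n' + 20
              quietly simulate reject=r(reject), reps(500): mysim `n'
              quietly ci proportions reject
              local power = r(proportion)
              display "n = `n'   simulated power = " %5.3f `power'
          }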



          • #20
            In that case, what value would I look at to determine the type 1 error rate for a given sample size (e.g. 3 clusters)?
            Is there any sample code out there that I could emulate for my specific case? I haven't seen any examples. I suppose the difficulty is that my program is defined with an intervention effect size of X so I wouldn't be able to set it equal to 0 as that would affect my analysis of power and bias.
            Would the appropriate sample size then be determined as a function of power, type 1 error, bias, or all 3? e.g. What if it meets the power requirement of 80%, but also has a high type 1 error?
            Last edited by CEdward; 13 Oct 2019, 14:56.



            • #21
              I suppose the difficulty is that my program is defined with an intervention effect size of X so I wouldn't be able to set it equal to 0 as that would affect my analysis of power and bias.
              This doesn't make any sense. It is impossible to design an intervention's effect size. The effect size is only meaningful once the intervention has been evaluated in your study. The closest you can come to this is to study previous evaluations of fairly similar interventions in similar contexts and hope that your effect size will be similar to what was found there.

              In most of these situations, actually, the sample size calculations are carried out by identifying what would be considered the minimum meaningful effect size. That is, you pick some number that is large enough that if the effect size is that large you would want your evaluation to have a high probability of detecting it, but small enough that missing smaller effect sizes would not be particularly problematic.

              The definition of alpha is the probability of rejecting the null hypothesis when it is true. To estimate alpha by simulation, you must have already settled on a planned analysis, including a test statistic and a threshold for that statistic that will be used to determine when the null hypothesis is rejected. You then simulate all of that with the effect size set to zero. That's mandatory: since alpha is defined as something that is conditional on the null hypothesis being true, any simulation of alpha has to have the null hypothesis true in the simulation--which is just another way of saying that the actual effect size is zero. So you have to do the simulation with the effect size set to zero to estimate alpha from a simulation.

              The definition of power is, similarly, the probability of rejecting the null hypothesis when the alternate hypothesis is true. So for these simulations you simulate the effect size equal to the value that you consider important to detect (see my first paragraph). Then you count how often the simulated study rejects the null hypothesis.
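              To make that concrete, here is a small self-contained sketch of both calculations using an ordinary two-sample t-test rather than your mixed model; the program name, effect size, sample size, and rep counts are all just illustrative:
              Code:
              * Illustrative only -- a simple two-arm trial, not the stepped-wedge design
              capture program drop simtrial
              program define simtrial, rclass
                  version 15.1
                  args effect n
                  clear
                  set obs `n'
                  gen byte arm = _n > `n'/2
                  gen y = `effect'*arm + rnormal()
                  quietly ttest y, by(arm)
                  return scalar reject = (r(p) < 0.05)
              end
              
              * Type 1 error: effect size set to zero, count rejections at the 0.05 threshold
              simulate reject=r(reject), reps(1000) seed(2019): simtrial 0 100
              ci proportions reject        // should be close to the nominal 0.05
              
              * Power: effect size set to the value considered important to detect
              simulate reject=r(reject), reps(1000) seed(2020): simtrial 0.5 100
              ci proportions reject        // simulated power at n = 100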

              Would the appropriate sample size then be determined as a function of power, type 1 error, bias, or all 3?
              Yes, all three, and also a fourth: effect size. In my experience, we don't usually figure bias into these calculations, because we try to design the study to reduce bias to zero. Of course we don't always achieve that, but also we usually don't have any good way of estimating the bias that we haven't designed out. So normally we think of sample size as being a function of power and type 1 error; but if we have some idea of bias, then that, too, enters the formula.

              What if it meets the power requirement of 80%, but also has a high type 1 error?
              For any combination of power, type 1 error, and effect size, it is always possible to find a sample size that achieves that combination. It may turn out to be an infeasibly large sample size, but in principle it exists. In any case, the point is that you make tradeoffs among these. We desire low type 1 error, high power for a reasonable effect size, and a feasibly small sample size. Sometimes we can't have all three, and then we have to decide whether to push for a larger sample size to preserve alpha and power, or to sacrifice some power so as to maintain alpha and keep the sample size down, or to worry less about alpha so long as we have adequate power for a reasonable effect size and a feasible sample size. These are value judgments that will vary from study to study depending on the circumstances. By circumstances I mean, specifically, the disutility of the consequences of type I and type II errors and how those compare to the costs and disutility of increasing the sample size.




              • #22
                My apologies - I meant to say since I have an intervention coefficient set as not equal to zero, how would I amend the program to set it equal to zero for evaluating type 1 error?



                • #23
                  I have an intervention coefficient set as not equal to zero, how would I amend the program to set it equal to zero for evaluating type 1 error
                  Well, that depends on how it is handled in your simulation code. If it's stored in a local macro, just revise the statement that defines that local macro to set it to zero: -local whatever_you_called_it 0-. Do the analogous thing if it's a global macro. It is harder if it is "hard-wired" somewhere in the code, say, appearing as a "magic number" in some expressions. Then you have to go through the code with a search-and-replace operation to change all occurrences of that magic number to zero.
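                  For example, using the macro name from earlier in this thread (the non-zero value is just illustrative):
                  Code:
                  * For the power runs (0.25 is just an illustrative value)
                  local intrvcoeff 0.25
                  
                  * For the type 1 error runs, the only change needed:
                  local intrvcoeff 0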



                  • #24
                    Right - of course, that was a rather pedestrian question, and what you said about the local macro makes complete sense. Really what I intended to ask is how you would figure out what the type 1 error rate actually is for a given sample size. In the case of power, I did a count of the proportion of simulations that came up statistically significant given an alpha of 0.05 (e.g. return scalar p_= `M'(`r(p)' < 0.05)). But I don't think I would be doing the same here since I don't know what that alpha is.

                    I think I understand now...

                    I would essentially repeat the same code above and keep an alpha of 0.05 and intrvcoeff = 0. I would then determine the proportion of simulations for which I achieved a statistically significant result (i.e. p < 0.05) for each sample size that I am interested in.
                    Last edited by CEdward; 13 Oct 2019, 16:14.



                    • #25
                      I would essentially repeat the same code above and keep an alpha of 0.05 and intrvcoeff = 0. I would then determine the proportion of simulations for which I achieved a statistically significant result (i.e. p < 0.05) for each sample size that I am interested in.
                      If there is a bias term in your simulation model, then, yes, this is what you would do.



                      • #26
                        It would seem that I am still confused then. What is clear is that intrvcoeff needs to be set to 0. However, if I am not setting the type 1 error beforehand (e.g. a = 0.05), then what test statistic would I be using to determine what the type 1 error rate is given some sample size? Wouldn't it be actually correct to say that the type 1 error is the proportion of simulations in which the p-value was less than 0.05 given intrvcoeff = 0?

                        This sentence is what I don't understand: "To estimate alpha by simulation, you must have already settled on a planned analysis, including a test statistic and a threshold for that statistic that will be used to determine when the null hypothesis is rejected."



                        • #27
                          If your proposed study incorporates no biases and the analysis plan relies on conventional test statistics applied at the usual .05 significance thresholds (e.g. 1.96 for a z-statistic), then there is no purpose in simulating alpha. The alpha will be .05, and any deviation from that in a simulation would just be due to Monte Carlo error.

                          Simulating alpha is done when there is reason to believe that the analytic model and the real world generating model are substantially different. The difference might be due to a design that builds in a bias. Or it might be that the analysis relies on a test statistic whose assumptions are not met. Or it might be that the analysis relies on some novel test statistic whose distributional properties are not known in advance. In any of these situations, the nominal p-values that come out of the planned analysis cannot be taken at face value. In these situations, you could do a simulation in which the simulated data sets are sampled in accordance with the real world data generating model with the effect size constrained at zero, then analyzed in accordance with the planned analytic model, and you then see how often the analytic model rejects the null hypothesis. That gives you the simulated type I error rate (alpha).
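                          As a sketch of that last kind of situation, here everything is illustrative: a cluster random effect in the data-generating model that the planned analysis (plain OLS) wrongly ignores, with the true effect set to zero:
                          Code:
                          * Real-world model: clustered data, true intervention effect = 0
                          * Planned analysis: OLS that ignores the clustering
                          capture program drop simnull
                          program define simnull, rclass
                              version 15.1
                              args num_clus clussize sigma_u
                              clear
                              set obs `num_clus'
                              gen cluster = _n
                              gen byte arm = mod(_n, 2)
                              gen u = rnormal(0, `sigma_u')
                              expand `clussize'
                              gen y = u + rnormal()
                              quietly regress y arm
                              return scalar reject = (2*ttail(e(df_r), abs(_b[arm]/_se[arm])) < 0.05)
                          end
                          
                          simulate reject=r(reject), reps(1000) seed(20191016): simnull 10 25 1
                          ci proportions reject    // simulated alpha -- far above the nominal 0.05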



                          • #28
                            Thank you, Dr. Schechter. Your posts have been tremendously informative. It makes sense that, in my case, since the data-generating model and the analysis model are the same, I don't need to simulate type 1 error. In your experience, in the absence of literature estimates and expert-informed values for the parameters that you need to generate the data, is there a work-around for that? I do have access to a previously performed study, but I hesitate to use its values because the sample was very small and hence there are concerns about internal validity.
                            Last edited by CEdward; 16 Oct 2019, 18:30.



                            • #29
                              Originally posted by Jack Chau
                              . . . in the absence of literature estimates and expert-informed values for the parameters that you need to generate the data, is there a work-around for that?
                              Do you mean nuisance parameters, or the effect? For the former, you'd simulate throughout a range of plausible values and see how sensitive the test performance (power, test size) or model performance (bias, efficiency) is to them. For the latter, that's just a conventional power analysis, done analogously with effect size and sample size varying and type 1 error rate fixed.
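                              A sketch of the nuisance-parameter part; the grids of values here are purely illustrative:
                              Code:
                              * Re-run the whole simulation over a grid of plausible nuisance values
                              foreach sigma_u3 of numlist 0.5 1 1.5 {
                                  foreach sigma_u2 of numlist 0.25 0.5 1 {
                                      display as text "sigma_u3 = " as result `sigma_u3' ///
                                          as text ", sigma_u2 = " as result `sigma_u2'
                                      * ... call the simulation loop with these values and record
                                      *     power/bias alongside `sigma_u3' and `sigma_u2' ...
                                  }
                              }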



                              • #30
                                Both - since I am doing a simulation to determine sample size. In my case I would need the standard deviations of the three random-effect/error terms, an intercept, and a coefficient for the time covariate. Based on Clyde's previous response, it would seem the effect size should only come from experts.
                                Last edited by CEdward; 16 Oct 2019, 18:49.
