clustersampsi and power calculation for binary outcome in cluster-randomized trial: how to choose rho?

Bert Lloyd

Join Date: Apr 2014

Posts: 107
#1

17 May 2014, 11:37

Dear statalist,

I am using the user-written clustersampsi (references below) to perform a power calculation for a binary outcome variable in a cluster-randomized trial.

The number of clusters is fixed at 15 per arm (k=15). A reasonable a priori estimate of the proportion in the control condition is p1=0.2, with a plausible range of 0.1-0.3. I expect the treatment to raise this proportion by approximately 0.2 in absolute terms, so a priori I would select p2=0.4. My goal is to calculate the average cluster size necessary to detect this effect at the standard significance level and power (alpha(0.05), beta(0.8)).

The last parameter remaining to be specified is rho, the intra-cluster correlation, and this is where I would like some guidance.

First, is there a range of values that are seen as reasonable ex ante?

Second, while I do not have baseline data for this population I do have some data on proportions in a sample of similar untreated clusters. Can I use these data to estimate rho? How might I do this?

Many thanks in advance for any advice.

Best,

BL

clustersampsi:
SJ-13-1 st0286 Sample-size calculations in cluster random. controlled trials
st0286 from http://www.stata-journal.com/software/sj13-1
Hemming and Marsh, "A menu-driven facility for sample-size calculations in cluster randomized controlled trials", Stata Journal 13(1), 2013.
http://www.stata-journal.com/article...article=st0286
Rich Goldstein

Join Date: Mar 2014

Posts: 4449
#2

17 May 2014, 14:02

There was recently a discussion about this on a different listserv, and a number of people replied that in their (sub)discipline the ICC ranged from .05-.15, or from .5-.7, etc. In other words, you need someone who has substantive expertise in your area (or possibly a literature search in this substantive area will give you some information).

Note that it is always possible to get answers for a range of ICCs and then make a table or a graph.
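
As a rough illustration of tabulating results over a range of ICCs, here is a minimal sketch that uses the textbook design-effect formula with a normal approximation. It is a hand calculation, not a call to clustersampsi, so its numbers may differ somewhat from what that command reports. The values p1=0.2, p2=0.4, k=15, alpha=0.05 and power=0.8 come from the first post; the list of rho values is an arbitrary assumption.

```stata
* Sketch only: hand-rolled design-effect calculation, not clustersampsi itself
local p1 = 0.2
local p2 = 0.4
local k  = 15
local za = invnormal(1 - 0.05/2)
local zb = invnormal(0.80)

* Per-arm sample size under individual randomization (normal approximation)
local n = (`za' + `zb')^2 * (`p1'*(1 - `p1') + `p2'*(1 - `p2')) / (`p1' - `p2')^2

* The rho values below are illustrative assumptions
foreach rho of numlist 0.01 0.02 0.05 0.10 0.15 {
    * Solve k*m = n*(1 + (m - 1)*rho) for the cluster size m
    if `k' > `n' * `rho' {
        local m = `n' * (1 - `rho') / (`k' - `n' * `rho')
        display "rho = " %4.2f `rho' "   required cluster size = " %6.1f `m'
    }
    else {
        display "rho = " %4.2f `rho' "   not achievable with `k' clusters per arm"
    }
}
```

With a fixed number of clusters, the required cluster size grows quickly as rho rises, and beyond a certain rho no cluster size is large enough, which is why the sketch checks feasibility before dividing.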
Kieran McCaul

Join Date: Apr 2014

Posts: 60
#3

17 May 2014, 14:32

If you have some data at hand that would allow you to estimate the ICC, then you could use that to get an idea of how big the ICC is likely to be in your study and then use a range of values to see how the sample size estimate varies.

Kerry and Bland (1998) give a brief overview of the ICC. To estimate the ICC in Stata, have a look at loneway.
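
A minimal sketch of the loneway approach, assuming individual-level data with a response variable called outcome and a cluster identifier called cluster (both names are placeholders, not taken from this thread):

```stata
* outcome and cluster are hypothetical variable names
loneway outcome cluster

* loneway leaves the ICC estimate and its confidence limits in r()
display "ICC = " r(rho) "  (95% CI " r(lb) " to " r(ub) ")"
```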

To get an idea of how the ICC can vary within a study, have a look at the paper by Smeeth and Ng (2002). They tabulate intraclass correlation coefficients for a range of variables reported by patients clustered by GP.

Kerry SM and Bland JM (1998). The intracluster correlation coefficient in cluster randomisation. BMJ 316(7142): 1455.

Smeeth L and Ng ES-W (2002). Intraclass correlation coefficients for cluster randomized trials in primary care: data from the MRC Trial of the Assessment and Management of Older People in the Community. Control Clin Trials 23(4): 409-421.
Bert Lloyd

Join Date: Apr 2014

Posts: 107
#4

17 May 2014, 15:26

Dear Rich and Kieran,

Thanks for your suggestions.

Is loneway suitable for binary outcome variables? The example given in the reference manual is for a continuous outcome.

Last edited by Bert Lloyd; 17 May 2014, 15:29.
Kieran McCaul

Join Date: Apr 2014

Posts: 60
#5

17 May 2014, 16:12

As long as the binary variables are coded 0/1, it shouldn't be a problem.
Joseph Coveney

Join Date: Apr 2014

Posts: 4389
#6

17 May 2014, 19:49

For binary outcome variables, you could also try xtprobit outcome, i(cluster) in your sample of similar untreated clusters, and then look at rho, which is the estimate for ICC. It will also give you an estimate of the precision with which ICC is estimated, and you can take that into account when doing your power analysis.
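
A minimal sketch of this suggestion, assuming individual-level data with a 0/1 variable called outcome and a cluster identifier called cluster (placeholder names). One caveat worth keeping in mind: the rho reported by xtprobit is defined on the latent (probit) scale, so it need not match an ICC computed directly on the 0/1 outcome, and it may be worth comparing the two before choosing a value for the power calculation.

```stata
* outcome (0/1) and cluster are hypothetical variable names
xtset cluster

* Random-effects probit with no covariates
xtprobit outcome

* rho = sigma_u^2 / (sigma_u^2 + 1), on the latent scale;
* its confidence interval appears at the foot of the output table
display "latent-scale rho = " e(rho)
```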
Bert Lloyd

Join Date: Apr 2014

Posts: 107
#7

27 May 2014, 10:13

Joseph, does your approach require individual-level data, or can I use cluster-level data (shares rather than 0/1)?

As a followup, would the following give a reasonable approximation to the ICC:

Call p the proportion in each cluster.

Obtain the mean and standard deviation of p, call these mean_p and sd_p.

Estimate the ICC as (variance between) / (total variance) = (sd_p^2) / (mean_p*(1-mean_p)).
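
A minimal sketch of this calculation, assuming a dataset with one row per cluster and a variable p holding each cluster's proportion. The common cluster size m = 50 is a made-up value for illustration. The observed variance of cluster proportions also contains binomial sampling noise, so the simple ratio tends to overstate the ICC when clusters are small; with equal cluster sizes, Var(p) = mean_p*(1-mean_p)*(1 + (m-1)*rho)/m, which can be solved for rho as a moment-based adjustment.

```stata
* p = cluster-level proportion, one row per cluster (hypothetical variable name)
* m = common cluster size (hypothetical value, for illustration only)
local m = 50

quietly summarize p
local pbar = r(mean)
local varp = r(Var)

* Ratio proposed in the post: between-cluster variance over total binomial variance
local rho_naive = `varp' / (`pbar' * (1 - `pbar'))

* Moment-based adjustment that removes sampling noise, assuming equal cluster sizes
local rho_adj = (`m' * `rho_naive' - 1) / (`m' - 1)

display "rho (simple ratio) = " %6.4f `rho_naive'
display "rho (adjusted)     = " %6.4f `rho_adj'
```

The two figures converge as m grows, so the simple ratio is most defensible when the clusters are large. A negative adjusted value would suggest that the observed variation is no larger than sampling noise alone would produce.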

Last edited by Bert Lloyd; 27 May 2014, 10:16.
Leah Bevis

Join Date: Oct 2015

Posts: 125
#8

21 Mar 2021, 14:15

Kieran and Joseph -- from what you both say, I should be able to calculate the ICC for a continuous outcome via either loneway depvar group, or xtreg depvar, i(group). Those 2 commands give me quite different estimates, however. Why is this? (I need to figure out which one, if either, gives me the same ICC requested by clustersampsi.)
Joseph Coveney

Join Date: Apr 2014

Posts: 4389
#9

21 Mar 2021, 18:36

Originally posted by Leah Bevis:

Those 2 commands give me quite different estimates, however. Why is this?

The ANOVA-based command, -loneway-, assumes that you have balanced data, i.e., equal numbers of observations for each group.

If you don't, then you'll probably want to trust -xtreg- more than -loneway-.

Or, better, something like

Code:

mixed outcome || group: , reml dfmethod(kroger) nolrtest nolog
estat icc