How to simulate clustered data with a specific intra-class correlation

Klaus Steitzel

Join Date: Aug 2014

Posts: 61
#1

02 Sep 2018, 12:01

Dear Statalist,

I am trying to simulate clustered data with a specific intra-class correlation. One approach I came up with is:

Code:

clear
set seed 1
matrix C = (1, 0.5 \ 0.5, 1)
drawnorm y1 y2, n(100) corr(C)
gen i = _n
reshape long y, i(i) j(j)
qui mixed y || i:
estat icc

This creates 100 clusters (i) with two observations each (j = 1, 2) and an ICC of 0.5. Is there an approach that lets me vary the number of observations per cluster at random?

Thanks so much for your consideration
KS
Tags: ICC, mixed, multilevel, simulation
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#2

02 Sep 2018, 12:28

Code:

clear
set seed 1
set obs 100
gen cluster_num = _n
gen obs_per_cluster = rpoisson(5)
expand obs_per_cluster
by cluster_num, sort: gen member_num = _n
local rho = 0.5
local sd_u = sqrt(`rho')
local sd_e = sqrt(1-`rho')
by cluster_num (member_num), sort: gen u = rnormal(0, `sd_u') if _n == 1
by cluster_num (member_num): replace u = u[1]
gen e = rnormal(0, `sd_e')
gen y = u + e
mixed y || cluster_num:
estat icc

Here, I have randomized the number of observations per cluster using a Poisson distribution to illustrate the approach, but you can do that part any way you like.

The logic is simple. The ICC is, by definition, var u/(var u + var e), where u is the cluster-level intercept and e is the residual. In your example, the total variance var u + var e was set to 1, so I assumed the same here. A little algebra then says that the variance of u must equal the desired value of the ICC. So sample u with a standard deviation equal to the square root of the desired ICC, and sample e with a standard deviation equal to the square root of 1 minus the desired ICC. Then add up u and e, and your variable y will have the desired ICC.
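The algebra can be written out for an arbitrary total variance. This is a sketch in notation that does not appear in the post itself: $\rho$ stands for the target ICC, $\sigma_u^2$ and $\sigma_e^2$ for the variances of u and e, and $\sigma^2$ for their sum.

```latex
\[
\begin{aligned}
\rho &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2},
  \qquad \sigma^2 = \sigma_u^2 + \sigma_e^2 \\[4pt]
\sigma_u^2 &= \rho\,\sigma^2,
  \qquad \sigma_e^2 = (1 - \rho)\,\sigma^2 \\[4pt]
\sigma^2 = 1 \;&\Longrightarrow\;
  \operatorname{sd}(u) = \sqrt{\rho},
  \qquad \operatorname{sd}(e) = \sqrt{1 - \rho}
\end{aligned}
\]
```

With $\rho = 0.5$ and $\sigma^2 = 1$, both standard deviations equal $\sqrt{0.5} \approx 0.707$, which matches the values of `sd_u` and `sd_e` computed in the code.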
George Higgins

Join Date: Aug 2018

Posts: 39
#3

16 Oct 2021, 08:36

I know this is an older post, but it has led me to a question. I was wondering how this might be generalized so that two normally distributed random variables covary with y (e.g., y = u + x1 + x2 + e) while still retaining the ability to control the clustering through the ICC. If this question has already been asked, please point me to the correct post. Thank you.
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#4

16 Oct 2021, 10:01

I'm afraid I don't understand what is being asked in #3. Can you clarify?
William Rossi

Join Date: May 2022

Posts: 24
#5

21 Sep 2023, 02:33

Dear community,
I have a question about this quote:

Originally posted by Clyde Schechter

In your example, the total variance var u + var e was set to 1, so I assumed the same here. A little algebra then says that the variance of u must be the desired value of the ICC.

What if the total variance were different, say 17? How should I proceed then?
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#6

21 Sep 2023, 09:33

I would use the exact same code, and then, at the end add:

Code:

replace y = y*sqrt(17)
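
A short check of why rescaling leaves the ICC alone may help here. It is a sketch in notation that is not part of the post: $c$ is the scaling constant and $\sigma_u^2$, $\sigma_e^2$ are the variances of u and e before scaling.

```latex
\[
\begin{aligned}
c\,y &= c\,u + c\,e \\[4pt]
\operatorname{Var}(c\,y) &= c^2\sigma_u^2 + c^2\sigma_e^2
  = c^2\left(\sigma_u^2 + \sigma_e^2\right) \\[4pt]
\mathrm{ICC}(c\,y) &= \frac{c^2\sigma_u^2}{c^2\left(\sigma_u^2 + \sigma_e^2\right)}
  = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
\end{aligned}
\]
```

Taking $c = \sqrt{17}$ with an original total variance of 1 gives a total variance of 17, and the factor $c^2$ cancels in the ratio, so the ICC is unchanged.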
Erik Ruzek

Join Date: Oct 2017

Posts: 430
#7

21 Sep 2023, 09:44

Then you would need to decide how you want the total variance to be distributed across the levels, and specify the variances accordingly. For example, if you wanted the ICC in a two-level model to be .50, you would divide the variance equally between the levels. My setup is a little different from Clyde's, as I specify the standard deviations of u and e immediately after creating the identifiers for those levels, but the end result is the same:

Code:

clear *
version 16
set seed 1
set obs 100
* clusters
gen cluster_num = _n
gen obs_per_cluster = rpoisson(5) // randomize number of obs/cluster with Poisson distribution w/ a mean of 5
gen u = rnormal(0, sqrt(17/2)) // cluster-level SD, about 2.915: half of the total variance of 17
* members within clusters
expand obs_per_cluster
by cluster_num, sort: gen member_num = _n
gen e = rnormal(0, sqrt(17/2)) // residual SD, about 2.915: the other half
gen y = u + e
mixed y || cluster_num:
estat icc
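
The same idea can be written with the total variance and the ICC as explicit parameters, so that unequal splits are as easy as the 50/50 case. This is only a sketch: the local macro names `total_var` and `rho` and the values 17 and 0.3 are illustrative assumptions, not something taken from the posts above.

```stata
clear
set seed 1
set obs 100                          // 100 clusters

* Assumed targets (illustrative values only)
local total_var = 17                 // desired var(u) + var(e)
local rho       = 0.3                // desired ICC
local sd_u = sqrt(`rho' * `total_var')
local sd_e = sqrt((1 - `rho') * `total_var')

gen cluster_num = _n
gen obs_per_cluster = rpoisson(5)    // random cluster sizes with mean 5
gen u = rnormal(0, `sd_u')           // one draw per cluster, made before expanding

expand obs_per_cluster               // copies u to every member of the cluster
bysort cluster_num: gen member_num = _n
gen e = rnormal(0, `sd_e')           // one draw per observation
gen y = u + e

mixed y || cluster_num:
estat icc
```

Because u is drawn before `expand`, every member of a cluster inherits the same value without a separate `replace` step. A cluster whose Poisson draw is 0 keeps a single observation, since `expand` treats counts below 1 as 1. The estimated ICC should land near 0.3, but with 100 clusters it will not match exactly.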
William Rossi

Join Date: May 2022

Posts: 24
#8

21 Sep 2023, 10:19

Thank you, Clyde Schechter and Erik Ruzek. Very useful insights.