
  • compare means of clustered data

    I have completed a mouse study and have compared mean pup birth weight among pups born to moms that were in one of four treatment groups using pairwise t-tests as follows:

    Code:
    ttest pup_avg if group==1 | group==3, by(group)

    I created a variable that averaged the pups born to a treated mom (pup_avg).

    However, I need to account for clustering of pups born to the same mom in the same pregnancy.

    Each mom has her own ID (variable = mouse_id) and within each mom's observation are variables for each pup weight (weight1, weight2, weight3, etc).

    Any advice on how I can compare mean pup birth weight between two different treatment groups and account for clustering within the same mouse_id?

    Thanks so much.

  • #2
    you can't do it with a t-test; you need to move to some form of regression. Depending on your exact situation (why are you using averages?), there are several options that could be used. Here is a simple regression command:
    Code:
    regress pup_avg i.group if group==1 | group==3, vce(cluster mouse_id)
    again, I ask why you are using averages within mouse, as this is hiding some variation.
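
    If the data are in the wide layout described in #1 (weight1, weight2, ... on each mom's row), a sketch of the no-averages approach would look like the following. This is hypothetical and not run on the poster's actual data; the variable names are assumed from the original post.

    Code:
    * assumes the wide layout from #1: one row per mom, weight1-weightN
    reshape long weight, i(mouse_id) j(pup)
    drop if missing(weight)    // litters differ in size

    * pup-level comparison of groups 1 and 3, clustering on the mom
    regress weight i.group if inlist(group, 1, 3), vce(cluster mouse_id)

    * or a mixed model with a random intercept for each mom
    mixed weight i.group if inlist(group, 1, 3) || mouse_id: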



    • #3
      Rich,

      Thanks so much for your help.

      Honestly, I don't know how to answer your question about why I'm using averages. My primary question was whether or not the babies weighed more or less depending on the exposure their mom received (I have 4 different exposure groups) during pregnancy. Is there some other test that I could use?

      Also, the commands you gave me worked, but now I'm not sure what I'm looking at in terms of the output. Forgive my ignorance. I'm a clinical person who doesn't typically do this kind of work.

      Code:
      . regress pup_avg i.group if group==1 | group==3, vce(cluster mouse_id)

      Linear regression                               Number of obs     =         10
                                                      F(1, 9)           =      29.13
                                                      Prob > F          =     0.0004
                                                      R-squared         =     0.7846
                                                      Root MSE          =     .03724

                                     (Std. Err. adjusted for 10 clusters in mouse_id)
      ------------------------------------------------------------------------------
                   |               Robust
           pup_avg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
           3.group |   .1271425   .0235554     5.40   0.000     .0738565    .1804286
             _cons |   .8447308   .0208979    40.42   0.000     .7974565     .892005
      ------------------------------------------------------------------------------

      Code:
      . regress pup_avg i.group if group==3 | group==4, vce(cluster mouse_id)

      Linear regression                               Number of obs     =         10
                                                      F(1, 9)           =      10.12
                                                      Prob > F          =     0.0112
                                                      R-squared         =     0.5585
                                                      Root MSE          =     .03341

                                     (Std. Err. adjusted for 10 clusters in mouse_id)
      ------------------------------------------------------------------------------
                   |               Robust
           pup_avg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
           4.group |  -.0672079   .0211285    -3.18   0.011    -.1150039    -.019412
             _cons |   .9718733    .010869    89.42   0.000     .9472858    .9964608
      ------------------------------------------------------------------------------



      • #4
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte(mouse_id weight group)
        1 10 1
        1  9 1
        1  8 1
        1  7 1
        1  6 1
        1  8 1
        1 11 1
        2 11 2
        2 12 2
        2 13 2
        2 14 2
        2 12 2
        2 15 2
        2 10 2
        3 15 3
        3 16 3
        3 17 3
        3 18 3
        3 17 3
        3 19 3
        3 14 3
        4 11 4
        4  8 4
        4 12 4
        4  7 4
        4  6 4
        4  8 4
        4 11 4
        end
        The .do file below includes comments for what is taking place at each step along the way.

        /* Generate dummy (aka indicator) variables for each group for use in one
        overall regression */
        tab group, gen(g)

        /* Look at means for each group two different ways */
        bys g*: summ weight

        forvalues i = 1/4 {
            summ weight if g`i'==1
            di "group"`i'
        }

        /* The means increase as the group number increases. The example data was
        purposely set up this way to facilitate an explanation of the regression
        results. We use group 1 as the base group */

        reg weight g2 g3 g4

        /* The mean of group 1 is represented by the _cons term, which is often shown
        as B0 in empirical papers. This mean is 8.428571. We know that the mean of
        group 2 is 12.42857. This is equal to our constant term, _cons, plus the
        regression parameter estimate for group 2, which here is g2. The 8.428571
        + 4 = 12.42857. The mean of group 3 is 8.428571 + 8.142857= 16.57143. This
        pattern also holds for group 4.

        The stock regression output gives you the mean differences between groups with
        reference to the base group as well as the statistical significance of the
        differences of groups 2, 3, and 4 from the base group.

        Alternatively, we can estimate the model without a constant term and a full
        set of dummy variables. */
        reg weight g1 g2 g3 g4, nocons

        /* You can see that the means of each group are directly reported. There is no
        base group here. To test for differences between groups, use the -test-
        command */

        test g1 = g2
        test g1 = g3
        test g1 = g4

        reg weight g2 g3 g4

        /* You can see that the model with a full set of dummies and no constant
        term yields the same statistical significance for testing the difference
        between g1 and g4 as the model with three dummy variables and a constant term.
        In either model, the p-value for the difference between
        the group 1 and group 4 means is .5756. */

        /* You can also test for differences between groups, and not just for
        differences with respect to the base group, from the regression model with
        three indicator variables and a constant term. This is shown below as we
        test for differences between the means of group 2 and group 3 with both models.
        We achieve equivalent results*/

        test g2 = g3

        reg weight g1 g2 g3 g4, nocons
        test g2=g3

        /* Using this example data with clustering based on mouse_id will result
        in a "." being reported for the model's F statistic based on the
        number of clusters and parameter estimates. The command would be:

        reg weight g1 g2 g3, cluster(mouse_id)

        */
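
        As a hedged follow-up to the note above: even when the cluster adjustment leaves the overall F statistic reported as ".", tests on individual coefficients are still available after the clustered regression, since they draw on the cluster-robust VCE rather than the overall model F test. A sketch using this example data:

        Code:
        * pairwise comparison still works after the clustered regression
        reg weight g1 g2 g3, cluster(mouse_id)
        test g2 = g3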



        • #5
          10 clusters is rather small, as this adjustment is a large-sample solution.

          please read the FAQ and post results using CODE blocks as described in the FAQ

          the constant is the mean of the outcome for the reference group (group 1 in your first output, group 3 in your second), and the coefficient for "3.group" ("4.group" in the second output) is the difference in the means

          if you have more than 2 groups of interest, you could do them all in the same regression. One advantage of this is that you could easily do a post-hoc test (group 3 v group 4, for example) if that were of interest.
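
          A sketch of that all-in-one approach, using the pup-level variable names from #4 (hypothetical, not run on the poster's own data):

          Code:
          * all four groups in one regression on pup-level data
          regress weight i.group, vce(cluster mouse_id)

          * post-hoc test of group 3 v group 4
          test 3.group = 4.group

          * or all pairwise contrasts at once
          pwcompare group, effects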

          you should not use averages

          added: I originally wrote the above yesterday but apparently forgot to post it - sorry about that
