
  • compare means of clustered data

    I have completed a mouse study and have compared mean pup birth weight among pups born to moms that were in one of four treatment groups using pairwise t-tests as follows:

    Code:
    ttest pup_avg if group==1 | group==3, by(group)

    I created a variable that averaged the pups born to a treated mom (pup_avg).

    However, I need to account for clustering of pups born to the same mom in the same pregnancy.

    Each mom has her own ID (variable = mouse_id) and within each mom's observation are variables for each pup weight (weight1, weight2, weight3, etc).

    Any advice on how I can compare mean pup birth weight between two different treatment groups and account for clustering within the same mouse_id?

    Thanks so much.

  • #2
    you can't do it with a t-test; you need to move to some form of regression. Depending on your exact situation (why are you using averages?), there are several options that could be used. Here is a simple regression command:
    Code:
    regress pup_avg i.group if group==1 | group==3, vce(cluster mouse_id)
    again, I ask why you are using averages within mouse, as this is hiding some variation.
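
    If the data are in the wide layout described in #1 (weight1, weight2, ... on each mom's row), a sketch of the no-averages approach would look like the following. This is hypothetical and not run on the poster's actual data; the variable names are assumed from the original post.

    Code:
    * assumes the wide layout from #1: one row per mom, weight1-weightN
    reshape long weight, i(mouse_id) j(pup)
    drop if missing(weight)    // litters differ in size

    * pup-level comparison of groups 1 and 3, clustering on the mom
    regress weight i.group if inlist(group, 1, 3), vce(cluster mouse_id)

    * or a mixed model with a random intercept for each mom
    mixed weight i.group if inlist(group, 1, 3) || mouse_id: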



    • #3
      Rich,

      Thanks so much for your help.

      Honestly, I don't know how to answer your question about why I'm using averages. My primary question was whether or not the babies weighed more or less depending on the exposure their mom received (I have 4 different exposure groups) during pregnancy. Is there some other test that I could use?

      Also, the commands you gave me worked, but now I'm not sure what I'm looking at in terms of the output. Forgive my ignorance. I'm a clinical person who doesn't typically do this kind of work.

      Code:
      . regress pup_avg i.group if group==1 | group==3, vce(cluster mouse_id)

      Linear regression                               Number of obs     =         10
                                                      F(1, 9)           =      29.13
                                                      Prob > F          =     0.0004
                                                      R-squared         =     0.7846
                                                      Root MSE          =     .03724

                                     (Std. Err. adjusted for 10 clusters in mouse_id)
      ------------------------------------------------------------------------------
                   |               Robust
           pup_avg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
           3.group |   .1271425   .0235554     5.40   0.000     .0738565    .1804286
             _cons |   .8447308   .0208979    40.42   0.000     .7974565     .892005
      ------------------------------------------------------------------------------

      Code:
      . regress pup_avg i.group if group==3 | group==4, vce(cluster mouse_id)

      Linear regression                               Number of obs     =         10
                                                      F(1, 9)           =      10.12
                                                      Prob > F          =     0.0112
                                                      R-squared         =     0.5585
                                                      Root MSE          =     .03341

                                     (Std. Err. adjusted for 10 clusters in mouse_id)
      ------------------------------------------------------------------------------
                   |               Robust
           pup_avg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
           4.group |  -.0672079   .0211285    -3.18   0.011    -.1150039    -.019412
             _cons |   .9718733    .010869    89.42   0.000     .9472858    .9964608
      ------------------------------------------------------------------------------



      • #4
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte(mouse_id weight group)
        1 10 1
        1  9 1
        1  8 1
        1  7 1
        1  6 1
        1  8 1
        1 11 1
        2 11 2
        2 12 2
        2 13 2
        2 14 2
        2 12 2
        2 15 2
        2 10 2
        3 15 3
        3 16 3
        3 17 3
        3 18 3
        3 17 3
        3 19 3
        3 14 3
        4 11 4
        4  8 4
        4 12 4
        4  7 4
        4  6 4
        4  8 4
        4 11 4
        end
        The .do file below includes comments for what is taking place at each step along the way.

        /* Generate dummy (aka indicator) variables for each group for use in one
        overall regression */
        tab group, gen(g)

        /* Look at means for each group two different ways */
        bys g*: summ weight

        forvalues i = 1/4 {
            summ weight if g`i'==1
            di "group"`i'
        }

        /* The means increase as the group number increases. The example data was
        purposely set up this way to facilitate an explanation of the regression
        results. We use group 1 as the base group */

        reg weight g2 g3 g4

        /* The mean of group 1 is represented by the _cons term, which is often shown
        as B0 in empirical papers. This mean is 8.428571. We know that the mean of
        group 2 is 12.42857. This is equal to our constant term, _cons, plus the
        regression parameter estimate for group 2, which here is g2. The 8.428571
        + 4 = 12.42857. The mean of group 3 is 8.428571 + 8.142857= 16.57143. This
        pattern also holds for group 4.

        The stock regression output gives you the mean differences between groups with
        reference to the base group as well as the statistical significance of the
        differences of groups 2, 3, and 4 from the base group.

        Alternatively, we can estimate the model without a constant term and a full
        set of dummy variables. */
        reg weight g1 g2 g3 g4, nocons

        /* You can see that the means of each group are directly reported. There is no
        base group here. To test for differences between groups, use the -test-
        command */

        test g1 = g2
        test g1 = g3
        test g1 = g4

        reg weight g2 g3 g4

        /* You can see that the model with a full set of dummies and no constant
        term yields the same statistical significance for testing the difference
        between g1 and g4 as the model with three dummy variables and a constant term.
        In either model, the p-value for the difference between
        the group 1 and group 4 means is .5756. */

        /* You can also test for differences between groups, and not just for
        differences with respect to the base group, from the regression model with
        three indicator variables and a constant term. This is shown below as we
        test for differences between the means of group 2 and group 3 with both models.
        We achieve equivalent results*/

        test g2 = g3

        reg weight g1 g2 g3 g4, nocons
        test g2=g3

        /* Using this example data with clustering based on mouse_id will result
        in a "." being reported for the model's F statistic based on the
        number of clusters and parameter estimates. The command would be:

        reg weight g1 g2 g3, cluster(mouse_id)

        */
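
        As a hedged follow-up to the note above: even when the cluster adjustment leaves the overall F statistic reported as ".", tests on individual coefficients are still available after the clustered regression, since they draw on the cluster-robust VCE rather than the overall model F test. A sketch using this example data:

        Code:
        * pairwise comparison still works after the clustered regression
        reg weight g1 g2 g3, cluster(mouse_id)
        test g2 = g3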



        • #5
          10 clusters is rather small, as this adjustment is a large-sample solution.

          please read the FAQ and post results using CODE blocks as described in the FAQ

          the constant is the mean of the outcome for the reference group (group 1 in your first output, group 3 in your second), and the coefficient for "3.group" ("4.group" in the second output) is the difference in the means

          if you have more than 2 groups of interest, you could do them all in the same regression. One advantage of this is that you could easily do a post-hoc test (group 3 v group 4, for example) if that were of interest.
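
          A sketch of that all-in-one approach, using the pup-level variable names from #4 (hypothetical, not run on the poster's own data):

          Code:
          * all four groups in one regression on pup-level data
          regress weight i.group, vce(cluster mouse_id)

          * post-hoc test of group 3 v group 4
          test 3.group = 4.group

          * or all pairwise contrasts at once
          pwcompare group, effects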

          you should not use averages

          added: I originally wrote the above yesterday but apparently forgot to post it - sorry about that
