problem with a simple nested anova

Nigel Moore

02 Dec 2017, 06:23

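I'm having trouble with a nested ANOVA on a fairly simple dataset. Concentrations of a chemical were measured in the blood and two organs of four individuals (matrix: 0 = blood, 1 and 2 = the two organs). I would like to compare the concentration (conc) between matrices, allowing for the fact that each individual (id) was measured in all three: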

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int id float(matrix conc)
 73 0 276
 75 0 214
 79 0 227
121 0 241
 73 1 168
 75 1 144
 79 1 147
121 1 154
 73 2 253
 75 2 195
 79 2 179
121 2 224
end

If I run a plain vanilla -anova-, I see that the concentration in one organ differs from that in blood, but the concentration in the other organ does not:

Code:

. anova conc matrix

                         Number of obs =         12    R-squared     =  0.7328
                         Root MSE      =    25.1319    Adj R-squared =  0.6735

                  Source | Partial SS         df         MS        F    Prob>F
              -----------+----------------------------------------------------
                   Model |  15593.167          2   7796.5833     12.34  0.0026
                         |
                  matrix |  15593.167          2   7796.5833     12.34  0.0026
                         |
                Residual |     5684.5          9   631.61111  
              -----------+----------------------------------------------------
                   Total |  21277.667         11   1934.3333  

. pwcompare matrix, pveffects mcompare(scheffe)

Pairwise comparisons of marginal linear predictions

Margins      : asbalanced

---------------------------
             |    Number of
             |  Comparisons
-------------+-------------
      matrix |            3
---------------------------

-----------------------------------------------------
             |                             Scheffe
             |   Contrast   Std. Err.      t    P>|t|
-------------+---------------------------------------
      matrix |
     1 vs 0  |     -86.25   17.77092    -4.85   0.003
     2 vs 0  |     -26.75   17.77092    -1.51   0.364
     2 vs 1  |       59.5   17.77092     3.35   0.026
-----------------------------------------------------

However, since the matrices are nested in the individual, I should run a nested ANOVA:

Code:

. anova conc matrix / matrix|id /

                         Number of obs =         12    R-squared     =  1.0000
                         Root MSE      =          0    Adj R-squared =

                  Source | Partial SS         df         MS        F    Prob>F
              -----------+----------------------------------------------------
                   Model |  21277.667         11   1934.3333  
                         |
                  matrix |  15593.167          2   7796.5833     12.34  0.0026
               matrix|id |     5684.5          9   631.61111  
              -----------+----------------------------------------------------
               matrix|id |     5684.5          9   631.61111  
                         |
                Residual |          0          0
              -----------+----------------------------------------------------
                   Total |  21277.667         11   1934.3333  

. pwcompare matrix, pveffects mcompare(scheffe)

Pairwise comparisons of marginal linear predictions

Margins      : asbalanced

---------------------------
             |    Number of
             |  Comparisons
-------------+-------------
      matrix |            3
---------------------------

-----------------------------------------------------
             |                             Scheffe
             |   Contrast   Std. Err.      t    P>|t|
-------------+---------------------------------------
      matrix |
     1 vs 0  |     -86.25          .        .       .
     2 vs 0  |     -26.75          .        .       .
     2 vs 1  |       59.5          .        .       .
-----------------------------------------------------

That doesn't look right! There's obviously something wrong with that model. But with such a simple dataset, the model should presumably also be simple. Your suggestions would be appreciated!

Last edited by Nigel Moore; 02 Dec 2017, 06:25.

Stata 14.2MP
OS X

Nigel Moore

02 Dec 2017, 08:51

One other consideration. I tried a -mixed- analysis with id as the repeated measures indicator, and that worked well, showing that organ 2 also differed from blood. But I seem to recall that nested ANOVA is recommended over -mixed- for small datasets:

Code:

. mixed conc i.matrix || id:

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -50.455837  
Iteration 1:   log likelihood = -50.455837  

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =         12
Group variable: id                              Number of groups  =          4

                                                Obs per group:
                                                              min =          3
                                                              avg =        3.0
                                                              max =          3

                                                Wald chi2(2)      =     125.31
Log likelihood = -50.455837                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
        conc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      matrix |
          1  |     -86.25   7.887885   -10.93   0.000      -101.71   -70.79003
          2  |     -26.75   7.887885    -3.39   0.001    -42.20997   -11.29003
             |
       _cons |      239.5   10.88242    22.01   0.000     218.1708    260.8292
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                  var(_cons) |   349.2711   277.0795      73.77305    1653.589
-----------------------------+------------------------------------------------
               var(Residual) |   124.4375   62.21872      46.70361    331.5521
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 7.07          Prob >= chibar2 = 0.0039

. margins matrix

Adjusted predictions                            Number of obs     =         12

Expression   : Linear prediction, fixed portion, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      matrix |
          0  |      239.5   10.88242    22.01   0.000     218.1708    260.8292
          1  |     153.25   10.88242    14.08   0.000     131.9208    174.5792
          2  |     212.75   10.88242    19.55   0.000     191.4208    234.0792
------------------------------------------------------------------------------

. pwcompare matrix, pveffects mcompare(scheffe)

Pairwise comparisons of marginal linear predictions

Margins      : asbalanced

---------------------------
             |    Number of
             |  Comparisons
-------------+-------------
conc         |
      matrix |            3
---------------------------

-----------------------------------------------------
             |                             Scheffe
             |   Contrast   Std. Err.      z    P>|z|
-------------+---------------------------------------
conc         |
      matrix |
     1 vs 0  |     -86.25   7.887885   -10.93   0.000
     2 vs 0  |     -26.75   7.887885    -3.39   0.003
     2 vs 1  |       59.5   7.887885     7.54   0.000
-----------------------------------------------------

Stata 14.2MP
OS X

Dave Airey

#3

02 Dec 2017, 10:55

Are the two organs the same two organs in all patients? If so, your design is a repeated measures design, with matrix crossed with subject.

Code:

anova conc matrix id, repeated(matrix)
pwcompare matrix
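
A quick way to check the crossing, as a minimal sketch using the data from #1 (the matrix variable is called site here, a hypothetical renaming chosen only to avoid any clash with Stata's -matrix- command): if every id contributes exactly one observation to every site, the id-by-site table has one observation per cell. That also appears to be why the nested model in #1 has no residual: the nested term then takes all 9 degrees of freedom left after the main effect (12 - 1 - 2 = 9), so nothing remains for the error.

```stata
clear
input int id float(site conc)
 73 0 276
 75 0 214
 79 0 227
121 0 241
 73 1 168
 75 1 144
 79 1 147
121 1 154
 73 2 253
 75 2 195
 79 2 179
121 2 224
end

* One observation in every id-by-site cell means the two factors are crossed
tabulate id site

* Residual df in the nested model of #1:
* total (N - 1) minus site (2) minus the nested term (9)
display _N - 1 - 2 - 9
```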

Bruce Weaver

02 Dec 2017, 12:31

I agree that this appears to be (what I would call) a one-factor repeated measures ANOVA (a.k.a. a Treatment x Subjects design). I would not use -pwcompare- for this design, though, unless there is a way to make it use a different error term for every contrast. For repeated measures designs, there is no reason to expect the Treatment x Subjects interaction (i.e., the error term) to be the same for every pair of conditions, and so there is no good reason to use the overall error term from the ANOVA table. Many authors recommend using ordinary paired t-tests for the pair-wise contrasts.

Also, note that when there are 3 conditions, carrying out all 3 pair-wise contrasts conditional on first observing a statistically significant omnibus F-test ensures that the family-wise alpha equals the per-contrast alpha. As Meier (2006) said, "Fisher's LSD procedure is known to preserve the experimentwise type I error rate at the nominal level of significance, if (and only if) the number of treatment groups is three." (See p. 41 in this chapter from an old edition of Dave Howell's Statistical Methods for Psychology for an explanation.) Granted, Meier was talking about pair-wise contrasts in a between-Ss ANOVA (using a pooled error term). But I believe the same logic holds for repeated measures designs.

In the following example, I renamed Nigel's matrix variable to site (because matrix was causing problems on my sort command).

Code:

. * Example generated by -dataex-. To install: ssc install dataex
. clear

. input int id float(site conc)

           id       site       conc
  1.  73 0 276
  2.  75 0 214
  3.  79 0 227
  4. 121 0 241
  5.  73 1 168
  6.  75 1 144
  7.  79 1 147
  8. 121 1 154
  9.  73 2 253
 10.  75 2 195
 11.  79 2 179
 12. 121 2 224
 13. end

. sort id site

. list, sepby(id)

     +-------------------+
     |  id   site   conc |
     |-------------------|
  1. |  73      0    276 |
  2. |  73      1    168 |
  3. |  73      2    253 |
     |-------------------|
  4. |  75      0    214 |
  5. |  75      1    144 |
  6. |  75      2    195 |
     |-------------------|
  7. |  79      0    227 |
  8. |  79      1    147 |
  9. |  79      2    179 |
     |-------------------|
 10. | 121      0    241 |
 11. | 121      1    154 |
 12. | 121      2    224 |
     +-------------------+

.
. * This appears to be a one-factor repeated measures ANOVA.
. anova conc id site, repeated(site)

                         Number of obs =         12    R-squared     =  0.9532
                         Root MSE      =    12.8809    Adj R-squared =  0.9142

                  Source | Partial SS         df         MS        F    Prob>F
              -----------+----------------------------------------------------
                   Model |  20282.167          5   4056.4333     24.45  0.0006
                         |
                      id |       4689          3        1563      9.42  0.0109
                    site |  15593.167          2   7796.5833     46.99  0.0002
                         |
                Residual |      995.5          6   165.91667  
              -----------+----------------------------------------------------
                   Total |  21277.667         11   1934.3333  


Between-subjects error term:  id
                     Levels:  4         (3 df)
     Lowest b.s.e. variable:  id

Repeated variable: site
                                          Huynh-Feldt epsilon        =  1.2609
                                          *Huynh-Feldt epsilon reset to 1.0000
                                          Greenhouse-Geisser epsilon =  0.7333
                                          Box's conservative epsilon =  0.5000

                                            ------------ Prob > F ------------
                  Source |     df      F    Regular    H-F      G-G      Box
              -----------+----------------------------------------------------
                    site |      2    46.99   0.0002   0.0002   0.0013   0.0064
                Residual |      6
              ----------------------------------------------------------------

.
. * When there are 3 groups, or 3 conditions as in this case,
. * carrying out all 3 pair-wise contrasts conditional on a
. * significant omnibus test preserves the family-wise alpha
. * at the alpha level used for the omnibus test and for each
. * of the pair-wise contrasts.  For between-Ss ANOVA, the
. * pair-wise contrasts are carried out via modified t-tests
. * whose SEs are all computed from MS_error in the ANOVA table.
. * But for repeated measures ANOVA, it does not make sense to
. * use the MS_error from the ANOVA summary table, because there
. * is no reason to expect the Treatment x Subjects interaction
. * to be similar in nature across all pairs of conditions.  
. * For that reason, some authors recommend using ordinary
. * paired t-tests to make the pair-wise comparisons.
. * Let's see what -pwcompare- does after the RM ANOVA done above.
.
. pwcompare site, effects

Pairwise comparisons of marginal linear predictions

Margins      : asbalanced

------------------------------------------------------------------------------
             |                            Unadjusted           Unadjusted
             |   Contrast   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        site |
     1 vs 0  |     -86.25   9.108147    -9.47   0.000    -108.5368   -63.96317
     2 vs 0  |     -26.75   9.108147    -2.94   0.026    -49.03683   -4.463168
     2 vs 1  |       59.5   9.108147     6.53   0.001     37.21317    81.78683
------------------------------------------------------------------------------

.
. * Notice that the SE is the same for all 3 contrasts.
. * This is not what I want.  I don't know if there is a way
. * to make -pwcompare- perform ordinary paired t-tests.
. * If not, one can always reshape the dataset and do them
. * the old-fashioned way.
.
. reshape wide conc, i(id) j(site)
(note: j = 0 1 2)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                       12   ->       4
Number of variables                   3   ->       4
j variable (3 values)              site   ->   (dropped)
xij variables:
                                   conc   ->   conc0 conc1 conc2
-----------------------------------------------------------------------------

. ttest conc1 == conc0

Paired t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
   conc1 |       4      153.25     5.34439    10.68878    136.2418    170.2582
   conc0 |       4       239.5    13.35727    26.71454    196.9912    282.0088
---------+--------------------------------------------------------------------
    diff |       4      -86.25    8.045444    16.09089   -111.8542   -60.64581
------------------------------------------------------------------------------
     mean(diff) = mean(conc1 - conc0)                             t = -10.7204
 Ho: mean(diff) = 0                              degrees of freedom =        3

 Ha: mean(diff) < 0           Ha: mean(diff) != 0           Ha: mean(diff) > 0
 Pr(T < t) = 0.0009         Pr(|T| > |t|) = 0.0017          Pr(T > t) = 0.9991

. ttest conc2 == conc0

Paired t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
   conc2 |       4      212.75    16.33185    32.66369    160.7748    264.7252
   conc0 |       4       239.5    13.35727    26.71454    196.9912    282.0088
---------+--------------------------------------------------------------------
    diff |       4      -26.75    7.192299     14.3846   -49.63911   -3.860894
------------------------------------------------------------------------------
     mean(diff) = mean(conc2 - conc0)                             t =  -3.7193
 Ho: mean(diff) = 0                              degrees of freedom =        3

 Ha: mean(diff) < 0           Ha: mean(diff) != 0           Ha: mean(diff) > 0
 Pr(T < t) = 0.0169         Pr(|T| > |t|) = 0.0338          Pr(T > t) = 0.9831

. ttest conc2 == conc1

Paired t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
   conc2 |       4      212.75    16.33185    32.66369    160.7748    264.7252
   conc1 |       4      153.25     5.34439    10.68878    136.2418    170.2582
---------+--------------------------------------------------------------------
    diff |       4        59.5    11.50724    23.01449    22.87881    96.12119
------------------------------------------------------------------------------
     mean(diff) = mean(conc2 - conc1)                             t =   5.1707
 Ho: mean(diff) = 0                              degrees of freedom =        3

 Ha: mean(diff) < 0           Ha: mean(diff) != 0           Ha: mean(diff) > 0
 Pr(T < t) = 0.9930         Pr(|T| > |t|) = 0.0140          Pr(T > t) = 0.0070
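
For comparison, the standard errors above can be reproduced by hand. This is only a sketch that assumes the balanced layout of this dataset (n = 4 subjects per site). The common SE that -pwcompare- reports is sqrt(2 * MS_residual / n), with MS_residual = 165.91667 taken from the RM ANOVA table, whereas each paired t-test uses the standard deviation of its own differences:

```stata
* Pooled SE after the RM ANOVA: identical for every pair of sites
display sqrt(2 * 165.91667 / 4)

* Paired t-test SEs: sd(diff) / sqrt(n), different for every pair
* site 1 vs 0
display 16.09089 / sqrt(4)
* site 2 vs 0
display 14.3846 / sqrt(4)
* site 2 vs 1
display 23.01449 / sqrt(4)
```

Up to rounding in the last digits, these should agree with the Std. Err. column from -pwcompare- and with the Std. Err. of diff in the three -ttest- outputs.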

HTH.

--
Bruce Weaver
Version: Stata/MP 19.5 (Windows)

Dave Airey

#5

02 Dec 2017, 15:13

I agree about the t-test comments above for repeated measures ANOVA.

Joseph Coveney

02 Dec 2017, 18:57

I suggest approaching it more systematically, for example by explicitly examining the covariance structure (see below) to see how reasonable the assumption that Bruce mentions in #4 is. In this case, the residual variance ranges from roughly 100 to 1000, and so Bruce has a point. I've seen the same advice that Bruce mentions (individual t-tests). Although it's easy to do a bunch of t-tests, you can also model the error covariance structure using -mixed- and then use the small-sample adjustments there to help accommodate the limited size of the dataset. This gives you all of the postestimation commands for -mixed- that you don't have with individual t-tests, such as the -pwcompare- postestimation command that was mentioned a couple of times in this thread.
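
As a rough, model-free first look (a sketch only, not a substitute for the models below), the per-site sample variances can be tabulated directly. The variable is named site here, following Bruce's renaming in #4; that name is an assumption of this sketch and differs from the code further down, which keeps matrix:

```stata
clear
input int id byte site int conc
 73 0 276
 75 0 214
 79 0 227
121 0 241
 73 1 168
 75 1 144
 79 1 147
121 1 154
 73 2 253
 75 2 195
 79 2 179
121 2 224
end

* Mean, SD and variance of conc within each site (n = 4 per site)
tabstat conc, by(site) statistics(mean sd variance n)
```

The SDs should match those in Bruce's -ttest- output in #4 (about 26.7, 10.7 and 32.7 for sites 0, 1 and 2), which correspond to variances of roughly 700, 100 and 1000.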

Because you have balanced data, you can also do the same with MANOVA. Again, the advantage of manova over a bunch of t-tests is the availability of postestimation commands, for example, the ability to test joint hypotheses (example below) that are more difficult or impossible to pull off with pairwise t-tests.

Code:

version 15.1

clear *

input int id  byte matrix int conc
 73 0 276
 75 0 214
 79 0 227
121 0 241
 73 1 168
 75 1 144
 79 1 147
121 1 154
 73 2 253
 75 2 195
 79 2 179
121 2 224
end

*
* Examination of residual error structure (& testing assumption Bruce brought up in #4)
*
mixed conc i.matrix || id:, noconstant residuals(unstructured, t(matrix)) nolrtest nolog
estimates store Unstructured

mixed conc i.matrix || id:, noconstant residuals(exchangeable) nolrtest nolog
estimates store Exchangeable

lrtest Unstructured Exchangeable

estimates drop _all

*
* Modeling the residual error with small-sample adjustments using -mixed- (allows -pwcompare- etc.)
*
mixed conc i.matrix || id:, noconstant reml dfmethod(satterthwaite) residuals(unstructured, t(matrix)) nolrtest nolog
pwcompare i.matrix, small effects

*
* Ditto using MANOVA (exact test statistics)
*
quietly reshape wide conc, i(id) j(matrix)
 
generate byte k = 1

/* Begin message to StataCorp

   The following are new undesired behaviors with this mistaken syntax:
 
manova conc0-conc2 = k
 
    (1) uninformative "error message"
    (2) attempt to lookup error yields "No entries found"
    (3) -capture noisily- doesn't display anything

   End message to StataCorp */
 
*
* Omnibus test
*
manova conc0-conc2 = k, noconstant
 
*
* Pairwise t-tests
*
matrix input Contrast01 = (1 -1 0)
matrix input Contrast02 = (1 0 -1)
matrix input Contrast12 = (0 -1 1)
 
manovatest k, ytransform(Contrast01)
manovatest k, ytransform(Contrast02)
manovatest k, ytransform(Contrast12)

*
* Joint test (e.g., do the two organs differ from blood?)
*
matrix define Joint = Contrast01 \ Contrast02
manovatest k, ytransform(Joint)

exit

Last edited by Joseph Coveney; 02 Dec 2017, 18:59.
