Comparing xtreg and repeated-measures ANOVA

Matt McMahon

Join Date: Aug 2017

Posts: 13
#1

06 Jun 2019, 09:14

Hello,

This is a partial repost of a question I asked here: https://www.statalist.org/forums/for...s-error-anyway. In my original post, the title only contained a reference to one of my two questions. Because that question was more related to technical Stata syntax, it did not attract attention from those who are more interested in statistics. It was suggested by another member in the comments section that I repost just the more statistical part of the question to gain appropriate attention. I apologize if this goes against any rules. I will be happy to remove it if so.

I'm new to using the anova command (it is not common in my field). A reviewer for a journal submission asked me to use ANOVA as a robustness test to my main specification. It likely will not go in the final version of the paper, but I need to run it correctly in order to appease the reviewer. My question is essentially whether I am correctly translating from the xtreg environment to the anova environment.

I have unbalanced panel data from a lab experiment. Subjects are indexed by the variable SubjectID. Each subject participated in exactly 1 session, which are indexed by SessionID, meaning subjects are nested within sessions. Each subject plays the game multiple times, though not all subjects play it the same number of times. The repetitions are indexed by Period. Thus, I use xtset SubjectID Period at the beginning of my code (both for my standard analysis and for the ANOVA analysis).
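
The structure described above can be checked directly in Stata. This is a minimal sketch, assuming the variables named in this post (SubjectID, SessionID, Period) are already in memory; the checks are illustrative and not part of the original analysis:

```stata
* Sketch only: assumes the poster's dataset is already loaded.
xtset SubjectID Period

* Participation patterns by subject; unequal patterns confirm an unbalanced panel.
xtdescribe

* Stops with an error if any subject appears in more than one session.
bysort SubjectID (SessionID): assert SessionID[1] == SessionID[_N]
```

If the assert passes silently, every subject belongs to exactly one session.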

Essentially, we care about the marginal impact along one treatment dimension, “N” vs “D”. (I’ll limit the treatments used in the regression here to show a minimum working example. I can easily extrapolate from any answers received here.) The minimum working example of our main regression specification is

Code:

xtreg LiqPerc1_2_B TreatD if Agent==1 & TreatAV==1, re cluster(SessionID)

The independent variable TreatD is a dummy variable that equals one for observations in treatment D and zero for all others (that is, those in treatment N). Because there is also a constant, the coefficient estimate on TreatD tells us the marginal impact of switching from treatment N to treatment D, which is exactly what we’re examining. We use subject-level random effects and cluster the standard errors at the session level, as is standard in the literature. (We also tested for time trends, etc., and found nothing.)
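
The reading of the TreatD coefficient can be illustrated with margins. This is a sketch under one assumption that is not in the original command: TreatD is entered as a factor variable (i.TreatD) so that margins treats it as categorical. vce(cluster SessionID) is the current spelling of cluster(SessionID).

```stata
* Sketch only: assumes the poster's dataset is already loaded.
xtset SubjectID Period
xtreg LiqPerc1_2_B i.TreatD if Agent==1 & TreatAV==1, re vce(cluster SessionID)

* Predicted means under treatment N (TreatD = 0) and treatment D (TreatD = 1).
margins TreatD

* Discrete change from N to D; in this model it equals the coefficient on 1.TreatD.
margins, dydx(TreatD)
```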

My question is then how to translate that specification from the xtreg environment into its analogous anova environment. I’ve spent a lot of time reading through textbooks, Stata’s ANOVA help text, the examples provided in Stata’s r.pdf file, and various online forums, but I’m still struggling with this adaptation. I think I’ve correctly adapted it using the following specification, but I’m unsure:

Code:

anova LiqPerc1_2_B TreatD / Period SubjectID|SessionID if Agent==1 & TreatAV==1, repeated(SubjectID SessionID) bse(Period) grouping(SessionID)

I would greatly appreciate any help you can offer. Thank you!

Best,
Matt

Last edited by Matt McMahon; 06 Jun 2019, 09:16. Reason: Edit: Adding tags
Tags: anova, panel data, repeated measures, specification, xtreg
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

07 Jun 2019, 12:18

You didn't get a quick response. You'll increase your chances of a helpful answer by following the FAQ on asking questions.

There is an anova that is identical to the xtreg. I've done it with fixed effects, but it should also work with random effects. So, I'd work on it until you get the same results.
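
For the fixed-effects case, the correspondence can be sketched as follows. This assumes the variables and sample restriction from post #1; it illustrates the least-squares dummy-variable equivalence and is not a claim about what the final specification should be:

```stata
* Sketch only: assumes the poster's dataset is already loaded.
xtset SubjectID Period

* Within (fixed-effects) estimator: subject effects are swept out.
xtreg LiqPerc1_2_B TreatD if Agent==1 & TreatAV==1, fe

* The same linear model fit by least squares with one indicator per subject.
anova LiqPerc1_2_B TreatD SubjectID if Agent==1 & TreatAV==1

* Replay the anova fit as a regression table to see the coefficient on TreatD.
regress
```

With default (non-robust) standard errors, the coefficient on TreatD should agree across the two fits, and the anova F for TreatD should match the squared t statistic from xtreg, fe.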

Roman Mostazir

Join Date: Apr 2014
Posts: 874

07 Jun 2019, 18:03

Besides Phil's advice, I would like to point out a couple of issues in your post:

There is probably a misunderstanding about the nesting structure of your data. You said, "Each subject participated in exactly 1 session, which are indexed by SessionID, meaning subjects are nested within sessions." This only means that session does not vary within a subject, and in that case there is no nesting structure to model. Rather, Period is nested within subject, since it varies within subjects, and I would use the xtreg command with SubjectID as the cluster variable. If my understanding of your data is correct (again, follow Phil's advice on making a meaningful post using dataex), then the anova model has one between-subject error term, SubjectID|TreatD, and one within-subject (repeated) variable, Period. That gives the following anova command:

Code:

anova LiqPerc1_2_B TreatD / SubjectID | TreatD Period TreatD#Period, repeated(Period)

I cannot confirm whether the results will be identical to xtreg, since the two commands use different estimation methods and estimate standard errors differently. That is something to pursue following Phil's advice.

Below is an example with an example dataset in which the main-effect coefficient from -xtreg- is reproduced using -margins- after -anova-:

Code:

//Get the dataset:

use http://www.stata-press.com/data/r14/t77, clear

*******Random effect**************

xtset subject

xtreg score calib##shape, re cluster(subject)

Random-effects GLS regression                   Number of obs     =         24
Group variable: subject                         Number of groups  =          3

R-sq:                                           Obs per group:
     within  = 0.0000                                         min =          8
     between = 0.0000                                         avg =        8.0
     overall = 0.7680                                         max =          8

                                                Wald chi2(2)      =          .
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =          .

                                (Std. Err. adjusted for 3 clusters in subject)
------------------------------------------------------------------------------
             |               Robust
       score |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     2.calib |          3   .6922187     4.33   0.000     1.643276    4.356724
             |
       shape |
          2  |         -1   .6922187    -1.44   0.149    -2.356724    .3567236
          3  |          3   1.198958     2.50   0.012     .6500857    5.349914
          4  |   .6666667   1.742045     0.38   0.702     -2.74768    4.081013
             |
 calib#shape |
        2 2  |  -.6666667   1.057381    -0.63   0.528    -2.739096    1.405763
        2 3  |  -1.333333   .3996526    -3.34   0.001    -2.116638   -.5500286
        2 4  |   1.666667   1.440968     1.16   0.247    -1.157579    4.490912
             |
       _cons |   2.333333   1.440968     1.62   0.105    -.4909121    5.157579
-------------+----------------------------------------------------------------
     sigma_u |  .86945522
     sigma_e |  1.1153688
         rho |  .37797619   (fraction of variance due to u_i)
------------------------------------------------------------------------------


*******Repeated measure Anova**********

anova score calib / subject|calib shape calib#shape, repeated(shape)


                         Number of obs =         24    R-squared     =  0.8925
                         Root MSE      =    1.11181    Adj R-squared =  0.7939

                  Source | Partial SS         df         MS        F    Prob>F
           --------------+----------------------------------------------------
                   Model |    123.125         11   11.193182      9.06  0.0003
                         |
                   calib |  51.041667          1   51.041667     11.89  0.0261
           subject|calib |  17.166667          4   4.2916667  
           --------------+----------------------------------------------------
                   shape |  47.458333          3   15.819444     12.80  0.0005
             calib#shape |  7.4583333          3   2.4861111      2.01  0.1662
                         |
                Residual |  14.833333         12   1.2361111  
           --------------+----------------------------------------------------
                   Total |  137.95833         23   5.9981884  


********Use margin to estimate the main effect of calib when shape=1***********


margins, dydx(calib) at(shape=1)

Average marginal effects                        Number of obs     =         24

Expression   : Linear prediction, predict()
dy/dx w.r.t. : 2.calib
at           : shape           =           1

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     2.calib |          3   1.111805     2.70   0.019     .5775843    5.422416
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
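
One detail in the output above may be worth checking. xtreg reports 3 groups of 8 observations, while the anova table gives subject|calib 4 df, which implies six distinct subjects (three within each level of calib). If the subject labels repeat across calib levels, xtset subject pools two different people into each panel. A hedged sketch of one way around that, where the variable name id is hypothetical:

```stata
use http://www.stata-press.com/data/r14/t77, clear

* Build an identifier that is unique across calib levels (id is a made-up name).
egen id = group(calib subject)

* Declare the panel and cluster on the unique identifier instead of subject.
xtset id
xtreg score calib##shape, re vce(cluster id)
```

The anova command does not need this step, because the nesting term subject|calib already distinguishes the subjects.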

Roman


Matt McMahon

Join Date: Aug 2017
Posts: 13

11 Jun 2019, 12:44

Originally posted by Phil Bromiley

You didn't get a quick response. You'll increase your chances of a helpful answer by following the FAQ on asking questions.

There is an anova that is identical to the xtreg. I've done it with fixed effects, but it should also work with random effects. So, I'd work on it until you get the same results.

Thanks for your response. I've read through the FAQ and included the dataex output at the very end of this post. I believe that was the gist of what you were getting at, but please let me know if I've missed anything else important.

Originally posted by Roman Mostazir

Code:

anova LiqPerc1_2_B TreatD / SubjectID | TreatD Period TreatD#Period, repeated(Period)

Originally posted by Roman Mostazir

Code:

use http://www.stata-press.com/data/r14/t77, clear
xtset subject
xtreg score calib##shape, re cluster(subject)
anova score calib / subject|calib shape calib#shape, repeated(shape)
margins, dydx(calib) at(shape=1)

Thanks for your helpful response! I'm working on adapting this.

Note: See very bottom for the full dataset (well, the relevant subsample for the minimum working example discussed here) using the dataex command.

I've run the code you suggested, and here is the output I get:

Code:

. anova LiqPerc1_2_B TreatD / SubjectID|TreatD Period TreatD#Period, repeated(Period)

                         Number of obs =        172    R-squared     =  0.6984
                         Root MSE      =    21.1421    Adj R-squared =  0.4993

                  Source | Partial SS         df         MS        F    Prob>F
        -----------------+----------------------------------------------------
                   Model |  106603.84         68   1567.7035      3.51  0.0000
                         |
                  TreatD |  991.47015          1   991.47015      0.49  0.4894
        SubjectID|TreatD |  96003.485         47   2042.6273  
        -----------------+----------------------------------------------------
                  Period |  8705.7182         11   791.42892      1.77  0.0687
           TreatD#Period |  4871.4462          9    541.2718      1.21  0.2964
                         |
                Residual |  46039.706        103   446.98744  
        -----------------+----------------------------------------------------
                   Total |  152643.55        171   892.65232  


Between-subjects error term:  SubjectID|TreatD
                     Levels:  51        (47 df)
     Lowest b.s.e. variable:  SubjectID
     Covariance pooled over:  TreatD    (for repeated variable)

Repeated variable: Period
                                          Huynh-Feldt epsilon        =    .
                                          Greenhouse-Geisser epsilon =    .
                                          Box's conservative epsilon =  0.0909

                                            ------------ Prob > F ------------
                  Source |     df      F    Regular    H-F      G-G      Box
        -----------------+----------------------------------------------------
                  Period |     11     1.77   0.0687     .        .      0.2148
           TreatD#Period |      9     1.21   0.2964     .        .      0.2840
                Residual |    103
        ----------------------------------------------------------------------

. margins, dydx(TreatD)

Average marginal effects                        Number of obs     =        172

Expression   : Linear prediction, predict()
dy/dx w.r.t. : 1.TreatD

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    1.TreatD |          .  (not estimable)
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

I also tried several other variations, and they all return the same "not estimable" response. For example, I tried setting the option at(Period=2) in the margins command. I also tried using TreatD#c.Period in the actual anova command.
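
A "not estimable" result from margins can arise when some cells of the design have no observations, because margins tries to predict for every combination of factor levels in the model. One way to look for that is sketched below. It is a diagnostic rather than a fix, and it assumes the anova above has just been run so that e(sample) is defined:

```stata
* Sketch only: run immediately after the anova command above.

* Which Period-by-treatment cells are populated in the estimation sample?
tabulate Period TreatD if e(sample)

* Which subjects are observed under each treatment? Zeros mark empty cells
* for the nested term SubjectID|TreatD.
tabulate SubjectID TreatD if e(sample)
```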

(Also, for what it's worth, I'd prefer to stick to the xtreg analogy with errors clustered at the session level for now (rather than the subject level) since that's what the reviewer asked for. Once I better understand how to translate between xtreg and anova, then I can start adjusting the empirical specification a bit more.)

Here are the relevant variables for the full dataset for the minimum working example:

Code:

. dataex SessionID SubjectID Period TreatD LiqPerc1_2_B if Agent==1 & TreatAV==1, count(192)
clear
input float(SessionID SubjectID) byte Period float(TreatD LiqPerc1_2_B)
 4  61 10 0        15
 4  61 11 0         4
 4  61 12 0         .
 4  64 10 0         .
 4  64 11 0      37.5
 4  64 12 0         .
 4  66 10 0        50
 4  66 11 0       100
 4  66 12 0        50
 4  68 10 0        50
 4  68 11 0        50
 4  68 12 0        45
 4  71 10 0       100
 4  71 11 0         .
 4  71 12 0      37.5
 4  72 10 0        50
 4  72 11 0         .
 4  72 12 0        52
 5  82 10 1       100
 5  82 11 1       100
 5  82 12 1       100
 5  83 10 1        30
 5  83 11 1  29.23077
 5  83 12 1        50
 5  85 10 1      87.5
 5  85 11 1        75
 5  85 12 1       100
 5  88 10 1 36.363636
 5  88 11 1  42.85714
 5  88 12 1        50
 5  90 10 1 33.333332
 5  90 11 1         .
 5  90 12 1         0
 5  91 10 1        20
 5  91 11 1 16.666666
 5  91 12 1 14.285714
 5  92 10 1       100
 5  92 11 1        50
 5  92 12 1         .
 5  95 10 1  22.22222
 5  95 11 1 18.181818
 5  95 12 1  28.57143
 6 101 10 1         0
 6 101 11 1       100
 6 101 12 1  47.61905
 6 102 10 1      12.5
 6 102 11 1        82
 6 102 12 1         .
 6 104 10 1        70
 6 104 11 1  85.71429
 6 104 12 1  83.33334
 6 107 10 1        25
 6 107 11 1         0
 6 107 12 1        30
 6 109 10 1  91.66666
 6 109 11 1         .
 6 109 12 1  92.85714
 6 111 10 1         .
 6 111 11 1      12.5
 6 111 12 1        25
 8 141  4 0        40
 8 141  5 0  52.94118
 8 141  6 0  42.85714
 8 143  4 0        25
 8 143  5 0 33.333332
 8 143  6 0      37.5
 8 145  4 0  71.42857
 8 145  5 0       100
 8 145  6 0  55.55556
 8 148  4 0      97.5
 8 148  5 0  8.333333
 8 148  6 0  6.666667
 8 149  4 0        50
 8 149  5 0        25
 8 149  6 0        40
10 181  1 0 33.333332
10 181  2 0 34.285713
10 181  3 0        20
10 181  4 1        35
10 181  5 1        30
10 181  6 1        40
10 181  7 0        25
10 181  8 0  15.09434
10 181  9 0        25
10 182  1 0 66.666664
10 182  2 0        50
10 182  3 0        50
10 182  4 1        50
10 182  5 1  57.14286
10 182  6 1        50
10 182  7 0        50
10 182  8 0        60
10 182  9 0        50
10 184  1 0        30
10 184  2 0        20
10 184  3 0        20
10 184  4 1         .
10 184  5 1        40
10 184  6 1  41.66667
10 184  7 0 33.333332
10 184  8 0  21.42857
10 184  9 0         .
10 187  1 0        50
10 187  2 0        75
10 187  3 0 66.666664
10 187  4 1        50
10 187  5 1        80
10 187  6 1 66.666664
10 187  7 0        60
10 187  8 0        75
10 187  9 0  83.33334
10 188  1 0 66.666664
10 188  2 0        50
10 188  3 0        60
10 188  4 1        55
10 188  5 1        40
10 188  6 1         .
10 188  7 0  41.66667
10 188  8 0     81.25
10 188  9 0         0
10 191  1 0         .
10 191  2 0        40
10 191  3 0       100
10 191  4 1       100
10 191  5 1       100
10 191  6 1         .
10 191  7 0         .
10 191  8 0         .
10 191  9 0         .
11 202  1 1        50
11 202  2 1        50
11 202  3 1         .
11 202  4 0        60
11 202  5 0      62.5
11 202  6 0         .
11 202  7 1        60
11 202  8 1        25
11 202  9 1        50
11 205  1 1        50
11 205  2 1        50
11 205  3 1  29.87013
11 205  4 0        50
11 205  5 0 66.666664
11 205  6 0        50
11 205  7 1        30
11 205  8 1        50
11 205  9 1        50
11 206  1 1        25
11 206  2 1         0
11 206  3 1 11.764706
11 206  4 0         0
11 206  5 0         0
11 206  6 0        20
11 206  7 1         0
11 206  8 1 33.333332
11 206  9 1      37.5
11 207  1 1         0
11 207  2 1        30
11 207  3 1         0
11 207  4 0        25
11 207  5 0         0
11 207  6 0        50
11 207  7 1         0
11 207  8 1        10
11 207  9 1        25
11 209  1 1       100
11 209  2 1       100
11 209  3 1       100
11 209  4 0       100
11 209  5 0         0
11 209  6 0         0
11 209  7 1       100
11 209  8 1       100
11 209  9 1       100
11 210  1 1        50
11 210  2 1         0
11 210  3 1        50
11 210  4 0       100
11 210  5 0         0
11 210  6 0         0
11 210  7 1 66.666664
11 210  8 1       100
11 210  9 1       100
11 212  1 1        50
11 212  2 1        50
11 212  3 1        30
11 212  4 0        50
11 212  5 0 33.333332
11 212  6 0  42.85714
11 212  7 1        52
11 212  8 1        50
11 212  9 1        50
end

Note: The missing values for the dependent variable (right-most column) are because that variable is generated as a percentage of two observed variables, the denominator of which can be zero.
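
The mechanism behind those missing values can be shown with a toy example. The variable names num, den, and pct are hypothetical stand-ins for the two observed variables and the resulting percentage; Stata returns missing when the denominator is zero:

```stata
clear
* Toy data: num and den are made-up names for the two observed variables.
input num den
3 4
0 0
5 0
end

* Division by zero yields a missing value rather than an error.
generate pct = 100 * num / den

* Count the rows where the percentage could not be computed.
count if missing(pct)
```

Only the first row has a nonzero denominator, so the other two rows end up with pct missing.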


Joseph Coveney

Join Date: Apr 2014

Posts: 4423
#5

12 Jun 2019, 02:29

Originally posted by Matt McMahon

Once I better understand how to translate between xtreg and anova

As stated in the sister thread, despite

Code:

xtset SubjectID Period

Period is nowhere in the

Code:

xtreg LiqPerc1_2_B TreatD if Agent==1 & TreatAV==1, re cluster(SessionID)

model.

Thus, to translate between xtreg and anova, Period and interaction terms involving it will be absent:

Code:

anova LiqPerc1_2_B TreatD SubjectID

Moreover, participants aren't nested within treatment group: look at the participants and their treatment assignments in the last two sessions (SessionID 10 and 11).
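
The nesting point can be checked with a short tabulation. This is a sketch that assumes the dataex sample posted earlier in the thread is in memory; the variable names switches and pickone are hypothetical:

```stata
* Sketch only: assumes the dataex sample from the thread is loaded.

* switches = 1 if the subject is observed under both treatments, 0 otherwise.
bysort SubjectID (TreatD): generate byte switches = TreatD[1] != TreatD[_N]

* Flag one row per subject so each subject is counted once.
egen byte pickone = tag(SubjectID)
tabulate SessionID switches if pickone
```

In the posted listing, the subjects in sessions 10 and 11 are the ones observed under both values of TreatD, so they are crossed with treatment rather than nested within it.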
