  • Comparing xtreg and repeated-measures ANOVA

    Hello,

    This is a partial repost of a question I asked here: https://www.statalist.org/forums/for...s-error-anyway. In my original post, the title only contained a reference to one of my two questions. Because that question was more related to technical Stata syntax, it did not attract attention from those who are more interested in statistics. It was suggested by another member in the comments section that I repost just the more statistical part of the question to gain appropriate attention. I apologize if this goes against any rules. I will be happy to remove it if so.


    I'm new to the anova command (it is not common in my field). A reviewer for a journal submission asked me to use ANOVA as a robustness check on my main specification. It likely will not go in the final version of the paper, but I need to run it correctly to satisfy the reviewer. My question is essentially whether I am translating correctly from the xtreg environment to the anova environment.


    I have unbalanced panel data from a lab experiment. Subjects are indexed by the variable SubjectID. Each subject participated in exactly 1 session, which are indexed by SessionID, meaning subjects are nested within sessions. Each subject plays the game multiple times, though not all subjects play it the same number of times. The repetitions are indexed by Period. Thus, I use xtset SubjectID Period at the beginning of my code (both for my standard analysis and for the ANOVA analysis).
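
    A quick way to confirm this structure is sketched below. It assumes the data in memory are already restricted to the estimation sample and uses only the variable names above; the assert simply stops with an error if any subject appears in more than one session.
    Code:
    * Check that every subject appears in exactly one session
    bysort SubjectID (SessionID): assert SessionID[1] == SessionID[_N]
    
    * Declare the panel and summarize participation patterns across periods
    xtset SubjectID Period
    xtdescribe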

    Essentially, we care about the marginal impact along one treatment dimension, “N” vs “D”. (I’ll limit the treatments used in the regression here to show a minimum working example. I can easily extrapolate from any answers received here.) The minimum working example of our main regression specification is
    Code:
    xtreg LiqPerc1_2_B TreatD if Agent==1 & TreatAV==1, re cluster(SessionID)
    The independent variable TreatD is a dummy variable that equals one for observations in treatment D and zero for all others (that is, those in treatment N). Because the model also includes a constant, the coefficient on TreatD estimates the marginal impact of switching from treatment N to treatment D, which is exactly what we’re examining. We use subject-level random effects and cluster standard errors at the session level, as is standard in the literature. (We also tested for time trends, etc., and found nothing.)


    My question is then how to translate that specification from the xtreg environment into its analogous anova environment. I’ve spent a lot of time reading through textbooks, Stata’s ANOVA help text, the examples provided in Stata’s r.pdf file, and various online forums, but I’m still struggling with this adaptation. I think I’ve correctly adapted it using the following specification, but I’m unsure:
    Code:
    anova LiqPerc1_2_B TreatD / Period SubjectID|SessionID if Agent==1 & TreatAV==1, repeated(SubjectID SessionID) bse(Period) grouping(SessionID)

    I would greatly appreciate any help you can offer. Thank you!

    Best,
    Matt
    Last edited by Matt McMahon; 06 Jun 2019, 09:16. Reason: Edit: Adding tags

  • #2
    You didn't get a quick response. You'll increase your chances of a helpful answer by following the FAQ on asking questions.

    There is an anova that is identical to the xtreg. I've done it with fixed effects, but it should also work with random effects. So, I'd work on it until you get the same results.



    • #3
      Besides Phil's advice, there are a couple of issues in your post that I would like to point out:

      There is probably a misunderstanding about the nesting structure of your data. You said, "Each subject participated in exactly 1 session, which are indexed by SessionID, meaning subjects are nested within sessions." This only means that sessions and subjects do not vary within one another, and in that case there is no nesting structure. Rather, Period is nested within subject, as it varies within subjects. I would also cluster on SubjectID in the xtreg command. If my understanding of your data is correct (again, follow Phil's advice on how to make a meaningful post using dataex), then for the anova model you have one between-subject error term, SubjectID|TreatD, and one within-subject error term, which is Period. This leads to the following anova command:

      Code:
      anova LiqPerc1_2_B TreatD / SubjectID | TreatD Period TreatD#Period, repeated(Period)
      I cannot confirm whether the results will be identical to those from xtreg, as the two commands use different estimation methods and estimate standard errors differently. But that is something to pursue following Phil's advice.

      Below is one example, using an example dataset, in which the main-effect coefficient from -xtreg- was reproducible using -margins- after -anova-:

      Code:
      //Get the dataset:
      
      use http://www.stata-press.com/data/r14/t77, clear
      
      *******Random effect**************
      
      xtset subject
      
      xtreg score calib##shape, re cluster(subject)
      
      Random-effects GLS regression                   Number of obs     =         24
      Group variable: subject                         Number of groups  =          3
      
      R-sq:                                           Obs per group:
           within  = 0.0000                                         min =          8
           between = 0.0000                                         avg =        8.0
           overall = 0.7680                                         max =          8
      
                                                      Wald chi2(2)      =          .
      corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =          .
      
                                      (Std. Err. adjusted for 3 clusters in subject)
      ------------------------------------------------------------------------------
                   |               Robust
             score |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
           2.calib |          3   .6922187     4.33   0.000     1.643276    4.356724
                   |
             shape |
                2  |         -1   .6922187    -1.44   0.149    -2.356724    .3567236
                3  |          3   1.198958     2.50   0.012     .6500857    5.349914
                4  |   .6666667   1.742045     0.38   0.702     -2.74768    4.081013
                   |
       calib#shape |
              2 2  |  -.6666667   1.057381    -0.63   0.528    -2.739096    1.405763
              2 3  |  -1.333333   .3996526    -3.34   0.001    -2.116638   -.5500286
              2 4  |   1.666667   1.440968     1.16   0.247    -1.157579    4.490912
                   |
             _cons |   2.333333   1.440968     1.62   0.105    -.4909121    5.157579
      -------------+----------------------------------------------------------------
           sigma_u |  .86945522
           sigma_e |  1.1153688
               rho |  .37797619   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      
      
      *******Repeated measure Anova**********
      
      anova score calib / subject|calib shape calib#shape, repeated(shape)
      
      
                               Number of obs =         24    R-squared     =  0.8925
                               Root MSE      =    1.11181    Adj R-squared =  0.7939
      
                        Source | Partial SS         df         MS        F    Prob>F
                 --------------+----------------------------------------------------
                         Model |    123.125         11   11.193182      9.06  0.0003
                               |
                         calib |  51.041667          1   51.041667     11.89  0.0261
                 subject|calib |  17.166667          4   4.2916667  
                 --------------+----------------------------------------------------
                         shape |  47.458333          3   15.819444     12.80  0.0005
                   calib#shape |  7.4583333          3   2.4861111      2.01  0.1662
                               |
                      Residual |  14.833333         12   1.2361111  
                 --------------+----------------------------------------------------
                         Total |  137.95833         23   5.9981884  
      
      
      ********Use margins to estimate the main effect of calib when shape=1***********
      
      
      margins, dydx(calib) at(shape=1)
      
      Average marginal effects                        Number of obs     =         24
      
      Expression   : Linear prediction, predict()
      dy/dx w.r.t. : 2.calib
      at           : shape           =           1
      
      ------------------------------------------------------------------------------
                   |            Delta-method
                   |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
           2.calib |          3   1.111805     2.70   0.019     .5775843    5.422416
      ------------------------------------------------------------------------------
      Note: dy/dx for factor levels is the discrete change from the base level.


      Roman



      • #4
        Originally posted by Phil Bromiley
        You didn't get a quick response. You'll increase your chances of a helpful answer by following the FAQ on asking questions.

        There is an anova that is identical to the xtreg. I've done it with fixed effects, but it should also work with random effects. So, I'd work on it until you get the same results.
        Thanks for your response. I've read through the FAQ and included the dataex output at the very end of this post. I believe that was the gist of what you were getting at, but please let me know if I've missed anything else important.


        Originally posted by Roman Mostazir
        Besides Phil's advice, there are a couple of issues in your post that I would like to point out:

        There is probably a misunderstanding about the nesting structure of your data. You said, "Each subject participated in exactly 1 session, which are indexed by SessionID, meaning subjects are nested within sessions." This only means that sessions and subjects do not vary within one another, and in that case there is no nesting structure. Rather, Period is nested within subject, as it varies within subjects. I would also cluster on SubjectID in the xtreg command. If my understanding of your data is correct (again, follow Phil's advice on how to make a meaningful post using dataex), then for the anova model you have one between-subject error term, SubjectID|TreatD, and one within-subject error term, which is Period. This leads to the following anova command:
        Code:
        anova LiqPerc1_2_B TreatD / SubjectID | TreatD Period TreatD#Period, repeated(Period)
        Originally posted by Roman Mostazir
        I cannot confirm whether the results will be identical to those from xtreg, as the two commands use different estimation methods and estimate standard errors differently. But that is something to pursue following Phil's advice.

        Thanks for your helpful response! I'm working on adapting this.

        Note: See the very bottom of this post for the full dataset (well, the relevant subsample for the minimum working example discussed here), produced with the dataex command.

        I've run the code you suggested, and here is the output I get:

        Code:
        . anova LiqPerc1_2_B TreatD / SubjectID|TreatD Period TreatD#Period, repeated(Period)
        
                                 Number of obs =        172    R-squared     =  0.6984
                                 Root MSE      =    21.1421    Adj R-squared =  0.4993
        
                          Source | Partial SS         df         MS        F    Prob>F
                -----------------+----------------------------------------------------
                           Model |  106603.84         68   1567.7035      3.51  0.0000
                                 |
                          TreatD |  991.47015          1   991.47015      0.49  0.4894
                SubjectID|TreatD |  96003.485         47   2042.6273  
                -----------------+----------------------------------------------------
                          Period |  8705.7182         11   791.42892      1.77  0.0687
                   TreatD#Period |  4871.4462          9    541.2718      1.21  0.2964
                                 |
                        Residual |  46039.706        103   446.98744  
                -----------------+----------------------------------------------------
                           Total |  152643.55        171   892.65232  
        
        
        Between-subjects error term:  SubjectID|TreatD
                             Levels:  51        (47 df)
             Lowest b.s.e. variable:  SubjectID
             Covariance pooled over:  TreatD    (for repeated variable)
        
        Repeated variable: Period
                                                  Huynh-Feldt epsilon        =    .
                                                  Greenhouse-Geisser epsilon =    .
                                                  Box's conservative epsilon =  0.0909
        
                                                    ------------ Prob > F ------------
                          Source |     df      F    Regular    H-F      G-G      Box
                -----------------+----------------------------------------------------
                          Period |     11     1.77   0.0687     .        .      0.2148
                   TreatD#Period |      9     1.21   0.2964     .        .      0.2840
                        Residual |    103
                ----------------------------------------------------------------------
        
        . margins, dydx(TreatD)
        
        Average marginal effects                        Number of obs     =        172
        
        Expression   : Linear prediction, predict()
        dy/dx w.r.t. : 1.TreatD
        
        ------------------------------------------------------------------------------
                     |            Delta-method
                     |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
            1.TreatD |          .  (not estimable)
        ------------------------------------------------------------------------------
        Note: dy/dx for factor levels is the discrete change from the base level.
        I also tried several other variations, and they all return the same "not estimable" result. For example, I tried setting the option at(Period=2) in the margins command. I also tried using TreatD#c.Period in the anova command itself.
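
        One diagnostic worth trying here, offered as a sketch rather than a diagnosis: margins typically reports "not estimable" when the requested contrast depends on combinations of factor levels that the model cannot identify, for example empty cells. Cross-tabulating the factors over the estimation sample shows which combinations are actually observed.
        Code:
        * Run immediately after the anova command so that e(sample) refers to it
        tabulate Period TreatD if e(sample)
        tabulate SubjectID TreatD if e(sample)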

        (Also, for what it's worth, I'd prefer for now to stick to the xtreg analogy with errors clustered at the session level, rather than the subject level, since that's what the reviewer asked for. Once I better understand how to translate between xtreg and anova, I can start adjusting the empirical specification a bit more.)


        Here are the relevant variables from the full dataset for the minimum working example:

        Code:
        . dataex SessionID SubjectID Period TreatD LiqPerc1_2_B if Agent==1 & TreatAV==1, count(192)
        clear
        input float(SessionID SubjectID) byte Period float(TreatD LiqPerc1_2_B)
         4  61 10 0        15
         4  61 11 0         4
         4  61 12 0         .
         4  64 10 0         .
         4  64 11 0      37.5
         4  64 12 0         .
         4  66 10 0        50
         4  66 11 0       100
         4  66 12 0        50
         4  68 10 0        50
         4  68 11 0        50
         4  68 12 0        45
         4  71 10 0       100
         4  71 11 0         .
         4  71 12 0      37.5
         4  72 10 0        50
         4  72 11 0         .
         4  72 12 0        52
         5  82 10 1       100
         5  82 11 1       100
         5  82 12 1       100
         5  83 10 1        30
         5  83 11 1  29.23077
         5  83 12 1        50
         5  85 10 1      87.5
         5  85 11 1        75
         5  85 12 1       100
         5  88 10 1 36.363636
         5  88 11 1  42.85714
         5  88 12 1        50
         5  90 10 1 33.333332
         5  90 11 1         .
         5  90 12 1         0
         5  91 10 1        20
         5  91 11 1 16.666666
         5  91 12 1 14.285714
         5  92 10 1       100
         5  92 11 1        50
         5  92 12 1         .
         5  95 10 1  22.22222
         5  95 11 1 18.181818
         5  95 12 1  28.57143
         6 101 10 1         0
         6 101 11 1       100
         6 101 12 1  47.61905
         6 102 10 1      12.5
         6 102 11 1        82
         6 102 12 1         .
         6 104 10 1        70
         6 104 11 1  85.71429
         6 104 12 1  83.33334
         6 107 10 1        25
         6 107 11 1         0
         6 107 12 1        30
         6 109 10 1  91.66666
         6 109 11 1         .
         6 109 12 1  92.85714
         6 111 10 1         .
         6 111 11 1      12.5
         6 111 12 1        25
         8 141  4 0        40
         8 141  5 0  52.94118
         8 141  6 0  42.85714
         8 143  4 0        25
         8 143  5 0 33.333332
         8 143  6 0      37.5
         8 145  4 0  71.42857
         8 145  5 0       100
         8 145  6 0  55.55556
         8 148  4 0      97.5
         8 148  5 0  8.333333
         8 148  6 0  6.666667
         8 149  4 0        50
         8 149  5 0        25
         8 149  6 0        40
        10 181  1 0 33.333332
        10 181  2 0 34.285713
        10 181  3 0        20
        10 181  4 1        35
        10 181  5 1        30
        10 181  6 1        40
        10 181  7 0        25
        10 181  8 0  15.09434
        10 181  9 0        25
        10 182  1 0 66.666664
        10 182  2 0        50
        10 182  3 0        50
        10 182  4 1        50
        10 182  5 1  57.14286
        10 182  6 1        50
        10 182  7 0        50
        10 182  8 0        60
        10 182  9 0        50
        10 184  1 0        30
        10 184  2 0        20
        10 184  3 0        20
        10 184  4 1         .
        10 184  5 1        40
        10 184  6 1  41.66667
        10 184  7 0 33.333332
        10 184  8 0  21.42857
        10 184  9 0         .
        10 187  1 0        50
        10 187  2 0        75
        10 187  3 0 66.666664
        10 187  4 1        50
        10 187  5 1        80
        10 187  6 1 66.666664
        10 187  7 0        60
        10 187  8 0        75
        10 187  9 0  83.33334
        10 188  1 0 66.666664
        10 188  2 0        50
        10 188  3 0        60
        10 188  4 1        55
        10 188  5 1        40
        10 188  6 1         .
        10 188  7 0  41.66667
        10 188  8 0     81.25
        10 188  9 0         0
        10 191  1 0         .
        10 191  2 0        40
        10 191  3 0       100
        10 191  4 1       100
        10 191  5 1       100
        10 191  6 1         .
        10 191  7 0         .
        10 191  8 0         .
        10 191  9 0         .
        11 202  1 1        50
        11 202  2 1        50
        11 202  3 1         .
        11 202  4 0        60
        11 202  5 0      62.5
        11 202  6 0         .
        11 202  7 1        60
        11 202  8 1        25
        11 202  9 1        50
        11 205  1 1        50
        11 205  2 1        50
        11 205  3 1  29.87013
        11 205  4 0        50
        11 205  5 0 66.666664
        11 205  6 0        50
        11 205  7 1        30
        11 205  8 1        50
        11 205  9 1        50
        11 206  1 1        25
        11 206  2 1         0
        11 206  3 1 11.764706
        11 206  4 0         0
        11 206  5 0         0
        11 206  6 0        20
        11 206  7 1         0
        11 206  8 1 33.333332
        11 206  9 1      37.5
        11 207  1 1         0
        11 207  2 1        30
        11 207  3 1         0
        11 207  4 0        25
        11 207  5 0         0
        11 207  6 0        50
        11 207  7 1         0
        11 207  8 1        10
        11 207  9 1        25
        11 209  1 1       100
        11 209  2 1       100
        11 209  3 1       100
        11 209  4 0       100
        11 209  5 0         0
        11 209  6 0         0
        11 209  7 1       100
        11 209  8 1       100
        11 209  9 1       100
        11 210  1 1        50
        11 210  2 1         0
        11 210  3 1        50
        11 210  4 0       100
        11 210  5 0         0
        11 210  6 0         0
        11 210  7 1 66.666664
        11 210  8 1       100
        11 210  9 1       100
        11 212  1 1        50
        11 212  2 1        50
        11 212  3 1        30
        11 212  4 0        50
        11 212  5 0 33.333332
        11 212  6 0  42.85714
        11 212  7 1        52
        11 212  8 1        50
        11 212  9 1        50
        end
        Note: The missing values for the dependent variable (right-most column) are because that variable is generated as a percentage of two observed variables, the denominator of which can be zero.



        • #5
          Originally posted by Matt McMahon
          Once I better understand how to translate between xtreg and anova
          As stated in the sister thread, despite
          Code:
          xtset SubjectID Period
          Period is nowhere in the
          Code:
          xtreg LiqPerc1_2_B TreatD if Agent==1 & TreatAV==1, re cluster(SessionID)
          model.

          Thus, to translate between xtreg and anova, Period and interaction terms involving it will be absent:
          Code:
          anova LiqPerc1_2_B TreatD SubjectID
          Moreover, participant isn't nested within treatment group: look at participants and their treatment assignments in the last two sessions.
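
          A minimal sketch of that check, assuming the data in memory are the dataex sample from #4. TreatVaries and FirstObs are hypothetical helper variables introduced only for illustration.
          Code:
          * Flag subjects whose treatment changes across periods
          bysort SubjectID (TreatD): generate byte TreatVaries = TreatD[1] != TreatD[_N]
          
          * Keep one row per subject and count subjects by session
          egen byte FirstObs = tag(SubjectID)
          tabulate SessionID TreatVaries if FirstObs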
