Difference-in-differences random vs. fixed effects

Lisa Ni Dhu

Join Date: Jun 2024
Posts: 6

28 Jun 2024, 05:22

Hello,

I am using a difference-in-differences approach to examine the effect of a treatment at the school level (the entire school was exposed); there are 5 treated schools and 5 controls. The question is: does the treatment result in a higher number of active travellers at the treated school? The outcome variable is binary (active traveller: yes/no). Schoolyr is a surrogate for age. I have data at two time points (before and 6 months later).

I have been advised to use linear regression with random effects (for school); however, my reading suggests that fixed effects is the default and that the Hausman test should be used to help decide. I have since run the Hausman test, which also suggests that fixed effects should be used. I would be really grateful for any advice on how to decide whether fixed or random effects should be used in my case. Thank you in advance for any help or signposting.

Code:

xtset schoolname
xtreg activetravel group##wave i.schoolyr i.gender, re

estimates store random

xtreg activetravel group##wave i.schoolyr i.gender, fe

estimates store fixed

hausman fixed random, sigmamore

Code:

. xtset schoolname 

Panel variable: schoolname (unbalanced)

. xtreg activetravel group##wave i.schoolyr i.gender, re 

Random-effects GLS regression                   Number of obs     =      1,235
Group variable: schoolname                      Number of groups  =          10

R-squared:                                      Obs per group:
     Within  = 0.0015                                         min =        101
     Between = 0.3958                                         avg =      176.4
     Overall = 0.0106                                         max =        270

                                                Wald chi2(7)      =      13.13
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0690

------------------------------------------------------------------------------------------------
                  activetravel | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------------------------+----------------------------------------------------------------
                         group |
                     SchoolTx  |    .121159   .0405407     2.99   0.003     .0417007    .2006173
                               |
                          wave |
                        After  |   -.010604   .0461146    -0.23   0.818     -.100987    .0797791
                               |
                    group#wave |
          SchoolTx#After       |  -.0530355    .057781    -0.92   0.359    -.1662842    .0602132
                               |
                      schoolyr |
                       Year 5  |   .0071617    .040096     0.18   0.858     -.071425    .0857484
                       Year 6  |   .0516574   .0566333     0.91   0.362    -.0593418    .1626566
                               |
                        gender |
                   Male (boy)  |  -.0000905   .0285401    -0.00   0.997    -.0560281    .0558472
Other / do not want to answer  |   .0266824   .1161086     0.23   0.818    -.2008863    .2542512
                               |
                         _cons |    .497226   .0367652    13.52   0.000     .4251675    .5692846
-------------------------------+----------------------------------------------------------------
                       sigma_u |          0
                       sigma_e |  .49140585
                           rho |          0   (fraction of variance due to u_i)
------------------------------------------------------------------------------------------------

. 
. estimates store random

. 
. xtreg activetravel group##wave i.schoolyr i.gender, fe 
note: 1.group omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =      1,235
Group variable: schoolname                      Number of groups  =          10

R-squared:                                      Obs per group:
     Within  = 0.0016                                         min =        101
     Between = 0.3691                                         avg =      176.4
     Overall = 0.0001                                         max =        270

                                                F(6,1222)         =       0.32
corr(u_i, Xb) = -0.2873                         Prob > F          =     0.9246

------------------------------------------------------------------------------------------------
                  activetravel | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------------------------+----------------------------------------------------------------
                         group |
                     SchoolTx  |          0  (omitted)
                               |
                          wave |
                        After  |  -.0127956   .0456243    -0.28   0.779    -.1023062    .0767149
                               |
                    group#wave |
          SchoolTx#After       |  -.0477443   .0572251    -0.83   0.404    -.1600146     .064526
                               |
                      schoolyr |
                       Year 5  |   .0034861   .0396779     0.09   0.930    -.0743583    .0813305
                       Year 6  |   .0413395   .0560763     0.74   0.461    -.0686769    .1513559
                               |
                        gender |
                   Male (boy)  |   .0018304   .0283053     0.06   0.948    -.0537019    .0573628
Other / do not want to answer  |   .0491031    .115235     0.43   0.670    -.1769773    .2751834
                               |
                         _cons |   .5494247   .0316862    17.34   0.000     .4872594      .61159
-------------------------------+----------------------------------------------------------------
                       sigma_u |  .11956556
                       sigma_e |  .49140585
                           rho |  .05589244   (fraction of variance due to u_i)
------------------------------------------------------------------------------------------------
F test that all u_i=0: F(6, 1222) = 6.80                     Prob > F = 0.0000

. 
. estimates store fixed

. 
. hausman fixed random, sigmamore

Note: the rank of the differenced variance matrix (5) does not equal the number of coefficients being tested (6); be
        sure this is what you expect, or there may be problems computing the test.  Examine the output of your
        estimators for anything unexpected and possibly consider scaling your variables so that the coefficients are on
        a similar scale.

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |     fixed        random       Difference       Std. err.
-------------+----------------------------------------------------------------
      1.wave |   -.0127956     -.010604       -.0021917        .0005302
  group#wave |
        1 1  |   -.0477443    -.0530355        .0052912        .0026981
    schoolyr |
          2  |    .0034861     .0071617       -.0036756        .0009404
          3  |    .0413395     .0516574       -.0103179        .0023659
      gender |
          2  |    .0018304    -.0000905        .0019209        .0020178
          3  |    .0491031     .0266824        .0224206         .009309
------------------------------------------------------------------------------
                          b = Consistent under H0 and Ha; obtained from xtreg.
           B = Inconsistent under Ha, efficient under H0; obtained from xtreg.

Test of H0: Difference in coefficients not systematic

    chi2(5) = (b-B)'[(V_b-V_B)^(-1)](b-B)
            =  31.01
Prob > chi2 = 0.0000
(V_b-V_B is not positive definite)

. 
end of do-file

Last edited by Lisa Ni Dhu; 28 Jun 2024, 05:39.


Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#2

28 Jun 2024, 06:59

Greetings, Lisa.

You only use random effects (effectively) when you're sure that there's no unobserved confounding at the school level that may explain the variation in the outcome. You'll get (if I recall correctly) better standard errors.

But this is a pipe dream, usually. In real life, especially across 10 schools, there's pretty much always something that we're not controlling for that makes these schools' outcomes different.

"Does the treatment result in a higher number of active travellers at the treated school? The outcome variable is binary - active traveller: yes/no."

These seem like completely different constructs. The former is the raw number of active travelers. The latter is "are there any active travelers at all". You don't say, but I guess you're doing this at the student level. But why? Aren't we interested in the ATT for the schools? If everyone's exposed, why not just use the raw number of active travelers at each school as the outcome?

I may sound super ignorant since I'm not an education researcher, but I really am curious here on your reasoning.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2171
#3

28 Jun 2024, 12:16

Lisa, a few things. First, I suspect this isn't a true panel data set in the sense that you have different students in each period. The data are at the student level. As Jared said, you shouldn't be considering school random effects. The only thing that makes sense is to include dummy variables for the schools and the two time periods, and then the controls.
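As a sketch, the specification described here (school dummies, a period dummy, the treatment interaction, and the controls) could be written as follows. The symbols are notation introduced for illustration and do not appear in the original posts.

```latex
y_{ist} = \alpha_s + \lambda\,\mathrm{After}_t
        + \delta\,(\mathrm{Treated}_s \times \mathrm{After}_t)
        + \mathbf{x}_{ist}'\boldsymbol{\beta} + u_{ist}
```

Here $y_{ist}$ is activetravel for student $i$ in school $s$ at wave $t$, $\alpha_s$ stands for the school dummies, $\mathbf{x}_{ist}$ holds the schoolyr and gender indicators, and $\delta$ is the difference-in-differences effect. The main effect of $\mathrm{Treated}_s$ is absorbed by $\alpha_s$, which is consistent with the note in the FE output that 1.group was omitted because of collinearity.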

Having said that, you'll notice that the RE and FE estimates of the treatment effect are similar. This doesn't surprise me. In the balanced panel data case, I showed they're identical in my 2021 working paper on two-way fixed effects and the two-way Mundlak regression.

What you're doing with the FE is fine. Clustering at the school level with five treated units, five controls is a bit dicey -- and you haven't done that. If you use vce(robust) with reg and i.schoolname (turned into an integer) then it should all work. No point in computing the Hausman test, which I think is wrong, anyway.
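A minimal sketch of what this suggestion might look like in Stata, using the variable names from the first post. It assumes schoolname is already numeric (xtset accepted it above, so it presumably is); the commented encode line and the name school_id are hypothetical and only needed if it is a string.

```stata
* If schoolname were a string, it could be converted first (school_id is a hypothetical name):
* encode schoolname, generate(school_id)

* OLS with school dummies and heteroskedasticity-robust standard errors.
* One group-related term is expected to be dropped, since group is constant within school.
regress activetravel i.group##i.wave i.schoolyr i.gender i.schoolname, vce(robust)
```

The difference-in-differences estimate is the coefficient on the group#wave interaction (SchoolTx#After in the output above).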

BTW, the RE estimation is reducing to just OLS (which you can do with reg): the estimated variance of u is zero.
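To check this, one could run pooled OLS with the same regressors and compare it with the RE output above. This is a sketch using the variable names from the first post, not output from the actual data.

```stata
* Pooled OLS with the same right-hand side as the RE model.
* With sigma_u estimated as zero, the coefficients should match the xtreg, re results;
* standard errors may differ slightly because regress uses t rather than z statistics.
regress activetravel i.group##i.wave i.schoolyr i.gender
```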
Lisa Ni Dhu

Join Date: Jun 2024

Posts: 6
#4

10 Jul 2024, 07:41

Thank you both for your thoughtful responses.

Jeff, you are correct - I have since realised that I was using xtset incorrectly (I thought schoolname was the panel variable, but as the data are at the individual level this doesn't make sense!).

Am I correct in understanding your advice as:

Code:

regress activetravel group##wave i.schoolyr i.gender i.schoolname fsm, vce(robust) //fsm is a cluster level variable

My supervisor has advised me to compare Intervention School A vs. 5 controls, Intervention School B vs. 5 controls, Intervention School C vs. 5 controls (etc.) and then to run a meta-analysis. The reason for pooling the controls is that we do not have two pre-intervention time points to confirm parallel trends.

I would be really grateful if you could help with two follow-up questions:
1. Can the i.schoolname (cluster variable) be removed using this new approach (1 vs. 5)?
2. Standard errors - due to the low number of schools, vce(cluster schoolname) seems to be inappropriate (?), and vce(robust) is not appropriate (?) as the observations are not independent due to the study design / clusters, i.e. the whole school was exposed. What is the best option in this case?

Thank you both again for your time.