  • Difference-in-differences random vs. fixed effects

    Hello,

    I am using a difference-in-differences approach to examine the effect of a treatment at the school level (the entire school was exposed); there are 5 treated schools and 5 controls. The question is: does the treatment result in a higher number of active travellers at the treated schools? The outcome variable is binary (active traveller: yes/no). Schoolyr is a surrogate for age. I have data at two time points (before and 6 months later).

    I have been advised to use linear regression with random effects (for school); however, my reading suggests that fixed effects is the default and that the Hausman test should be used to help decide. I have since run the Hausman test, which also suggests that fixed effects should be used. I would be really grateful for any advice on how to decide whether fixed or random effects should be used in my case. Thank you in advance for any help or signposting.

    Code:
    xtset schoolname
    xtreg activetravel group##wave i.schoolyr i.gender, re
    
    estimates store random
    
    xtreg activetravel group##wave i.schoolyr i.gender, fe
    
    estimates store fixed
    
    hausman fixed random, sigmamore

    Code:
    . xtset schoolname 
    
    Panel variable: schoolname (unbalanced)
    
    . xtreg activetravel group##wave i.schoolyr i.gender, re 
    
    Random-effects GLS regression                   Number of obs     =      1,235
    Group variable: schoolname                      Number of groups  =          10
    
    R-squared:                                      Obs per group:
         Within  = 0.0015                                         min =        101
         Between = 0.3958                                         avg =      176.4
         Overall = 0.0106                                         max =        270
    
                                                    Wald chi2(7)      =      13.13
    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0690
    
    ------------------------------------------------------------------------------------------------
                      activetravel | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------------------------+----------------------------------------------------------------
                             group |
                         SchoolTx  |    .121159   .0405407     2.99   0.003     .0417007    .2006173
                                   |
                              wave |
                            After  |   -.010604   .0461146    -0.23   0.818     -.100987    .0797791
                                   |
                        group#wave |
                   SchoolTx#After  |  -.0530355    .057781    -0.92   0.359    -.1662842    .0602132
                                   |
                          schoolyr |
                           Year 5  |   .0071617    .040096     0.18   0.858     -.071425    .0857484
                           Year 6  |   .0516574   .0566333     0.91   0.362    -.0593418    .1626566
                                   |
                            gender |
                       Male (boy)  |  -.0000905   .0285401    -0.00   0.997    -.0560281    .0558472
    Other / do not want to answer  |   .0266824   .1161086     0.23   0.818    -.2008863    .2542512
                                   |
                             _cons |    .497226   .0367652    13.52   0.000     .4251675    .5692846
    -------------------------------+----------------------------------------------------------------
                           sigma_u |          0
                           sigma_e |  .49140585
                               rho |          0   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------------------------
    
    . 
    . estimates store random
    
    . 
    . xtreg activetravel group##wave i.schoolyr i.gender, fe 
    note: 1.group omitted because of collinearity.
    
    Fixed-effects (within) regression               Number of obs     =      1,235
    Group variable: schoolname                      Number of groups  =          10
    
    R-squared:                                      Obs per group:
         Within  = 0.0016                                         min =        101
         Between = 0.3691                                         avg =      176.4
         Overall = 0.0001                                         max =        270
    
                                                    F(6,1222)         =       0.32
    corr(u_i, Xb) = -0.2873                         Prob > F          =     0.9246
    
    ------------------------------------------------------------------------------------------------
                      activetravel | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------------------------+----------------------------------------------------------------
                             group |
                         SchoolTx  |          0  (omitted)
                                   |
                              wave |
                            After  |  -.0127956   .0456243    -0.28   0.779    -.1023062    .0767149
                                   |
                        group#wave |
                   SchoolTx#After  |  -.0477443   .0572251    -0.83   0.404    -.1600146     .064526
                                   |
                          schoolyr |
                           Year 5  |   .0034861   .0396779     0.09   0.930    -.0743583    .0813305
                           Year 6  |   .0413395   .0560763     0.74   0.461    -.0686769    .1513559
                                   |
                            gender |
                       Male (boy)  |   .0018304   .0283053     0.06   0.948    -.0537019    .0573628
    Other / do not want to answer  |   .0491031    .115235     0.43   0.670    -.1769773    .2751834
                                   |
                             _cons |   .5494247   .0316862    17.34   0.000     .4872594      .61159
    -------------------------------+----------------------------------------------------------------
                           sigma_u |  .11956556
                           sigma_e |  .49140585
                               rho |  .05589244   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------------------------
    F test that all u_i=0: F(6, 1222) = 6.80                     Prob > F = 0.0000
    
    . 
    . estimates store fixed
    
    . 
    . hausman fixed random, sigmamore
    
    Note: the rank of the differenced variance matrix (5) does not equal the number of coefficients being tested (6); be
            sure this is what you expect, or there may be problems computing the test.  Examine the output of your
            estimators for anything unexpected and possibly consider scaling your variables so that the coefficients are on
            a similar scale.
    
                     ---- Coefficients ----
                 |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
                 |     fixed        random       Difference       Std. err.
    -------------+----------------------------------------------------------------
          1.wave |   -.0127956     -.010604       -.0021917        .0005302
      group#wave |
            1 1  |   -.0477443    -.0530355        .0052912        .0026981
        schoolyr |
              2  |    .0034861     .0071617       -.0036756        .0009404
              3  |    .0413395     .0516574       -.0103179        .0023659
          gender |
              2  |    .0018304    -.0000905        .0019209        .0020178
              3  |    .0491031     .0266824        .0224206         .009309
    ------------------------------------------------------------------------------
                              b = Consistent under H0 and Ha; obtained from xtreg.
               B = Inconsistent under Ha, efficient under H0; obtained from xtreg.
    
    Test of H0: Difference in coefficients not systematic
    
        chi2(5) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                =  31.01
    Prob > chi2 = 0.0000
    (V_b-V_B is not positive definite)
    
    . 
    end of do-file
    Last edited by Lisa Ni Dhu; 28 Jun 2024, 05:39.

  • #2
    Greetings, Lisa.

    You only use random effects (effectively) when you're sure that there's no unobserved confounding at the school level that may explain the variation in the outcome, i.e. that the school effects are uncorrelated with the regressors. In that case you'll get (if I recall correctly) more efficient estimates, meaning smaller standard errors.

    But this is a pipe dream, usually. In real life, especially across 10 schools, there's pretty much always something we're not controlling for that makes these schools' outcomes different.

    "Does the treatment result in a higher number of active travellers at the treated school. The outcome variable is binary - active traveller: yes/no."
    These seem like completely different constructs. The former is the raw number of active travelers. The latter is "are there any active travelers at all". You don't say, but I guess you're doing this at the student level. But why? Aren't we interested in the ATT for the schools? If everyone's exposed, why not just use the raw number of active travelers at each school as the outcome?
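
    As a rough sketch of that school-level alternative (an illustration only, assuming the student-level data from post #1 are in memory; n_active, n_students and share_active are hypothetical names), the data could be collapsed to one row per school and wave:

    ```stata
    * Assumes the student-level data from post #1 are in memory,
    * with activetravel coded 0/1.
    preserve

    * One row per school and wave: count of active travellers and of respondents
    collapse (sum) n_active = activetravel (count) n_students = activetravel, ///
        by(schoolname group wave)

    * Share of respondents who travel actively (hypothetical variable name)
    generate share_active = n_active / n_students

    list schoolname group wave n_active n_students share_active, sepby(schoolname)

    restore
    ```

    Because the number of observations per school ranges from 101 to 270 in the output above, the share may be easier to compare across schools than the raw count.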

    I may sound super ignorant since I'm not an education researcher, but I really am curious here on your reasoning.

    • #3
      Lisa, a few things. First, I suspect this isn't a true panel data set in the sense that you have different students in each period. The data are at the student level. As Jared said, you shouldn't be considering school random effects. The only thing that makes sense is to include dummy variables for the schools and the two time periods, and then the controls.

      Having said that, you'll notice that the RE and FE estimates of the treatment effect are similar. This doesn't surprise me. In the balanced panel data case, I showed they're identical in my 2021 working paper on two-way fixed effects and the two-way Mundlak regression.

      What you're doing with the FE is fine. Clustering at the school level with five treated units, five controls is a bit dicey -- and you haven't done that. If you use vce(robust) with reg and i.schoolname (turned into an integer) then it should all work. No point in computing the Hausman test, which I think is wrong, anyway.

      BTW, the RE estimation is reducing to just OLS (which you can do with reg): the estimated variance of u is zero.
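
      A minimal sketch of the two regressions described above, assuming the student-level data from post #1 are in memory and that group and wave are coded 0/1 (as the Hausman output suggests). The name school_id is hypothetical, and the factor-variable specification is one way to write it, not necessarily the only one:

      ```stata
      * Integer school identifier (hypothetical name); works whether
      * schoolname is stored as a string or as a labelled numeric variable.
      egen school_id = group(schoolname)

      * Pooled OLS with no school effects. Because the RE output reports
      * sigma_u = 0, these coefficients should match the xtreg, re estimates.
      regress activetravel i.group##i.wave i.schoolyr i.gender

      * School dummies with heteroskedasticity-robust standard errors.
      * The main effect of group is left out because it is absorbed by the
      * school dummies; 1.group#1.wave is the difference-in-differences term.
      regress activetravel i.wave 1.group#1.wave i.schoolyr i.gender i.school_id, vce(robust)
      ```

      The point estimates from the second regression should agree with the xtreg, fe output in post #1; only the standard errors change with vce(robust).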

      • #4
        Thank you both for your thoughtful responses.

        Jeff, you are correct - I have since realised that I was using xtset incorrectly (I thought schoolname was the panel variable, but as the data are at the individual level this doesn't make sense!).

        Am I correct in understanding your advice as:

        Code:
        regress activetravel group##wave i.schoolyr i.gender i.schoolname fsm, vce(robust)  //fsm is a cluster level variable

        My supervisor has advised me to compare Intervention School A vs. 5 controls, Intervention School B vs. 5 controls, Intervention School C vs. 5 controls (etc.) and then to run a meta-analysis. The reason for pooling the controls is that we do not have two pre-intervention time points to confirm parallel trends.

        I would be really grateful if you could help with two follow-up questions:
        1. Can i.schoolname (the cluster variable) be removed using this new approach (1 vs. 5)?
        2. Standard errors: with so few schools, vce(cluster schoolname) seems to be inappropriate (?), and vce(robust) also seems inappropriate (?) because the observations are not independent given the study design, i.e. the whole school was exposed. What is the best option in this case?
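
        For reference, the one-treated-school-versus-pooled-controls comparison described above could be sketched roughly as follows. This is an illustration under assumptions, not a recommendation: it assumes schoolname is numeric (xtset accepted it earlier), that group and wave are coded 0/1, and it uses vce(robust) only as a placeholder since the choice of standard errors is the open question. fsm is left out on the assumption that it is constant within school, in which case the school dummies would absorb it.

        ```stata
        * Numeric codes of the treated schools (group == 1)
        levelsof schoolname if group == 1, local(treated)

        foreach s of local treated {
            * Keep one treated school plus all control schools
            regress activetravel i.wave 1.group#1.wave i.schoolyr i.gender i.schoolname ///
                if group == 0 | schoolname == `s', vce(robust)

            * Difference-in-differences estimate for this treated school
            display "School `s': DiD = " %9.4f _b[1.group#1.wave] ///
                "  SE = " %9.4f _se[1.group#1.wave]
        }
        ```

        One caveat with this design: the five estimates reuse the same control schools, so they are not independent of each other, which a standard meta-analysis would normally assume.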

        Thank you both again for your time.
