  • CSDID and DID regressions produce different standard errors

    Hello. I am estimating a staggered DID model (Callaway and Sant'Anna 2021) with the csdid package, and I am comparing its results with several two-period DID regressions.

    I don't understand why DID regressions and the csdid command produce different standard errors.

    Here are the results of the csdid command:

    Code:
    . encode comune, gen(idcomune)
    
    . csdid vote_share, ivar(idcomune) time(period) gvar(first_treated) reg
    Units always treated found. These will be ignored
    Panel is not balanced
    Will use observations with Pair balanced (observed at t0 and t1)
    ................
    Difference-in-difference with Multiple Time Periods
    
                                                            Number of obs = 38,042
    Outcome model  : regression adjustment
    Treatment model: none
    ------------------------------------------------------------------------------
                 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    g2           |
           t_1_2 |   .0161844   .0065602     2.47   0.014     .0033267    .0290421
           t_1_3 |    .079245    .022349     3.55   0.000     .0354418    .1230483
           t_1_4 |   .0790281   .0241041     3.28   0.001     .0317849    .1262712
           t_1_5 |   .0617656   .0208679     2.96   0.003     .0208653    .1026658
    -------------+----------------------------------------------------------------
    g3           |
           t_1_2 |   .0210849    .004212     5.01   0.000     .0128295    .0293402
           t_2_3 |    .040359   .0082862     4.87   0.000     .0241184    .0565996
           t_2_4 |   .0162996   .0067423     2.42   0.016      .003085    .0295142
           t_2_5 |   .0303758   .0085279     3.56   0.000     .0136614    .0470902
    -------------+----------------------------------------------------------------
    g4           |
           t_1_2 |  -.0439547   .0005086   -86.42   0.000    -.0449515   -.0429578
           t_2_3 |   .0564394   .0009929    56.85   0.000     .0544934    .0583853
           t_3_4 |  -.0692106   .0005165  -134.00   0.000    -.0702229   -.0681983
           t_3_5 |  -.0485215   .0006746   -71.92   0.000    -.0498437   -.0471992
    -------------+----------------------------------------------------------------
    g5           |
           t_1_2 |   -.014802   .0034881    -4.24   0.000    -.0216386   -.0079655
           t_2_3 |  -.0067809   .0046161    -1.47   0.142    -.0158283    .0022665
           t_3_4 |    .000345   .0074088     0.05   0.963     -.014176     .014866
           t_4_5 |   .0039694   .0074588     0.53   0.595    -.0106496    .0185884
    ------------------------------------------------------------------------------
    Control: Never Treated
    
    See Callaway and Sant'Anna (2021) for details
    I now try to reproduce the "g4 t_2_3" estimate using an ordinary DID regression.

    I load a different dataset, extracted from the one used with the csdid command. It contains only observations from periods 2 and 3, with the municipalities first treated in period 4 as the treated group.

    I first drop the municipalities that are not observed in both periods, as the csdid command would do:

    Code:
    . egen var1 = count(vote_share), by(comune)
    
    . keep if var1==2
    (614 observations deleted)
    Then I create a dummy for each of the two periods:

    Code:
    . tab period, gen(dummyP)
    
         period |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              2 |      7,462       50.00       50.00
              3 |      7,462       50.00      100.00
    ------------+-----------------------------------
          Total |     14,924      100.00
    And then I run the DID regression:

    Code:
    . reg vote_share ever_treated##dummyP2
    
          Source |       SS           df       MS      Number of obs   =    14,924
    -------------+----------------------------------   F(3, 14920)     =   1727.22
           Model |  40.7331805         3  13.5777268   Prob > F        =    0.0000
        Residual |  117.286738    14,920  .007861041   R-squared       =    0.2578
    -------------+----------------------------------   Adj R-squared   =    0.2576
           Total |  158.019919    14,923  .010589018   Root MSE        =    .08866
    
    --------------------------------------------------------------------------------------
              vote_share | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    ---------------------+----------------------------------------------------------------
          1.ever_treated |    .039173   .0886685     0.44   0.659    -.1346281    .2129741
               1.dummyP2 |   .1044656   .0014516    71.96   0.000     .1016202    .1073109
                         |
    ever_treated#dummyP2 |
                    1 1  |   .0564394   .1253961     0.45   0.653    -.1893525    .3022312
                         |
                   _cons |   .1794675   .0010265   174.84   0.000     .1774555    .1814795
    --------------------------------------------------------------------------------------
    The resulting coefficient is the same: .0564394
    But the standard errors are different: .0009929 from csdid versus .1253961 from the DID regression.

    This happens every time. The csdid SEs always differ from the regression SEs, sometimes smaller (as in this case) and sometimes larger.

    For example, for g2 t_1_4, csdid gives a larger SE: .0241041
    The DID regression's SE is smaller: .0212055

    Code:
    . reg vote_share ever_treated##dummyP2
    
          Source |       SS           df       MS      Number of obs   =    14,642
    -------------+----------------------------------   F(3, 14638)     =   1249.40
           Model |  22.6698886         3  7.55662952   Prob > F        =    0.0000
        Residual |  88.5335856    14,638  .006048202   R-squared       =    0.2039
    -------------+----------------------------------   Adj R-squared   =    0.2037
           Total |  111.203474    14,641  .007595347   Root MSE        =    .07777
    
    --------------------------------------------------------------------------------------
              vote_share | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    ---------------------+----------------------------------------------------------------
          1.ever_treated |   .0280523   .0149946     1.87   0.061    -.0013389    .0574435
               1.dummyP2 |  -.0784136   .0012878   -60.89   0.000    -.0809378   -.0758893
                         |
    ever_treated#dummyP2 |
                    1 1  |   .0790281   .0212055     3.73   0.000     .0374626    .1205935
                         |
                   _cons |   .2259089   .0009106   248.09   0.000      .224124    .2276938
    --------------------------------------------------------------------------------------
    Why does this happen? Thank you.

  • #2
    Two reasons:
    1. csdid clusters standard errors at the individual (panel unit) level when using panel data.
    2. csdid does not apply any degrees-of-freedom adjustment.
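
    Both points can be checked on the two-period sample from the first post. The lines below are only a minimal sketch under stated assumptions: the variable names (vote_share, ever_treated, dummyP2, comune) come from the original post, comune is assumed to be a string variable, and the names id, nobs, ncoef, nclust, se_adj and se_noadj are introduced here purely for illustration. The regression clusters by municipality, and the rescaling removes the finite-sample factor (N-1)/(N-k) x G/(G-1) that regress applies to cluster-robust variances. The result should be much closer to the csdid SE, though an exact match in every setup is not guaranteed.

    ```stata
    * Assumes the two-period dataset from the original post is in memory.
    * Build a numeric cluster identifier (comune is assumed to be a string).
    egen long id = group(comune)

    * Same DID regression as before, with SEs clustered by municipality.
    regress vote_share i.ever_treated##i.dummyP2, vce(cluster id)

    * regress scales the cluster-robust variance by (N-1)/(N-k) * G/(G-1).
    * Undo that factor to approximate an SE with no df adjustment.
    scalar nobs     = e(N)         // number of observations, N
    scalar ncoef    = e(rank)      // number of estimated coefficients, k
    scalar nclust   = e(N_clust)   // number of clusters, G
    scalar se_adj   = _se[1.ever_treated#1.dummyP2]
    scalar se_noadj = se_adj * sqrt(((nobs - ncoef)/(nobs - 1)) * ((nclust - 1)/nclust))

    display "clustered SE (regress default):  " se_adj
    display "clustered SE (no df adjustment): " se_noadj
    ```

    With thousands of municipalities the adjustment factor is very close to 1, so most of any gap between the two commands would come from clustering rather than from the degrees-of-freedom correction.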
