  • -csdid- and -didregress- give different results with non-staggered timing

    I am exploring -csdid- (user-written by FernandoRios) as an alternative to TWFE under non-staggered timing. My understanding is that if treatment is given to all treated units at the same time, the standard DiD and CSDID should give similar results.

    I tried csdid and didregress with some (publicly available) example datasets, and in most cases the results from the two procedures were not the same. Could someone help me understand why this is happening? Please let me know if my code is incorrect. My code follows.

    Code:
    cap ssc install drdid
    cap ssc install csdid
    
    
    * Example 1
    
    use "https://dss.princeton.edu/training/Panel101.dta", clear
    
    gen time = 0
    replace time = 1 if year>=1994
    
    gen treated = 0
    replace treated = 1 if country>4
    
    gen did = time*treated
    
    didregress (y) (did), group(country) time(year)
    
    *replicating with reghdfe for faster estimation
    reghdfe y did i.country i.year
    
    
    gen gvar=1994 if treated==1
    replace gvar=0 if treated==0
    
    
    csdid y, gvar(gvar) time(year)
    estat simple //results not same
    
    
    *Example 2
    
    use "https://www.stata-press.com/data/r19/hospdd", clear
    
    
    bysort hospital (month):egen evertreated= max(procedure)
    gen after=1 if month>=4
    replace after=0 if month<4
    gen did=after*ever
    
    didregress (satis) (did), group(hospital) time(month)
    
    gen gvar=4 if evertreated==1
    replace gvar=0 if evertreated==0
    
    csdid satis, gvar(gvar) time(month)
    estat simple // results quite similar
    
    *Example 3
    use "http://www.princeton.edu/~otorres/WDI.dta", clear
    gen after = (year >= 2009) if !missing(year)
    merge m:1 country using "http://www.princeton.edu/~otorres/Treated.dta", gen(merge1)
    
    replace treated = 0 if treated == .
    gen did = after * treated
    encode country, gen(country1)
    didregress (gdppc) (did), group(country1) time(year)
    
    gen gvar=2009 if treated==1
    replace gvar=0 if treated==0
    
    csdid gdppc, gvar(gvar) time(year)
    estat simple 
    
    *retried with long option
    csdid gdppc, gvar(gvar) time(year) long
    estat simple

  • #2
    I have some reasons they differ, but you should first show the Stata output.



    • #3
      Please see the Stata output below. Examples 1 and 3 give different results with the two methods. Example 2 gives nearly the same results with didregress and csdid.

      With my actual data, didregress and csdid give wildly different results (statistical significance changes, along with the signs of some coefficients). Please advise how I should justify the use of CSDID over TWFE (or vice versa).

      Thank you!


      Code:
      ------------------------------------------------------------------------------------------------------------------------------
            name:  <unnamed>
             log:  C:\Users\parul\OneDrive\Documents\CSDIDResults.log
        log type:  text
       opened on:  13 Jul 2025, 17:42:06
      
      . 
      . * Example 1
      . 
      . use "https://dss.princeton.edu/training/Panel101.dta", clear
      
      . 
      . gen time = 0
      
      . replace time = 1 if year>=1994
      (42 real changes made)
      
      . 
      . gen treated = 0
      
      . replace treated = 1 if country>4
      (30 real changes made)
      
      . 
      . gen did = time*treated
      
      . 
      . didregress (y) (did), group(country) time(year)
      
      Number of groups and treatment time
      
      Time variable: year
      Control:       did = 0
      Treatment:     did = 1
      -----------------------------------
                   |   Control  Treatment
      -------------+---------------------
      Group        |
           country |         4          3
      -------------+---------------------
      Time         |
           Minimum |      1990       1994
           Maximum |      1990       1994
      -----------------------------------
      
      Difference-in-differences regression                        Number of obs = 70
      Data type: Repeated cross-sectional
      
                                      (Std. err. adjusted for 7 clusters in country)
      ------------------------------------------------------------------------------
                   |               Robust
                 y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
      ATET         |
               did |
         (1 vs 0)  |  -2.52e+09   1.15e+09    -2.20   0.070    -5.32e+09    2.85e+08
      ------------------------------------------------------------------------------
      Note: ATET estimate adjusted for group effects and time effects.
      
      . 
      . *replicating with reghdfe for faster estimation
      . reghdfe y did i.country i.year
      (MWFE estimator converged in 1 iterations)
      
      HDFE Linear regression                            Number of obs   =         70
      Absorbing 1 HDFE group                            F(  16,     53) =       2.09
                                                        Prob > F        =     0.0228
                                                        R-squared       =     0.3874
                                                        Adj R-squared   =     0.2024
                                                        Within R-sq.    =     0.3874
                                                        Root MSE        =  2.693e+09
      
      ------------------------------------------------------------------------------
                 y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
               did |  -2.52e+09   1.33e+09    -1.90   0.063    -5.18e+09    1.43e+08
                   |
           country |
                B  |  -1.51e+09   1.20e+09    -1.26   0.214    -3.93e+09    9.01e+08
                C  |  -3.83e+08   1.20e+09    -0.32   0.751    -2.80e+09    2.03e+09
                D  |   1.91e+09   1.20e+09     1.59   0.118    -5.03e+08    4.33e+09
                E  |   5.07e+08   1.44e+09     0.35   0.727    -2.39e+09    3.40e+09
                F  |   3.34e+09   1.44e+09     2.32   0.025     4.47e+08    6.24e+09
                G  |   1.49e+09   1.44e+09     1.03   0.307    -1.41e+09    4.39e+09
                   |
              year |
             1991  |   6.17e+08   1.44e+09     0.43   0.670    -2.27e+09    3.50e+09
             1992  |   6.47e+08   1.44e+09     0.45   0.655    -2.24e+09    3.53e+09
             1993  |   3.20e+09   1.44e+09     2.22   0.031     3.11e+08    6.08e+09
             1994  |   4.74e+09   1.55e+09     3.06   0.003     1.63e+09    7.84e+09
             1995  |   2.62e+09   1.55e+09     1.69   0.096    -4.85e+08    5.72e+09
             1996  |   3.49e+09   1.55e+09     2.26   0.028     3.89e+08    6.60e+09
             1997  |   4.80e+09   1.55e+09     3.10   0.003     1.70e+09    7.91e+09
             1998  |   2.07e+09   1.55e+09     1.34   0.186    -1.03e+09    5.18e+09
             1999  |   2.70e+09   1.55e+09     1.75   0.086    -4.01e+08    5.81e+09
                   |
             _cons |  -7.61e+08   1.33e+09    -0.57   0.570    -3.43e+09    1.91e+09
      ------------------------------------------------------------------------------
      
      . 
      . 
      . gen gvar=1994 if treated==1
      (40 missing values generated)
      
      . replace gvar=0 if treated==0
      (40 real changes made)
      
      . 
      . 
      . csdid y, gvar(gvar) time(year)
      .........
      Difference-in-difference with Multiple Time Periods
      
                                                                  Number of obs = 70
      Outcome model  : regression adjustment
      Treatment model: none
      ------------------------------------------------------------------------------
                   | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
      g1994        |
       t_1990_1991 |  -9.00e+08   2.68e+09    -0.34   0.737    -6.16e+09    4.36e+09
       t_1991_1992 |   1.41e+09   2.26e+09     0.62   0.533    -3.02e+09    5.84e+09
       t_1992_1993 |  -2.39e+09   2.42e+09    -0.99   0.323    -7.13e+09    2.35e+09
       t_1993_1994 |  -3.53e+08   3.03e+09    -0.12   0.907    -6.29e+09    5.58e+09
       t_1993_1995 |  -6.69e+09   3.62e+09    -1.85   0.065    -1.38e+10    4.04e+08
       t_1993_1996 |  -4.52e+08   2.74e+09    -0.17   0.869    -5.81e+09    4.91e+09
       t_1993_1997 |   7.73e+08   2.48e+09     0.31   0.755    -4.08e+09    5.63e+09
       t_1993_1998 |  -2.98e+09   3.07e+09    -0.97   0.332    -9.00e+09    3.04e+09
       t_1993_1999 |   2.46e+09   3.05e+09     0.81   0.420    -3.52e+09    8.43e+09
      ------------------------------------------------------------------------------
      Control: Never Treated
      
      See Callaway and Sant'Anna (2021) for details
      
      . estat simple //results not same
      Average Treatment Effect on Treated
      ------------------------------------------------------------------------------
                   | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
               ATT |  -1.21e+09   2.50e+09    -0.48   0.629    -6.11e+09    3.70e+09
      ------------------------------------------------------------------------------
      
      . 
      . 
      . *Example 2
      . 
      . use "https://www.stata-press.com/data/r19/hospdd", clear
      (Artificial hospital admission procedure data)
      
      . 
      . 
      . bysort hospital (month):egen evertreated= max(procedure)
      
      . gen after=1 if month>=4
      (3,684 missing values generated)
      
      . replace after=0 if month<4
      (3,684 real changes made)
      
      . gen did=after*ever
      
      . 
      . didregress (satis) (did), group(hospital) time(month)
      
      Number of groups and treatment time
      
      Time variable: month
      Control:       did = 0
      Treatment:     did = 1
      -----------------------------------
                   |   Control  Treatment
      -------------+---------------------
      Group        |
          hospital |        28         18
      -------------+---------------------
      Time         |
           Minimum |         1          4
           Maximum |         1          4
      -----------------------------------
      
      Difference-in-differences regression                     Number of obs = 7,368
      Data type: Repeated cross-sectional
      
                                    (Std. err. adjusted for 46 clusters in hospital)
      ------------------------------------------------------------------------------
                   |               Robust
             satis | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
      ATET         |
               did |
         (1 vs 0)  |   .8479879   .0321121    26.41   0.000     .7833108     .912665
      ------------------------------------------------------------------------------
      Note: ATET estimate adjusted for group effects and time effects.
      
      . 
      . gen gvar=4 if evertreated==1
      (4,304 missing values generated)
      
      . replace gvar=0 if evertreated==0
      (4,304 real changes made)
      
      . 
      . csdid satis, gvar(gvar) time(month)
      ......
      Difference-in-difference with Multiple Time Periods
      
                                                               Number of obs = 7,368
      Outcome model  : regression adjustment
      Treatment model: none
      ------------------------------------------------------------------------------
                   | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
      g4           |
             t_1_2 |  -.0061649   .0799205    -0.08   0.939    -.1628062    .1504765
             t_2_3 |  -.0217322   .0922354    -0.24   0.814    -.2025103     .159046
             t_3_4 |   .8228153   .0947859     8.68   0.000     .6370383    1.008592
             t_3_5 |   .9040498   .0939959     9.62   0.000     .7198212    1.088278
             t_3_6 |    .844724   .0963428     8.77   0.000     .6558957    1.033552
             t_3_7 |   .8978885   .0949395     9.46   0.000     .7118104    1.083967
      ------------------------------------------------------------------------------
      Control: Never Treated
      
      See Callaway and Sant'Anna (2021) for details
      
      . estat simple // results quite similar
      Average Treatment Effect on Treated
      ------------------------------------------------------------------------------
                   | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
               ATT |   .8673694   .0738888    11.74   0.000     .7225499    1.012189
      ------------------------------------------------------------------------------
      
      . 
      . *Example 3
      . use "http://www.princeton.edu/~otorres/WDI.dta", clear
      
      . gen after = (year >= 2009) if !missing(year)
      
      . merge m:1 country using "http://www.princeton.edu/~otorres/Treated.dta", gen(merge1)
      
          Result                      Number of obs
          -----------------------------------------
          Not matched                         1,276
              from master                     1,276  (merge1==1)
              from using                          0  (merge1==2)
      
          Matched                             1,496  (merge1==3)
          -----------------------------------------
      
      . 
      . replace treated = 0 if treated == .
      (1,276 real changes made)
      
      . gen did = after * treated
      
      . encode country, gen(country1)
      
      . didregress (gdppc) (did), group(country1) time(year)
      
      Number of groups and treatment time
      
      Time variable: year
      Control:       did = 0
      Treatment:     did = 1
      -----------------------------------
                   |   Control  Treatment
      -------------+---------------------
      Group        |
          country1 |        58         68
      -------------+---------------------
      Time         |
           Minimum |      2000       2009
           Maximum |      2000       2009
      -----------------------------------
      
      Difference-in-differences regression                     Number of obs = 2,772
      Data type: Repeated cross-sectional
      
                                   (Std. err. adjusted for 126 clusters in country1)
      ------------------------------------------------------------------------------
                   |               Robust
             gdppc | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
      ATET         |
               did |
         (1 vs 0)  |   1164.492   624.4461     1.86   0.065    -71.36446    2400.348
      ------------------------------------------------------------------------------
      Note: ATET estimate adjusted for group effects and time effects.
      
      . 
      . gen gvar=2009 if treated==1
      (1,276 missing values generated)
      
      . replace gvar=0 if treated==0
      (1,276 real changes made)
      
      . 
      . csdid gdppc, gvar(gvar) time(year)
      .....................
      Difference-in-difference with Multiple Time Periods
      
                                                               Number of obs = 2,772
      Outcome model  : regression adjustment
      Treatment model: none
      ------------------------------------------------------------------------------
                   | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
      g2009        |
       t_2000_2001 |    52.0571   4285.151     0.01   0.990    -8346.684    8450.798
       t_2001_2002 |   81.63228   4335.731     0.02   0.985    -8416.245     8579.51
       t_2002_2003 |   173.7856   4378.884     0.04   0.968     -8408.67    8756.241
       t_2003_2004 |   271.0033   4468.422     0.06   0.952    -8486.944     9028.95
       t_2004_2005 |   141.5645   4582.016     0.03   0.975    -8839.022    9122.151
       t_2005_2006 |   265.4404   4697.584     0.06   0.955    -8941.654    9472.535
       t_2006_2007 |   265.1601   4850.768     0.05   0.956    -9242.171    9772.492
       t_2007_2008 |  -35.19131   4911.036    -0.01   0.994    -9660.644    9590.262
       t_2008_2009 |  -115.9793    4787.77    -0.02   0.981    -9499.836    9267.878
       t_2008_2010 |   95.67771   4852.568     0.02   0.984     -9415.18    9606.536
       t_2008_2011 |     277.03   4903.081     0.06   0.955    -9332.831    9886.891
       t_2008_2012 |   313.4299   4922.722     0.06   0.949    -9334.927    9961.787
       t_2008_2013 |   502.7693   4959.788     0.10   0.919    -9218.237    10223.78
       t_2008_2014 |   551.7631   4970.492     0.11   0.912    -9190.222    10293.75
       t_2008_2015 |    502.061   4947.307     0.10   0.919    -9194.483     10198.6
       t_2008_2016 |   607.7407   4977.828     0.12   0.903    -9148.623     10364.1
       t_2008_2017 |   761.9342   5029.846     0.15   0.880    -9096.382    10620.25
       t_2008_2018 |   951.1796     5075.2     0.19   0.851     -8996.03    10898.39
       t_2008_2019 |   943.6327   5087.899     0.19   0.853    -9028.467    10915.73
       t_2008_2020 |   587.6601   4936.923     0.12   0.905    -9088.532    10263.85
       t_2008_2021 |   932.3051   5071.822     0.18   0.854    -9008.283    10872.89
      ------------------------------------------------------------------------------
      Control: Never Treated
      
      See Callaway and Sant'Anna (2021) for details
      
      . estat simple 
      Average Treatment Effect on Treated
      ------------------------------------------------------------------------------
                   | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
               ATT |   531.6311   3596.333     0.15   0.882    -6517.052    7580.314
      ------------------------------------------------------------------------------
      
      . 
      . *retried with long option
      . csdid gdppc, gvar(gvar) time(year) long
      .....................
      Difference-in-difference with Multiple Time Periods
      
                                                               Number of obs = 2,772
      Outcome model  : regression adjustment
      Treatment model: none
      ------------------------------------------------------------------------------
                   | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
      g2009        |
       t_2000_2008 |   1215.452   4585.502     0.27   0.791    -7771.968    10202.87
       t_2001_2008 |   1163.395   4609.932     0.25   0.801    -7871.906     10198.7
       t_2002_2008 |   1081.763   4632.805     0.23   0.815    -7998.369    10161.89
       t_2003_2008 |    907.977   4650.541     0.20   0.845    -8206.916    10022.87
       t_2004_2008 |   636.9737   4717.527     0.14   0.893    -8609.209    9883.156
       t_2005_2008 |   495.4092   4759.791     0.10   0.917    -8833.609    9824.428
       t_2006_2008 |   229.9688   4829.852     0.05   0.962    -9236.368    9696.306
       t_2007_2008 |  -35.19131   4911.036    -0.01   0.994    -9660.644    9590.262
       t_2008_2009 |  -115.9793    4787.77    -0.02   0.981    -9499.836    9267.878
       t_2008_2010 |   95.67771   4852.568     0.02   0.984     -9415.18    9606.536
       t_2008_2011 |     277.03   4903.081     0.06   0.955    -9332.831    9886.891
       t_2008_2012 |   313.4299   4922.722     0.06   0.949    -9334.927    9961.787
       t_2008_2013 |   502.7693   4959.788     0.10   0.919    -9218.237    10223.78
       t_2008_2014 |   551.7631   4970.492     0.11   0.912    -9190.222    10293.75
       t_2008_2015 |    502.061   4947.307     0.10   0.919    -9194.483     10198.6
       t_2008_2016 |   607.7407   4977.828     0.12   0.903    -9148.623     10364.1
       t_2008_2017 |   761.9342   5029.846     0.15   0.880    -9096.382    10620.25
       t_2008_2018 |   951.1796     5075.2     0.19   0.851     -8996.03    10898.39
       t_2008_2019 |   943.6327   5087.899     0.19   0.853    -9028.467    10915.73
       t_2008_2020 |   587.6601   4936.923     0.12   0.905    -9088.532    10263.85
       t_2008_2021 |   932.3051   5071.822     0.18   0.854    -9008.283    10872.89
      ------------------------------------------------------------------------------
      Control: Never Treated
      
      See Callaway and Sant'Anna (2021) for details
      
      . estat simple
      Average Treatment Effect on Treated
      ------------------------------------------------------------------------------
                   | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
               ATT |   531.6311   3596.333     0.15   0.882    -6517.052    7580.314
      ------------------------------------------------------------------------------
      
      .



      • #4
        I have nothing useful to say about the question of interest and importance. This is just to note that your code could be slimmed down without loss of clarity to people who might be expected to have good answers. The recurrent issue is taking two statements where one would work fine.

        I have extracted only the examples of code where two statements can go into one. Neither code block will run as it stands.


        Code:
        gen time = 0
        replace time = 1 if year>=1994  
        
        gen treated = 0
        replace treated = 1 if country>4  
        
        gen gvar=1994 if treated==1
        replace gvar=0 if treated==0  
        
        gen after=1 if month>=4
        replace after=0 if month<4  
        
        gen gvar=4 if evertreated==1
        replace gvar=0 if evertreated==0  
        
        gen gvar=2009 if treated==1
        replace gvar=0 if treated==0

        Code:
        gen time = year >= 1994  
        
        gen treated = country > 4  
        
        gen gvar = 1994 * treated  
        
        gen after = month >= 4  
        
        gen gvar = 4 * evertreated  
        
        gen gvar = 2009 * treated
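
        One caveat on the one-line forms, offered as a side note: Stata treats numeric missing values as larger than any number, so an expression such as year >= 1994 evaluates to 1 when year is missing. A minimal sketch with made-up data (the variable names time_naive and time_safe are illustrative only) shows the difference; Example 3 in #1 already guards against this with if !missing(year).

        ```stata
        clear
        input year
        1990
        1994
        .
        end

        * Missing year counts as "large", so the naive indicator is 1 there
        gen time_naive = year >= 1994

        * Guarded version: the indicator stays missing when year is missing
        gen time_safe = year >= 1994 if !missing(year)

        list
        * time_naive is 0, 1, 1 and time_safe is 0, 1, . for the three rows
        ```

        None of the example datasets in this thread needs the guard if the time and group variables are complete, but it costs little to include.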



        • #5
          The main difference between the two methods, when you don’t have control variables, is that didregress averages all of the pre-treatment periods in order to create a pre-intervention comparison. A more flexible version then uses these to estimate a separate effect in every treated period. I’m now calling this a “lags only” estimator because it can be obtained from a regression using current and lagged treatment indicators. csdid is a “leads and lags” estimator. Without controls, it can be obtained by including the leads of the indicators, too, omitting the one just before the intervention. This is the same as using only period q-1 as the reference period, where q is the time of the intervention. You can find my shared Dropbox folder pinned to my Twitter account for lots more.
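
          The distinction can be sketched with the hospdd data from Example 2, where q = 4. This is only an illustration under the no-covariates assumption, not a replication of either command: the variable post and the exact regression specifications are my own choices, the first regression uses a treated-group dummy rather than hospital fixed effects, and the standard errors are not expected to match those of didregress or csdid.

          ```stata
          use "https://www.stata-press.com/data/r19/hospdd", clear
          bysort hospital: egen evertreated = max(procedure)
          gen byte post = month >= 4 if !missing(month)

          * "Lags only" flavor: a single treated-by-post indicator, so the
          * baseline is the pooled average of the pre-treatment months 1-3
          regress satis i.evertreated i.month 1.evertreated#1.post, vce(cluster hospital)

          * "Leads and lags" flavor: a full set of treated-by-month interactions
          * with month 3 (q-1) omitted, so every effect is relative to month 3
          regress satis i.evertreated##ib3.month, vce(cluster hospital)

          * Simple average of the four post-treatment interactions
          lincom 0.25*_b[1.evertreated#4.month] + 0.25*_b[1.evertreated#5.month] ///
               + 0.25*_b[1.evertreated#6.month] + 0.25*_b[1.evertreated#7.month]
          ```

          In the csdid output in #3, the average of the four post-treatment coefficients, (.8228153 + .9040498 + .844724 + .8978885)/4 = .8673694, is exactly the ATT that estat simple reports, and every one of those coefficients uses month 3 as its base. The second regression is meant to mimic that construction, while the first mimics the pooled pre-period comparison. The two approaches diverge more when the pre-treatment differences between groups move around from period to period, as the pre-treatment rows for Example 1 suggest.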
