Diff-in-diff with panel data, fixed-effect estimation and "faking" the baseline difference

Nicholas Winter

25 Jul 2018, 08:59

I'm estimating a difference-in-differences model using panel data. I have two waves of data (pre-treatment and post-treatment) and two groups (treated and untreated). Treatment occurs between the two waves, only for the "treated" group.

This is relatively straightforward to estimate with a fixed effect model:

Code:

. xtset xwaveid wave

. xtreg outcome treat##wave, fe
note: 1.treat omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =      5,716
Group variable: xwaveid                         Number of groups  =      3,094

R-sq:                                           Obs per group:
     within  = 0.0068                                         min =          1
     between = 0.0016                                         avg =        1.8
     overall = 0.0002                                         max =          2

                                                F(2,2620)         =       8.90
corr(u_i, Xb)  = -0.0783                        Prob > F          =     0.0001

---------------------------------------------------------------------------------
        outcome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
          treat |
       Treated  |          0  (omitted)
                |
           wave |
        Wave 2  |    .055467   .0146629     3.78   0.000     .0267148    .0842191
                |
     treat#wave |
Treated#Wave 2  |  -.0708387   .0168099    -4.21   0.000    -.1038008   -.0378766
                |
          _cons |    .171515   .0047576    36.05   0.000     .1621859     .180844
----------------+----------------------------------------------------------------
        sigma_u |  .21878326
        sigma_e |  .25962105
            rho |    .415255   (fraction of variance due to u_i)
---------------------------------------------------------------------------------
F test that all u_i=0: F(3093, 2620) = 1.23                  Prob > F = 0.0000
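For reference, the fixed-effects specification above can be written as follows. The notation is mine, not Stata's: $\alpha_i$ is the effect for individual `xwaveid` $i$, and $\mathrm{Wave2}_t$ and $\mathrm{Treat}_i$ are 0/1 indicators. $\mathrm{Treat}_i$ is constant within each individual, so it is perfectly collinear with $\alpha_i$. That is why Stata omits `1.treat`. The interaction coefficient $\beta$ is the DID estimate.

```latex
y_{it} = \alpha_i + \delta\,\mathrm{Wave2}_t + \beta\,(\mathrm{Treat}_i \times \mathrm{Wave2}_t) + \varepsilon_{it},
\qquad \hat\delta = 0.0555,\quad \hat\beta = -0.0708
```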

So far so good. However, in a prior study that did NOT have panel data, I estimated DID models and then constructed nifty figures that sought to show the estimated effect, along with the parallel paths assumption. Those figures look like this:

[Image: didplot.png]

I'd like to produce similar charts for the current analysis. However, with fixed-effects estimation the wave-1 levels for the two groups are not identified, because the baseline treated/untreated difference is absorbed by the individual fixed effects (hence 1.treat being omitted). My question is whether the following would be a reasonable (or semi-reasonable) way to recover estimates of those wave-1 levels in order to produce the figure:

Code:

. predict xbu, xbu

. table treat if wave==1, c(mean xbu)

----------------------
    treat |  mean(xbu)
----------+-----------
Untreated |   .1241663
  Treated |   .1877528
----------------------
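For reference, here is my understanding of Stata's definitions after `xtreg, fe`. Please check the `[XT] xtreg postestimation` entry. The estimated individual effect is $\hat u_i = \bar y_i - \hat a - \bar x_i \hat b$. The `xbu` prediction therefore reduces to the person's mean outcome plus the within-person deviation of the regressors. For someone observed in only one wave, it equals their observed outcome.

```latex
\widehat{\mathit{xbu}}_{it} = \hat a + x_{it}\hat b + \hat u_i = \bar y_i + (x_{it} - \bar x_i)\,\hat b
```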

This produces the same wave-1 estimates that I get if I simply ignore the panel nature of the data:

Code:

. reg outcome treat##wave, cluster(xwaveid )

Linear regression                               Number of obs     =      5,716
                                                F(3, 3093)        =      13.16
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0057
                                                Root MSE          =     .27429

                               (Std. Err. adjusted for 3,094 clusters in xwaveid)
---------------------------------------------------------------------------------
                |               Robust
        outcome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
          treat |
       Treated  |   .0635864   .0104371     6.09   0.000     .0431221    .0840507
                |
           wave |
        Wave 2  |   .0588923   .0138013     4.27   0.000     .0318317     .085953
                |
     treat#wave |
Treated#Wave 2  |  -.0769198   .0159763    -4.81   0.000    -.1082451   -.0455946
                |
          _cons |   .1241663   .0085618    14.50   0.000      .107379    .1409536
---------------------------------------------------------------------------------

. margins treat#wave

Adjusted predictions                            Number of obs     =      5,716
Model VCE    : Robust

Expression   : Linear prediction, predict()

-----------------------------------------------------------------------------------
                  |            Delta-method
                  |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
       treat#wave |
Untreated#Wave 1  |   .1241663   .0085618    14.50   0.000      .107379    .1409536
Untreated#Wave 2  |   .1830587    .011172    16.39   0.000     .1611534    .2049639
  Treated#Wave 1  |   .1877527    .005969    31.45   0.000     .1760492    .1994563
  Treated#Wave 2  |   .1697253   .0060542    28.03   0.000     .1578546     .181596
-----------------------------------------------------------------------------------
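Because `treat##wave` is saturated, each `margins` cell is simply a sum of the pooled coefficients. The short check below rebuilds the four cells. It is written in Python only because it is easy to run, and the numbers are copied from the `reg` output. It also computes the kind of "parallel paths" counterfactual point such figures usually plot: the treated wave-1 level plus the untreated group's change. That counterfactual is an assumption about what the figure shows. The gap between the observed treated wave-2 level and the counterfactual is the interaction coefficient.

```python
# Coefficients copied from `reg outcome treat##wave, cluster(xwaveid)` above.
cons = 0.1241663      # _cons: untreated, wave 1
treated = 0.0635864   # 1.treat
wave2 = 0.0588923     # 2.wave
did = -0.0769198      # 1.treat#2.wave

# Saturated model: every treat x wave cell prediction is a sum of coefficients.
cells = {
    ("Untreated", 1): cons,
    ("Untreated", 2): cons + wave2,
    ("Treated", 1): cons + treated,
    ("Treated", 2): cons + treated + wave2 + did,
}

# Assumed "parallel paths" counterfactual: treated baseline plus the untreated change.
untreated_change = cells[("Untreated", 2)] - cells[("Untreated", 1)]
counterfactual = cells[("Treated", 1)] + untreated_change
gap = cells[("Treated", 2)] - counterfactual  # algebraically equal to `did`

for cell, value in cells.items():
    print(cell, round(value, 7))
print("counterfactual treated, wave 2:", round(counterfactual, 7))
print("observed minus counterfactual:", round(gap, 7))
```

The rebuilt cells agree with the `margins` table up to rounding in the last digit.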

But this feels like cheating. More specifically: is this, in general, a valid (or semi-valid) approach?
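For what it's worth, the agreement between the `xbu` means and the pooled margins is mechanical. The FE residuals sum to zero within each person, and they are exactly zero for people seen only once. The normal equations for `wave` and `treat#wave` then force the residuals to sum to zero within every treat-by-wave cell. As a result, the mean of `xbu` in each cell equals the raw mean of `outcome` in that cell, and the saturated pooled regression reproduces those raw means. The interaction estimates still differ (-.0708 vs -.0769) because of imbalance. The log implies 2,622 people observed twice and 472 observed once (5,716 observations on 3,094 people). Only the former inform the FE estimate, while the pooled regression uses everyone. In a balanced two-wave panel, the two estimates would coincide.

The sketch below checks the cell-mean identity on simulated, hypothetical data. The data-generating values, the function names, and the hand-rolled within estimator are illustrations of my own, not Stata's implementation.

```python
import random
from collections import defaultdict


def simulate_panel(n_people=400, seed=2018):
    """Hypothetical unbalanced two-wave panel: some people appear in one wave only."""
    rng = random.Random(seed)
    rows = []  # each row: (person_id, treat, wave2, y)
    for pid in range(n_people):
        treat = pid % 2  # alternate treated / untreated
        alpha = rng.gauss(0.15 + 0.06 * treat, 0.2)  # person effect correlated with treat
        waves = rng.choice([(0, 1), (0, 1), (0, 1), (0,), (1,)])  # some singletons
        for wave2 in waves:
            y = alpha + 0.05 * wave2 - 0.07 * treat * wave2 + rng.gauss(0, 0.25)
            rows.append((pid, treat, wave2, y))
    return rows


def within_fe(rows):
    """Within estimator for x = (wave2, treat*wave2); returns b and ybar_i + (x_it - xbar_i) b."""
    totals = defaultdict(lambda: [0.0, 0.0, 0.0, 0])  # sum y, sum x1, sum x2, count
    for pid, treat, w, y in rows:
        t = totals[pid]
        t[0] += y
        t[1] += w
        t[2] += treat * w
        t[3] += 1
    means = {pid: (t[0] / t[3], t[1] / t[3], t[2] / t[3]) for pid, t in totals.items()}

    # 2x2 normal equations on within-person demeaned data.
    s11 = s12 = s22 = r1 = r2 = 0.0
    for pid, treat, w, y in rows:
        ym, x1m, x2m = means[pid]
        d1, d2, dy = w - x1m, treat * w - x2m, y - ym
        s11 += d1 * d1
        s12 += d1 * d2
        s22 += d2 * d2
        r1 += d1 * dy
        r2 += d2 * dy
    det = s11 * s22 - s12 * s12
    b1 = (s22 * r1 - s12 * r2) / det
    b2 = (s11 * r2 - s12 * r1) / det

    # FE fitted values, the analogue of Stata's xbu (person mean + within deviation).
    fitted = []
    for pid, treat, w, y in rows:
        ym, x1m, x2m = means[pid]
        fitted.append(ym + (w - x1m) * b1 + (treat * w - x2m) * b2)
    return (b1, b2), fitted


def cell_means(rows, values):
    """Mean of `values` within each (treat, wave2) cell."""
    acc = defaultdict(lambda: [0.0, 0])
    for (pid, treat, w, y), v in zip(rows, values):
        acc[(treat, w)][0] += v
        acc[(treat, w)][1] += 1
    return {cell: total / n for cell, (total, n) in acc.items()}


rows = simulate_panel()
(b_wave, b_did), xbu = within_fe(rows)
raw = cell_means(rows, [r[3] for r in rows])
fit = cell_means(rows, xbu)
raw_did = (raw[(1, 1)] - raw[(1, 0)]) - (raw[(0, 1)] - raw[(0, 0)])

for cell in sorted(raw):
    print(cell, "raw mean:", round(raw[cell], 6), "xbu mean:", round(fit[cell], 6))
print("FE interaction:", round(b_did, 6), "raw-means DID:", round(raw_did, 6))
```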

Thanks!
