Effect across panels and years using difference in differences on a panel data

Asaf Yancu

Join Date: May 2021
Posts: 38

04 Aug 2022, 13:19

Hello, all Statalists!
I want to measure the effect of a rail station opening in a city on the unemployment rate of that city's residents.

In particular, my dataset is a panel covering the years 2010-2019.
I study rail stations that opened on the same day in October 2016 in four cities (codes 240, 874, 7700, 9200).
The "code" variable is the zip code and "unemployed_rate" is the unemployment rate.

Thus,

Code:

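* Define treatment: 1 for the four cities where a station opened, 0 otherwise
gen treat = inlist(code, 240, 874, 7700, 9200)

* Define post-treatment period (2016 itself counts as pre-treatment);
* the if qualifier keeps post missing when year is missing
gen post = (year > 2016) if !missing(year)

* Define interaction term
gen treatXpost = treat*post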

I constructed a simple difference-in-differences model with year and city fixed effects:

[Image: screenshot 2022-08-04 220340.png showing the model equation]
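
For reference, a two-way fixed-effects difference-in-differences model consistent with the variables defined above is commonly written as below. The notation is an assumption made here for illustration and is not taken from the attached image.

```latex
\text{unemployed\_rate}_{it} = \alpha_i + \lambda_t + \beta \,(\text{treat}_i \times \text{post}_t) + \varepsilon_{it}
```

Here $\alpha_i$ is a city fixed effect, $\lambda_t$ is a year fixed effect, and $\beta$ is the difference-in-differences effect. The separate $\text{treat}_i$ and $\text{post}_t$ terms do not appear because they are absorbed by $\alpha_i$ and $\lambda_t$ respectively.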

Here is a subset of my dataset:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double unemployed_rate float(treat post treatXpost) int(year code)
5.1819999999999995 1 0 0 2010  240
 5.051666666666668 1 0 0 2011  240
 4.669999999999999 1 0 0 2012  240
4.8933333333333335 1 0 0 2013  240
 4.400833333333335 1 0 0 2014  240
 3.750833333333333 1 0 0 2015  240
3.5975000000000006 1 0 0 2016  240
 3.045833333333334 1 1 1 2017  240
3.2591666666666663 1 1 1 2018  240
3.4866666666666664 1 1 1 2019  240
 8.174166666666666 1 0 0 2010  874
             7.675 1 0 0 2011  874
            7.6675 1 0 0 2012  874
 7.560833333333335 1 0 0 2013  874
 6.923333333333334 1 0 0 2014  874
 6.148333333333333 1 0 0 2015  874
 5.636666666666666 1 0 0 2016  874
5.0441666666666665 1 1 1 2017  874
 5.184166666666667 1 1 1 2018  874
 5.230833333333332 1 1 1 2019  874
6.3954545454545455 0 0 0 2010 2800
 6.121666666666667 0 0 0 2011 2800
 5.351666666666667 0 0 0 2012 2800
             6.265 0 0 0 2013 2800
 6.279999999999999 0 0 0 2014 2800
 5.345833333333333 0 0 0 2015 2800
 5.319999999999999 0 0 0 2016 2800
 4.803333333333334 0 1 0 2017 2800
 4.550833333333334 0 1 0 2018 2800
              4.64 0 1 0 2019 2800
              9.03 0 0 0 2010 6700
 8.739999999999998 0 0 0 2011 6700
 8.588333333333333 0 0 0 2012 6700
 8.024166666666666 0 0 0 2013 6700
            7.1475 0 0 0 2014 6700
             6.625 0 0 0 2015 6700
 6.050000000000001 0 0 0 2016 6700
 5.246666666666667 0 1 0 2017 6700
             5.155 0 1 0 2018 6700
5.3933333333333335 0 1 0 2019 6700
             8.693 1 0 0 2010 7700
 8.064166666666669 1 0 0 2011 7700
              7.32 1 0 0 2012 7700
 7.219166666666667 1 0 0 2013 7700
 6.536666666666667 1 0 0 2014 7700
6.0841666666666665 1 0 0 2015 7700
 5.583333333333334 1 0 0 2016 7700
 4.894166666666666 1 1 1 2017 7700
 4.870833333333334 1 1 1 2018 7700
 5.370000000000002 1 1 1 2019 7700
 9.121000000000002 1 0 0 2010 9200
 8.931666666666667 1 0 0 2011 9200
            8.7475 1 0 0 2012 9200
 8.623333333333335 1 0 0 2013 9200
            8.1325 1 0 0 2014 9200
 7.823333333333333 1 0 0 2015 9200
 7.217499999999999 1 0 0 2016 9200
6.1191666666666675 1 1 1 2017 9200
5.8725000000000005 1 1 1 2018 9200
 5.758333333333335 1 1 1 2019 9200
end

Following my model specification, I ran this fixed-effects regression:

Code:

. xtset code year

Panel variable: code (strongly balanced)
 Time variable: year, 2010 to 2019
         Delta: 1 unit

. xtreg unemployed_rate treat post treatXpost i.year, fe
note: treat omitted because of collinearity.
note: 2019.year omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =         60
Group variable: code                            Number of groups  =          6

R-squared:                                      Obs per group:
     Within  = 0.8959                                         min =         10
     Between = 0.0007                                         avg =       10.0
     Overall = 0.4467                                         max =         10

                                                F(10,44)          =      37.86
corr(u_i, Xb) = 0.0002                          Prob > F          =     0.0000

------------------------------------------------------------------------------
unemployed~e | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       treat |          0  (omitted)
        post |  -2.733078   .2969419    -9.20   0.000    -3.331525   -2.134631
  treatXpost |  -.0794973   .2529775    -0.31   0.755    -.5893399    .4303453
             |
        year |
       2011  |  -.3352424   .2443995    -1.37   0.177    -.8277972    .1573123
       2012  |  -.7084369   .2443995    -2.90   0.006    -1.200992   -.2158821
       2013  |   -.668298   .2443995    -2.73   0.009    -1.160853   -.1757432
       2014  |  -1.195798   .2443995    -4.89   0.000    -1.688353   -.7032432
       2015  |   -1.80302   .2443995    -7.38   0.000    -2.295575   -1.310465
       2016  |  -2.198437   .2443995    -9.00   0.000    -2.690992   -1.705882
       2017  |  -.1209722   .2443995    -0.49   0.623     -.613527    .3715825
       2018  |  -.1644444   .2443995    -0.67   0.505    -.6569992    .3281103
       2019  |          0  (omitted)
             |
       _cons |   7.765937   .1728165    44.94   0.000     7.417648    8.114226
-------------+----------------------------------------------------------------
     sigma_u |  1.2343438
     sigma_e |  .42331229
         rho |  .89476537   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(5, 44) = 85.02                      Prob > F = 0.0000

I have two questions:

How can I see the effect of the treatment across cities? Namely, how can I see the effect of the station opening on each of code == 9200, code == 7700, code == 874, and code == 240 separately?
How can I see the effect of the treatment across cities and over time? That is, I want to explore the effect of the station opening on code == 9200 in 2010, 2011, ..., 2019 and on code == 7700 in 2010, 2011, ..., 2019...

Many Thanks!

Last edited by Asaf Yancu; 04 Aug 2022, 13:26.

Maxence Morlet

Join Date: Mar 2021

Posts: 653
#2

04 Aug 2022, 13:29

You have a very small number of observations. As an aside, I would use vce(robust) as an option, although you may want to consult Stock and Watson (2006) on that matter.

treat drops because it is collinear with the unit fixed effects.
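
A minimal sketch of how both points might look in practice, assuming the -dataex- example above is loaded with treat, post, and treatXpost already defined. Because post is also collinear with the year dummies (which is why 2019.year was omitted in the output above), it can be dropped alongside treat without changing the treatXpost estimate. Note that with xtreg, fe the vce(robust) option is equivalent to clustering on the panel variable.

```stata
* Assumes the example data are in memory with treat, post, and treatXpost defined
xtset code year

* treat is absorbed by the city fixed effects and post by the year dummies,
* so only the interaction and the year dummies are listed
xtreg unemployed_rate treatXpost i.year, fe vce(robust)
```

With only six clusters in the posted subset, the cluster-robust standard errors should be read with caution.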

The way I see it, you have three options, although you are really constrained by the fact that there are only 60 observations (that really does not play in your favour).

- Within-between mixed effects model, described in McNeish and Kelley (2019)

- Sample splits (although to be fair I do not think you have sufficient information for this)

- Drop treatXpost and replace it with i.year#i.treat: this gives you the year-specific differential effect, relative to the base period, of treatment on the outcome.
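
The last option could be sketched as follows, assuming the example data above are in memory and using treat, the variable name from the original post. Setting 2016 as the base year via ib2016. is a choice made here for illustration, so that each interaction coefficient is measured relative to the last pre-opening year.

```stata
* Assumes the example data are in memory and treat is defined as in post #1
xtset code year

* ## adds the year main effects plus the year-by-treat interactions;
* the treat main effect is absorbed by the city fixed effects and is omitted
xtreg unemployed_rate ib2016.year##i.treat, fe vce(cluster code)

* Year-specific differentials for treated cities, relative to 2016
display _b[2017.year#1.treat]
display _b[2018.year#1.treat]
display _b[2019.year#1.treat]
```

The coefficients on the 2010 to 2015 interactions can serve as an informal check of whether treated and control cities moved together before the opening.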
Asaf Yancu

Join Date: May 2021

Posts: 38
#3

04 Aug 2022, 14:13

Thank you, Maxence.
I know my subset has only 60 observations, but as I mentioned above, it is just a subset of my full dataset.
For the effect across cities, I am looking for a more elegant approach than sample splits, i.e., a solution that uses code similar to your code for the year-specific differential effect.
Asaf Yancu

Join Date: May 2021

Posts: 38
#4

05 Aug 2022, 00:22

To clarify my intention: I ultimately want to graph the effect of the treatment on each city and across time.
I want to graph the effect of the station opening on code == 9200 in 2010, 2011, ..., 2019 and on code == 7700 in 2010, 2011, ..., 2019...
Thanks!
Maxence Morlet

Join Date: Mar 2021

Posts: 653
#5

05 Aug 2022, 00:34

OK so two solutions:

- A within-between mixed effects model as in McNeish and Kelley (2019)

- Instead of interacting treatment and post, interact a dummy for each city with a dummy for each year. I am not sure if this is identifiable or whether your model will just be fully saturated if you do this. Try it with pooled OLS first perhaps.
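
A rough sketch of the second idea, restricted so that the model is not saturated: only the treated cities get their own interactions, while the control cities identify the common year effects. It assumes the example data above are in memory. tcity is a hypothetical helper variable introduced here, and using 2016 as the base year is likewise an assumption.

```stata
* Assumes the example data are in memory with treat and treatXpost defined
xtset code year

* (1) One average post-opening effect per treated city.
*     For control cities treatXpost is always 0, so their terms are omitted.
xtreg unemployed_rate i.year i.code#c.treatXpost, fe

* (2) City-by-year effects for treated cities, relative to 2016.
*     tcity (hypothetical helper) is the city code if treated, 0 for controls
gen tcity = cond(treat == 1, code, 0)
xtreg unemployed_rate ib2016.year##i.tcity, fe
```

In (2) each treated city is fitted exactly, so every coefficient is the gap between that city's path and the average control path, and the standard errors are driven entirely by the control cities. The coefficients can then be collected and graphed by city and year.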