
  • Effect across panels and years using difference-in-differences on panel data

    Hello, all Statalists!
    I want to measure the effect of a rail station opening in a city on that city's unemployment rate.

    My dataset is a panel covering 2010-2019.
    I study rail stations that opened in four cities (codes 240, 874, 7700, 9200) on the same day in October 2016.
    The "code" variable is the zip code and "unemployed_rate" is the unemployment rate.

    Thus,
    Code:
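    * Define treatment: cities where a station opened in October 2016
    gen treat = inlist(code, 240, 874, 7700, 9200)
    
    * Define post-treatment period (first full year after the October 2016 opening);
    * the if-condition keeps a missing year from being coded as post == 1
    gen post = (year>2016) if !missing(year)
    
    * Define interaction term
    gen treatXpost = treat*post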
    I constructed a simple difference-in-differences model with year and city fixed effects:
    [Image: screenshot of the model equation, "Screen 2022-08-04 220340.png"]
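
    The screenshot itself is not recoverable here. A standard two-way fixed-effects specification consistent with the variables defined above would look roughly like this; the symbols are assumptions, not the original notation:

    ```latex
    \text{unemployed\_rate}_{ct} = \alpha_c + \lambda_t + \beta \,(\text{treat}_c \times \text{post}_t) + \varepsilon_{ct}
    ```

    Here $\alpha_c$ are city fixed effects, $\lambda_t$ are year fixed effects, and $\beta$ is the difference-in-differences effect of the station opening.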


    Here is a subset of my dataset:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double unemployed_rate float(treat post treatXpost) int(year code)
    5.1819999999999995 1 0 0 2010  240
     5.051666666666668 1 0 0 2011  240
     4.669999999999999 1 0 0 2012  240
    4.8933333333333335 1 0 0 2013  240
     4.400833333333335 1 0 0 2014  240
     3.750833333333333 1 0 0 2015  240
    3.5975000000000006 1 0 0 2016  240
     3.045833333333334 1 1 1 2017  240
    3.2591666666666663 1 1 1 2018  240
    3.4866666666666664 1 1 1 2019  240
     8.174166666666666 1 0 0 2010  874
                 7.675 1 0 0 2011  874
                7.6675 1 0 0 2012  874
     7.560833333333335 1 0 0 2013  874
     6.923333333333334 1 0 0 2014  874
     6.148333333333333 1 0 0 2015  874
     5.636666666666666 1 0 0 2016  874
    5.0441666666666665 1 1 1 2017  874
     5.184166666666667 1 1 1 2018  874
     5.230833333333332 1 1 1 2019  874
    6.3954545454545455 0 0 0 2010 2800
     6.121666666666667 0 0 0 2011 2800
     5.351666666666667 0 0 0 2012 2800
                 6.265 0 0 0 2013 2800
     6.279999999999999 0 0 0 2014 2800
     5.345833333333333 0 0 0 2015 2800
     5.319999999999999 0 0 0 2016 2800
     4.803333333333334 0 1 0 2017 2800
     4.550833333333334 0 1 0 2018 2800
                  4.64 0 1 0 2019 2800
                  9.03 0 0 0 2010 6700
     8.739999999999998 0 0 0 2011 6700
     8.588333333333333 0 0 0 2012 6700
     8.024166666666666 0 0 0 2013 6700
                7.1475 0 0 0 2014 6700
                 6.625 0 0 0 2015 6700
     6.050000000000001 0 0 0 2016 6700
     5.246666666666667 0 1 0 2017 6700
                 5.155 0 1 0 2018 6700
    5.3933333333333335 0 1 0 2019 6700
                 8.693 1 0 0 2010 7700
     8.064166666666669 1 0 0 2011 7700
                  7.32 1 0 0 2012 7700
     7.219166666666667 1 0 0 2013 7700
     6.536666666666667 1 0 0 2014 7700
    6.0841666666666665 1 0 0 2015 7700
     5.583333333333334 1 0 0 2016 7700
     4.894166666666666 1 1 1 2017 7700
     4.870833333333334 1 1 1 2018 7700
     5.370000000000002 1 1 1 2019 7700
     9.121000000000002 1 0 0 2010 9200
     8.931666666666667 1 0 0 2011 9200
                8.7475 1 0 0 2012 9200
     8.623333333333335 1 0 0 2013 9200
                8.1325 1 0 0 2014 9200
     7.823333333333333 1 0 0 2015 9200
     7.217499999999999 1 0 0 2016 9200
    6.1191666666666675 1 1 1 2017 9200
    5.8725000000000005 1 1 1 2018 9200
     5.758333333333335 1 1 1 2019 9200
    end
    Following my model specification, I ran this fixed-effects regression:
    Code:
    . xtset code year
    
    Panel variable: code (strongly balanced)
     Time variable: year, 2010 to 2019
             Delta: 1 unit
    
    . xtreg unemployed_rate treat post treatXpost i.year, fe
    note: treat omitted because of collinearity.
    note: 2019.year omitted because of collinearity.
    
    Fixed-effects (within) regression               Number of obs     =         60
    Group variable: code                            Number of groups  =          6
    
    R-squared:                                      Obs per group:
         Within  = 0.8959                                         min =         10
         Between = 0.0007                                         avg =       10.0
         Overall = 0.4467                                         max =         10
    
                                                    F(10,44)          =      37.86
    corr(u_i, Xb) = 0.0002                          Prob > F          =     0.0000
    
    ------------------------------------------------------------------------------
    unemployed~e | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           treat |          0  (omitted)
            post |  -2.733078   .2969419    -9.20   0.000    -3.331525   -2.134631
      treatXpost |  -.0794973   .2529775    -0.31   0.755    -.5893399    .4303453
                 |
            year |
           2011  |  -.3352424   .2443995    -1.37   0.177    -.8277972    .1573123
           2012  |  -.7084369   .2443995    -2.90   0.006    -1.200992   -.2158821
           2013  |   -.668298   .2443995    -2.73   0.009    -1.160853   -.1757432
           2014  |  -1.195798   .2443995    -4.89   0.000    -1.688353   -.7032432
           2015  |   -1.80302   .2443995    -7.38   0.000    -2.295575   -1.310465
           2016  |  -2.198437   .2443995    -9.00   0.000    -2.690992   -1.705882
           2017  |  -.1209722   .2443995    -0.49   0.623     -.613527    .3715825
           2018  |  -.1644444   .2443995    -0.67   0.505    -.6569992    .3281103
           2019  |          0  (omitted)
                 |
           _cons |   7.765937   .1728165    44.94   0.000     7.417648    8.114226
    -------------+----------------------------------------------------------------
         sigma_u |  1.2343438
         sigma_e |  .42331229
             rho |  .89476537   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(5, 44) = 85.02                      Prob > F = 0.0000
    I have two questions:
    1. How can I see the effect of the treatment for each city? Namely, how can I see the effect of the station opening on (code == 9200), (code == 7700), (code == 874), and (code == 240) separately?
    2. How can I see the effect of the treatment across cities and over time? That is, I want to explore the effect of the station opening on code == 9200 in 2010, 2011, ..., 2019 and on code == 7700 in 2010, 2011, ..., 2019...
    Many thanks!
    Last edited by Asaf Yancu; 04 Aug 2022, 13:26.

  • #2
    You have a very small number of observations. As an aside, I would use
    Code:
    vce(robust)
    as an option, although you may want to consult Stock and Watson (2006) on that matter.

    treat is dropped because it is time-invariant and therefore collinear with the unit fixed effects.
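
    Both treat and post are absorbed here: treat by the city fixed effects, and post by the year dummies, which is why Stata omitted 2019.year in the output in #1. The same treatXpost point estimate can therefore be obtained from a leaner command. A minimal sketch, assuming the variables from #1 are in memory:

    ```stata
    * Two-way fixed-effects DiD: city FE via -fe-, year FE via i.year
    xtset code year
    xtreg unemployed_rate treatXpost i.year, fe vce(robust)
    ```

    After xtreg, fe, specifying vce(robust) is equivalent to clustering on the panel variable (code). With only a handful of cities, those standard errors should be read with caution.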

    The way I see it, you have three options, although you are really constrained by the fact that there are only 60 observations (that really does not play in your favour).

    - Within-between mixed effects model, described in McNeish and Kelley (2019)

    - Sample splits (although to be fair I do not think you have sufficient information for this)

    - Drop
    Code:
    treatXpost
    and replace it with
    Code:
    i.year#i.treat
    : this gives you the year-specific differential effect, relative to the base period, of treatment on the outcome.
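
    A minimal sketch of this third option, assuming the variable names from #1 (the treatment dummy there is called treat) and taking 2016, the last pre-opening year, as the base period; that base-year choice is an assumption:

    ```stata
    * Event-study style DiD: year-specific treated-vs-control gaps, relative to 2016
    xtset code year
    xtreg unemployed_rate ib2016.year##i.treat, fe vce(robust)

    * Optional plot of the year#treat coefficients.
    * coefplot is community-contributed: ssc install coefplot
    coefplot, keep(*.year#1.treat) vertical yline(0)
    ```

    The main effect of treat is still omitted because of the city fixed effects. The coefficients on year#1.treat for 2010-2015 give a rough visual check of pre-trends, and those for 2017-2019 are the year-specific effects.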



    • #3
      Thank you Maxence.
      I know the example has only 60 observations, but as I mentioned above, it is a subset of my full dataset.
      For the effect across cities, I am looking for a more elegant way than sample splits, i.e., a solution that makes use of code like yours for the year-specific differential effect.



      • #4
        To clarify my intention: I ultimately want to graph the effect of the treatment on each city and across time.
        I want to graph the effect of the station opening on code == 9200 in 2010, 2011, ..., 2019 and on code == 7700 in 2010, 2011, ..., 2019...
        Thanks!



        • #5
          OK so two solutions:

          - A within-between mixed effects model as in McNeish and Kelley (2019)

          - Instead of interacting treatment and post, interact a dummy for each city with a dummy for each year. I am not sure if this is identifiable or whether your model will just be fully saturated if you do this. Try it with pooled OLS first perhaps.
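
          A minimal sketch of the second option, assuming the variables from #1. tcode is a hypothetical helper that equals the city code for treated cities and 0 for all control cities, and 2016 is an assumed base year:

          ```stata
          xtset code year

          * Question 1: one post-opening effect per treated city
          * (the terms for control cities are all zero and will be omitted)
          xtreg unemployed_rate i.year i.code#c.treatXpost, fe

          * Question 2: city-by-year effects for the treated cities, relative to 2016
          gen tcode = code*treat
          xtreg unemployed_rate ib2016.year##ib0.tcode, fe
          ```

          In the second regression each treated city gets its own full set of year coefficients, so the model fits those cities exactly. The estimates are simply each treated city's deviation from the average control-city path, and inference on them rests entirely on the control cities.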

