Diff-in-diff collinearity problem with leads and lags

Sebastian Andersson

Join Date: Mar 2019
Posts: 4

21 Mar 2019, 08:54

Dear Statalist users,

This is my first post so I hope I've done everything according to Statalist practice. If not, please point out any wrong-doings so that I may improve for future posts. Thanks!

I'm currently writing my master's thesis in economics, where I'm going to use a difference-in-differences (DID) strategy. I use export data from 2016M1-2016M12 for a treatment group and 195 control groups. Treatment occurs in month 7.

I have primarily followed this guide http://www.princeton.edu/~otorres/DID101.pdf but I ran into a collinearity problem when I tried to test the parallel trends assumption and estimate the dynamic effects in the same regression (it works fine when I include the leads and the lags separately). As I have 195 control groups, my intention was to use leads and lags instead, as recommended by https://stats.stackexchange.com/ques...mon-trend-betw and as done by Autor (2003) (http://economics.mit.edu/files/589, p. 24 for the regression output and p. 26 for the graph).

However, when I do so, my leads and lags are omitted due to collinearity. This is my code:

Code:

*Make data into a timeseries
xtset Country Month, monthly

*Create lnValue
gen lnValue = ln(Value)

*Create DID components
*1. Timedummy
gen time = (Month>=7) & !missing(Month)

*2. Treatmentdummy
gen treated = (Country==165) & !missing(Country)

*3. Create interaction
gen did = time*treated

*Create leads for parallel trends assumption
gen did1=f.did
gen did2=ff.did
gen did3=fff.did
gen did4=ffff.did
gen did5=fffff.did

*Treatment lags for dynamic effects
gen did11=l.did
gen did22=ll.did
gen did33=lll.did
gen did44=llll.did
gen did55=lllll.did

reg lnValue time treated did1 did2 did3 did4 did5 did did11 did22 did33 did44 did55 i.Country i.Month, cluster(Country)

which results in

Code:

                           (Std. Err. adjusted for 197 clusters in Country)
------------------------------------------------------------------------------
             |               Robust
     lnValue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |   -.105447   .1623735    -0.65   0.517    -.4256704    .2147765
     treated |    7.64953   .0811867    94.22   0.000     7.489418    7.809641
        did1 |          0  (omitted)
        did2 |          0  (omitted)
        did3 |          0  (omitted)
        did4 |          0  (omitted)
        did5 |          0  (omitted)
         did |   .0272714   .1623735     0.17   0.867     -.292952    .3474949
       did11 |          0  (omitted)
       did22 |          0  (omitted)
       did33 |          0  (omitted)
       did44 |          0  (omitted)
       did55 |          0  (omitted)
          .
          .
          /Tons of FE estimates/
          .
          .
       _cons |   7.953271   .0811867    97.96   0.000     7.793159    8.113382

I noticed that Autor (2003) doesn't include a constant, but the problem remains when I apply noconstant. It also remains when I remove the time and country fixed effects.

Any help would be very much appreciated, thanks!

//Sebastian


Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

21 Mar 2019, 12:53

You can't have 11 lags and leads (including the 0 lag = original variable) in a data set where there are only 12 time periods. Remember that when you include, for example, three lags and three leads in the data set, you lose the first and last three observations for each country because there is not enough data forward or behind to calculate lags or leads for those observations. So as you add more lags and leads, you whittle down your sample. By the time you get up to 5 lags and leads, only months 6 and 7 remain in the estimation sample. At that point, if you know time and treated, you already can calculate both did and all of its lags and leads for those two months, hence the collinearity. You are simply attempting to do the impossible here. Either expand the time range at both ends, or drop down to just one or two lags and leads.

By the way, you don't need to explicitly create those lag and lead variables. You can save yourself typing time (and opportunities for errors) by taking advantage of Stata's time series operators:

Code:

tsset country month
regress lnvalue time treated L(-5/5).did // etc.

will automatically incorporate all those lags and leads, as well as the original did. See -help tsvarlist- for details.
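
The sample-loss argument above can be checked on a toy panel. This is only a minimal sketch on simulated data: the three-country panel and the variable name complete are assumptions made for illustration, not the original poster's data.

```stata
* Hypothetical toy panel: 3 countries x 12 months (illustration only)
clear
set obs 36
gen country = ceil(_n/12)
bysort country: gen month = _n
tsset country month

* Same DID components as in the original post; country 1 plays the treated unit
gen time    = month >= 7
gen treated = country == 1
gen did     = time*treated

* Flag observations where did and all five leads and lags are nonmissing
gen byte complete = !missing(did)
forvalues k = 1/5 {
    replace complete = 0 if missing(L`k'.did) | missing(F`k'.did)
}

* Only months 6 and 7 have a full set of leads and lags
assert inlist(month, 6, 7) if complete
tabulate month if complete
```

With only those two months left per country, time, treated, did and all of its leads and lags carry no independent variation, which is why Stata omits them.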
Sebastian Andersson

Join Date: Mar 2019

Posts: 4
#3

22 Mar 2019, 01:53

Thank you Clyde!

Best regards,
Sebastian
Sebastian Andersson

Join Date: Mar 2019

Posts: 4
#4

15 Apr 2019, 00:42

Originally posted by Clyde Schechter

You can't have 11 lags and leads (including the 0 lag = original variable) in a data set where there are only 12 time periods...

Hi again,

I continue with another question, though relevant to the same problem...

I've extended my dataset to run from 2016m1 to 2017m12 and also added product data to the panel. As before, I want to evaluate the dynamic effects of treatment, but this time on a half-year basis, so that I would have a lag for the 6th and the 12th period after treatment. However, these lags get omitted for the reasons Clyde mentioned above. My questions are then:

a) Is adding more observations my only option, or are there other ways of doing what I want without using the DID lags?

b) If I add more observations, won't the estimates become less precise, as they are based on observations further away from the exogenous treatment?

For those interested, my code is now

Code:

*Make data into timeseries
egen id = group(country product)
tsset id month

*Create lnvalue
gen lnvalue = ln(value)

*Create DID components
gen time = (month>=7) & !missing(month)
gen treated = (country==25) & !missing(country)
gen did = time*treated

*Regress
reg lnvalue time treated did L(6).did L(12).did i.country i.month i.product, r cluster(country)

Thanks,
Sebastian
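
On question a), one possibility (not proposed by anyone in the thread, so treat it as a hedged sketch) is to build the 6- and 12-month indicators directly from month instead of with lag operators. Because treatment starts in month 7, the 6-month lag of did equals treated*(month>=13) and the 12-month lag equals treated*(month>=19), but the hand-built versions are 0 rather than missing in the early months, so no observations are dropped. The toy panel and the names did_l6 and did_l12 are assumptions for illustration.

```stata
* Hypothetical toy panel: 3 countries x 24 months (illustration only)
clear
set obs 72
gen country = ceil(_n/24)
bysort country: gen month = _n
tsset country month

gen time    = month >= 7
gen treated = country == 1
gen did     = time*treated

* Indicators built from month directly (treatment starts in month 7)
gen did_l6  = treated*(month >= 13)
gen did_l12 = treated*(month >= 19)

* They agree with the lag operators wherever the lags are defined
assert did_l6  == L6.did  if !missing(L6.did)
assert did_l12 == L12.did if !missing(L12.did)

* The operator version is missing in each country's first 12 months;
* the hand-built version is never missing
count if missing(L12.did)
count if missing(did_l12)
```

Whether did, did_l6 and did_l12 are then separately identified still depends on the fixed effects included in the regression, so this addresses only the lost observations, not collinearity in general.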
Michael Duarte

Join Date: Feb 2020

Posts: 6
#5

28 Mar 2020, 03:28

Dear Statalist users,

I hope you are doing well. First of all, this is my second post on Statalist, so I apologize if this is not the correct way to do it.

I need your help. I am writing a master's thesis that involves multiple time periods and multiple groups. Basically, I want to study a change in public transport fare policy in some Swiss states, taking other states as controls.

My set-up is pretty similar to the one presented by David Autor (2003). This is why I am trying to replicate this figure from his paper, in order to check my DiD assumption.

I am trying to replicate this table so that I can then apply it to my master's thesis. Does any of you know how I can replicate this figure in Stata?

I have tried several approaches, including "coefplot", but since yesterday I have been unable to reproduce this figure.

I really hope that one of you can help me. Thank you so much in advance for any help.

Best,

Michael Duarte