
  • Difference in Differences with "fake" treatment?

    Dear folks,

    I'm trying to use DiD to analyze the impact of a legislative policy on large firms. The time variable (post_y2000) is straightforward: it equals 1 if year > 2000 and 0 otherwise. For the treatment variable, I treat any firm with total assets of at least 1 billion as LARGE (=1). As you can tell from this design, the same firm could be labelled LARGE (=1) or not before/after the policy shock, depending on how its assets change over time. I have an unbalanced panel spanning 1996-2015 with 40,000+ firm-year observations (5,000+ firms).

    Option 1:
    Code:
    xtreg outcome i1.LARGE#i1.post_y2000 i.year, fe vce(cluster firm_id)
    Option 2:
    Code:
    xtreg outcome i1.LARGE##i.post_y2000 i.year, fe vce(cluster firm_id)
    Note that for the second specification, Stata generates coefficients for LARGE, post_y2000, and LARGE#post_y2000. I suspect this is due to the fact that (1) the treated variable is not constant over time even for the same firm (therefore I am getting the main effect of LARGE); and (2) Stata omits one year to estimate the main effect of post_y2000 because of the sequence of variables.

    My main questions are:
    1. Can anyone tell me which of the above specifications is correct if I want to adopt a DiD strategy?
    2. Do I have a valid identification strategy?

    Thanks!
    Last edited by Cooper Felix; 13 Aug 2018, 20:42.

  • #2
    Well, since your firms can migrate in and out of the treated group, you can't use a classical DID strategy. What you are using is a similar technique known as generalized DID. The estimated treatment effect is still the coefficient of the interaction term.

    As for your two model specifications, if LARGE and post_y2000 are both coded 0/1, then I don't see any difference between them. If you are actually getting different outputs from these two, I would need to see the actual results to try to understand what is going on.

    Do I have a valid identification strategy?
    Well, there are lots of issues even with classical DID, despite its widespread use and acceptance. But suffice it to say that your modeling approach is within the realm of generalized DID and partakes of whatever validity one can ascribe to DID as an identification strategy in any case. To support the validity of your identification strategy, you will probably want to do some robustness checks, and also look into parallel trends to the extent that this design permits (which is somewhat limited).

    May I suggest you take a look at https://www.ipr.northwestern.edu/wor.../Day%204.2.pdf.



    • #3
      Originally posted by Clyde Schechter View Post
      As for your two model specifications, if LARGE and post_y2000 are both coded 0/1, then I don't see any difference between them. If you are actually getting different outputs from these two, I would need to see the actual results to try to understand what is going on.

      May I suggest you take a look at https://www.ipr.northwestern.edu/wor.../Day%204.2.pdf.
      Clyde,

      Thanks for your quick response and suggestions! I appreciate that. Here is the output I got after estimating both options with another policy shock (Y1997-2015, Time=(year>=2009)). As you can tell from the results, the treatment effects are significant in both cases but slightly different from each other.

      Code:
      . xtreg Debt i1.LARGE#i1.Time i.fyear ,fe vce(cluster id)
      
      Fixed-effects (within) regression               Number of obs      =     51774
      Group variable: gvkey                           Number of groups   =      6724
      
      R-sq:  within  = 0.0122                         Obs per group: min =         1
             between = 0.0014                                        avg =       7.7
             overall = 0.0010                                        max =        19
      
                                                      F(19,6723)         =     14.34
      corr(u_i, Xb)  = -0.0534                        Prob > F           =    0.0000
      
                                       (Std. Err. adjusted for 6,724 clusters in id)
      ------------------------------------------------------------------------------
                   |               Robust
              Debt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
        LARGE#Time |
              1 1  |  -.1405809   .0600679    -2.34   0.019    -.2583329   -.0228288
                   |
             fyear |
             1998  |  -.0045652   .0581103    -0.08   0.937    -.1184798    .1093494
             1999  |   .0201603   .0611493     0.33   0.742    -.0997116    .1400323
             2000  |   .0944122   .0648395     1.46   0.145    -.0326937    .2215181
             2001  |  -.1613457   .0670758    -2.41   0.016    -.2928355   -.0298558
             2002  |  -.3377028   .0651349    -5.18   0.000    -.4653879   -.2100178
             2003  |    -.53123   .0636486    -8.35   0.000    -.6560015   -.4064584
             2004  |   -.413719   .0640347    -6.46   0.000    -.5392473   -.2881908
             2005  |  -.4172214   .0657385    -6.35   0.000    -.5460898    -.288353
             2006  |  -.3443016   .0662505    -5.20   0.000    -.4741736   -.2144297
             2007  |  -.3554322    .067528    -5.26   0.000    -.4878085   -.2230559
             2008  |  -.2934677   .0708821    -4.14   0.000     -.432419   -.1545163
             2009  |  -.4966254    .069738    -7.12   0.000     -.633334   -.3599168
             2010  |  -.4961773   .0693712    -7.15   0.000    -.6321667   -.3601879
             2011  |  -.5371058   .0696556    -7.71   0.000    -.6736528   -.4005588
             2012  |  -.4678401    .071422    -6.55   0.000    -.6078499   -.3278303
             2013  |  -.5071272   .0704326    -7.20   0.000    -.6451973    -.369057
             2014  |  -.5179306   .0716024    -7.23   0.000    -.6582941   -.3775671
             2015  |  -.5630094   .0711292    -7.92   0.000    -.7024452   -.4235736
                   |
             _cons |   6.063471   .0545309   111.19   0.000     5.956573    6.170369
      -------------+----------------------------------------------------------------
           sigma_u |  2.6638949
           sigma_e |  1.6920924
               rho |  .71251839   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      
      . xtreg Debt i1.LARGE##i1.Time i.fyear ,fe vce(cluster id)
      note: 2009.fyear omitted because of collinearity
      
      Fixed-effects (within) regression               Number of obs      =     51774
      Group variable: gvkey                           Number of groups   =      6724
      
      R-sq:  within  = 0.0128                         Obs per group: min =         1
             between = 0.0129                                        avg =       7.7
             overall = 0.0214                                        max =        19
      
                                                      F(20,6723)         =     14.40
      corr(u_i, Xb)  = 0.0774                         Prob > F           =    0.0000
      
                                       (Std. Err. adjusted for 6,724 clusters in id)
      ------------------------------------------------------------------------------
                   |               Robust
              Debt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
           1.LARGE |   .2169561   .0512187     4.24   0.000     .1165512    .3173611
            1.Time |  -.4608997   .0704945    -6.54   0.000    -.5990913   -.3227081
                   |
        LARGE#Time |
              1 1  |  -.1920466   .0631816    -3.04   0.002    -.3159026   -.0681906
                   |
             fyear |
             1998  |  -.0003771   .0581252    -0.01   0.995    -.1143208    .1135666
             1999  |   .0248761   .0611335     0.41   0.684    -.0949649    .1447171
             2000  |   .1039183   .0649047     1.60   0.109    -.0233154    .2311521
             2001  |  -.1549608   .0671097    -2.31   0.021    -.2865172   -.0234044
             2002  |  -.3302322   .0651262    -5.07   0.000    -.4579002   -.2025641
             2003  |  -.5225682   .0636821    -8.21   0.000    -.6474052   -.3977311
             2004  |   -.401639   .0640964    -6.27   0.000    -.5272883   -.2759896
             2005  |  -.4022637   .0658544    -6.11   0.000    -.5313593   -.2731682
             2006  |  -.3268594   .0663555    -4.93   0.000    -.4569372   -.1967815
             2007  |  -.3375771   .0676024    -4.99   0.000    -.4700992    -.205055
             2008  |  -.2743114   .0708727    -3.87   0.000    -.4132444   -.1353783
             2009  |          0  (omitted)
             2010  |   .0020116   .0366674     0.05   0.956    -.0698681    .0738912
             2011  |  -.0376444   .0366098    -1.03   0.304    -.1094112    .0341225
             2012  |    .031096   .0394778     0.79   0.431     -.046293    .1084851
             2013  |  -.0066876   .0406765    -0.16   0.869    -.0864264    .0730513
             2014  |  -.0163358   .0403868    -0.40   0.686    -.0955068    .0628352
             2015  |  -.0605946   .0425875    -1.42   0.155    -.1440795    .0228903
                   |
             _cons |   5.965849   .0591461   100.87   0.000     5.849904    6.081794
      -------------+----------------------------------------------------------------
           sigma_u |  2.6316314
           sigma_e |   1.691632
               rho |  .70761328   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------



      • #4
        Sorry for my lousy eyes. Yes, of course they are different: the difference between # and ##. I have formed such a habit of always using the ## notation unless there is a strong reason to do otherwise (which, as it happens, there is in this case), that I sometimes don't notice that somebody has used #.

        Now, you have here introduced a variable, time, that was not mentioned in your original post, so I don't quite know what to make of that. Assuming here that the intervention you are assessing was applied only in year 2000 and beyond, the post_y2000 variable is what you needed. Perhaps time is the same variable? If not, what is it?

        On the assumption that your time variable is the same as post_y2000, then the correct specification of the generalized DID model is the first one, the one that has only a single #. The first key ingredient to the generalized DID model is a variable that takes on the value 1 in precisely those observations where the gvkey is actually receiving the intervention that year, and 0 elsewhere. So this would be precisely i.LARGE#i.post_y2000. The second key ingredient is gvkey fixed effects, which -xtreg- gives you "for free." And the third key ingredient is time fixed effects, which i.fyear gives you. So voilà! The first model has it all (assuming time is the same thing as post_y2000).



        • #5
          Originally posted by Clyde Schechter View Post
          On the assumption that your time variable is the same as post_y2000, then the correct specification of the generalized DID model is the first one, the one that has only a single #. The first key ingredient to the generalized DID model is a variable that takes on the value 1 in precisely those observations where the gvkey is actually receiving the intervention that year, and 0 elsewhere. So this would be precisely i.LARGE#i.post_y2000. The second key ingredient is gvkey fixed effects, which -xtreg- gives you "for free." And the third key ingredient is time fixed effects, which i.fyear gives you. So voilà! The first model has it all (assuming time is the same thing as post_y2000).
          Yes, indeed, the variable of Time is the same as post_y2000 (I changed the name for simplicity, sorry about the confusion). One last question, if the first option (with a single #) reflects a generalized DID design, then what does the second option (with double pound signs) mean - how should I interpret the interaction term from the second option?



          • #6
            The second model is just mis-specified. It's what you would use if you had a classic DID design, but you don't, so you can't use it. It has no interpretation.



            • #7
              Originally posted by Clyde Schechter View Post
              The second model is just mis-specified. It's what you would use if you had a classic DID design, but you don't, so you can't use it. It has no interpretation.
              I see, thanks Clyde for your patience and tremendous help!



              • #8
                Originally posted by Clyde Schechter View Post
                The second model is just mis-specified. It's what you would use if you had a classic DID design, but you don't, so you can't use it. It has no interpretation.
                Clyde,

                Instead of using a time variable (i.e., post_y2000), can I use the following specification instead? Does it still reflect the idea of generalized DID? It seems to me the following model gives me a much more detailed look, but I wanted to make sure I did this correctly. Thanks again.

                Code:
                 xtreg outcome i.LARGE##i.year, fe vce(cluster firm_id)



                • #9
                  No. That will generate an i.LARGE term which is not wanted in this situation. You can use
                  Code:
                  xtreg outcome i.LARGE#i.year i.year, fe vce(cluster firm_id)
                  It's true that this gives you a more fine-grained look at what is going on. But be careful what you wish for. With a large number of years, you will be looking at a large number of interaction terms that you then have to make sense of. Make sure the added detail is worth the trouble.



                  • #10
                    Originally posted by Clyde Schechter View Post
                    No. That will generate an i.LARGE term which is not wanted in this situation. You can use
                    Code:
                    xtreg outcome i.LARGE#i.year i.year, fe vce(cluster firm_id)
                    It's true that this gives you a more fine-grained look at what is going on. But be careful what you wish for. With a large number of years, you will be looking at a large number of interaction terms that you then have to make sense of. Make sure the added detail is worth the trouble.
                    Got it, thanks for the advice.



                    • #11
                      Clyde, thanks again for the suggestion. After reading the slides you mentioned, I'm still confused about the following parts:

                      1. Page 34 "Simple Extensions" - What are 'placebo laws' and how do I fit models according to these laws?

                      2. Page 36 "Phased in effect using lags" - According to my understanding, I should fit a model with the following code, right (assuming the policy shock was in y2000, and I have panel data from 1997 to 2003)?
                      Code:
                      tab year, g(y)
                      xtreg outcome i1.treated#i1.y2000 i1.treated#i1.y1999 i1.treated#i1.y1998 i1.treated#i1.y1997 i.year, fe vce(cluster firm_id)
                      3. Page 38 "Testing for pre-treatment trends" - Similar to Q2, I guess I just need to specify the following in Stata, right?
                      Code:
                      xtreg outcome i1.treated#i1.y2000 i1.treated#i1.y2001 i1.treated#i1.y2002 i1.treated#i1.y2003 i.year, fe vce(cluster firm_id)
                      4. Can you please let me know what the major purposes are for doing Q2 and Q3?

                      5. Page 42 - I read the paper cited in the slides (Autor 2003), but I'm not sure I completely understand the meaning of plotting Fig. 3 (Page 42). Also, I'm not familiar with graphing in Stata, can you kindly point out which commands I should consider to replicate Fig. 3?



                      • #12
                        One of the weaknesses of identification of causal effect in DID (generalized or classical) is that what you really identify in the model is that something changed at some point(s) in time, and it did so differently for those exposed to the intervention compared to those not so exposed. The point(s) in time are, in particular, the time at which the intervention was initiated. But it is also possible that other things were going on at the same time, and it is somewhat coincidental that the interventions were initiated then as well. Or, it may be worse than coincidental: the interventions may have been implemented precisely in response to an ongoing change in the very outcome. (Governments may adopt stricter law enforcement practices in response to rising crime, for example.) So a finding from a DID analysis can be tripped up by these things. To strengthen the attribution of the change in outcomes to the intervention, as opposed to other things going on at or near the same times, one can revise the analysis using different times than the real intervention times (typically ones fairly close to the real intervention times)--and you should see the estimated effect shrink or disappear when you do that. If you do not see that happen, then your DID analysis may have stumbled on a coincidence or on reverse causality. This type of additional analysis is sometimes called a "placebo test."
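                        A placebo test along these lines might look like the following sketch (a sketch only: the variable names outcome, LARGE, year, and firm_id come from earlier posts, and the 1998 placebo date is an arbitrary assumption):

                        ```stata
                        * Placebo test sketch: pretend the policy took effect in 1998,
                        * two years before the real 2000 shock, and estimate only on
                        * pre-intervention years. A significant "effect" here would
                        * cast doubt on the causal interpretation of the real estimate.
                        gen placebo_post = (year >= 1998)
                        xtreg outcome i.LARGE#i.placebo_post i.year if year <= 2000, fe vce(cluster firm_id)
                        ```

                        If the interaction coefficient in this placebo regression is close to zero and insignificant, that supports the design; if it is sizable, the main estimate may be picking up a pre-existing trend or reverse causality.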

                        For your second question, these are secondary analyses that may or may not be needed depending on the nature of the intervention under study and the nature of the response the intervention is expected to produce. If an intervention is not likely to produce an immediate change in the outcome, but rather a delayed or gradually mounting change in the outcome, then the regular DID (generalized or classical) analysis may fail to detect anything. But using the lagged versions of the time variable, you can pick them up. So, let's say the expectation is that after the intervention it will take 2 years for an effect to be noticeable. Then if an intervention begins in 2006, testing for a change in outcome in 2006 will find nothing, but testing for a change in 2008 will. So using the actual outcomes at a given time together with lagged treatment variables will produce a model that is more in line with the expected process: the 2008 outcome will be matched with the 2006 (start of intervention) treatment status. Whether one should do this modeling with a single or several lagged variables depends on whether the delayed response is pretty much regular (always 2 years--use a single second-lag variable) or varies among entities (you might need to use several lagged variables to capture the different possibilities).

                        For your third question, this is another kind of validity check. If an outcome is, say, drifting gradually upward over time in some group of entities, and if we look for a change in outcome at a point in time, which coincides with the start of an intervention, that lies in the middle of that period of upward drift, we will find a change. But that is a spurious finding: the change was already in progress before the intervention. By using lead treatment variables, you can determine whether in fact there were already emerging changes in the outcome prior to the actual intervention dates. Again, whether one or several leads are needed depends on whether the extent of pre-intervention trend is limited to one time period or extends back over a longer period.

                        For both your second and third questions, the code you suggest is not correct. Remember the T in the linked slide set is the "under treatment" variable, not time. (Time is denoted by lower case t subscripts in that slide set.) So you need to use lagged and lead versions of the interaction term LARGE#pre_post or LARGE#time. I think the best way to do this and be sure of getting it right is to actually create a separate interaction variable: -gen under_treatment = LARGE*pre_post-. Then -xtset gvkey year-. Then, for lags, do -xtreg outcome L1.under_treatment i.year, fe-. If you need several lags you can include L1.under_treatment, L2.under_treatment, etc. See -help tsvarlist- for more details about lag and lead operators. (FWIW, you can actually combine the lag and lead operators with the factor variable notation instead of first creating a separate under_treatment variable--but I think it gets difficult to read and is not intuitive, so I advise against it.)
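                        To make the lag and lead mechanics concrete, here is a hedged sketch (assuming a 0/1 under_treatment variable built from LARGE and post_y2000 as described above, a gvkey/year panel, and two lags/leads chosen purely for illustration):

                        ```stata
                        * Create the treatment-status variable and declare the panel.
                        gen under_treatment = LARGE * post_y2000
                        xtset gvkey year
                        * Phased-in effects: contemporaneous treatment plus two lags.
                        xtreg outcome L(0/2).under_treatment i.year, fe vce(cluster gvkey)
                        * Pre-trend check: lead terms should be near zero if there were
                        * no emerging changes in the outcome before the intervention.
                        xtreg outcome F(1/2).under_treatment under_treatment i.year, fe vce(cluster gvkey)
                        ```

                        The L(0/2). and F(1/2). operators are the time-series lag/lead notation documented in -help tsvarlist-; how many lags or leads to include depends on the expected response pattern, as discussed above.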

                        The figure you refer to is a good way to visualize the data prior to (or after) analysis. You calculate a new variable, number of years since intervention, by subtracting the time the intervention began for a given gvkey from the current time in the observation. (This will be negative for observations preceding the intervention.) Then you can average the outcomes for each value of this time since intervention variable and then just do a regular connected line plot of the average outcomes against time since intervention. What you would ideally see is some trend prior to intervention that then makes a sharp bend precisely at the time of intervention and continues on its new course from then on.

                        Stata graphics is a huge topic and cannot be effectively taught in a Forum post. (It is also one that I don't really know in depth.) But for the particular graph in question, the central points and lines connecting them would come from a -graph twoway connect- plot, and the error bars around them would be obtained by overlaying an -rcap- plot on top of that.
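                        A minimal sketch of such a plot, assuming for simplicity a single common intervention year of 2000 (with staggered adoption you would instead subtract each gvkey's own start year):

                        ```stata
                        * Event-time variable: years relative to the intervention.
                        gen event_time = year - 2000
                        * Mean outcome and its standard error at each event time.
                        collapse (mean) mean_out=outcome (semean) se_out=outcome, by(event_time)
                        gen hi = mean_out + 1.96*se_out
                        gen lo = mean_out - 1.96*se_out
                        * Error bars (rcap) overlaid with the connected means;
                        * xline(0) marks the intervention date.
                        twoway (rcap hi lo event_time) (connected mean_out event_time), xline(0)
                        ```

                        Note that -collapse- replaces the data in memory, so run this on a copy or with -preserve-/-restore- around it.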



                        • #13
                          Originally posted by Clyde Schechter View Post
                          For your second question, these are secondary analyses that may or may not be needed depending on the nature of the intervention under study and the nature of the response the intervention is expected to produce. If an intervention is not likely to produce an immediate change in the outcome, but rather a delayed or gradually mounting change in the outcome, then the regular DID (generalized or classical) analysis may fail to detect anything. But using the lagged versions of the time variable, you can pick them up. So, let's say the expectation is that after the intervention it will take 2 years for an effect to be noticeable. Then if an intervention begins in 2006, testing for a change in outcome in 2006 will find nothing, but testing for a change in 2008 will. So using the actual outcomes at a given time together with lagged treatment variables will produce a model that is more in line with the expected process: the 2008 outcome will be matched with the 2006 (start of intervention) treatment status. Whether one should do this modeling with a single or several lagged variables depends on whether the delayed response is pretty much regular (always 2 years--use a single second-lag variable) or varies among entities (you might need to use several lagged variables to capture the different possibilities).
                          Clyde,

                          I really appreciate your detailed response, which is extremely helpful! To make sure I fully understand what you suggested, I wanted to show you the models I am going to estimate (assuming there was a policy shock in 2006). Can you let me know which one is correct if I want to detect whether there is a phased-in effect? In order not to lose too many observations with the unbalanced panel, I only included one lag.
                          Code:
                          gen treated = (LARGE==1)
                          gen post2006 = (year>=2006)
                          gen under_treatment = treated*post2006
                          xtset gvkey year
                          Option A: xtreg outcome     L1.under_treatment i.year, fe
                          Option B: xtreg outcome     L(0/1).under_treatment i.year, fe
                          Option C: xtreg f.outcome   under_treatment i.year, fe



                          • #14
                            If by a phased-in effect you mean that there is some effect in 2006 and perhaps a greater one in 2007, it would be Option B.



                            • #15
                              Originally posted by Clyde Schechter View Post
                              If by a phased-in effect, you mean that there is some effect in 2006 and perhaps a greater one in 2007, it would be Option B.
                               Thanks so much Clyde, I learned a lot from you! It was very nice of you to spend the time to answer my questions!

