How to calculate Difference before and after treatment for panel data? [Difference in Differences]

Tahseen Hasan

Join Date: Feb 2018

Posts: 33
#1

06 Jan 2019, 14:15

Hi everyone,

I have panel data on which I am trying to perform DID. The difficulty I am facing is that my treatment is applied at multiple time periods.

As a result, I can't compare Year X (post-treatment) - Year Y (pre-treatment) as my treatment does not occur at a set year.

I was hoping someone could please help me with the code for this example dataset I created.

Question: In the following dataset, how can I calculate the difference between post-treatment Income and pre-treatment Income, so that I can use it in my DID model?

Information: Using propensity score matching (psmatch2), and controlling for Asset, I want to perform a Difference-in-Differences analysis to see how mergers (indicated by ymerge) affect a firm's Income. The treatment group consists of the firm-years that experience a merger; the control group consists of the matched firms. As a rule, the merger year recorded in ymerge is always the year after the row's year. I am interested in seeing how Income is affected in the actual year of the merger ( ymerge[_n+0] OR year[_n+1] ) and 1 year after the merger takes place ( ymerge[_n+1] OR year[_n+2] ).

What I need help with: For my dataset, I want to be able to calculate the difference between post-Treatment Income and pre-Treatment Income. After I have this, I will be able to use it in my psmatch2 DID model.

Dataset:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int(id year asset income ymerge)
111 2000  10  10 2001
111 2001  30  40    .
111 2002  50  90    .
111 2003  50 100 2004
111 2004  90 120    .
111 2005 110 190    .
333 2000  15  10 2001
333 2001  20  45 2002
333 2002  60  90    .
333 2003  80 110 2004
333 2004 125 160    .
333 2005 175 240 2006
333 2006 190 290    .
333 2007 240 380    .
555 2000  40  10    .
555 2001  45  20 2002
555 2002  75  85    .
555 2003 130 195    .
555 2004 140 215    .
end

For reference:

I basically want to replicate the procedure in the tutorial linked at the end of this post.

What they did is they calculated 'd_earn' which is the difference between re78 (real earnings for 1978) and re75 (real earnings for 1975) for each observation.

Then they included d_earn as the outcome in their matched psmatch2 DID model to calculate the ATT.
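The tutorial's exact commands are not reproduced in this thread. A minimal sketch of the two steps just described, run on simulated data, might look like the following. The variable names treat and age, the simulated values, and the matching options are assumptions made purely for illustration; only re75, re78, and d_earn come from the description above. psmatch2 is a user-written command and must be installed first.

```stata
* Sketch only: simulated two-period data, NOT the tutorial's dataset.
* Requires the user-written command: ssc install psmatch2
clear
set seed 12345
set obs 200

* Hypothetical treatment indicator and covariate (names are assumptions)
gen byte treat = runiform() < 0.3
gen age = 20 + floor(runiform()*20)

* Simulated pre- and post-treatment earnings
gen re75 = 1000 + 50*age + rnormal(0, 500)
gen re78 = re75 + 500 + 300*treat + rnormal(0, 500)

* Step 1: the before/after difference for each observation
gen d_earn = re78 - re75

* Step 2: match on the propensity score and use the difference as outcome,
* so the ATT is a difference-in-differences between treated and matched controls
psmatch2 treat age, outcome(d_earn) neighbor(1) common

display "ATT on d_earn: " r(att)
```

This works only because every observation has exactly one pre-treatment and one post-treatment value, which is the structure my panel lacks.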

I am unable to replicate their procedure because they have a cross-sectional dataset consisting of 2 years (technically 3) where there is a clearly defined pre-treatment period and a post-treatment period.

However, in my panel dataset, because the treatment occurs in multiple periods over time, I cannot clearly define pre- and post-treatment periods the way they did.

Here is the link to the psmatch2 DID reference file, if anyone wants to take a look at it. It's a really nice step-by-step tutorial:

https://www.empiwifo.uni-freiburg.de...g_solution.doc

Last edited by Tahseen Hasan; 06 Jan 2019, 14:19.
Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#2

06 Jan 2019, 14:58

I don't understand the data, nor the problem, sufficiently clearly.

Look at ID 111. There are two different values of ymerge. So which is it? Or does that mean that id 111 receives treatment on two occasions, once in 2001 and again in 2004? If the last, do you assume that the treatment effect is the same on both occasions (strong assumption)?

Also, you refer to "seeing how Income is affected in the actual year of merger ( ymerge[_n+0] OR year[_n+1] ) and 1 year after the merger takes place ( ymerge[_n+1] OR year[_n+2] )." In most cases, ymerge[_n+1] is missing when ymerge[_n] is not. (I see only one exception in your example data.) I have the vague sense that you are trying to define some limited period of treatment effectiveness, say, running from the year specified in ymerge to the following year and the one following that. Is that correct? That is, ymerge, ymerge+1, and ymerge+2. And that means that after ymerge+2 the ID is effectively back in the same state as if it had never been treated? Am I right so far?

What happens with ID 333? That ID is treated in both 2001 and 2002 (if I'm interpreting things correctly). So in, for example, year 2003, that ID is in the last year of treatment effect from the 2001 event and in the penultimate year of treatment effect from the 2002 event. Are those overlapping treatment effects additive? Supra-additive? Sub-additive? No synergy or interference at all--i.e. 2003 is just a year with a treatment effect no bigger or smaller than if there had been only 1 treatment?
Tahseen Hasan
#3

06 Jan 2019, 16:29

Hey Clyde, thank you for the response. I'll do my best to explain.

"You are trying to define some limited period of treatment effectiveness"

Now that you mention it, it seems that I made a mistake here by trying to define it as a limited period of treatment, because that is not the case in reality. Please ignore the first dataset I uploaded, as I made this case a lot more complicated than it needed to be. If it's OK, please allow me to start fresh.

In this case, all I am trying to figure out is how I should perform a DID estimate (with regard to before treatment vs. after treatment) when the same subject receives multiple treatments over time.

So let me reframe the data (dataex provided at the bottom):

So if we look at ID 111:

They are receiving a treatment at year 2000 and then again in year 2003. There is no limited time effect.

In this instance, how am I supposed to calculate the pre-treatment Income and post-treatment Income in order to use it in a DID model?

I understand how the "multiple treatment periods" model works in theory for DID, but I do not know how to put it into practice using Stata.

Source: https://stats.stackexchange.com/ques...e-time-periods

As I mentioned earlier, all I want to do is replicate this tutorial. But my difficulty lies in the fact that in my case my subjects receive several treatments over time. In the case of the tutorial, the subjects simply receive a single uniform treatment at a single year (re78).

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int(id year income) byte treat
111 2000  10 1
111 2001  40 0
111 2002  90 0
111 2003 100 1
111 2004 120 0
111 2005 190 0
333 2000  10 1
333 2001  45 1
333 2002  90 0
333 2003 110 1
333 2004 160 0
333 2005 240 1
333 2006 290 0
333 2007 380 0
555 2000  10 0
555 2001  20 1
555 2002  85 0
555 2003 195 0
555 2004 215 0
end

Last edited by Tahseen Hasan; 06 Jan 2019, 16:39.
Clyde Schechter
#4

06 Jan 2019, 21:40

OK, I think I have a better understanding of your problem now. Actually, I am more confident that I understood it correctly in #2 when I asked you all those questions.

Your problem is that you are trying to apply an analysis that is simply not designed for the kind of data you have. The analyses you have cited, both the one from stackexchange and the one using -psmatch2-, are designed for a situation where each ID undergoes the treatment at most one time--there are multiple treatment times only in the sense that different IDs undergo the treatment at different times (as opposed to treatment beginning simultaneously for all IDs). Your data are not like that. You have the same ID undergoing treatment more than once. Neither of the designs you have been exploring will handle that properly.

In fact, the situation is sufficiently uncommon that there is no standard name for it that I know of. Each such situation is its own special case. The difficulty is precisely the one you are stumbling over: how to distinguish pre- from post-. The simplest approach is to avoid the problem by only including the first treatment episode for each ID. All data from the second treatment and beyond are discarded. In that case you have a clean problem which you already know how to approach.
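As a rough sketch of that simplest approach, applied to the example data in #3, something like the following could be used. The variable names n_treat, first_treat, and post are introduced here for illustration, and the code assumes treat == 1 marks the year in which a treatment is received.

```stata
* Sketch: keep only the first treatment episode per id (example data from #3)
clear
input int(id year income) byte treat
111 2000  10 1
111 2001  40 0
111 2002  90 0
111 2003 100 1
111 2004 120 0
111 2005 190 0
333 2000  10 1
333 2001  45 1
333 2002  90 0
333 2003 110 1
333 2004 160 0
333 2005 240 1
333 2006 290 0
333 2007 380 0
555 2000  10 0
555 2001  20 1
555 2002  85 0
555 2003 195 0
555 2004 215 0
end

* Running count of treatments received so far within each id
bysort id (year): gen n_treat = sum(treat)

* Year of the first treatment for each id (missing if never treated)
by id: egen first_treat = min(cond(treat == 1, year, .))

* Discard everything from the second treatment onward
drop if n_treat >= 2

* Pre/post indicator relative to the first treatment
* (a never-treated id has missing first_treat and gets post == 0 throughout)
gen byte post = year >= first_treat

list id year income treat first_treat post, sepby(id)
```

Note that in this particular example, ids 111 and 333 are treated in their first observed year, so after trimming they have no pre-treatment observations at all; only id 555 has both a pre and a post period. Real data would need at least one untreated year before the first treatment for each treated ID.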

If you don't want to discard the data from the second treatment on, then you need to construct a much more complicated model that specifies whether each treatment has the same effect as the first one, or whether the effects grow, or perhaps, shrink with each subsequent treatment. You have to also consider whether the effects of treatment i are still present when treatment j (j > i) occurs, and if so whether the combined effects are additive, sub-additive, or supra-additive (i.e. no impact, interference, or synergy). You have to also consider how long the effects of each treatment can be assumed to last, and whether they taper off gradually or die abruptly. And you have to consider whether this pattern is the same for each treatment, or differs for different orders of treatment. In other words, there are many parameters that have to be stipulated for such modeling. It is because of this complexity, I believe, that this type of analysis is seldom undertaken.
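To make those choices concrete, here is a minimal sketch of the kind of variables such a model would need, again using the example data in #3. The names n_treat, last_treat_year, and since_last are hypothetical, and the final regression is just one of many possible specifications: it assumes each cumulative treatment count has its own constant, permanent shift in income, which is exactly the sort of stipulation that would have to be justified. With three IDs it is of course not a meaningful estimate.

```stata
* Sketch: bookkeeping variables for repeated treatments (example data from #3)
clear
input int(id year income) byte treat
111 2000  10 1
111 2001  40 0
111 2002  90 0
111 2003 100 1
111 2004 120 0
111 2005 190 0
333 2000  10 1
333 2001  45 1
333 2002  90 0
333 2003 110 1
333 2004 160 0
333 2005 240 1
333 2006 290 0
333 2007 380 0
555 2000  10 0
555 2001  20 1
555 2002  85 0
555 2003 195 0
555 2004 215 0
end

* Order of treatment: how many treatments the id has received so far
bysort id (year): gen n_treat = sum(treat)

* Year of the most recent treatment, carried forward within id
by id: gen last_treat_year = year if treat == 1
by id: replace last_treat_year = last_treat_year[_n-1] if missing(last_treat_year)

* Years elapsed since the most recent treatment (missing before any treatment);
* this is what a tapering or limited-duration effect would be modeled on
gen since_last = year - last_treat_year

* One possible (assumed) specification: a separate permanent shift for each
* cumulative treatment count, with id fixed effects and year effects
xtset id year
xtreg income i.n_treat i.year, fe
```

Whether to enter n_treat as separate indicators, as a linear term, or interacted with since_last corresponds directly to the questions about growing, shrinking, overlapping, and fading effects.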

In any case, I think you should stop beating your head against the wall here. You will not be able to apply either of the analyses you have in mind to this kind of data. It is like trying to put a square peg in a round hole--it isn't going to happen.
Tahseen Hasan
#5

07 Jan 2019, 01:26

Hey Clyde, I think I will just have to switch to looking for a good IV perhaps. I definitely did learn a lot from this experience thanks to you.

Thank you again for all the help and advice you've given me over the last few days. I really appreciate it.