Creating treatment and after Variables for Diff-in-diff

Amy ZZ

Join Date: Apr 2016

Posts: 1
#1

24 Apr 2016, 13:05

Hi everyone,
I think I'm experiencing a brain fart because this task feels like it shouldn't be giving me this hard of a time.
I am supposed to run a diff-in-diff regression for before declaration of bankruptcy and after bankruptcy that occurred Dec 1, 1994.

So far my treatment command is:
gen treatment=1 if orange==1 & year<=1994 & month<12
Is this code right for the treatment group?

I'm stumped on the code for the after group. I was thinking:
gen after=1 if orange==1 & year<=1994 & ...(but how do I code for any day after Dec 1st?)

Best,
Amy (struggling and on the verge of pulling my hair out)
Tags: Time Series, treatment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

24 Apr 2016, 13:36

So let's get our terminology straight first. In a DID design, there are two key variables. One of them distinguishes the treatment condition from the control condition. We usually call that variable treatment. Entities in both groups are observed both before and after the treatment is initiated. We distinguish those time periods by another variable that indicates before and after using 0/1. Let's call that variable pre_post.

Now, one thing that will help you is getting a real date variable. Having a separate variable for month and year is rarely useful in Stata: it's difficult to even code for things like correct chronological order. So before you even think about DID issues, get your data fixed up:

Code:

gen monthly_date = mofd(mdy(month, 1, year))
format monthly_date %tm
assert missing(monthly_date) == missing(month, year)

You can get further information about how to work with dates and times in Stata by reading the corresponding section in the [D] user manual. There is a lot there, and even the most experienced among us end up referring back to it often to refresh ourselves on the details. But do learn the concept of Stata internal dates, which are intervals of time from a reference point. (In this instance, monthly dates are the number of months from January 1960.) By representing dates and times this way, we can use the variables easily in analysis: they sort chronologically and differences between them correspond to elapsed time intervals. Stata provides many functions for calculating these internal dates from human-readable strings or from separate numerical values of month, day, and year. Stata also provides display formats that can be attached to them so that when we look at output we see things like 1994m12 instead of 419.
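To make the internal-date idea concrete, here is a minimal sketch on three made-up observations. The data values are hypothetical; only the variable names month, year, and monthly_date come from the code above.

```stata
* Toy data, invented for illustration only
clear
input int year byte month
1994 11
1994 12
1995 1
end

* Build a daily date for the first of the month, then convert it to a monthly date
gen monthly_date = mofd(mdy(month, 1, year))

* Before formatting, the values are just counts of months since 1960m1
list year month monthly_date

* After formatting, the same numbers display as 1994m11, 1994m12, 1995m1
format monthly_date %tm
list year month monthly_date

* tm() turns a typed monthly date into its internal value
display tm(1994m12)
```

The two listings show the same stored numbers; only the display format differs, which is why comparisons such as monthly_date < tm(1994m12) work directly.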

So now your pre_post variable is easy to code:

Code:

gen byte pre_post = monthly_date < tm(1994m12) if !missing(monthly_date)

Note that the pre_post variable is coded exactly the same way in the treatment and control groups: it depends only on time, not on whether or not the treatment was ever applied to that entity.

Now you need your treatment variable. This variable needs to be coded 1 for every observation of every entity in the treatment group, including those observations that took place before the treatment occurred. Similarly it is coded 0 for every observation of every entity in the control group. I'm guessing that this variable is related to your variable orange, perhaps it even is orange. Nothing in your post really explains it.
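A minimal sketch of that coding, under the assumption (not confirmed anywhere in the thread) that orange is a 0/1 flag marking the entity that declared bankruptcy. The four observations are made up.

```stata
* Toy data; orange is ASSUMED to be a 0/1 indicator of the treated entity
clear
input byte orange int year byte month
1 1994 6
1 1995 6
0 1994 6
0 1995 6
end

* Group membership only: no condition on year or month
gen byte treatment = (orange == 1) if !missing(orange)

* Both groups should appear in both years
tabulate treatment year
```

Compare this with the command in #1, which mixes time conditions into the group indicator and leaves every other observation missing rather than 0.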

Once you have those two variables, running the DID analysis is pretty simple:

Code:

xtset firm_id monthly_date
xtreg outcome i.treatment##i.pre_post, fe

In the -xtset- command you replace "firm_id" with the actual variable that identifies units of observation in your data, and in the -xtreg- command you replace "outcome" with your actual outcome variable.

Of course, there may be covariates you wish to include. And if you have a large enough number of panels, you will want to use -vce(cluster panelvar)-. And you may want to try a random effects model.

Note that in the fixed effects model, the treatment variable will be omitted because it does not vary within firms. That is expected and is not a problem. What you are most interested in is the treatment#pre_post interaction term, which will not be omitted.
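To see what that output looks like, here is a sketch on simulated data. Everything in it is made up for illustration: 40 hypothetical firms, 24 months each, and a true treatment effect of +2 in the post period. The period indicator uses the same coding as above (1 = before 1994m12).

```stata
* Simulated panel; all names and numbers are hypothetical
clear
set seed 12345
set obs 40
gen int firm_id = _n
gen byte treatment = firm_id <= 20          // first 20 firms are the treated group
gen double firm_effect = rnormal()          // time-invariant firm heterogeneity
expand 24                                   // 24 monthly observations per firm
bysort firm_id: gen int monthly_date = tm(1994m1) + _n - 1
format monthly_date %tm

* Same coding as above: 1 = before 1994m12, 0 = 1994m12 onward
gen byte pre_post = monthly_date < tm(1994m12)

* Outcome with a common post-period shift of 0.5 and a treatment effect of 2
gen double outcome = firm_effect + 0.5*(1 - pre_post) + 2*treatment*(1 - pre_post) + rnormal()

xtset firm_id monthly_date
xtreg outcome i.treatment##i.pre_post, fe vce(cluster firm_id)

* The interaction coefficient can be retrieved by name
display _b[1.treatment#1.pre_post]
```

In this setup Stata drops 1.treatment with a collinearity note, and the interaction coefficient should come out near -2 rather than +2, because pre is the category coded 1. The sign depends only on which period is coded 1.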

As an aside, there is a preference in this community for using our real first and last names as our username on the forum. It promotes a sense of professionalism and collegiality. You cannot change this yourself by editing your profile, but you can click on "CONTACT US" (right corner below) and message the forum administrator requesting he change your username.
Neeraj Kumar

Join Date: Jul 2017

Posts: 98
#3

02 Jul 2017, 13:23

As per your quote above
"Note that the pre_post variable is coded exactly the same way in the treatment and control groups: it depends only on time, not on whether or not the treatment was ever applied to that entity."

How do I generate the treatment group and the control group in the "before" period of pre_post (before treatment)? During that period there is only a control group, since the treatment has not yet been implemented.

Last edited by Neeraj Kumar; 02 Jul 2017, 13:48.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

02 Jul 2017, 14:00

If you only have pre-treatment data for the control group, and none for the treatment group, then you don't have the data you need for a diff-in-diff analysis and you need to go back and get pre-treatment data on the treatment group. There is no way to code your way out of that problem if that's what you actually have.

Or you may be misunderstanding. The observations made before treatment began on those who ended up in the treatment group should have the group variable coded as "treatment" at all times, both before and after treatment was actually implemented. So it may be just a matter of going back to your data and setting your treatment variable to the treatment value for every observation, including those prior to implementation, of the entities that ultimately ended up being treated. This was already explained in #2, where I said:

Now you need your treatment variable. This variable needs to be coded 1 for every observation of every entity in the treatment group, including those observations that took place before the treatment occurred. Similarly it is coded 0 for every observation of every entity in the control group. I'm guessing that this variable is related to your variable orange, perhaps it even is orange. Nothing in your post really explains it.
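A minimal sketch of that recoding, assuming a panel identifier id and a variable treated_now that is 1 only in the periods when treatment is actually in effect. Both names and the data are hypothetical.

```stata
* Toy panel: entity 1 is treated in 2017, entity 2 is never treated
clear
input byte id int year byte treated_now
1 2016 0
1 2017 1
2 2016 0
2 2017 0
end

* Spread "ever treated" to every observation of the same entity
bysort id: egen byte treatment = max(treated_now)

* The period indicator depends on time alone
gen byte post = year == 2017

list, sepby(id)
```

After this, entity 1 has treatment equal to 1 in both 2016 and 2017, which is what the diff-in-diff setup requires.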
Neeraj Kumar

Join Date: Jul 2017

Posts: 98
#5

02 Jul 2017, 14:24

Thank you so much for your reply and now I understand where I was wrong and it solved my problem. I really appreciate your efforts.
Faruque Sunny

Join Date: May 2018

Posts: 31
#6

05 May 2018, 08:43

Hello Mr. Clyde Schechter,

I am also having difficulty creating the treatment and control conditions. I am trying to run a DID regression to find out the impact of a decision support tool on farmers' yield productivity.

I have a total sample of 168 farmers, divided into 2 groups (Group A and Group B) of 84 farmers each. In Group A, 42 farmers are from region "a" and the other 42 are from region "b". Group B is split between the two regions in the same way. The data set also contains the before (2016) and after (2017) conditions of the respondents.

In 2016 none of the farmers from Group A or Group B used this facility. In 2017 the Group A farmers adopted it, but the Group B farmers did not.

Could you please suggest how to create the treatment and control conditions for this study?

Thank you very much in advance.

Sincerely,
Faruque
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#7

05 May 2018, 12:08

Your explanation of the general problem is clear. But you do not show any example data, so, not knowing how your data are organized, I cannot give you code. Please post back with an example of your data. Use the -dataex- command to do that. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.
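For anyone who has not used it before, a minimal sketch of the workflow on the auto dataset that ships with Stata. The variable list and the five-observation range are arbitrary choices for illustration.

```stata
* Load a dataset that is installed with Stata
sysuse auto, clear

* Produce a postable example of a few variables and observations
dataex make price mpg foreign in 1/5
```

Copy everything between the two delimiter lines that -dataex- prints and paste it into the forum editor.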

Agbenyo Wonder

Join Date: Nov 2019
Posts: 5
#8

07 Feb 2020, 04:06

Hello Mr. Clyde Schechter, can you please help me with the command to distinguish my control group from my treated group? I am working on crop insurance and cocoa farmers' income in Ghana, and I have a total sample of 600. The programme was rolled out in some parts of the Ashanti region in Ghana. Of the 6 districts I gathered data from, only two were treated and the other four were not. A sample of my data is attached below.
----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(ACF MS Gen Edu Hsz Fexp AgeF) int MAI byte Plns
46 2 1 3 8 10  4 1500 0
50 2 1 1 4 17  5 1650 0
35 1 1 3 6  6  3 1000 0
46 4 1 3 4 11  3 1900 0
50 2 1 1 6 19  3 1850 0
39 2 1 2 3  8  5  890 0
45 2 1 1 4 15  6 1760 0
37 2 1 1 3  9  5  790 0
48 2 2 1 3 13  3    . 0
53 2 1 1 7 20  5 1800 0
35 2 1 1 4 11  6  480 0
43 2 1 1 4  9  5 1570 0
43 2 2 1 3 10  3 2690 0
58 2 1 2 8 23  5  890 0
45 2 2 1 5 10  5    . 0
40 2 1 2 4  5  4 1350 0
50 5 2 2 6 20 10  950 0
36 2 1 1 2  6  7 1580 0
40 2 1 1 3  7  4    . 0
53 2 2 1 5 30  3 1500 0
50 2 1 1 8 17  5  580 0
50 2 2 2 7 13  6  700 0
35 2 1 2 2  6  2  970 0
47 2 2 2 4  5  5 1000 0
49 2 1 2 4 12  3 1800 0
55 2 1 1 8 21 12  800 0
39 2 1 2 2  4  6 2040 0
50 2 2 1 6 18 15  750 0
36 2 1 3 2  5 10  640 0
55 2 1 1 7 23  2 1830 0
50 2 1 3 6 13  8 1830 0
58 2 2 2 7 30  5 1900 0
35 2 2 2 3  6  4 2100 0
54 5 1 2 7 16  6    . 0
37 1 1 1 4  5  4 2010 0
41 2 1 2 2 12  8 1000 0
52 2 1 1 8 12  5 1070 0
54 2 2 1 6 30  7    . 0
55 2 1 1 7 24  5 1200 0
35 1 1 1 2 10  3  500 0
57 4 1 2 7 23  6 1750 0
48 2 1 2 8 14  4 2940 0
39 2 1 2 3 13  3 2400 0
56 2 2 1 6 19 12    . 0
55 2 1 1 7 19  9 2800 0
47 2 2 2 3 15 14 1790 0
51 5 1 1 4 23  5 2800 0
53 2 1 1 6 12  5 1700 0
48 2 1 2 5 11  3 1800 0
40 2 2 2 4  9  2 1300 0
54 2 1 2 6 11  4 1750 0
47 2 1 1 5 14  3 1400 0
49 2 2 1 6  5  5  700 0
54 2 1 1 9 20  6  800 0
50 2 2 1 6 25  4  300 0
49 2 1 1 5 18  6  700 0
55 2 1 2 7 20  7 1000 0
56 2 2 2 6 18  5 2100 0
35 2 1 1 4 12  4  500 0
48 2 1 2 6  8  6 5000 0
31 2 1 2 4  8  5 1950 0
46 2 1 2 3 20  7 2080 0
50 2 1 2 5 23  7 1600 0
56 2 1 3 7 16  5  690 0
43 2 1 2 3 13  3  900 0
44 2 2 1 4 12  4  900 0
53 5 2 2 5 20  4  850 0
45 2 2 2 4 12  6 1800 0
53 2 1 2 7 24  4 1500 0
52 4 2 2 6 20  6    . 0
50 2 2 3 5 32  4 1050 0
30 1 1 2 4 21  6 1200 0
35 2 2 2 3  4  5 1750 0
39 2 1 4 3  5  6    . 0
31 2 1 1 3  7  7 1500 0
38 2 1 2 5  4  4 2650 0
50 2 1 4 7 20  5    . 0
42 2 1 2 4 12  9 1450 0
49 5 2 2 4  5 10 1470 0
48 2 1 4 3 10  4 1740 0
50 2 1 3 6 20  6    . 0
37 2 2 4 4  4  5  400 0
52 4 1 2 8 19  4  680 0
46 2 2 1 4 20  6 2350 0
40 2 1 3 4 12  6 1450 0
31 1 2 4 2  8  9 1290 0
40 2 1 3 3  3  8 1800 0
43 2 2 1 4  5  6 4000 0
35 2 1 4 3  6  4 1600 0
51 2 2 1 5 16  8 1480 0
46 2 2 2 3 21  9 1370 0
40 2 1 2 2  8 10 1460 0
58 5 1 2 6 16  5 1590 0
43 4 1 1 3  9  7 1640 0
45 2 1 1 4  7 10 1000 0
37 2 2 2 2  4  5 1850 0
48 2 2 2 5 10  4 1500 0
36 2 2 2 2  4  3 1490 0
40 4 1 1 3  5  5 1350 0
30 1 1 3 3  9 10 4000 0
end

------------------ copy up to and including the previous line ------------------

This is my first time trying Statalist, so kindly pardon me if I did not do something right. Thanks, and I look forward to your suggestions.


Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#9

07 Feb 2020, 12:14

Well, I do not see enough information in your post to answer your question. Your data example is nicely done with -dataex-, thank you. But I have no idea what any of the variables represent. I gather that you want to create a variable which indicates two out of the 6 districts, but I cannot even tell which of the variables tells us what district the observation refers to. I can't even make an educated guess because none of your variables takes on 6 distinct values. Even if I knew the right variable, you haven't told me how you would know which of the districts are the two that rolled out the plan.

Please post back with clarification.
Steffen Mauch

Join Date: Dec 2021

Posts: 37
#10

19 Jan 2022, 04:06

Code:

gen byte pre_post = monthly_date < tm(1994m12) if !missing(monthly_date)

Clyde Schechter, do I understand correctly that the variable pre_post is equal to 1 for the pre-treatment period and 0 for the post-treatment period?
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#11

19 Jan 2022, 11:31

The way I wrote it, yes pre is 1, and post is 0. If you prefer to have pre = 0 and post = 1, then it's

Code:

gen byte pre_post = monthly_date >= tm(1994m12) if !missing(monthly_date)

Either way is suitable for subsequent analysis. You just have to be sure you know which you're doing when you interpret the results.
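As a quick check of the two codings, here is a small sketch on made-up data. It assumes the event month 1994m12 itself counts as post, as in the coding from #2; the variable names pre and post are used only for illustration.

```stata
* Toy data around the cutoff month
clear
input int year byte month
1994 11
1994 12
1995 1
end
gen monthly_date = ym(year, month)
format monthly_date %tm

* Coding from #2: 1 = before 1994m12
gen byte pre = monthly_date < tm(1994m12) if !missing(monthly_date)

* Reversed coding: 1 = 1994m12 onward
gen byte post = monthly_date >= tm(1994m12) if !missing(monthly_date)

* The two indicators are exact complements of each other
assert pre + post == 1
list
```

Because the indicators are complements, switching from one to the other flips the sign of the interaction coefficient in the DID regression; the estimated treatment effect itself is unchanged.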
Steffen Mauch

Join Date: Dec 2021

Posts: 37
#12

19 Jan 2022, 12:57

Thanks for the clarification!
