DID Model with multiple treatment Group

liang liang

Join Date: Jan 2019

Posts: 1
#1


12 Jan 2019, 09:15

Dear All,

I am in the middle of writing my thesis and have run into a problem with a DID model that has multiple treatment groups. I couldn't find anything online, so I was hoping that fellow users here could help me out!

Basically I am trying to find out the effects of school closure on property prices. I have 3 different groups here.
Group 1 (treatment group) - Property 0km to 1km
Group 2 (treatment group) - Property 1km to 2km
Group 3 (control group) - Property 2km to 4km

May I know how I could incorporate them into one single equation? Thank you in advance for all your help!
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

12 Jan 2019, 11:01

So assuming you have the other components of a DID analysis: an outcome variable (let's call it prices), a pre-post variable that distinguishes the pre-closure era from the post-closure era (call it pre_post, coded 0 before the date the schools were all closed, and 1 thereafter), and that your group is represented by a single variable (call it group) that takes on the values 1, 2, and 3, the basic DID model is:

Code:

appropriate_fixed_effects_regression_command outcome i.group##i.pre_post, fe // NOTE: ##, NOT #

You may add covariates to that and, depending on circumstances, ask for cluster-robust vce().
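
To make the placeholder command concrete, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the property identifier id, the period variable, the simulated effect sizes, and the choice of -xtreg, fe- as the fixed-effects command. It also sets group 3 (the control ring) as the base level, so that each interaction coefficient compares one treatment ring with the control; with the default base (group 1) the comparisons would instead be relative to group 1.

```stata
* Sketch only: simulated panel; all names and numbers are illustrative assumptions
clear
set seed 12345
set obs 300                                   // 300 hypothetical properties
gen id = _n
gen byte group = 1 + (id > 100) + (id > 200)  // 1 and 2 = treatment rings, 3 = control ring
expand 6                                      // six periods per property
bysort id: gen period = _n
gen byte pre_post = period > 3                // closure assumed to happen after period 3
* assumed true closure effects: -10 for group 1, -4 for group 2, none for group 3
gen prices = 100 + 2*period - 10*(group == 1)*pre_post ///
    - 4*(group == 2)*pre_post + rnormal(0, 3)

xtset id period
* ib3. makes the control group the base level; ## adds main effects and interactions
xtreg prices ib3.group##i.pre_post, fe vce(cluster id)

* DID estimates: each treatment ring versus the control ring
lincom 1.group#1.pre_post
lincom 2.group#1.pre_post
```

The main effects of group drop out as collinear with the property fixed effects, which is expected; the two interaction coefficients are the quantities of interest.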

That said, school closure sounds like the kind of thing that does not happen simultaneously in different places. That is, it may not be possible to define a pre_post variable simply in terms of a single date cutoff. If that is what's happening in your data, then you cannot do a classical DID analysis; you must use generalized DID. To understand the theory of generalized DID, see https://www.ipr.northwestern.edu/wor.../Day%204.2.pdf.

Implementation of generalized DID is somewhat different. The group variable remains the same. The pre_post variable now becomes 0 before closure of the school for this observation, and 1 thereafter. You also need a time variable that defines the eras before the first closure, between each closure, and after the last. Call that one era. Then the analysis goes:

Code:

appropriate_fixed_effects_regression_command outcome i.group#i.pre_post i.era, fe // NOTE: #, NOT ##

and, again, there may need to be covariates, and perhaps cluster robust vce.
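
The two time variables can be built roughly as follows. This is a sketch on toy data: closure_year (the year the school associated with each property closed), the years, and the six properties are all hypothetical, and the regression itself is only shown as a comment because the toy data has no outcome.

```stata
* Sketch only: toy data; closure_year and all values are illustrative assumptions
clear
input id group closure_year
1 1 2005
2 1 2008
3 2 2005
4 2 2011
5 3 2008
6 3 2011
end
expand 10
bysort id: gen year = 2002 + _n          // years 2003-2012 for every property

* 0 before the closure relevant to this observation, 1 from then on
gen byte pre_post = year >= closure_year

* era: 0 before the first closure, then +1 at each distinct closure date
gen byte era = 0
levelsof closure_year, local(dates)
foreach d of local dates {
    replace era = era + 1 if year >= `d'
}

tabulate era pre_post
* with an outcome in the data, the command from the post would take this shape:
* xtreg prices i.group#i.pre_post i.era, fe vce(cluster id)
```

With closures in 2005, 2008 and 2011, era takes the values 0, 1, 2 and 3, one for each stretch of time between consecutive closure dates.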

Finally, I have to say I find your description of the 3 groups a bit puzzling. They seem to be defined by size of a property parcel. How is that a treatment or control status in a study of the effects of school closure? Do you mean that you have two treatment groups: school closed vs school not closed, and you want to study the effects of that treatment on the assumption that the size of the property may modify the closure effect? If that is the case, then the setup is different. And one might also raise the question whether it makes sense to look at property size as a 3-level discrete variable; a continuous variable might be more sensible. Anyway, if this is the case, please post back for different guidance.

Veronika Valcikova

Join Date: Dec 2019
Posts: 34
#3

19 Dec 2019, 12:58

Dear all Stata users,
I am working on an interesting but challenging topic concerning developing countries (I am just finishing my bachelor's degree).

I have 2 control groups (matrilineal and bilateral societies) and 1 treatment group (patrilineal societies). Each record in the dataset is a woman of childbearing age. The smallest geographic unit is the region. Some regions are inhabited purely by matrilineal societies, others by all 3 groups, etc. (for all the combinations, please see the table below). For this reason, my treatment is a continuous variable: I need to use the shares of matrilineal, patrilineal and bilateral people living in a given region, as pure patrilineal and matrilineal societies live in only 5 out of 19 regions.

M = matrilineal, P = patrilineal, O = other/bilateral

Region code	Shares
1	0.2061 M + 0.7939 P
2	0.0118 M + 0.7089 P + 0.2793 O
3	100% P
4	0.3135 M + 0.6519 P + 0.0346 O
5	0.8428 M + 0.1572 O
6	0.4318 M + 0.5682 O
7	100% O
8	100% M
9	100% M
10	0.6135 M + 0.3865 O
11	0.7421 P + 0.2579 O
12	0.6182 P + 0.3818 O
13	0.6176 P + 0.3824 O
14	100% O
15	100% P
16	100% P
19	0.3894 P + 0.6106 O
20	100% P
50	100% O

Example of how I coded these shares:

Code:

gen matrilineal=0
replace matrilineal=1 if geo_tz1996_2015==8 | geo_tz1996_2015==9
replace matrilineal=0.2061 if geo_tz1996_2015==1
replace matrilineal=0.0118 if geo_tz1996_2015==2
…
replace matrilineal=0.6135 if geo_tz1996_2015==10

gen patrilineal=0
replace patrilineal=1 if geo_tz1996_2015==3 | geo_tz1996_2015==15 | geo_tz1996_2015==16 | geo_tz1996_2015==20
replace patrilineal=0.7939 if geo_tz1996_2015==1
….
replace patrilineal=0.3894 if geo_tz1996_2015==19
 
gen bilateral=0
replace bilateral=1 if geo_tz1996_2015==7 | geo_tz1996_2015==14 | geo_tz1996_2015==50
replace bilateral=0.2793 if geo_tz1996_2015==2
replace bilateral=0.0346 if geo_tz1996_2015==4
…
replace bilateral=0.6106 if geo_tz1996_2015==19
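
As an alternative to long chains of -replace-, the shares could be kept in a small lookup table and merged onto the main data by region code. The sketch below copies the shares from the table above; the tempfile name and the toy stand-in for the individual-level data (three records per region) are assumptions made only so the example runs on its own. It also checks that the three shares add up to 1 in every region.

```stata
* Sketch only: shares copied from the table in this post; the toy data is an assumption
clear
input geo_tz1996_2015 matrilineal patrilineal bilateral
 1 0.2061 0.7939 0
 2 0.0118 0.7089 0.2793
 3 0      1      0
 4 0.3135 0.6519 0.0346
 5 0.8428 0      0.1572
 6 0.4318 0      0.5682
 7 0      0      1
 8 1      0      0
 9 1      0      0
10 0.6135 0      0.3865
11 0      0.7421 0.2579
12 0      0.6182 0.3818
13 0      0.6176 0.3824
14 0      0      1
15 0      1      0
16 0      1      0
19 0      0.3894 0.6106
20 0      1      0
50 0      0      1
end
* the three shares should sum to 1 in every region (tolerance for float storage)
assert abs(matrilineal + patrilineal + bilateral - 1) < 1e-4
tempfile shares
save `shares'

* toy stand-in for the individual-level data: three records per region
expand 3
keep geo_tz1996_2015
* attach the shares; assert(match) stops with an error if any region fails to match
merge m:1 geo_tz1996_2015 using `shares', assert(match) nogenerate
```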

This is my basic regression:

Code:

tab geo_tz1996_2015, gen(fe)
regress kid_died patrilineal after impact age urban population_factor fe1 fe2 fe3 fe4 fe5 fe6 fe7 fe8 fe9 fe10 fe11 fe12 fe13 fe14 fe15 fe16 fe17 fe18 fe19, robust

where
geo_tz1996_2015 … regions (region Dodoma is coded as number 1, region Lindi as number 8, etc.)
kid_died … child mortality in a given region (also a share: the number of children who died relative to all children born in that region and year)
patrilineal … treatment variable = the share! (the assumption is that patrilineal societies respond to a change in the law and matrilineal/bilateral societies do not)
after … before and after 1999 (the Land Acts of 1999 established women’s land rights)
1996 and 1999 are the years before the intervention
2004, 2010 and 2015 are the years after the intervention
impact = patrilineal*after
fe1 – fe19 … fixed effects for each region, the 50th was left out
age, urban, population_factor … control variables

Stata omits one matrilineal, one patrilineal and one bilateral region, which is of course (the regression is in the attachment).

I took a look at the data with the collapse command in order to understand how Stata computes the impact coefficient:

Code:

collapse (mean) kid_died, by(after patrilineal)

However, after redoing the calculations in Excel I never got the same result; I only know what to do when the treatment is a binary variable. What would be the most appropriate way to conduct this analysis? And how does Stata compute the impact when the treatment is a continuous (share) variable?
Normally, it is: Impact = (Treatment after (Treat=1) – Treatment before) – (Control after (Treat=0) – Control before)

In my case, are the shares serving as weights?
(Treatment after (Treat=1) – Treatment before) – (Treatment after (Treat=0.7939) – Treatment before) – ….. – (Control after (Treat=0) – Control before)

Thank you very much.
Kind regards, Veronika


Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#4

19 Dec 2019, 14:09

Because your treatment variable is continuous, the interpretation is a bit different. For the moment, let's imagine that there were no after or impact variables and we just regressed kid_died on patrilineal. The coefficient of patrilineal would then be the marginal effect of patrilineality on child mortality, or, in simpler terms, it would be the expected rate of difference in kid_died per unit difference in patrilineality.

With an interaction model, it is a bit more complicated. There are two rates of difference in kid_died per unit difference in patrilineality: one for the time period before the law (after = 0) and a different one for the later time period (after = 1). The difference between those rates is your estimate of the effect of the law. That difference is given as the coefficient of the interaction term (which you have called impact).
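
Written out for the simplest version of the model, ignoring the covariates and region effects, the argument looks like this. The symbols P (patrilineal), A (after) and the betas are notation introduced here for illustration; they do not appear in the posts above.

```latex
\begin{align*}
E[\text{kid\_died} \mid P, A] &= \beta_0 + \beta_1 P + \beta_2 A + \beta_3 (P \times A) \\
\left.\frac{\partial E[\text{kid\_died} \mid P, A]}{\partial P}\right|_{A=0} &= \beta_1 \\
\left.\frac{\partial E[\text{kid\_died} \mid P, A]}{\partial P}\right|_{A=1} &= \beta_1 + \beta_3 \\
\text{effect of the law} &= (\beta_1 + \beta_3) - \beta_1 = \beta_3
\end{align*}
```

When P takes only the values 0 and 1, the same coefficient reduces to the familiar four-means DID expression: (treated after minus treated before) minus (control after minus control before).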

To fully understand your results more easily, I suggest you revisit your data management and analysis and avail yourself of the more modern syntax that makes these things easier. Get rid of the impact variable and get rid of all those fe variables. Instead of those fe variables, just have a single variable that identifies the region. Let's call that region. And instead of calculating an interaction term, let Stata do it automatically with factor variable notation:

Code:

regress kid_died c.patrilineal##i.after c.age i.urban c.population_factor c.matrilineal i.fe // i.fe stands for the single region identifier described above
margins after, dydx(patrilineal)

The output of the -margins- command will show you the estimated rates of difference in kid_died per unit difference in patrilineality in both the before and after time periods. The difference between them is still given by the coefficient of the interaction term in the -regress- output.
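
The relationship between the -margins- output and the interaction coefficient can be checked on simulated data. The sketch below is an illustration only: the data are made up, the covariates and region effects are left out, and the assumed slopes (0.10 before, 0.04 after) are arbitrary.

```stata
* Sketch only: simulated data; names follow the thread, all numbers are assumptions
clear
set seed 2020
set obs 2000
gen patrilineal = runiform()              // continuous "treatment": a share in [0,1]
gen byte after = runiform() < 0.5         // 0 = before the law, 1 = after
* assumed truth: slope 0.10 before the law and 0.04 after, so the interaction is -0.06
gen kid_died = 0.20 + 0.10*patrilineal - 0.02*after ///
    - 0.06*patrilineal*after + rnormal(0, 0.05)

regress kid_died c.patrilineal##i.after
scalar b_inter = _b[1.after#c.patrilineal]

* slope of kid_died on patrilineal, separately for after = 0 and after = 1
margins after, dydx(patrilineal)
matrix s = r(b)
display "slope before = " s[1,1] "   slope after = " s[1,2]
display "difference   = " s[1,2] - s[1,1] "   interaction coefficient = " b_inter

* the difference of the two slopes equals the interaction coefficient
assert reldif(s[1,2] - s[1,1], b_inter) < 1e-6
```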
Veronika Valcikova

Join Date: Dec 2019

Posts: 34
#5

25 Dec 2019, 06:43

Dear Clyde, you really helped me a lot. I would like to thank you very much!
Kind regards, Veronika

Veronika Valcikova

Join Date: Dec 2019
Posts: 34
#6

20 Jan 2020, 14:46

Dear Clyde and all Stata users, my next question concerns the same setting as described in #3 and #4. I now need to show whether the parallel trends assumption holds. I am wondering whether I should weight each region by its patrilineal, matrilineal or bilateral share according to the table below.

Number of region	Region	Matrilineal share	Patrilineal share	Bilateral share
1	Dodoma	0.2061	0.7939	0
2	Arusha_Manyara	0.0118	0.7089	0.2793
3	Kilimanjaro	0	1	0
4	Tanga	0.3135	0.6519	0.0346
5	Morogoro	0.8428	0	0.1572
6	Pwani	0.4318	0	0.5682
7	Dar Es Salaam	0	0	1
8	Lindi	1	0	0
9	Mtwara	1	0	0
10	Ruvuma	0.6135	0	0.3865
11	Iringa_Njombe	0	0.7421	0.2579
12	Mbeya	0	0.6182	0.3818
13	Singida	0	0.6176	0.3824
14	Tabora	0	0	1
15	Rukwa+Katavi	0	1	0
16	Kigoma	0	1	0
19	GKMSS	0	0.3894	0.6106
20	Mara	0	1	0
50	PTWZ	0	0	1

Example of weighting of kid_died:
Weight for patrilineal kinship in the region 4 (Tanga) is 0.6519 (Tanga is 65.19% patrilineal).

Code:

gen patrilineal=0
replace patrilineal=1 if geo_tz1996_2015==3 | geo_tz1996_2015==15 | geo_tz1996_2015==16| geo_tz1996_2015==20
replace patrilineal=0.7939 if geo_tz1996_2015==1
replace patrilineal=0.7089 if geo_tz1996_2015==2
replace patrilineal=0.6519 if geo_tz1996_2015==4
replace patrilineal=0.7421 if geo_tz1996_2015==11
replace patrilineal=0.6182 if geo_tz1996_2015==12
replace patrilineal=0.6176 if geo_tz1996_2015==13
replace patrilineal=0.3894 if geo_tz1996_2015==19

average child mortality for all five years for region Tanga coded as 4:

Code:

egen mean_kid_died_4p_1996=mean(kid_died) if year==1996&(geo_tz1996_2015==4)
egen mean_kid_died_4p_1999=mean(kid_died) if year==1999&(geo_tz1996_2015==4)
egen mean_kid_died_4p_2004=mean(kid_died) if year==2004&(geo_tz1996_2015==4)
egen mean_kid_died_4p_2010=mean(kid_died) if year==2010&(geo_tz1996_2015==4)
egen mean_kid_died_4p_2015=mean(kid_died) if year==2015&(geo_tz1996_2015==4)

variable for considering also the share of patrilineal societies:

Code:

gen mean_kid_died_p=0

replace mean_kid_died_p=patrilineal*mean_kid_died_4p_1996 if year==1996 & geo_tz1996_2015==4
replace mean_kid_died_p=patrilineal*mean_kid_died_4p_1999 if year==1999 & geo_tz1996_2015==4
replace mean_kid_died_p=patrilineal*mean_kid_died_4p_2004 if year==2004 & geo_tz1996_2015==4
replace mean_kid_died_p=patrilineal*mean_kid_died_4p_2010 if year==2010 & geo_tz1996_2015==4
replace mean_kid_died_p=patrilineal*mean_kid_died_4p_2015 if year==2015 & geo_tz1996_2015==4

….. did this for all the regions

mean_kid_died_p then holds the average for each year and each region, scaled by the regional patrilineal share; in the case of the Tanga region, the patrilineal variable has the value 0.6519.

Then I take the average of all patrilineal regions (the regions are already weighted with the patrilineal share) for every year:

Code:

egen mean_kid_died_p_1996=mean(mean_kid_died_p) if year==1996
egen mean_kid_died_p_1999=mean(mean_kid_died_p) if year==1999
egen mean_kid_died_p_2004=mean(mean_kid_died_p) if year==2004
egen mean_kid_died_p_2010=mean(mean_kid_died_p) if year==2010
egen mean_kid_died_p_2015=mean(mean_kid_died_p) if year==2015

And finally, I create one variable which includes all the information together

Code:

gen avg_kid_died_p=0
replace avg_kid_died_p=mean_kid_died_p_1996 if year==1996
replace avg_kid_died_p=mean_kid_died_p_1999 if year==1999
replace avg_kid_died_p=mean_kid_died_p_2004 if year==2004
replace avg_kid_died_p=mean_kid_died_p_2010 if year==2010
replace avg_kid_died_p=mean_kid_died_p_2015 if year==2015

I do the same for matrilineal and bilateral, and afterwards I use
line avg_kid_died_m avg_kid_died_p avg_kid_died_b year, xline(2004)
to get the graph.
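
For reference, the per-region, per-year steps above can be written more compactly with -bysort- and -egen-. This is only a restatement of the procedure as described, run on toy data (three regions, 20 hypothetical women per region and year, a made-up outcome); it says nothing about whether the procedure is a valid check of parallel trends.

```stata
* Sketch only: toy data; a compact restatement of the steps above, not an endorsement
clear
set seed 1
input geo_tz1996_2015 patrilineal
3 1
4 0.6519
8 0
end
expand 5
bysort geo_tz1996_2015: gen year = real(word("1996 1999 2004 2010 2015", _n))
expand 20                                  // 20 hypothetical women per region-year
gen byte kid_died = runiform() < 0.15      // hypothetical outcome

* mean of kid_died within each region and year
bysort geo_tz1996_2015 year: egen mean_kid_died_ry = mean(kid_died)
* scale by the region's patrilineal share (0 where the share is 0)
gen mean_kid_died_p = patrilineal * mean_kid_died_ry
* average over all observations in each year
bysort year: egen avg_kid_died_p = mean(mean_kid_died_p)

* plot one point per year
egen byte firstobs = tag(year)
sort year
line avg_kid_died_p year if firstobs, xline(2004)
```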

Or should I use a second possible approach?
To get two groups, I found the median of the patrilineal share across the regions. I then assign a region to “treatment” if its patrilineal share is above the median and to “control” if it is below the median.

median patrilineal, by (geo_tz1996_2015)
greater than the median: yes => treatment group: 1, 2, 3, 4, 11, 12, 13, 15, 16, 20
greater than the median: no => control group: 5, 6, 7, 8, 9, 10, 14, 19, 50
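
One way to reproduce this split is sketched below, using the patrilineal shares from the table in this post. Two caveats: as far as I know, Stata's -median- command runs a test of equality of medians rather than reporting a median, so the sketch uses -summarize, detail- instead; and region 13 (0.6176) sits exactly at the region-level median, so the treatment list above corresponds to "at or above the median" rather than strictly above it.

```stata
* Sketch only: patrilineal shares copied from the table in this post
clear
input geo_tz1996_2015 patrilineal
 1 0.7939
 2 0.7089
 3 1
 4 0.6519
 5 0
 6 0
 7 0
 8 0
 9 0
10 0
11 0.7421
12 0.6182
13 0.6176
14 0
15 1
16 1
19 0.3894
20 1
50 0
end

* median of the 19 region-level shares
summarize patrilineal, detail
scalar med = r(p50)
display "median patrilineal share = " med

* at or above the median = "treatment", below = "control"
gen byte treat = patrilineal >= med
list geo_tz1996_2015 patrilineal treat, noobs
count if treat == 1
```

A median computed over the individual-level records rather than over the 19 regions could come out differently, since regions contribute unequal numbers of observations.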

Thank you very much for your time and help.
Kind regards,
Veronika Valcikova


Veronika Valcikova

Join Date: Dec 2019

Posts: 34
#7

22 Jan 2020, 01:35

parallel trends as above, sorry

Last edited by Veronika Valcikova; 22 Jan 2020, 01:38. Reason: copy of text, sorry
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#8

22 Jan 2020, 14:04

I don't follow what you are trying to do in the code you show in #6. But I think you are trying to fit a square peg into a round hole. Your "treatment" is continuous valued; it is not a grouping variable.

Veronika Valcikova

Join Date: Dec 2019
Posts: 34
#9

23 Jan 2020, 04:40

Dear Clyde,
thank you for your response!

I was looking for the most appropriate way to show the parallel trends, because I am required to do this in my bachelor thesis. As I actually have 3 possible ethnicities within one region (matrilineal, bilateral, patrilineal = treatment group), I was struggling with the representation. I decided on the following approach: I found the median value of the patrilineal share across the regions, then formed the treatment group from all regions with a patrilineal share above the median and the control group from all regions with a share below the median.

Code:

median patrilineal, by(geo_tz1996_2015)

Treatment: 1, 2, 3, 4, 11, 12, 13, 15, 16, 20
Control: 5, 6, 7, 8, 9, 10, 14, 19, 50

Number of region	Region	Matrilineal share	Patrilineal share	Bilateral share
1	Dodoma	0.2061	0.7939	0
2	Arusha_Manyara	0.0118	0.7089	0.2793
3	Kilimanjaro	0	1	0
4	Tanga	0.3135	0.6519	0.0346
5	Morogoro	0.8428	0	0.1572
6	Pwani	0.4318	0	0.5682
7	Dar Es Salaam	0	0	1
8	Lindi	1	0	0
9	Mtwara	1	0	0
10	Ruvuma	0.6135	0	0.3865
11	Iringa_Njombe	0	0.7421	0.2579
12	Mbeya	0	0.6182	0.3818
13	Singida	0	0.6176	0.3824
14	Tabora	0	0	1
15	Rukwa+Katavi	0	1	0
16	Kigoma	0	1	0
19	GKMSS	0	0.3894	0.6106
20	Mara	0	1	0
50	PTWZ	0	0	1

Do you think this is absolutely not allowed, or do you have any other idea how to solve the task "Show if the parallel trends assumption holds"?
Thank you very much. Veronika


Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#10

23 Jan 2020, 14:17

I suppose if I were doing a bachelor's thesis this would seem like one of the least unreasonable approaches one might take. But it isn't really correct, and I wouldn't put much credence in the results it produces. I can't get past the fact that what you are being asked to do is simply completely unreasonable and I don't see any good way to do it. The parallel trends assumption just doesn't mean anything here as far as I can see. I suppose what you have done will get them off your back and they might convince themselves that there is some usefulness to those results.

Since I'm unable to get past my conviction that you are being asked to do nonsense and I can't think of a good way to do it, I have a better suggestion for you: speak to your advisor and ask him or her what they have in mind here. Perhaps there is some perfectly reasonable and clear thing they want you to do that I'm just unable to imagine.

Sorry I can't be more helpful.
Veronika Valcikova

Join Date: Dec 2019

Posts: 34
#11

23 Jan 2020, 14:21

Dear Clyde,
I still appreciate your opinion and your time; it helps me to be more confident, as I also thought that the whole thing may not be reasonable.
Thank you.
Veronika Valcikova

Join Date: Dec 2019

Posts: 34
#12

10 Apr 2020, 05:21

Dear Dr. Schechter

Code:

regress kid_died c.patrilineal##i.after c.age i.urban c.population_factor c.matrilineal i.fe
margins after, dydx(patrilineal)

I have a question about this notation, specifically c.patrilineal##i.after.

These are now my final regressions:

1)

Code:

regress share_of_dead_kids c.patrilineal##i.after c.population_factor c.Protestant_share c.Muslim_share c.Catholic_share c.matrilineal c.avg_cropland_regions c.wealths_avg c.reads_control c.radio c.antetnusno_avg c.biofwthtsdrmdhs_avg c.age i.urban i.geo_tz1996_2015, robust

and

2)

Code:

probit at_least_one_child_died c.patrilineal##i.after c.population_factor c.Protestant_share c.Muslim_share c.Catholic_share c.matrilineal c.avg_cropland_regions c.wealths_avg c.reads_control c.radio c.antetnusno_avg c.biofwthtsdrmdhs_avg c.age i.urban i.geo_tz1996_2015, robust

However, if I create the variable impact and use it instead of c.patrilineal##i.after, the results change enormously! How is that possible?

1)
after#c.patrilineal
1 | -0.0118872

impact -0.0403407

2)
after#c.patrilineal
1 | -.0005508

impact -0.2836952

Thank you. Veronika

Last edited by Veronika Valcikova; 10 Apr 2020, 05:43.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#13

10 Apr 2020, 10:53

What is this variable impact? How did you create it? And what did you do with it in the regressions? Show all the relevant code and outputs.
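
One possibility worth ruling out in the meantime (a guess; nothing posted so far confirms it) is that impact replaced the whole ## term, so that the main effects of patrilineal and after dropped out of the model. The sketch below uses simulated data with arbitrary numbers to show that a hand-made product term reproduces the ## interaction coefficient exactly when both main effects are kept, and gives a different number when they are not.

```stata
* Sketch only: simulated data; how impact was actually built in #12 is not known
clear
set seed 42
set obs 2000
gen patrilineal = runiform()
gen byte after = runiform() < 0.5
gen kid_died = 0.20 + 0.10*patrilineal - 0.02*after ///
    - 0.06*patrilineal*after + rnormal(0, 0.05)
gen impact = patrilineal*after            // hand-made interaction term

* (a) factor-variable notation: ## includes both main effects automatically
regress kid_died c.patrilineal##i.after
scalar b_fv = _b[1.after#c.patrilineal]

* (b) hand-made term WITH both main effects: same model, same coefficient
regress kid_died patrilineal after impact
scalar b_full = _b[impact]

* (c) hand-made term WITHOUT the main effects: a different model
regress kid_died impact
scalar b_alone = _b[impact]

display "## = " b_fv "   impact with main effects = " b_full "   impact alone = " b_alone
assert reldif(b_fv, b_full) < 1e-6
```

If the hand-made and ## versions still disagree with the main effects included, the next things to compare would be the estimation samples and how impact was generated.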

Last edited by Clyde Schechter; 10 Apr 2020, 10:59.
