Problem with Difference in difference specification

Peter Green

Join Date: Apr 2019
Posts: 3

03 Apr 2019, 12:08

Hi all,
I apologize if this topic has been covered before; I searched but could not find an answer.

I am working on a paper in which I want to estimate the effect of a green bond issuance on specific variables, such as ESG score and CO2 emissions. For this I gathered a dataset of firms that issued a green bond, and I matched each green bond issuer with a similar firm that issued a bond in the same year, but not a green bond.

In simplified form, my dataset looks like this:

Year	ID	Country	Industry	GB issuer	ESG score	Co2	Issue date
2015	1	1	1	1	75	80	2016
2016	1	1	1	1	80	75	2016
2017	1	1	1	1	85	75	2016
2015	2	2	2	0	60	70	-
2016	2	2	2	0	65	70	-
2017	2	2	2	0	65	80	-
2015	3	1	1	0	75	90	-
2016	3	1	1	0	60	75	-
2017	3	1	1	0	55	60	-
2015	4	2	2	1	80	40	2017
2016	4	2	2	1	90	30	2017
2017	4	2	2	1	95	20	2017
2018	4	2	2	1	99	10	2017

I want to use a difference in difference specification with the following regressions:

Code:

ESG score_it = α_i + α_c*α_t + α_s*α_t + β*green bond

i stands for firm and t for years.

where αi stands for fixed firm effects,
αc* αt stands for country by year fixed effects.
αs*αt stands for industry by year fixed effects
β is the coefficient on the GB issuer dummy and should measure the effect of the GB issuance.

I tried to generate year dummies that equal 1 after the issuance of the GB, but I couldn't figure it out. Doesn't the dependent variable then become the year dummy? (E.g., measuring the ESG score in 2015 for both a GB issuer and a normal bond issuer.)

Could anybody help me with running this regression in Stata?

Many thanks,

Peter

Clyde Schechter

Join Date: Apr 2014

Posts: 30353
#2

03 Apr 2019, 14:34

There are several things wrong here.

First, you state that in building your data set you matched green bond issuers with non-green bond issuers. But your data example shows nothing about this pairing. When you have matched-pair data you must do a matched-pair analysis, and that requires a variable that shows which observations belong to the same pair.

Moreover, because you have longitudinal observations on these firms, which are nested in matched pairs, and which in turn appear to be nested in industries and countries (or industries crossed with countries), you are going to need a multi-level mixed effects model here.

Next, in a difference in differences model, the coefficient of green bond will not estimate the effect of issuing green bonds. Rather it will be some interaction term involving green bond issuance and time.

Next, in your example data, it appears that any given industry-country combination consists exclusively of green bond issuers or exclusively of non-green bond issuers. That will make it impossible to distinguish green bond issuing effects from industry and country effects. (Perhaps your real data does not have this limitation.)

When you post back with clarifications, please show a real example of your data, and to make it useful to those who want to help you, use the -dataex- command to do that. Tables of the kind you show require extensive manipulation to import into Stata to develop code for. Moreover, they may obscure other problems in your data that would show up with a real Stata data example but might be covered over in a table, such as inappropriate data storage types or other metadata. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
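As a minimal sketch of what that looks like in practice, here is -dataex- run on Stata's shipped auto dataset. The dataset and the choice of variables are only for illustration; with your own data you would list your own variables.

```stata
* Illustration only: Stata's built-in auto dataset stands in for your own data.
sysuse auto, clear

* Post the first 5 observations of three variables as a ready-to-paste example.
dataex make price mpg in 1/5
```

Copy everything -dataex- prints to the Results window, and paste it into your post between code delimiters.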

When asking for help with code, always show example data. When showing example data, always use -dataex-.

Peter Green

Join Date: Apr 2019
Posts: 3

04 Apr 2019, 10:05

Dear Mr. Schechter,

Thank you for your extensive answer! I am sorry that I got it wrong in my last post; I will try to formulate it correctly this time.

As you can see in my dataset, I have a pair ID (Pairid). I gathered the matched firms manually, based on same country, same SIC code, bond issuance in the same year, and closest peer.

The matching now seems somewhat ambiguous. I tried to match my dataset using propensity scores with nearest-neighbor matching, but I have a hard time creating the pairs. I might implement this later, since I have a dataset with 20 peer companies for each green bond issuer, filtered on country and industry. Whereas I have now selected the matches manually, it might be a better idea to match by propensity score based on variables such as size, net income, etc.

But for now, my question is about the regression. I want to capture the effect of the issuance of a green bond (treatment firms) on ESG score, EnvironmentPillarScore, environmental innovation, CO2, etc.
I am following a paper which estimates this with the following regression (same as in my first post):

Code:

y_it = α_i + α_c* α_t + α_s*α_t + β *green bond issuer.

y is either ESG score, EnvironmentPillarScore, environmental innovation, or CO2
αi stands for fixed firm effects,
αc* αt stands for country by year fixed effects.
αs*αt stands for industry by year fixed effects

Now, green bond issuer is a dummy variable, so β would measure the difference-in-differences outcome in variable y between treated and matched control firms, right?

My question now is: how can I run this regression while accounting for these multiple fixed effects? In other words, how can I measure the effect of the green bond issuance on ESG (by looking at the differences in ESG between green bond issuers and comparable firms)?

My intuition says that this would simply be:

Code:

reg ESGScore i.Country1 i.Industry1 i.IssueYear Greenissuer

I gave each matched firm the same IssueYear as the corresponding green bond firm, since this regression should only capture the effect after the year of green bond issuance.

I hope the question is now somewhat clearer. My apologies for my poor understanding of econometrics.

Many thanks in advance!

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(ID Pairid) int year byte Greenissuer float(Country1 Industry1) int IssueYear double(ESGScore EnvironmentPillarScore EnvirInn Co)
1 1 2007 1 8 6 2013 67.6086641391614 69.4915254237288 17.7966101694915 84330000
1 1 2008 1 8 6 2013 66.0844882729211 62.7743634767339 8.20895522388059 78300000
1 1 2009 1 8 6 2013 57.0404718693284 68.9783281733746 43.6842105263157 91605300
1 1 2010 1 8 6 2013 64.1722391304347 72.5882352941176 60.4 81246570
1 1 2011 1 8 6 2013 . 72.5882352941176 60.4 81246570
1 1 2012 1 8 6 2013 . 64.8529411764706 61.25 71016962
1 1 2013 1 8 6 2013 . 66.8494516450648 53.8135593220339 80357560
1 1 2014 1 8 6 2013 62.3333263528229 64.1963872163038 53.9370078740157 64300000
1 1 2015 1 8 6 2013 62.3333263528229 64.1963872163038 53.9370078740157 64300000
1 1 2016 1 8 6 2013 62.6485668995859 67.6653270003653 49.3788819875776 47700000
1 1 2017 1 8 6 2013 60.7799401197604 67.2948221204649 39.8203592814371 50500000
1 1 2018 1 8 6 2013 60.7799401197604 67.2948221204649 39.8203592814371 50500000
1 1 2019 1 8 6 2013 . . . .
2 1 2007 0 8 6 2013 . . . .
2 1 2008 0 8 6 2013 . . . .
2 1 2009 0 8 6 2013 55.3093626482213 80.0255754475703 81.3043478260869 6327480
2 1 2010 0 8 6 2013 68.6110217391304 74.6823529411765 81.2 6312319
2 1 2011 0 8 6 2013 68.6110217391304 74.6823529411765 81.2 6866789
2 1 2012 0 8 6 2013 63.6654448938321 64.7514819881441 59.3023255813953 7742360
2 1 2013 0 8 6 2013 70.8539964040536 80.9486952675807 94.7368421052631 7168989
2 1 2014 0 8 6 2013 69.0263975155279 78.6659663865546 95 7883762
2 1 2015 0 8 6 2013 75.3300021039343 83.7058823529411 96.6836734693877 7883762
2 1 2016 0 8 6 2013 73.8073254870129 84.3787515006002 96.3203463203463 7948731
2 1 2017 0 8 6 2013 73.8073254870129 85.8034122740005 96.3203463203463 7948731
2 1 2018 0 8 6 2013 73.7109356950327 87.1968787515005 95.9183673469387 8527443
2 1 2019 0 8 6 2013 . . . .
3 2 2007 1 13 6 2014 56.075 55.0588235294118 92 2484112
3 2 2008 1 13 6 2014 64.6115384615384 50.9049773755656 57.6923076923076 2430000
3 2 2009 1 13 6 2014 66.6953703703703 . 68.5185185185185 2179768
3 2 2010 1 13 6 2014 66.275974025974 . 73.2142857142857 1905794
3 2 2011 1 13 6 2014 . . 67.8571428571428 1930168
3 2 2012 1 13 6 2014 . 35.6092436974789 46.4285714285714 2341574
3 2 2013 1 13 6 2014 . 27.4340770791075 50 2697975.8
3 2 2014 1 13 6 2014 38.9681372549019 29.4117647058823 58.3333333333333 1624000
3 2 2015 1 13 6 2014 54.2577380952381 33.9495798319327 58.5714285714285 1495719
3 2 2016 1 13 6 2014 53.3106725146199 35.1307189542483 54.1666666666666 1112251
3 2 2017 1 13 6 2014 47.4527777777777 34.4362745098039 61.1111111111111 1142575
3 2 2018 1 13 6 2014 45.3827683615819 34.4362745098039 61.1111111111111 1186122
3 2 2019 1 13 6 2014 . . . .
4 2 2007 0 13 6 2014 . . . .
4 2 2008 0 13 6 2014 . . . .
4 2 2009 0 13 6 2014 . . . .
4 2 2010 0 13 6 2014 . . . .
4 2 2011 0 13 6 2014 . . . .
4 2 2012 0 13 6 2014 . . . .
4 2 2013 0 13 6 2014 . . . .
4 2 2014 0 13 6 2014 . . . .
4 2 2015 0 13 6 2014 22.5753311258278 19.5948578106739 44.7019867549668 .
4 2 2016 0 13 6 2014 25.3806527187534 20.6521739130434 40.6832298136645 .
4 2 2017 0 13 6 2014 33.3787924151696 31.5163790066925 76.0479041916167 .
4 2 2018 0 13 6 2014 37.4248914840843 38.3070301291248 97.2560975609756 .
4 2 2019 0 13 6 2014 . . . .
5 3 2007 1 8 14 2014 60.9610983981693 72.2506393861892 50 5706274
5 3 2008 1 8 14 2014 45.5392857142857 47.6470588235294 38 92372109
5 3 2009 1 8 14 2014 72.0466119528619 . 79.6296296296296 100541004
5 3 2010 1 8 14 2014 72.0466119528619 . 79.6296296296296 100541004
5 3 2011 1 8 14 2014 . . 91.0714285714285 112575205
5 3 2012 1 8 14 2014 . 61.2394957983193 91.0714285714285 157943664
5 3 2013 1 8 14 2014 . 74.340770791075 91.3793103448275 .
5 3 2014 1 8 14 2014 75.8198581560283 75.2941176470588 88.3333333333333 .
5 3 2015 1 8 14 2014 73.6133284241531 74.9579831932773 84.2857142857142 .
5 3 2016 1 8 14 2014 73.6133284241531 74.9579831932773 84.2857142857142 .
5 3 2017 1 8 14 2014 67.7682291666666 73.406862745098 72.2222222222222 124005237
5 3 2018 1 8 14 2014 71.7249737945492 74.3872549019608 72.2222222222222 92813298
5 3 2019 1 8 14 2014 . . . .
6 3 2007 0 8 14 2014 61.6539179104477 34.297108673978 61.9402985074626 24476000
6 3 2008 0 8 14 2014 74.759649122807 69.3151887620719 74.2105263157894 23422106
6 3 2009 0 8 14 2014 79.5424528301886 79.6059933407325 92.9245283018867 21313000
6 3 2010 0 8 14 2014 79.5424528301886 78.6117647058824 92.9245283018867 15771000
6 3 2011 0 8 14 2014 75.6675 72.0098039215686 90 18237580
6 3 2012 0 8 14 2014 73.5749999999999 57.5523429710867 51.25 18237580
6 3 2013 0 8 14 2014 69.2757768361582 57.5523429710867 43.6440677966101 18969000
6 3 2014 0 8 14 2014 77.4321760916249 59.2610597958191 57.0866141732283 18969000
6 3 2015 0 8 14 2014 73.3314268512944 73.3209819360815 59.2715231788079 22532000
6 3 2016 0 8 14 2014 75.0088932806324 73.6852356836774 64.2857142857142 19478000
6 3 2017 0 8 14 2014 75.0088932806324 70.2007749207467 64.2857142857142 19478000
6 3 2018 0 8 14 2014 69.2365269461077 70.2007749207467 51.1976047904191 23931000
6 3 2019 0 8 14 2014 . . . .
7 4 2007 1 11 14 2014 68.4295348837209 86.1176470588235 74 1682019
7 4 2008 1 11 14 2014 70.1967147435897 76.5837104072398 51.9230769230769 1462189
7 4 2009 1 11 14 2014 67.5066425120773 . 50 1918585
7 4 2010 1 11 14 2014 79.6367781155015 84.9789915966386 98.2142857142857 1920107
7 4 2011 1 11 14 2014 . . 98.2142857142857 2088761
7 4 2012 1 11 14 2014 . 84.7689075630252 98.2142857142857 1993072
7 4 2013 1 11 14 2014 . 84.1784989858012 94.8275862068965 1411792
7 4 2014 1 11 14 2014 68.5797872340425 79.2156862745098 91.6666666666666 1257816
7 4 2015 1 11 14 2014 68.4836734693877 76.1344537815125 87.1428571428571 1802318
7 4 2016 1 11 14 2014 73.3226950354609 80.8006535947712 90.2777777777777 1692110
7 4 2017 1 11 14 2014 66.1921296296296 69.1993464052287 87.5 1620147
7 4 2018 1 11 14 2014 66.1921296296296 69.1993464052287 87.5 1620147
7 4 2019 1 11 14 2014 . . . .
8 4 2007 0 11 14 2014 40.5157248157248 60.8903020667726 63.5135135135135 3527001
8 4 2008 0 11 14 2014 40.5157248157248 60.8903020667726 63.5135135135135 3527001
8 4 2009 0 11 14 2014 40.5157248157248 60.8903020667726 63.5135135135135 3527001
8 4 2010 0 11 14 2014 40.5157248157248 60.8903020667726 63.5135135135135 3527001
8 4 2011 0 11 14 2014 40.5157248157248 60.8903020667726 63.5135135135135 3527001
8 4 2012 0 11 14 2014 40.5157248157248 60.8903020667726 63.5135135135135 3527001
8 4 2013 0 11 14 2014 40.5157248157248 60.8903020667726 63.5135135135135 3527001
8 4 2014 0 11 14 2014 40.5157248157248 60.8903020667726 63.5135135135135 3527001
8 4 2015 0 11 14 2014 40.5157248157248 60.8903020667726 63.5135135135135 3527001
end


Clyde Schechter

Join Date: Apr 2014

Posts: 30353
#4

04 Apr 2019, 14:44

Now, green bond issuer is a dummy variable, so β would measure the difference-in-differences outcome in variable y between treated and matched control firms, right?

No, not right. The DID estimator is the coefficient of an interaction term between Greenissuer and a pre-post issue variable.

Code:

gen pre_post = (year >= IssueYear)
regress ESGScore i.Greenissuer##i.pre_post

is the basic code for the basic model.
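To spell out why the interaction coefficient is the DID estimate, the basic model can be written out as follows. The symbols are my own shorthand and not taken from the paper you are following: G_i is Greenissuer, P_it is pre_post, and y_it is the outcome.

```latex
% Basic DID model: G_i = Greenissuer, P_{it} = pre_post (notation assumed for illustration)
\[
y_{it} = \beta_0 + \beta_1 G_i + \beta_2 P_{it} + \beta_3 \,(G_i \times P_{it}) + \varepsilon_{it}
\]
% Taking expectations in each of the four group-by-period cells gives
\[
\beta_3 =
\bigl(E[y \mid G=1, P=1] - E[y \mid G=1, P=0]\bigr)
-
\bigl(E[y \mid G=0, P=1] - E[y \mid G=0, P=0]\bigr)
\]
```

So β_3, the coefficient that Stata reports for 1.Greenissuer#1.pre_post, is the pre-to-post change among green bond issuers minus the pre-to-post change among the comparison firms. The coefficient β_1 alone is only the baseline difference between the two groups in the pre-issue period.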

Now, let's leave aside for the moment the matched pair design and just consider the data as unpaired. So there are green bond issuers and non-green bond issuers. We have data on their outcome (ESGScore) both before and after they issue their bonds. A problem with the code shown above is that it fails to account for the repeated observations on the same IDs over time. You can overcome that problem by using fixed-effects regression:

Code:

xtset ID year
xtreg ESGScore i.Greenissuer##i.pre_post, fe

Now, I understand you want to also include country and industry fixed effects. But that is impossible. At least within the example data you have shown, each ID is always in the same country and same industry every year. So the country and industry variables will necessarily be collinear with the ID-level fixed effects and will be omitted if you try to include them. (Try it--you'll see.) So you have to forget about those: they are simply not estimable from this kind of data. If your complete data does not show this pattern of consistency of country and industry within ID, then that's a different story--but just from the names of the variables, that kind of consistency is what I expect.

Fixed-effects models like this are the usual preference in econometric analysis because they provide consistent estimation and rely on fewer assumptions. But they are not the only analyses possible. In particular, if it is crucial to estimate the effects of country and industry, you can escape from fixed effects estimation and go to a hybrid model:
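One way to check this in the complete data is sketched below. The four rows are copied from the -dataex- example in #3 (IDs 1 and 3, years 2013 and 2014) so that the code runs on its own; the varies_* variable names are made up for illustration. With the full data in memory, only the loop would be needed.

```stata
* A few rows taken from the -dataex- example in this thread, so the check is self-contained.
clear
input float ID int year float(Country1 Industry1)
1 2013 8 6
1 2014 8 6
3 2013 13 6
3 2014 13 6
end

* Within each ID, sort by the variable and compare its smallest and largest value.
* Note: missing values sort last, so an ID with some missing codes is flagged as varying.
foreach v of varlist Country1 Industry1 {
    bysort ID (`v'): gen byte varies_`v' = (`v'[1] != `v'[_N])
    quietly count if varies_`v'
    display "`v' varies within ID in " r(N) " observations"
}
```

If the count is zero for a variable, that variable is constant within every ID, and its main effect cannot be separated from the ID-level fixed effects.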

Code:

gen interaction = 1.Greenissuer#1.pre_post
xthybrid ESGScore Greenissuer pre_post interaction Industry1 Country1, family(gaussian) ///
    link(identity) clusterid(ID)

-xthybrid- is written by Francisco Perales and Reinhard Schunck, and is available from SSC. Be sure to read the help file before using -xthybrid-, as it explains how to read the output, which is not standard Stata regression output.
Peter Green

Join Date: Apr 2019

Posts: 3
#5

05 Apr 2019, 04:58

Dear Mr. Schechter,

Once again many thanks for your extensive answer. You really help me out here.

I see the problem with the fixed effects on ID, country and industry: each ID is always in the same country and industry. I will have to drop this assumption.

However, I don't follow you on the assumption that the data are unpaired. In my dataset I have matched each green bond firm with a non-green bond firm from the same country and industry. How should I use this in my regressions, since the code you provided does not account for it? Is it even possible to examine the effect of green bond issuance on the ESGScore variable with this data? Should I use another approach or regression?

Many thanks in advance.
Clyde Schechter

Join Date: Apr 2014

Posts: 30353
#6

05 Apr 2019, 13:06

I said to set aside the matching because it requires a somewhat different analytic approach, and I wanted to at least get through the explanation of your other questions without adding in those complications.

Given that the data is paired, and there are longitudinal observations on each ID within the pair, you cannot reduce this model to less than 3 levels. So you must use -mixed- because -xthybrid- only goes up to 2 levels. That means that you have to do your own creation of the within- and between- variables, including for the interaction. So it looks like this:

Code:

gen pre_post = (year >= IssueYear)
gen interaction = 1.Greenissuer#1.pre_post
foreach v of varlist Greenissuer pre_post interaction {
    by ID, sort: egen b_`v' = mean(`v')
    gen w_`v' = `v' - b_`v'
}
mixed ESGScore w_* b_* || Pairid: || ID:

In your example data, the w_ and b_ interaction coefficients are rather different from each other, and I'm not 100% certain from your problem description which one is what you seek. But I imagine that it is the w_interaction coefficient that interests you: it represents the DID estimate of the effect on an individual ID's ESGScore that ensues when the ID issues a green, as opposed to a non-green, bond.
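To see what the b_ and w_ variables actually contain, here is a small sketch using the identifier columns of Pairid 1 from the -dataex- example (years 2012 through 2015 only). It writes the interaction as a plain product, which is an assumption that gives the same values as 1.Greenissuer#1.pre_post when both variables are 0/1 and non-missing.

```stata
* Identifier columns for Pairid 1, copied from the -dataex- example (2012-2015 only).
clear
input float(ID Pairid) int year byte Greenissuer int IssueYear
1 1 2012 1 2013
1 1 2013 1 2013
1 1 2014 1 2013
1 1 2015 1 2013
2 1 2012 0 2013
2 1 2013 0 2013
2 1 2014 0 2013
2 1 2015 0 2013
end

gen pre_post = (year >= IssueYear)
* Plain product of two 0/1 variables (illustrative stand-in for the factor-variable form).
gen interaction = Greenissuer * pre_post

* b_ is the firm-level mean (between part); w_ is the deviation from it (within part).
bysort ID: egen b_interaction = mean(interaction)
gen w_interaction = interaction - b_interaction

list ID year pre_post interaction b_interaction w_interaction, sepby(ID)
```

For the green bond issuer (ID 1), interaction switches from 0 to 1 in 2013, so b_interaction is 0.75 and w_interaction moves from -0.75 to 0.25. For the matched firm (ID 2), both are 0 throughout. The w_interaction coefficient is therefore driven by the change over time within issuing firms, with w_pre_post picking up the change that is common to both members of the pair.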
Paola Portaccio

Join Date: Feb 2020

Posts: 7
#7

26 Apr 2020, 07:36

Hi Peter, I have the same problem. How did you solve the DID analysis, considering the matching generated with the propensity score? Could you help me by sharing your Stata code?