Difference in Difference

Jenice Riz

Join Date: Sep 2021

Posts: 4
#1

Difference in Difference

19 Sep 2021, 02:09

Hello everyone,
I need to run a difference-in-differences regression on panel data. Specifically, I need to examine the effect of a CEO marital transition (from married to single) on the dividend payouts of the firm. The variables are as follows:
gvkey is the firm ID.

married is the dummy variable of the marital status of the CEO-- 1 is for the single and 0 is for the married

execid is the executive ID of the CEO

Dividends is the total dividends payout of the firm.

mtm is the Married to Married transition of the CEO

mts is the Married to Single transition of the CEO

I need to run the following regression:

Dividends = a + b*Post + c*Post*Treatment + d*Treatment + controls

For that:
I need to create a dummy variable "Treatment" that equals 1 if a firm is a Married-to-Single CEO transition firm, and 0 if a firm is a Married-to-Married CEO transition firm

A variable "Post" which needs to be coded as 1 if a year is after a CEO transition, and 0 if a year is before a CEO transition.

An additional variable "Post1" that restricts the comparison to a window of 2 years (or 3 years) before and after the transition. I would like to be able to change the window length to see whether the effect changes. (This one is optional.)

The interaction of Treatment*Post.

Am I missing any other information that might be helpful for you to write the code? Please guide me.

Thanks in advance.

* Example generated by -dataex-. To install: ssc install dataex
clear
input long gvkey byte married int fyear long execid float dividends byte(mtm stm)
1004 0 1994 9248 7.65 0 0
1004 0 1995 9248 7.676 0 0
1004 0 1996 9249 7.976 1 0
1004 0 1997 9249 9.118 0 0
1004 0 1998 9249 9.375 0 0
1004 0 1999 9249 9.218 0 0
1004 0 2000 9249 9.157 0 0
1004 0 2001 9249 4.43 0 0
1004 0 2002 9249 .797 0 0
1004 0 2003 9249 0 0 0
1004 0 2004 9249 0 0 0
1004 0 2005 9249 0 0 0
1004 0 2006 9249 0 0 0
1004 0 2007 9249 0 0 0
1013 0 1993 6 0 0 0
1013 0 1994 6 0 0 0
1013 0 1995 6 0 0 0
1013 0 1996 6 0 0 0
1013 0 1997 6 0 0 0
1013 0 1998 6 0 0 0
1013 0 1999 6 0 0 0
1013 0 2000 6 0 0 0
1013 0 2001 22879 0 1 0
1013 0 2002 22879 0 0 0
1013 0 2003 22879 0 0 0
1013 0 2004 10203 0 1 0
1013 0 2005 10203 0 0 0
1013 0 2006 10203 0 0 0
1013 0 2007 10203 0 0 0
1013 0 2008 10203 0 0 0
1034 0 1993 6392 3.873 0 0
1034 0 1994 6390 3.893 1 0
1034 0 1995 6390 3.914 0 0
1034 0 1996 6390 3.928 0 0
1034 0 1997 6390 4.198 0 0
1034 0 1998 6390 4.651 0 0
1034 0 1999 16904 5.061 1 0
1034 1 2000 11170 6.526 0 1
1034 1 2001 11170 7.541 0 0
1034 1 2002 11170 9.235 0 0
1034 1 2003 11170 9.32 0 0
1034 1 2004 11170 9.404 0 0
1034 1 2005 11170 9.481 0 0
1034 0 2006 31853 9.84 0 0
1034 0 2007 31853 0 0 0
1045 0 1993 58 49 0 0
1045 0 1994 58 66 0 0
1045 0 1995 58 5 0 0
1045 0 1996 58 0 0 0
1045 0 1997 58 0 0 0
1045 0 1998 3661 0 1 0
1045 0 1999 3661 0 0 0
1045 0 2000 3661 0 0 0
1045 0 2001 3661 0 0 0
1045 0 2002 3661 0 0 0
1045 0 2003 14591 0 1 0
1045 0 2004 14591 0 0 0
1045 0 2005 14591 0 0 0
1045 0 2006 14591 0 0 0
1045 0 2007 14591 0 0 0
1045 0 2008 14591 0 0 0
1055 0 1993 79 0 0 0
1055 0 1994 79 0 0 0
1055 0 1995 79 0 0 0
1055 1 1996 2260 0 0 1
1056 0 1999 1891 0 0 0
1056 0 2000 1891 0 0 0
1056 0 2001 1891 0 0 0
1056 0 2002 1891 0 0 0
1056 0 2003 1891 0 0 0
1056 0 2004 1891 0 0 0
1056 0 2005 1891 0 0 0
1056 0 2006 1891 0 0 0
1072 0 2000 23167 24.463 0 0
1072 0 2001 23168 26.201 1 0
1072 0 2001 23168 26.201 0 0
1072 0 2002 23168 26.146 0 0
1072 0 2002 23168 26.146 0 0
1072 0 2003 23168 26.048 0 0
1072 0 2003 23168 26.048 0 0
1072 0 2004 23168 26.022 0 0
1072 0 2004 23168 26.022 0 0
1072 0 2005 23168 25.862 0 0
1072 0 2005 23168 25.862 0 0
1072 0 2006 23168 25.819 0 0
1072 0 2006 23168 25.819 0 0
1072 0 2007 23168 27.466 0 0
1072 0 2007 23168 27.466 0 0
1075 0 1993 759 17.466 0 0
1075 0 1994 759 72.115 0 0
1075 0 1995 759 80.855 0 0
1075 0 1996 759 89.614 0 0
1075 0 1997 759 96.16 0 0
1075 0 1998 759 103.849 0 0
1075 0 1999 10739 112.311 0 0
1075 0 2000 10739 120.733 0 0
1075 0 2001 10739 129.199 0 0
1075 0 2002 10739 137.721 1 0
1075 0 2003 10739 157.417 0 0
1075 0 2004 10739 166.772 0 0
end

Last edited by Jenice Riz; 19 Sep 2021, 02:12.
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#2

19 Sep 2021, 09:27

There is something wrong with your "panel" data and you will not make progress on this until that is fixed.

For panel data you must have only one observation per panel in each time period. You have many instances of duplicate observations. While it is simple enough to eliminate them with -duplicates drop-, the presence of duplicates in the data set usually reflects some error in data management--which means there may be other things wrong with the data as well. Worse still, for gvkey 1072 in fyear 2001 there are two different observations and they disagree on variable mtm. There may be other such anomalies in your full data set. You need to identify all such contradictions, review the data, and figure out how to select which observation is correct (or how to synthesize a correct observation from the conflicting ones, or omit that combination of gvkey and fyear altogether if it is not possible to get correct data for it).
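As a rough sketch of how that review might start, the following separates exact copies from gvkey-fyear pairs that genuinely conflict. The toy rows are copied from the gvkey 1072 observations in #1; the variable name conflict is just an illustrative choice.

```stata
* Toy data: a few of the gvkey 1072 rows from the example in #1
clear
input long gvkey int fyear long execid float dividends byte mtm
1072 2000 23167 24.463 0
1072 2001 23168 26.201 1
1072 2001 23168 26.201 0
1072 2002 23168 26.146 0
1072 2002 23168 26.146 0
end

* Drop observations that are exact copies on every variable
duplicates drop

* Tag gvkey-fyear pairs that still have more than one observation:
* these disagree on at least one variable and need manual review
duplicates tag gvkey fyear, generate(conflict)
list if conflict > 0, sepby(gvkey fyear)
```

Once every conflict has been resolved, -isid gvkey fyear- should run without error, which is a convenient check before -xtset-.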

Next, although it is perfectly legal to code a married variable 0 = married and 1 = single (Stata will not care), it is very confusing for people to work with that because the widely used convention for yes-no variables is 0 = no and 1 = yes (which permits logical operations & and | to correspond to arithmetic operations * and +, or functions min and max). If it were my data set, I would reverse that coding.

Even more confusing in your case is having a variable stm, whose name suggests it stands for single to married. But in the one instance in your data where stm == 1, we find a transition from married == 0 to married == 1, i.e. married to single. So stm seems to be inconsistent with your stated coding of the marital variable.

In the code below, I ignore the stm variable, and I assumed married == 1 means single and married == 0 means married (your stated coding). This code will not work properly until you fix the problem of duplicate and duplicate-inconsistent observations.

Code:

xtset gvkey fyear

// CODING AS STATED IN #1: married == 1 MEANS SINGLE, married == 0 MEANS MARRIED
by gvkey (fyear): gen byte married_to_single = married == 1 & L.married == 0
by gvkey (fyear): gen byte post = sum(married_to_single)
replace post = min(post, 1)
by gvkey (fyear): egen byte treatment = max(post)
by gvkey (fyear): gen post1 = inlist(1, married_to_single, L1.married_to_single, ///
    F1.married_to_single)

Note that in the above I did not generate an interaction variable between treatment and post. There is no need to do that unless you are using an ancient version of Stata. When you do your regression, if you wish to include a treatment*post interaction term, just write i.treatment##i.post in the list of regression variables (and omit the separate treatment and post terms), and Stata will automatically do the necessary calculations to include treatment, post, and their interaction in the regression. Do learn about Stata's factor-variable notation by reading -help fvvarlist-.
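To illustrate the factor-variable notation, here is a minimal sketch on made-up numbers. The eight observations and the outcome name y are hypothetical and chosen only so that the arithmetic is easy to follow; they are not drawn from your data.

```stata
* Hypothetical toy data: two observations in each treatment-by-post cell
clear
input byte(treatment post) float y
0 0 10
0 0 12
0 1 11
0 1 13
1 0 20
1 0 22
1 1 26
1 1 28
end

* ## expands to both main effects plus their interaction
regress y i.treatment##i.post

* The interaction coefficient is the difference-in-differences estimate.
* Here it equals (27 - 21) - (12 - 11) = 5, the difference of the
* pre/post changes in the cell means.
display _b[1.treatment#1.post]
```

With real panel data you would add your controls to the variable list and would probably also want standard errors clustered on the firm, e.g. vce(cluster gvkey).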

Finally, unless by some fantastic coincidence all of the married-to-single transitions in your data set occur in the same year, this data is not suitable for a classic difference-in-differences analysis. You will instead need to do a generalized difference-in-differences analysis. For that analysis, the variable created above called post takes the place of the interaction term. Bear in mind that this variable called "post" is not actually a post variable of the type normally used in a difference-in-differences interaction, since it never takes on the value 1 for a non-treated firm. But this makes it just right for generalized difference-in-differences.
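One common way to set up a generalized difference-in-differences model is a regression with firm and year fixed effects, sketched below on a hypothetical three-firm panel. The numbers are invented, and the choice of -xtreg, fe- with year indicators is an assumption about the specification, not the only possibility.

```stata
* Hypothetical toy panel: firm 1 transitions in 2002, firm 2 in 2003,
* firm 3 never does, so post is staggered and always 0 for firm 3
clear
input long gvkey int fyear byte post float dividends
1 2000 0 10
1 2001 0 11
1 2002 1 15
1 2003 1 16
2 2000 0 20
2 2001 0 21
2 2002 0 22
2 2003 1 27
3 2000 0 30
3 2001 0 31
3 2002 0 32
3 2003 0 33
end

xtset gvkey fyear

* Firm fixed effects via fe, year effects via i.fyear;
* the coefficient on 1.post is the generalized DID estimate
xtreg dividends i.post i.fyear, fe
```

In a real analysis you would add the control variables and, most likely, vce(cluster gvkey); the toy panel has too few firms for clustered standard errors to be meaningful.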

Thank you for using -dataex- on your very first post.
Jenice Riz

Join Date: Sep 2021

Posts: 4
#3

20 Sep 2021, 10:44

Dear Sir, thanks for your kind guidance.

Please accept my apologies for taking up so much of your time. I know very little about these methods, as I am learning them for the first time for my master's thesis. Specifically, I am following the methodology of the paper below:

Francis, Bill B., et al. "Are female CFOs less tax aggressive? Evidence from tax aggressiveness." The Journal of the American Taxation Association 36.2 (2014): 171-202.

==> In creating the sample, they do the following:
“We construct our CFO transition sample using the following filters: (1) both pre- and post-transition CFOs must be in office consecutively for at least three years excluding the transition year; (2) if a firm changes its CFOs more than once, then we only count the first change and drop the subsequent changes for that firm”.

==> They follow the following research design to see the transition effect first:

TAX_AGG_{i,t} = b0 + b1*POST_{i,t} + controls    (Equation 1)

where TAX_AGG_{i,t} represents the three tax aggressiveness measures for firm i in year t. POST_{i,t} captures CFO gender effect on tax aggressiveness and is an indicator variable that equals 1 if a firm-year is after a (male-to-female) CFO transition, and 0 if a firm-year is before a CFO transition.

==> For DID, they explain the details as below:
We first construct a control sample of firms that change their CFOs from male to male. We then pool the treatment sample (i.e., firms that switch from male CFOs to female CFOs) and the control sample. We create a dummy variable FEMALE that equals 1 if a firm is a male-to-female CFO transition firm, and 0 if a firm is a male-to-male CFO transition firm. We add an interaction term POST × FEMALE into Equation (1) using the pooled sample. Again POST is coded as 1 if a year is after a CFO transition, and 0 if a year is before a CFO transition. If female CFOs are less tax aggressive than male CFOs, then we expect the coefficient on the interaction variable to be significantly negative.

In contrast to their paper, I have CEO marital status data instead of CFO gender, and I wish to examine its effect on dividend behavior. I can keep the first transition in CEO marital status and drop the subsequent changes for that firm, but I am not sure how to do that. I would be very grateful for your suggestions; I am trying my best to learn, but coding is not my strong point. I am fine with the reversed coding of the marital status variable (married = 1 and single = 0). I am providing the data set again for your reference. Below are the details of the variables and the data:

1. gvkey==firm ID
2. married= marital status (Married=1 and single=0)-- I reversed the coding as per your suggestions
3. execid= CEO ID to help calculate the CEO transition
4. dividends= dividends payout of the firm
I will need code that reproduces the sampling procedure the above paper has followed.

I will estimate the transition effect (single to married).

I will then run the DID, with the single-to-married sample as treatment and the single-to-single sample as control.

Thanks, Sir, for your time.

* Example generated by -dataex-. To install: ssc install dataex
clear
input long gvkey int fyear float married long execid float dividends
1004 1994 1 9248 7.65
1004 1995 1 9248 7.676
1004 1996 1 9249 7.976
1004 1997 1 9249 9.118
1004 1998 1 9249 9.375
1004 1999 1 9249 9.218
1004 2000 1 9249 9.157
1004 2001 1 9249 4.43
1004 2002 1 9249 .797
1004 2003 1 9249 0
1004 2004 1 9249 0
1004 2005 1 9249 0
1004 2006 1 9249 0
1004 2007 1 9249 0
1013 1993 1 6 0
1013 1994 1 6 0
1013 1995 1 6 0
1013 1996 1 6 0
1013 1997 1 6 0
1013 1998 1 6 0
1013 1999 1 6 0
1013 2000 1 6 0
1013 2001 1 22879 0
1013 2002 1 22879 0
1013 2003 1 22879 0
1013 2004 1 10203 0
1013 2005 1 10203 0
1013 2006 1 10203 0
1013 2007 1 10203 0
1013 2008 1 10203 0
1034 1993 1 6392 3.873
1034 1994 1 6390 3.893
1034 1995 1 6390 3.914
1034 1996 1 6390 3.928
1034 1997 1 6390 4.198
1034 1998 1 6390 4.651
1034 1999 1 16904 5.061
1034 2000 0 11170 6.526
1034 2001 0 11170 7.541
1034 2002 0 11170 9.235
1034 2003 0 11170 9.32
1034 2004 0 11170 9.404
1034 2005 0 11170 9.481
1034 2006 1 31853 9.84
1034 2007 1 31853 0
1045 1993 1 58 49
1045 1994 1 58 66
1045 1995 1 58 5
1045 1996 1 58 0
1045 1997 1 58 0
1045 1998 1 3661 0
1045 1999 1 3661 0
1045 2000 1 3661 0
1045 2001 1 3661 0
1045 2002 1 3661 0
1045 2003 1 14591 0
1045 2004 1 14591 0
1045 2005 1 14591 0
1045 2006 1 14591 0
1045 2007 1 14591 0
1045 2008 1 14591 0
1055 1993 1 79 0
1055 1994 1 79 0
1055 1995 1 79 0
1055 1996 0 2260 0
1056 1999 1 1891 0
1056 2000 1 1891 0
1056 2001 1 1891 0
1056 2002 1 1891 0
1056 2003 1 1891 0
1056 2004 1 1891 0
1056 2005 1 1891 0
1056 2006 1 1891 0
1072 2000 1 23167 24.463
1072 2001 1 23168 26.201
1072 2001 1 23168 26.201
1072 2002 1 23168 26.146
1072 2002 1 23168 26.146
1072 2003 1 23168 26.048
1072 2003 1 23168 26.048
1072 2004 1 23168 26.022
1072 2004 1 23168 26.022
1072 2005 1 23168 25.862
1072 2005 1 23168 25.862
1072 2006 1 23168 25.819
1072 2006 1 23168 25.819
1072 2007 1 23168 27.466
1072 2007 1 23168 27.466
1075 1993 1 759 17.466
1075 1994 1 759 72.115
1075 1995 1 759 80.855
1075 1996 1 759 89.614
1075 1997 1 759 96.16
1075 1998 1 759 103.849
1075 1999 1 10739 112.311
1075 2000 1 10739 120.733
1075 2001 1 10739 129.199
1075 2002 1 10739 137.721
1075 2003 1 10739 157.417
1075 2004 1 10739 166.772
end

Last edited by Jenice Riz; 20 Sep 2021, 10:48.
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#4

20 Sep 2021, 11:12

Thank you for changing the coding of the married variable.

This is confusing. Here you say you are interested in the transition from single to married, but in #1 you said you were interested in the transition from married to single. In the code below I work with married to single transitions. If you really mean single to married, then you need to change the command that defines the variable mchange accordingly.

I notice that you still have duplicate observations in your data. You must remove them for the code to work properly. And, again, before just removing them, you should investigate why they are there--their presence is often suggestive of data management errors, and that may entail other things wrong with your data. There is no point in analyzing the wrong data!

Anyway, here is code to set up the variables you need:

Code:

xtset gvkey fyear

//  KEEP ONLY ONE EXEC TRANSITION PER FIRM
by gvkey (fyear), sort: gen n_exec = sum(execid != L1.execid)
by gvkey: keep if n_exec <= 2

//  VERIFY NO EXEC CHANGES MARITAL STATUS WHILE WORKING AT SAME FIRM
by gvkey execid (married), sort: assert married[1] == married[_N]

//  IDENTIFY TRANSITIONS FROM MARRIED TO SINGLE
by gvkey (fyear), sort: gen byte mchange = (married == 0 & L1.married == 1)

//  GENERATE INDICATOR FOR TREATMENT VS CONTROL AND PRE VS POST TRANSITION
by gvkey (fyear): gen post = n_exec > 1
assert inlist(post, 0, 1)
by gvkey (fyear): gen treatment = sum(mchange)
assert inlist(treatment, 0, 1)
drop n_exec

Note: This code defines the treatment variable as 1 when there is a transition from married to single. It is 0 if there is no change in marital status, or if there is a change from single to married. I'm not sure if that's what you want. Maybe you want to just exclude changes from single to married altogether? Or consider them as yet a third category? Your excerpt of the study you are mimicking does not explain what they did for female to male transitions (which would be the analog of a single to married transition here.)
Jenice Riz

Join Date: Sep 2021

Posts: 4
#5

20 Sep 2021, 11:34

Dear Clyde, thank you so much. The code is working fine for me. I apologise for not communicating clearly. My treatment of interest is the married-to-single transition, which is what you have calculated. For the control group, however, I want the married-to-married transition.
My treatment sample is married to single.

My matched sample is married to married.

Clyde Schechter

Join Date: Apr 2014
Posts: 30065

20 Sep 2021, 12:54

So you need to exclude any firms where there is a single to married transition. This changes the code slightly:

Code:

xtset gvkey fyear

//  KEEP ONLY ONE EXEC TRANSITION PER FIRM
by gvkey (fyear), sort: gen n_exec = sum(execid != L1.execid)
by gvkey: keep if n_exec <= 2

//  VERIFY NO EXEC CHANGES MARITAL STATUS WHILE WORKING AT SAME FIRM
by gvkey execid (married), sort: assert married[1] == married[_N]

//  DROP ANY FIRMS WHERE THE CHANGE IS FROM SINGLE TO MARRIED
sort gvkey fyear
gen mchange_exclude = (married == 1 & L.married == 0)
by gvkey (fyear): egen exclude = max(mchange_exclude)
drop if exclude
drop mchange_exclude exclude


//  IDENTIFY TRANSITIONS FROM MARRIED TO SINGLE
by gvkey (fyear), sort: gen byte mchange = (married == 0 & L1.married == 1)

//  GENERATE INDICATOR FOR TREATMENT VS CONTROL AND PRE VS POST TRANSITION
by gvkey (fyear): gen post = n_exec > 1
assert inlist(post, 0, 1)
by gvkey (fyear): gen treatment = sum(mchange)
assert inlist(treatment, 0, 1)
drop n_exec
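
The code above keeps only the first executive transition per firm, which corresponds to filter (2) in the paper quoted in #3. Filter (1), at least three years in office on each side of the transition, is not applied. A minimal sketch of one way to approximate it follows. It assumes the transition year is the first year of the incoming executive, counts observed firm-years rather than checking that the years are strictly consecutive, and uses a hypothetical toy panel; the names pre_years and post_years are illustrative.

```stata
* Hypothetical toy panel: firm 1 has 3 pre and 4 post years,
* firm 2 has only 2 pre years and should be dropped
clear
input long gvkey int fyear long execid
1 2000 11
1 2001 11
1 2002 11
1 2003 12
1 2004 12
1 2005 12
1 2006 12
2 2000 21
2 2001 21
2 2002 22
2 2003 22
2 2004 22
2 2005 22
end

* Running count of executives within firm (1 = first, 2 = second)
by gvkey (fyear), sort: gen n_exec = sum(execid != execid[_n-1])

* Firm-years observed under each executive
by gvkey: egen pre_years  = total(n_exec == 1)
by gvkey: egen post_years = total(n_exec == 2)

* Exclude the transition year (assumed to be the new executive's first year)
replace post_years = post_years - 1 if post_years > 0

* Keep firms with at least three years on each side of the transition
keep if pre_years >= 3 & post_years >= 3
```

In the full workflow this would be run after the n_exec <= 2 filter and before n_exec is dropped. If the panel has gaps in fyear, an additional check on consecutive years would be needed.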


Jenice Riz

Join Date: Sep 2021

Posts: 4
#7

21 Sep 2021, 06:13

Dear Sir,
thank you so much. The code worked perfectly fine for me. Much appreciated.