Creating Dummy Variable each company-year with multiple observations

Roy te Riele

Join Date: May 2021

Posts: 11
#1

Creating Dummy Variable each company-year with multiple observations

09 May 2021, 10:16

Hi Guys,

I'm currently writing my thesis about the influence of the tenure of outside directors on earnings quality. I also want to investigate whether having financial expertise has an effect on the relationship between tenure and earnings quality. I have broken down a sample of companies from the USA in the period from 2010-2019 with several outside directors per company year and I want to use a dummy variable that shows the value of 1 if at least one outside director financial expertise has in a company year. I have currently created the following do file to create this dummy variable:

gen OUTSIDE = 0
replace OUTSIDE = 1 if (classification=="I")
drop if OUTSIDE == 0
gen cusip_6d = substr(CUSIP, 1, 6)
drop CUSIP
gen Financial_Expertise = 0
replace Financial_Expertise = 1 if Financial_Expert == "Yes"
sort cusip_6d year
duplicates drop cusip_6d year, force

I have attached a snap of the cleansed data which is generated after implementing the do-file. As you can see there is a dummy variable created under the name Financial_Expertise but there is an indication problem within it. In some cases, the dummy variable Financial_Expertise indicates a value of 0 in a given company year while in that specific company-year there is an outside director with financial expertise. This is for example the case for the first dummy-variable of AAON. In the first picture, the dummy variable shows an ''0'' in the company year 2010 which means that none of the outside directors is in possession of financial expertise. However, in the second picture you can see that there is indeed one outside director with financial expertise.

Do you guys have any idea how I can adjust my do-file so that the dummy variable shows a 'correct' value in terms of the presence of a financial expert per company year?

I would like to hear from you and thank you in advance

Kind Regards,
Roy
Attached Files
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30174
#2

09 May 2021, 12:00

Scrapping the code you show in #1, you can do this:

Code:

keep if classification=="I" gen cusip_6d = substr(CUSIP, 1, 6) drop CUSIP by cusip_6d year, sort: egen any_director_fin_exp = max(Financial_expert == "Yes")

In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

As an aside, -duplicates drop cusip_6d year, force- is giant red flag. What you are doing with that is arbitrarily selecting one observation per cusip_6d year and discarding others that have different information. You needed to use the -force- option because Stata is trying to protect you from throwing away information by mistake. It may be that the only variables that you are interested in from that point on all agree for any particular cusip_6d year combination. But it is very easy later on to forget what has gotten you to this point, and if you make further calculations using variables that originally differed, you will get arbitrary, irreproducible, incorrect results. So the safe way to code this is to first -drop- all of the variables that no longer matter, and then run -duplicates drop cusid_6 year- (with no -force- option). If you mistakenly neglect to drop a variable that is not constant within cusid_6 year, Stata will detect that and give you an error message, so you are aware of the mistake.
Comment
Roy te Riele

Join Date: May 2021

Posts: 11
#3

09 May 2021, 12:36

Hi Mr. Schechter,

Thank you very much for your fast respond and information. I have reproduced your code in Stata and it indeed produced the dummy variable any_director_fin_exp. But is it also possible to make this dummy at company level? So that I can see whether at least one outside director has financial expertise per company year.

At the moment, there are still several observations per company year in the dataset.

Thank you in advance and I would love to hear from you.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30174
#4

09 May 2021, 12:40

But is it also possible to make this dummy at company level? So that I can see whether at least one outside director has financial expertise per company year.

I don't understand your question. You seem to be contradicting yourself.

The code shown in #2 calculates, for each cusip_6d in each year, a variable showing whether or not any outside director has financial expertise. Based on "So that I can see whether..." I thought that is what you want. But your question suggests you want it calculated only once for each company, showing whether any outside director in any year has ever had financial expertise. It's hard for me to see how that could be useful, but I don't work in finance/economics, so maybe it is. In that case, it's:

Code:

by cusip_6d, sort: egen any_director_fin_exp = max(Financial_expert == "Yes")
Comment
Roy te Riele

Join Date: May 2021

Posts: 11
#5

09 May 2021, 13:04

''The code shown in #2 calculates, for each cusip_6d in each year, a variable showing whether or not any outside director has financial expertise''. This is indeed the way I want to sort the dataset. The thing is that, despite the fact that the given do-file filters per company year, there are several year observations for one company. For example, AAON has 4 observations over the year 2010, but I'd like to see one observation of this to have insight whether at least one outside director is in possession of financial expertise per company per year.

Apologies for describing the problem badly, it's my first time working with this program.

Thank you in advance and I would love to hear from you.

Kind regards,
Roy
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30174
#6

09 May 2021, 13:33

Oh, maybe I understand. It's not a matter of calculating the variable at the company-year level: that's already been done. It's that you then want to reduce the data set to the company-year level. OK, then in that case, just follow the code in #2 with:

Code:

collapse (first) any_director_fin_exp, by(company year)

Note: If there are other variables that you also want to carry along to this resulting data set, you can list them before the comma. Just be sure that they are variables that take on the same value in all observations for a given company in a given year, not variables that are properties of the individual directors.
Comment
Roy te Riele

Join Date: May 2021

Posts: 11
#7

10 May 2021, 02:13

Yess it works! Thank you very much for your responses and information

Kind regards,
Roy
Comment

Announcement

Creating Dummy Variable each company-year with multiple observations

Comment

Comment

Comment

Comment

Comment

Comment