Complicated index calculation (weighted time)!

River Huang

Join Date: Mar 2016
Posts: 1908

Complicated index calculation (weighted time)!

15 Jan 2017, 02:02

I have a very complicated question about calculating a weighted index for each firm (2,000+) and each year (over the 1999-2015 period). Thus, I try to split the question into a few pieces so that it is easier for me to describe my question and for you to provide answer.

Basically, I'd like to calculate the weighted time (proportion) of Directors appointed after the CEO assumed office for each firm and every year.

HTML Code:

 list, sep(0)

     +-----------------------------------------------------------------------------+
     | firm   year   Director    D_start       D_end   CEO    C_start        C_end |
     |-----------------------------------------------------------------------------|
  1. |    1   2010         D1   7/1/2008   8/31/2011    C1   1/1/2009   12/31/2012 |
  2. |    1   2010         D2   6/1/2009   5/31/2012    C1   1/1/2009   12/31/2012 |
  3. |    1   2010         D3   8/1/2009   7/31/2012    C1   1/1/2009   12/31/2012 |
  4. |    2   2010         T1   7/1/2008   1/31/2010    C2   1/1/2009   12/31/2012 |
  5. |    2   2010         T4   2/1/2010   8/31/2011    C2   1/1/2009   12/31/2012 |
  6. |    2   2010         T2   6/1/2009   5/31/2012    C2   1/1/2009   12/31/2012 |
  7. |    2   2010         T3   8/1/2009   7/31/2012    C2   1/1/2009   12/31/2012 |
     +-----------------------------------------------------------------------------+

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte firm int year str2 Director int(D_start D_end) str2 CEO int(C_start C_end)
1 2010 "D1" 17714 18870 "C1" 17898 19358
1 2010 "D2" 18049 19144 "C1" 17898 19358
1 2010 "D3" 18110 19205 "C1" 17898 19358
2 2010 "T1" 17714 18293 "C2" 17898 19358
2 2010 "T4" 18294 18870 "C2" 17898 19358
2 2010 "T2" 18049 19144 "C2" 17898 19358
2 2010 "T3" 18110 19205 "C2" 17898 19358
end
format %tdnn/dd/CCYY D_start
format %tdnn/dd/CCYY D_end
format %tdnn/dd/CCYY C_start
format %tdnn/dd/CCYY C_end

Take firm 1 in year 2000 as an example, D1 became a director before the CEO assumed office, while D2 and D3 were after CEO. In this simple case, the index is equal to (1+1)/3 (two out of three directors were later than the CEO entered the office). How can I use Stata to do this?

But things get more complicated because directors may change over time.

Take firm 2 for another instance, director T1 steps down on 1/31/2010 and T4 becomes director thereafter (from 2/1/2010). In this case, the index is calculated as (1 (T2)+1 (T3) + 11/12 (T2))/3. How can I use Stata to do this?

Ho-Chuan (River) Huang
Stata 19.0, MP(4)

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#2

15 Jan 2017, 12:10

I don't fully understand your logic for firm 2. In particular, I don't understand why the denominator is 3 and not 4. I realize that director T1 was appointed before C2 started--but in the example for firm 1, that counted as a weight of zero and did not result in exclusion from the denominator (3 in that case). So I'm going to assume that you meant (1+1+11/12+0)/4 for firm 2.

Here is some code that, if my interpretation is correct, works in your example.

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte firm int year str2 Director int(D_start D_end) str2 CEO int(C_start C_end) 1 2010 "D1" 17714 18870 "C1" 17898 19358 1 2010 "D2" 18049 19144 "C1" 17898 19358 1 2010 "D3" 18110 19205 "C1" 17898 19358 2 2010 "T1" 17714 18293 "C2" 17898 19358 2 2010 "T4" 18294 18870 "C2" 17898 19358 2 2010 "T2" 18049 19144 "C2" 17898 19358 2 2010 "T3" 18110 19205 "C2" 17898 19358 end format %tdnn/dd/CCYY D_start format %tdnn/dd/CCYY D_end format %tdnn/dd/CCYY C_start format %tdnn/dd/CCYY C_end // VERIFY START & END DATES ARE IN ORDER assert D_start <= D_end assert C_start <= C_end // VERIFY EACH DIRECTOR OVERLAPS WITH CEO AND WITH YEAR assert !((D_end < C_start) | (D_start > C_end)) assert !((D_end < mdy(1, 1, year)) | (D_start > mdy(12, 31, year))) // CALCULATE FRACTION OF THIS YEAR THAT DIRECTOR SERVED gen begin_this_year = max(D_start, mdy(1, 1, year)) gen end_this_year = min(D_end, mdy(12, 31, year)) gen wt = (end_this_year-begin_this_year+1)/(mdy(12,31,year)-mdy(1,1,year)+1) replace wt = 0 if D_start < C_start // ZERO IF DIRECTOR PRECEDES CEO egen overall_weight = mean(wt), by(CEO year)

Now, there are some other ways this code departs from your original specification. Instead of counting service in whole months, this code looks at the actual dates, so instead of 11/12 for T4, we have a weight based on the fraction of days in 2010 that T4 served. This is slightly different from 11/12.

I also put in some guarding -assert- statement at the beginning to verify that in every observation in the data set, the director actually served some time in the year specified by the year variable, and that the same is true of the CEO. That makes it possible to just calculate a simple average of the weights for each director to get the overall weight, without having to count up how many of the observations have to be excluded from the denominator as not in universe.

I also verified that each start date is not later than the corresponding end date.

Finally, there is one issue that is not raised by your example and I have not dealt with. In your data, each CEO starts on Jan 1 of a year and ends on Dec 31 of some year. I suspect that is not true throughout your data. It is not clear, if a CEO served only, say, 6 months out of a given year, how you want to figure the weights on the directors in this situation. But perhaps with this code as a start, you can figure out the rest.
Comment
River Huang

Join Date: Mar 2016

Posts: 1908
#3

15 Jan 2017, 16:56

Dear Clyde, many thanks four your helpful answer.

Regarding your first question:

Originally posted by Clyde Schechter View Post

I don't fully understand your logic for firm 2. In particular, I don't understand why the denominator is 3 and not 4. I realize that director T1 was appointed before C2 started--but in the example for firm 1, that counted as a weight of zero and did not result in exclusion from the denominator (3 in that case). So I'm going to assume that you meant (1+1+11/12+0)/4 for firm 2.

The reason for dividing by 3 (despite that there are "four" persons in the year 2010 for firm 2) is that T1 and T4 together serve one-year term (I am calculating the time in their position) as a normal director does. So, any suggestion to modify the code to account for this situation?

Originally posted by Clyde Schechter View Post

Now, there are some other ways this code departs from your original specification. Instead of counting service in whole months, this code looks at the actual dates, so instead of 11/12 for T4, we have a weight based on the fraction of days in 2010 that T4 served. This is slightly different from 11/12.

Thanks, I didn't make myself clear, but your final answer (in terms of days) is exactly what I need.

Originally posted by Clyde Schechter View Post

Finally, there is one issue that is not raised by your example and I have not dealt with. In your data, each CEO starts on Jan 1 of a year and ends on Dec 31 of some year. I suspect that is not true throughout your data. It is not clear, if a CEO served only, say, 6 months out of a given year, how you want to figure the weights on the directors in this situation. But perhaps with this code as a start, you can figure out the rest.

You are absolutely right (I made up the data). In fact, it is my next question. In practice, both directors (as in firm 2) and CEO may change in the same year. This part bothers me very much. I am trying to make up another simple data set to illustrate this possibility and seeking further help soon. Thanks again.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#4

15 Jan 2017, 17:40

The reason for dividing by 3 (despite that there are "four" persons in the year 2010 for firm 2) is that T1 and T4 together serve one-year term (I am calculating the time in their position) as a normal director does. So, any suggestion to modify the code to account for this situation?

Well, my first thought is to add up the total number of director days and divide that by the length of the year (365 or 366) to get the number of director positions. That would work with your example. But I imagine that there are situations where that will not be correct. For example, if a director withdraws or dies unexpectedly, there may be a lag before a replacement is recruited. And, less likely but I suppose possible, there might be occasional situations where a director and his/her replacement overlap briefly, or where the size of the board is changed in the middle of a year. But maybe what you need would be something like the proportion of all director days for the year that are served by directors who started after the CEO. If that would work for you, then:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte firm int year str2 Director int(D_start D_end) str2 CEO int(C_start C_end) 1 2010 "D1" 17714 18870 "C1" 17898 19358 1 2010 "D2" 18049 19144 "C1" 17898 19358 1 2010 "D3" 18110 19205 "C1" 17898 19358 2 2010 "T1" 17714 18293 "C2" 17898 19358 2 2010 "T4" 18294 18870 "C2" 17898 19358 2 2010 "T2" 18049 19144 "C2" 17898 19358 2 2010 "T3" 18110 19205 "C2" 17898 19358 end format %tdnn/dd/CCYY D_start format %tdnn/dd/CCYY D_end format %tdnn/dd/CCYY C_start format %tdnn/dd/CCYY C_end // VERIFY START & END DATES ARE IN ORDER assert D_start <= D_end assert C_start <= C_end // VERIFY EACH DIRECTOR OVERLAPS WITH CEO AND WITH YEAR assert !((D_end < C_start) | (D_start > C_end)) assert !((D_end < mdy(1, 1, year)) | (D_start > mdy(12, 31, year))) // CALCULATE # OF DAYS OF THIS YEAR THAT DIRECTOR SERVED gen days_served = min(D_end, mdy(12, 31, year)) - max(D_start, mdy(1, 1, year)) + 1 by CEO year, sort: egen numerator = total(cond(D_start < C_start), 0, days_served) by CEO year: egen denominator = total(days_served) gen wt = numerator/denominator

Of course, that still doesn't deal with a CEO who doesn't serve the full year. When you decide how you want to handle that situation, post back and I'll see if I can figure that part out.
Comment
River Huang

Join Date: Mar 2016

Posts: 1908
#5

16 Jan 2017, 01:33

Originally posted by Clyde Schechter View Post

Well, my first thought is to add up the total number of director days and divide that by the length of the year (365 or 366) to get the number of director positions. That would work with your example. But I imagine that there are situations where that will not be correct. For example, if a director withdraws or dies unexpectedly, there may be a lag before a replacement is recruited. And, less likely but I suppose possible, there might be occasional situations where a director and his/her replacement overlap briefly, or where the size of the board is changed in the middle of a year. But maybe what you need would be something like the proportion of all director days for the year that are served by directors who started after the CEO. If that would work for you, then:

Of course, that still doesn't deal with a CEO who doesn't serve the full year. When you decide how you want to handle that situation, post back and I'll see if I can figure that part out.

Dear Clyde,

The new code worked just fine. Thanks a lot. I know we are almost there.

I have made some modifications of the orginal data to demonstrate another complication that both CEOs and driectors can change in a particular year. By focusing on firm 2, in the year 2010, T1 left the position and T4 took the position immediately (At this moment, let's first ignore your concerns such as overlap, gaps, and so on).

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte firm int year str2 Director int(D_start D_end) str3 CEO int(C_start C_end) 1 2010 "D1" 17714 18870 "C1" 18110 19358 1 2010 "D2" 18049 19144 "C1" 18110 19358 1 2010 "D3" 18110 19205 "C1" 18110 19358 2 2010 "T1" 17714 18293 "C2a" 17898 18808 2 2010 "T4" 18294 19389 "C2a" 17898 18808 2 2010 "T2" 18049 19144 "C2a" 17898 18808 2 2010 "T3" 18110 19205 "C2a" 17898 18808 2 2011 "T4" 18294 19389 "C2a" 17898 18808 2 2011 "T2" 18049 19144 "C2a" 17898 18808 2 2011 "T3" 18110 19205 "C2a" 17898 18808 2 2012 "T4" 18294 19236 "C2b" 18809 19904 2 2012 "T5" 19237 19966 "C2b" 18809 19904 2 2012 "T2" 18049 19144 "C2b" 18809 19904 2 2012 "T6" 19145 19874 "C2b" 18809 19904 2 2012 "T3" 18110 19205 "C2b" 18809 19904 2 2012 "T7" 19206 19935 "C2b" 18809 19904 end format %tdnn/dd/CCYY D_start format %tdnn/dd/CCYY D_end format %tdnn/dd/CCYY C_start format %tdnn/dd/CCYY C_end

In the year 2011, the CEO (C2a) left the office (2011/6/30), and a new CEO (C2b) assumed office (2011/7/1). Since it is difficult (if not impossible) to merge a lot of directors (in my case, in a file) and a CEO (in another file) in a particular year, I reported the old CEO but notice that, starting from 2011/7/1 (2011/6/30 plus 1 day), a new CEO is taking office.
Here, I guess that I need to calculate two separate indcies, one for the C2a tenure and the other for C2b tenure, and then add them together.

In the year 2012, T4, T2 and T3 left the job, and T5, T6 and T7 became the new directors.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#6

16 Jan 2017, 11:01

Here, I guess that I need to calculate two separate indcies, one for the C2a tenure and the other for C2b tenure, and then add them together.

But given that the CEOs didn't serve for a full year, it isn't clear to me what each of these separate indices should be. Could you explain how you would do that in this example?
Comment
River Huang

Join Date: Mar 2016

Posts: 1908
#7

16 Jan 2017, 17:12

Originally posted by Clyde Schechter View Post

But given that the CEOs didn't serve for a full year, it isn't clear to me what each of these separate indices should be. Could you explain how you would do that in this example?

First, please note that I changed the starting date of CEO (C2a) to avoid confusion below.

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte firm int year str2 Director int(D_start D_end) str3 CEO int(C_start C_end) 1 2010 "D1" 17714 18870 "C1" 18110 19358 1 2010 "D2" 18049 19144 "C1" 18110 19358 1 2010 "D3" 18110 19205 "C1" 18110 19358 2 2010 "T1" 17714 18293 "C2a" 17898 18778 2 2010 "T4" 18294 19389 "C2a" 17898 18778 2 2010 "T2" 18049 19144 "C2a" 17898 18778 2 2010 "T3" 18110 19205 "C2a" 17898 18778 2 2011 "T4" 18294 19389 "C2a" 17898 18778 2 2011 "T2" 18049 19144 "C2a" 17898 18778 2 2011 "T3" 18110 19205 "C2a" 17898 18778 2 2012 "T4" 18294 19236 "C2b" 18779 19874 2 2012 "T5" 19237 19966 "C2b" 18779 19874 2 2012 "T2" 18049 19144 "C2b" 18779 19874 2 2012 "T6" 19145 19874 "C2b" 18779 19874 2 2012 "T3" 18110 19205 "C2b" 18779 19874 2 2012 "T7" 19206 19935 "C2b" 18779 19874 end format %tdnn/dd/CCYY D_start format %tdnn/dd/CCYY D_end format %tdnn/dd/CCYY C_start format %tdnn/dd/CCYY C_end

My initial thought is to divide the year 2011 (the CEO did not serve for a full year) into two subperiods.

The first period is before 2011/5/31, where the CEO is C2a (old CEO); and the second is after 2011/6/1, where the CEO is C2b (new CEO).

I'd like to calculate the sub-index for the first and second period, respectively.

Suppose that the sub-index value is I1 in the first period and I2 in the second period, then the total index for the year 2011 is calculated as a weighted average (of sub-indices I1 and I2) such as (approximately using months in a year) I1*(5/12)+I2*(7/12). But, as you know, using days is preferable.

I think that the calculation of the index for each subperiod is similar to the case that CEO serves a full year, but note that in our first subperiod the valid days are (roughly) 150 (5 months) rather than 365 days (a year), and in our second subperiod the valid days are (roughly) 210+ (7 months) rather than 365 days (a year).

I hope this explanation is clear. However, please fell free to let me know if you have further questions. Thanks again.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30095

16 Jan 2017, 17:54

I'm really not sure if I understand what's needed her correctly or not. But is this it?

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte firm int year str2 Director int(D_start D_end) str3 CEO int(C_start C_end)
1 2010 "D1" 17714 18870 "C1"  18110 19358
1 2010 "D2" 18049 19144 "C1"  18110 19358
1 2010 "D3" 18110 19205 "C1"  18110 19358
2 2010 "T1" 17714 18293 "C2a" 17898 18778
2 2010 "T4" 18294 19389 "C2a" 17898 18778
2 2010 "T2" 18049 19144 "C2a" 17898 18778
2 2010 "T3" 18110 19205 "C2a" 17898 18778
2 2011 "T4" 18294 19389 "C2a" 17898 18778
2 2011 "T2" 18049 19144 "C2a" 17898 18778
2 2011 "T3" 18110 19205 "C2a" 17898 18778
2 2012 "T4" 18294 19236 "C2b" 18779 19874
2 2012 "T5" 19237 19966 "C2b" 18779 19874
2 2012 "T2" 18049 19144 "C2b" 18779 19874
2 2012 "T6" 19145 19874 "C2b" 18779 19874
2 2012 "T3" 18110 19205 "C2b" 18779 19874
2 2012 "T7" 19206 19935 "C2b" 18779 19874
end
format %tdnn/dd/CCYY D_start
format %tdnn/dd/CCYY D_end
format %tdnn/dd/CCYY C_start
format %tdnn/dd/CCYY C_end

//    VERIFY START & END DATES ARE IN ORDER
assert D_start <= D_end
assert C_start <= C_end

//    VERIFY EACH DIRECTOR OVERLAPS WITH CEO AND WITH YEAR
assert !((D_end < C_start) | (D_start > C_end))
assert !((D_end < mdy(1, 1, year)) | (D_start > mdy(12, 31, year)))

gen ceo_begin_year = max(C_start, mdy(1,1,year))
gen ceo_end_year = min(C_end, mdy(12,31,year))
gen days_served_with_ceo = min(D_end, ceo_end_year) - max(D_start, ceo_begin_year) + 1
by firm year, sort: egen denominator = total(days_served_with_ceo)
by firm year: egen numerator = total(cond(D_start < C_start), 0, days_served_with_ceo)
gen wt = numerator/denominator

I'm not sure if this gets you what you need or not. If this isn't it, post back showing where and how my results differ from what you need and I'll give it another try.

By the way, there is an inconsistency in the D_end date for director T4: it is shown as 8/31/2012 in 2012, and as 1/31/2013 in 2010 and 2011. That shouldn't be, should it? In particular, the 8/31/2012 end date occurs in year 2012. If it's really supposed to be 1/31/2013, then the calculation for year 2012 is wrong.

Comment

River Huang

Join Date: Mar 2016

Posts: 1908
#9

16 Jan 2017, 18:18

Originally posted by Clyde Schechter View Post

I'm not sure if this gets you what you need or not. If this isn't it, post back showing where and how my results differ from what you need and I'll give it another try.

By the way, there is an inconsistency in the D_end date for director T4: it is shown as 8/31/2012 in 2012, and as 1/31/2013 in 2010 and 2011. That shouldn't be, should it? In particular, the 8/31/2012 end date occurs in year 2012. If it's really supposed to be 1/31/2013, then the calculation for year 2012 is wrong.

First, thanks for the answer. I will check it out (in case I find some problems, I will post another inquiry).

Second, sorry for my mistake (for T4). For consistency reason, I prefer to specify the term of T4 to be ended on 8/31/2012 so that T5 becomes the successive director. The data set should be:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte firm int year str2 Director int(D_start D_end) str3 CEO int(C_start C_end) 1 2010 "D1" 17714 18870 "C1" 18110 19358 1 2010 "D2" 18049 19144 "C1" 18110 19358 1 2010 "D3" 18110 19205 "C1" 18110 19358 2 2010 "T1" 17714 18293 "C2a" 17898 18778 2 2010 "T4" 18294 19236 "C2a" 17898 18778 2 2010 "T2" 18049 19144 "C2a" 17898 18778 2 2010 "T3" 18110 19205 "C2a" 17898 18778 2 2011 "T4" 18294 19236 "C2a" 17898 18778 2 2011 "T2" 18049 19144 "C2a" 17898 18778 2 2011 "T3" 18110 19205 "C2a" 17898 18778 2 2012 "T4" 18294 19236 "C2b" 18779 19874 2 2012 "T5" 19237 19966 "C2b" 18779 19874 2 2012 "T2" 18049 19144 "C2b" 18779 19874 2 2012 "T6" 19145 19874 "C2b" 18779 19874 2 2012 "T3" 18110 19205 "C2b" 18779 19874 2 2012 "T7" 19206 19935 "C2b" 18779 19874 end format %tdnn/dd/CCYY D_start format %tdnn/dd/CCYY D_end format %tdnn/dd/CCYY C_start format %tdnn/dd/CCYY C_end

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment
River Huang

Join Date: Mar 2016

Posts: 1908
#10

17 Jan 2017, 18:16

Originally posted by Clyde Schechter View Post

I'm really not sure if I understand what's needed her correctly or not. But is this it?

I'm not sure if this gets you what you need or not. If this isn't it, post back showing where and how my results differ from what you need and I'll give it another try.

Dear Clyde,

Please use the following data:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte firm int year str2 Director int(D_start D_end) str3 CEO int(C_start C_end) 1 2010 "D1" 17714 18870 "C1" 18110 19358 1 2010 "D2" 18141 19144 "C1" 18110 19358 1 2010 "D3" 18110 19205 "C1" 18110 19358 2 2010 "T1" 17714 18293 "C2a" 17898 18778 2 2010 "T4" 18294 19236 "C2a" 17898 18778 2 2010 "T2" 17684 19144 "C2a" 17898 18778 2 2010 "T3" 18110 19205 "C2a" 17898 18778 2 2011 "T4" 18294 19236 "C2a" 17898 18778 2 2011 "T2" 17684 19144 "C2a" 17898 18778 2 2011 "T3" 18110 19205 "C2a" 17898 18778 2 2012 "T4" 18294 19236 "C2b" 18779 19874 2 2012 "T5" 19237 19966 "C2b" 18779 19874 2 2012 "T2" 17684 19144 "C2b" 18779 19874 2 2012 "T6" 19145 19874 "C2b" 18779 19874 2 2012 "T3" 18110 19205 "C2b" 18779 19874 2 2012 "T7" 19206 19935 "C2b" 18779 19874 end format %tdnn/dd/CCYY D_start format %tdnn/dd/CCYY D_end format %tdnn/dd/CCYY C_start format %tdnn/dd/CCYY C_end

If I read your code correctly, while the code calculated the I1 index in the first sub-period, it didn't take into account the second subperiod of 2011 (so that I2 is missing).
I guess that we should first check whether there is a CEO change in every particular year.
If no, your original code is perfectly fine.

If yes (there is a change in CEOs, so that there are two CEOs in a particular year), we need to split the whole year into two subperiods as I mentioned above (according to C_end of first CEO, or to C_start (equals to C_end+1 of first CEO) of the second CEO.

It should be clear that the first subperiod (in our case in year 2011, it is before 2011/5/31) is still CEO2a's tenure. For this CEO, we need to calculate the first subindex I1, which is "roughly"( 5 (T4)+5 (T3) )/ (5 (T4)+5 (T3)+5 (T2) )=2/3 (in terms of months), since the starting dates of T4 and T3 are later than that of C2a, and the starting date of T2 is earlier than that of C2a (so is excluded for calculation).

For the remaining 7 months, it is the tenure of the second CEO (C2b). Similarly, we need to calculate the subindex I2 for the second subperiod. Since the starting date of the second CEO (C2b) is 2011/6/1 (=2011/5/31+1), we can find that all existing directors' starting dates are earlier than that of C2b, so that I2 is roughly equal to "( 7 (T4)+7 (T3)+7 (T2) )/( 7 (T4)+7 (T3)+7 (T2) )=1.

Thus, for the whole 2011 year, we calculate a weighted average index from I1 and I2, with weights being 5/(5+7) and 7/(5+7), respectively. The resulting overall index is approximately 0.8611. Note, 5 denotes the tenure of the first CEO (C2a) and 7 denotes the tenure of the second CEO (C2b).

Please do not hesitate to let me know if you find any questions. Thanks.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30095

#11

17 Jan 2017, 19:16

There's something wrong with your example data here. Your data for 2011 ends with CEO C2a's tenure. There is nothing at all shown for the final 7 months of 2011 in Firm 2. The C2b data is all in 2012. I thought of just replacing 2012 by 2011--but that leads to a problem because then we have data for year = 2011 with directors T5, T6, & T7 who don't even start service until 2012. So, to make a workable data example, I also moved their start dates up a year into 2011.

Now when you say

so that I2 is roughly equal to "( 7 (T4)+7 (T3)+7 (T2) )/( 7 (T4)+7 (T3)+7 (T2) )=1.

I think that is backwards, because T2, T3, and T4 all started as directors before CEO C2b began service, so they do not appear in the numerator of the calculation. Rather I believe I2 should be based on T5, T6, and T7 in the numerator and T2 through T7 in the denominator, so (in my revised data below) it would be (4 (T5) + 7 (T6) + 5 (T7)) / (7 (T2) + 4 (T5) + 7 (T3) + 7 (T6) + 7 (T4) + 5(T7)) = 16/37 which is approximately .43. Then the weighted average of I1 (0.66667) and I2 (0.43) becomes (5*.66667 + 7*0.43)/12 which is approximately .53.

So with this revised data, I think this code does it:

Code:

* Example generated by -dataex-. To install: ssc install dataex
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte firm int year str2 Director int(D_start D_end) str3 CEO int(C_start C_end)
1 2010 "D1" 17714 18870 "C1"  18110 19358
1 2010 "D2" 18141 19144 "C1"  18110 19358
1 2010 "D3" 18110 19205 "C1"  18110 19358
2 2010 "T1" 17714 18293 "C2a" 17898 18778
2 2010 "T4" 18294 19236 "C2a" 17898 18778
2 2010 "T2" 17684 19144 "C2a" 17898 18778
2 2010 "T3" 18110 19205 "C2a" 17898 18778
2 2011 "T4" 18294 19236 "C2a" 17898 18778
2 2011 "T2" 17684 19144 "C2a" 17898 18778
2 2011 "T3" 18110 19205 "C2a" 17898 18778
2 2011 "T4" 18294 19236 "C2b" 18779 19874
2 2011 "T5" 18872 19966 "C2b" 18779 19874
2 2011 "T2" 17684 19144 "C2b" 18779 19874
2 2011 "T6" 18780 19874 "C2b" 18779 19874
2 2011 "T3" 18110 19205 "C2b" 18779 19874
2 2011 "T7" 18841 19935 "C2b" 18779 19874
end
format %tdnn/dd/CCYY D_start
format %tdnn/dd/CCYY D_end
format %tdnn/dd/CCYY C_start
format %tdnn/dd/CCYY C_end

//    VERIFY START & END DATES ARE IN ORDER
assert D_start <= D_end
assert C_start <= C_end

//    VERIFY EACH DIRECTOR OVERLAPS WITH CEO AND WITH YEAR
assert !((D_end < C_start) | (D_start > C_end))
assert !((D_end < mdy(1, 1, year)) | (D_start > mdy(12, 31, year)))

gen ceo_begin_year = max(C_start, mdy(1,1,year))
gen ceo_end_year = min(C_end, mdy(12,31,year))
gen ceo_days = ceo_end_year - ceo_begin_year + 1
format ceo_*_year %td
gen days_served_with_ceo = min(D_end, ceo_end_year) - max(D_start, ceo_begin_year) + 1
by firm CEO year, sort: egen denominator = total(days_served_with_ceo)
by firm CEO year: egen numerator = total(cond(D_start < C_start), 0, days_served_with_ceo)
gen CEO_wt = numerator/denominator

//    NOW CALCULATE TIME-WEIGHTED AVERAGE OF CEO WTS TO CREATE FIRM-YEAR WEIGHT
egen flag = tag(firm year CEO)
by firm year, sort: egen fy_numerator = total(ceo_days*CEO_wt*flag)
by firm year: egen fy_denominator = total(ceo_days*flag)
gen firm_year_wt = fy_numerator/fy_denominator

I believe firm_year_wt is what you are looking for.

Comment

River Huang

Join Date: Mar 2016

Posts: 1908
#12

17 Jan 2017, 20:05

Originally posted by Clyde Schechter View Post

There's something wrong with your example data here. Your data for 2011 ends with CEO C2a's tenure. There is nothing at all shown for the final 7 months of 2011 in Firm 2. The C2b data is all in 2012.

The raw data offer both starting and end dates for each CEO. In general, the tenures are continuous (I mean, in my case above, the end date of C2a is 2011/5/31, and in the next day, the new CEO assumes office) for successive CEOs (C2a and C2b). I "expand" the data. In general, there is a single CEO in a particular year. However, in case of CEOs change, there are supposed two CEOs (C2a and C2b) in the same year (say, 2011). In order to "merge" with data on directors, I have to "select/choose" a CEO in a particular year (I am not sure if we can handle the case that merging with m:m, so that we have two CEOs in a year). That's the reason that C2b does not appear in 2011 (but actually (s)he is the CEO in the last 7 months!).

As mentioned above, I think that we can identify this situation by first checking whether a CEO serves a full year. If yes, the Stata code is fine. If not, it means, in the same year, there is another new CEO (starting after the old CEO's end date). I conjecture that we can use this information to obtain the sub-index I2 in the second subperiod. How do you think about this possibility?

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#13

17 Jan 2017, 21:03

In the modified example data I used in #11, in 2011 in firm 2, there are two CEOs, C2a and C2b, and C2b's term begins the day after. The code in #11 first calculates a weight for each CEO--there is no need to explicitly test for how many CEO's serve in a year: it calculates a weight for however many there are. Then it takes a time-weighted average of those weights over the firm year. I'm pretty sure it's what you want. Why don't you hand-calculate the results you want for this example and compare them to mine.

As for building the data set, it sounds like you are starting with a directors data set and a CEO data set. The directors data set lists all the directors in each firm in each year. The CEO data set lists however many CEOs there were in that firm in that year. Each data set contains start and end dates. You should emphatically not use -merge m:m- to put them together. Except in very unusual circumstances (which have only happened to me once in 22 years of using Stata!) -merge m:m- produces only jumbled garbage, nothing useful. The command for this situation is -joinby- and you will probably need to use firm and year as the join key variables. That should enable you to match each director for a firm in any year with each CEO who served that firm in the same year. Then you will need to clean out any resulting observations where the CEO and the director do not overlap in that year. It's important to get this right. If you have a data set where CEO 2b supposedly serves in the end of 2011, but there are no 2011 records for his/her firm in 2011 that include him/her (as happend in #10), then you have to build out the data set to create such records, because without them there is no way to compute the index you want.

So something like this for putting them together:

Code:

// CREATE DIRECTORS DATA SET * Example generated by -dataex-. To install: ssc install dataex clear input byte firm int year str2 Director int(D_start D_end) 1 2010 "D1" 17714 18870 1 2010 "D2" 18141 19144 1 2010 "D3" 18110 19205 2 2010 "T1" 17714 18293 2 2010 "T4" 18294 19236 2 2010 "T2" 17684 19144 2 2010 "T3" 18110 19205 2 2011 "T4" 18294 19236 2 2011 "T2" 17684 19144 2 2011 "T3" 18110 19205 2 2011 "T4" 18294 19236 2 2011 "T5" 18872 19966 2 2011 "T2" 17684 19144 2 2011 "T6" 18780 19874 2 2011 "T3" 18110 19205 2 2011 "T7" 18841 19935 end format %tdnn/dd/CCYY D_start format %tdnn/dd/CCYY D_end duplicates drop tempfile directors save `directors' // CREATE CEO DATA SET * Example generated by -dataex-. To install: ssc install dataex clear input byte firm int year str3 CEO int(C_start C_end) 1 2010 "C1" 18110 19358 1 2010 "C1" 18110 19358 1 2010 "C1" 18110 19358 2 2010 "C2a" 17898 18778 2 2010 "C2a" 17898 18778 2 2010 "C2a" 17898 18778 2 2010 "C2a" 17898 18778 2 2011 "C2a" 17898 18778 2 2011 "C2a" 17898 18778 2 2011 "C2a" 17898 18778 2 2011 "C2b" 18779 19874 2 2011 "C2b" 18779 19874 2 2011 "C2b" 18779 19874 2 2011 "C2b" 18779 19874 2 2011 "C2b" 18779 19874 2 2011 "C2b" 18779 19874 end format %tdnn/dd/CCYY C_start format %tdnn/dd/CCYY C_end duplicates drop tempfile ceo save `ceo' // PUT THEM TOGETHER use `ceo', clear joinby firm year using `directors' // REMOVE OBSERVATIONS WHERE CEO & DIRECTOR DO NOT OVERLAP drop if ((D_end < C_start) | (D_start > C_end))

And here's how I would build out the data set so that there are records for each CEO in each firm-year of his/her service. It starts with a "minimal" CEO data set that includes all of the CEOs from your example in #10, but gives only the firm, id, start date and end date:

Code:

// LOAD MINIMAL CEO DATA clear input byte firm str3 CEO int(C_start C_end) 1 "C1" 18110 19358 2 "C2a" 17898 18778 2 "C2b" 18779 19874 end format %tdnn/dd/CCYY C_start format %tdnn/dd/CCYY C_end // BUILD OUT THE DATA TO PROVIDE ONE RECORD // FOR EACH YEAR THE CEO SERVES THE FIRM gen first_year = yofd(C_start) gen last_year = yofd(C_end) expand last_year - first_year + 1 by firm CEO, sort: gen year = first_year + _n - 1 drop first_year last_year

You would save that result as your CEO data file to use for joining with the directors file (which you might need to build out in an analogous way). The resulting joined file would be the input to the code for calculating the weights.
Comment
River Huang

Join Date: Mar 2016

Posts: 1908
#14

18 Jan 2017, 04:10

Originally posted by Clyde Schechter View Post

In the modified example data I used in #11, in 2011 in firm 2, there are two CEOs, C2a and C2b, and C2b's term begins the day after. The code in #11 first calculates a weight for each CEO--there is no need to explicitly test for how many CEO's serve in a year: it calculates a weight for however many there are. Then it takes a time-weighted average of those weights over the firm year. I'm pretty sure it's what you want. Why don't you hand-calculate the results you want for this example and compare them to mine.

Dear Clyde,

I can not appreciate more for your very helpful advice. The question has bothered me for months. As suggested, I manually calculate the co-option index (in a rough way), and find that the results are quite similar to yours.

Another very similar index is the non-cop-opted independence. It amounts to calculating the fraction of independent directors who took position BEFORE the CEO assumed office. In particular, I only keep the directors who are independent, and the corresponding CEO information as above. How do I change code in #11 to this end?

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#15

18 Jan 2017, 09:32

If I understand this non co-opted index, it works the same as the one we've been working on up to now, but it includes in the numerator only those directors who started before the CEO, which is the opposite of what we have done before. I believe you can do this by changing:

Code:

by firm CEO year: egen numerator = total(cond(D_start < C_start), 0, days_served_with_ceo)

to

Code:

by firm CEO year: egen numerator = total(cond(D_start >= C_start), 0, days_served_with_ceo)

Everything else would work the same. As with the index worked on in the earlier posts of this thread, this is calculated as the fraction of all director-days during the CEO's term that involved non-co-opted directors. If it is to be based on the fraction of all directors, that would be a different calculation. If you want the code for an index based on directors, not director-days, post back and I'll work on that.
Comment

Announcement

Complicated index calculation (weighted time)!

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment