Creating a dummy variable indicating the last change in variables with multiple changes in values

Savvas Papadopoulos

Join Date: Jun 2019

Posts: 6
#1

Creating a dummy variable indicating the last change in variables with multiple changes in values

11 Jun 2019, 03:18

Hello. This is my first post and I would really appreciate any help. I have a panel dataset with information on CEO transitions and CEO marital status. What I want to do first is in firms with multiple CEO transitions to create a dummy variable that will indicate only the last transition. The variables I already have are "firm_id", "year", "ceo_id" and "marital_status". Based on this information I want to create the variable "last_ceo_change". After doing this, I want to create two new dummy variables indicating, for the last ceo transition, when we move from a single to a married CEO, as well as when we move from a married to a single CEO (the variables "single_to_married_last_ceo_change" and "married_to_single_last_ceo_change" respectively). Here is how my data looks like:
firm_id year ceo_id marital_status last_ceo_change single_to_married_last_ceo_chang married_to_single_last_ceo_chang
1 2005 456123 1
1 2006 456123 1
1 2007 456123 1
1 2008 389264 0 1 1 0
1 2009 389264 0 1 1 0
1 2010 389264 0 1 1 0
1 2011 389264 0 1 1 0
1 2012 785214 1 1 1 0
1 2013 785214 1 1 1 0
2 2005 654827 1
2 2006 654827 1
2 2007 142536 0
2 2008 142536 0
2 2009 999632 1 1 0 1
2 2010 999632 1 1 0 1
2 2011 885712 0 1 0 1
2 2012 885712 0 1 0 1
2 2013 885712 0 1 0 1
3 2005 986325 0 1 1 0
3 2006 986325 0 1 1 0
3 2007 986325 0 1 1 0
3 2008 986325 0 1 1 0
3 2009 986325 0 1 1 0
3 2010 335964 1 1 1 0
3 2011 335964 1 1 1 0
3 2012 335964 1 1 1 0
3 2013 335964 1 1 1 0
4 2005 235841 0
4 2006 235841 0
4 2007 235841 0
4 2008 689324 0
4 2009 689324 0
4 2010 995423 0 1 1 0
4 2011 995423 0 1 1 0
4 2012 100235 1 1 1 0
4 2013 100235 1 1 1 0

Thank you in advance for your help!
Tags: None

Clyde Schechter

Join Date: Apr 2014
Posts: 30100

11 Jun 2019, 10:13

What you say you want and what you show in your example are not the same. You say you want an indicator ("dummy") variable to identify the last ceo change. But what you show identifies all observations with any ceo except the first. So I'm going to just ignore your example and write code that calculates what you said in words.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte firm_id int year long ceo_id byte marital_status
1 2005 456123 1
1 2006 456123 1
1 2007 456123 1
1 2008 389264 0
1 2009 389264 0
1 2010 389264 0
1 2011 389264 0
1 2012 785214 1
1 2013 785214 1
2 2005 654827 1
2 2006 654827 1
2 2007 142536 0
2 2008 142536 0
2 2009 999632 1
2 2010 999632 1
2 2011 885712 0
2 2012 885712 0
2 2013 885712 0
3 2005 986325 0
3 2006 986325 0
3 2007 986325 0
3 2008 986325 0
3 2009 986325 0
3 2010 335964 1
3 2011 335964 1
3 2012 335964 1
3 2013 335964 1
4 2005 235841 0
4 2006 235841 0
4 2007 235841 0
4 2008 689324 0
4 2009 689324 0
4 2010 995423 0
4 2011 995423 0
4 2012 100235 1
4 2013 100235 1
end

//    FIND LAST CEO CHANGE
by firm_id (year), sort: gen ceo_change = sum(ceo_id != ceo_id[_n-1]) - 1
by firm_id (year): gen last_ceo_change = (ceo_change == ceo_change[_N]) ///
    & (ceo_change[_n-1] != ceo_change[_N])
    
//    NOTE: ASSUMING MARITAL STUATUS 1 = SINGLE, 0 = MARRIED
//    CHANGE CODE ACCORDINGLY IF I HAVE THIS BACKWARDS
by firm_id (year): egen byte single_to_married = ///
    max(cond(last_ceo_change == 1, marital_status == 0 & marital_status[_n-1] == 1, .))
by firm_id (year): egen byte married_to_single = ///
    max(cond(last_ceo_change == 1, marital_status == 1 & marital_status[_n-1] == 0, .))

In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

Comment

Savvas Papadopoulos

Join Date: Jun 2019

Posts: 6
#3

12 Jun 2019, 02:46

Thank you very mush for your help and advise Clyde. I will do as you say in the future. I have to apologize for not being very clear about what exactly I want to do. The code you gave me indeed identifies the last CEO change, but it is not exactly what I want. What I am interested in is a code to create a dummy variable that will be 1 for all the observations referring to the last two CEOs in each firm. If you see in the data, there is a firm with only 1 CEO change (i.e., firm 3), a firm with 2 changes (i.e., firm 1) and two firms with 3 changes (i.e., firms 2 and 4). So, for firm 3 I want all observations be equal to 1 since there is only one change and I want to use them all in my analysis. Meanwhile, for firms 1, 2 and 4 I want the variable to be equal 1 for all the observations referring only to the last two CEOs, since I only want these observations for my analysis. Subsequently, I want to create two dummy variables: one that will be equal 1 for all the observations referring to the last two CEOs whenever the former CEO is single (i.e., 0 in my data set) and the latter is married (i.e., 1 in my data set), and a second that will equal 1 for all the observations referring to the last two CEOs whenever the former CEO is married and the latter is single. In cases with only one change, as with firm 3, I want all observations to be equal either 1 or 0 depending on whether the former CEO is married and the latter is single or vice versa. This is what I wanted to show with my data set, but unfortunately I didn't explain it adequately. I attach again the data set with the three variables I want to create (i.e, the last three variables).

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte firm_id int year long ceo_id byte(marital_status last_two_ceos single_married_last_two_ceo married_single_last_two_ceo) 1 2005 456123 1 0 0 0 1 2006 456123 1 0 0 0 1 2007 456123 1 0 0 0 1 2008 389264 0 1 1 0 1 2009 389264 0 1 1 0 1 2010 389264 0 1 1 0 1 2011 389264 0 1 1 0 1 2012 785214 1 1 1 0 1 2013 785214 1 1 1 0 2 2005 654827 1 0 0 0 2 2006 654827 1 0 0 0 2 2007 142536 0 0 0 0 2 2008 142536 0 0 0 0 2 2009 999632 1 1 0 1 2 2010 999632 1 1 0 1 2 2011 885712 0 1 0 1 2 2012 885712 0 1 0 1 2 2013 885712 0 1 0 1 3 2005 986325 0 1 1 0 3 2006 986325 0 1 1 0 3 2007 986325 0 1 1 0 3 2008 986325 0 1 1 0 3 2009 986325 0 1 1 0 3 2010 335964 1 1 1 0 3 2011 335964 1 1 1 0 3 2012 335964 1 1 1 0 3 2013 335964 1 1 1 0 4 2005 235841 0 0 0 0 4 2006 235841 0 0 0 0 4 2007 235841 0 0 0 0 4 2008 689324 0 0 0 0 4 2009 689324 0 0 0 0 4 2010 995423 0 1 1 0 4 2011 995423 0 1 1 0 4 2012 100235 1 1 1 0 4 2013 100235 1 1 1 0 end
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30100

12 Jun 2019, 12:11

OK. This will do it:

Code:

//    CREATE AN INDICATOR FOR THE LAST TWO CEO'S
by firm_id (year), sort: gen int ceo_num = sum(ceo_id != ceo_id[_n-1])
by firm_id: gen last_two_ceos = inlist(ceo_num, ceo_num[_N], ceo_num[_N]-1)

//    CREATE INDICATORS FOR SINGLE TO MARRIED AND MARRIED TO SINGLE
//    CHANGE IN STATUS BETWEEN CEO'S
//    ASSUMING 1 = SINGLE, 0 = MARRIED
by firm_id (year): gen byte penultimate = (ceo_num == ceo_num[_N] - 1)
by firm_id (year): egen byte marital_pre = max(cond(penultimate, marital_status, .))
by firm_id (year): gen byte marital_post = marital_status[_N]
gen byte single_to_married = (marital_pre == 1 & marital_post == 0)
gen byte married_to_single = (marital_pre == 0 & marital_post == 1)

Of course, you can drop the intermediate variable ceo_num, penultimate, marital_pre, and marital_post if you have no other need for them..

Again, I may have misinterpreted the marital status variable: I take it that 1 = single and 0 = married, but perhaps you mean it the other way around. If so, exchange the values of 0 and 1 in the last two commands shown.

Comment

Savvas Papadopoulos

Join Date: Jun 2019

Posts: 6
#5

13 Jun 2019, 06:36

The code works perfect! Thank you very much Clyde. One additional question though. Does the code works even when there are gaps in the data as when there are no missing observations? I am asking this because in my original data set there are some missing observations in the "marital_status" variable, and I am wondering if the code you gave me takes into consideration missing observations.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#6

13 Jun 2019, 11:24

I guess that depends on what you mean by "works" when there are gaps in the data. And we have to distinguish gaps (meaning that an observation for some year is simply not present in the data) from observations with missing values of marital status.

The code does not deal explicitly with gaps. But let's think about this particular problem--I think that gaps will not matter unless they are so long that the entire tenure of a CEO is skipped. For example, the last two CEO's will be the same, whether there are missing years, unless the entire tenure of a CEO falls in the missing-years gap. If you are just missing some years within the tenure of a single CEO, or are missing the last few years of a CEO and the first few years of that CEO's successor, we will still have the same last two CEOs. And, of course, if we have the same last two CEOs, then the other variables will also be unchanged.

Do you have any gaps so large that they contain the entire tenure of a CEO? (And if you do, how do you know that?)

Now, a possibly different situation arises when there is not a gap in the data but the value of the marital status variable is missing in some years. My code deals with marital status by assuming that it does not change during the tenure of a CEO. I suppose that is an unrealistic assumption: people do get married and divorced while serving as CEO. I'm not sure how you want to handle this situation. If the second to last CEO started out married and then got divorced, and the new CEO came in as single but got married, is this a single to single transition? married to single? single to married?

I probably should wait for your response to that question. But as I am going to be tied up much of the rest of the day, I offer you code that looks at the last reported marital status of the penultimate CEO and the first reported marital status of the final CEO to characterize changes.

Code:

// CREATE AN INDICATOR FOR THE LAST TWO CEO'S by firm_id (year), sort: gen int ceo_num = sum(ceo_id != ceo_id[_n-1]) by firm_id: gen last_two_ceos = inlist(ceo_num, ceo_num[_N], ceo_num[_N]-1) // CREATE INDICATORS FOR SINGLE TO MARRIED AND MARRIED TO SINGLE // CHANGE IN STATUS BETWEEN CEO'S // ASSUMING 1 = SINGLE, 0 = MARRIED by firm_id (year): gen byte penultimate = (ceo_num == ceo_num[_N] - 1) // IDENTIFY FIRST AND LAST RECORDED MARITAL STATUS FOR EACH CEO gen byte mms = missing(marital_status) by mms firm_id ceo_num (year), sort: gen initial_marital_status = marital_status[1] by mms firm_id ceo_num (year): gen final_marital_status = marital_status[_N] by firm_id ceo_num (mms), sort: replace initial_marital_status = initial_marital_status[1] by firm_id ceo_num (mms): replace final_marital_status = final_marital_status[1] // IDENTIFY FINAL MARITAL STATUS OF PENULTIMATE CEO // AND INITIAL MARITAL STATUS OF FINAL CEO by firm_id (year), sort: egen marital_pre = max(cond(penultimate, final_marital_status, .)) by firm_id (year): egen marital_post = /// max(cond(last_two_ceos & !penultimate, initial_marital_status, .)) // CHARACTERIZE MARITAL STATUS SHIFT FROM PENULTIMATE TO FINAL CEO gen byte single_to_married = (marital_pre == 1 & marital_post == 0) gen byte married_to_single = (marital_pre == 0 & marital_post == 1)
Comment
Savvas Papadopoulos

Join Date: Jun 2019

Posts: 6
#7

14 Jun 2019, 05:29

You are absolutely right Clyde. I have to be more specific about the characteristics of my data set. To start with your first question about gaps, no I don’t have so large gaps that contain the entire tenure of a CEO in my data set. Of course I have gaps because my sample consists of listed firms, and not all firms are listed for all the years in the time period I am examining. About your second group of questions, you guess correctly. I have cases where the marital status of the penultimate CEO and/or the final CEO is not constant, and as you suggest I want to look the last reported marital status of the penultimate CEO and the first reported marital status of the final CEO. And this is what you last code does. Right? In addition to this, I want a code that looks only the CEOs with no constant marital status, but this time for all CEOs and not only for the last two. Given that some CEOs have changes in their marital status more than two times, I want this variable to look only the penultimate and the final marital status change for the CEOs with more than two changes. As before, I want to look changes both from married to single and from single to married.

However, I have two additional variables that are core for my analysis: 1) “married_vs_divorced, and 2) “child_gender”. The first variable is based on a subset of the variable “marital_status”. Specifically, the “marital_status” variable is equal 1 whenever a CEO is married and 0 when a CEO is non-married, where non-married refers to those CEOs who are divorced, never got married or are widowed. For my analysis I also want to look married and divorced CEOs. The difference between the two variables is that the “married_vs_divorced” variable has missing observations. There are three reasons for this: 1) because some CEOs have never got married, 2) some CEOs are widowed, and 3) some CEOs where single and got married during their tenure. Given that I am interested in the last two CEOs, I assume that I can modify your latest code to look the penultimate CEO and the final CEO in a similar way as with the “marital_status” variable. A potential issue, though, could be that for the CEOs who have never got married all observations referring to the penultimate and/or the final CEO are missing. I don’t know if the code must be adjusted to deal with this situation. As above, I also want an additional code to look all CEOs who from married became divorced or vice versa. In cases with more than two changes I want to only look the penultimate and the final change per CEO.

Regarding the second variable, “child_gender”, it indicates the gender of CEOs’ first-born child. As you can see in the example data set, this variable has also some missing observations for two reasons: 1) because there are CEOs who don’t have children, and 2) because the CEO got his/her first child while being a CEO. This variable has exactly the same characteristics with the “married_vs_divorced” variable when it comes to missing observations. That is, all observations referring to the penultimate CEO and/or the final CEO who don’t have children are missing, and some observations are missing simply because the CEO got his/her first child will being CEO. I assume that in order to look only the penultimate and the final CEO I can modify your latest code accordingly. In addition, I want a code that looks all CEOs who got their first child during their tenure.
To close this rather long post, there are few more things that might be good to know about my data set: 1) there are cases where the same CEO appears in two or more different firms during the same or different years, and 2) there are cases where the same CEO appears in the same firm but in different years (e.g., during 2005-2008 and 2012-2013).

Again, thank you very much for your time and help. I really appreciate that.

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte firm_id int year long ceo_id byte(marital_status married_vs_divorced child_gender) 1 2005 456123 1 1 0 1 2006 456123 1 1 0 1 2007 456123 0 0 0 1 2008 389264 0 0 . 1 2009 389264 0 0 . 1 2010 389264 1 1 0 1 2011 389264 1 1 0 1 2012 785214 0 0 1 1 2013 785214 0 0 1 2 2005 654827 1 1 . 2 2006 654827 1 1 . 2 2007 142536 0 0 1 2 2008 142536 0 0 1 2 2009 999632 1 1 1 2 2010 999632 1 1 1 2 2011 885712 0 . 0 2 2012 885712 0 . 0 2 2013 885712 0 . 0 3 2005 986325 0 0 0 3 2006 986325 0 0 0 3 2007 986325 0 0 0 3 2008 986325 0 0 0 3 2009 986325 0 0 0 3 2010 335964 1 1 1 3 2011 335964 1 1 1 3 2012 335964 1 1 1 3 2013 335964 1 1 1 4 2005 235841 0 0 1 4 2006 235841 0 0 1 4 2007 235841 0 0 1 4 2008 689324 0 0 0 4 2009 689324 0 0 0 4 2010 995423 0 . . 4 2011 995423 0 . . 4 2012 100235 1 1 1 4 2013 100235 1 1 1 5 2005 689324 0 0 1 5 2006 466738 0 0 1 5 2007 466738 1 1 1 5 2008 466738 0 0 1 5 2009 802134 1 1 0 5 2010 802134 1 1 0 5 2011 802134 0 0 0 5 2012 717123 1 1 1 5 2013 717123 0 0 1 6 2005 232345 1 1 1 6 2006 232345 1 1 1 6 2007 970102 0 0 1 6 2008 970102 0 0 1 6 2009 524361 0 0 0 6 2010 524361 1 0 0 6 2011 524361 1 1 0 7 2009 787675 1 1 . 7 2010 787675 1 1 . 7 2011 919293 0 . 1 7 2012 919293 0 . 1 7 2013 919293 0 . 1 end
Comment
Abdul Ghafoor

Join Date: May 2017

Posts: 15
#8

12 Jun 2021, 01:28

Dear Clyde,

Thank you so much for the above codes. I am following your codes for the transition of marital status (Single=1 and 0 for the married) that you provided. May I request the changes in your codes for the following conditions?
The transition of CEOs from married to single and single to married.

Both pre-and post-transition CEOs are in the data (office) consecutively for at least three years excluding the transition year.

If a firm changes its CEOs more than once, then I wish to count the first change and drop the subsequent changes for that firm.

Example generated by -dataex-. To install: ssc install dataex
clear
input double fyear int ceo_id float(firm_id marital_status)
1994 1369 1004 0
1995 1369 1004 0
1996 4832 1004 0
1997 4832 1004 0
1998 4832 1004 0
1999 4832 1004 0
2000 4832 1004 0
2001 4832 1004 0
2002 4832 1004 0
2003 4832 1004 0
2004 4832 1004 0
2005 4832 1004 0
2006 4832 1004 0
2007 4832 1004 0
1993 663 1013 0
1994 663 1013 0
1995 663 1013 0
1996 663 1013 0
1997 663 1013 0
1998 663 1013 0
1999 663 1013 0
2000 663 1013 0
2001 4242 1013 0
2002 4242 1013 0
2004 4900 1013 0
2005 4900 1013 0
2006 4900 1013 0
2007 4900 1013 0
2008 4900 1013 0
1994 4612 1034 0
1995 4612 1034 0
1996 4612 1034 0
1997 4612 1034 0
1998 4612 1034 0
1999 3551 1034 0
2000 5325 1034 1
2001 5325 1034 1
2002 5325 1034 1
2003 5325 1034 1
2004 5325 1034 1
2005 5325 1034 1
2006 3433 1034 0
2007 3433 1034 0
1993 1012 1045 0
1994 1012 1045 0
1995 1012 1045 0
1996 1012 1045 0
1997 1012 1045 0
1998 749 1045 0
1999 749 1045 0
2000 749 1045 0
2001 749 1045 0
2002 749 1045 0
2003 145 1045 0
2004 145 1045 0
2005 145 1045 0
2006 145 1045 0
2007 145 1045 0
2008 145 1045 0
1993 4039 1055 0
1994 4039 1055 0
1995 4039 1055 0
1996 1218 1055 1
1999 407 1056 0
2000 407 1056 0
2001 407 1056 0
2002 407 1056 0
2003 407 1056 0
2004 407 1056 0
2005 407 1056 0
2006 407 1056 0
2000 4247 1072 0
2001 1789 1072 0
2002 1789 1072 0
2003 1789 1072 0
2004 1789 1072 0
2005 1789 1072 0
2006 1789 1072 0
2007 1789 1072 0
1993 4692 1075 0
1994 4692 1075 0
1995 4692 1075 0
1996 4692 1075 0
1997 4692 1075 0
1998 4692 1075 0
1999 3974 1075 0
2000 3974 1075 0
2001 3974 1075 0
2002 3974 1075 0
2003 3974 1075 0
2004 3974 1075 0
2005 3974 1075 0
2006 3974 1075 0
2007 3974 1075 0
2008 3974 1075 0
1998 3026 1076 0
1999 3026 1076 0
2000 3026 1076 0
2001 3026 1076 0
2002 3026 1076 0
end

Thank you
Comment

Announcement