Tracking changes in dataset with condition

Jarno Stappers

Join Date: Jan 2021

Posts: 15
#1

Tracking changes in dataset with condition

04 Jan 2021, 04:47

Dear reader,

I tracked executives across firms and retained those individual executives who were employed for at least three years in each of at least two different firms. Although it is possible that teams of executives move from firm to firm, this should not happen in exactly the same years. These executives need to be dropped because this contains perfectly collinear executives. To make this more clear I added an example from Stata.

As you can see executives (3018 and 2972) moved from company A (1177) to company B (1500) in exactly the same year (2001). In my dataset, this might also occur for more than two executives at the same time. In case a team exists of 3 executives and 2 of the executives move to the same firm in the same year, then I still need to remove those 2 executives. Since I am interested in individual changes only. I am searching for a code to remove those teams of executives, satisfying the above condition.

----------------------- copy starting from the next line -----------------------
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(GVKEY1 EXECID2) double YEAR

1164 607 1998
1164 607 1999
1164 607 2000
1164 607 2001
1164 18563 2002
1164 18563 2003
1164 18563 2004
1177 3018 1992
1177 3018 1993
1177 3018 1994
1177 3018 1995
1177 2972 1992
1177 2972 1993
1177 2972 1994
1177 2972 1995
1500 3018 2001
1500 3018 2002
1500 3018 2003
1500 3018 2004
1500 2972 2001
1500 2972 2002
1500 2972 2003
1500 2972 2004
Tags: None
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30089
#2

04 Jan 2021, 11:10

I believe what you want is:

Code:

by GVKEY1 YEAR, sort: drop if _N > 1
Comment
Jarno Stappers

Join Date: Jan 2021

Posts: 15
#3

04 Jan 2021, 15:18

Thanks Clyde for your reply!

However, that is not exactly what I meant. Sorry, if I did not specify my question enough.
In the code you provided I drop all GVKEY-YEAR rows if they are more than once in the sample.
This is not a problem because more than one person (EXECIDs) can work for the same company (GVKEY1) in the same year (YEAR).
It is all about working at the same company A and move to another company in the same year.

I am looking to drop persons (EXECID2: 3018 and 2972) that both worked at one company (GVKEY1: 1177) and both moved to the same company (GVKEY1: 1500) and both leave the first company (join the second company) in exactly the same year (YEAR: 2001).

So I have three conditions:

1. Group of persons (this can be two, three or even four EXECIDs) working at the same company A (1177);
2. This group of persons (two, three or even four EXECIDs) move to the same company B (1500); and
3. This group of persons leave company A for company B (1177 -> 1500) in exactly the same year (e.g. 2001)

I have to drop those groups of persons only if these conditions are satisfied.

In the STATA output, I provided an example of two persons (EXECID2) but as I stated in parentheses this could also be more than two persons (EXECIDs).
I am working on a replication study and as we speak I have around 9,000 observations.

Thank you in advance for your help!
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30089
#4

04 Jan 2021, 16:10

Then I think this does it:

Code:

frame put execid2 gvkey1 year, into(changes) frame change changes by execid2 (year), sort: gen new_job = (gvkey1 != gvkey1[_n-1]) by execid2 (year): gen prior_job = gvkey1[_n-1] if new_job keep if new_job by gvkey1 year prior_job, sort: keep if _N > 1 keep execid2 duplicates drop frame change default frlink m:1 execid2, frame(changes) drop if !missing(changes) frame drop changes

Note: this code uses frames, so it requires version 16.0 of later. If running an earlier version of Stata, you are supposed to disclose that fact in your post. In that case, this same code can be emulated using a tempfile instead of a frame, and -merge, keep(master)- instead of -frlink- followed by -drop if !missing(...-.
Comment
Jarno Stappers

Join Date: Jan 2021

Posts: 15
#5

05 Jan 2021, 03:27

Clyde thank you for your valuable input!

The code you provided indeed tracked people that indeed moved to the same company in the same year.
However, STATA also removed some people that should not be removed. Therefore I will add a part of the dataset before step: drop if !missing(changes).

As you can see, person X (EXECID2: 14636) move to another company (GVKEY1: 23877) in 1998.
The problem I face is that no one else in my dataset also moves to company (GVKEY1: 23877). Therefore this is in my case a valid (individual) change and should not be removed.
I also checked if person (EXECID2: 14636) worked for more companies than the two provided in the dataset below (GVKEY1: 10903 and 23877), but that is not the case.
So, I do not understand why this person is removed. Stated differently, why STATA also gives a number (changes: 335) to someone who only changed individually to another company. Instead of:

1. Group of persons (this can be two, three or even four EXECIDs) working at the same company A (1177);
2. This group of persons (two, three or even four EXECIDs) move to the same company B (1500); and
3. This group of persons leave company A for company B (1177 -> 1500) in exactly the same year (e.g. 2001)

It would be really nice if you could help me to fix this!

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input long(GVKEY1 EXECID2) double YEAR int changes 10903 11242 1995 280 10903 11242 1996 280 10903 11242 1997 280 10903 14636 1995 335 10903 14636 1996 335 10903 14636 1997 335 23877 14636 1998 335 23877 14636 1999 335 23877 14636 2000 335 23877 14636 2001 335 23877 14636 2002 335 23877 14636 2003 335 23877 14636 2004 335 23877 14636 2005 335 23877 14636 2006 335 10903 11242 1995 280 10903 11242 1996 280 10903 11242 1997 280 145367 11242 2003 280 145367 11242 2004 280 145367 11242 2005 280 145367 11242 2006 280 145367 11242 2003 280 145367 11242 2004 280 145367 11242 2005 280 145367 11242 2006 280 end

------------------ copy up to and including the previous line ------------------
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30089

05 Jan 2021, 10:16

I have changed the code to try to make the method a bit more transparent (and perhaps more accurate). In addition, after the code finishes running, frame working is left behind and contains the actual information on the movements of the executives involved so you can clearly see why they were flagged for dropping.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(gvkey1 execid2) double year
 10903 11242 1995
 10903 11242 1995
 10903 11242 1996
 10903 11242 1996
 10903 11242 1997
 10903 11242 1997
145367 11242 2003
145367 11242 2003
145367 11242 2004
145367 11242 2004
145367 11242 2005
145367 11242 2005
145367 11242 2006
145367 11242 2006
 10903 14636 1995
 10903 14636 1996
 10903 14636 1997
 23877 14636 1998
 23877 14636 1999
 23877 14636 2000
 23877 14636 2001
 23877 14636 2002
 23877 14636 2003
 23877 14636 2004
 23877 14636 2005
 23877 14636 2006
  1164   607 1998
  1164   607 1999
  1164   607 2000
  1164   607 2001
  1164 18563 2002
  1164 18563 2003
  1164 18563 2004
  1177  3018 1992
  1177  3018 1993
  1177  3018 1994
  1177  3018 1995
  1177  2972 1992
  1177  2972 1993
  1177  2972 1994
  1177  2972 1995
  1500  3018 2001
  1500  3018 2002
  1500  3018 2003
  1500  3018 2004
  1500  2972 2001
  1500  2972 2002
  1500  2972 2003
  1500  2972 2004
end

//  CREATE SPELLS OF EMPLOYMENT FOR EACH EXECUTIVE
frame put gvkey1 execid2 year, into(working)
frame change working
by execid2 (year), sort: gen spell = sum(gvkey1 != gvkey1[_n-1])

//  CHARACTERIZE EACH SPELL BY ITS START, FINISH, AND ANTECEDENT EMPLOYER
by execid2 spell (year), sort: gen start = year[1]
by execid2 spell (year): gen finish = year[_N]
by execid2 (spell year): gen previous = gvkey[_n-1] if spell != spell[_n-1]
by execid2 spell (year): keep if _n == 1
drop spell year

//  NOW IDENTIFY SPELLS THAT HAVE THE SAME EMPLOYER, START, FINISH,
//  AND ANTECEDENT EMPLOYER FOR MORE THAN ONE EXEC
by gvkey1 start finish previous, sort: keep if _N > 1

//  RETAIN THE INFORMATION ON PARALLEL TRANSFERS IN FRAME WORKING
//  AND CREATE A SEPARATE FRAME FOR JUST THE EXECID'S INVOLVED
frame put execid2, into(execs_to_drop)
frame execs_to_drop: duplicates drop

//  BACK TO THE ORIGINAL DATA
frame change default
frlink m:1 execid2, frame(execs_to_drop)
drop if !missing(execs_to_drop)

The -dataex- here combines both your most recent data and the earlier one.

You will see that execs 14636 and 11242 are correctly identified for dropping because they arrive together at gvkey 10903 in 1995 and leave together in 1997. Before 1995 they were both unemployed, but they came and left 10903 as a pair.

Comment

Jarno Stappers

Join Date: Jan 2021

Posts: 15
#7

06 Jan 2021, 03:48

Thank you again, Clyde!

I checked the code and one condition is not satisfied, because I did not explain enough.
I will cite your last sentence to make sure what I need: "You will see that execs 14636 and 11242 are correctly identified for dropping because they arrive together at gvkey 10903 in 1995 and leave together in 1997. Before 1995 they were both unemployed, but they came and left 10903 as a pair."

- It is about: "but they came and left 10903 as a pair". They indeed came and left 10903 as a pair but they also should leave as a pair to exactly the same company.

I will illustrate what you did:
Two executives: Peter and Dirk
Those executives work at Apple (GVKEY1) in period 1994 until 1997 (or whatever period)
Those executives leave as a pair in the SAME YEAR (e.g. 1997) but Peter moves to work for Samsung and Dirk becomes an executive of Nokia (different companies).

I will illustrate what I need:
Two executives: Peter and Dirk
Those executives work at Apple (GVKEY1) in period 1994 until 1997 (or whatever period)
Those executives leave as a pair in the SAME YEAR (e.g. 1997) and Peter and Dirk both join Samsung (the same company) .

Hopefully, you want to help me for one last time!

Last edited by Jarno Stappers; 06 Jan 2021, 03:52.
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30089

06 Jan 2021, 10:01

OK, that's just a small change to the code:

Code:

//  CREATE SPELLS OF EMPLOYMENT FOR EACH EXECUTIVE
frame put gvkey1 execid2 year, into(working)
frame change working
by execid2 (year), sort: gen spell = sum(gvkey1 != gvkey1[_n-1])

//  CHARACTERIZE EACH SPELL BY ITS START, FINISH, AND ANTECEDENT
//  AND SUBSEQUENT EMPLOYERS
by execid2 spell (year), sort: gen start = year[1]
by execid2 spell (year): gen finish = year[_N]
by execid2 (spell year): gen previous = gvkey[_n-1] if spell != spell[_n-1]
by execid2 (spell year): gen subsequent = gvkey[_n+1] if spell != spell[_n+1]
by execid2 spell (year): replace subsequent = subsequent[_N]
by execid2 spell (year): keep if _n == 1
drop spell year

//  NOW IDENTIFY SPELLS THAT HAVE THE SAME EMPLOYER, START, FINISH,
//  AND ANTECEDENT EMPLOYER FOR MORE THAN ONE EXEC
by gvkey1 start finish previous subsequent, sort: keep if _N > 1

//  RETAIN THE INFORMATION ON PARALLEL TRANSFERS IN FRAME WORKING
//  AND CREATE A SEPARATE FRAME FOR JUST THE EXECID'S INVOLVED
frame put execid2, into(execs_to_drop)
frame execs_to_drop: duplicates drop

//  BACK TO THE ORIGINAL DATA
frame change default
frlink m:1 execid2, frame(execs_to_drop)
drop if !missing(execs_to_drop)

Comment

Jarno Stappers

Join Date: Jan 2021

Posts: 15
#9

07 Jan 2021, 04:08

Thanks again Clyde for your reply!

The code is again more accurate! However, there are still some problems arising.
STATA still removes executives that I qualify as individual changes.

Instead of showing what is going wrong in my dataset, it is probably better if I edit the STATA output a little bit to show you which executives I would like to keep and which ones I would like to remove.
I would shortly summarize the data!

Remove those executives.

Condition 1: Both worked together in Company A (condition satisfied)
Company A (1004) -> Executive 1 (33979) -> Period: 2007 until 2012
Company A (1004) -> Executive 2 (46404) -> Period: 2009 until 2012

Condition 2: Both leave the company in exactly the same year (condition satisfied)
Change year = 2013 (in this year both executives join the second company: 1100)

Condition 3: Both are going to work for the same second company (condition satisfied)
Company B (1100) -> Executive (33979) -> Period: 2013 until 2018
Company B (1100) -> Executive (46404) -> Period: start 2013 until 2016

Since all the conditions combined are satisfied these executives can be removed from our data set.

Do not remove those executives

Condition 1: Both worked together in Company A (condition satisfied)
Company A (1056) -> Executive 1 (1891) -> period 2006 until 2008
Company A (1056) -> Executive 2 (12755) - > period 2008 until 2010
Company A (1056) -> Executive 3 (13000) -> period 2008 until 2011

Condition 2: Both leave the company in exactly the same year (condition not satisfied)
Change year = different for all the executives because:
Executive 1: does not join company (1080) and therefore irrelevant
Executive 2: joins second company (1080) in 2011
Executive 3: joins second company (1080) in 2012

Condition 3: Both are going to work for the same second company (condition satisfied)
Company B (1080) -> Executive (12755) -> period 2011 until 2013
Company B (1080) -> Executive (13000) -> period 2012 until 2014

Since not all the conditions combined are satisfied do not remove those executives from our data set.
Thus, I need to keep these observations. These changes are identified as individual changes.

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input long(GVKEY1 EXECID2) double YEAR 1004 33979 2007 1004 33979 2008 1004 33979 2009 1004 33979 2010 1004 33979 2011 1004 33979 2012 1004 46404 2009 1004 46404 2010 1004 46404 2011 1045 46192 2012 1056 1891 2006 1056 1891 2007 1056 1891 2008 1056 12755 2008 1056 12755 2009 1056 12755 2010 1056 13000 2008 1056 13000 2009 1056 13000 2010 1056 13000 2011 1080 12755 2011 1080 12755 2012 1080 12755 2013 1080 13000 2012 1080 13000 2013 1080 13000 2014 1100 33979 2013 1100 33979 2014 1100 33979 2015 1100 33979 2016 1100 33979 2017 1100 33979 2018 1100 46404 2013 1100 46404 2014 1100 46404 2015 1100 46192 2016 1111 22978 2006 1111 22978 2007 1111 22978 2008 1111 32091 2006 1111 32091 2007 1111 32091 2008 1161 37980 2009 1161 37980 2010 1161 37980 2011 1161 37980 2012 1161 40775 2010 1161 40775 2011 1161 40775 2012 1161 42390 2011 1161 42390 2012 1161 42390 2013 1161 42390 2014 27119 46404 2014 27119 46404 2015 27119 46404 2016 27119 46404 2017 27119 46404 2018 27119 46404 2019 181104 33979 2013 181104 33979 2014 181104 33979 2015 181104 33979 2015 181104 33979 2016 181104 33979 2017 181104 33979 2018 181104 33979 2019 end

------------------ copy up to and including the previous line ------------------
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30089
#10

07 Jan 2021, 14:15

I don't understand what you are trying to do here.

As for 13000 and 12755: Both arrive together in 2008 at gvkey 1056, having previously both been unemployed. That they leave at different times is, you say, irrelevant. So they were a team at 1056.

33979 and 46404 both arrive at 1100 in 2013, having previously worked at 1004--so they are a team at 1100.

I don't understand why one of these is in and the other is out. I don't see the difference between them.

Last edited by Clyde Schechter; 07 Jan 2021, 14:59.
Comment
Jarno Stappers

Join Date: Jan 2021

Posts: 15
#11

07 Jan 2021, 14:50

"As it stands, the code does not remove 1899, 12755, or 13000."
Yes, indeed exactly as I want those executives are not removed.

"33979 is reported in 2013 as working for both1100 and 18104. Which is it? My code saw this as a transition from 1100 to 18104 in 2013. Bit 46404 transitioned from 1100 to 27119--a different company and in a different year. They are not moving as a team. They are going at separate times to separate places".

This person is working for both companies at the same time. Probably it a problem that the code identifies people with multiple positions at the same time as a "change in company" when it actually not the case. It is possible for executives to have more than one board position at one moment in time. For example, Elon Musk is CEO of Tesla and Space X.

Is there any code to keep those observations (with multiple positions at the same) in the data set and also use your code?
The reason is that if I remove duplicates (people with more than one position at the same time) I will lose a lot of observations. Stated differently, I will lose power of tests which is not my preference.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30089
#12

07 Jan 2021, 15:03

Your response to #11 crossed with my re-edit of #10. During that time, I realized that the problem was that these execs were holding multiple positions simultaneously. So I modified the code to allow for that. But then it introduces new problems. Please read the changed version of #10 and respond to that, because I don't see how to have it both ways.
Comment
Jarno Stappers

Join Date: Jan 2021

Posts: 15
#13

07 Jan 2021, 15:39

My response to #10.

"As for 13000 and 12755: Both arrive together in 2008 at gvkey 1056, having previously both been unemployed."
They worked together in company 1056 and therefore they are a team. If person A (13000) starts working for the first company (1056) in 2006, and person B (12755) starts working in 2007. Then this is still considered as a team.

"That they leave at different times is, you say, irrelevant."
This is not what I meant. I want to remove teams of executives that leave this first company (1056) and join the second company (1080) in exactly the same year. Executive (12755) start working for company (1080) in 2010. Executive (13000) start working for company (1080) in 2011. Therefore I identify those changes as not perfectly collinear and thus I qualify them as individual changes. Even though I understand that this might be a little bit strange because they are working as a team (for two companies).

What is irrelevant: 'Executive 1: does not join company (1080) and therefore irrelevant'.
This "Executive 1" is irrelevant because he worked as a team together with "Executive 2" and "Executive 3" in his first company. However, this person does not move to company (1080), so he is not part of the team in the second company (and therefore irrelevant).

"33979 and 46404 both arrive at 1100 in 2013, having previously worked at 1004--so they are a team at 1100".
The difference between why 33070 and 46404 are in; and 13000 and 12755 are out is:

33979 and 46404 -> start working together for the second company exactly in the same year (2013)
13000 and 12755 --> do not start working in exactly the same year because 12755 starts working in 2010 for company 1080. Executive 13000 starts working in 2011 for company 1080. Thus not perfectly collinear and therefore I keep them in my data set.

Summarizing:
- Working together is qualified as a team (independent of which years they worked together)
- But if a team of executives moves to their second company then this MUST be in the same YEAR. Otherwise, I qualify them as individuals.

Hopefully, this gives you a better understanding of my problem.
If there are any specification questions. Please let me know, I try to respond as quickly as possible.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30089
#14

07 Jan 2021, 16:21

I want to remove teams of executives that leave this first company (1056) and join the second company (1080) in exactly the same year. Executive (12755) start working for company (1080) in 2010. Executive (13000) start working for company (1080) in 2011.

Yes, but at firm 1056, they begin together in 2008, and both arrived there from the same preceding place, namely unemployment. So they are a team there and should be removed. Do you want to remove only their observations for firm 1056 and keep the ones for firm 1080?
Comment
Jarno Stappers

Join Date: Jan 2021

Posts: 15
#15

07 Jan 2021, 16:38

They were unemployed and therefore their first job is company 1056. The second company they join as a team is company 1080. But I do not want to remove them, because they join the second company in different years (Executive 1 "2010" l Executive 2 "2011). I only want to remove teams of executives if they join the second company in the exact same year (e.g. 33979 and 46404 in "2013")
Otherwise, I want to keep the team of executives. So, I do not want to remove any observations regarding 1056 and 1080.
Comment

Announcement