Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Tracking changes in dataset with condition

    Dear reader,

    I tracked executives across firms and retained those individual executives who were employed for at least three years in each of at least two different firms. Although it is possible that teams of executives move from firm to firm, this should not happen in exactly the same years. These executives need to be dropped because this contains perfectly collinear executives. To make this more clear I added an example from Stata.

    As you can see executives (3018 and 2972) moved from company A (1177) to company B (1500) in exactly the same year (2001). In my dataset, this might also occur for more than two executives at the same time. In case a team exists of 3 executives and 2 of the executives move to the same firm in the same year, then I still need to remove those 2 executives. Since I am interested in individual changes only. I am searching for a code to remove those teams of executives, satisfying the above condition.


    ----------------------- copy starting from the next line -----------------------
    [CODE]
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long(GVKEY1 EXECID2) double YEAR

    1164 607 1998
    1164 607 1999
    1164 607 2000
    1164 607 2001
    1164 18563 2002
    1164 18563 2003
    1164 18563 2004
    1177 3018 1992
    1177 3018 1993
    1177 3018 1994
    1177 3018 1995
    1177 2972 1992
    1177 2972 1993
    1177 2972 1994
    1177 2972 1995
    1500 3018 2001
    1500 3018 2002
    1500 3018 2003
    1500 3018 2004
    1500 2972 2001
    1500 2972 2002
    1500 2972 2003
    1500 2972 2004
    Tags: None

  • #2
    I believe what you want is:
    Code:
    by GVKEY1 YEAR, sort: drop if _N > 1

    Comment


    • #3
      Thanks Clyde for your reply!

      However, that is not exactly what I meant. Sorry, if I did not specify my question enough.
      In the code you provided I drop all GVKEY-YEAR rows if they are more than once in the sample.
      This is not a problem because more than one person (EXECIDs) can work for the same company (GVKEY1) in the same year (YEAR).
      It is all about working at the same company A and move to another company in the same year.

      I am looking to drop persons (EXECID2: 3018 and 2972) that both worked at one company (GVKEY1: 1177) and both moved to the same company (GVKEY1: 1500) and both leave the first company (join the second company) in exactly the same year (YEAR: 2001).

      So I have three conditions:

      1. Group of persons (this can be two, three or even four EXECIDs) working at the same company A (1177);
      2. This group of persons (two, three or even four EXECIDs) move to the same company B (1500); and
      3. This group of persons leave company A for company B (1177 -> 1500) in exactly the same year (e.g. 2001)

      I have to drop those groups of persons only if these conditions are satisfied.

      In the STATA output, I provided an example of two persons (EXECID2) but as I stated in parentheses this could also be more than two persons (EXECIDs).
      I am working on a replication study and as we speak I have around 9,000 observations.

      Thank you in advance for your help!


      Comment


      • #4
        Then I think this does it:

        Code:
        frame put execid2 gvkey1 year, into(changes)
        frame change changes
        by execid2 (year), sort: gen new_job = (gvkey1 != gvkey1[_n-1])
        by execid2 (year): gen prior_job = gvkey1[_n-1] if new_job
        keep if new_job
        by gvkey1 year prior_job, sort: keep if _N > 1
        keep execid2
        duplicates drop
        frame change default
        frlink m:1 execid2, frame(changes)
        drop if !missing(changes)
        frame drop changes
        Note: this code uses frames, so it requires version 16.0 of later. If running an earlier version of Stata, you are supposed to disclose that fact in your post. In that case, this same code can be emulated using a tempfile instead of a frame, and -merge, keep(master)- instead of -frlink- followed by -drop if !missing(...-.

        Comment


        • #5
          Clyde thank you for your valuable input!

          The code you provided indeed tracked people that indeed moved to the same company in the same year.
          However, STATA also removed some people that should not be removed. Therefore I will add a part of the dataset before step: drop if !missing(changes).

          As you can see, person X (EXECID2: 14636) move to another company (GVKEY1: 23877) in 1998.
          The problem I face is that no one else in my dataset also moves to company (GVKEY1: 23877). Therefore this is in my case a valid (individual) change and should not be removed.
          I also checked if person (EXECID2: 14636) worked for more companies than the two provided in the dataset below (GVKEY1: 10903 and 23877), but that is not the case.
          So, I do not understand why this person is removed. Stated differently, why STATA also gives a number (changes: 335) to someone who only changed individually to another company. Instead of:

          1. Group of persons (this can be two, three or even four EXECIDs) working at the same company A (1177);
          2. This group of persons (two, three or even four EXECIDs) move to the same company B (1500); and
          3. This group of persons leave company A for company B (1177 -> 1500) in exactly the same year (e.g. 2001)

          It would be really nice if you could help me to fix this!




          ----------------------- copy starting from the next line -----------------------
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input long(GVKEY1 EXECID2) double YEAR int changes
          10903 11242 1995 280
          10903 11242 1996 280
          10903 11242 1997 280
          10903 14636 1995 335
          10903 14636 1996 335
          10903 14636 1997 335
          23877 14636 1998 335
          23877 14636 1999 335
          23877 14636 2000 335
          23877 14636 2001 335
          23877 14636 2002 335
          23877 14636 2003 335
          23877 14636 2004 335
          23877 14636 2005 335
          23877 14636 2006 335
          10903 11242 1995 280
          10903 11242 1996 280
          10903 11242 1997 280
          145367 11242 2003 280
          145367 11242 2004 280
          145367 11242 2005 280
          145367 11242 2006 280
          145367 11242 2003 280
          145367 11242 2004 280
          145367 11242 2005 280
          145367 11242 2006 280
          
          end
          ------------------ copy up to and including the previous line ------------------








          Comment


          • #6
            I have changed the code to try to make the method a bit more transparent (and perhaps more accurate). In addition, after the code finishes running, frame working is left behind and contains the actual information on the movements of the executives involved so you can clearly see why they were flagged for dropping.

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input long(gvkey1 execid2) double year
             10903 11242 1995
             10903 11242 1995
             10903 11242 1996
             10903 11242 1996
             10903 11242 1997
             10903 11242 1997
            145367 11242 2003
            145367 11242 2003
            145367 11242 2004
            145367 11242 2004
            145367 11242 2005
            145367 11242 2005
            145367 11242 2006
            145367 11242 2006
             10903 14636 1995
             10903 14636 1996
             10903 14636 1997
             23877 14636 1998
             23877 14636 1999
             23877 14636 2000
             23877 14636 2001
             23877 14636 2002
             23877 14636 2003
             23877 14636 2004
             23877 14636 2005
             23877 14636 2006
              1164   607 1998
              1164   607 1999
              1164   607 2000
              1164   607 2001
              1164 18563 2002
              1164 18563 2003
              1164 18563 2004
              1177  3018 1992
              1177  3018 1993
              1177  3018 1994
              1177  3018 1995
              1177  2972 1992
              1177  2972 1993
              1177  2972 1994
              1177  2972 1995
              1500  3018 2001
              1500  3018 2002
              1500  3018 2003
              1500  3018 2004
              1500  2972 2001
              1500  2972 2002
              1500  2972 2003
              1500  2972 2004
            end
            
            //  CREATE SPELLS OF EMPLOYMENT FOR EACH EXECUTIVE
            frame put gvkey1 execid2 year, into(working)
            frame change working
            by execid2 (year), sort: gen spell = sum(gvkey1 != gvkey1[_n-1])
            
            //  CHARACTERIZE EACH SPELL BY ITS START, FINISH, AND ANTECEDENT EMPLOYER
            by execid2 spell (year), sort: gen start = year[1]
            by execid2 spell (year): gen finish = year[_N]
            by execid2 (spell year): gen previous = gvkey[_n-1] if spell != spell[_n-1]
            by execid2 spell (year): keep if _n == 1
            drop spell year
            
            //  NOW IDENTIFY SPELLS THAT HAVE THE SAME EMPLOYER, START, FINISH,
            //  AND ANTECEDENT EMPLOYER FOR MORE THAN ONE EXEC
            by gvkey1 start finish previous, sort: keep if _N > 1
            
            //  RETAIN THE INFORMATION ON PARALLEL TRANSFERS IN FRAME WORKING
            //  AND CREATE A SEPARATE FRAME FOR JUST THE EXECID'S INVOLVED
            frame put execid2, into(execs_to_drop)
            frame execs_to_drop: duplicates drop
            
            //  BACK TO THE ORIGINAL DATA
            frame change default
            frlink m:1 execid2, frame(execs_to_drop)
            drop if !missing(execs_to_drop)
            The -dataex- here combines both your most recent data and the earlier one.

            You will see that execs 14636 and 11242 are correctly identified for dropping because they arrive together at gvkey 10903 in 1995 and leave together in 1997. Before 1995 they were both unemployed, but they came and left 10903 as a pair.

            Comment


            • #7
              Thank you again, Clyde!

              I checked the code and one condition is not satisfied, because I did not explain enough.
              I will cite your last sentence to make sure what I need: "You will see that execs 14636 and 11242 are correctly identified for dropping because they arrive together at gvkey 10903 in 1995 and leave together in 1997. Before 1995 they were both unemployed, but they came and left 10903 as a pair."

              - It is about: "but they came and left 10903 as a pair". They indeed came and left 10903 as a pair but they also should leave as a pair to exactly the same company.

              I will illustrate what you did:
              Two executives: Peter and Dirk
              Those executives work at Apple (GVKEY1) in period 1994 until 1997 (or whatever period)
              Those executives leave as a pair in the SAME YEAR (e.g. 1997) but Peter moves to work for Samsung and Dirk becomes an executive of Nokia (different companies).

              I will illustrate what I need:
              Two executives: Peter and Dirk
              Those executives work at Apple (GVKEY1) in period 1994 until 1997 (or whatever period)
              Those executives leave as a pair in the SAME YEAR (e.g. 1997) and Peter and Dirk both join Samsung (the same company) .

              Hopefully, you want to help me for one last time!
              Last edited by Jarno Stappers; 06 Jan 2021, 03:52.

              Comment


              • #8
                OK, that's just a small change to the code:

                Code:
                //  CREATE SPELLS OF EMPLOYMENT FOR EACH EXECUTIVE
                frame put gvkey1 execid2 year, into(working)
                frame change working
                by execid2 (year), sort: gen spell = sum(gvkey1 != gvkey1[_n-1])
                
                //  CHARACTERIZE EACH SPELL BY ITS START, FINISH, AND ANTECEDENT
                //  AND SUBSEQUENT EMPLOYERS
                by execid2 spell (year), sort: gen start = year[1]
                by execid2 spell (year): gen finish = year[_N]
                by execid2 (spell year): gen previous = gvkey[_n-1] if spell != spell[_n-1]
                by execid2 (spell year): gen subsequent = gvkey[_n+1] if spell != spell[_n+1]
                by execid2 spell (year): replace subsequent = subsequent[_N]
                by execid2 spell (year): keep if _n == 1
                drop spell year
                
                //  NOW IDENTIFY SPELLS THAT HAVE THE SAME EMPLOYER, START, FINISH,
                //  AND ANTECEDENT EMPLOYER FOR MORE THAN ONE EXEC
                by gvkey1 start finish previous subsequent, sort: keep if _N > 1
                
                //  RETAIN THE INFORMATION ON PARALLEL TRANSFERS IN FRAME WORKING
                //  AND CREATE A SEPARATE FRAME FOR JUST THE EXECID'S INVOLVED
                frame put execid2, into(execs_to_drop)
                frame execs_to_drop: duplicates drop
                
                //  BACK TO THE ORIGINAL DATA
                frame change default
                frlink m:1 execid2, frame(execs_to_drop)
                drop if !missing(execs_to_drop)

                Comment


                • #9
                  Thanks again Clyde for your reply!

                  The code is again more accurate! However, there are still some problems arising.
                  STATA still removes executives that I qualify as individual changes.

                  Instead of showing what is going wrong in my dataset, it is probably better if I edit the STATA output a little bit to show you which executives I would like to keep and which ones I would like to remove.
                  I would shortly summarize the data!


                  Remove those executives.

                  Condition 1: Both worked together in Company A (condition satisfied)
                  Company A (1004) -> Executive 1 (33979) -> Period: 2007 until 2012
                  Company A (1004) -> Executive 2 (46404) -> Period: 2009 until 2012

                  Condition 2: Both leave the company in exactly the same year (condition satisfied)
                  Change year = 2013 (in this year both executives join the second company: 1100)

                  Condition 3: Both are going to work for the same second company (condition satisfied)
                  Company B (1100) -> Executive (33979) -> Period: 2013 until 2018
                  Company B (1100) -> Executive (46404) -> Period: start 2013 until 2016

                  Since all the conditions combined are satisfied these executives can be removed from our data set.


                  Do not remove those executives

                  Condition 1: Both worked together in Company A (condition satisfied)
                  Company A (1056) -> Executive 1 (1891) -> period 2006 until 2008
                  Company A (1056) -> Executive 2 (12755) - > period 2008 until 2010
                  Company A (1056) -> Executive 3 (13000) -> period 2008 until 2011

                  Condition 2: Both leave the company in exactly the same year (condition not satisfied)
                  Change year = different for all the executives because:
                  Executive 1: does not join company (1080) and therefore irrelevant
                  Executive 2: joins second company (1080) in 2011
                  Executive 3: joins second company (1080) in 2012

                  Condition 3: Both are going to work for the same second company (condition satisfied)
                  Company B (1080) -> Executive (12755) -> period 2011 until 2013
                  Company B (1080) -> Executive (13000) -> period 2012 until 2014

                  Since not all the conditions combined are satisfied do not remove those executives from our data set.
                  Thus, I need to keep these observations. These changes are identified as individual changes.







                  ----------------------- copy starting from the next line -----------------------
                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input long(GVKEY1 EXECID2) double YEAR
                  1004 33979 2007
                  1004 33979 2008
                  1004 33979 2009
                  1004 33979 2010
                  1004 33979 2011
                  1004 33979 2012
                  1004 46404 2009
                  1004 46404 2010
                  1004 46404 2011
                  1045 46192 2012
                  1056 1891 2006
                  1056 1891 2007
                  1056 1891 2008
                  1056 12755 2008
                  1056 12755 2009
                  1056 12755 2010
                  1056 13000 2008
                  1056 13000 2009
                  1056 13000 2010
                  1056 13000 2011
                  1080 12755 2011
                  1080 12755 2012
                  1080 12755 2013
                  1080 13000 2012
                  1080 13000 2013
                  1080 13000 2014
                  1100 33979 2013
                  1100 33979 2014
                  1100 33979 2015
                  1100 33979 2016
                  1100 33979 2017
                  1100 33979 2018
                  1100 46404 2013
                  1100 46404 2014
                  1100 46404 2015
                  1100 46192 2016
                  1111 22978 2006
                  1111 22978 2007
                  1111 22978 2008
                  1111 32091 2006
                  1111 32091 2007
                  1111 32091 2008
                  1161 37980 2009
                  1161 37980 2010
                  1161 37980 2011
                  1161 37980 2012
                  1161 40775 2010
                  1161 40775 2011
                  1161 40775 2012
                  1161 42390 2011
                  1161 42390 2012
                  1161 42390 2013
                  1161 42390 2014   
                  27119 46404 2014
                  27119 46404 2015
                  27119 46404 2016 
                  27119 46404 2017    
                  27119 46404 2018  
                  27119 46404 2019   
                  181104 33979 2013       
                  181104 33979 2014   
                  181104 33979 2015 
                  181104 33979 2015
                  181104 33979 2016
                  181104 33979 2017     
                  181104 33979 2018       
                  181104 33979 2019
                  
                  end
                  ------------------ copy up to and including the previous line ------------------

                  Comment


                  • #10
                    I don't understand what you are trying to do here.

                    As for 13000 and 12755: Both arrive together in 2008 at gvkey 1056, having previously both been unemployed. That they leave at different times is, you say, irrelevant. So they were a team at 1056.

                    33979 and 46404 both arrive at 1100 in 2013, having previously worked at 1004--so they are a team at 1100.

                    I don't understand why one of these is in and the other is out. I don't see the difference between them.
                    Last edited by Clyde Schechter; 07 Jan 2021, 14:59.

                    Comment


                    • #11
                      "As it stands, the code does not remove 1899, 12755, or 13000."
                      Yes, indeed exactly as I want those executives are not removed.

                      "33979 is reported in 2013 as working for both1100 and 18104. Which is it? My code saw this as a transition from 1100 to 18104 in 2013. Bit 46404 transitioned from 1100 to 27119--a different company and in a different year. They are not moving as a team. They are going at separate times to separate places".

                      This person is working for both companies at the same time. Probably it a problem that the code identifies people with multiple positions at the same time as a "change in company" when it actually not the case. It is possible for executives to have more than one board position at one moment in time. For example, Elon Musk is CEO of Tesla and Space X.

                      Is there any code to keep those observations (with multiple positions at the same) in the data set and also use your code?
                      The reason is that if I remove duplicates (people with more than one position at the same time) I will lose a lot of observations. Stated differently, I will lose power of tests which is not my preference.

                      Comment


                      • #12
                        Your response to #11 crossed with my re-edit of #10. During that time, I realized that the problem was that these execs were holding multiple positions simultaneously. So I modified the code to allow for that. But then it introduces new problems. Please read the changed version of #10 and respond to that, because I don't see how to have it both ways.

                        Comment


                        • #13
                          My response to #10.

                          "As for 13000 and 12755: Both arrive together in 2008 at gvkey 1056, having previously both been unemployed."
                          They worked together in company 1056 and therefore they are a team. If person A (13000) starts working for the first company (1056) in 2006, and person B (12755) starts working in 2007. Then this is still considered as a team.

                          "That they leave at different times is, you say, irrelevant."
                          This is not what I meant. I want to remove teams of executives that leave this first company (1056) and join the second company (1080) in exactly the same year. Executive (12755) start working for company (1080) in 2010. Executive (13000) start working for company (1080) in 2011. Therefore I identify those changes as not perfectly collinear and thus I qualify them as individual changes. Even though I understand that this might be a little bit strange because they are working as a team (for two companies).

                          What is irrelevant: 'Executive 1: does not join company (1080) and therefore irrelevant'.
                          This "Executive 1" is irrelevant because he worked as a team together with "Executive 2" and "Executive 3" in his first company. However, this person does not move to company (1080), so he is not part of the team in the second company (and therefore irrelevant).

                          "33979 and 46404 both arrive at 1100 in 2013, having previously worked at 1004--so they are a team at 1100".
                          The difference between why 33070 and 46404 are in; and 13000 and 12755 are out is:

                          33979 and 46404 -> start working together for the second company exactly in the same year (2013)
                          13000 and 12755 --> do not start working in exactly the same year because 12755 starts working in 2010 for company 1080. Executive 13000 starts working in 2011 for company 1080. Thus not perfectly collinear and therefore I keep them in my data set.

                          Summarizing:
                          - Working together is qualified as a team (independent of which years they worked together)
                          - But if a team of executives moves to their second company then this MUST be in the same YEAR. Otherwise, I qualify them as individuals.

                          Hopefully, this gives you a better understanding of my problem.
                          If there are any specification questions. Please let me know, I try to respond as quickly as possible.

                          Comment


                          • #14
                            I want to remove teams of executives that leave this first company (1056) and join the second company (1080) in exactly the same year. Executive (12755) start working for company (1080) in 2010. Executive (13000) start working for company (1080) in 2011.
                            Yes, but at firm 1056, they begin together in 2008, and both arrived there from the same preceding place, namely unemployment. So they are a team there and should be removed. Do you want to remove only their observations for firm 1056 and keep the ones for firm 1080?

                            Comment


                            • #15
                              They were unemployed and therefore their first job is company 1056. The second company they join as a team is company 1080. But I do not want to remove them, because they join the second company in different years (Executive 1 "2010" l Executive 2 "2011). I only want to remove teams of executives if they join the second company in the exact same year (e.g. 33979 and 46404 in "2013")
                              Otherwise, I want to keep the team of executives. So, I do not want to remove any observations regarding 1056 and 1080.

                              Comment

                              Working...
                              X