Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help with creating a complicated dummy variable

    Hello,

    I need to create a dummy variable "Interlock" that takes the value of 1 whenever one director (directorid) serves on the board of 2 (or more) firms (companyid) at the same point in time- that being in a "common year".
    In the data, I have the starting year (year_service_began) and the ending year (year_service_ends) for when the director served on the firm's board. It is enough that the director served on the board of 2 firms during one common year in the time period (start until end) to be considered "Ïnterlocked". So for a certain director in a certain firm, the time period (i.e. range) of his/her service would be the earliest year he/she started (minimum value in Year_Service_Began) until the last year he/she served (maximum value in Year Service Ends). That director would be considered interlocked if he/she was a director of another firm as well for at least one common year. I also have the variable "myear" which is the year of the data (I have financial variables which differ per firm per year, but I have not included them here for simplicity purposes). For that reason, I cannot aggregate the multiple observations per director and firm into one observation. I need to have one for each year as per the below example from my dataset.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long companyid int(directorid year_service_began year_service_ends myear)
    529825  1 1983 2011 2007
    529825  1 1983 2010 2008
    529825  1 1983 2011 2009
    529825  1 1983 2013 2010
    529825  1 1983 2013 2011
    529825  1 1983 2013 2012
    106645  2 1985 2009 2007
    106645  2 1985 2011 2008
    106645  2 1985 2012 2009
    106645  2 1985 2011 2010
    522777  3 2000 2009 2007
    522777  3 2000 2011 2008
     10610  4 1988 2011 2007
     10610  4 1988 2010 2008
     10610  4 1988 2011 2009
     10610  4 1988 2013 2010
     10610  4 1988 2013 2011
     10610  4 1988 2013 2012
     10610  5 1985 2009 2007
     10610  5 1985 2011 2008
     10610  5 1985 2012 2009
     29572  6 2002 2010 2007
     29572  6 2002 2009 2008
     11757  8 1999 2009 2007
     11757  8 1999 2009 2008
    513271  9 2003 2009 2007
    513271  9 2003 2009 2008
    533240  9 2000 2009 2008
    513271  9 2003 2011 2009
    533240  9 2000 2011 2009
    513271  9 2003 2011 2010
    533240  9 2000 2011 2010
    533240  9 2000 2012 2011
     24785 12 1987 2009 2007
     29281 12 2003 2009 2007
    133543 12 1999 2009 2007
     24785 12 1987 2009 2008
    133543 12 1999 2009 2008
    133543 12 1999 2011 2009
    133543 12 1999 2011 2010
    133543 12 1999 2012 2011
    133543 12 1999 2013 2012
    133543 12 1999 2014 2013
    133543 12 1999 2015 2014
    133543 12 1999 2016 2015
      5191 13 2002 2009 2007
     24785 13 1971 2009 2007
      5191 13 2002 2009 2008
    534028 13 2008 2009 2008
      5191 13 2002 2011 2009
    109672 13 2008 2011 2009
    534028 13 2008 2011 2009
      5191 13 2002 2011 2010
    109672 13 2008 2011 2010
    534028 13 2008 2011 2010
      5191 13 2002 2012 2011
    109672 13 2008 2012 2011
    534028 13 2008 2012 2011
      5191 13 2002 2013 2012
    109672 13 2008 2013 2012
    534028 13 2008 2013 2012
      5191 13 2002 2014 2013
    109672 13 2008 2014 2013
    534028 13 2008 2014 2013
      5191 13 2002 2015 2014
    109672 13 2008 2015 2014
    534028 13 2008 2015 2014
      5191 13 2002 2016 2015
    109672 13 2008 2016 2015
    534028 13 2008 2016 2015
      5191 13 2002 2017 2016
    109672 13 2008 2017 2016
    534028 13 2008 2017 2016
      5191 13 2002 2018 2017
    109672 13 2008 2018 2017
    534028 13 2008 2018 2017
    534028 13 2008 2019 2018
    534028 13 2008 2020 2019
    160556 14 2006 2009 2007
    515366 14 2004 2009 2007
    518512 14 1999 2009 2007
    160556 14 2006 2009 2008
    515366 14 2004 2009 2008
    518512 14 1999 2011 2008
    160556 14 2006 2011 2009
    515366 14 2004 2011 2009
    518512 14 1999 2012 2009
    160556 14 2006 2011 2010
    515366 14 2004 2011 2010
    518512 14 1999 2011 2010
    160556 14 2006 2012 2011
    515366 14 2004 2012 2011
    110983 14 2012 2013 2012
    160556 14 2006 2013 2012
    515366 14 2004 2013 2012
    110983 14 2012 2014 2013
    160556 14 2006 2014 2013
    515366 14 2004 2014 2013
    110983 14 2012 2015 2014
    160556 14 2006 2015 2014
    end
    For example (below), directorid #9 was a member of firms: 513271 from 2003 until 2011 and 533240 from 2000 until 2012. The dummy variable "Interlock" should be =1 for each of the following observations as for each one of them, director #9 was serving on 2 firms for at least one common year.

    513271 9 2003 2009 2007
    513271 9 2003 2009 2008
    533240 9 2000 2009 2008
    513271 9 2003 2011 2009
    533240 9 2000 2011 2009
    513271 9 2003 2011 2010
    533240 9 2000 2011 2010
    533240 9 2000 2012 2011

    I kindly need help with the commands to create the dummy variable "Interlock".


    P.S.: I am using Stata/SE 16.0

    Thank you in advance for your time.
    Christelle

  • #2
    Thank you for using -dataex- on your first post!

    I think the following does what you want:
    Code:
    preserve
    keep directorid companyid year_service*
    duplicates drop
    rename (companyid year_service*) =_alt
    tempfile copy
    save `copy'
    
    restore
    gen `c(obs_t)' obs_no = _n
    joinby directorid using `copy'
    by obs_no, sort: gen byte interlock = ///
        (min(year_service_ends, year_service_ends_alt) ///
        >= max(year_service_began, year_service_began_alt)) ///
        & companyid != companyid_alt
    by obs_no (interlock), sort: keep if _n == _N
    drop *_alt
    Notes:
    1. You have offered the dates only to the precision of year. It is conceivable that a given director sits at one company from Jan through June of a year and then leaves and starts a new directorship position at another company July through December. In that case, in real life, there is no interlock. But the code will consider that year to be a year of overlapping service. Without more exact dates, it is not possible to satisfactorily resolve this dilemma. The opposite strategy of coding interlock = 0 when there is only one year in common would miss the situation where the same director actually sat on two company's board for the entire year (or for overlapping parts of the year).
    2. If your real data set is large, the -joinby- command is going to be very slow. It may appear to you that Stata has hung. Be very, very patient.
    3. It is also possible that the intermediate data set that -joinby- creates will exceed the limits of available memory and Stata will exit with an error. If that happens, you should run this one director at a time, rather than in the entire data set. If this situation arises and you need help with that, post back.

    Comment


    • #3
      I'm sure there's a cleaner way, but I think this works.

      Note I've changed some data to make sure it is doing it's job.

      Code:
      sort directorid companyid
      *replace some data to check it
      replace year_service_ends = 1990 if companyid==24785 & directorid==12
      replace myear = int(year_service_began+year_service_ends)/2
      save origdata, replace
      
      tempvar temp
      unique companyid , by(directorid) generate(`temp')
      egen firmcount = max(`temp') , by(directorid)
      capture drop y1 y2
      bys directorid companyid: egen y1 = max(year_service_began)
      bys directorid companyid: egen y2 = max(year_service_ends)
      preserve
          collapse (first) y1 y2 , by(directorid companyid)
          ren y1 r1
          ren y2 r2
          ren companyid companyid2
          save rangedata, replace
      restore
      joinby directorid using rangedata
      drop if companyid==companyid2
      capture drop interlock
      g interlock = (inrange(y1,r1,r2) | inrange(y2,r1,r2) | inrange(r1,y1,y2) | inrange(r2,y1,y2) ) & firmcount>1
      collapse (max) interlock ,  by(directorid companyid)
      save interlockdata, replace
      use origdata, clear
      joinby directorid companyid using interlockdata

      Comment


      • #4
        Clyde's is better, obviously.

        He dropped duplicates, but you may want to use the maximum range.

        Comment


        • #5
          Thank you so much Clyde! The codes worked perfectly and I now have the Interlock variable.

          Also, thank you for your notes. Concerning the first point I totally agree, but unfortunately I don't have data on the specific starting and ending months so this is one limitation to my study. Thank you for pointing it out.
          My dataset has 181,815 observations, but Stata was super fast in running the codes you gave me.

          I have 3 more followup requests if that is okay.

          1) The Interlock variable is a dummy variable. I am also interested in having a continuous interlock variable. That is, I want to know the number of different firms a director served on in a given year.
          2) I also want Interlock variables for the firm (instead of director).
          a. Interlocked_firm: a dummy variable that takes the value of 1 if at least one director of that specific firm is interlocked (Interlock=1) in a given year, and 0 otherwise.
          b. Nb_Interlocks_firm: a continuous variable for the number of interlocks a firm has (if Interlocked_firm=1). That is, to how many different firms, a certain firm is connected (i.e. interlocked) in a given year.

          Again, thank you so much.

          Comment


          • #6
            George, thank you so much.
            The codes worked but for some reason (which I don't know) I lost around 25% of the observations.
            Using the codes provided by Clyde, all my observations were preserved. However, given that you pointed out the "maximum range" thing, yes this is what I want. I want to have the largest time period (i.e. range).
            That is, the minimum of the year_service_began and the maximum of the year_service_ended.
            In that case, do you think I should swap the below bold part of Clyde's code? Or would that ruin the code and thus the outcome?

            by obs_no, sort: gen byte interlock = /// (min(year_service_ends, year_service_ends_alt) /// >= max(year_service_began, year_service_began_alt)) /// & companyid != companyid_alt

            Again, thank you!!
            Last edited by Christelle Alkhoury; 22 Feb 2023, 12:35.

            Comment


            • #7
              that should work

              Comment


              • #8
                To implement the maximum range of service, do not do what you show in #6. Here is code that will do that:

                Code:
                preserve
                keep directorid companyid year_service*
                collapse (min) year_service_began (max) year_service_ends, by(directorid companyid)
                rename (companyid year_service*) =_alt
                tempfile copy
                save `copy'
                
                restore
                gen `c(obs_t)' obs_no = _n
                joinby directorid using `copy'
                by obs_no, sort: gen byte interlock = ///
                    (min(year_service_ends, year_service_ends_alt) ///
                    >= max(year_service_began, year_service_began_alt)) ///
                    & companyid != companyid_alt
                by obs_no (interlock), sort: keep if _n == _N
                drop *_alt
                It's just a matter of replacing the -duplicates drop- command in the original with a -collapse- command that creates the range of service. That said, I don't quite get why you would want to do that. If a director serves from, say Jan 2020 through June 2020, then leaves and serves elsewhere from July 2020 through September 2020, then leaves there and returns to the first company for the rest of 2020, I don't see why you would want to consider it interlocked. But it's your project, so you get to make that call.

                Concerning the additional variables you want, the nb_interlocks_firm adds a slight complication, because you could have two directors who overlap with the same other company, but you only want to count that once, I take it. So the overall approach has to be slightly different and does not just build on the previous code.
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input long companyid int(directorid year_service_began year_service_ends myear)
                529825  1 1983 2011 2007
                529825  1 1983 2010 2008
                529825  1 1983 2011 2009
                529825  1 1983 2013 2010
                529825  1 1983 2013 2011
                529825  1 1983 2013 2012
                106645  2 1985 2009 2007
                106645  2 1985 2011 2008
                106645  2 1985 2012 2009
                106645  2 1985 2011 2010
                522777  3 2000 2009 2007
                522777  3 2000 2011 2008
                 10610  4 1988 2011 2007
                 10610  4 1988 2010 2008
                 10610  4 1988 2011 2009
                 10610  4 1988 2013 2010
                 10610  4 1988 2013 2011
                 10610  4 1988 2013 2012
                 10610  5 1985 2009 2007
                 10610  5 1985 2011 2008
                 10610  5 1985 2012 2009
                 29572  6 2002 2010 2007
                 29572  6 2002 2009 2008
                 11757  8 1999 2009 2007
                 11757  8 1999 2009 2008
                513271  9 2003 2009 2007
                513271  9 2003 2009 2008
                533240  9 2000 2009 2008
                513271  9 2003 2011 2009
                533240  9 2000 2011 2009
                513271  9 2003 2011 2010
                533240  9 2000 2011 2010
                533240  9 2000 2012 2011
                 24785 12 1987 2009 2007
                 29281 12 2003 2009 2007
                133543 12 1999 2009 2007
                 24785 12 1987 2009 2008
                133543 12 1999 2009 2008
                133543 12 1999 2011 2009
                133543 12 1999 2011 2010
                133543 12 1999 2012 2011
                133543 12 1999 2013 2012
                133543 12 1999 2014 2013
                133543 12 1999 2015 2014
                133543 12 1999 2016 2015
                  5191 13 2002 2009 2007
                 24785 13 1971 2009 2007
                  5191 13 2002 2009 2008
                534028 13 2008 2009 2008
                  5191 13 2002 2011 2009
                109672 13 2008 2011 2009
                534028 13 2008 2011 2009
                  5191 13 2002 2011 2010
                109672 13 2008 2011 2010
                534028 13 2008 2011 2010
                  5191 13 2002 2012 2011
                109672 13 2008 2012 2011
                534028 13 2008 2012 2011
                  5191 13 2002 2013 2012
                109672 13 2008 2013 2012
                534028 13 2008 2013 2012
                  5191 13 2002 2014 2013
                109672 13 2008 2014 2013
                534028 13 2008 2014 2013
                  5191 13 2002 2015 2014
                109672 13 2008 2015 2014
                534028 13 2008 2015 2014
                  5191 13 2002 2016 2015
                109672 13 2008 2016 2015
                534028 13 2008 2016 2015
                  5191 13 2002 2017 2016
                109672 13 2008 2017 2016
                534028 13 2008 2017 2016
                  5191 13 2002 2018 2017
                109672 13 2008 2018 2017
                534028 13 2008 2018 2017
                534028 13 2008 2019 2018
                534028 13 2008 2020 2019
                160556 14 2006 2009 2007
                515366 14 2004 2009 2007
                518512 14 1999 2009 2007
                160556 14 2006 2009 2008
                515366 14 2004 2009 2008
                518512 14 1999 2011 2008
                160556 14 2006 2011 2009
                515366 14 2004 2011 2009
                518512 14 1999 2012 2009
                160556 14 2006 2011 2010
                515366 14 2004 2011 2010
                518512 14 1999 2011 2010
                160556 14 2006 2012 2011
                515366 14 2004 2012 2011
                110983 14 2012 2013 2012
                160556 14 2006 2013 2012
                515366 14 2004 2013 2012
                110983 14 2012 2014 2013
                160556 14 2006 2014 2013
                515366 14 2004 2014 2013
                110983 14 2012 2015 2014
                160556 14 2006 2015 2014
                end
                
                preserve
                keep directorid companyid year_service*
                collapse (min) year_service_began (max) year_service_ends, by(directorid companyid)
                rename (companyid year_service*) =_alt
                tempfile copy
                save `copy'
                
                restore
                gen `c(obs_t)' obs_no = _n
                joinby directorid using `copy'
                by obs_no, sort: gen byte interlock = ///
                    (min(year_service_ends, year_service_ends_alt) ///
                    >= max(year_service_began, year_service_began_alt)) ///
                    & companyid != companyid_alt
                by companyid directorid myear (obs_no), sort: egen nb_interlocks_director ///
                    = total(interlock)
                by companyid myear, sort: egen interlocked_firm = max(interlock)
                by companyid myear (companyid_alt), sort: gen nb_interlocks_firm ///
                    = sum(interlock & (companyid_alt != companyid_alt[_n-1]))
                by companyid myear (companyid_alt): replace nb_interlocks_firm ///
                    = nb_interlocks_firm[_N]
                by obs_no, sort: keep if _n == _N
                Note: In the above code, I have used the "maximum range" approach. If you decide you don't want that, replace that -collapse- command by -duplicates drop-.


                Comment


                • #9
                  Thank you Clyde.

                  I agree. I realized there's no need to use the maximum range, so I'm sticking with the -duplicates drop- instead of -collapse-.

                  Apologies for posting again, but I have some concerns which I noted below.

                  Using the original commands you gave me to get only the Interlock dummy variable for the board members, I got the below output:

                  interlock Freq. Percent Cum.

                  0 102,835 56.56 56.56
                  1 78,980 43.44 100.00

                  Total 181,815 100.00



                  However, when I ran the new commands which you have given me in the last post to get the 4 variables (replacing -collapse- command by -duplicates drop-) I got different results for the percentages of the same variable (56.56% for interlock=0 above, and 79.26% for interlock=0 below).
                  Also, the percent for the dummies when they are equal to 0 are different from the percent for the value of 0 for the relative continuous variables (below in bold: directors: 79.26% and 56.56%, firms: 7.8% and 8.2%).


                  For Directors:


                  interlock Freq. Percent Cum.

                  0 144,102 79.26 79.26
                  1 37,713 20.74 100.00

                  Total 181,815 100.00


                  nb_interlocks_director | Freq. Percent Cum.

                  0 | 102,830 56.56 56.56
                  1 | 7,798 4.29 60.85
                  2 | 7,629 4.20 65.04
                  3 | 6,511 3.58 68.62
                  (etc.)

                  Total | 181,815 100.00



                  For Firms:

                  interlocked_firm
                  | Freq. Percent Cum.

                  0 | 14,175 7.80 7.80
                  1 | 167,640 92.20 100.00

                  Total | 181,815 100.00



                  nb_interlocks_firm | Freq. Percent Cum.

                  0 | 14,905 8.20 8.20
                  1 | 16,426 9.03 17.23
                  2 | 15,866 8.73 25.96
                  3 | 14,029 7.72 33.67
                  (etc.)

                  Total | 181,815 100.00

                  .

                  I am not sure what should be fixed in that case, so I would really appreciate further support on this matter.
                  Thank you
                  Last edited by Christelle Alkhoury; 23 Feb 2023, 13:56.

                  Comment


                  • #10
                    You are right. The second code was wrong. Sorry for the error. To make the code and results clearer, I have removed the variable interlock at the end of the program. Instead, there is a new variable, interlocked_director, that indicates, for each companyid directorid year combination, whether that director has an interlocking relationship with another company in that year. (This is what the variable interlock was intended to be.)

                    I also noticed an error in calculating the number of interlocks for a firm, and that is corrected here as well.

                    Code:
                    preserve
                    keep directorid companyid year_service*
                    collapse (min) year_service_began (max) year_service_ends, by(directorid companyid)
                    rename (companyid year_service*) =_alt
                    tempfile copy
                    save `copy'
                    
                    restore
                    gen `c(obs_t)' obs_no = _n
                    joinby directorid using `copy'
                    by obs_no, sort: gen byte interlock = ///
                        (min(year_service_ends, year_service_ends_alt) ///
                        >= max(year_service_began, year_service_began_alt)) ///
                        & companyid != companyid_alt
                    by companyid directorid myear (obs_no), sort: egen interlocked_director = max(interlock)
                    by companyid directorid myear (obs_no), sort: egen nb_interlocks_director ///
                        = total(interlock)
                    by companyid myear, sort: egen interlocked_firm = max(interlock)
                    gsort companyid myear companyid_alt -interlock
                    by companyid myear: gen nb_interlocks_firm ///
                        = sum(interlock & (companyid_alt != companyid_alt[_n-1]))
                    by companyid myear (companyid_alt): replace nb_interlocks_firm ///
                        = nb_interlocks_firm[_N]
                    drop interlock
                    by obs_no, sort: keep if _n == _N
                    Last edited by Clyde Schechter; 23 Feb 2023, 14:03.

                    Comment


                    • #11
                      No worries at all and thank you so much. These commands are perfectly fine and gave me the required outcomes.

                      Much appreciated.

                      Comment


                      • #12
                        Analysis of "interlocks" is a classic topic within social network-oriented (SNA) studies of organizations. If an SNA approach is more generally relevant here, I'd like to point to the user-written package -nwcommands- (See http://www.nwcommands.org/) I've only used this package a little, and I'm not really an SNA person, but I've been impressed at its breadth and speed. I believe all of its work happens in Mata, and it thereby avoids some memory issues that can occur with -joinby- and the like. Among the relatively simple things it could do here would be to draw diagrams of the interlock structure, and to calculate various other node to node network distance measures, if I recall correctly.

                        Comment


                        • #13
                          Thank you Mike for sharing this. It is really interesting!

                          Comment

                          Working...
                          X