Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Finding Pairs within a column

    Dear Statalists,

    my dataset is about directors and which companies they work for and looks like this:

    Company_ID Director_ID Term Begin Term End new_app
    40416 1 1977 1999 0
    15456 1 1983 2013 0
    76085 1 1994 2005 0
    44601 2 1975 2005 0
    15720 2 1988 2002 0
    86594 2 1999 2005 1
    44601 3 1990 2006 0
    78172 3 2000 2005 1
    87657 3 2000 2009 1
    44601 4 1988 2013 0
    44601 5 1985 1999 0
    40707 5 1991 2003 0
    57568 6 1995 2007 0
    80316 6 2002 2009 1
    52840 7 1987 2009 0
    24221 7 1999 2005 1
    24221 8 1999 2009 1
    52840 9 1970 2005 0
    62148 9 1998 2002 1
    59459 9 1998 2003 1
    24221 9 1999 2004 1
    90601 9 2000 2012 1
    10137 9 2003 2011 1

    The variable "new_app" is 1 whenever the term of a director began between 1996 and 2014, otherwise it is 0.
    Now I would like to create "company pairs". That means whenever a director was appointed by a company in the above mentioned time period (new_app = 1), I want to know all other companies he worked for in that year. For example, if we take Director 2, when he was appointed by company 86594 in 1999, he also worked for companies 44601 and 15720, as his terms at those companies ended in 2005/2002. So I need to identify the combinations or pairs of companies. Ideally, the output (only director 2) would look like this:

    Company_1 Company_2 Director_ID Year
    86594 44601 2 1999
    86594 15720 2 1999


    Can anyone help me out with this? I´m willing to thank with a small reward!

  • #2
    You should give your Stata version, and provide a dataex output, and also state whether you want all pairs, e.g., would company 44601 in your example also be a company 1?

    Comment


    • #3
      Hey Dave,

      my Stata version is MP 15.0. I want all pairs, for which one company has appointed the director while he was already working for the other one. It has no relevance for me which company is under company_1/ company_2, but for consistency, it might be useful to list all companies with new_app = 1 in one column and the corresponding pair in the 2nd column. But it also might be the case, that both companies will have new_app = 1.

      Company 44601 might be a company_1 for another director, but for director 2, I just care about the two pairs above. For example, companies 44601 and 15720 shouldn´t be displayed as a pair.

      What do you mean by dataex output?
      Thanks!

      Comment


      • #4
        See FAQ 12.2 or type help dataex. Basically when you give an example dataset it ensures someone can easily get it from Statalist to their Stata to help you. No one wants to spend unnecessary data management time helping Statalist posters.

        Comment


        • #5
          Sorry Dave,

          I guess thats what you asked for:


          clear
          input long permno double didown int dirsince long year_term_ends float(year new_app)
          40416 1 1977 1999 1998 0
          15456 1 1983 2013 2012 0
          76085 1 1994 2005 2004 0
          44601 2 1975 2005 2004 0
          15720 2 1988 2002 2001 0
          86594 2 1999 2005 2002 1
          44601 3 1990 2006 2005 0
          78172 3 2000 2005 2003 1
          87657 3 2000 2009 2007 1
          44601 4 1988 2013 2012 0
          44601 5 1985 1999 2009 0
          40707 5 1991 2003 2002 0
          57568 6 1995 2007 2006 0
          80316 6 2002 2009 2008 1
          52840 7 1987 2009 2006 0
          24221 7 1999 2005 2002 1
          24221 8 1999 2009 2008 1
          52840 9 1970 2005 2002 0
          62148 9 1998 2002 2001 1
          59459 9 1998 2003 2002 1
          24221 9 1999 2004 2001 1
          90601 9 2000 2012 2011 1
          10137 9 2003 2011 2010 1
          65138 12 1983 2000 1999 0
          21371 12 1987 2009 2008 0
          75789 12 1990 2005 2004 0
          65859 12 1991 1999 1998 0
          66093 12 1999 2015 2014 1
          85255 12 2003 2009 2007 1
          21371 13 1971 2009 2007 0
          65138 13 1987 2002 2001 0
          15368 13 1994 2000 1999 0
          59176 13 2002 2015 2014 1
          57817 13 2008 2014 2013 1
          85348 13 2008 2015 2014 1
          51772 14 1987 1999 1998 0
          25144 14 1990 1999 1998 0
          47159 14 1999 2006 2003 1
          87146 14 1999 2011 2010 1
          76226 14 2006 2011 2010 1
          44206 14 2012 2014 2013 1
          end

          Comment


          • #6
            So I wonder if this is a task for SSC rangejoin?

            Comment


            • #7
              Dave, big thanks!

              I managed to generate the pairs and to drop duplicates (a pair of the same company). But for the companies with new_app=1, there are still duplicates. To illustrate that, I provide you with the dataex:



              clear
              input long permno double didown int dirsince long year_term_ends float new_app long permno_1 int app_year
              44601 3 1990 2006 0 78172 2000
              44601 3 1990 2006 0 87657 2000
              78172 3 2000 2005 1 87657 2000
              87657 3 2000 2009 1 78172 2000
              end



              As you can see for director 3, the companies 78172 and 87657 are displayed as a pair twice, each as company1 (permno) and then as company2 (permno_1). But in fact it´s only one pair.
              Any idea to delete the duplicates?

              Comment


              • #8
                I think it has to do with the fact that you are merging to more than you want to. For example here I'm just taking the first and then the second new_app == 1 obs within didown. You don't get the duplication, but this is not a programmatic solution.We need the authors of rangejoin to weigh in on this problem!



                Code:
                clear
                input permno didown dirsince year_term_ends year new_app
                40416 1 1977 1999 1998 0
                15456 1 1983 2013 2012 0
                76085 1 1994 2005 2004 0
                44601 2 1975 2005 2004 0
                15720 2 1988 2002 2001 0
                86594 2 1999 2005 2002 1
                44601 3 1990 2006 2005 0
                78172 3 2000 2005 2003 1
                87657 3 2000 2009 2007 1
                44601 4 1988 2013 2012 0
                44601 5 1985 1999 2009 0
                40707 5 1991 2003 2002 0
                57568 6 1995 2007 2006 0
                80316 6 2002 2009 2008 1
                52840 7 1987 2009 2006 0
                24221 7 1999 2005 2002 1
                24221 8 1999 2009 2008 1
                52840 9 1970 2005 2002 0
                62148 9 1998 2002 2001 1
                59459 9 1998 2003 2002 1
                24221 9 1999 2004 2001 1
                90601 9 2000 2012 2011 1
                10137 9 2003 2011 2010 1
                65138 12 1983 2000 1999 0
                21371 12 1987 2009 2008 0
                75789 12 1990 2005 2004 0
                65859 12 1991 1999 1998 0
                66093 12 1999 2015 2014 1
                85255 12 2003 2009 2007 1
                21371 13 1971 2009 2007 0
                65138 13 1987 2002 2001 0
                15368 13 1994 2000 1999 0
                59176 13 2002 2015 2014 1
                57817 13 2008 2014 2013 1
                85348 13 2008 2015 2014 1
                51772 14 1987 1999 1998 0
                25144 14 1990 1999 1998 0
                47159 14 1999 2006 2003 1
                87146 14 1999 2011 2010 1
                76226 14 2006 2011 2010 1
                44206 14 2012 2014 2013 1
                end
                
                gsort didown -new_app dirsince
                by didown: generate obs = _n
                list, sepby(didown)
                save full.dta, replace
                
                keep if new_app == 1 & obs == 1 // the first new_app within didown
                keep permno didown dirsince
                save part.dta, replace
                
                use full.dta, clear
                drop if new_app == 1 & obs == 1
                keep permno didown dirsince year_term_ends
                
                rangejoin dirsince dirsince year_term_ends using part.dta, by(didown)
                drop if permno_U == .
                list, sepby(didown)
                
                use full.dta, replace
                keep if new_app == 1 & obs == 2 // the second new_app within didown
                keep permno didown dirsince
                save part.dta, replace
                
                use full.dta, clear
                drop if new_app == 1 & inlist(obs, 1, 2)
                keep permno didown dirsince year_term_ends
                
                rangejoin dirsince dirsince year_term_ends using part.dta, by(didown)
                drop if permno_U == .
                list, sepby(didown)

                Comment


                • #9
                  Again, thank you!

                  You´re right. I think there are directors working for more than 10 companies. For those cases, it will become a long code
                  Will keep looking for a solution.

                  Comment


                  • #10
                    How about putting things in a loop?

                    Code:
                    clear
                    input permno didown dirsince year_term_ends year new_app
                    40416 1 1977 1999 1998 0
                    15456 1 1983 2013 2012 0
                    76085 1 1994 2005 2004 0
                    44601 2 1975 2005 2004 0
                    15720 2 1988 2002 2001 0
                    86594 2 1999 2005 2002 1
                    44601 3 1990 2006 2005 0
                    78172 3 2000 2005 2003 1
                    87657 3 2000 2009 2007 1
                    44601 4 1988 2013 2012 0
                    44601 5 1985 1999 2009 0
                    40707 5 1991 2003 2002 0
                    57568 6 1995 2007 2006 0
                    80316 6 2002 2009 2008 1
                    52840 7 1987 2009 2006 0
                    24221 7 1999 2005 2002 1
                    24221 8 1999 2009 2008 1
                    52840 9 1970 2005 2002 0
                    62148 9 1998 2002 2001 1
                    59459 9 1998 2003 2002 1
                    24221 9 1999 2004 2001 1
                    90601 9 2000 2012 2011 1
                    10137 9 2003 2011 2010 1
                    65138 12 1983 2000 1999 0
                    21371 12 1987 2009 2008 0
                    75789 12 1990 2005 2004 0
                    65859 12 1991 1999 1998 0
                    66093 12 1999 2015 2014 1
                    85255 12 2003 2009 2007 1
                    21371 13 1971 2009 2007 0
                    65138 13 1987 2002 2001 0
                    15368 13 1994 2000 1999 0
                    59176 13 2002 2015 2014 1
                    57817 13 2008 2014 2013 1
                    85348 13 2008 2015 2014 1
                    51772 14 1987 1999 1998 0
                    25144 14 1990 1999 1998 0
                    47159 14 1999 2006 2003 1
                    87146 14 1999 2011 2010 1
                    76226 14 2006 2011 2010 1
                    44206 14 2012 2014 2013 1
                    end
                    
                    gsort didown -new_app dirsince
                    by didown: generate obs = _n
                    list, sepby(didown)
                    save full.dta, replace
                    
                    summarize obs if new_app == 1
                    
                    foreach i of numlist 1/5 {
                        use full.dta, clear
                        keep if new_app == 1 & obs == `i' // the first new_app within didown
                        keep permno didown dirsince
                        save part.dta, replace
                    
                        use full.dta, clear
                        drop if new_app == 1 & obs <= `i'
                        keep permno didown dirsince year_term_ends
                    
                        rangejoin dirsince dirsince year_term_ends using part.dta, by(didown)
                        drop if permno_U == .
                        list, sepby(didown)
                        save part`i'.dta, replace
                    }
                    
                    use part1.dta, clear
                    foreach i of numlist 2/5 {
                        append using part`i'.dta
                    }
                    gsort didown dirsince
                    list, sepby(didown)

                    Comment


                    • #11
                      Dave, what about this way?

                      I multiplied permno and permno_1 (company1 and company2) to get an identifier which should be unique but identical for the duplicate pairs.
                      By using histogram I check that each identifier appears max. 2 times per director and then I delete one of the two lines with the respect identifier.


                      Code:
                      generate duplics = permno*permno_1
                      sort didown duplics
                      by didown duplics: generate count_duplics = _n
                      by didown duplics: generate total_duplics = _N
                      histogram total_duplics
                      drop if count_duplics == 2
                      drop count_duplics total_duplics
                      Thank you again!

                      Comment


                      • #12
                        Hey I think if it works that's great! Do we get the same results for the example file?

                        Comment


                        • #13
                          For director 9 I have definitely more results and that must be right.
                          But right now I am too tired to search for the errors

                          Thanks again!

                          Comment


                          • #14
                            Sure thing. I did not have the time to look at the files my code was generating vs what you wanted, so I'm not surprised if things are incorrect. I was using the opportunity to download and try rangejoin again. I've used rangestat once or twice but not rangejoin.

                            Comment

                            Working...
                            X