
  • #16
    Hi again. Here is the problem I am facing now. I have two data sets. One looks like the example in #1 and has all the information about firms and directors. The other looks like the example below and was produced by your code in #14. Using #2, I can calculate whether a director has served in the same size tercile, but only relative to that specific firm. How would I get that information for the example data set below? It does not list all the seats a director held in the previous year, so from this data set alone I cannot determine whether the director has served in the same size tercile.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 firmid double(fyear directorid) byte selection
    "001004" 2005  1147 0
    "001004" 2005  2455 0
    "001004" 2005  2615 0
    "001004" 2005  3162 0
    "001004" 2005  3162 0
    "001004" 2005  3335 0
    "001004" 2005  4669 0
    "001004" 2005  5769 0
    "001004" 2005  6098 0
    "001004" 2005  6098 0
    "001004" 2005  6897 0
    "001004" 2005  7209 0
    "001004" 2005  7683 0
    "001004" 2005  7801 0
    "001004" 2005  8036 0
    "001004" 2005  8658 0
    "001004" 2005  8803 0
    "001004" 2005  9960 0
    "001004" 2005 10122 0
    "001004" 2005 10122 0
    "001004" 2005 10339 0
    "001004" 2005 13187 0
    "001004" 2005 14651 0
    "001004" 2005 15482 0
    "001004" 2005 16567 0
    "001004" 2005 16672 0
    "001004" 2005 17103 0
    "001004" 2005 17103 0
    end
    This data set is large because I have 1,000 random (unpicked) directors for every picked director.



    • #17
      So it sounds like you want to enlarge this data set to include information about directorships in the prior year. That information is in your original data set, which I'll call original_data in the code below:

      Code:
      rangejoin fyear -1 -1 using original_data, by(directorid)
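      When that finishes, each observation in the data from #16 will be paired with all of that director's directorships from the preceding year. -rangejoin- is available from SSC and requires -rangestat-, also from SSC.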
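
      A minimal sketch of how the size-tercile check might follow from that pairing. It assumes a hypothetical variable size_tercile that exists in both files (one value per firmid and fyear), and it assumes -rangejoin-'s default of appending _U to using-file variables whose names clash with those in memory; check -help rangejoin- to confirm. The -rangejoin- line is repeated so the sketch is complete.

      ```stata
      * size_tercile is a hypothetical name; substitute your own tercile variable
      rangejoin fyear -1 -1 using original_data, by(directorid)

      * 1 if this prior-year seat was at a firm in the same size tercile
      gen byte same_tercile = (size_tercile == size_tercile_U) ///
          if !missing(size_tercile, size_tercile_U)

      * one flag per firm-year-director: any prior-year seat in the same tercile?
      by firmid fyear directorid, sort: egen byte any_same_tercile = max(same_tercile)
      ```

      Directors with no directorships in the preceding year would end up with a missing value for any_same_tercile, so those cases can be told apart from a definite 0.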



      • #18
        Thank you Clyde. I just realised that #14 results in duplicate director matches to firms in the same year. How do I assign 1,000 unique unpicked directors to every picked director? I could drop the duplicates, but that reduces the number of directors assigned.



        • #19
          The code in #8 and #14 does not introduce any duplicates that weren't in the data already. I think there was a misunderstanding back at #8. Your posts before that had included example data in which there were multiple observations per director, corresponding to different years. The code in #8 is predicated on starting from a file in which each directorid appears only once. If you started it instead from a data set with multiple observations per directorid, it will retain and propagate those duplicates. So you need to first create a file that contains only distinct directorid values and then run the matching in #8 and the reorganization in #14.
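
          One way that first step might look, as a sketch. It assumes the full data are in the file called original_data above, and distinct_directors is a hypothetical name for the new file:

          ```stata
          * build a file with exactly one observation per directorid
          use original_data, clear
          keep directorid
          duplicates drop

          * stops with an error if directorid is still not unique
          isid directorid

          * distinct_directors is a hypothetical file name
          save distinct_directors, replace
          ```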



          • #20
            Thank you Clyde. I found the source of the duplicates. The file I start with in #14 has no duplicates with regard to firm_id, year, and director_id. However, a director can be hired by multiple firms in the same year, which is why the same director_id appears with different firm_ids. I am sorry for not pointing out that firm_id is the other dimension in the data set. Is it possible to do something like #14 with this kind of data set and still avoid duplicates?



            • #21
              So it sounds like what you need is this:

              Code:
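              * Build the pool of candidate matches: one observation per directorid
              preserve
              tempfile matches
              rename directorid unpicked
              keep unpicked
              duplicates drop
              save `matches'

              * Back to the full data; carry the tempfile path along as a variable
              * so the program called by -runby- can find it
              restore
              rename directorid picked
              gen filename = `"`matches'"'

              set seed 1234

              capture program drop one_match
              program define one_match
                  local copy = filename[1]
                  * pair every observation of this picked director with the pool
                  cross using `"`copy'"'
                  drop if picked == unpicked
                  * random order, then keep the first 1,000 pairings
                  gen double shuffle = runiform()
                  sort shuffle
                  keep in 1/1000
                  drop shuffle
                  exit
              end

              * -runby- is from SSC; it runs one_match once per picked director
              runby one_match, by(picked) status
              drop filename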
              This retains the multiple observations of picked directors, whether across years or across firms, but reduces the pool of potential unpicked matches to a data set containing each directorid only once.
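
              A short sketch for checking the result afterwards. It assumes the firm and year identifiers are named firmid and fyear, as in the example in #16. As I read the code, one_match runs once per picked director, so the 1,000 pairings kept are per picked director and are spread over all of that director's firm-year observations; the counts below would show how they are distributed.

              ```stata
              * pairings kept per picked director
              by picked, sort: gen long n_per_picked = _N
              tabulate n_per_picked

              * pairings kept per firm-year observation of a picked director
              by firmid fyear picked, sort: gen long n_per_seat = _N
              summarize n_per_seat

              * is any unpicked director repeated within a firm-year-picked cell?
              duplicates report firmid fyear picked unpicked
              ```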

