Lag Values Excluding Observation

Mansoor Afzali

Join Date: Feb 2019

Posts: 17
#16

11 Feb 2019, 13:56

Hi again. Here's the problem I am facing now. I have two data sets. One looks like #1 and has all the information about firms and directors. The other looks like the example below and is based on your code in #14. I can calculate whether the director has served in the same size tercile based on #2, but that only gives me the information relative to that specific firm. Suppose I want to know this for the following example data set. What do I do then? The data set below does not list all the seats a director held in the previous year, so I cannot determine whether the director has served in the same size tercile.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 firmid double(fyear directorid) byte selection
"001004" 2005  1147 0
"001004" 2005  2455 0
"001004" 2005  2615 0
"001004" 2005  3162 0
"001004" 2005  3162 0
"001004" 2005  3335 0
"001004" 2005  4669 0
"001004" 2005  5769 0
"001004" 2005  6098 0
"001004" 2005  6098 0
"001004" 2005  6897 0
"001004" 2005  7209 0
"001004" 2005  7683 0
"001004" 2005  7801 0
"001004" 2005  8036 0
"001004" 2005  8658 0
"001004" 2005  8803 0
"001004" 2005  9960 0
"001004" 2005 10122 0
"001004" 2005 10122 0
"001004" 2005 10339 0
"001004" 2005 13187 0
"001004" 2005 14651 0
"001004" 2005 15482 0
"001004" 2005 16567 0
"001004" 2005 16672 0
"001004" 2005 17103 0
"001004" 2005 17103 0
end

This data set is large since I have 1,000 random (unpicked) directors for every director picked.
Clyde Schechter

Join Date: Apr 2014

Posts: 30121
#17

11 Feb 2019, 14:04

So it sounds like you want to enlarge this data set to include information about directorships in the prior year. That information is in your original data set, which I'll call original_data in the code below:

Code:

rangejoin fyear -1 -1 using original_data, by(directorid)

After that, each observation in the data from #16 will be paired with all of that director's directorships in the preceding year.
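For readers following along, here is a minimal sketch of that step in context. It rests on several assumptions that are not in the thread: sample_data is a hypothetical file name for the data set shown in #16, original_data is assumed to contain firmid and fyear as well as directorid, and had_prior_seat is an illustrative variable name. rangejoin is a community-contributed command from SSC and relies on rangestat, also from SSC.

```stata
* Sketch only; file and variable names flagged below are assumptions
* One-time installation (rangejoin relies on rangestat)
ssc install rangestat
ssc install rangejoin

* sample_data is a hypothetical name for the data set shown in #16
use sample_data, clear
rangejoin fyear -1 -1 using original_data, by(directorid)

* Assumption: original_data also holds firmid and fyear, so the joined
* copies carry rangejoin's default _U suffix (firmid_U, fyear_U)
gen byte had_prior_seat = !missing(fyear_U)
list firmid fyear directorid firmid_U fyear_U had_prior_seat in 1/10
```

Observations for directors with no seat in the preceding year should remain in the data with missing values in the joined variables, which is what the indicator picks up. The same-tercile comparison would then be made between the size tercile of firmid and that of firmid_U, using whatever tercile variable exists in your own data.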
Mansoor Afzali

Join Date: Feb 2019

Posts: 17
#18

12 Feb 2019, 11:00

Thank you Clyde. I just realised that #14 results in duplicate director matches to firms in the same year. How do I assign 1,000 unique unpicked directors to every director picked? I could drop the duplicates, but that reduces the number of directors assigned.
Clyde Schechter

Join Date: Apr 2014

Posts: 30121
#19

12 Feb 2019, 12:27

The code in #8 and #14 does not introduce any duplicates that weren't in the data already. I think there was a misunderstanding back at #8. Your posts before that had included example data in which there were multiple observations per director, corresponding to different years. The code in #8 is predicated on starting from a file in which each directorid appears only once. If you start instead from a data set with multiple observations per directorid, it will retain and propagate those duplicates. So you need to first create a file that contains only distinct directorid values and then run the matching in #8 and the reorganization in #14.
Mansoor Afzali

Join Date: Feb 2019

Posts: 17
#20

12 Feb 2019, 12:46

Thank you Clyde. I found the source of the duplicates. The file I start with in #14 has no duplicates with regard to firm_id, year, and director_id. However, a director can be hired by multiple firms in the same year. That is why there are duplicate director_ids with different firm_ids. I am sorry for not pointing out that firm_id is the other dimension in the data set. Is it possible to do something like #14 with this kind of data set and avoid duplicates?

Clyde Schechter

Join Date: Apr 2014

Posts: 30121
#21

12 Feb 2019, 13:03

So it sounds like what you need is this:

Code:

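* Build a file holding each directorid exactly once: the pool of potential matches
preserve
tempfile matches
rename directorid unpicked
keep unpicked
duplicates drop
save `matches'

* Back in the full data, every observation is a picked director
restore
rename directorid picked
* Store the tempfile path in a variable so the program run by -runby- can read it
gen filename = `"`matches'"'

set seed 1234

capture program drop one_match
program define one_match
    local copy = filename[1]
    * Pair the picked director's observations with every director in the pool
    cross using `copy'
    * A director cannot be matched with themselves
    drop if picked == unpicked
    * Shuffle into random order and keep the first 1,000 observations
    gen double shuffle = runiform()
    sort shuffle
    keep in 1/1000
    drop shuffle
    exit
end

runby one_match, by(picked) status
drop filename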

This retains the multiple observations of picked directors in each year or over multiple firms, but reduces the list of potential unpicked matches to a data set containing each directorid only once.
