Dear all,
I tried googling this (and search statalist directly), and have found some useful threads, but I still can not solve this challenge, so I am posting here and would be very grateful for any help and advice.
I have a data set containing three variables:
year - self explanatory
fid - firm id by ticker on stock market
mid - manager id, simply the name of the manager
My goal is to identify which (if any) firms that are connected through managers. The research is on interlocking directorates.
Originally the data looked like this:
year fid mid1 mid2 midx..
2010 XYZ AA BB ...
2011 XYZ AA BB ...
2011 ABC CC AA ...
2013 BDE DD EE ...
After some research I found out that it would be beneficial to transform the data to this:
year fid mid
2010 XYZ AA
2010 XYZ BB
2011 XYZ AA
2011 XYZ BB
2011 ABC CC
2011 ABC AA
2013 BDE DD
2013 BDE EE
Thus, since manager AA sits on the board of firm XYZ and ABC, there is an interlock between the firms.
I found a useful post here on the forum which I have tried. The code was the following:
use "bi"
rename mid mid2
joinby fid using "bi"
list fid mid mid2, noobs sepby(mid2)
>
* reduce to connected managers list
sort mid mid2
by mid mid2: keep if _n == 1
drop if mid2 <= mid
list mid mid2, noobs sepby(mid)
save "connect"
>
* to get a complete list of all possibilities
* form all pairwise combinations of managers
use "bi", clear
keep mid
sort mid
by mid: keep if _n == 1
save "mgrs"
use "mgrs"
rename mid mid2
cross using "mgrs"
*cross is not working, get cross to work, buy a beer*
sort mid mid2
list mid mid2, noobs sepby(mid)
drop if mid2 <= mid
>
* final list
merge 1:1 mid mid2 using "connect"
gen connect = _merge == 3
sort mid mid2
list mid mid2 connect, noobs sepby(mid)
I tried this code with a small sample and received a desired result. However, when I tried with my real data set. I received the following message:
sum of expand values exceed 2,147,483,620
The dataset may not contain more than 2,147,483,620 observations.
The dataset contains 261 872 observations.
I have tried to google and look through the forum for any other alternative ways to solve my problem, but I keep banging my head against the wall with this one. There may be a rather obvious solution to this, but I am not that familiar with stata. Thus I reach out to you experts and hope someone can save me.
If anyone knows how to work around this problem of the number of observations, or have an alternative solution in order to identify interlocks, I would deeply appreciate it.
Thank you,
Kim.
I tried googling this (and search statalist directly), and have found some useful threads, but I still can not solve this challenge, so I am posting here and would be very grateful for any help and advice.
I have a data set containing three variables:
year - self explanatory
fid - firm id by ticker on stock market
mid - manager id, simply the name of the manager
My goal is to identify which (if any) firms that are connected through managers. The research is on interlocking directorates.
Originally the data looked like this:
year fid mid1 mid2 midx..
2010 XYZ AA BB ...
2011 XYZ AA BB ...
2011 ABC CC AA ...
2013 BDE DD EE ...
After some research I found out that it would be beneficial to transform the data to this:
year fid mid
2010 XYZ AA
2010 XYZ BB
2011 XYZ AA
2011 XYZ BB
2011 ABC CC
2011 ABC AA
2013 BDE DD
2013 BDE EE
Thus, since manager AA sits on the board of firm XYZ and ABC, there is an interlock between the firms.
I found a useful post here on the forum which I have tried. The code was the following:
use "bi"
rename mid mid2
joinby fid using "bi"
list fid mid mid2, noobs sepby(mid2)
>
* reduce to connected managers list
sort mid mid2
by mid mid2: keep if _n == 1
drop if mid2 <= mid
list mid mid2, noobs sepby(mid)
save "connect"
>
* to get a complete list of all possibilities
* form all pairwise combinations of managers
use "bi", clear
keep mid
sort mid
by mid: keep if _n == 1
save "mgrs"
use "mgrs"
rename mid mid2
cross using "mgrs"
*cross is not working, get cross to work, buy a beer*
sort mid mid2
list mid mid2, noobs sepby(mid)
drop if mid2 <= mid
>
* final list
merge 1:1 mid mid2 using "connect"
gen connect = _merge == 3
sort mid mid2
list mid mid2 connect, noobs sepby(mid)
I tried this code with a small sample and received a desired result. However, when I tried with my real data set. I received the following message:
sum of expand values exceed 2,147,483,620
The dataset may not contain more than 2,147,483,620 observations.
The dataset contains 261 872 observations.
I have tried to google and look through the forum for any other alternative ways to solve my problem, but I keep banging my head against the wall with this one. There may be a rather obvious solution to this, but I am not that familiar with stata. Thus I reach out to you experts and hope someone can save me.
If anyone knows how to work around this problem of the number of observations, or have an alternative solution in order to identify interlocks, I would deeply appreciate it.
Thank you,
Kim.
Comment