Dear Statalisters,
I have a big panel dataset that contains institutional owners ownership data. The basic data structure is
input year stockid managerid sharesowned ownedvalue sharesout marketcap
1980 16510 26590 10000 80000 150000 1200000
1980 16510 15345 20000 160000 150000 1200000
1980 16510 82100 1400 11200 150000 1200000
1981 16510 26590 20000 300000 150000 2250000
1981 16510 15345 21000 315000 150000 2250000
1981 16510 82100 2400 36000 150000 2250000
1980 36110 26590 1000 16000 2000000 32000000
1980 36110 15345 20000 340000 2000000 34000000
1980 36310 82100 21000 378000 152000 2736000
end
In year 1980, stock 16510 was owned by 3 different institutional managers, and last column is the total number of shares outstanding of the firm, it is always greater than the sum across managers, because stocks are held by both individual investors and institutional investors.
What I need:
Identify each pair of managers (by managerid) that owns the same stock in the same year and calculate their joint ownership .
e.g. in year 1980 managerid 26590 and 15345 both own two stocks in common: stock 16510 and 36110
each manager holds 80,000 and 160,000 IN OWNEDVALUE for stock 16510 and 16000, and 340000 for stock 36110.
So for each pair of managers owning the same stock, I want to create a record, and records their respective ownership value.
Output data (want) - not a complete data output :
year manager_1 manager_2 stockid manager_1ownedvalue manager_2ownedvalue
1980 26590 15345 16510 80000 160000
1980 26590 15345 36110 16000 340000
1980 26590 82100 16510 80000 11200
1980 15345 82100 16510 160000 11200
I do not know if using joinby is a good idea because my data is huge as explained below:
My data cover 30 years, and all public listed firms, given that each stock could have many institutional managers, so the data set is very large. I break it down into years (e.g. 1980-1983, 1984-1986, )not to freeze my stata 13. 1980-1983 data has over 470,000 rows, if joinby, with a full Cartesian product, I am afraid, the memory requirement will freeze my computer.
I hope you could suggest an efficient way of doing this (sample coding is welcome !)
Thank you,
Rochelle
I have a big panel dataset that contains institutional owners ownership data. The basic data structure is
input year stockid managerid sharesowned ownedvalue sharesout marketcap
1980 16510 26590 10000 80000 150000 1200000
1980 16510 15345 20000 160000 150000 1200000
1980 16510 82100 1400 11200 150000 1200000
1981 16510 26590 20000 300000 150000 2250000
1981 16510 15345 21000 315000 150000 2250000
1981 16510 82100 2400 36000 150000 2250000
1980 36110 26590 1000 16000 2000000 32000000
1980 36110 15345 20000 340000 2000000 34000000
1980 36310 82100 21000 378000 152000 2736000
end
In year 1980, stock 16510 was owned by 3 different institutional managers, and last column is the total number of shares outstanding of the firm, it is always greater than the sum across managers, because stocks are held by both individual investors and institutional investors.
What I need:
Identify each pair of managers (by managerid) that owns the same stock in the same year and calculate their joint ownership .
e.g. in year 1980 managerid 26590 and 15345 both own two stocks in common: stock 16510 and 36110
each manager holds 80,000 and 160,000 IN OWNEDVALUE for stock 16510 and 16000, and 340000 for stock 36110.
So for each pair of managers owning the same stock, I want to create a record, and records their respective ownership value.
Output data (want) - not a complete data output :
year manager_1 manager_2 stockid manager_1ownedvalue manager_2ownedvalue
1980 26590 15345 16510 80000 160000
1980 26590 15345 36110 16000 340000
1980 26590 82100 16510 80000 11200
1980 15345 82100 16510 160000 11200
I do not know if using joinby is a good idea because my data is huge as explained below:
My data cover 30 years, and all public listed firms, given that each stock could have many institutional managers, so the data set is very large. I break it down into years (e.g. 1980-1983, 1984-1986, )not to freeze my stata 13. 1980-1983 data has over 470,000 rows, if joinby, with a full Cartesian product, I am afraid, the memory requirement will freeze my computer.
I hope you could suggest an efficient way of doing this (sample coding is welcome !)
Thank you,
Rochelle
Comment