construct a variable that measures the joint ownership between two firms

Rochelle Zhang

Join Date: Jun 2025

Posts: 0
#1

construct a variable that measures the joint ownership between two firms

15 Mar 2015, 18:01

Dear Statalisters,
I have a big panel dataset that contains institutional owners ownership data. The basic data structure is

input year stockid managerid sharesowned ownedvalue sharesout marketcap
1980 16510 26590 10000 80000 150000 1200000
1980 16510 15345 20000 160000 150000 1200000
1980 16510 82100 1400 11200 150000 1200000
1981 16510 26590 20000 300000 150000 2250000
1981 16510 15345 21000 315000 150000 2250000
1981 16510 82100 2400 36000 150000 2250000
1980 36110 26590 1000 16000 2000000 32000000
1980 36110 15345 20000 340000 2000000 34000000
1980 36310 82100 21000 378000 152000 2736000
end

In year 1980, stock 16510 was owned by 3 different institutional managers, and last column is the total number of shares outstanding of the firm, it is always greater than the sum across managers, because stocks are held by both individual investors and institutional investors.

What I need:
Identify each pair of managers (by managerid) that owns the same stock in the same year and calculate their joint ownership .
e.g. in year 1980 managerid 26590 and 15345 both own two stocks in common: stock 16510 and 36110
each manager holds 80,000 and 160,000 IN OWNEDVALUE for stock 16510 and 16000, and 340000 for stock 36110.

So for each pair of managers owning the same stock, I want to create a record, and records their respective ownership value.

Output data (want) - not a complete data output :

year manager_1 manager_2 stockid manager_1ownedvalue manager_2ownedvalue
1980 26590 15345 16510 80000 160000
1980 26590 15345 36110 16000 340000
1980 26590 82100 16510 80000 11200
1980 15345 82100 16510 160000 11200

I do not know if using joinby is a good idea because my data is huge as explained below:

My data cover 30 years, and all public listed firms, given that each stock could have many institutional managers, so the data set is very large. I break it down into years (e.g. 1980-1983, 1984-1986, )not to freeze my stata 13. 1980-1983 data has over 470,000 rows, if joinby, with a full Cartesian product, I am afraid, the memory requirement will freeze my computer.

I hope you could suggest an efficient way of doing this (sample coding is welcome !)

Thank you,
Rochelle
Tags: None
ben earnhart

Join Date: May 2014

Posts: 1027
#2

15 Mar 2015, 18:38

How many managers per stock? To avoid a joinby, I am vaguely thinking of restructuring the data wide, with one line per stock per year, then putting it back out long. But not sure if this will really solve anything, and if you have a lot of managers, would be much worse than doing it long. You likely will need to simply break it down year by year and wait for joinby to finish up for each year.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30060
#3

15 Mar 2015, 20:52

I think Rochelle is more worried that -joinby- will exceed available memory, time issues being secondary. She already has each year in a separate file. If those are too large for -joinby-, she could put each year/stock combination in a separate file and then do the -joinby-s on those. (Actually, if each file has just one firm/year combination, it will be -cross- rather than -joinby-.)

Also, if memory is scarce, do drop the extraneous variables (in the example shown, sharesowned, sharesout, and marketcap) while doing these calculations. And be sure to -compress- each data file before you use it.

Last edited by Clyde Schechter; 15 Mar 2015, 20:57.
Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

15 Mar 2015, 21:03

This is a pretty straightforward application of the joinby command:

Code:

clear
input year stockid managerid sharesowned ownedvalue sharesout marketcap
1980 16510 26590 10000 80000 150000 1200000
1980 16510 15345 20000 160000 150000 1200000
1980 16510 82100 1400 11200 150000 1200000
1981 16510 26590 20000 300000 150000 2250000
1981 16510 15345 21000 315000 150000 2250000
1981 16510 82100 2400 36000 150000 2250000
1980 36110 26590 1000 16000 2000000 32000000
1980 36110 15345 20000 340000 2000000 34000000
1980 36310 82100 21000 378000 152000 2736000
end

* verify assumptions about the data
isid year stockid managerid, sort

keep year stockid managerid ownedvalue
rename (managerid ownedvalue) (managerid1 ownedvalue1)
tempfile main
save "`main'"

rename (managerid1 ownedvalue1) (managerid2 ownedvalue2)
joinby year stockid using "`main'"
drop if managerid1 >= managerid2

order year managerid1 managerid2 stockid ownedvalue1 ownedvalue2
sort year stockid managerid1
list, sepby(year stockid) noobs

It's hard to say what will overload your computer but 470K observations (1980-1983) is not that much and joinby does not require 470K * 470K. It only forms pairwise combinations within groups that have the same year stockid values. If it's too big, you can do it in parts. Here's a replay of the example above but only for observations in 1980:

Code:

clear
input year stockid managerid sharesowned ownedvalue sharesout marketcap
1980 16510 26590 10000 80000 150000 1200000
1980 16510 15345 20000 160000 150000 1200000
1980 16510 82100 1400 11200 150000 1200000
1981 16510 26590 20000 300000 150000 2250000
1981 16510 15345 21000 315000 150000 2250000
1981 16510 82100 2400 36000 150000 2250000
1980 36110 26590 1000 16000 2000000 32000000
1980 36110 15345 20000 340000 2000000 34000000
1980 36310 82100 21000 378000 152000 2736000
end

keep if year == 1980

* verify assumptions about the data
isid year stockid managerid, sort

keep year stockid managerid ownedvalue
rename (managerid ownedvalue) (managerid1 ownedvalue1)
tempfile main
save "`main'"

rename (managerid1 ownedvalue1) (managerid2 ownedvalue2)
joinby year stockid using "`main'"
drop if managerid1 >= managerid2

order year managerid1 managerid2 stockid ownedvalue1 ownedvalue2
sort year stockid managerid1
list, sepby(year stockid) noobs

Comment

Rochelle Zhang

Join Date: Jun 2025

Posts: 0
#5

16 Mar 2015, 08:46

@ Clyde , Thank you ! could you explain why I should be using Cross if each file has just one firm year. Do you mean if my data is saved for just 1980, but it still could have many firms?

@ Robert,Thank you ! I ran your code for my 470k observations , it took 2 minutes. I am very happy to see it took so little time.

one question:

drop if managerid1 >= managerid2

is this used to drop duplicates ? is this a common statement to use when we do a cross which is a full Cartesian ?

Best,
Rochelle
Comment
Robert Picard

Join Date: Mar 2014

Posts: 1536
#6

16 Mar 2015, 09:00

Originally posted by Rochelle Zhang View Post

one question:

drop if managerid1 >= managerid2

is this used to drop duplicates ? is this a common statement to use when we do a cross which is a full Cartesian ?

Just comment the line and see why it's needed.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30060
#7

16 Mar 2015, 10:20

OK. The key step in Robert's code is the line

Code:

joinby year stockid using "`main'"

Now, year and stockid are linking variables that -joinby- uses: it restricts the result to combinations of observations that have the same values of year and stockid. That's why you don't get a full Cartesian product of a set of 470,000 observations with itself. And I agree with Robert's advice that you should at least try doing it his way first: depending on how many different years and stockid's there are, you may not hit any memory problem at all--and if so, this is the simplest way to do it.

But if you do run up against memory limits, then you can still do it by breaking up your data set into many data sets, each representing all and only the observations on a single combination of year and stockid. (You will have to build a loop over the levels of year and stock id to then do the combining and to finally append all the results together.) If you do that, the restriction that we only combine observations having the same value of year and stockid becomes a vacuous condition: this will be true of all the observations in your data set. So for these small subsets what you want is the full Cartesian product, which you get with -cross-.

Now, you can still do it as -joinby year stockid- on these sets if you want: Stata won't complain and you will get the same result in the end. It's just that -cross-, I think, will be faster in this situation.
Comment
Rochelle Zhang

Join Date: Jun 2025

Posts: 0
#8

20 Mar 2015, 21:33

Many Thanks to Robert and Clyde !!! The code worked well. I also understand the logit now.

Best,
Rochelle
Comment

Announcement

construct a variable that measures the joint ownership between two firms

Comment

Comment

Comment

Comment

Comment

Comment

Comment