Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • construct a variable that measures the joint ownership between two firms

    Dear Statalisters,
    I have a big panel dataset that contains institutional owners ownership data. The basic data structure is

    input year stockid managerid sharesowned ownedvalue sharesout marketcap
    1980 16510 26590 10000 80000 150000 1200000
    1980 16510 15345 20000 160000 150000 1200000
    1980 16510 82100 1400 11200 150000 1200000
    1981 16510 26590 20000 300000 150000 2250000
    1981 16510 15345 21000 315000 150000 2250000
    1981 16510 82100 2400 36000 150000 2250000
    1980 36110 26590 1000 16000 2000000 32000000
    1980 36110 15345 20000 340000 2000000 34000000
    1980 36310 82100 21000 378000 152000 2736000
    end

    In year 1980, stock 16510 was owned by 3 different institutional managers, and last column is the total number of shares outstanding of the firm, it is always greater than the sum across managers, because stocks are held by both individual investors and institutional investors.

    What I need:
    Identify each pair of managers (by managerid) that owns the same stock in the same year and calculate their joint ownership .
    e.g. in year 1980 managerid 26590 and 15345 both own two stocks in common: stock 16510 and 36110
    each manager holds 80,000 and 160,000 IN OWNEDVALUE for stock 16510 and 16000, and 340000 for stock 36110.

    So for each pair of managers owning the same stock, I want to create a record, and records their respective ownership value.

    Output data (want) - not a complete data output :

    year manager_1 manager_2 stockid manager_1ownedvalue manager_2ownedvalue
    1980 26590 15345 16510 80000 160000
    1980 26590 15345 36110 16000 340000
    1980 26590 82100 16510 80000 11200
    1980 15345 82100 16510 160000 11200

    I do not know if using joinby is a good idea because my data is huge as explained below:

    My data cover 30 years, and all public listed firms, given that each stock could have many institutional managers, so the data set is very large. I break it down into years (e.g. 1980-1983, 1984-1986, )not to freeze my stata 13. 1980-1983 data has over 470,000 rows, if joinby, with a full Cartesian product, I am afraid, the memory requirement will freeze my computer.

    I hope you could suggest an efficient way of doing this (sample coding is welcome !)

    Thank you,
    Rochelle

  • #2
    How many managers per stock? To avoid a joinby, I am vaguely thinking of restructuring the data wide, with one line per stock per year, then putting it back out long. But not sure if this will really solve anything, and if you have a lot of managers, would be much worse than doing it long. You likely will need to simply break it down year by year and wait for joinby to finish up for each year.

    Comment


    • #3
      I think Rochelle is more worried that -joinby- will exceed available memory, time issues being secondary. She already has each year in a separate file. If those are too large for -joinby-, she could put each year/stock combination in a separate file and then do the -joinby-s on those. (Actually, if each file has just one firm/year combination, it will be -cross- rather than -joinby-.)

      Also, if memory is scarce, do drop the extraneous variables (in the example shown, sharesowned, sharesout, and marketcap) while doing these calculations. And be sure to -compress- each data file before you use it.
      Last edited by Clyde Schechter; 15 Mar 2015, 20:57.

      Comment


      • #4
        This is a pretty straightforward application of the joinby command:

        Code:
        clear
        input year stockid managerid sharesowned ownedvalue sharesout marketcap
        1980 16510 26590 10000 80000 150000 1200000
        1980 16510 15345 20000 160000 150000 1200000
        1980 16510 82100 1400 11200 150000 1200000
        1981 16510 26590 20000 300000 150000 2250000
        1981 16510 15345 21000 315000 150000 2250000
        1981 16510 82100 2400 36000 150000 2250000
        1980 36110 26590 1000 16000 2000000 32000000
        1980 36110 15345 20000 340000 2000000 34000000
        1980 36310 82100 21000 378000 152000 2736000
        end
        
        * verify assumptions about the data
        isid year stockid managerid, sort
        
        keep year stockid managerid ownedvalue
        rename (managerid ownedvalue) (managerid1 ownedvalue1)
        tempfile main
        save "`main'"
        
        rename (managerid1 ownedvalue1) (managerid2 ownedvalue2)
        joinby year stockid using "`main'"
        drop if managerid1 >= managerid2
        
        order year managerid1 managerid2 stockid ownedvalue1 ownedvalue2
        sort year stockid managerid1
        list, sepby(year stockid) noobs
        It's hard to say what will overload your computer but 470K observations (1980-1983) is not that much and joinby does not require 470K * 470K. It only forms pairwise combinations within groups that have the same year stockid values. If it's too big, you can do it in parts. Here's a replay of the example above but only for observations in 1980:

        Code:
        clear
        input year stockid managerid sharesowned ownedvalue sharesout marketcap
        1980 16510 26590 10000 80000 150000 1200000
        1980 16510 15345 20000 160000 150000 1200000
        1980 16510 82100 1400 11200 150000 1200000
        1981 16510 26590 20000 300000 150000 2250000
        1981 16510 15345 21000 315000 150000 2250000
        1981 16510 82100 2400 36000 150000 2250000
        1980 36110 26590 1000 16000 2000000 32000000
        1980 36110 15345 20000 340000 2000000 34000000
        1980 36310 82100 21000 378000 152000 2736000
        end
        
        keep if year == 1980
        
        * verify assumptions about the data
        isid year stockid managerid, sort
        
        keep year stockid managerid ownedvalue
        rename (managerid ownedvalue) (managerid1 ownedvalue1)
        tempfile main
        save "`main'"
        
        rename (managerid1 ownedvalue1) (managerid2 ownedvalue2)
        joinby year stockid using "`main'"
        drop if managerid1 >= managerid2
        
        order year managerid1 managerid2 stockid ownedvalue1 ownedvalue2
        sort year stockid managerid1
        list, sepby(year stockid) noobs

        Comment


        • #5
          @ Clyde , Thank you ! could you explain why I should be using Cross if each file has just one firm year. Do you mean if my data is saved for just 1980, but it still could have many firms?

          @ Robert,Thank you ! I ran your code for my 470k observations , it took 2 minutes. I am very happy to see it took so little time.

          one question:

          drop if managerid1 >= managerid2

          is this used to drop duplicates ? is this a common statement to use when we do a cross which is a full Cartesian ?

          Best,
          Rochelle

          Comment


          • #6
            Originally posted by Rochelle Zhang View Post
            one question:

            drop if managerid1 >= managerid2

            is this used to drop duplicates ? is this a common statement to use when we do a cross which is a full Cartesian ?
            Just comment the line and see why it's needed.

            Comment


            • #7
              OK. The key step in Robert's code is the line

              Code:
              joinby year stockid using "`main'"
              Now, year and stockid are linking variables that -joinby- uses: it restricts the result to combinations of observations that have the same values of year and stockid. That's why you don't get a full Cartesian product of a set of 470,000 observations with itself. And I agree with Robert's advice that you should at least try doing it his way first: depending on how many different years and stockid's there are, you may not hit any memory problem at all--and if so, this is the simplest way to do it.

              But if you do run up against memory limits, then you can still do it by breaking up your data set into many data sets, each representing all and only the observations on a single combination of year and stockid. (You will have to build a loop over the levels of year and stock id to then do the combining and to finally append all the results together.) If you do that, the restriction that we only combine observations having the same value of year and stockid becomes a vacuous condition: this will be true of all the observations in your data set. So for these small subsets what you want is the full Cartesian product, which you get with -cross-.

              Now, you can still do it as -joinby year stockid- on these sets if you want: Stata won't complain and you will get the same result in the end. It's just that -cross-, I think, will be faster in this situation.


              Comment


              • #8
                Many Thanks to Robert and Clyde !!! The code worked well. I also understand the logit now.

                Best,
                Rochelle

                Comment

                Working...
                X