Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Portfolio construction

    Hello,

    I have data of individual investors trading behavior. Below you see an abstract of my data. My data is sorted by investor and date. I have an investor identifier, I have dates when the investor sold or bought a stock. The variable "stock" indicates the amount of stocks purchased (negative value indicates a sale). Furthermore, I have a variable indicating the price of the stock and one variable indicating the volume. Additionally, I have a stock identifier (cusip) and the indicator variable "sell" which is equal to one if a sale take place.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long investor float(dates stocks price volume) str8 cusip float sell
    2290 13366     100   54.375   5437.5 "06071610" 0
    2290 13382     200       21     4200 "86707110" 0
    2290 13408     200       25     5000 "17958410" 0
    2290 13437     500       10     5000 "80850988" 0
    3206 11501    1000     2.25     2250 "02365010" 0
    3206 11967 152.486  32.7899     5000 "92190830" 0
    3206 11967 452.489 11.04999     5000 "77957Q10" 0
    3206 12404   -1000   1.0312  -1031.2 "02365010" 1
    3206 13457 257.732     19.4     5000 "08188080" 0
    6002 12058     100   18.625   1862.5 "81137170" 0
    6002 12064     200     5.25     1050 "55261810" 0
    6002 12075    1000    1.125     1125 "50221010" 0
    6002 12088    -100    23.75    -2375 "81137170" 1
    6002 12092     100    8.625    862.5 "81090510" 0
    6002 12177     100     16.5     1650 "02312710" 0
    6002 12221     200     3.25      650 "59373610" 0
    6002 12257    -100    23.75    -2375 "02312710" 1
    6002 12263     100   11.625   1162.5 "55917720" 0
    6002 12435     100    6.875    687.5 "57679810" 0
    6002 12498    -100     16.5    -1650 "55917720" 1
    6002 12528     200      5.5     1100 "55261810" 0
    6002 12667 226.655 11.02998     2500 "77064830" 0
    6002 13137    -100    11.25    -1125 "81090510" 1
    6002 13156 116.279 10.32001     1200 "77064887" 0
    6002 13276    -200    1.625     -325 "59373610" 1
    6002 13346  62.814 11.94001      750 "77064887" 0
    6002 13356     200    4.875      975 "41025210" 0
    6002 13356  53.433 36.70017     1961 "62838030" 0
    6002 13402     100    15.75     1575 "64121010" 0
    6002 13408     200   4.5625    912.5 "87990540" 0
    6002 13426     200     2.75      550 "55403410" 0
    6002 13439      50    19.75    987.5 "07251010" 0
    6002 13460     100     1.75      175 "87990540" 0
    7109 11340     100    27.25     2725 "20810810" 0
    7109 11353    -100    27.25    -2725 "20810810" 1
    7109 11396     100    22.25     2225 "00190710" 0
    7109 11416     100       24     2400 "20810810" 0
    7109 11548     100     27.5     2750 "25283630" 0
    7109 11553     100    21.75     2175 "01849010" 0
    7109 11728     100   19.625   1962.5 "59408710" 0
    7109 11926     100       33     3300 "25084710" 0
    7109 12074     100       27     2700 "87156510" 0
    7109 12087    -100     22.5    -2250 "01849010" 1
    7109 12087    -100   23.375  -2337.5 "20810810" 1
    7109 12095      50   48.375  2418.75 "38131710" 0
    7109 12134      50     65.5     3275 "10904310" 0
    7109 12137    -100   12.125  -1212.5 "25283630" 1
    7109 12163     -50   44.125 -2206.25 "38131710" 1
    7109 12179     100     28.5     2850 "53567810" 0
    7109 12228     100   20.875   2087.5 "09367610" 0
    7109 12282     100     21.5     2150 "72387710" 0
    7109 12290     100   40.875   4087.5 "50025510" 0
    7109 12319      50   36.625  1831.25 "59408710" 0
    7109 12362      50   30.875  1543.75 "25084710" 0
    7109 12362     100   26.125   2612.5 "22237210" 0
    7109 12380     100    20.75     2075 "03785710" 0
    7109 12395    -150    32.25  -4837.5 "59408710" 1
    7109 12403    -100     23.5    -2350 "00190710" 1
    7109 12409    -100   46.625  -4662.5 "50025510" 1
    7109 12418    -100    18.75    -1875 "72387710" 1
    7109 12430     100    28.75     2875 "44984210" 0
    7109 12445     100       32     3200 "00157530" 0
    7109 12472     100    27.25     2725 "80818810" 0
    7109 12472    -100    24.75    -2475 "09367610" 1
    7109 12479     100       21     2100 "86693310" 0
    7109 12514    -100   23.625  -2362.5 "22237210" 1
    7109 12514    -100   34.375  -3437.5 "00157530" 1
    7109 12514    -100    27.75    -2775 "80818810" 1
    7109 12515    -100   22.375  -2237.5 "86693310" 1
    7109 12516    -100    25.75    -2575 "03785710" 1
    7109 12562     200   13.875     2775 "86881810" 0
    7109 12575    -100     46.5    -4650 "53567810" 1
    7109 12589     -50     72.5    -3625 "10904310" 1
    7109 12607    -150   25.125 -3768.75 "25084710" 1
    7109 12627    -200   14.625    -2925 "86881810" 1
    7109 12661    -100       19    -1900 "87156510" 1
    7109 12697     150     16.5     2475 "04890310" 0
    7109 12697      50   68.375  3418.75 "10904310" 0
    7109 12697     100    20.25     2025 "77136710" 0
    7109 12738      50       51     2550 "53567810" 0
    7109 12817     -50     51.5    -2575 "53567810" 1
    7109 12907     200     18.5     3700 "86693310" 0
    7109 13031     100       44     4400 "40621610" 0
    7109 13234     100   40.375   4037.5 "53567810" 0
    7109 13269     100    26.25     2625 "03785710" 0
    7109 13319 251.678     5.96     1500 "48625010" 0
    7109 13355     200    13.75     2750 "23281510" 0
    7109 13443      50   62.875  3143.75 "12692010" 0
    7109 13450     100    37.25     3725 "88355610" 0
    7109 13464    -100       13    -1300 "44984210" 1
    7109 13467     250    8.625  2156.25 "92551410" 0
    7348 12466     100    19.25     1925 "74143W10" 0
    7348 12768     100    13.75     1375 "74143W10" 0
    8774 11947     100   29.875   2987.5 "00282410" 0
    8774 11948 211.685    11.81     2500 "31606120" 0
    9003 11372    1000     4.75     4750 "35671310" 0
    9003 11430     300    11.75     3525 "09954120" 0
    9003 11612    -300       14    -4200 "09954120" 1
    9003 12337    1000    3.125     3125 "35671310" 0
    9003 12991   -2000    5.125   -10250 "35671310" 1
    end
    format %td dates

    Now I want to create investor-stock-portfolios. My analysis examines the portfolio of stocks that an investor could sell on each day that they do sell at least one position. Therefore, I already marked all dates where an investor sold a stock using the indicator variable "sell". Now, I want to create an observation for each position held by an investor on these sell dates.
    For example, investor 6002: in line 13 he sells one stock holding. This investor currently holds 3 stocks (including the one he will now sell) and now I want to add 2 additional observations for the other two stocks that he holds for this sell date, that means for investor 6002 on this sell date I want three observations.

    I would appreciate any help!

  • #2
    Perhaps something like this?
    Code:
    fillin dates investor cusip
    replace sell = 0 if missing(sell)
    
    bys dates investor: egen _selldate = max(sell)
    keep if _fillin == 0 | (_fillin == 1 & _selldate == 1)
    
    bys investor cusip (dates): gen last_holding = sum(stocks[_n-1])
    keep if _fillin == 0 | last_holding!=0
     
    sort investor dates cusip
    drop _fillin _selldate
    Here is part of what is produced:

    Code:
    . li if investor<7000 & dates<td(01jan1994), noobs sepby(investor)
    
      +----------------------------------------------------------------------------------+
      | investor       dates    stocks      price    volume      cusip   sell   last_h~g |
      |----------------------------------------------------------------------------------|
      |     3206   28jun1991      1000       2.25      2250   02365010      0          0 |
      |     3206   06oct1992   452.489   11.04999      5000   77957Q10      0          0 |
      |     3206   06oct1992   152.486    32.7899      5000   92190830      0          0 |
      |     3206   17dec1993     -1000     1.0312   -1031.2   02365010      1       1000 |
      |     3206   17dec1993         .          .         .   77957Q10      0    452.489 |
      |     3206   17dec1993         .          .         .   92190830      0    152.486 |
      |----------------------------------------------------------------------------------|
      |     6002   05jan1993       100     18.625    1862.5   81137170      0          0 |
      |     6002   11jan1993       200       5.25      1050   55261810      0          0 |
      |     6002   22jan1993      1000      1.125      1125   50221010      0          0 |
      |     6002   04feb1993         .          .         .   50221010      0       1000 |
      |     6002   04feb1993         .          .         .   55261810      0        200 |
      |     6002   04feb1993      -100      23.75     -2375   81137170      1        100 |
      |     6002   08feb1993       100      8.625     862.5   81090510      0          0 |
      |     6002   04may1993       100       16.5      1650   02312710      0          0 |
      |     6002   17jun1993       200       3.25       650   59373610      0          0 |
      |     6002   23jul1993      -100      23.75     -2375   02312710      1        100 |
      |     6002   23jul1993         .          .         .   50221010      0       1000 |
      |     6002   23jul1993         .          .         .   55261810      0        200 |
      |     6002   23jul1993         .          .         .   59373610      0        200 |
      |     6002   23jul1993         .          .         .   81090510      0        100 |
      |     6002   29jul1993       100     11.625    1162.5   55917720      0          0 |
      +----------------------------------------------------------------------------------+
    Last edited by Hemanshu Kumar; 29 Oct 2022, 02:58.

    Comment


    • #3
      Thank you for the fast response!

      But the first code
      Code:
      fillin dates investor cusip
      already does not work. I got the error message "I/O error writing .dta file. Usually such I/O errors are caused by the disk or file system being full." I think that my dataset is too large as I have nearly 2 million observations...
      But the output you show is what I want to accomplish: Having at each sell day listed what other stocks the investor currently holds.

      Comment


      • #4
        Here's an attempt to achieve the same thing one investor at a time. Does this work without crashing Stata?

        Code:
        preserve
            clear
            tempfile portfolio_data
            save `portfolio_data', emptyok replace
        restore
        
        egen long inv_id = group(investor)
        sum inv_id, meanonly
        local num_investors = r(max)
        
        
        forval i=1/`num_investors' {
            preserve
                keep if inv_id == `i'
                fillin investor dates cusip
                replace sell = 0 if missing(sell)
        
                bys dates: egen _selldate = max(sell)
                keep if _fillin == 0 | (_fillin == 1 & _selldate == 1)
        
                bys cusip (dates): gen last_holding = sum(stocks[_n-1])
                keep if _fillin == 0 | last_holding!=0
                drop _fillin _selldate inv_id
            
                append using `portfolio_data'
                save `portfolio_data', replace
            restore
        }
        
        use `portfolio_data', clear
        sort investor dates cusip

        Comment


        • #5
          Could you maybe explain your code?
          And should the tempfile have the same name as my current dta file? I am not familiar with tempfile thereofore I am a bit confused

          Comment


          • #6
            The first block within preserve .. restore simply creates an empty temporary file. The tempfile command just tells Stata that we will use a temporary file (which gets deleted automatically when the code finishes running) and refer to it using the local macro portfolio_data. This local macro can take any legal macro name, it has nothing to do with the actual file which contains your dataset.

            Then we create an identifier for each investor in inv_id, which will have numbers like 1,2,3... this is to make it easier to loop over the investors using forval.

            Then the loop goes over basically the same code I created for your earlier, each time extracting the information for one investor, and then appending the results to our temporary file.


            Comment


            • #7
              Thank you, I understand it now!

              Could you maybe also explain the code you used in the loop? That would help a lot!

              Comment


              • #8
                Within the loop:
                • I use fillin to create the extra observations: we make sure that there is an observation for every cusip on each date (we will need this particularly for a date on which a sale occurs)
                • I use the egen function max to flag all observations for the date on which a sale occurred. We then retain only the original observations (for which _fillin is 0), or the observations for all stocks on the date on which a sale occurred.
                • Then for each cusip, I sort the observations by date, and create a cumulative sum of all stock transactions till the last date, to give us the last_holding. Finally, for the sale dates, we only retain observations for those stocks for which these holdings were non-zero.

                Comment


                • #9
                  I have a follow-up question. I know do not restrict my data to a small sample but want to create a portfolio for the entire dataset. I was therefore running the code with the whole dataset and I started the code more than 24 hours and it is still running with no end in sight. Is there any possibility to make the code more efficient so that it does not run as long?

                  Comment

                  Working...
                  X