Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Running total updating latest value by group, financial data

    Hi all,

    I have a dataset on stock short positions by country and position holder. The holders short a stock that is uniquely identified by Issuer or ISIN, but there are multiple holders for some stocks.
    I first sort the dataset using,

    Code:
    sort country issuer isin pos_date
    I want to create a new column which has a running total of short positions by country, issuer/isin.
    The rule is that for a date, the total is the position taken by the holder on that day and the sum of the latest positions taken by other holders.

    I created an example dataset below due to confidentiality and manually added the last two columns to show how the rules work - for example, Row 6 has holder C's position plus A and B's latest positions. If the same holder reports a position at the next date, then we update the total with the new value..

    The same holders A,B,C can also short other stocks X,Y etc. so the total also needs to work within the group issuer, sorted by date.
    There are different numbers of holders and issuers within a country.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str7 country str1(position_holder issuer) int isin float net_short_position str9 pos_date str13 calculation float new_column
    "Austria" "A" "X" 123 .55 "1-Jan-17"  "0.55"           .55
    "Austria" "A" "X" 123 .49 "2-Jan-17"  "0.49"           .49
    "Austria" "B" "X" 123 .52 "2-Jan-17"  "0.52+0.49"     1.01
    "Austria" "B" "X" 123 .62 "3-Jan-17"  "0.62+0.49"     1.11
    "Austria" "B" "X" 123 .71 "5-Jan-17"  "0.71+0.49"      1.2
    "Austria" "C" "X" 123  .5 "6-Jan-17"  "0.5+0.71+0.49"  1.7
    "Austria" "C" "X" 123  .6 "7-Jan-17"  "0.6+0.71+0.49"  1.8
    "Austria" "B" "X" 123 .75 "8-Jan-17"  "0.75+0.6+0.49" 1.84
    "Austria" "A" "X" 123 .55 "9-Jan-17"  "0.55+0.75+0.6"  1.9
    "Austria" "C" "X" 123  .8 "10-Jan-17" "0.8+0.55+0.75"  2.1
    "Austria" "A" "Y" 456  .5 "1-Jan-17"  "0.5"             .5
    "Austria" "B" "Y" 456 .58 "2-Jan-17"  "0.58+0.5"      1.08
    "Austria" "B" "Y" 456  .7 "3-Jan-17"  "0.7+0.5"        1.2
    "Austria" "C" "Y" 456  .6 "4-Jan-17"  "0.6+0.7+0.5"    1.8
    "Belgium" "D" "Z" 789 .48 "1-Jan-17"  "0.48"           .48
    "Belgium" "D" "Z" 789  .4 "2-Jan-17"  "0.4"             .4
    "Belgium" "E" "Z" 789 .55 "3-Jan-17"  "0.55+0.4"       .95
    "Belgium" "E" "Z" 789 .62 "4-Jan-17"  "0.62+0.4"      1.02
    "Belgium" "F" "Z" 789 .67 "5-Jan-17"  "0.67+0.62+0.4" 1.69
    "Belgium" "E" "Z" 789 .59 "6-Jan-17"  "0.59+0.67+0.4" 1.66
    end

    I am very new to Stata so any help would be appreciated,
    many thanks!
    Last edited by Prateek Jagadeesh; 12 Aug 2021, 07:48.

  • #2
    Stata's sum() is a running sum, so it does exactly what you want. First make sure to generate a Stata date so you can sort on it correctly.
    Code:
    gen date = date(pos_date,"DMY",2050)
    bys country issuer (date): gen cumul_pos = sum(net_short_position)

    Comment


    • #3
      Hi Maria thanks for the reply, I generated the date. The sum() function doesn't work here because for a given date, I need the sum of the latest short position for each holder.
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str7 country str1(position_holder issuer) int isin float net_short_position str13 calculation float(new_column pos_date cumul_pos)
      "Austria" "A" "X" 123 .55 "0.55"           .55 20820       .55
      "Austria" "A" "X" 123 .49 "0.49"           .49 20821      1.04
      "Austria" "B" "X" 123 .52 "0.52+0.49"     1.01 20821      1.56
      "Austria" "B" "X" 123 .62 "0.62+0.49"     1.11 20822      2.18
      "Austria" "B" "X" 123 .71 "0.71+0.49"      1.2 20824      2.89
      "Austria" "C" "X" 123  .5 "0.5+0.71+0.49"  1.7 20825      3.39
      "Austria" "C" "X" 123  .6 "0.6+0.71+0.49"  1.8 20826      3.99
      end
      format %td pos_date

      In the example, as of 7-Jan (last row) the column total should be the sum of A for 2-Jan, B for 5-Jan and C for 7-Jan. Is there a way to sum only the latest observation by group position_holder?
      Last edited by Prateek Jagadeesh; 12 Aug 2021, 09:32.

      Comment


      • #4
        This was an interesting challenge. Here is my solution, note that your original obs 2 in #1:
        "Austria" "A" "X" 123 .49 "2-Jan-17" "0.49" .49
        is lost because you have another holder on the same day that makes the cumulative for the day higher. If you want to keep these "intermediate" obs you would have to mark them with a dummy var to_keep and then add that to the duplcates drop line
        The trick here is to use
        Code:
        tsfill, full
        to get to what you want.
        Code:
        clear
        input str7 country str1(position_holder issuer) int isin float net_short_position str9 pos_date str13 calculation float new_column
        "Austria" "A" "X" 123 .55 "1-Jan-17"  "0.55"           .55
        "Austria" "A" "X" 123 .49 "2-Jan-17"  "0.49"           .49
        "Austria" "B" "X" 123 .52 "2-Jan-17"  "0.52+0.49"     1.01
        "Austria" "B" "X" 123 .62 "3-Jan-17"  "0.62+0.49"     1.11
        "Austria" "B" "X" 123 .71 "5-Jan-17"  "0.71+0.49"      1.2
        "Austria" "C" "X" 123  .5 "6-Jan-17"  "0.5+0.71+0.49"  1.7
        "Austria" "C" "X" 123  .6 "7-Jan-17"  "0.6+0.71+0.49"  1.8
        "Austria" "B" "X" 123 .75 "8-Jan-17"  "0.75+0.6+0.49" 1.84
        "Austria" "A" "X" 123 .55 "9-Jan-17"  "0.55+0.75+0.6"  1.9
        "Austria" "C" "X" 123  .8 "10-Jan-17" "0.8+0.55+0.75"  2.1
        "Austria" "A" "Y" 456  .5 "1-Jan-17"  "0.5"             .5
        "Austria" "B" "Y" 456 .58 "2-Jan-17"  "0.58+0.5"      1.08
        "Austria" "B" "Y" 456  .7 "3-Jan-17"  "0.7+0.5"        1.2
        "Austria" "C" "Y" 456  .6 "4-Jan-17"  "0.6+0.7+0.5"    1.8
        "Belgium" "D" "Z" 789 .48 "1-Jan-17"  "0.48"           .48
        "Belgium" "D" "Z" 789  .4 "2-Jan-17"  "0.4"             .4
        "Belgium" "E" "Z" 789 .55 "3-Jan-17"  "0.55+0.4"       .95
        "Belgium" "E" "Z" 789 .62 "4-Jan-17"  "0.62+0.4"      1.02
        "Belgium" "F" "Z" 789 .67 "5-Jan-17"  "0.67+0.62+0.4" 1.69
        "Belgium" "E" "Z" 789 .59 "6-Jan-17"  "0.59+0.67+0.4" 1.66
        end
        gen date = date(pos_date,"DMY",2050)
        format %td date
        egen group = group(country issuer position_holder)
        xtset group date
        tsfill, full
        foreach v of varlist country issuer position_holder  {
        bys group (date): replace `v' = `v'[_n-1] if `v' == ""
        }
        *
        foreach v of varlist net_short_position  {
        bys group (date): replace `v' = `v'[_n-1] if `v' == .
        }
        *
        bys country issuer date (position_holder): gen cumul_pos = sum(net_short_position)
        loc v new_column
        bys country issuer date (cumul_pos): replace `v' = `v'[_n-1] if `v' == .
        gsort country issuer date - cumul_pos
        duplicates drop country issuer date, force
        drop if cumul_pos == 0 | new_column == .

        Comment


        • #5
          That works perfectly, thank you so much Maria!

          Comment

          Working...
          X