Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Problems calculating cumulative volumes

    Dear statalisters,

    I have trouble calculating cumulative volumes. The data looks the following:
    account_number security_number Volume
    1 10 100
    1 10 100
    1 10 -250
    1 10 100
    1 11 150
    1 11 -50
    1 11 100
    1 12 300
    1 12 -300
    I want to calculate the cumulative volume for each secuity in account_number 1. The easiest way to do so is to use sum(), however, the cumulative volume should not drop below zero. The ideal cumvolume variable should look like this:
    account_number security_number Volume Cumvolume
    1 10 100 100
    1 10 100 200
    1 10 -250 0
    1 10 100 100
    1 11 150 150
    1 11 -50 100
    1 11 100 200
    1 12 300 300
    1 12 -300 0
    I tried to use the if/else command but I am not able to solve my problem.
    Does anyone know how to calculate the cumvolume such that it does not drop below zero?

    Thank you all in advance,
    Sabine

  • #2
    Welcome to Statalist, Sabine.

    This code should do what you want. The answers depend on your data being in the order you show it. If there is another, unshown variable that sorts your data within account_number and security_number into the correct order, you can replace "time" in my code with that variable.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(account_number security_number Volume)
    1 10  100
    1 10  100
    1 10 -250
    1 10  100
    1 11  150
    1 11  -50
    1 11  100
    1 12  300
    1 12 -300
    end
    generate time = _n
    generate Cumvolume = .
    by account_number security_number (time), sort: replace Cumvolume = max(0,Volume) if _n==1
    by account_number security_number (time): replace Cumvolume = max(0,Volume+Cumvolume[_n-1]) if _n>1
    drop time
    Code:
    . list, clean abbreviate(16)
    
           account_number   security_number   Volume   Cumvolume  
      1.                1                10      100         100  
      2.                1                10      100         200  
      3.                1                10     -250           0  
      4.                1                10      100         100  
      5.                1                11      150         150  
      6.                1                11      -50         100  
      7.                1                11      100         200  
      8.                1                12      300         300  
      9.                1                12     -300           0

    Comment


    • #3
      I tried, and failed, to come up with a solution that did not involve looping over observations. Since I am surrendering to looping over observations, I have chosen to do so, on the assumption that your real data set is very large, I have done it in a way that uses local macros to "point" to observations and avoids doing things using -if- qualifiers that would slow things down.

      Code:
      clear
      input account_number    security_number    Volume
      1    10    100
      1    10    100
      1    10    -250
      1    10    100
      1    11    150
      1    11    -50
      1    11    100
      1    12    300
      1    12    -300
      end
      gen long sort_order = _n // MARK THE SORT ORDER IN CASE WE NEED IT LATER
      
      egen long block_num = group(account_number security_number)
      by block_num (sort_order), sort: gen long block_size = _N
      
      count
      local n_obs = r(N)
      
      local first = 1
      gen cum_volume = .
      
      while `first' <= `n_obs' {
           local next = `first' + block_size[`first']
           scalar sum = 0
           local i = `first'
           while `i' < `next' {
              scalar sum = sum + Volume[`i']
              if sum < 0 {
                  scalar sum = 0
              }
              replace cum_volume = sum in `i'
              local ++i
          }
          local first = `next'
      }
      Added: Crossed with William's solution, which is much better.

      Comment


      • #4
        Thanks a lot to both of you! It works perfect now!

        Comment

        Working...
        X