  • Discounted cumulative sum with uneven time spaces

    Dear Stata Listers,

    I am working with a (pseudo) panel dataset of firm acquisitions, trying to calculate a recency-weighted cumulative 'experience' variable (i.e., sum of acquisitions). My problem is similar to another that was posted & answered last year (see attached link), except that the time between each observation is not evenly spaced in my dataset. Rather, I need to include the number of days since the last acquisition in the discounting equation. I've tried substituting that into the macro provided in the link, but it doesn't work. My data looks something like this:


    acquirorid  year  time  date      days  days_cum  fullacq  exp
    1           1999  1     5/5/99       0         0        1
    1           2002  2     7/24/02   1176      1176        1
    1           2003  3     1/30/03    145      1321        1
    2           1994  1     2/23/94      0         0        1
    2           1994  2     6/3/94     155       155        1
    2           1995  3     4/13/95    393       548        1
    2           1996  4     1/5/96     246       794        1
    2           1996  5     11/1/96    369      1163        1
    2           1997  6     4/1/97     160      1323        1
    2           1997  7     7/2/97     106      1429        1

    days = the number of days since the last acquisition
    days_cum = the number of days since the first acquisition
    fullacq = full acquisitions (I include this as the variable that will need to be summed)

    I need to discount each observation by the square root of the number of days between any given acquisition and the current time period. So for example, for Acquiror 1:

    exp (at time 1) = 1
    exp (at time 2) = 1 + 1/sqrt(1176)
    exp (at time 3) = 1 + 1/sqrt(145) + 1/sqrt(145+1176)

    Any help would be greatly appreciated.

    All the best,

    Joseph Harrison
    PhD Student
    Texas A&M University
    [email protected]

  • #2
    The link is just to StataCorp's website. No doubt you meant something more specific. They are just down the road from you anyway, not that that helps here.
    Last edited by Nick Cox; 08 Oct 2014, 11:58.

    • #3
      Yeah, it gave me an error as I was posting and must not have been able to pick up the link. I'll try to just add it directly here:

      http://www.stata.com/statalist/archi.../msg00533.html

      • #4
        I think you want something like this (assuming that there are never gaps in the time variable and that time always starts at 1):
        Code:
        gen exp = 1 if time == 1
        by acquirorid (time), sort: replace exp = exp[_n-1] + 1/sqrt(days_cum) if _n > 1

        • #5
          That's a scary problem computationally. See especially http://www.timberlake.co.uk/common/pdf/uk14_gould.pdf around slide 34.

          Also use a double to hold results.

          • #6
            Thanks for the feedback.

            Nick, I appreciate the comment about using "double", I'll make sure to incorporate that when I find a solution. I'm not quite sure how the slide deck helps, though?

            Clyde, unfortunately it's not that simple. Past the second time point that formula doesn't work because I'm not just adding a new discounted term to the value of my exp variable at (t-1). Rather, the current time period (t) becomes the base where exp=1, and each of the previous acquisitions needs to be discounted by the number of days since its completion before adding it to the exp variable. So it's a moving target.

            To put it another way, I essentially need to add the number of days between (t) and (t-1) to EACH of the discounting terms preceding (t). Going back to my example, notice that 145 is added to the discount for both the second and third terms. If I were to have a fourth acquisition, say, 50 days after the acquisition at t=3, I would then need to add a new "1" at the front of the equation and then add 50 to the discounting factor for each of the pre-existing terms. So I would get:

            exp (at time 4) = 1 + 1/sqrt(50) + 1/sqrt(50+145) + 1/sqrt(50+145+1176)

            For this new equation, the first term is associated with the acquisition at t=4, the second with t=3, the third with t=2, and the fourth with t=1.

            In summary, for any given time point, the first term is always "1" (for t=n) and the last term is always "1/sqrt(days_cum)" (for t=1), but anything in between changes by (t) - (t-1). Does that make sense? I'm fairly certain I'll need to use macros to solve this rather than just a 'replace' command, but it still has me stumped...
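
            The pattern described above can be written compactly. This is a sketch of the rule as stated in this post, assuming every acquisition counts as 1 (fullacq = 1), that days_1 = 0, and that days_cum is the running total of days within each acquiror; the subscripts s, k and t index acquisitions within one acquiror and are notation introduced here for illustration.

            ```latex
            \mathrm{exp}_t
              = 1 + \sum_{s=1}^{t-1} \frac{1}{\sqrt{\sum_{k=s+1}^{t} \mathrm{days}_k}}
              = 1 + \sum_{s=1}^{t-1} \frac{1}{\sqrt{\mathrm{days\_cum}_t - \mathrm{days\_cum}_s}}
            ```

            For t = 4 with days = 0, 1176, 145, 50, the three denominators are sqrt(50), sqrt(50+145) and sqrt(50+145+1176), which agrees with the expression for time 4 given above.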

            • #7
              Bill Gould's talk (enjoyed in slightly different versions by users' meetings in Boston, Aarhus and London) explains, around slide 34, that adding very small quantities to larger ones is a bad idea numerically. You should do it the other way round.
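
              To make the point concrete, here is a minimal sketch that is not taken from the talk. It stores the same running total once as a float and once as a double, then adds one small discounted term. The variable names total_f and total_d and the deliberately exaggerated starting value 1e8 are assumptions chosen only for illustration.

              ```stata
              * Illustration only: hypothetical variables, exaggerated magnitudes.
              clear
              set obs 1

              * The same large running total, stored at two precisions.
              gen float  total_f = 1e8
              gen double total_d = 1e8

              * Add a term that is tiny relative to the running total.
              replace total_f = total_f + 1/sqrt(1176)
              replace total_d = total_d + 1/sqrt(1176)

              * Show enough digits to compare the two stored results.
              format total_f total_d %20.10f
              list
              ```

              Near 1e8 adjacent float values are 8 apart, so the float copy cannot register an increment of about 0.03, while the double copy can. Summing the smallest terms first and storing the result in a double both reduce this kind of loss.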

              • #8
                I'm sorry I misunderstood the original post. But now I'm even more confused. Let's just look at acquiror 1 with times 1, 2, 3, and 4 with days 0, 1176, 145, and 50. What should the four values of exp be? I think if you show me all four of those as expressions involving the values of days, I can probably figure out some code to do it.

                • #9
                  The following code can surely be improved upon, but it will give you an idea and also what you want (I think):

                  Code:
                  clear
                  set more off
                  
                  *----- example data -----
                  
                  input ///
                  acquiror year time days days_cum fullacq
                  1 1999 1 0 0 1    
                  1 2002 2 1176 1176 1    
                  1 2003 3 145 1321 1    
                  1 2004 4 50 1371 1    
                  1 2005 5 15 1386 1    
                  end
                  
                  list, sepby(acquiror)
                  
                  *----- what you want -----
                  
                  gen exp = 0
                  gen term = 0
                  local to = 0
                  forvalues i = 3/`=_N' {
                      
                      local newcum = 0
                      
                      // computes "something" (see final section of code)
                      if `i' >= 4 {
                      
                          forvalues j = `i'/`=`i' + `to'' {
                              
                              gsort - time
                              replace term = sum(days) if time <= `j'
                              sort time
                              
                              local newcum = `newcum' + 1/sqrt(`=term[`=2*`i' - `j' - 1']')
                          }
                          
                          local to = `to' + 1
                      }
                      
                      replace exp = 1/sqrt(days) + 1/sqrt(days_cum) + `newcum' in `i'
                  }
                  
                  replace exp = 1 + exp
                  replace exp = 1 + 1/sqrt(days) in 2
                  
                  list
                  
                  // to check
                  display "1 + 1/sqrt(_n) + something + cumul"
                  display 1
                  display 1 + 1/sqrt(1176)
                  display 1 + 1/sqrt(145) + 1/sqrt(145+1176)
                  display 1 + 1/sqrt(50) + 1/sqrt(50+145) + 1/sqrt(50+145+1176)
                  display 1 + 1/sqrt(15) + 1/sqrt(15+50) + 1/sqrt(15+50+145) + 1/sqrt(15+50+145+1176)
                  It's somewhat convoluted, but that's what came up when thinking about the problem. I haven't tested speed, which I suppose can be very slow for large data sets. Mata is much much faster with loops.
                  You should:

                  1. Read the FAQ carefully.

                  2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

                  3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

                  4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.

                  • #10
                    Hi Joseph,

                    It may not be efficient but the following code might help.

                    sort acquirorid time
                    gen double exp = 1
                    qui forval n = 1/`=_N' {
                    local t = time[`n']
                    forval j = 1/`t' {
                    sum days if acquirorid == acquirorid[`n'] & inrange(time,`j',time[`n']), meanonly
                    local texp = 1/sqrt(r(sum))
                    replace exp = exp + `texp' in `n' if `j'>1
                    }
                    }

                    Abraham
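
                    A possible alternative to the loop above, offered as an untested sketch rather than a replacement: if days_cum really is the running total of days within each acquiror, the gap between any two acquisitions is just a difference of two days_cum values, so one pass per lag is enough. The names exp_alt and grp_size are hypothetical. The sketch also assumes no two acquisitions by the same acquiror fall on the same day (a zero gap would give a missing value), and it should be run from a do-file because of the /// line continuation.

                    ```stata
                    * Sketch under the assumptions stated above; exp_alt and grp_size are
                    * illustrative names, not variables from the original dataset.
                    sort acquirorid time

                    * Largest number of earlier acquisitions any acquiror has.
                    by acquirorid: gen long grp_size = _N
                    summarize grp_size, meanonly
                    local maxlag = r(max) - 1

                    * Start from zero and add the most distant (smallest) terms first,
                    * so that small quantities are summed before larger ones.
                    gen double exp_alt = 0
                    forvalues k = `maxlag'(-1)1 {
                        quietly by acquirorid: replace exp_alt = exp_alt + ///
                            fullacq[_n-`k'] / sqrt(days_cum - days_cum[_n-`k']) if _n > `k'
                    }

                    * The current acquisition enters undiscounted (the leading "1").
                    replace exp_alt = exp_alt + fullacq
                    drop grp_size
                    ```

                    With the example data, acquiror 1 at time 3 picks up 1/sqrt(1321 - 0) at lag 2 and 1/sqrt(1321 - 1176) at lag 1 before the final 1 is added, which corresponds to the expression in the original question.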

                    • #11
                      Abraham's solution seems superior (although it would read easier within code tags and indented).
                      Last edited by Roberto Ferrer; 09 Oct 2014, 15:42.
                      • #12
                        Yes, that worked! Thank you Abraham for the solution and everyone for your feedback and time! I truly appreciate it.

                        Best,
                        Joseph
