Discounted cumulative sum with uneven time spaces

Joseph Harrison

Join Date: Oct 2014
Posts: 4


08 Oct 2014, 11:48

Dear Stata Listers,

I am working with a (pseudo) panel dataset of firm acquisitions and trying to calculate a recency-weighted cumulative 'experience' variable (i.e., sum of acquisitions). My problem is similar to one that was posted and answered last year (see attached link), except that the observations in my dataset are not evenly spaced in time. Instead, I need to include the number of days since the last acquisition in the discounting equation. I've tried substituting that into the macro provided in the link, but it doesn't work. My data look something like this:

acquirorid	year	time	date	days	days_cum	fullacq	exp
1	1999	1	5/5/99	0	0	1
1	2002	2	7/24/02	1176	1176	1
1	2003	3	1/30/03	145	1321	1
2	1994	1	2/23/94	0	0	1
2	1994	2	6/3/94	155	155	1
2	1995	3	4/13/95	393	548	1
2	1996	4	1/5/96	246	794	1
2	1996	5	11/1/96	369	1163	1
2	1997	6	4/1/97	160	1323	1
2	1997	7	7/2/97	106	1429	1

days = the number of days since the last acquisition
days_cum = the number of days since the first acquisition
fullacq = full acquisitions (I include this as the variable that will need to be summed)

I need to discount each observation by the square root of the number of days between any given acquisition and the current time period. So for example, for Acquiror 1:

exp (at time 1) = 1
exp (at time 2) = 1 + 1/sqrt(1176)
exp (at time 3) = 1 + 1/sqrt(145) + 1/sqrt(145+1176)

Any help would be greatly appreciated.

All the best,

Joseph Harrison
PhD Student
Texas A&M University

http://www.stata.com


Nick Cox

Join Date: Mar 2014

Posts: 35468
#2

08 Oct 2014, 11:51

The link is just to StataCorp's website. No doubt you meant something more specific. They are just down the road from you anyway, not that that helps here.

Last edited by Nick Cox; 08 Oct 2014, 11:58.
Joseph Harrison

Join Date: Oct 2014

Posts: 4
#3

08 Oct 2014, 11:54

Yeah, it gave me an error as I was posting and must not have been able to pick up the link. I'll try to just add it directly here:

http://www.stata.com/statalist/archi.../msg00533.html
Clyde Schechter

Join Date: Apr 2014

Posts: 29969
#4

08 Oct 2014, 12:30

I think you want something like this (assuming that there are never gaps in the time variable and that time always starts at 1):

Code:

gen exp = 1 if time == 1
by acquirorid (time), sort: replace exp = exp[_n-1] + 1/sqrt(days_cum) if _n > 1
Nick Cox

Join Date: Mar 2014

Posts: 35468
#5

08 Oct 2014, 12:53

That's a scary problem computationally. See especially http://www.timberlake.co.uk/common/pdf/uk14_gould.pdf around slide 34.

Also use a double to hold results.
Joseph Harrison

Join Date: Oct 2014

Posts: 4
#6

08 Oct 2014, 21:30

Thanks for the feedback.

Nick, I appreciate the comment about using "double"; I'll make sure to incorporate that when I find a solution. I'm not quite sure how the slide deck helps, though.

Clyde, unfortunately it's not that simple. Past the second time point that formula doesn't work, because I'm not just adding a new discounted term to the value of my exp variable at (t-1). Rather, the current time period (t) becomes the base where exp=1, and each of the previous acquisitions needs to be discounted by the number of days since its completion before it is added to the exp variable. So it's a moving target.

To put it another way, I essentially need to add the number of days between (t) and (t-1) to EACH of the discounting terms preceding (t). Going back to my example, notice that 145 is added to the discount for both the second and third terms. If I were to have a fourth acquisition, say, 50 days after the acquisition at t=3, I would then need to add a new "1" at the front of the equation and then add 50 to the discounting factor for each of the pre-existing terms. So I would get:

exp (at time 4) = 1 + 1/sqrt(50) + 1/sqrt(50+145) + 1/sqrt(50+145+1176)

For this new equation, the first term is associated with the acquisition at t=4, the second with t=3, the third with t=2, and the fourth with t=1.

In summary, for any given time point, the first term is always "1" (for t=n) and the last term is always "1/sqrt(days_cum)" (for t=1), but anything in between changes by (t) - (t-1). Does that make sense? I'm fairly certain I'll need to use macros to solve this rather than just a 'replace' command, but it still has me stumped...
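
The rule described above can be written compactly. This is only a restatement of the worked examples, assuming fullacq is always 1 as in the sample data, and using D_t as shorthand (not in the original post) for days_cum at time t:

```latex
\mathrm{exp}_t \;=\; 1 + \sum_{s=1}^{t-1} \frac{1}{\sqrt{D_t - D_s}},
\qquad D_t = \text{days\_cum at time } t
```

For acquiror 1 at time 4, with D = (0, 1176, 1321, 1371), the three discounted terms are 1/sqrt(1371-1321) = 1/sqrt(50), 1/sqrt(1371-1176) = 1/sqrt(195), and 1/sqrt(1371-0) = 1/sqrt(1371), matching the expression given for time 4.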
Nick Cox

Join Date: Mar 2014

Posts: 35468
#7

09 Oct 2014, 04:52

Bill Gould's talk (enjoyed in slightly different versions by users' meetings in Boston, Aarhus and London) explains, around slide 34, that adding very small quantities to larger ones is a bad idea numerically. You should do it the other way round.
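
A small sketch of the precision point, using only built-in Stata functions. The numbers are illustrative and not taken from the slides; the last line reuses the time-4 terms from the example and simply adds the smallest term first.

```stata
* float keeps roughly 7 significant digits, so a small increment to 1 is lost
display float(1 + 1e-8) == 1
* double precision, which Stata uses for calculations, retains it
display (1 + 1e-8) == 1

* the terms of exp at time 4 in the example, smallest added first
display %18.16f 1/sqrt(50+145+1176) + 1/sqrt(50+145) + 1/sqrt(50) + 1
```

The first comparison displays 1 (true) and the second 0 (false), which is why storing the result in a double matters once many small terms accumulate.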
Clyde Schechter

Join Date: Apr 2014

Posts: 29969
#8

09 Oct 2014, 10:29

I'm sorry I misunderstood the original post. But now I'm even more confused. Let's just look at acquiror 1 with times 1, 2, 3, and 4 with days 0, 1176, 145, and 50. What should the four values of exp be? I think if you show me all four of those as expressions involving the values of days, I can probably figure out some code to do it.

Roberto Ferrer

Join Date: Apr 2014
Posts: 449

09 Oct 2014, 14:49

The following code can surely be improved upon, but it will give you an idea and also what you want (I think):

Code:

clear
set more off

*----- example data -----

input ///
acquiror year time days days_cum fullacq
1 1999 1 0 0 1    
1 2002 2 1176 1176 1    
1 2003 3 145 1321 1    
1 2004 4 50 1371 1    
1 2005 5 15 1386 1    
end

list, sepby(acquiror)

*----- what you want -----

gen exp = 0
gen term = 0
local to = 0
forvalues i = 3/`=_N' {
    
    local newcum = 0
    
    // computes "something" (see final section of code)
    if `i' >= 4 {
    
        forvalues j = `i'/`=`i' + `to'' {
            
            gsort - time
            replace term = sum(days) if time <= `j'
            sort time
            
            local newcum = `newcum' + 1/sqrt(`=term[`=2*`i' - `j' - 1']')
        }
        
        local to = `to' + 1
    }
    
    replace exp = 1/sqrt(days) + 1/sqrt(days_cum) + `newcum' in `i'
}

replace exp = 1 + exp
replace exp = 1 + 1/sqrt(days) in 2

list

// to check
display "1 + 1/sqrt(_n) + something + cumul"
display 1
display 1 + 1/sqrt(1176)
display 1 + 1/sqrt(145) + 1/sqrt(145+1176)
display 1 + 1/sqrt(50) + 1/sqrt(50+145) + 1/sqrt(50+145+1176)
display 1 + 1/sqrt(15) + 1/sqrt(15+50) + 1/sqrt(15+50+145) + 1/sqrt(15+50+145+1176)

It's somewhat convoluted, but that's what came up when thinking about the problem. I haven't tested speed, which I suppose can be very slow for large data sets. Mata is much much faster with loops.
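
As a rough illustration of the Mata remark, the double loop could be sketched in Mata as follows. The function name disc_exp and the variable exp_mata are made up for this sketch. It assumes the data are sorted by acquirorid and time, that days_cum is defined as in the original post, and that no two acquisitions by the same firm fall on the same date (otherwise a term is 1/sqrt(0), which is missing). Speed has not been measured.

```stata
clear
input acquirorid time days_cum
1 1 0
1 2 1176
1 3 1321
1 4 1371
2 1 0
2 2 155
end

sort acquirorid time

mata:
// hypothetical helper: for each row, 1 plus the discounted earlier rows of the same id
real colvector disc_exp(real colvector id, real colvector cum)
{
    real scalar i, j, n
    real colvector e

    n = rows(id)
    e = J(n, 1, 1)
    for (i = 2; i <= n; i++) {
        // walk back through earlier rows until the id changes
        for (j = i - 1; j >= 1; j--) {
            if (id[j] != id[i]) break
            e[i] = e[i] + 1/sqrt(cum[i] - cum[j])
        }
    }
    return(e)
}
end

gen double exp_mata = .
mata: st_store(., "exp_mata", disc_exp(st_data(., "acquirorid"), st_data(., "days_cum")))
list, sepby(acquirorid)
```

The discount is taken as the difference in days_cum between the current and the earlier acquisition, so no running sums need to be rebuilt for each row.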

You should:

1. Read the FAQ carefully.

2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.


Abraham Wolde-Tsadick

Join Date: Oct 2014

Posts: 61
#10

09 Oct 2014, 15:14

Hi Joseph,

It may not be efficient but the following code might help.

sort acquirorid time
gen double exp = 1
qui forval n = 1/`=_N' {
local t = time[`n']
forval j = 1/`t' {
sum days if acquirorid == acquirorid[`n'] & inrange(time,`j',time[`n']), meanonly
local texp = 1/sqrt(r(sum))
replace exp = exp + `texp' in `n' if `j'>1
}
}

Abraham
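
A possible alternative, sketched under the assumption that days_cum is correct within each acquirer and that no two acquisitions by the same firm share a date: the discount for an earlier acquisition is just the difference in days_cum, so one pass per lag is enough rather than one summarize per pair. The names seq, maxlag and exp_alt are invented for this sketch. The loop runs from the longest lag down, so the smallest terms are added first, in line with the numerical advice earlier in the thread.

```stata
clear
input acquirorid time days_cum
1 1 0
1 2 1176
1 3 1321
2 1 0
2 2 155
2 3 548
end

sort acquirorid time
by acquirorid: gen long seq = _n
summarize seq, meanonly
local maxlag = r(max) - 1

gen double exp_alt = 0
* longest lags first: these are the smallest terms
forvalues k = `maxlag'(-1)1 {
    by acquirorid: replace exp_alt = exp_alt + 1/sqrt(days_cum - days_cum[_n-`k']) if _n > `k'
}
* the undiscounted current acquisition is added last
replace exp_alt = exp_alt + 1

list, sepby(acquirorid)
```

The number of passes equals the largest number of prior acquisitions for any one firm, which is usually far smaller than the number of observations.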
Roberto Ferrer

Join Date: Apr 2014

Posts: 449
#11

09 Oct 2014, 15:30

Abraham's solution seems superior (although it would read easier within code tags and indented).

Last edited by Roberto Ferrer; 09 Oct 2014, 15:42.

Joseph Harrison

Join Date: Oct 2014

Posts: 4
#12

10 Oct 2014, 11:12

Yes, that worked! Thank you Abraham for the solution and everyone for your feedback and time! I truly appreciate it.

Best,
Joseph