Discounted cumulative sum of preceding values (unbalanced panel with many missing observations)

Anna Pak

Join Date: May 2018
Posts: 33

Discounted cumulative sum of preceding values (unbalanced panel with many missing observations)

23 May 2018, 03:04

Dear STATALIST,

I am interested in creating the discounted cumulative sum of preceding values of a variable. I have the following unbalanced panel data with many missing observations. I filled in the gaps between the available calendar years, and this is why the current data set has complete years regardless of missing observations. Year and time variables are the same. I am trying to generate the discounted cumulative sum using the variable (performance).

firm	year	time	performance
102	1995	1	13474868
102	1996	2	26222656
102	1997	3	.
102	1998	4	74088704
102	1999	5	.
102	2000	6	69155080
102	2001	7	39108876
109	2006	1	1787580.5
109	2007	2	.
109	2008	3	9137924

The equation for the discounted sum is as following:

Click image for larger version

Name: Capture.JPG
Views: 1
Size: 12.3 KB
ID: 1445406

P1 is performance of the focal firm i at time 1 (first firm-year observation). T is the given time of the focal firm i.

For example, for the 5th observation (firm 102, time 5), the discounted sum from 1995 to 1998 should be calculated as 13474868 * (1/5-1) + 26222656 * (1/5-2) + 74088704 * ( 1/5-4). Therefore, I will get the following discounted sum values based on this equation. The discounted sum of any first observation of each firm should be zero since there is no preceding value. Any missing observation will be excluded in the equation.

firm	time	discounted sum
102	1	0
102	2	13474868 * (1/2-1)
102	3	13474868 * (1/3-1) + 26222656 * (1/3-2)
102	4	13474868 * (1/4-1) + 26222656 * (1/4-2)
102	5	13474868 * (1/5-1) + 26222656 * (1/5-2) + 74088704 * ( 1/5-4)
102	6	13474868 * (1/6-1) + 26222656 * (1/6-2) + 74088704 * ( 1/6-4)
102	7	13474868 * (1/7-1) + 26222656 * (1/7-2) + 74088704 * ( 1/7-4) + 69155080 * (1/7-6)
109	1	0
109	2	1787580.5 * (1/2-1)
109	3	1787580.5 * (1/3-1)

I would like to know if there is a way to generate this variable with loops (or any simpler way).
Any suggestions or advice would be greatly appreciated. Thank you in advance for your help.

Best regards,

Anna

Last edited by Anna Pak; 23 May 2018, 03:10.

Tags: None

Robert Picard

Join Date: Mar 2014
Posts: 1536

23 May 2018, 08:45

You did not have to fill in the gaps to do this and you do not need the time variable, you can use year. The performance variable is weighted using the difference in time between the current observation and the past observation. Say you want to calculate the sum up to year 2000 and the first observation is in 1995, the weight to apply can be calculated directly from the year variable as follows:

Code:

clear
input float year
1995
1996
1997
1998
1999
2000
end

gen weight = year[_N] - year
drop if inlist(year, 1997,1999)
gen weight2 = year[_N] - year
list

and the results as the same even though weight2 was calculated after dropping 1997 and 1999

Code:

. list

     +-------------------------+
     | year   weight   weight2 |
     |-------------------------|
  1. | 1995        5         5 |
  2. | 1996        4         4 |
  3. | 1998        2         2 |
  4. | 2000        0         0 |
     +-------------------------+

You can perform the desired calculation using rangerun (from SSC). For each observation, rangerun will run the dcumsum program on a data subset that includes all observations up to the current observation (within each firm). All new variables created by dcumsum are considered results and the values for the last observations in memory at the time the dcumsum program terminates are returned.

Code:

clear all
input int(firm year) double performance
102 1995  13474868
102 1996  26222656
102 1998  74088704
102 2000  69155080
102 2001  39108876
109 2006 1787580.5
109 2008   9137924
end
xtset firm year

program dcumsum
    gen weight = (year[_N] - year)
    gen wanted = sum(performance/weight)
end
rangerun dcumsum, interval(year . 0) by(firm)

format %12.0g wanted
list, sepby(firm)

and the results

Code:

. list, sepby(firm)

     +----------------------------------------------+
     | firm   year   perform~e   weight      wanted |
     |----------------------------------------------|
  1. |  102   1995    13474868        0           0 |
  2. |  102   1996    26222656        0    13474868 |
  3. |  102   1998    74088704        0    17602950 |
  4. |  102   2000    69155080        0    46294988 |
  5. |  102   2001    39108876        0   101341656 |
     |----------------------------------------------|
  6. |  109   2006   1787580.5        0           0 |
  7. |  109   2008     9137924        0   893790.25 |
     +----------------------------------------------+

.

Here's code to spot check results for firm 102 in 2001:

Code:

gen w = (2001 - year)
gen pw = performance / w
gen xpw = sum(pw)
format %12.0g xpw
list if firm == 102

and the results

Code:

. list if firm == 102

     +-------------------------------------------------------------------------+
     | firm   year   perfor~e   weight      wanted   w         pw          xpw |
     |-------------------------------------------------------------------------|
  1. |  102   1995   13474868        0           0   6    2245811   2245811.25 |
  2. |  102   1996   26222656        0    13474868   5    5244531      7490342 |
  3. |  102   1998   74088704        0    17602950   3   2.47e+07     32186576 |
  4. |  102   2000   69155080        0    46294988   1   6.92e+07    101341656 |
  5. |  102   2001   39108876        0   101341656   0          .    101341656 |
     +-------------------------------------------------------------------------+

.

Comment

Anna Pak

Join Date: May 2018

Posts: 33
#3

23 May 2018, 15:28

Dear Robert,

Thank you so much! Your codes are very helpful.

As for the missing observations, if I do not fill in the gaps, a lot of information will be wasted (my data has so many missing observations). For example, three observations (firm 102, year 1997, 1999 & firm 109 and year 2007) are missing in the current sample data set, but the discounted sum values of such observations can still be generated for the missing years.

Thank you again for your help.

Best regards,

Anna
Comment

Announcement

Discounted cumulative sum of preceding values (unbalanced panel with many missing observations)

Comment

Comment