Depreciating values to generate "stock-variables"

Christian Rauch

Join Date: Nov 2015
Posts: 9

Depreciating values to generate "stock-variables"

07 Dec 2015, 06:52

Hi,

my name is Christian, and I am currently trying to depreciate values by years, to generate a "stock-value" out of the "flow-values" meaning values which come up each year.
This stock-value should be a multiplicated number of about 4 times (exactly: 1/0,23) of the first-appearing flow-value, and after that, be decreased each year by 15%. The flow-value should be added. If a year is missing than the decrease should be as if in this year the flow variable was 0, for the next observation available.

Even though I tried to look for previous solutions, I couldn't find a applicable solution:
http://www.stata.com/statalist/archi.../msg00220.html
http://www.statalist.org/forums/foru...-of-a-variable

The problem looks like this:

Code:

Right now:
  symbol
year
patent_amount (flow values)
other variables...

A
1985
3
...

A
1986
3
...

A
1987
5
...

A
1987
5
...

A
1988
6


B
1990
1


B
1992
2



Goal
  symbol
ayear
patent_amount (flow values)
patentstock (stock value)
other variables...

A
1985
3
=3*(1/0,23)
...

A
1986
3
=patent_stock_last observed*0,85^1+3
...

A
1987
5
=patent_stock_last observed*0,85^1+5
...

A
1987
5
=patent_stock_last observed*0,85^1+5
...

A
1988
6
...


B
1990
1
=1*(1/0,23)


B
1992
2
=patent_stock_last observed*0,85^2+2



I tried to solve this problem by myself with this code:

sort symbol ayear
quietly by symbol: gen first_year = _n
gen patentstock=0
replace patentstock=patent_amount*(1/0.23) if symbol != symbol[_n-1] & ayear != ayear[_n-1] & first_year ==1
replace patentstock = patentstock[_n-1]*0.85 + patent_amount if symbol == symbol[_n-1] & ayear-1 == ayear[_n-1]

drop patent_amount
drop first_year


However, the code doesnt work if there are several years for the same symbol, and if there is a gap of years. And I cannot figure out how to solve these issues appropriately.

I would be very happy about any suggestions on how to improve the current code

Best regards,

Christian

Tags: None

Christian Rauch

Join Date: Nov 2015

Posts: 9
#2

07 Dec 2015, 08:25

For a better insight: This is a sample of the file I am working on.
Attached Files

patent_amount.dta (1.10 MB, 1 view)
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#3

07 Dec 2015, 10:01

Your problem is incompletely specified, so it is premature to try to write code. When there are multiple observations for a given symbol in the same year, what is supposed to happen? Do we add up the flows for the year, and then apply the 0.85 factor once for the year and add in the total flow? Or do we assume that the different observations represent equally spaced events during the year and we apply a correspondingly increased factor and add in the flow at each observation? Or does some other rule apply? Similarly, if there are year gaps, then what are we to assume about what happened in the intervening years? If you answer these questions, it shouldn't be too difficult to write code.
Comment
Christian Rauch

Join Date: Nov 2015

Posts: 9
#4

07 Dec 2015, 10:29

Hey,

thanks for your reply

>> When there are multiple observations for a given symbol in the same year, what is supposed to happen?

Each symbol should have in a year the same stock-value. So for several observations in a year for a symbol the stock-variable should have the same value.

>>Do we add up the flows for the year, and then apply the 0.85 factor once for the year and add in the total flow?

The flows are already sums of the entries in the year. (If not, it may be that I made a mistake in the summing-process, I am pretty new to stata, therefore this could happen. I compounded the flows out of the application_ids, which are included in the attached sample). The flow of the year should not be decreased, just the previous stock result. So for year 1 of each symbol the stock should be = (1/0,23)*flow. For the next it should be (0,85)*the value of the previous year stock, plus the flow of the year. For the next years this process should be continued.

>>Or do we assume that the different observations represent equally spaced events during the year and we apply a correspondingly increased factor and add in the flow at each observation? Or does some other rule apply?

No, the flows are simplified as sums of the entries in the year, and happen as "event" on a yearly basis.

>>Similarly, if there are year gaps, then what are we to assume about what happened in the intervening years?

A year gap should be treated as if there is a flow of zero, thus if the flow in year 1=5 , the flow in year 2=0 and the flow in year3=2 it should be:

stock year 1= (1/0,23)*5 (=21,74)
stock year 3= stock year1*0,85*0,85+2 (=17,71)
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30095

07 Dec 2015, 11:41

OK. The trick is to first simplify the data set so that we have one and only one observation for each year-symbol combination, and the year gaps are filled in with zeros for patent_amount. Then we can just apply the simple rules, and merge the results back to the original data.

Code:

clear*
use patent_amount
// VERIFY THAT patent_amount IS CONSTANT WITHIN SYMBOL-AYEAR COMBINATIONS
by symbol ayear (patent_amount), sort: assert patent_amount[1] == patent_amount[_N]

// COLLAPSE DATA TO ONE OBSERVATION PER SYMBOL-AYEAR
collapse (first) patent_amount, by(symbol ayear)

//    NOW FILL IN ANY GAPS IN YEARS
//    SETTING patent_amount TO ZERO IN GAP YEARS
encode symbol, gen(n_symbol) // CAN'T -xtset- STRING VARIABLE--SEE NOTE
xtset n_symbol ayear
gen byte original = 1
tsfill
replace patent_amount = 0 if missing(patent_amount)
replace original = 0 if missing(original)

// AND FILL IN symbol WITH CORRESPONDING VALUE FROM n_symbol
by n_symbol (symbol), sort: replace symbol = symbol[_N]  // SEE NOTE
drop n_symbol
isid symbol ayear patent_amount

//    NOW APPLY THE ROLES FOR THE STOCK AMOUNT
by symbol (ayear), sort: gen patentstock = patent_amount/0.23 if _n == 1
by symbol (ayear): replace patentstock = ///
     0.85*patentstock[_n-1] + patent_amount if _n > 1

//    NOW MERGE THESE RESULTS BACK INTO THE ORIGINAL DATA SET
merge 1:m symbol ayear patent_amount using patent_amount, assert(master match)
assert (original == 0) == (_merge == 1)
drop if original == 0
drop _merge original

NOTE: The part of the code that fills in the gaps is somewhat complicated because the variable symbol is string rather than numeric. So we have to first create a labeled numeric version of it so that we can -xtset- the data. Then, -tsfill- creates missing values for symbol in the expanded observations, and those have to be backfilled from the original observations with the same numeric version of the symbol variable.

Comment

Xin Xiang

Join Date: Feb 2016
Posts: 1

17 Mar 2016, 16:30

Thanks you for sharing! This is really helpful!

Originally posted by Clyde Schechter View Post

Code:

clear*
use patent_amount
// VERIFY THAT patent_amount IS CONSTANT WITHIN SYMBOL-AYEAR COMBINATIONS
by symbol ayear (patent_amount), sort: assert patent_amount[1] == patent_amount[_N]

// COLLAPSE DATA TO ONE OBSERVATION PER SYMBOL-AYEAR
collapse (first) patent_amount, by(symbol ayear)

// NOW FILL IN ANY GAPS IN YEARS
// SETTING patent_amount TO ZERO IN GAP YEARS
encode symbol, gen(n_symbol) // CAN'T -xtset- STRING VARIABLE--SEE NOTE
xtset n_symbol ayear
gen byte original = 1
tsfill
replace patent_amount = 0 if missing(patent_amount)
replace original = 0 if missing(original)

// AND FILL IN symbol WITH CORRESPONDING VALUE FROM n_symbol
by n_symbol (symbol), sort: replace symbol = symbol[_N] // SEE NOTE
drop n_symbol
isid symbol ayear patent_amount

// NOW APPLY THE ROLES FOR THE STOCK AMOUNT
by symbol (ayear), sort: gen patentstock = patent_amount/0.23 if _n == 1
by symbol (ayear): replace patentstock = ///
0.85*patentstock[_n-1] + patent_amount if _n > 1

// NOW MERGE THESE RESULTS BACK INTO THE ORIGINAL DATA SET
merge 1:m symbol ayear patent_amount using patent_amount, assert(master match)
assert (original == 0) == (_merge == 1)
drop if original == 0
drop _merge original

Comment

Tobias Wendler

Join Date: Nov 2017
Posts: 51

18 Jan 2019, 06:16

Thanks for this great thread, I have a very similar problem but to our knowledge more difficult to code in STATA.
The formula we want to code in STATA 13 on Windows is the following

Click image for larger version

Name: Unbenannt.PNG
Views: 1
Size: 7.0 KB
ID: 1479451

The values for ß1 = 0.1 and ß2= 0.25, while e inidcates an exponential function. The values in brackets behind ß1 and ß2 indicate the value with which the ß-value is to be multiplied.
The data we want to generate this variable from looks as follows and we want to generate the Patent stock in each Country.

Code:

----------------------- copy starting from the next line -----------------------


	Code:
	* Example generated by -dataex-. To install: ssc install dataex
clear
input str224 Country float(year Patents)
"AT" 1979  4.962845
"AT" 1980  4.867535
"AT" 1981  4.820282
"AT" 1982  4.905275
"AT" 1983  5.342334
"AT" 1984  5.153292
"AT" 1985   5.09375
"AT" 1986  5.153292
"AT" 1987  5.521461
"AT" 1988  5.655992
"AT" 1989  5.556828
"AT" 1990  6.013715
"AT" 1991  5.945421
"AT" 1992  5.983936
"AT" 1993  5.978886
"AT" 1994  5.872118
"AT" 1995   5.81413
"AT" 1996  5.953243
"AT" 1997  5.921578
"AT" 1998  5.602119
"AT" 1999  5.802118
"AT" 2000  6.030685
"AT" 2001  6.084499
"AT" 2002  6.061457
"AT" 2003  5.913503
"AT" 2004  5.894403
"AT" 2005  5.929589
"AT" 2006  6.006353
"AT" 2007  6.276643
"AT" 2008  6.410175
"AT" 2009  6.349139
"AT" 2010  6.526495
"AT" 2011  6.249975
"AT" 2012  6.434546
"AT" 2013  6.109248
"AT" 2014  5.749393
"AT" 2015  2.397895
"AT" 2016         .
"BE" 1979  3.610918
"BE" 1980 4.4308167
"BE" 1981 4.0430512
"BE" 1982  3.713572
"BE" 1983  3.912023
"BE" 1984  4.406719
"BE" 1985 4.7004805
"BE" 1986  4.983607
"BE" 1987  4.158883
"BE" 1988  4.204693
"BE" 1989 4.6634393
"BE" 1990 4.3820267
end
------------------ copy up to and including the previous line ------------------

Our main issue is that we do not know how to efficiently code using the time indices and telling STATA to run the sigma sign for each period over the corresponding time values.

We will keep trying and tell if we found a solution, but would be very appreciative for any hints on how to solve this issue efficiently.

Thanks so much and best
Tobi & Daniel

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#8

18 Jan 2019, 10:30

The formula you show makes no mathematical sense. The right hand side contains a subscript j which neither appears on the left side, nor is it a bound to a summation index.
Comment
Tobias Wendler

Join Date: Nov 2017

Posts: 51
#9

19 Jan 2019, 01:44

Dear Clyde,
thanks for the hint, indeed the subscript j does not seem to indicate anything relevant for the coding or summing, so the j can be thought inexistent.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#10

19 Jan 2019, 10:51

Ignoring subscript j, there is the issue of applying an infinite summation to finite data. I will assume that the intent is to truncate the summation at the point where t-s is the earliest available information for country i. Also, it seems to me from a mathematical point of view that missing values of PAT really mess up this calculation--and your data set has some. The code I show will calculate a zero result for those particular observations, but more problematic, those terms are missing from the calculation of later observations for the same country. Basically once you have a missing value of Patents in a given year, the values of the sum for all subsequent years are incorrect (and I don't see any way to get around this.)

With that in mind, I think the following is correct, though if you have some previously worked results, you should check them against what I've done here.

Code:

// NEED A NUMERIC COUNTRY VARIABLE encode Country, gen(country) xtset country year // DESIRED SUMMATION IS S = 0 TO INFINITY; BUT FIND MAXIMUM INSTANTIATED // VALUE OF S IN THE DATA SET by country, sort: gen n_years = _N summ n_years, meanonly local infinity = `r(max)' - 1 // PARAMETERS local b1 = 0.1 local b2 = 0.25 /// RUN THE SUMMATION gen double kstock = 0 forvalues s = 0/`infinity' { by country (year), sort: replace kstock = kstock /// + exp(-`b1'*`s')*(1-exp(-`b2'*(`s'+1)))*L`s'.Patents /// if !missing(L`s'.Patents) }
Comment
Tobias Wendler

Join Date: Nov 2017

Posts: 51
#11

21 Jan 2019, 03:04

Dear Clyde,
thank you so much for the great solution, I did not already come up with such an efficient solution. Your code worked perfectly.
I manually checked correctness for the first periods, without any differences arising:

Code:

gen kstock_control = 0 by country (year): replace kstock_control = exp(-0.1*0)*(1-exp(-0.25*1))*Patents if _n==1 by country (year): replace kstock_control = (exp(-0.1*1)*(1-exp(-0.25*2))*Patents[1]) + (exp(-0.1*0)*(1-exp(-0.25*1))*Patents[2]) if _n==2 by country (year): replace kstock_control = (exp(-0.1*2)*(1-exp(-0.25*3))*Patents[1]) + (exp(-0.1*1)*(1-exp(-0.25*2))*Patents[2]) + (exp(-0.1*0)*(1-exp(-0.25*1))*Patents[3]) if _n==3 by country (year): replace kstock_control = (exp(-0.1*3)*(1-exp(-0.25*4))*Patents[1]) + (exp(-0.1*2)*(1-exp(-0.25*3))*Patents[2]) + (exp(-0.1*1)*(1-exp(-0.25*2))*Patents[3]) + (exp(-0.1*0)*(1-exp(-0.25*1))*Patents[4]) if _n==4

I have one question concerning the local infinity: Am I correct that this value is constantly at a certain value (in my dataset with 38 periods, at 37), indicating that in each loop 37 Lags are tried for inclusion in the summing ? I am afraid I haven`t captured something here, as you mentioned to truncate the summation at t=s, e.g. for t=1 as first datapoint only s=0 is relevant.

Concerning the concern on the missings I should probably clarify the context. The data above was already logged, while the stock generation will be done on original values. In case of patent counts we do not have a missing data problem as a value of 0 for a given country and year indicates a genuine zero, not a missing value. Thus, it is fine to include this 0 in the given period and all later periods for summation.

Thanks so much and best wishes
Tobi
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#12

21 Jan 2019, 08:49

I have one question concerning the local infinity: Am I correct that this value is constantly at a certain value (in my dataset with 38 periods, at 37), indicating that in each loop 37 Lags are tried for inclusion in the summing ? I am afraid I haven`t captured something here, as you mentioned to truncate the summation at t=s, e.g. for t=1 as first datapoint only s=0 is relevant.

You are correct that the local macro infinity has a fixed value (apparently 37 in your data). The actual truncation of the summation at t = s arises from the -if !missing(L`s'.Patents)- clause. If `s' exceeds the number of lags actually available for that particular country, L`s'.Patents will be missing value.

The local macro inifnity was just there to set some overall stopping point for the process. It is, in a sense, wasteful because on some of the countries, Stata will attempt to add more terms that don't exist, only to be thwarted by the -if !missing(L`s'.Patents)- clause. There are ways to program around this, but they make the code complicated. If this were a data set with millions of observation, the more complicated code might be worth while in terms of reduced execution time, but knowing that the data was about countries in recent years, I reasoned that the data set was small enough that the "wasted" time wouldn't be a material problem, so I went for the simpler code.
Comment

Announcement

Depreciating values to generate "stock-variables"

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment