Generating a reference price measure as a probability weighted sum of past stock prices

Artem Rusanov

Join Date: Mar 2016

Posts: 5
#1

Generating a reference price measure as a probability weighted sum of past stock prices

03 Mar 2016, 10:22

Dear Statalist members,

I am trying to create a reference price measure that is an average of past 250 days stock prices, weighted by the probabilities of a stock not being traded since the date of the corresponding price:
reference_price_t=(price_t-250*turnover_ratio_t-250*(1-turnover_ratio_t-249)*(1-turnover_ratio_t-248)*...*(1-turnover_ratio_t-1)+price_t-249*turnover_ratio_t-249*(1-turnover_ratio_t-248)*(1-turnover_ratio_t-247)*...*(1-turnover_ratio_t-1)+...+price_t-1*turnover_ratio_t-1)/k, where k is a constant that ensure price weights add up to 1 (see the formula below).

I have written the following code, although as I am just a beginner in stata, it is far from optimized. Thus I would appreciate if you could give some tips on speeding up the code or even avoiding the loops, since with my fairly big data sample it takes quite some time to generate the variable. Attached I have enclosed a data sample for example.

Code:

qui{ nois _dots 0, title(Loop running) reps(100) bys id (date): gen float volr=vol/freefloat // generating turnover ratio bys id (date): gen float rev_volr=1-volr bys id (date): gen total_volr=sum(log(rev_volr)) forvalues i=250(-1)1 { bys id (date): gen ref_`i'=price[_n-`i']*vol[_n-`i']*exp(total_volr-total_volr[_n-`i'+1]) nois _dots `i' 0 } egen ref=rowtotal(ref_250-ref_1) forvalues i=1/250 { drop ref_`i' } forvalues i=250(-1)1 { bys id (date): gen k_`i'=vol[_n-`i']*exp(total_volr-total_volr[_n-`i'+1]) nois _dots `i' 0 } egen k=rowtotal(k_250-k_1) forvalues i=1/250 { drop k_`i' } gen float refprice=ref/k // generating reference price measure drop ref k volr rev_volr total_volr bys id (date) gen float cgo=(price-refprice)/price // generating capital gains overhang proxy }

Thank you for your help in advance.

Best regards,
Artem Rusanov
Attached Files

sample.dta (214.8 KB, 1 view)
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30122
#2

03 Mar 2016, 11:02

I can think of several tings to speed this up, but first I would like to clarify a few things that seem to me like possible errors.

The commands that generate volr and rev_volr don't appear to involve the id grouping, so I don't know why you have them done with -by-. Either the -by- is unnecessary or you have left something out of those formulas. (The formula you pasted as a photo is too small to read on my computer, so I can't figure it out from there. Posting photos on this forum often works poorly.)

Next, in -gen total_volr....- you are generating a sum of logarithms of rev_volr. So either you have used a somewhat confusing name for this variable or perhaps you meant to evaluate it some other way. Is this right hand side really what you wanted and you were just being very telegraphic with the variable name?
Comment

Artem Rusanov

Join Date: Mar 2016
Posts: 5

03 Mar 2016, 12:36

Thank you for a quick reply, Clyde!

You are right regarding generating the variables volr and rev_volr, grouping function is completely unnecessary. Moreover, I appologize for not naming the variable that generates sum of logarithms of rev_volr correctly, indeed it should have been "total_log_rev_volr". Please see below the bigger version of the formula picture and the corrected code.

Click image for larger version

Name: reference price formula.png
Views: 1
Size: 11.9 KB
ID: 1329171

Code:

qui{
nois _dots 0, title(Loop running) reps(100)
gen float volr=vol/freefloat
gen float rev_volr=1-volr            
bys id (date): gen total_log_rev_volr=sum(log(rev_volr))

forvalues i=250(-1)1 {
    bys id (date): gen ref_`i'=price[_n-`i']*vol[_n-`i']*exp(total_log_rev_volr-total_log_rev_volr[_n-`i'+1])
    nois _dots `i' 0
}
egen ref=rowtotal(ref_250-ref_1)

forvalues i=1/250 {
    drop ref_`i'
}

forvalues i=250(-1)1 {
    bys id (date): gen k_`i'=vol[_n-`i']*exp(total_log_rev_volr-total_log_rev_volr[_n-`i'+1])
    nois _dots `i' 0
}
egen k=rowtotal(k_250-k_1)

forvalues i=1/250 {
    drop k_`i'
}

gen float refprice=ref/k
bys id (date): gen float cgo=(price-refprice)/price
drop ref k volr rev_volr total_volr refprice
lab var cgo "Capital Gains Overhang Measure"
}

Last edited by Artem Rusanov; 03 Mar 2016, 12:38.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30122
#4

03 Mar 2016, 13:04

The main inefficiency is that you calculate the same weight expression twice, once in the -gen ref_`i'price- loop and one in the -gen k_`i'- loop. You only need to do that once, to calculate k_`i', and then use k_`i' to calculate ref_`i'price. Since the expression you evaluate for k involves taking an exponential, it is fairly computationally intensive, so a noticeable savings should result from doing it only once instead of twice.

The -_dots- commands are a slight time-waster as well. I understand that perhaps you need ongoing reassurance that Stata has not simply gotten hung up, but writing things to the display, while not quite as slow as writing them to a disk file, is still slow compared to computation. In each case where it appears, it may well be the most time-consuming command in its loop.

Although it probably doesn't affect efficiency, I've rewritten the loop indices as i = 1/250. It's easier to understand the code that way, and I see nothing in the calculations that requires them to be done in reverse order.

Using a single -drop- command probably doesn't save much time other than the time to parse the command repeatedly (which probably isn't noticeable in only 250 iterations), but, again I think it makes the code more understandable. So what I've come up with is:

Code:

quietly { gen float volr=vol/freefloat // generating turnover ratio gen float rev_volr=1-volr bys id (date): gen total_log_rev_volr=sum(log(rev_volr)) } forvalues i = 1/250 { quietly { bys id (date): gen k_`i'=vol[_n-`i']*exp(total_log_rev_volr-total_log_rev_volr[_n-`i'+1]) by id (date): gen ref_`i' = price[_n-`i']*k_`i' } } egen ref = rowtotal(ref_*) egen k = rowtotal(k_*) gen refprice = ref/k drop ref_* k_*

There is one other possible area of improvement. Taking logs and exponentials is computationally expensive. Then again, addition is faster than multiplication. So it might be faster to calculate a running product directly, rather than summing logs and then exponentiating.

Code:

use "sample (2).dta", clear quietly { gen float volr=vol/freefloat // generating turnover ratio gen float rev_volr=1-volr bys id (date): gen product_rev_volr = 1 if _n == 1 by id (date): replace product_rev_volr= rev_volr*product_rev_volr[_n-1] if _n > 1 forvalues i = 1/250 { quietly { bys id (date): gen k_`i'=vol[_n-`i']*product_rev_volr[_n-`i'+1] by id (date): gen ref_`i' = price[_n-`i']*k_`i' } } egen ref = rowtotal(ref_*) egen k = rowtotal(k_*) gen refprice = ref/k drop ref_* k_*

My instinct is that this should be appreciably faster in a very large data set. In the sample data it runs about 10% faster with running products, but with that small data set the difference is in hundredths of a second.

Last edited by Clyde Schechter; 03 Mar 2016, 13:06. Reason: Correct error in explanation.
Comment
Artem Rusanov

Join Date: Mar 2016

Posts: 5
#5

03 Mar 2016, 13:17

Many thanks for the tips, Clyde!
Comment
Asad Rind

Join Date: Mar 2016

Posts: 55
#6

27 May 2017, 04:42

Hello. is there any way that we can further compress this code. It is taking a lot of time. I have a huge dataset of around 9 million observations. the system is running since 2 days but i have not yet got the results from above code.
Comment
Asad Rind

Join Date: Mar 2016

Posts: 55
#7

31 May 2017, 19:59

I have a question to Clyde Schechter . Why one bracket "}" is missing in the last code in #4 and if its a mistake then what is the appropriate position where we should put the missing bracket.
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30122

31 May 2017, 21:47

Good pickup! It should be:

Code:

use "sample (2).dta", clear

quietly {
    gen float volr=vol/freefloat            // generating turnover ratio
    gen float rev_volr=1-volr  
    bys id (date): gen product_rev_volr = 1 if _n == 1
    by id (date): replace product_rev_volr= rev_volr*product_rev_volr[_n-1] if _n > 1
}

forvalues i = 1/250 {
    quietly {
        bys id (date): gen k_`i'=vol[_n-`i']*product_rev_volr[_n-`i'+1]
        by id (date): gen ref_`i' = price[_n-`i']*k_`i'
    }
}
egen ref = rowtotal(ref_*)
egen k = rowtotal(k_*)
gen refprice = ref/k
drop ref_* k_

When there is no example data, I usually don't test the code, so little typos and omissions like that happen. That's one reason it's best to post example data with almost any question--you get better answers!

Comment

Nick Cox

Join Date: Mar 2014
Posts: 35734

01 Jun 2017, 04:50

Where products are concerned I would use double types.

The egen calls are just going to entail loops that you have already been through. And you don't need the 500 new variables you are going to drop any way.

Hence, I would suggest (but this is also completely untested):

Code:

use "sample (2).dta", clear

quietly {
    gen float volr = vol/freefloat   
    bys id (date): gen double product_rev_volr = 1 if _n == 1
    by id (date): replace product_rev_volr= (1 - volr) * product_rev_volr[_n-1] if _n > 1
}

gen double ref = 0 
gen double k = 0 

quietly forvalues i = 1/250 {
    by id (date): replace ref = ref + price[_n-`i'] * k_`i'
    by id (date): replace k = k + vol[_n-`i'] * product_rev_volr[_n-`i'+1]
}

gen double refprice = ref/k

Comment

Nick Cox

Join Date: Mar 2014
Posts: 35734

#10

01 Jun 2017, 06:00

Incomplete rewriting!

Code:

use "sample (2).dta", clear

quietly {
    gen float volr = vol/freefloat   
    bys id (date): gen double product_rev_volr = 1 if _n == 1
    by id (date): replace product_rev_volr= (1 - volr) * product_rev_volr[_n-1] if _n > 1
}

gen double ref = 0 
gen double k = 0 
gen double work = . 

quietly forvalues i = 1/250 {
    by id (date): replace work = vol[_n-`i'] * product_rev_volr[_n-`i'+1]
    by id (date): replace ref = ref + price[_n-`i'] * work 
    by id (date): replace k = k + work                                    
}

gen double refprice = ref/k

Comment

Asad Rind

Join Date: Mar 2016

Posts: 55
#11

01 Jun 2017, 06:25

Thanks to both Nick Cox & Clyde Schechter. Just a minor addition of "*" in the end of Clyde Schechter's code in #8. The last line of the code should be this:

drop ref_* k_*
Comment
Robert Picard

Join Date: Mar 2014

Posts: 1536
#12

01 Jun 2017, 10:12

In the "sample.dta" dataset attached in #1, you see that there are lots of missing values for vol. Once a missing value is encountered, the rest of the running product becomes missing. I don't know what the implications are when a running sum of logs is used instead.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35734
#13

01 Jun 2017, 10:21

Without wanting to play here with data (the OP is no longer driving the questions any way) I note three possibilities:

Products going missing because a term is missing might be desired behaviour.

sum() ignores missings or equivalently treats them as zero.

If you wanted a homegrown product to ignore missings then each new term could be cond(missing(whatever), 1, whatever)
Comment
Tobias Wendler

Join Date: Nov 2017

Posts: 51
#14

10 Apr 2018, 02:48

Hi everyone,

I have a similar issue with generating a variable that includes a sigma sign, as I did not find this as mathematical operator or function in STATA.
The formula I want to code for my data is :

I hope the picture is sufficiently readable. Fundamentally, the question is wheter there is a way to tell STATA the sigma sign that should be run when generating the variable from the data.

I have a Panel dataset for yearly country data that I want to generate the variable for, to have a variable "Patent stock".

Patents year Country
80 2002 "AT"
97 2003 "AT"
101 2004 "AT"
119 2005 "AT"
115 2006 "AT"
152 2007 "AT"
135 2008 "AT"
177 2009 "AT"
217 2010 "AT"
226 2011 "AT"
11 2009 "BG"
2 2010 "BG"
9 2011 "BG"
13 2012 "BG"
7 2013 "BG"
3 2014 "BG"
1 2015 "BG"

For any helpful hints on how to efficiently generate this, I would be very grateful.

Thanks and best
Tobi
Comment

Announcement