Speed up Rangestat and asrol

Ken Yang

Join Date: Dec 2018

Posts: 24
#1

26 Mar 2019, 05:35

Hi Statalists,

I have two datasets from which I need to compute two variables: the Roll (1984) liquidity measure and the 4-year rolling-window standard deviation of the residuals from a cross-sectional regression.

1) For Roll liquidity:

Roll liquidity = 2*sqrt(- covariance (price_change_t, price_change_t-1))

I have an unbalanced panel of daily stock prices (i.e. id, date, price) and use rangestat to calculate the covariance over a 21-day window. The dataset is quite big: nearly 5,300,000 rows (about 800 firms over 14 years). The command has been running for a very long time, and I cannot tell when it will finish.

Code:

encode id, gen(firm)
sort firm date
format %td date
by firm: gen obs_count=_n
xtset firm obs_count
bys firm: gen change_prc= prc - L.prc
bys firm: gen lag_change_prc=L.change_prc
drop if year(date)<2004
drop if year(date)>2017
ssc install rangestat
rangestat (cov) lag_change_prc change_prc, by(firm) interval(obs_count -20 0)
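
Once the rolling covariance exists, the Roll measure follows directly from the formula above. A minimal sketch, assuming the covariance sits in a variable called cov_dp. That name is a placeholder, not something rangestat is guaranteed to produce, so check which variable it actually created with describe.

```stata
* cov_dp is a hypothetical name for the 21-day rolling covariance of
* change_prc and lag_change_prc; substitute the variable rangestat created.
* The square root is only defined when the covariance is negative, so
* roll is left missing otherwise (missing values also fail cov_dp < 0).
gen double roll = 2 * sqrt(-cov_dp) if cov_dp < 0
```

Whether windows with a non-negative covariance should stay missing or be set to zero is a research-design choice that this sketch does not settle.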

2) For the 4-year rolling-window standard deviation of the residuals of a cross-sectional regression, I have an unbalanced panel with over 35,000 firm-year observations from 2005 to 2017.
First, I run the cross-sectional regression: reg accruals cf_1lag cf cf_1lead rev ppe
I use runby, as suggested in some earlier posts here, to get the residuals, and then I want to calculate the standard deviation of the residuals over a rolling 4-year window.
Again, asrol takes forever to produce the results.

Code:

ssc install runby

capture program drop one_regression
program define one_regression
    if _N > 10 {
        capture noisily reg accruals cf_1lag cf cf_1lead rev ppe, noconstant
        if c(rc) == 0 { // REGRESSION WENT OK
            // the residuals option is needed; plain predict returns fitted values
            predict r, residuals
        }
        else if inlist(c(rc), 2000, 2001) { // NO OR INSUFFICIENT OBSERVATIONS
            gen r = .
        }
        else { // THERE WAS AN UNEXPECTED PROBLEM
            gen comment = "Unexpected error `c(rc)'"
        }
    }
    exit
end

runby one_regression, by(year industry) status
replace r=0 if missing(r)
rename r residuals

// use asrol to obtain the standard deviation of the residuals rolling 4 years
sort firm year
bys firm: gen t=_n
tsset firm t
asrol residuals, w(year 4) s(sd) g(sd)

I cannot upload a data sample here because the running time of rangestat and asrol depends on the actual sample size.
I apologize for the long post; I combined the two questions because both are about how to speed up the running process.

Can anyone please tell me whether I did something wrong in the code? And how can I check when the commands will finish?

I really appreciate your help.

Kind regards,
Ken

Last edited by Ken Yang; 26 Mar 2019, 05:46.
Nick Cox

Join Date: Mar 2014

Posts: 35798
#2

26 Mar 2019, 05:43

rangestat doesn't emit signals about its progress. As for speeding it up, the code is visible. Any suggestions you have about improving it are welcome. I can't see that you are doing anything wrong in calling it.

asrol is (also) from SSC, as you are asked to explain. I can't speak on its behalf.
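
Since rangestat reports no progress, one rough way to gauge the running time is to time it on a small subset first. The sketch below assumes the variables from post #1 are already in memory and that firm is the encoded id (so its values run 1, 2, ...). The cutoff of 20 firms is arbitrary.

```stata
* Sketch: time rangestat on a few firms before committing to the full panel.
preserve
keep if firm <= 20            // arbitrary subset; firm is assumed to be 1, 2, ...
timer clear 1
timer on 1
rangestat (cov) lag_change_prc change_prc, by(firm) interval(obs_count -20 0)
timer off 1
timer list 1                  // elapsed seconds for the subset
restore
```

If firms have broadly similar numbers of observations, scaling the subset time by the ratio of total firms to subset firms gives a crude estimate for the full run. Treat it as an order-of-magnitude guide only.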
Comment
Ken Yang

Join Date: Dec 2018

Posts: 24
#3

26 Mar 2019, 06:07

Thank you so much for your quick reply, Nick Cox.

Can you advise whether it would be better to use rangestat instead of asrol to calculate the standard deviation of the residuals?

Code:

rangestat (sd) residuals, by(firm) interval( t -4 0)
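
One thing to check in the call above: interval(t -4 0) spans the current observation plus the four before it, which is five observations rather than four, and t counts observations rather than calendar years. A hedged sketch of a window keyed on year instead, assuming year is a numeric variable with no missing values (sd_resid and n_resid are names chosen here for illustration):

```stata
* Sketch: 4-year window on calendar year. interval(year -3 0) covers
* year-3 through year inclusive, i.e. four years.
* sd_resid and n_resid are illustrative names, not from the original post.
rangestat (sd) sd_resid = residuals (count) n_resid = residuals, ///
    by(firm) interval(year -3 0)
```

The count variable shows how many observations actually fell inside each window, which matters in an unbalanced panel where some firm-years are absent.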

Thank you.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35798
#4

26 Mar 2019, 06:52

It really shouldn't matter much, but in any case I am not a good person to ask. I don't use asrol (which I didn't write) because I do use rangestat (which I helped to write).

You may find a speed difference; there is a price to pay for rangestat's generality.
Comment
Ken Yang

Join Date: Dec 2018

Posts: 24
#5

26 Mar 2019, 12:26

Thank you so much again Nick Cox.
Comment
Attaullah Shah

Join Date: Aug 2014

Posts: 1669
#6

28 Mar 2019, 06:53

I have the following comments on your code:

Code:

bys firm: gen t=_n
tsset firm t
asrol residuals, w(year 4) s(sd) g(sd)

Why do you use tsset firm t before asrol? Earlier versions of asrol supported the time-series settings and would automatically pick up the time and panel id dimensions; in the new version you need to specify these variables explicitly in the asrol syntax. So if you need the statistics for each firm, the code is

Code:

bys firm: asrol residuals , w(year 4) s(sd) g(sd)

This is much faster than using by. Even for a bigger dataset, asrol is faster. One instance in which asrol is slow is when you have missing values in the timevar, i.e. year here. So make sure this is not the case.

For the cross-sectional regression, you can also try asreg, which is much faster. It also supports the noconstant option; you can try:

Code:

ssc install asreg
bys industry year : asreg accruals cf_1lag cf cf_1lead rev ppe, noconstant
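
To go from the regression to the rolling standard deviation in one pass, something along the following lines may work. It rests on two assumptions that should be checked against help asreg before use: that the fitted option stores residuals in a variable named _residuals, and that min() skips groups with too few observations. The name sd_resid is chosen here for illustration.

```stata
* Sketch under the assumptions stated above; confirm with -help asreg-.
* min(11) mirrors the _N > 10 condition in the runby program of post #1.
bys industry year: asreg accruals cf_1lag cf cf_1lead rev ppe, noconstant fitted min(11)

* Rolling 4-year standard deviation of those residuals within each firm.
bys firm: asrol _residuals, window(year 4) stat(sd) gen(sd_resid)
```

Unlike the code in post #1, this sketch leaves missing residuals as missing instead of replacing them with zero, so the two approaches will not give identical results where regressions were skipped.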

Regards
--------------------------------------------------
Attaullah Shah, PhD.
Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
FinTechProfessor.com
https://asdocx.com
Check out my asdoc program, which sends outputs to MS Word.
For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.
Comment
