Speeding up loop of time series regressions in mata

Timo Bartels

Join Date: Dec 2017

Posts: 2
#1

Speeding up loop of time series regressions in mata

08 Dec 2017, 04:42

Dear Statalist,
I have a code (loop) to run time series regressions for each company (Identifier: lpermno) of prices (prccq) on earnings per share (epspiq). The code looks like this:

Code:

gen alpha =. gen alpha_stderr =. gen beta =. gen beta_stderr =. levelsof lpermno, local(levels) foreach i of local levels { reg prccq epspiq if lpermno== `i',robust replace alpha = _b[_cons] if lpermno == `i' replace alpha_stderr = _se[_cons] if lpermno == `i' replace beta = _b[epspiq] if lpermno == `i' replace beta_stderr = _se[epspiq] if lpermno == `i' } gen alpha_tstat = alpha / alpha_stderr gen beta_tstat = beta / beta_stderr save alphas_per_lpermno, replace

It gives back the alpha, beta and their standard errors from the regressions in new columns of my original data.
Unfortunately my code is way too slow. To run through once it takes 1.5 to 2 hours (I have almost 700,000 observations).

Is it possible to speed this up in mata? I’ve been trying to adapt a rangestat code to my needs (I used this one from Robert Picard as a starting point: https://www.statalist.org/forums/for...tas-for-stocks) but unfortunately was unsuccessful.

Any advice is appreciated.

Thanks in advance,
Timo
Tags: None
Mike Lacy

Join Date: Apr 2014

Posts: 2416
#2

08 Dec 2017, 08:11

I don't think Mata is relevant here, and I'd suggest you move your question to the Stata forum. It's possible the slowness comes from your "replace ... if lpermno ..." qualifier, which often is slow. It's also possible that the simple effort of doing a large number of regressions (you didn't mention the number of levels of lpermno, but I'm guessing) that slows things down.

In any event, what you are doing looks to me like a task for -statsby-. It would accumulate the statistics in another file, and you can then merge that with your data file. It's possible that -rangestat- could help, too, but you might as well try something built-in first.
Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

08 Dec 2017, 08:33

The (reg) function of rangestat can perform regressions on by-groups like this but not with the robust option. However, I showed how to do this in a recent thread. Adapted to the nomenclature of this thread, this would look like (runs in less than 2 seconds on my computer):

Code:

* come on finance people, it's not that hard to create a demonstration dataset!
clear all
set seed 123
set obs 7000
gen lpermno = _n
expand 100
bysort lpermno: gen time = _n
gen prccq = runiform()
gen epspiq = runiform()


* this is a copy of the rs_reg() regression function used by rangestat,
* modified to calculate robust standard errors

mata:
// http://blog.stata.com/2016/01/12/programming-an-estimation-command-in-stata-an-ols-command-using-mata/
real rowvector reg_robust(real matrix Xall)
{
    real colvector y, b, e, e2, se
    real matrix X, XpX, XpXi, V, M
    real scalar n, k, ymean, tss, mss, r2, r2a
    
    y = Xall[.,1]                // dependent var is first column of Xall
    X = Xall[.,2::cols(Xall)]    // the remaining cols are the independent variables
    n = rows(X)                  // the number of observations
    X = X,J(n,1,1)               // add a constant
    k = cols(X)                      // number of independent variables
    
    if (n > k) {    // need more than k obs to estimate model and standard errors
    
        // compute the OLS point estimates
        XpX  = quadcross(X, X)
        XpXi = invsym(XpX)
    
        // proceed only if no variable is omitted
        if (diag0cnt(XpXi)==0) {
        

            b    = XpXi*quadcross(X, y)
        
            // compute robust standard errors
            e  = y - X*b
            e2 = e:^2
            M    = quadcross(X, e2, X)
            V    = (n/(n-k))*XpXi*M*XpXi
            se   = sqrt(diagonal(V))
        
            // r2 and adjusted r2
            ymean = mean(y)
            tss   = sum((y :- ymean) :^ 2)        // total sum of squares
            mss   = sum( (X * b :- ymean)  :^ 2)  // model sum of squares    
            r2    = mss / tss
            r2a   = 1 - (1 - r2) * (n - 1) / (n - k)
        }
    }
    
    return(rows(X), r2, r2a, b', se')
}
end

rangestat (reg_robust) prccq epspiq, interval(time . .) by(lpermno)
rename reg_robust1 reg_nobs
rename reg_robust2 reg_r2
rename reg_robust3 reg_r2a
rename reg_robust4 beta
rename reg_robust5 alpha
rename reg_robust6 beta_stderr
rename reg_robust7 alpha_stderr

Here's a spot check for company 2

Code:

. * spot check for company 2
. reg prccq epspiq if lpermno== 2,robust

Linear regression                               Number of obs     =        100
                                                F(1, 98)          =       0.24
                                                Prob > F          =     0.6237
                                                R-squared         =     0.0022
                                                Root MSE          =     .28681

------------------------------------------------------------------------------
             |               Robust
       prccq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      epspiq |   .0454931   .0924296     0.49   0.624    -.1379305    .2289167
       _cons |   .5679693   .0545472    10.41   0.000     .4597222    .6762163
------------------------------------------------------------------------------

. list in 101

     +------------------------------------------------------------------------------------------------------------------------+
     | lpermno   time      prccq     epspiq   reg_nobs     reg_r2      reg_r2a       beta       alpha   beta_st~r   alpha_s~r |
     |------------------------------------------------------------------------------------------------------------------------|
101. |       2      1   .7755345   .7378782        100   .0022436   -.00793759   .0454931   .56796925   .09242964   .05454715 |
     +------------------------------------------------------------------------------------------------------------------------+

Comment

Timo Bartels

Join Date: Dec 2017

Posts: 2
#4

08 Dec 2017, 11:31

Thank you for your prompt responses. Robert Picard the rangestat code you provided works perfectly well. It now takes about 10 seconds to do the regressions for my complete dataset, which is an improvement I did not expect but I am very grateful for. As far as I could check until now the regressions are calculated correctly. Thank you very much!
Comment
Robert Picard

Join Date: Mar 2014

Posts: 1536
#5

08 Dec 2017, 11:35

Great! For completeness, I'll add that you can use runby (from SSC) to do the same calculations. runby is a general tool to run Stata code on by-groups of observations and is more efficient than statsby. It will be a bit slower than the rangestat solution above due to the extra overhead related to using regress. The code below repeats the same as above and runs in about 35 seconds on my computer.

Code:

* come on finance people, it's not that hard to create a demonstration dataset! clear all set seed 123 set obs 7000 gen lpermno = _n expand 100 bysort lpermno: gen time = _n gen prccq = runiform() gen epspiq = runiform() program my_reg reg prccq epspiq, robust gen alpha = _b[_cons] gen alpha_stderr = _se[_cons] gen beta = _b[epspiq] gen beta_stderr = _se[epspiq] end runby my_reg, by(lpermno) status
Comment
Happy Glider

Join Date: Nov 2017

Posts: 6
#6

02 Mar 2018, 21:03

Hi again,

if I use the above mentioned code twice in a Do-file (I acutally change it to run regressions on different variables and also change the names of the variables containing the results), I receive the following error message:

"reg_robust() already exists
(41 lines skipped)".

The command "clear" which I usually used does not seem to do the trick here. Only restarting Stata worked for me until now.

What can I do to be able to run the rangestat code twice in a Do-file, without having to restart Stata?

Thanks in advance.
Best,
Timo
Comment
Robert Picard

Join Date: Mar 2014

Posts: 1536
#7

03 Mar 2018, 08:09

Hint: the example starts with

Code:

clear all
Comment

Announcement

Speeding up loop of time series regressions in mata

Comment

Comment

Comment

Comment

Comment

Comment