Getting the variance of residuals in a rolling regression

Andre Sacras

Join Date: Apr 2015

Posts: 2
#1

07 Apr 2015, 19:37

Hello! I have a panel dataset with around 10000 companies and I want to perform rolling regressions while obtaining the variance of the residuals for each regression performed. I can't use the rolling command and e(rmse) as the regressions suffer from heteroscedasticity and as such rmse is not the standard deviation of the residuals.

I would like to do something like this:

First window:
Reg Y X
Calculate residuals
Compute Standard Deviation of residuals
Store result in "new variable" 1st cell

Second window:
Reg Y X
Calculate residuals
Compute Standard Deviation of residuals
Store result in "new variable" 2nd cell

And keep going like this.

I probably need to write this in code, but unfortunately I have never coded in Stata before.

Please help either by providing an alternative way of doing this (as it seems rolling cannot) or by helping me code it.

Thank you in advance!

Last edited by Andre Sacras; 07 Apr 2015, 19:58.
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#2

07 Apr 2015, 20:13

I think you can still use -rolling-, you just have to write a program that wraps your regression and residual calculations and returns the standard deviation of residuals. So it would be something like this:

Code:

capture program drop my_regress
program define my_regress, rclass
    syntax varlist [if]
    regress `varlist' `if'
    tempvar resid
    predict `resid' if e(sample), resid
    summ `resid'
    return scalar sdr = r(sd)
    exit
end

And then you can invoke that with something like:

Code:

rolling sd_resid = r(sdr) ...: my_regress Y X

You would, of course, replace the ... with whatever -rolling- syntax you would have used otherwise. This my_regress program is a bare-bones version that will just minimally accomplish what you set out in your post. If there is more to your problem than you describe, then you may need to embellish my_regress to accomplish that.

I think this approach will work, and it would be simpler than attempting to hand-code the management of the rolling windows that -rolling- does for you.
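
For illustration only, here is a self-contained sketch of how the pieces might fit together on simulated data. The panel identifier id, the time variable t, the 12-period window, and the simulated Y and X are all assumptions made for this example, not details from the original question; the -rolling- options you need will depend on your own data.

```stata
* Simulated panel (assumption): 3 firms, 24 periods each
clear
set seed 12345
set obs 3
gen id = _n
expand 24
bysort id: gen t = _n
gen X = rnormal()
gen Y = 1 + 2*X + rnormal()
xtset id t

* Wrapper: regress, then return the SD of the in-sample residuals
capture program drop my_regress
program define my_regress, rclass
    syntax varlist [if]
    regress `varlist' `if'
    tempvar resid
    predict `resid' if e(sample), resid
    summ `resid'
    return scalar sdr = r(sd)
end

* window(12) is an assumed window length; -clear- replaces the data in
* memory with one observation per panel and window
rolling sd_resid = r(sdr), window(12) clear: my_regress Y X
list in 1/5
```

After this runs, the data in memory should hold the panel identifier, the start and end of each window, and sd_resid.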
Andre Sacras

Join Date: Apr 2015

Posts: 2
#3

25 Apr 2015, 06:52

Thank you very much Clyde Schechter! Worked like a charm!
Pawel Mac

Join Date: Jun 2015

Posts: 1
#4

12 Jun 2015, 10:57

Hello,

I had the same problem, but the above solution did not work for me: I waited several hours and got only a tiny percentage of the results, from over 250,000 observations.

I work on a database with monthly observations of Dow Jones stock performance, consisting of ID, date in months, and return.

My aim is to obtain, for every observation, a 12-month variance of monthly returns. As already stated, the following command is too slow:

rolling r(return), window(12) clear: summarize return

I would really appreciate any help, because I have not been able to find a solution for the last three days.
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#5

12 Jun 2015, 11:21

You may find the thread at http://www.statalist.org/forums/foru...faster-program helpful here, especially #4. It deals with -statsby- instead of -rolling-, but the overall approach would be the same. -rolling- is just a wrapper command that repeatedly applies -if- conditions to your command and then posts the results to another file (on disk).

You first have to strongly balance the data set with -tsfill, full-. Then you can set up your various windows for summarize using -in- rather than -if- qualifiers. Qualifying with -in- is much faster, O(1), than with -if-, O(N), and -rolling- uses -if- repeatedly. The key is just calculating the right values to put in the -in- condition. It's just some simple algebra, because every panel now has the same number of observations. Also, by setting up variables in the original data set to receive the results and using -replace- to update the appropriate observation (again with an -in- condition), you avoid a lot of time spent writing things to a disk file. The overall speedup should be quite appreciable.
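
A minimal sketch of that idea, on simulated data, might look like the following. The variable names id, mdate, and ret, the result variable var12, and the 5-panel, 36-month layout are assumptions made for the example; only the 12-month window comes from the question in #4.

```stata
* Simulated monthly panel (assumption): 5 stocks, 36 months each
clear
set seed 2015
set obs 5
gen id = _n
expand 36
bysort id: gen mdate = tm(2010m1) + _n - 1
format mdate %tm
gen ret = rnormal(0.01, 0.05)

xtset id mdate
tsfill, full              // strongly balance: same months in every panel
sort id mdate

local window 12
quietly summarize mdate, meanonly
local T = r(max) - r(min) + 1     // observations per panel after balancing
local npanels = _N / `T'

gen var12 = .
forvalues p = 1/`npanels' {
    local base = (`p' - 1) * `T'  // observations before panel p
    forvalues t = `window'/`T' {
        local last  = `base' + `t'
        local first = `last' - `window' + 1
        quietly summarize ret in `first'/`last'
        * store the variance only when the window has no missing returns
        if r(N) == `window' {
            quietly replace var12 = r(Var) in `last'
        }
    }
}
```

Because every panel occupies a block of exactly `T' consecutive observations, the first and last observation numbers of each window follow from the panel index and the position within the panel, so no -if- scan of the whole data set is needed.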
Janys Ung

Join Date: Dec 2016

Posts: 35
#6

05 Jan 2017, 07:02

Hi Clyde Schechter,

I am new to Stata and would like to store the residuals after each rolling regression. I found that your code could be useful to me. However, I am confused by the following code. Could you please explain it further? What should I include in the -if- condition?
syntax varlist [if]
regress `varlist' `if'

I would greatly appreciate your help.
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#7

05 Jan 2017, 09:17

What should I include in the -if- condition?

Nothing.

When you run -rolling-, the code in -rolling- supplies an -if- condition when it calls my_regress, the -if- condition being one that identifies the observations to be included in its current iteration. This is done for you automatically, and you don't have to specify anything there. You just need to have the -[if]- and `if' parts present in the syntax in my_regress so that -rolling- can use it properly.
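
To see the mechanism in isolation, here is a small sketch. The program name show_if is hypothetical and exists only for this demonstration; it prints whatever -syntax- placed in the local macros and counts the observations that the condition selects.

```stata
capture program drop show_if
program define show_if, rclass
    syntax varlist [if]
    * -syntax- stores the caller's qualifier, including the word "if",
    * in the local macro `if'; it is empty when no qualifier was given
    display as text "varlist received: `varlist'"
    display as text "if received:      `if'"
    count `if'
    return scalar n = r(N)
end

sysuse auto, clear
show_if price mpg if foreign == 1    // qualifier supplied by the caller
show_if price mpg                    // no qualifier: `if' is empty
```

When -rolling- calls my_regress, it plays the role of the caller in the first invocation, supplying a condition that selects the current window.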
Janys Ung

Join Date: Dec 2016

Posts: 35
#8

05 Jan 2017, 11:25

Hi Clyde Schechter,

Many thanks for your reply.

What should I do if I want to store the last residual from each rolling window? For example:

First window:
Period: 1966m1 to 1970m12
Regress Y on X
Store the last residual in 1970m12

Second window:
Period: 1966m2 to 1971m1
Regress Y on X
Store the last residual in 1971m1

I think I need to alter the code you provided, but I don't know how to do it. I would greatly appreciate it if you could help me further.
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#9

05 Jan 2017, 11:56

Since you tacked your post in #6 onto an existing thread, I assumed in #7 that you had exactly the same problem and requirements as those of the original question in #1 and were just asking for an explanation of how the code works. Now it appears you have something different, and -rolling- may not be an appropriate approach to this.

What is your data structure? If you have a single time series, this can be done. If you have panel data, then there is a "last residual" in each panel, and it would be impractical, at best, to do this using -rolling-; a different approach is required. Please elaborate on your problem, and include a small example of your data using the -dataex- command. (Run -ssc install dataex- to get it; -help dataex- will give you instructions for using it.)
Janys Ung

Join Date: Dec 2016

Posts: 35
#10

05 Jan 2017, 14:03

Hi Clyde Schechter,

Many thanks for your reply.

Following is a small example of my data. I have three independent variables (x1, x2, and x3) and the dependent variable, y. I would like to run a rolling window regression with a window size of 60 months and store only the residual from the last observation in each window (not all 60 residuals in each window). For example, in my first window (Jan 1966 - Dec 1970), I would like to store only the residual in Dec 1970 (last observation in the first window). For the second window (Feb 1966 - Jan 1971), I would like to store only the residual in Jan 1971 (last observation in the second window). The same process continues up to Dec 2014.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str8 date float(y x1 x2 x3)
"Jan-1966" -.018 .096 .169 .065
"Feb-1966" -.022 .096  .12 .055
"Mar-1966"   .02 .097 .099 .069
"Apr-1966" -.056 .094 .086 .066
"May-1966" -.016 .096 .058 .055
"Jun-1966" -.014 .092 .097 .077
"Jul-1966" -.081 .088  .09 .066
"Aug-1966" -.007 .084 .081 .059
"Sep-1966"  .046 .092  .05 .054
"Oct-1966"  .003 .088 .043 .013
end

Clyde Schechter

Join Date: Apr 2014
Posts: 30095

#11

05 Jan 2017, 15:31

OK. While you could mark up program my_regress and use it with -rolling- to do this, I think that it's easier to just do it in a simple loop.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str8 date float(y x1 x2 x3)
"Jan-1966" -.018 .096 .169 .065
"Feb-1966" -.022 .096  .12 .055
"Mar-1966"   .02 .097 .099 .069
"Apr-1966" -.056 .094 .086 .066
"May-1966" -.016 .096 .058 .055
"Jun-1966" -.014 .092 .097 .077
"Jul-1966" -.081 .088  .09 .066
"Aug-1966" -.007 .084 .081 .059
"Sep-1966"  .046 .092  .05 .054
"Oct-1966"  .003 .088 .043 .013
end
gen mdate = monthly(date, "MY")
format mdate %tm

isid mdate, sort
gen last_residual = .
tempvar holding

local window 4
summ mdate, meanonly
local first_date = r(min) 
local last_date = r(max) - `window' + 1

forvalues m = `first_date'/`last_date' {
    regress y x1 x2 x3 if inrange(mdate, `m', `m'+`window'-1)
    predict `holding', resid
    replace last_residual = `holding' if mdate == `m' + `window' - 1
    drop `holding'
}

Notes:

1. Dates represented as strings are nearly useless in Stata. So I created a Stata internal format numerical monthly date to work with.
2. This code uses a window of 4 months, just for demonstration. Change 4 to 60 when running with your real data.
3. This code assumes that there is only one observation for each month, and verifies this assumption with the -isid- command.


Janys Ung

Join Date: Dec 2016

Posts: 35
#12

05 Jan 2017, 16:14

Thank you very much! I got my problem solved!
Jimmy Kuo

Join Date: Apr 2017

Posts: 1
#13

16 Apr 2017, 00:26

Hello Clyde,

I am using your my_regress code, but my variable X is the one-quarter lag of Y, L1.Y. The program returns the error "factor variables and time-series operators not allowed
an error occurred when rolling executed my_regress". If I use [_n-1] instead, it says "weights not allowed".

How can I fix this?

Many thanks for your help
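
One likely cause, offered as a sketch rather than a confirmed fix from this thread: -syntax varlist- rejects time-series operators unless the varlist specification includes the ts option, and square brackets after a command are parsed as a weight specification, which would explain the second message. The program name my_regress_ts, the simulated quarterly series, and the 20-quarter window below are assumptions made for the example.

```stata
capture program drop my_regress_ts
program define my_regress_ts, rclass
    * (ts) lets the varlist contain operators such as L1.Y
    syntax varlist(ts) [if]
    regress `varlist' `if'
    tempvar resid
    predict double `resid' if e(sample), resid
    summarize `resid'
    return scalar sdr = r(sd)
end

* Simulated quarterly series (assumption)
clear
set seed 2017
set obs 40
gen qdate = tq(2005q1) + _n - 1
format qdate %tq
tsset qdate
gen Y = rnormal()

rolling sd_resid = r(sdr), window(20) clear: my_regress_ts Y L1.Y
```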