rolling and recursive regressions while storing fitted and residual values

Mike Kraft

Join Date: Dec 2014

Posts: 328
#1


10 Jun 2016, 07:40

Dear all
I am looking for two alternative ways to calculate fitted values and residuals (the dependent variable is ret and the independent variables are x, y and z) from:
1- a rolling 5-year regression:
Here I want to run a regression using data from the most recent 5 years and calculate the fitted value and residual, then move one year forward, drop the earliest year, and calculate the fitted value and residual again. I understand that in this case I will only be able to calculate fitted values and residuals from the fifth year onward.

In this option, can I do something like:
rolling _b[constant] _b[x] _b[y] _b[z] , window(5) : reg ret x y z
gen fitted=_b[_cons]+_b[x]+_b[y]+_b[z]
gen residual= ret-fitted

2- a recursive regression that adds one year each time
Here I want to run a regression using data from the most recent 5 years, calculate the fitted and residual values, then move one year forward WITHOUT dropping a year, i.e. the regression will subsequently be estimated on 6 years, then 7 years, then 8 years, and so on. I also want to calculate the fitted and residual values each time the regression is estimated.

In the second option, can I do something like:

rolling _b[constant] _b[x] _b[y] _b[z] , window(5) recursive : reg ret x y z
gen fitted=_b[_cons]+_b[x]+_b[y]+_b[z]
gen residual= ret-fitted

Thanks in advance
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

10 Jun 2016, 08:04

What is your question?

By the way, I think you mean

Code:

gen fitted = _b[_cons] + _b[x]*x + _b[y]*y + _b[z]*z
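
For a single regression, the same linear combination is what -predict- returns, so it need not be typed out term by term. The sketch below uses Stata's shipped auto dataset and its variables (price, mpg, weight) purely for illustration; they are not the poster's data. It only covers one regression, not the rolling case.

```stata
* Illustration on the auto data (an assumption, not the poster's data)
sysuse auto, clear
regress price mpg weight

* Fitted values and residuals straight from -predict-
predict double fitted, xb
predict double residual, residuals

* The same fitted values built by hand from the coefficients
gen double by_hand = _b[_cons] + _b[mpg]*mpg + _b[weight]*weight

* The two versions should agree up to rounding
assert reldif(fitted, by_hand) < 1e-6
```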
Mike Kraft

Join Date: Dec 2014

Posts: 328
#3

10 Jun 2016, 08:50

My question is that I want to calculate fitted values and residuals by two different approaches:

1-Using a 5 year rolling regression:
Here I run a regression using data for the first 5 years in the sample (for example 1990 to 1994) and calculate the fitted and residual, then move one year forward and drop one year (i.e. from 1991 to 1995) .

2- Using a recursive regression that adds one year each time
Here I want to run a regression using data for the first 5 years in the sample (for example 1990 to 1994), calculate the fitted and residual values, then move one year forward WITHOUT dropping a year (i.e. from 1990 to 1995, then 1990 to 1996, and so on).

Can you help please?
Mike Kraft

Join Date: Dec 2014

Posts: 328
#4

10 Jun 2016, 10:22

Is there an efficient way to get the fitted and residual values directly rather than estimating each coefficient, multiplying by the variables and adding them up? I have more variables than those here.
Mike Kraft

Join Date: Dec 2014

Posts: 328
#5

10 Jun 2016, 12:27

Hi again
I revised the first option as follows:
rolling _b[_cons] _b[x] _b[y] _b[z] _b[g] _b[d], window(5) : reg ret x y z g d
gen fitted=_b[_cons]+_b[x]*x+ _b[y]*y+ _b[z]*z+ _b[g]*g+ _b[d]*d
gen residual= ret-fitted

I get the following error message:
(running regress on estimation sample)
no; data in memory would be lost
r(4);

Can anyone help please?
Mike Kraft

Join Date: Dec 2014

Posts: 328
#6

10 Jun 2016, 13:07

I have also tried :

gen RES=.
capture program drop my_regress
program define my_regress, rclass
syntax varlist [if]
regress `varlist' `if'
tempvar resid
predict `resid' if e(sample), resid
replace RES=`resid'
exit
end

rolling RES, window(5)clear: my_regress ret x y z g d

The programme starts to run but produces the following:

Rolling replications (19)
1 ---+--- 2 ---+ -- 3 ---+--- 4 ---+--- 5
eeeeeeeeeeeeeeeeeee

-> permno = 10002

Rolling replications
1 ---+--- 2 ---+ -- 3 ---+--- 4 ---+--- 5

-> permno = 10010

Rolling replications
1 ---+--- 2 ---+ -- 3 ---+--- 4 ---+--- 5

-> permno = 10011

Rolling replications (2)
1 ---+--- 2 ---+ -- 3 ---+--- 4 ---+--- 5
ee

I appreciate any help please!!
Thanks
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#7

10 Jun 2016, 14:09

With regard to #5, -rolling- gives you two options as to what to do with its results. If you want them to replace the data in memory, you have to specify the -clear- option. If you want to save them to a Stata .dta file and leave the data in memory intact, then you have to specify the -saving()- option. If you specify neither you get the error message you found.
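
As a rough sketch of the -saving()- route: the coefficients from each window can be saved to a file, merged back onto the original data by the window's end year, and then used to build the fitted value and residual for the last observation of each window. The grunfeld data, the file name rolling_coefs, and the variable choices here are assumptions for illustration only.

```stata
* Illustration on the grunfeld panel (an assumption, not the poster's data)
webuse grunfeld, clear
xtset company year

* Save one row of coefficients per 5-year window; data in memory stay intact.
* Adding the -recursive- option would instead grow the window from a fixed start.
rolling _b, window(5) saving(rolling_coefs, replace): regress mvalue invest kstock

* The saved file holds company, start, end, _b_invest, _b_kstock, _b_cons.
* Match each observation to the window that ends in its own year.
gen end = year
merge 1:1 company end using rolling_coefs, keep(master match) nogenerate

* Fitted value and residual for the last year of each window;
* missing for the first four years of each company
gen double fitted   = _b_cons + _b_invest*invest + _b_kstock*kstock
gen double residual = mvalue - fitted
```

This writes rolling_coefs.dta to the current working directory.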

With regard to #6, it is hard to know exactly what is going wrong. But I see at least one error. You can't have -gen RES = .- in your program like that because after the first time my_regress gets called, RES will already exist, so the -gen RES = .- command will throw an error. The simplest way to get around that would be to have -gen RES = .- in the code before you call -rolling-, and then use only -replace RES = ...- in your program.

Unfortunately, that simplest way won't work for your purposes because RES will get written over with each successive rolling window, so you will not be left with the residuals you want. Actually, I'm not entirely sure what you want with residuals. After all each regression generates a residual for every observation, and each observation will participate in five different regressions (more or less), or even in more if you use the recursive window approach. But based on the code in your original post, I'm inferring that what you want is for each observation to keep the residual for the regression in which it serves as the last observation. If true, painful as it may seem, I think the most efficient way to go is your original approach: calculate that linear combination directly and subtract it from the observed value. The alternative is to modify my_regress to include -predict if e(sample)- and then identify the last residual, return that from my_regress, and have your rolling command pick that up. So overall it would look something like this (using the grunfeld data set as an example).

Code:

clear*
capture program drop my_regress
program define my_regress, sortpreserve rclass
    syntax [if]
    regress mvalue invest `if'
    tempvar r
    predict `r' if e(sample), resid
    tempvar in_sample
    gen byte `in_sample' = e(sample)
    sort `in_sample', stable
    return scalar residual = `r'[_N]
    exit
end

webuse grunfeld, clear
regress mvalue invest in 1/5
predict resid, resid
list resid in 1/10
replace resid = .
rolling _b r(residual), window(5) keep(resid) clear: my_regress

Finally with regard to #1 and #3, I still don't understand what your question is. You have made a number of declarative statements. The only question you ask is "Can you help please?" Help with what?
Mike Kraft

Join Date: Dec 2014

Posts: 328
#8

10 Jun 2016, 16:13

Dear Clyde
Many thanks for your assistance. I am trying to understand the code.
Could you please clarify why you use the following?
regress mvalue invest in 1/5 .... Why do you use in 1/5? Is this related to the window length as well?
list resid in 1/10 .... Why do you need this? We do not want to get any specific stats afterwards, right?
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#9

11 Jun 2016, 11:51

Oh, sorry. The part

Code:

regress mvalue invest in 1/5
predict resid, resid
list resid in 1/10
replace resid = .

was just in there while I was testing it. It isn't necessary; I meant to edit it out before posting. You just need program my_regress and the -rolling- command. Sorry for the confusion.
Mike Kraft

Join Date: Dec 2014

Posts: 328
#10

13 Jun 2016, 11:34

Dear Clyde

I ran the code after removing the parts you suggested, as follows:

**
capture program drop my_regress
program define my_regress, sortpreserve rclass
syntax [if]
regress annual_ret x y z `if'
tempvar r
predict `r' if e(sample), resid
tempvar in_sample
gen byte `in_sample' = e(sample)
sort `in_sample', stable
return scalar residual = `r'[_N]
exit
end

rolling _b r(residual), window(5) keep(resid) clear: my_regress
**

But I got the error message:

keep() invalid: resid does not exist
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#11

13 Jun 2016, 13:22

Mike, sorry.

When I worked on your problem, I first tried an approach that involved saving the targeted residual in the original data set. I ultimately decided it was unworkable and opted instead for the approach in the version of -my_regress- that I posted, returning the targeted residual in r(). I needed to then purge the code of all references to the first approach, but apparently I was sloppy in the way I did it. I'm sorry for confusing you and delaying your work. Just remove the -keep(resid)- option from the -rolling- command.
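
Putting the corrections from this thread together, a cleaned-up version might look like the sketch below, again on the grunfeld data. The name resid_last for the returned residual is an illustrative choice, not part of the original code.

```stata
capture program drop my_regress
program define my_regress, sortpreserve rclass
    syntax [if]
    regress mvalue invest `if'
    tempvar r in_sample
    * Residuals for the observations used in this window only
    predict double `r' if e(sample), resid
    * Move the in-sample observations to the end, keeping their time order
    gen byte `in_sample' = e(sample)
    sort `in_sample', stable
    * The last observation is now the final year of the window
    return scalar residual = `r'[_N]
end

webuse grunfeld, clear
* No keep() option; results replace the data in memory
rolling _b resid_last=r(residual), window(5) clear: my_regress
```

The resulting dataset has one row per company and window, holding the coefficients and the residual for the window's last year.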
Mike Kraft

Join Date: Dec 2014

Posts: 328
#12

14 Jun 2016, 06:24

Thank You Clyde
I have been trying to run the code again on my panel data (more than 60,000 firm-years). It has been running for hours....
I think there should be a more efficient way to run a loop and produce the same results, but I do not know how to do this.

Do you or other members have any suggestions about a more efficient loop?

Best wishes
Mike Kraft

Join Date: Dec 2014

Posts: 328
#13

14 Jun 2016, 14:39

Hi again
I left the program running and it has taken hours so far. The problem is that even if it works properly, it will be very time-consuming once I try to re-run it with different sets of variables, as I aim to do.
Can a foreach or forvalues loop help to solve this problem and make the code faster?
Thanks
Mike Kraft

Join Date: Dec 2014

Posts: 328
#14

14 Jun 2016, 14:46

In fact, by a simple calculation from the number of panels and the time Stata takes to "roll" through each panel, I would need no less than 70 hours to get this done for one set of variables!
Robert Picard

Join Date: Mar 2014

Posts: 1536
#15

14 Jun 2016, 15:26

Take a look at rangestat (from SSC). Here are a few recent examples of performing regressions over a rolling window:

http://www.statalist.org/forums/foru...rolling-window
http://www.statalist.org/forums/foru...ow-regressions

In the second example, rangestat can perform over 3 million regressions a minute.
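
A minimal sketch of how this might look for the problem in this thread, using the grunfeld data as a stand-in. The output variable names (b_*, reg_nobs) and the use of a missing lower bound for an open-ended interval are as I recall them from rangestat's help file, so please check -help rangestat- before relying on them.

```stata
* rangestat is user-written; install it once from SSC
ssc install rangestat

* Illustration on the grunfeld panel (an assumption, not the poster's data)
webuse grunfeld, clear

* Rolling: for each observation, regress over its own year and the 4 before it
rangestat (reg) mvalue invest kstock, interval(year -4 0) by(company)

* Use only complete 5-year windows
gen double fitted   = b_cons + b_invest*invest + b_kstock*kstock if reg_nobs == 5
gen double residual = mvalue - fitted

* Recursive: an open lower bound takes all earlier years of the company, e.g.
* rangestat (reg) mvalue invest kstock, interval(year . 0) by(company)
* and then keep windows with reg_nobs >= 5
```

Because the coefficients land on the observation that ends each window, no merge step is needed.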