Rolling-window and recursive estimation and forecasting

ispanyol

Join Date: May 2014
Posts: 17

16 Mar 2017, 03:15

Hello everyone,

I need your help. I understand what rolling and recursive estimation are, and I have run both on my computer.

My question is: how can I forecast the dependent variable with these methods?

When I used "predict":

predict dln_inv

Stata said:

last estimates not found

My code:

webuse lutkepohl2
tsset qtr
rolling _b, window(30) clear : regress dln_inv dln_inc dln_consump

list in 1/10, abbrev(14)

     +-----------------------------------------------------------+
     |  start      end   _b_dln_inc   _b_dln_consump     _b_cons |
     |-----------------------------------------------------------|
  1. | 1960q1   1967q2     .1054375         1.263474   -.0101802 |
  2. | 1960q2   1967q3     .1542573         1.251464   -.0113987 |
  3. | 1960q3   1967q4     .2400457         1.001518   -.0048182 |
  4. | 1960q4   1968q1     .0053584         1.202571   -.0067967 |
  5. | 1961q1   1968q2      .012656         1.187025    -.006777 |
     |-----------------------------------------------------------|
  6. | 1961q2   1968q3    -.0790168         1.094311   -.0048056 |
  7. | 1961q3   1968q4     .0205408          .964076   -.0018992 |
  8. | 1961q4   1969q1    -.1895722         1.169699   -.0022988 |
  9. | 1962q1   1969q2    -.2074511         1.271727    -.002647 |
 10. | 1962q2   1969q3    -.0170991         1.187241   -.0051391 |
     +-----------------------------------------------------------+

Thanks


Chris Engel

Join Date: May 2016
Posts: 21

16 Mar 2017, 08:27

Try the example in the Stata docs (page 7):

http://www.stata.com/manuals13/tsrolling.pdf

I found it very helpful when I had to do the same thing. Here's a snippet:

Code:

program myforecast, rclass
    syntax [if]
    // Note: macros need a left quote (`) and an apostrophe ('),
    // not the typographic quotes copied from the PDF
    regress ibm L.ibm L.spx `if'
    // Find last time period of estimation sample and
    // make forecast for period just after that
    summ t if e(sample)
    local last = r(max)
    local fcast = _b[_cons] + _b[L.ibm]*ibm[`last'] + ///
                  _b[L.spx]*spx[`last']
    return scalar forecast = `fcast'
    // Next period's actual return
    // Will return missing value for final period
    return scalar actual = ibm[`last'+1]
end

Then using this program you incorporate into "rolling":

Code:

rolling actual=r(actual) forecast=r(forecast), recursive window(20): myforecast

So this creates the variables "actual" and "forecast", which you can use to compare forecasts with realized values.

Obviously you can adjust the parameters and such to meet your specifications.
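A minimal sketch of that comparison, run as a do-file. It reuses the manual's program and the ibm dataset from the manual's example. The `capture program drop` line, the `clear` option on rolling, the variable names fcerr and sqerr, the scalar rmse, and the choice of root mean squared error as the summary measure are assumptions added here for illustration, not part of the manual's example.

```stata
* drop any earlier definition so the do-file can be rerun
capture program drop myforecast
program myforecast, rclass
    syntax [if]
    regress ibm L.ibm L.spx `if'
    * last period of the estimation sample
    summarize t if e(sample), meanonly
    local last = r(max)
    * one-step-ahead forecast and the realized value it targets
    return scalar forecast = _b[_cons] + _b[L.ibm]*ibm[`last'] + _b[L.spx]*spx[`last']
    return scalar actual = ibm[`last'+1]
end

use http://www.stata-press.com/data/r13/ibm, clear
tsset t

* clear replaces the data in memory with one row per window
rolling actual=r(actual) forecast=r(forecast), recursive window(20) clear: myforecast

* forecast error and its square (hypothetical variable names)
gen double fcerr = actual - forecast
gen double sqerr = fcerr^2

* root mean squared error; the final window has no actual value and is ignored
summarize sqerr, meanonly
scalar rmse = sqrt(r(mean))
display "RMSE of one-step-ahead forecasts = " rmse

corr actual forecast
```

A smaller RMSE means forecasts closer to the realized values; the correlation alone does not reveal a systematic bias in the forecasts.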


Robert Picard

Join Date: Mar 2014
Posts: 1536

16 Mar 2017, 11:05

Here's a much more efficient way to perform a rolling regression with a recursive window using rangestat (from SSC). See my earlier post today for an example with a fixed window and with panel data. The following replicates the results from example 3 on page 7 of http://www.stata.com/manuals13/tsrolling.pdf:

Code:

clear all

* --------- basic regression mata code: DO NOT CHANGE CODE BELOW ---------------
* linear regression in Mata using quadcross() - help mata cross(), example 2
mata:
mata clear
mata set matastrict on
real rowvector myreg(real matrix Xall)
{
    real colvector y, b, Xy
    real matrix X, XX

    y = Xall[.,1]                // dependent var is first column of Xall
    X = Xall[.,2::cols(Xall)]    // the remaining cols are the independent variables
    X = X,J(rows(X),1,1)         // add a constant
    
    XX = quadcross(X, X)        // linear regression, see help mata cross(), example 2
    Xy = quadcross(X, y)
    b  = invsym(XX) * Xy
    
    return(rows(X), b')
}
end
* --------- end of basic regression mata code: DO NOT CHANGE CODE ABOVE --------

* replicate http://www.stata.com/manuals13/tsrolling.pdf, example 3, p. 7
use http://www.stata-press.com/data/r13/ibm, clear
tsset t
gen double L_ibm = L.ibm
gen double L_spx = L.spx

* for each observation, the sample starts with the first observation
* and ends at the current observation.
sum t
gen low = r(min)
rangestat (myreg) ibm L_ibm L_spx, interval(t low 0) casewise
rename myreg* (obs b_L_ibm_Return b_L_spx b_cons)

* limit results to t >= 20
gen forecast0 = b_cons + b_L_ibm_Return * ibm + b_L_spx * spx if t >= 20
gen actual0 = F.ibm if t >= 20
corr actual0 forecast0
save "rangestat_results.dta", replace

* repeat using the manual's code on page 7
program myforecast, rclass
    syntax [if]
    regress ibm L.ibm L.spx `if'
    // Find last time period of estimation sample and
    // make forecast for period just after that
    summ t if e(sample)
    local last = r(max)
    local fcast = _b[_cons] + _b[L.ibm]*ibm[`last'] + ///
                  _b[L.spx]*spx[`last']
    return scalar forecast = `fcast'
    // Next period’s actual return
    // Will return missing value for final period
    return scalar actual = ibm[`last'+1]
end

rolling actual=r(actual) forecast=r(forecast), recursive window(20): myforecast
corr actual forecast

* combine the rolling results with the original data plus rangestat results
rename end t
merge 1:1 t using "rangestat_results.dta", assert(match using) nogen
sort t

* show that the results match
gen dforecast = abs(forecast0 - forecast)
gen dactual = abs(actual0 - actual)
sum dforecast dactual


ispanyol

Join Date: May 2014

Posts: 17
#4

16 Mar 2017, 14:02


Dear Chris

. rolling actual=r(actual) forecast=r(forecast), recursive window(20): myforecast
(running myforecast on estimation sample)
‘if’ invalid name
an error occurred when rolling executed myforecast
r(198);

Can you send me your working file, or just the log?

ispanyol

Join Date: May 2014
Posts: 17

16 Mar 2017, 14:12


Dear Robert

y = Xall[.,1] // dependent var is first column of Xall
invalid expression

Last edited by ispanyol; 16 Mar 2017, 14:21.


ispanyol

Join Date: May 2014

Posts: 17
#6

16 Mar 2017, 14:22

I want to ask you about forecasting strategies.

Are there two or three out-of-sample forecasting methods?

Prof. Zivot says there are two methods (rolling and recursive):

https://faculty.washington.edu/ezivo...evaluation.pdf

Prof. West says there are three methods (fixed, rolling and recursive) (page 107):

http://www.ssc.wisc.edu/~kwest/publi...Evaluation.pdf

1. When we use Stata for forecasting, which method do we use?

2. Do you have any add-ons or code for these methods in Stata?

Sincerely
Robert Picard

Join Date: Mar 2014

Posts: 1536
#7

16 Mar 2017, 14:25

Re #5: You need to copy the code into a do-file and run it as a whole.

Re #6: Above my pay grade.
ispanyol

Join Date: May 2014

Posts: 17
#8

17 Mar 2017, 09:44

Dear Robert
Can you modify your code according to the following?

Robert Picard

Join Date: Mar 2014
Posts: 1536

17 Mar 2017, 14:23

You are not trying very hard; you have a fully functioning example to work with. As the picture you posted shows, the only difference between a rolling window and a recursive (rolling) window is the start period.

It's important to understand that in both rolling and recursive windows, time moves ahead by one period. This means that you have to estimate the model at each period. Before you try to put together a complete solution, you should be able to write down the code that will do what you want for a specific window sample. Say we use the period in observation 50 as the end period for the window. The following, inspired by what you showed in #1, estimates the model using a recursive window that starts at the first observation and ends in the quarter of the 50th observation. The code predicts the value of the dependent variable for the 50th observation and makes out-of-sample predictions for the next 3 quarters using the estimated coefficients. I show how to do this using Stata's predict command and also how to calculate these manually.

Code:

webuse lutkepohl2, clear
tsset qtr

* window bounds for a recursive window that ends at the 50th observation
list qtr in 1
list qtr in 50

* estimate model
regress dln_inv dln_inc dln_consump if qtr <= qtr[50]

* predict the 50th observation and forecast the next 3
predict xb
list qtr xb in 50/53

* predict and forecast manually
dis _b[_cons] + _b[dln_inc]*dln_inc[50] + _b[dln_consump]*dln_consump[50]
dis _b[_cons] + _b[dln_inc]*dln_inc[51] + _b[dln_consump]*dln_consump[51]
dis _b[_cons] + _b[dln_inc]*dln_inc[52] + _b[dln_consump]*dln_consump[52]
dis _b[_cons] + _b[dln_inc]*dln_inc[53] + _b[dln_consump]*dln_consump[53]

If you want to use a rolling window instead, the only thing that changes is the start of the window. Let's say that the rolling window should include 5 quarters:

Code:

webuse lutkepohl2, clear
tsset qtr

* window bounds for a rolling window that ends at the 50th observation
list qtr in 46
list qtr in 50

* estimate model
regress dln_inv dln_inc dln_consump if inrange(qtr, qtr[46], qtr[50])

* predict the 50th observation and forecast the next 3
predict xb
list qtr xb in 50/53

* predict and forecast manually
dis _b[_cons] + _b[dln_inc]*dln_inc[50] + _b[dln_consump]*dln_consump[50]
dis _b[_cons] + _b[dln_inc]*dln_inc[51] + _b[dln_consump]*dln_consump[51]
dis _b[_cons] + _b[dln_inc]*dln_inc[52] + _b[dln_consump]*dln_consump[52]
dis _b[_cons] + _b[dln_inc]*dln_inc[53] + _b[dln_consump]*dln_consump[53]

You will need to repeat this process for each period in the data that terminates a rolling window. This can be done by extending the code above using a loop. You can also use Stata's rolling command. The following uses rangestat because it is vastly more efficient computationally. First, for the recursive window (note that this code should be copied to a new do-file and run as a whole; do not try to cut and paste it directly into Stata's Command window):

Code:

webuse lutkepohl2, clear
tsset qtr

* --------- basic regression mata code: DO NOT CHANGE CODE BELOW ---------------
* linear regression in Mata using quadcross() - help mata cross(), example 2
mata:
mata clear
mata set matastrict on
real rowvector myreg(real matrix Xall)
{
    real colvector y, b, Xy
    real matrix X, XX

    y = Xall[.,1]                // dependent var is first column of Xall
    X = Xall[.,2::cols(Xall)]    // the remaining cols are the independent variables
    X = X,J(rows(X),1,1)         // add a constant
    
    XX = quadcross(X, X)        // linear regression, see help mata cross(), example 2
    Xy = quadcross(X, y)
    b  = invsym(XX) * Xy
    
    return(rows(X), b')
}
end
* --------- end of basic regression mata code: DO NOT CHANGE CODE ABOVE --------

gen low = qtr[1]
rangestat (myreg) dln_inv dln_inc dln_consump, interval(qtr low 0) casewise
rename myreg* (obs b_dln_inc b_dln_consump b_cons)
gen xb0 = b_cons + b_dln_inc*dln_inc + b_dln_consump*dln_consump
gen xb1 = b_cons + b_dln_inc*F.dln_inc + b_dln_consump*F.dln_consump
gen xb2 = b_cons + b_dln_inc*F2.dln_inc + b_dln_consump*F2.dln_consump
gen xb3 = b_cons + b_dln_inc*F3.dln_inc + b_dln_consump*F3.dln_consump

list in 50

You can see that the results using the window that ends at the 50th observation forecast the same values as the individual case (first code block above).

Now do the same using a rolling window of 5 quarters. Note that the only change is the window start period. Again, this code should be copied to a new do-file and run as a whole; do not try to cut and paste it directly into Stata's Command window.

Code:

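* reload the data so that variables left over from the recursive run
* (obs, b_*, xb0-xb3) do not clash with the ones created below
webuse lutkepohl2, clear
tsset qtr

* --------- basic regression mata code: DO NOT CHANGE CODE BELOW ---------------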
* linear regression in Mata using quadcross() - help mata cross(), example 2
mata:
mata clear
mata set matastrict on
real rowvector myreg(real matrix Xall)
{
    real colvector y, b, Xy
    real matrix X, XX

    y = Xall[.,1]                // dependent var is first column of Xall
    X = Xall[.,2::cols(Xall)]    // the remaining cols are the independent variables
    X = X,J(rows(X),1,1)         // add a constant
    
    XX = quadcross(X, X)        // linear regression, see help mata cross(), example 2
    Xy = quadcross(X, y)
    b  = invsym(XX) * Xy
    
    return(rows(X), b')
}
end
* --------- end of basic regression mata code: DO NOT CHANGE CODE ABOVE --------

rangestat (myreg) dln_inv dln_inc dln_consump, interval(qtr -4 0) casewise
rename myreg* (obs b_dln_inc b_dln_consump b_cons)
gen xb0 = b_cons + b_dln_inc*dln_inc + b_dln_consump*dln_consump
gen xb1 = b_cons + b_dln_inc*F.dln_inc + b_dln_consump*F.dln_consump
gen xb2 = b_cons + b_dln_inc*F2.dln_inc + b_dln_consump*F2.dln_consump
gen xb3 = b_cons + b_dln_inc*F3.dln_inc + b_dln_consump*F3.dln_consump

list in 50

Again, you can check that these match the results using the individual case in the second code block above.

The only thing I haven't discussed is the number of observations that are used in each estimation. In the examples above, the variable obs indicates the number of observations in the regression sample for the window that ends in that quarter. You may wish to ignore the results from a certain number of cases initially and/or require a minimum sample per window. rangestat will not do this for you; you have to decide when to reject results because of an insufficient sample.
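For readers who want to see the loop-based alternative mentioned earlier, here is a minimal sketch that also applies a minimum-sample rule. It uses plain regress rather than rangestat, so it is slower, and it is an illustration under assumptions rather than a replacement for the code above. The settings first, width and minobs and the variables fc_rec and fc_roll are hypothetical names and values chosen here. As in the examples above, each forecast plugs in the realized regressor values of the quarter being forecast. Unlike xb1 above, each forecast is stored in the observation of the quarter it targets. Copy it to a do-file and run it as a whole.

```stata
webuse lutkepohl2, clear
tsset qtr

* hypothetical settings, adjust to your own design
* first:  observation number that ends the first estimation window
* width:  number of quarters in each rolling window
* minobs: smallest regression sample accepted
local first  20
local width  20
local minobs 15

* one-step-ahead forecasts, stored at the quarter being forecast
gen double fc_rec  = .
gen double fc_roll = .

local stop = _N - 1
forvalues i = `first'/`stop' {
    * j is the quarter being forecast, s starts the rolling window
    local j = `i' + 1
    local s = `i' - `width' + 1

    * recursive window: observations 1 through i
    quietly regress dln_inv dln_inc dln_consump in 1/`i'
    if e(N) >= `minobs' {
        quietly replace fc_rec = _b[_cons] + _b[dln_inc]*dln_inc[`j'] ///
            + _b[dln_consump]*dln_consump[`j'] in `j'
    }

    * rolling window: observations s through i
    quietly regress dln_inv dln_inc dln_consump in `s'/`i'
    if e(N) >= `minobs' {
        quietly replace fc_roll = _b[_cons] + _b[dln_inc]*dln_inc[`j'] ///
            + _b[dln_consump]*dln_consump[`j'] in `j'
    }
}

list qtr dln_inv fc_rec fc_roll in 21/25
```

With these settings the first recursive and rolling windows coincide (observations 1 to 20), so the two forecasts for observation 21 are identical; they diverge afterwards because the rolling window drops its oldest quarter at each step.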

Last edited by Robert Picard; 17 Mar 2017, 14:27.


ispanyol

Join Date: May 2014

Posts: 17
#10

19 Mar 2017, 15:15

Dear @RobertPicard,
Thank you for your interest.
But is there a problem in your code that performs the "rolling estimation"?
A rolling estimation should follow this algorithm:

xb0 = 1 to 50 [1960q1 - 1972q2]
xb1 = 2 to 51 [1960q2 - 1972q3]
xb2 = 3 to 52 [1960q3 - 1972q4]
xb3 = 4 to 53 [1960q4 - 1973q1]

Sincerely
Engin
ispanyol

Join Date: May 2014

Posts: 17
#11

27 Mar 2017, 04:22

Dear @RobertPicard

John Baxter

Join Date: Sep 2021
Posts: 14

#12

04 Oct 2021, 23:18

Hi everyone,

I am quite new to Stata. I would like to do out-of-sample forecasting with rolling/recursive regression, and I find that Robert Picard's code of March 16, 2017 runs very well. However, I would like to adjust it. If I understand the code correctly, it works as follows:
it estimates the model on observations 1-20 and forecasts #20 from the estimation results;
it estimates the model on observations 1-21 and forecasts #21 from the estimation results; and so on.
I would like to produce what I understand to be out-of-sample forecasts, as follows:
it estimates the model on observations 1-20 and forecasts #21 from the estimation results;
it estimates the model on observations 1-21 and forecasts #22 from the estimation results; and so on.

How should I modify the code to do this? Thank you!

