Rolling Window Regression: Combining the -rolling- command with the -program- command

Carl Groesbrink

Join Date: Apr 2017
Posts: 8

28 Apr 2017, 04:28

Hi everyone,

I'm trying to run a rolling window regression using Stata's -rolling- command, saving certain results from each window in a matrix. To achieve this, I wrote a wrapper program using Stata's -program- command and named it "MyRegression". The code inside "MyRegression" itself works fine.
The problem is that, instead of running a true rolling window regression that uses only the observations in the window (52 observations), every single regression uses all observations (2036 observations).
I'm using the variable "AssetReturn" as the dependent variable and the two variables "MarketReturn" and "InterestReturn" as regressors.

How do I have to adjust my code to achieve this?
Eventually, this code should be used in a loop to run the regression with different dependent and independent variables.

Here is the code I have used so far:

Code:

program MyRegression,
        
        // Defining a local macro, used for the rownames of resultmatrix.
        local Regressand `Regressand' AssetReturn
        
        // Calculating number of lags needed for -newey- regression.
        quietly summarize AssetReturn
        scalar Lagorder = floor(4*(r(N)/100)^(2/9))    
        
        // Newey-West Regression.
        newey AssetReturn MarketReturn InterestReturn, lag(`=Lagorder')

        // Calculation of p-values.
        scalar p_cons = 2 * ttail(e(df_r), abs(_b[_cons] / _se[_cons]))
        scalar p_MarketReturn = 2 * ttail(e(df_r), abs(_b[MarketReturn] / _se[MarketReturn]))
        scalar p_InterestReturn = 2 * ttail(e(df_r), abs(_b[InterestReturn] / _se[InterestReturn]))      
        
        // Saving the residuals in a variable named "resid".
        predict resid, res
        
        // Square residuals.
        generate resid2 = resid^2
        
        // Sum squared residuals.
        egen S_resid2 = sum(resid2)

        // Calculating the root mean squared error "RMSE":
        // divide the sum of squared residuals by the degrees of freedom, then take the square root.
        scalar RMSE = sqrt(S_resid2/e(df_r))
        
        // Saving the degrees of freedom "DF".
        scalar DF = e(df_r)

        // Saving results in a matrix named "Results".
        matrix Results = nullmat(Results) \ _b[_cons], p_cons, _b[MarketReturn], p_MarketReturn, _b[InterestReturn], p_InterestReturn, RMSE, DF     
        
        // Matrix columnnames.
        matrix colnames Results = Cons p_cons B_RM p_RM B_RZ p_RZ RMSE DF

        // Matrix rownames.
        matrix rownames Results = `Regressand' 
        
        // Delete generated variables.
        drop resid resid2 S_resid2
        
        
        exit
        
end    


// Rolling Window Regression.       
rolling _b _se e(df_r), window(52) stepsize(52) saving(RWR_Results , replace ): MyRegression

Any help is greatly appreciated!

Kind regards,
Carl

Tags: program, rolling, Rolling Window Regression

Clyde Schechter

Join Date: Apr 2014

Posts: 30103
#2

28 Apr 2017, 08:35

-rolling- works by repeatedly calling your program with an -if- qualifier that designates the observations in the window. Your program is not written in such a way as to recognize the -if- qualifier. You need to add a -syntax- statement that will parse out the -if- and then you need to modify the other steps in the program so that they only apply to those observations (as appropriate).
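A minimal sketch of what such a program might look like, assuming the data are already -tsset- and that -rolling- passes the window as an -if- qualifier as described above. The name MyRegressionIf is hypothetical and is not the poster's program; -marksample- is used here as one common way to turn the parsed -if- into a sample marker.

```stata
* Sketch only: MyRegressionIf is a hypothetical name used for illustration.
capture program drop MyRegressionIf
program define MyRegressionIf
        version 12.1

        // Parse the variable list and the -if- qualifier supplied by the caller.
        syntax varlist(numeric ts) [if] [in]

        // touse is 1 for observations that satisfy -if-/-in- and have
        // no missing values in varlist, and 0 otherwise.
        marksample touse

        // Lag order based on the number of observations in the window only.
        quietly count if `touse'
        local lags = floor(4*(r(N)/100)^(2/9))

        // Newey-West regression restricted to the window.
        newey `varlist' if `touse', lag(`lags')
end
```

Every later step that touches the data (for example -predict- or -summarize-) would need the same `if `touse'` restriction, or `if e(sample)` after the regression.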

Carl Groesbrink

Join Date: Apr 2017
Posts: 8

29 Apr 2017, 03:56

Hi Clyde,

Thank you very much for your help! Your advice helped me a lot.

I adjusted my code according to your comments and now it works almost perfectly. Just one issue remains.
In the matrix "Results", where I save the regression results, the first row (and only the first row) now contains results based on a window that includes all observations (the whole sample), while the following rows are based on the desired number of observations (52). You can see this in the attached picture by looking at the last column, which reports the degrees of freedom ("DF").
To be clear, rows 2 to 22 contain the desired results of the rolling window regression (each row represents one window), and the number of windows is complete. But the first row represents an additional window in which all observations were used.

So I want to adjust my code so that the results in the first row are never calculated and never saved.
Do you know how I have to adjust my code further? I guess I have to insert an additional -if- qualifier somewhere?

Output of the matrix "Results":

[Image: RollingWindowRegression.png]

Below is my adjusted code (the adjustments were marked in bold in the original post):

Code:

// Define program.
program MyRegression,
       
        // Stata version.
        version 12.1
       
        // Define program syntax.
        syntax varlist [if]
       
        // Defining a local macro, used for the rownames of resultmatrix.
        local Regressand `Regressand' AssetReturn

        // Calculating number of lags needed for -newey- regression,
        // based on the observations in the current window only.
        quietly summarize AssetReturn `if'
        scalar Lagorder = floor(4*(r(N)/100)^(2/9))
       
        // Newey-West Regression.
        newey `varlist' `if', lag(`=Lagorder')

        // Calculation of p-values.
        scalar p_cons = 2 * ttail(e(df_r), abs(_b[_cons] / _se[_cons]))
        scalar p_MarketReturn = 2 * ttail(e(df_r), abs(_b[MarketReturn] / _se[MarketReturn]))
        scalar p_InterestReturn = 2 * ttail(e(df_r), abs(_b[InterestReturn] / _se[InterestReturn]))
          
        // Saving the residuals in a variable named "resid".
        predict resid if e(sample), res
 
        // Square residuals.
        generate resid2 = resid^2

        // Sum squared residuals.
        egen S_resid2 = sum(resid2)

        // Calculating the root mean squared error "RMSE":
        // divide the sum of squared residuals by the degrees of freedom, then take the square root.
        scalar RMSE = sqrt(S_resid2/e(df_r))
               
        // Saving the degrees of freedom "DF".
        scalar DF = e(df_r)

        // Saving results in a matrix named "Results".
        matrix Results = nullmat(Results) \ _b[_cons], p_cons, _b[MarketReturn], p_MarketReturn, _b[InterestReturn], p_InterestReturn, RMSE, DF
       
        // Matrix columnnames.
        matrix colnames Results = Cons p_cons B_RM p_RM B_RZ p_RZ RMSE DF

        // Matrix rownames.
        matrix rownames Results = `Regressand'
         
        // Delete generated variables.
        drop resid resid2 S_resid2
        
        exit
        
end    

    


rolling _b _se e(df_r), window(52) stepsize(52) saving(RWR_Results , replace ): MyRegression AssetReturn MarketReturn InterestReturn

Thank you very much in advance,
Carl


Clyde Schechter

Join Date: Apr 2014

Posts: 30103
#4

29 Apr 2017, 09:41

Your code violates the spirit of -rolling- and that is causing your problem. Although you have specified _b _se e(df_r) and a -saving()- option in your -rolling- command, you are not really using those things. You are, instead, building a matrix each time you call MyRegression, and you are putting various things into that matrix that aren't even mentioned in your -rolling- command, such as the RMSE.

Now, for reasons that I have never understood, but I have observed it to be true many times, and this is one more, when -rolling- executes, it starts by running the program MyRegression without any window, that is, on the whole data set. It does not save those results in the file specified in -saving()-. But your program saves them in your matrix Results! You cannot prevent -rolling- from doing that unless you want to try to hack its code. Moreover, although I have never understood why it does this, I imagine there is some good reason for it. So I would be wary of changing that. What you should do is ditch the matrix you are building and let -rolling- store all the results you want in the RWR_Results file that your -saving()- option specifies. This entails changing the code of MyRegression so that instead of calculating scalars and stuffing them into a matrix, it returns those scalars in r(). (N.B. In order to return those scalars you will need to make MyRegression an rclass program.) Similarly, each of those returned scalars should be mentioned in your -rolling- command along with _b _se e(df_r).
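The change described above could be sketched roughly as follows. This is an illustration under assumptions, not the poster's final code: the program name MyRegressionR and the returned names r(rmse), r(df), r(p_cons) and r(p_<regressor>) are hypothetical, and the data are assumed to be -tsset- already.

```stata
* Sketch only: MyRegressionR and the r() names are hypothetical.
capture program drop MyRegressionR
program define MyRegressionR, rclass
        version 12.1
        syntax varlist(numeric min=2) [if] [in]
        marksample touse

        // Lag order based on the observations in the current window.
        quietly count if `touse'
        local lags = floor(4*(r(N)/100)^(2/9))

        quietly newey `varlist' if `touse', lag(`lags')

        // Residual-based RMSE, using temporary variables that are
        // dropped automatically when the program ends.
        tempvar resid resid2
        tempname ssr
        quietly predict double `resid' if e(sample), residuals
        quietly generate double `resid2' = `resid'^2
        quietly summarize `resid2', meanonly
        scalar `ssr' = r(sum)
        return scalar rmse = sqrt(`ssr'/e(df_r))
        return scalar df = e(df_r)

        // Two-sided p-values for the constant and for each regressor.
        return scalar p_cons = 2*ttail(e(df_r), abs(_b[_cons]/_se[_cons]))
        gettoken depvar regressors : varlist
        foreach v of local regressors {
                return scalar p_`v' = 2*ttail(e(df_r), abs(_b[`v']/_se[`v']))
        }
end

* Every returned scalar is named in the -rolling- expression list.
rolling _b _se df=r(df) rmse=r(rmse) p_cons=r(p_cons)       ///
        p_RM=r(p_MarketReturn) p_RZ=r(p_InterestReturn),    ///
        window(52) stepsize(52) saving(RWR_Results, replace): ///
        MyRegressionR AssetReturn MarketReturn InterestReturn
```

Because nothing is accumulated inside the program, the initial whole-sample run that -rolling- performs leaves no trace in the saved results.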

Then at the end, you will have a Stata data set containing all the things that you currently get in that matrix (but without that unwanted first row). If you actually need to make a matrix out of them, there is always the -mkmat- command. But, probably whatever you want to do with these results will actually be easier if you just work with the RWR_Results data set than with a matrix anyway.
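As a rough illustration of working with the saved file, assuming -rolling- has already written RWR_Results.dta to the current working directory. The _b_* pattern assumes the default variable names that -rolling- assigns to _b; check the output of -describe- for the actual names.

```stata
* Sketch only: assumes RWR_Results.dta exists in the working directory.
use RWR_Results, clear

* Inspect what was stored, one observation per window.
describe
list

* Optional: copy the coefficient columns into a matrix.
* The _b_* pattern is an assumption about the stored variable names.
mkmat _b_*, matrix(Results)
matrix list Results
```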
Carl Groesbrink

Join Date: Apr 2017

Posts: 8
#5

29 Apr 2017, 12:11

Thank you again for your explanations!
I finally solved my problem by subsetting the matrix via the -matselrc- command. Probably not the best solution but an easy one.

Best regards,
Carl
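For reference, the same kind of subsetting can be sketched with Stata's built-in matrix subscripting instead of the user-written -matselrc-. This is an alternative, not the command the poster used, and it assumes the matrix Results exists with the unwanted whole-sample results in its first row and at least one more row below it.

```stata
* Sketch only: keep rows 2 through the last row and all columns,
* which drops the first (whole-sample) row of Results.
matrix Results = Results[2..., 1...]
matrix list Results
```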