  • Getting the variance of residuals in a rolling regression

    Hello! I have a panel dataset with around 10000 companies and I want to perform rolling regressions while obtaining the variance of the residuals for each regression performed. I can't use the rolling command and e(rmse) as the regressions suffer from heteroscedasticity and as such rmse is not the standard deviation of the residuals.

    I would like to do something like this:

    First window:
    Reg Y X
    Calculate residuals
    Compute Standard Deviation of residuals
    Store result in "new variable" 1st cell

    Second window:
    Reg Y X
    Calculate residuals
    Compute Standard Deviation of residuals
    Store result in "new variable" 2nd cell

    And keep going like this.

    I probably need to write this in code, but unfortunately I have never coded in Stata before.

    Please help either by providing an alternative way of doing this (as it seems rolling cannot) or by helping me code it.

    Thank you in advance!
    Last edited by Andre Sacras; 07 Apr 2015, 19:58.

  • #2
    I think you can still use -rolling-, you just have to write a program that wraps your regression and residual calculations and returns the standard deviation of residuals. So it would be something like this:

    Code:
    capture program drop my_regress
    program define my_regress, rclass
        syntax varlist [if]
        regress `varlist' `if'
        tempvar resid
        predict `resid' if e(sample), resid
        summ `resid'
        return scalar sdr = r(sd)
        exit
    end
    And then you can invoke that with something like:

    Code:
    rolling sd_resid = r(sdr) ...: my_regress Y X
    You would, of course, replace the ... with whatever -rolling- syntax you would have used otherwise. This my_regress program is a bare-bones version that will just minimally accomplish what you set out in your post. If there is more to your problem than you describe, then you may need to embellish my_regress to accomplish that.

    I think this approach will work, and it would be simpler than attempting to hand-code the management of the rolling windows that -rolling- does for you.


    • #3
      Thank you very much Clyde Schechter! Worked like a charm!


      • #4
        Hello ,

        I had the same problem, but the above solution did not work for me: I waited several hours and got only a tiny percentage of the results, from over 250,000 observations.

        I work on a database with monthly observations of Dow Jones stock performance, consisting of ID, date in months, and return.

        My aim is to obtain, for every observation, a 12-month variance of monthly returns. As already stated, the following command is too slow:


        rolling var_return = r(Var), window(12) clear: summarize return


        I would really appreciate any help, because I have not been able to find a solution for the last three days.


        • #5
          You may find the thread at http://www.statalist.org/forums/foru...faster-program helpful here, especially #4. It deals with -statsby- instead of -rolling-, but the overall approach would be the same. -rolling- is just a wrapper command that repeatedly applies -if- conditions to your command and then posts the results to another file (on disk).

          You first have to strongly balance the data set with -tsfill, full-. Then you can set up your various windows for summarize using -in- rather than -if- qualifiers. Qualifying with -in- is much faster, O(1), than with -if-, O(N), and -rolling- uses -if- repeatedly. The key is just calculating the right values to put in the -in- condition. It's just some simple algebra, because every panel now has the same number of observations. Also, by setting up variables in the original data set to receive the results and using -replace- to update the appropriate observation (again with an -in- condition), you avoid a lot of time spent writing things to a disk file. The overall speedup should be quite appreciable.
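
The -tsfill, full- plus -in- approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the linked thread: the variable names id, mdate, ret and var12 and the toy data are hypothetical, and the choice to store a variance only when all 12 months are non-missing is an assumption you may want to change.

```stata
* Toy panel (hypothetical names: id, mdate, ret): 2 firms x 36 months
clear
set obs 72
set seed 12345
gen long id = ceil(_n/36)
bysort id: gen mdate = tm(2010m1) + _n - 1
format mdate %tm
gen double ret = rnormal(0, 0.05)

* Strongly balance the panel so every id has the same number of rows
xtset id mdate
tsfill, full
sort id mdate

local window 12
quietly by id: gen long panel_len = _N
local T = panel_len[1]            // common panel length after -tsfill, full-
drop panel_len

gen double var12 = .              // receives the results in the original data
forvalues last = `window'/`=_N' {
    * Position of this observation within its own panel
    local pos = mod(`last' - 1, `T') + 1
    if `pos' >= `window' {        // skip windows that would cross a panel boundary
        local first = `last' - `window' + 1
        quietly summarize ret in `first'/`last'
        if r(N) == `window' {     // assumption: require a full window of data
            quietly replace var12 = r(Var) in `last'
        }
    }
}
```

Each pass touches only the rows `first'/`last' instead of scanning the whole data set, and nothing is written to disk until you choose to save.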


          • #6
            Hi Clyde Schechter,

            I am new to Stata and would like to store the residuals after each rolling regression. I found that your code could be useful to me. However, I am confused by the following code; could you please explain it further? What should I include in the -if- condition?
            syntax varlist [if]
            regress `varlist' `if'

            I would greatly appreciate your help.


            • #7
              What should I include in the -if- condition?
              Nothing.

              When you run -rolling-, the code in -rolling- supplies an -if- condition when it calls my_regress, the if-condition being one that identifies the observations to be included in its current iteration. This is done for you automatically, and you don't have to specify anything there. You just need to have the -[if]- and `if' parts present in the syntax in my_regress so that -rolling- can use it properly.
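
To make this concrete, here is a small sketch that calls my_regress by hand with an -if- condition, which is roughly what -rolling- does at each step, and then lets -rolling- do the same thing itself. The toy data, the variable names t, x, y, the scalar manual_sd, and the window length of 20 are all assumptions made for illustration; my_regress is the program from #2 with -quietly- added.

```stata
* Toy time series (hypothetical names t, x, y)
clear
set obs 50
set seed 2015
gen t = _n
tsset t
gen x = rnormal()
gen y = 1 + 2*x + rnormal()

capture program drop my_regress
program define my_regress, rclass
    syntax varlist [if]                  // [if] lets a caller pass a condition
    quietly regress `varlist' `if'       // `if' holds whatever condition was passed
    tempvar resid
    quietly predict double `resid' if e(sample), residuals
    quietly summarize `resid'
    return scalar sdr = r(sd)
end

* One window done by hand: the condition arrives in the `if' macro
my_regress y x if inrange(t, 1, 20)
scalar manual_sd = r(sdr)
display manual_sd

* -rolling- builds a similar condition for every window on its own
rolling sd_resid = r(sdr), window(20) clear: my_regress y x
list in 1/3
```

You never type the condition yourself when using -rolling-; the hand-written call is there only to show what the `if' macro receives.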


              • #8
                Hi Clyde Schechter,

                Many thanks for your reply.

                What should I do if I want to store the last residual from each rolling window? For example:

                First window:
                Period: 1966m1 to 1970m12
                Regress Y on X
                Store the last residual in 1970m12

                Second window:
                Period: 1966m2 to 1971m1
                Regress Y on X
                Store the last residual in 1971m1

                I think I should do some alterations to the code you provided but I don't have any ideas on how to do it. Would greatly appreciate if you could help me further.


                • #9
                  Since you tacked your post in #6 onto an existing thread, I assumed in #7 that you had exactly the same problem and requirements as those of the original question in #1 and were just asking for an explanation of how the code works. Now it appears you have something different, and -rolling- may not be an appropriate approach to this.

                  What is your data structure? If you have a single time series, this can be done. If you have panel data, then there is a "last residual" in each panel, and it would be impractical, at best, to do this using -rolling-; a different approach is required. Please elaborate about your problem, and include a small example of your data using the -dataex- command. (Run -ssc install dataex- to get it; -help dataex- will give you instructions for using it.)


                  • #10
                    Hi Clyde Schechter,

                    Many thanks for your reply.

                    Following is a small example of my data. I have three independent variables (x1, x2, and x3) and the dependent variable, y. I would like to run a rolling window regression with a window size of 60 months and store only the residual from the last observation in each window (not all 60 residuals in each window). For example, in my first window (Jan 1966 - Dec 1970), I would like to store only the residual in Dec 1970 (last observation in the first window). For the second window (Feb 1966 - Jan 1971), I would like to store only the residual in Jan 1971 (last observation in the second window). The same process continues up to Dec 2014.

                    Code:
                     
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input str8 date float(y x1 x2 x3)
                    "Jan-1966" -.018 .096 .169 .065
                    "Feb-1966" -.022 .096  .12 .055
                    "Mar-1966"   .02 .097 .099 .069
                    "Apr-1966" -.056 .094 .086 .066
                    "May-1966" -.016 .096 .058 .055
                    "Jun-1966" -.014 .092 .097 .077
                    "Jul-1966" -.081 .088  .09 .066
                    "Aug-1966" -.007 .084 .081 .059
                    "Sep-1966"  .046 .092  .05 .054
                    "Oct-1966"  .003 .088 .043 .013
                    end




                    • #11
                      OK. While you could mark up program my_regress and use it with -rolling- to do this, I think that it's easier to just do it in a simple loop.

                      Code:
                      * Example generated by -dataex-. To install: ssc install dataex
                      clear
                      input str8 date float(y x1 x2 x3)
                      "Jan-1966" -.018 .096 .169 .065
                      "Feb-1966" -.022 .096  .12 .055
                      "Mar-1966"   .02 .097 .099 .069
                      "Apr-1966" -.056 .094 .086 .066
                      "May-1966" -.016 .096 .058 .055
                      "Jun-1966" -.014 .092 .097 .077
                      "Jul-1966" -.081 .088  .09 .066
                      "Aug-1966" -.007 .084 .081 .059
                      "Sep-1966"  .046 .092  .05 .054
                      "Oct-1966"  .003 .088 .043 .013
                      end
                      gen mdate = monthly(date, "MY")
                      format mdate %tm
                      
                      isid mdate, sort
                      gen last_residual = .
                      tempvar holding
                      
                      local window 4
                      summ mdate, meanonly
                      local first_date = r(min) 
                      local last_date = r(max) - `window' + 1
                      
                      forvalues m = `first_date'/`last_date' {
                          regress y x1 x2 x3 if inrange(mdate, `m', `m'+`window'-1)
                          predict `holding', resid
                          replace last_residual = `holding' if mdate == `m' + `window' - 1
                          drop `holding'
                      }
                      Notes:

                      1. Dates represented as strings are nearly useless in Stata. So I created a Stata internal format numerical monthly date to work with.
                      2. This code uses a window of 4 months, just for demonstration. Change 4 to 60 when running with your real data.
                      3. This code assumes that there is only one observation for each month, and verifies this assumption with the -isid- command.


                      • #12
                        Thank you very much! I got my problem solved!


                        • #13
                          Hello Clyde,

                          I am using your my_regress code, but my variable X is the one-quarter lag of Y, L1.Y. The program stops with the error "factor variables and time-series operators not allowed
                          an error occurred when rolling executed my_regress." If I use [_n-1] instead, it says "weights not allowed".

                          How can I fix this?

                          Many thanks for your help
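
A possible fix, offered as a sketch rather than a confirmed answer from the thread: -syntax varlist- rejects time-series operators unless the varlist is declared with the ts option, and square brackets such as [_n-1] are parsed by -syntax- as a weight specification, which would explain the second message. The program name my_regress_ts, the variable q, and the toy quarterly data below are hypothetical.

```stata
* Toy quarterly series (hypothetical names q, y)
clear
set obs 40
set seed 99
gen q = tq(2005q1) + _n - 1
format q %tq
tsset q
gen y = rnormal()

capture program drop my_regress_ts
program define my_regress_ts, rclass
    syntax varlist(ts) [if]              // ts: allow operators such as L1.y
    quietly regress `varlist' `if'
    tempvar resid
    quietly predict double `resid' if e(sample), residuals
    quietly summarize `resid'
    return scalar sdr = r(sd)
end

* The lagged regressor is now accepted
my_regress_ts y L1.y
display r(sdr)
```

The data must be -tsset- or -xtset- before the lag operator can be used, and the same program can then be placed after the colon in a -rolling- call.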
