  • rolling and recursive regressions while storing fitted and residual values

    Dear all
    I am looking for two ways to calculate fitted values and residuals (the dependent variable is ret and the independent variables are x, y and z):
    1- a rolling 5-year regression:
    Here I want to run a regression using data from the most recent 5 years and calculate the fitted values and residuals, then move one year forward, drop the earliest year, and calculate them again. I understand that in this case I will only be able to calculate fitted values and residuals starting from the fifth year.

    In this option, can I do something like:
    rolling _b[constant] _b[x] _b[y] _b[z] , window(5) : reg ret x y z
    gen fitted=_b[_cons]+_b[x]+_b[y]+_b[z]
    gen residual= ret-fitted


    2- a recursive regression that adds one year each time
    Here I want to run a regression using data from the most recent 5 years, calculate the fitted values and residuals, then move one year forward WITHOUT dropping a year, i.e. the regression will subsequently be estimated over 6 years, then 7 years, then 8 years, and so on. I also want to calculate the fitted values and residuals each time the regression is estimated.

    In the second option, can I do something like:

    rolling _b[constant] _b[x] _b[y] _b[z] , window(5) recursive : reg ret x y z
    gen fitted=_b[_cons]+_b[x]+_b[y]+_b[z]
    gen residual= ret-fitted

    Thanks in advance

  • #2
    What is your question?

    By the way, I think you mean

    Code:
    gen fitted = _b[_cons] + _b[x]*x + _b[y]*y + _b[z]*z



    • #3
      My question is that I want to calculate fitted values and residuals by two different approaches:

      1- Using a 5-year rolling regression:
      Here I run a regression using data for the first 5 years in the sample (for example 1990 to 1994) and calculate the fitted values and residuals, then move one year forward and drop one year (i.e. from 1991 to 1995).

      2- Using a recursive regression that adds one year each time:
      Here I want to run a regression using data for the first 5 years in the sample (for example 1990 to 1994), calculate the fitted values and residuals, then move one year forward WITHOUT dropping a year (i.e. from 1990 to 1995, then 1990 to 1996, and so on).

      Can you help please?



      • #4
        Is there an efficient way to get the fitted and residual values directly rather than estimating each coefficient, multiplying by the variables and adding them up? I have more variables than those here.



        • #5
          Hi again
          I revised the first option as follows:
          rolling _b[_cons] _b[x] _b[y] _b[z] _b[g] _b[d], window(5) : reg ret x y z g d
          gen fitted=_b[_cons]+_b[x]*x+ _b[y]*y+ _b[z]*z+ _b[g]*g+ _b[d]*d
          gen residual= ret-fitted

          I get the following error message:
          (running regress on estimation sample)
          no; data in memory would be lost
          r(4);


          Can anyone help please?



          • #6
            I have also tried :

            gen RES=.
            capture program drop my_regress
            program define my_regress, rclass
            syntax varlist [if]
            regress `varlist' `if'
            tempvar resid
            predict `resid' if e(sample), resid
            replace RES=`resid'
            exit
            end


            rolling RES, window(5)clear: my_regress ret x y z g d


            The programme starts to run but produces the following:

            Rolling replications (19)
            1 ---+--- 2 ---+ -- 3 ---+--- 4 ---+--- 5
            eeeeeeeeeeeeeeeeeee

            -> permno = 10002

            Rolling replications
            1 ---+--- 2 ---+ -- 3 ---+--- 4 ---+--- 5


            -> permno = 10010

            Rolling replications
            1 ---+--- 2 ---+ -- 3 ---+--- 4 ---+--- 5


            -> permno = 10011

            Rolling replications (2)
            1 ---+--- 2 ---+ -- 3 ---+--- 4 ---+--- 5
            ee




            I appreciate any help please!!
            Thanks




            • #7
              With regard to #5, -rolling- gives you two options as to what to do with its results. If you want them to replace the data in memory, you have to specify the -clear- option. If you want to save them to a Stata .dta file and leave the data in memory intact, then you have to specify the -saving()- option. If you specify neither you get the error message you found.
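
              As a minimal sketch of those two options, using the variables from #5: this assumes the data have already been declared as a panel with -xtset-, and betas is just a hypothetical file name. Use one or the other, not both.

              Code:
              * option A: leave the data in memory intact and write the
              * rolling coefficients to a file (betas is a hypothetical name)
              rolling _b, window(5) saving(betas, replace): regress ret x y z g d

              * option B: replace the data in memory with the rolling results
              rolling _b, window(5) clear: regress ret x y z g d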

              With regard to #6, it is hard to know exactly what is going wrong. But I see at least one error. You can't have -gen RES = .- in your program like that because after the first time my_regress gets called, RES will already exist, so the -gen RES = .- command will throw an error. The simplest way to get around that would be to have -gen RES = .- in the code before you call -rolling-, and then use only -replace RES = ...- in your program.

              Unfortunately, that simplest way won't work for your purposes because RES will get written over with each successive rolling window, so you will not be left with the residuals you want. Actually, I'm not entirely sure what you want with residuals. After all each regression generates a residual for every observation, and each observation will participate in five different regressions (more or less), or even in more if you use the recursive window approach. But based on the code in your original post, I'm inferring that what you want is for each observation to keep the residual for the regression in which it serves as the last observation. If true, painful as it may seem, I think the most efficient way to go is your original approach: calculate that linear combination directly and subtract it from the observed value. The alternative is to modify my_regress to include -predict if e(sample)- and then identify the last residual, return that from my_regress, and have your rolling command pick that up. So overall it would look something like this (using the grunfeld data set as an example).

              Code:
              clear*
              
              capture program drop my_regress
              program define my_regress, sortpreserve rclass
                  syntax [if]
                  regress mvalue invest `if'
                  tempvar r
                  predict `r' if e(sample), resid
                  tempvar in_sample
                  gen byte `in_sample' = e(sample)
                  sort `in_sample', stable
                  return scalar residual = `r'[_N]
                  exit
              end
              
              webuse grunfeld, clear
              regress mvalue invest in 1/5
              predict resid, resid
              list resid in 1/10
              
              
              replace resid = .
              rolling _b r(residual), window(5) keep(resid) clear: my_regress

              Finally, with regard to #1 and #3, I still don't understand what your question is. You have made a number of declarative statements. The only question you ask is "Can you help please?" Help with what?



              • #8
                Dear Clyde
                Many thanks for your assistance. I am trying to understand the code.
                Could you please clarify why you use
                regress mvalue invest in 1/5 .... why -in 1/5-? Is this related to the window length as well?
                list resid in 1/10 ........... why is this needed? We do not want any specific statistics afterwards, right?



                • #9
                  Oh, sorry. The part
                  Code:
                  regress mvalue invest in 1/5
                  predict resid, resid
                  list resid in 1/10
                  
                  replace resid = .
                  was just in there while I was testing it. It isn't necessary; I meant to edit it out before posting. You just need program my_regress and the -rolling- command. Sorry for the confusion.



                  • #10
                    Dear Clyde

                    I ran the code after removing the parts you suggested, as follows:

                    **
                    capture program drop my_regress
                    program define my_regress, sortpreserve rclass
                    syntax [if]
                    regress annual_ret x y z `if'
                    tempvar r
                    predict `r' if e(sample), resid
                    tempvar in_sample
                    gen byte `in_sample' = e(sample)
                    sort `in_sample', stable
                    return scalar residual = `r'[_N]
                    exit
                    end

                    rolling _b r(residual), window(5) keep(resid) clear: my_regress

                    **

                    But I got the error message:

                    keep() invalid: resid does not exist



                    • #11
                      Mike, sorry.

                      When I worked on your problem, I first tried an approach that involved saving the targeted residual in the original data set. I ultimately decided it was unworkable and opted instead for the approach in the version of -my_regress- that I posted, returning the targeted residual in r(). I needed to then purge the code of all references to the first approach, but apparently I was sloppy in the way I did it. I'm sorry for confusing you and delaying your work. Just remove the -keep(resid)- option from the -rolling- command.
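
                      With that option removed, the command would look something like this (a sketch only, not re-tested on your data; it assumes -my_regress- is defined as in #10 and the data are -xtset-):

                      Code:
                      * same call as before, minus keep(resid);
                      * -clear- replaces the data in memory with the rolling results
                      rolling _b r(residual), window(5) clear: my_regress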



                      • #12
                        Thank You Clyde
                        I have been trying to run the code again on my panel data (more than 60,000 firm-years). It has been running for hours....
                        I think there should be a more efficient way to run a loop that produces the same results, but I do not know how to do it.

                        Do you or other members have any suggestions about a more efficient loop?

                        Best wishes



                        • #13
                          Hi again
                          I left the program running and it has taken hours so far. The problem is that even if it works properly, it will be very time-consuming once I re-run it with different sets of variables, as I aim to do.
                          Can a foreach or forvalues loop help solve this problem and make the code faster?
                          Thanks



                          • #14
                            In fact, based on a simple calculation from the number of panels and the time Stata takes to "roll" through each panel, I would need at least 70 hours to get this done for a single set of variables!



                            • #15
                              Take a look at rangestat (from SSC). Here are a few recent examples of performing regressions over a rolling window:

                              http://www.statalist.org/forums/foru...rolling-window
                              http://www.statalist.org/forums/foru...ow-regressions

                              In the second example, rangestat can perform over 3 million regressions a minute.
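
                              For the variables in this thread, a minimal sketch might look like the following. It assumes the firm identifier is permno (as in the output in #6) and that the time variable is called year, which is a guess. The result names (b_x, b_cons, reg_nobs) are what I understand rangestat's (reg) statistic to create; check -help rangestat- before relying on them.

                              Code:
                              * rangestat is community-contributed; install it once from SSC
                              ssc install rangestat

                              * rolling window: for each observation, regress over the same
                              * firm's observations with year in [year-4, year]
                              rangestat (reg) ret x y z, interval(year -4 0) by(permno)

                              * recursive alternative (use instead of the line above): a missing
                              * lower bound should leave the window open-ended
                              * rangestat (reg) ret x y z, interval(year . 0) by(permno)

                              * fitted value and residual for the last observation of each window;
                              * use reg_nobs >= 5 instead for the recursive version
                              gen fitted   = b_cons + b_x*x + b_y*y + b_z*z if reg_nobs == 5
                              gen residual = ret - fitted

                              Because the coefficients end up as ordinary variables alongside the original data, the fitted values and residuals come from a single -gen- each, with no looping over panels.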

