  • Rolling Regression in STATA

    Hi,

    I have a panel and want to run a rolling regression. Assume that I have a dependent variable Y and an independent variable X, each of which has T time-series observations. At each point in time (say t), I want to consider only the observations before t and run a regression. I want to repeat this for all T observations. How can I do this?

    Thanks for your help in advance.

    Best,
    Mohammad

  • #2
    Several ways to do this. Here is one using rangestat (SSC) by Robert Picard and friends.

    See e.g. http://www.statalist.org/forums/foru...updated-on-ssc


    Code:
    webuse grunfeld, clear
    rangestat (reg) mvalue invest, interval(year . -1)
    collapse reg_* b_* se_*, by(year) 
    list in 2/L
    You didn't mention any panel structure, but adding a by() option to the above would produce separate regressions for each panel.

    Naturally there is always the question of what sample size is needed for plausible results, or indeed of what makes substantive sense; my example is just to show some technique.
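A minimal sketch of the by() variant mentioned above, assuming rangestat has been installed from SSC and using the same grunfeld example data. With by(company), each observation's regression uses only earlier years of the same company. The listed result variables are the names rangestat's (reg) statistic is expected to create.

```stata
* Assumption: rangestat is installed (ssc install rangestat)
webuse grunfeld, clear

* Expanding window per company: for each observation, regress mvalue on
* invest using all earlier years (year <= current year - 1) of that company
rangestat (reg) mvalue invest, interval(year . -1) by(company)

* Inspect the results for one company; the first year has no earlier
* observations, so its results are missing
list company year reg_nobs b_invest b_cons if company == 1
```

The collapse step from the original example is not needed here, because the results are already one row per company and year.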



    • #3
      Dear Nick,

      Thanks a lot for your help.

      Best,
      Mohammad



      • #4
        Dear Nick Cox,

        Following your helpful post, I wrote this code to do a rolling WLS instead of OLS:

        clear
        webuse grunfeld, clear

        rangestat (last) time (first) time (count) invest, ///
        interval(year . 0) by(company)

        rangestat (reg) mvalue invest [aweight = exp(-abs(time_last-time))/sum(exp(-abs(time_last-time)))] ///
        interval(year . 0) by(company)

        However, Stata gives me this error: "weights not allowed"! :-( I have to run the above rolling WLS, but if weights are not allowed in rangestat, how can I run it?


        Attached is the regression that I must run. R is dep var and u is indep var.
        [Attached image: WLS.jpg]

        Thanks for your help in advance.

        Best,
        Mohammad



        • #5
          rangestat (SSC) does not support weights. You need some other program, perhaps your own code. rolling may help, but at a minimum I guess you have to calculate your weights in advance.



          • #6
            At the least you will need to write a program that calculates the weights (which change from window to window) and runs the regression, and then have -rolling- iterate that. If your data set is large, this is going to be very slow.

            But my main point here is to point out two problems with your -aweight- specification. First, you can't specify an -aweight- as an expression: you have to calculate the expression as a variable and specify that variable in your -aweight- syntax. Second, the expression you show in #4 is incorrect in two ways: A) syntactically, the -sum()- in the denominator is the wrong function for a fixed total, since it gives a running sum, and B) it doesn't include the h = log(2)/60 factor that appears in the formula you appear to be trying to emulate.
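The two syntax points above can be illustrated with a small sketch on the grunfeld example data. The target year 1954 and the variable names w, running, and w_total are chosen here purely for illustration; h = log(2)/60 is the factor mentioned above.

```stata
webuse grunfeld, clear

* Decay factor mentioned in the thread
local h = log(2)/60
* Illustrative target year (an assumption, not from the thread)
local T = 1954

* The weight must be a variable; an expression inside [aweight = ...] is not allowed
gen double w = exp(-`h' * (`T' - year)) if year < `T'

* sum() is a running sum, so it changes from row to row within a company
bysort company (year): gen double running = sum(w)

* egen's total() gives the fixed total, constant within each company
egen double w_total = total(w), by(company)

list year w running w_total if company == 1

* Pass the weight variable, not an expression, to aweight
regress mvalue invest [aweight = w] if company == 1
```

In the listing, running only matches w_total once the last weighted year has been reached, which is the difference between a running sum and a fixed total.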



            • #7
              As I understand it a term exp(h) factors out of the weights as occurring on both top and bottom. That being so, this may help.

              Code:
              clear
              webuse grunfeld, clear
              
              gen a = .
              gen b = .
              gen w = .
              gen n = .
              
              quietly forval T = 1936/1954 {
                  replace w = exp(-(`T' - year)) if `T' > year
                  forval j = 1/10 {    
                      capture {
                          regress mvalue invest [aweight=w] if company == `j'
                          replace a = _b[_cons] if company == `j' & year == `T'
                          replace b = _b[invest] if company == `j' & year == `T'
                          replace n = e(N) if company == `j' & year == `T'
                      }
                  }
              }



              • #8
                Dear Nick and Clyde,

                Thanks a lot for all the help and time. I am working on it and will come back if I have any other questions.

                Best,
                Mohammad
                Nick Cox Clyde Schechter



                • #9
                  Quoting #7: "As I understand it a term exp(h) factors out of the weights as occurring on both top and bottom. That being so, this may help."

                  I don't think that's correct here. The h is a factor within the exp() as shown in #4. So what we have is of the form exp(Q*h)/Sum(exp(Q*h)), but neither the numerator nor the denominator is itself a direct multiple of exp(h). You can rewrite exp(Q*h), not as exp(Q)*exp(h), nor as anything else * exp(h), but only as [exp(Q)]^h or [exp(h)]^Q, neither of which leads to anything that can be removed as common to the numerator and denominator.

                  It is true that when the aweights are used, Stata will automatically rescale them, so that division by that sum, which does nothing other than normalizing the weights so they sum to 1, serves no purpose. And Nick's code appropriately ignores the sum in the denominator.

                  But I do believe that in #7
                  Code:
                  replace w = exp(-(`T' - year)) if `T' > year
                  should be
                  Code:
                  replace w = exp(-(`T' - year)*h) if `T' > year
                  , where h must first be appropriately defined as log(2)/60. (Probably best done as a scalar or local macro, rather than a "variable" that's actually constant.)
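For reference, the weights being discussed can be written out as below. This is reconstructed from the code in #4 and #7 and the description in #6, not copied from the attached image, so treat the exact form as an assumption; the symbols w_t, T, and s are notation introduced here.

```latex
% Weight of observation t in the regression for target date T (assumed form)
w_t(T) = \frac{\exp\bigl(-h\,(T - t)\bigr)}{\sum_{s < T} \exp\bigl(-h\,(T - s)\bigr)},
\qquad h = \frac{\log 2}{60}

% h sits inside the exponent, so it acts as a power, not as a common factor
\exp\bigl(-h\,(T - t)\bigr) = \bigl[\exp\bigl(-(T - t)\bigr)\bigr]^{h}
\neq \exp\bigl(-(T - t)\bigr) \cdot \exp(h)

% Half-life reading of h: an observation 60 periods old gets half the weight
\exp(-60\,h) = \exp(-\log 2) = \tfrac{1}{2}
```

The denominator is the same for every observation in a given window, which is why leaving it out does not change an aweight regression.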



                  • #10
                    Clyde's right. The weights must be calculated with the extra factor.

                    Code:
                    clear
                    webuse grunfeld, clear
                    
                    gen a = .
                    gen b = . 
                    gen double w = .
                    gen n = .
                    
                    quietly forval T = 1936/1954 {
                        replace w = exp(-(`T' - year) * log(2)/60) if `T' > year
                        forval j = 1/10 {    
                            capture {
                                regress mvalue invest [aweight=w] if company == `j'
                                replace a = _b[_cons] if company == `j' & year == `T'
                                replace b = _b[invest] if company == `j' & year == `T'
                                replace n = e(N) if company == `j' & year == `T'
                            }
                        }
                    }



                    • #11
                      Thanks a lot. (Nick Cox Clyde Schechter)
                      You are right and I used the correct term when I wrote the code. I am running the code now, but the problem is that it is too time-consuming!!! I have more than 4,000 firms (panel variable) and 400 time-points for each one (time series). That is, approximately, I have 1,600,000 observations (=rows).

                      I know that in MATLAB we can use parallel computing to increase the speed. However, I am a beginner in Stata, so, unfortunately, I have no idea how to cope with this problem. Is there a more efficient way to code this (for example, using other commands or structures)? Or can I make the current code more efficient in terms of execution speed?

                      Thanks in advance for all your help.
                      Last edited by Mohammad Khodadadi; 16 May 2017, 18:03.



                      • #12
                        It could be done faster, I guess. In a big dataset, -if- is expensive in time and could perhaps be turned into -in-.

                        Sorry, but no interest in working further at this in practice.
                        Last edited by Nick Cox; 16 May 2017, 18:34.
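A rough sketch of the if-to-in idea, applied to the code in #10 on the grunfeld example data. It assumes the data are sorted by company and year so that each company occupies a contiguous block of rows; the variable obs and the local macros first, last, start, and prev are names introduced here. Whether this is actually faster on a large dataset has not been measured.

```stata
webuse grunfeld, clear
sort company year

gen a = .
gen b = .
gen n = .
gen double w = .
gen long obs = _n
local h = log(2)/60

quietly forval j = 1/10 {
    * Row range occupied by company j (contiguous after sorting)
    summarize obs if company == `j', meanonly
    local first = r(min)
    local last  = r(max)
    local start = `first' + 1

    forval i = `start'/`last' {
        * Window: all earlier rows of the same company
        local prev = `i' - 1
        replace w = exp(-`h' * (year[`i'] - year)) in `first'/`prev'
        capture regress mvalue invest [aweight = w] in `first'/`prev'
        if _rc == 0 {
            replace a = _b[_cons]  in `i'
            replace b = _b[invest] in `i'
            replace n = e(N)       in `i'
        }
    }
}
```

The -if- qualifier is still used once per company to find its row range, but the inner loop, which runs far more often, touches only the rows named by -in-.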



                        • #13
                          Thanks for your help Nick. :-)



                          • #14
                            If you want help, then show your real code, which won't be #10! It's possible that something you did makes it extra slow or that your real code will inspire someone.



                            • #15
                              Dear Nick,

                              Below is my code. :-) It does not allow me to attach my data! I don't know why! However, if you know how I can, then I have no problem attaching the data as well.



                              clear all
                              cd "C:\Users\md_kh\Dropbox\ECF\Codes"
                              use "stock_inflation_data.dta", clear

                              * create "time" variable which is consecutive date of observations
                              by permno: gen time = _n
                              order permno firm date time dur shrout price exret inflation

                              * variable related to the regressions
                              gen a = .
                              gen b = .
                              gen w = .
                              gen n = .
                              gen w1 = .

                              * finding numbers of firms
                              by permno, sort: gen nvals = _n == 1
                              replace nvals = sum(nvals) /* the last value is the number of distinct permnos */
                              scalar firm_number = nvals[_N]
                              drop nvals

                              quietly forval j = 1/`=firm_number' {

                                  * finding max and min of time for each firm j (min for all of them is 1)
                                  egen max_tm = max(time) if firm == `j'
                                  egen maxtm = max(max_tm)
                                  drop max_tm

                                  egen min_tm = min(time) if firm == `j'
                                  egen mintm = max(min_tm)
                                  drop min_tm

                                  forval T = `=mintm'/`=maxtm' {
                                      rangestat (count) exret if (time <= `T') & (time >= `T'-12) & firm ==`j', interval(time -12 0) excludeself
                                      summarize exret_count
                                      if r(max) < 4 {
                                          drop exret_count
                                          continue
                                      }
                                      replace w1 = exp(-(`T' - time)* log(2)/60) if `T' > time & firm == `j'
                                      egen w_s = total(w1) if firm == `j' & `T' > time
                                      replace w = w1/w_s if firm == `j' & `T' > time

                                      capture {
                                          regress exret inflation [aweight=w] if firm == `j'
                                          replace a = _b[_cons] if firm == `j' & time == `T'
                                          replace b = _b[inflation] if firm == `j' & time == `T'
                                          replace n = e(N) if firm == `j' & time == `T'
                                      }
                                      drop w_s exret_count
                                  }
                                  drop maxtm mintm
                              }

                              save "beta.dta"

