Rolling Regression in STATA

Mohammad Khodadadi

Join Date: May 2017

Posts: 19
#1

04 May 2017, 11:12

Hi,

I have a panel and want to run a rolling regression. Assume that I have dependent variable Y and independent variable X each of which has T time series observations. At each point of time (say t), I want to only consider the observations before t and run a regression. I want to repeat this for all the T observations. How is it possible?

Thanks for your help in advance.

Best,
Mohammad
Nick Cox

Join Date: Mar 2014

Posts: 35468
#2

04 May 2017, 11:33

Several ways to do this. Here is one using rangestat (SSC) by Robert Picard and friends.

See e.g. http://www.statalist.org/forums/foru...updated-on-ssc

Code:

webuse grunfeld, clear
rangestat (reg) mvalue invest, interval(year . -1)
collapse reg_* b_* se_*, by(year)
list in 2/L

You didn't mention any panel structure, but adding a by() option to the above would produce separate regressions for each panel.
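As a hedged illustration of the by() suggestion, the sketch below runs the same rangestat call separately within each company of the grunfeld data. It assumes rangestat has been installed from SSC; listing only the first five rows is just a choice made here for illustration.

```stata
* Assumes rangestat is installed (ssc install rangestat)
webuse grunfeld, clear

* Within each company, regress mvalue on invest using only earlier years
rangestat (reg) mvalue invest, interval(year . -1) by(company)

* The earliest years of each company have too few prior observations,
* so their results are missing
list company year reg_* b_* in 1/5
```

With by(company) the results are attached to each original observation, so no collapse step is needed here.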

Naturally there is always the question of what sample size is needed for plausible results. Or indeed what makes substantive sense; my example is just to show some technique.
Mohammad Khodadadi

Join Date: May 2017

Posts: 19
#3

04 May 2017, 11:39

Dear Nick,

Thanks a lot for your help.

Best,
Mohammad
Mohammad Khodadadi

Join Date: May 2017

Posts: 19
#4

06 May 2017, 11:15

Dear Nick Cox,

Following your helpful post, I wrote this code to do a rolling WLS instead of OLS:

clear
webuse grunfeld, clear

rangestat (last) time (first) time (count) invest, ///
interval(year . 0) by(company)

rangestat (reg) mvalue invest [aweight = exp(-abs(time_last-time))/sum(exp(-abs(time_last-time)))] ///
interval(year . 0) by(company)

However, Stata shows me this error: "weights not allowed"! :-( I have to run the above rolling WLS, but if weights are not allowed in rangestat, how can I run it?

Attached is the regression that I must run. R is dep var and u is indep var.

Thanks for your help in advance.

Best,
Mohammad
Nick Cox

Join Date: Mar 2014

Posts: 35468
#5

06 May 2017, 12:08

rangestat (SSC) does not support weights. You need some other program, perhaps your own code. rolling may help, but at a minimum I guess you have to calculate your weights in advance.
Clyde Schechter

Join Date: Apr 2014

Posts: 29968
#6

06 May 2017, 12:35

At the least you will need to write a program that calculates the weights (which change from window to window) and runs the regression, and then have -rolling- iterate that. If your data set is large, this is going to be very slow.

But my main point here is to point out two problems with your -aweight- specification. First, you can't specify an -aweight- as an expression: you have to calculate the expression as a variable and specify that variable in your -aweight- syntax. Second, the expression you show in #4 is incorrect in two ways: A) the -sum()- in the denominator is the wrong function for a fixed total, since it gives a running sum, and B) it doesn't include the h = log(2)/60 factor that appears in the formula you appear to be trying to emulate.
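A minimal sketch of both points for a single window, not a full rolling solution. The assumptions are mine: only company 1 of the grunfeld data is used, the window ends in 1954, h is set to log(2)/60 as mentioned above, and the variable names raw, running, total and w are made up for illustration.

```stata
webuse grunfeld, clear
keep if company == 1

local h = ln(2)/60
local T = 1954

* The weight expression has to be stored in a variable first
gen double raw = exp(-abs(`T' - year) * `h') if year <= `T'

* sum() is a running (cumulative) sum: a different value on every row
gen double running = sum(raw)

* egen's total() gives the same fixed total on every row
egen double total = total(raw)

gen double w = raw / total

* The weight is now a variable, so -aweight- accepts it
regress mvalue invest [aweight = w]
```

Only the last value of running agrees with total, which is why sum() cannot stand in for the denominator.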

Nick Cox

Join Date: Mar 2014

Posts: 35468
#7

08 May 2017, 10:22

As I understand it a term exp(h) factors out of the weights as occurring on both top and bottom. That being so, this may help.

Code:

clear
webuse grunfeld, clear

gen a = .
gen b = .
gen w = .
gen n = .

quietly forval T = 1936/1954 {
    replace w = exp(-(`T' - year)) if `T' > year
    forval j = 1/10 {    
        capture {
            regress mvalue invest [aweight=w] if company == `j'
            replace a = _b[_cons] if company == `j' & year == `T'
            replace b = _b[invest] if company == `j' & year == `T'
            replace n = e(N) if company == `j' & year == `T'
        }
    }
}


Mohammad Khodadadi

Join Date: May 2017

Posts: 19
#8

08 May 2017, 12:20

Dear Nick and Clyde,

Thanks a lot for all the help and time. I am working on it and will come back if I have any other questions.

Best,
Mohammad
Clyde Schechter

Join Date: Apr 2014

Posts: 29968
#9

08 May 2017, 15:33

"As I understand it a term exp(h) factors out of the weights as occurring on both top and bottom. That being so, this may help." (quoted from #7)

I don't think that's correct here. The h is a factor within the exp() as shown in #4. So what we have is of the form exp(Q*h)/Sum(exp(Q*h)), but neither the numerator nor the denominator is itself a direct multiple of exp(h). You can rewrite exp(Q*h), not as exp(Q)*exp(h), nor as anything else * exp(h), but only as [exp(Q)]^h or [exp(h)]^Q, neither of which leads to anything that can be removed as common to the numerator and denominator.

It is true that when the aweights are used, Stata will automatically rescale them, so that division by that sum, which does nothing other than normalizing the weights so they sum to 1, serves no purpose. And Nick's code appropriately ignores the sum in the denominator.
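To see why the normalization does not matter for the coefficients, here is the usual weighted least-squares formula with a diagonal weight matrix W. The notation (X, y, W, c) is introduced here for illustration only and does not come from the attachment in #4.

```latex
\hat{\beta}(W) = (X^\top W X)^{-1} X^\top W y,
\qquad
\hat{\beta}(cW) = (c\,X^\top W X)^{-1}\, c\,X^\top W y = \hat{\beta}(W),
\quad c > 0 .
```

Dividing every weight by their sum is the case where c is one over that sum, so the point estimates are unchanged.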

But I do believe that in #7

Code:

replace w = exp(-(`T' - year)) if `T' > year

should be

Code:

replace w = exp(-(`T' - year)*h) if `T' > year

, where h must first be appropriately defined as log(2)/60. (Probably best done as a scalar or local macro, rather than a "variable" that's actually constant.)

Nick Cox

Join Date: Mar 2014
Posts: 35468

#10

08 May 2017, 17:18

Clyde's right. The weights must be calculated with the extra factor.

Code:

clear
webuse grunfeld, clear

gen a = .
gen b = . 
gen double w = .
gen n = .

quietly forval T = 1936/1954 {
    replace w = exp(-(`T' - year) * log(2)/60) if `T' > year
    forval j = 1/10 {    
        capture {
            regress mvalue invest [aweight=w] if company == `j'
            replace a = _b[_cons] if company == `j' & year == `T'
            replace b = _b[invest] if company == `j' & year == `T'
            replace n = e(N) if company == `j' & year == `T'
        }
    }
}


Mohammad Khodadadi

Join Date: May 2017

Posts: 19
#11

16 May 2017, 17:59

Thanks a lot. (Nick Cox Clyde Schechter)
You are right and I used the correct term when I wrote the code. I am running the code now, but the problem is that it is too time-consuming!!! I have more than 4,000 firms (panel variable) and 400 time-points for each one (time series). That is, approximately, I have 1,600,000 observations (=rows).

I know that in MATLAB we can use parallel computing to increase the speed. However, I am a beginner in Stata, so unfortunately I have no idea how to cope with this problem. Is there a more efficient way to code this (for example, using other commands or structures)? Or can I make the current code faster?

Thanks in advance for all your help.

Last edited by Mohammad Khodadadi; 16 May 2017, 18:03.
Nick Cox

Join Date: Mar 2014

Posts: 35468
#12

16 May 2017, 18:04

It could be done faster, I guess. In a big dataset, if is expensive in time and could perhaps be turned into in.
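One possible reading of that suggestion, sketched on the grunfeld data rather than the real dataset: an if qualifier is evaluated for every observation in memory, while in restricts a command to a range of rows. If the data are sorted so that each panel is one contiguous block, each window can be addressed by row numbers. The names obsno, first, last and prev are illustrative, and whether this is actually faster on a given dataset would need timing.

```stata
webuse grunfeld, clear
* Each company must occupy one contiguous block of rows
sort company year
gen long obsno = _n
gen double w = .
gen double b = .
local h = ln(2)/60

levelsof company, local(ids)
foreach j of local ids {
    * One -if- per panel to find its rows, not one per window
    summarize obsno if company == `j', meanonly
    local first = r(min)
    local last  = r(max)

    * Start once there are three earlier observations in the panel
    forvalues i = `=`first' + 3'/`last' {
        local prev = `i' - 1
        * Weights for the window ending just before row i
        quietly replace w = exp(-(year[`i'] - year) * `h') in `first'/`prev'
        capture regress mvalue invest in `first'/`prev' [aweight = w]
        if _rc == 0 quietly replace b = _b[invest] in `i'
    }
}
```

Stale weights outside the current row range are harmless here because the regression is restricted to the same rows with in.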

Sorry, but no interest in working further at this in practice.

Last edited by Nick Cox; 16 May 2017, 18:34.
Mohammad Khodadadi

Join Date: May 2017

Posts: 19
#13

17 May 2017, 02:32

Thanks for your help Nick. :-)
Nick Cox

Join Date: Mar 2014

Posts: 35468
#14

17 May 2017, 02:57

If you want help, then show your real code, which won't be #10! It's possible that something you did makes it extra slow or that your real code will inspire someone.
Mohammad Khodadadi

Join Date: May 2017

Posts: 19
#15

17 May 2017, 08:09

Dear Nick,

Below is my code. :-) It does not allow me to attach my data! I don't know why! However, if you know how I can, I would be happy to attach the data as well.

clear all
cd "C:\Users\md_kh\Dropbox\ECF\Codes"
use "stock_inflation_data.dta", clear

* create "time" variable which is consecutive date of observations
by permno: gen time = _n
order permno firm date time dur shrout price exret inflation

* variable related to the regressions
gen a = .
gen b = .
gen w = .
gen n = .
gen w1 = .

* finding numbers of firms
by permno, sort: gen nvals = _n == 1
replace nvals = sum(nvals) /* the last value is the number of distinct permnos */
scalar firm_number = nvals[_N]
drop nvals

quietly forval j = 1/`=firm_number' {

    * finding max and min of time for each firm j (min for all of them is 1)
    egen max_tm = max(time) if firm == `j'
    egen maxtm = max(max_tm)
    drop max_tm

    egen min_tm = min(time) if firm == `j'
    egen mintm = max(min_tm)
    drop min_tm

    forval T = `=mintm'/`=maxtm' {
        rangestat (count) exret if (time <= `T') & (time >= `T'-12) & firm ==`j', interval(time -12 0) excludeself
        summarize exret_count
        if r(max) < 4 {
            drop exret_count
            continue
        }
        replace w1 = exp(-(`T' - time)* log(2)/60) if `T' > time & firm == `j'
        egen w_s = total(w1) if firm == `j' & `T' > time
        replace w = w1/w_s if firm == `j' & `T' > time

        capture {
            regress exret inflation [aweight=w] if firm == `j'
            replace a = _b[_cons] if firm == `j' & time == `T'
            replace b = _b[inflation] if firm == `j' & time == `T'
            replace n = e(N) if firm == `j' & time == `T'
        }
        drop w_s exret_count

    }
    drop maxtm mintm

}

save "beta.dta"