Rolling 24-month step-wise regression on panel data

Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#16

23 Sep 2017, 14:44

That's a bit clearer, but:

I want to identify the two best and worst returns of ex_ret and then check whether the risk factors also experience their best and worst returns in those exact months, i.e. whether they coincide.

How do you calculate the returns of the risk factors? Also, does the variable id represent a security?
Comment
john Abe

Join Date: Sep 2017

Posts: 70
#17

23 Sep 2017, 15:32

The returns of the risk factors are given (monthly time series). I shared them in one of my earlier posts, and Robert used some of the data to create sample code. And yes, that is correct: id represents a security.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#18

23 Sep 2017, 15:44

Sorry for being dense, but I don't see it. Which variables in which post give the returns of risk factors lnRVIX, R3, dax, and cac?
Comment

john Abe

Join Date: Sep 2017
Posts: 70

#19

23 Sep 2017, 15:48

Sorry. I am including a snapshot for a particular security (id), showing ex_ret and the returns of the risk factors.

id	year	month	newdate	Year	Month	lnRVIX	R3	dax	cac	ex_ret
28	1994	1	1/1/1994	1994	1	0.234746	0.030602	-0.03937	0.029462	-0.0645
28	1994	2	2/1/1994	1994	2	0.408289	-0.02416	-0.03944	-0.04119	-0.0162
28	1994	3	3/1/1994	1994	3	0.507571	-0.04372	0.019861	-0.06952	0.0803
28	1994	4	4/1/1994	1994	4	0.55013	0.011437	0.052913	0.040943	-0.0282
28	1994	5	5/1/1994	1994	5	0.270504	0.011018	-0.05266	-0.06005	0.0378
28	1994	6	6/1/1994	1994	6	0.398908	-0.02738	-0.04811	-0.05238	0.1099
28	1994	7	7/1/1994	1994	7	0.293058	0.030994	0.059891	0.107803	-0.0238
28	1994	9	9/1/1994	1994	9	0.244665	-0.02127	-0.09088	-0.09174	0.0237
28	1994	10	10/1/1994	1994	10	0.24218	0.016536	0.029765	0.014073	0.0159
28	1994	11	11/1/1994	1994	11	0.215568	-0.03649	-0.01128	0.037351	0.0493
28	1994	12	12/1/1994	1994	12	0.428878	0.015572	0.028473	-0.04798	0.0081
28	1995	1	1/1/1995	1995	1	0.249812	0.021911	-0.0405	-0.04389	0.0373
28	1995	2	2/1/1995	1995	2	0.143285	0.040796	0.040029	-0.01164	-0.0117
28	1995	4	4/1/1995	1995	4	0.162083	0.026146	0.048554	0.032451	0.0377
28	1995	5	5/1/1995	1995	5	0.153643	0.036328	0.037814	0.018387	-0.0084
28	1995	6	6/1/1995	1995	6	0.224073	0.028918	-0.00394	-0.02586	-0.1104
28	1995	7	7/1/1995	1995	7	0.157864	0.040155	0.06469	0.04051	0.0154
28	1995	8	8/1/1995	1995	8	0.179134	0.008876	0.00882	-0.01911	0.0101
28	1995	9	9/1/1995	1995	9	0.174786	0.038749	-0.02291	-0.05045	-0.0337
28	1995	10	10/1/1995	1995	10	0.156483	-0.00864	-0.00875	0.014334	0.0305
28	1995	11	11/1/1995	1995	11	0.147585	0.044351	0.034559	0.00837	-0.018
28	1996	1	1/1/1996	1996	1	0.312375	0.029025	0.09595	0.079892	0.0184
28	1996	2	2/1/1996	1996	2	0.297906	0.014751	0.00138	-0.01492	0.0065
28	1996	3	3/1/1996	1996	3	0.251314	0.010052	0.004981	0.027126	-0.0001
28	1996	4	4/1/1996	1996	4	0.317465	0.018961	0.007796	0.050968	-0.0456
28	1996	5	5/1/1996	1996	5	0.222875	0.025592	0.014989	-0.01189	-0.0579
28	1996	6	6/1/1996	1996	6	0.309375	-0.00323	0.007311	0.024326	0.1171
28	1996	8	8/1/1996	1996	8	0.313601	0.030336	0.028496	-0.01267	-0.0148

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#20

23 Sep 2017, 16:10

I see the risk factors themselves. But I do not see any variables for "returns of the risk factors." What am I missing here?
Comment
john Abe

Join Date: Sep 2017

Posts: 70
#21

23 Sep 2017, 16:19

The returns of the risk factors are in the table (in 1994/01, dax returned -.03937, etc.). For every id (security), ex_ret changes every month, but the returns of the risk factors remain the same over the history. Hope I am not confusing you again.
Comment
john Abe

Join Date: Sep 2017

Posts: 70
#22

23 Sep 2017, 16:28

To be precise, the returns of the risk factors do not depend on the individual security (id).
Comment
john Abe

Join Date: Sep 2017

Posts: 70
#23

23 Sep 2017, 16:30

So for every security, the return of risk factor "dax" on 1994/01 is -.03937.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#24

23 Sep 2017, 17:24

Oh, so those variables are the returns of the risk factors. They are not the risk factors themselves?

Anyway, so I think you want to do this:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id int year byte month float date int Year byte Month float(lnrvix r3 dax cac ex_ret)
28 1994  1 408 1994  1 .234746 .030602 -.03937 .029462 -.0645
28 1994  2 409 1994  2 .408289 -.02416 -.03944 -.04119 -.0162
28 1994  3 410 1994  3 .507571 -.04372 .019861 -.06952 .0803
28 1994  4 411 1994  4 .55013 .011437 .052913 .040943 -.0282
28 1994  5 412 1994  5 .270504 .011018 -.05266 -.06005 .0378
28 1994  6 413 1994  6 .398908 -.02738 -.04811 -.05238 .1099
28 1994  7 414 1994  7 .293058 .030994 .059891 .107803 -.0238
28 1994  9 416 1994  9 .244665 -.02127 -.09088 -.09174 .0237
28 1994 10 417 1994 10 .24218 .016536 .029765 .014073 .0159
28 1994 11 418 1994 11 .215568 -.03649 -.01128 .037351 .0493
28 1994 12 419 1994 12 .428878 .015572 .028473 -.04798 .0081
28 1995  1 420 1995  1 .249812 .021911 -.0405 -.04389 .0373
28 1995  2 421 1995  2 .143285 .040796 .040029 -.01164 -.0117
28 1995  4 423 1995  4 .162083 .026146 .048554 .032451 .0377
28 1995  5 424 1995  5 .153643 .036328 .037814 .018387 -.0084
28 1995  6 425 1995  6 .224073 .028918 -.00394 -.02586 -.1104
28 1995  7 426 1995  7 .157864 .040155 .06469 .04051 .0154
28 1995  8 427 1995  8 .179134 .008876 .00882 -.01911 .0101
28 1995  9 428 1995  9 .174786 .038749 -.02291 -.05045 -.0337
28 1995 10 429 1995 10 .156483 -.00864 -.00875 .014334 .0305
28 1995 11 430 1995 11 .147585 .044351 .034559 .00837 -.018
28 1996  1 432 1996  1 .312375 .029025 .09595 .079892 .0184
28 1996  2 433 1996  2 .297906 .014751 .00138 -.01492 .0065
28 1996  3 434 1996  3 .251314 .010052 .004981 .027126 -.0001
28 1996  4 435 1996  4 .317465 .018961 .007796 .050968 -.0456
28 1996  5 436 1996  5 .222875 .025592 .014989 -.01189 -.0579
28 1996  6 437 1996  6 .309375 -.00323 .007311 .024326 .1171
28 1996  8 439 1996  8 .313601 .030336 .028496 -.01267 -.0148
end
format %tm date

capture program drop program3
program define program3
    // IDENTIFY, FOR EACH OF lnrvix-cac, WHETHER OR NOT
    // ITS TWO HIGHEST AND TWO LOWEST VALUES COINCIDE WITH
    // THE TWO HIGHEST AND TWO LOWEST VALUES OF ex_ret
    local xlist lnrvix r3 dax cac
    sort ex_ret
    local size = _N
    foreach x of local xlist {
        gen test1 = sum(`x' < `x'[1])
        gen test2 = sum(`x' < `x'[2])
        gen test3 = sum(`x' > `x'[_N-1])
        gen test4 = sum(`x' > `x'[_N])
        gen `x'_coincide = (test1[_N] == 0 & test2[_N] <= 1 & test3[_N] <= 1 ///
            & test4[_N] == 0)
        drop test1 test2 test3 test4
    }
    exit
end

rangerun program3, interval(date -23 0) by(id)

First, I cleaned up your example data so that we have a real monthly date variable for date, not a string that looks like a daily date.

The logic of program3 is as follows. The data are sorted in order of ex_ret. Then for each of the risk factor returns (looped over as `x'), we do four tests. Test 1 asks whether the value of `x' when ex_ret is minimum is also the lowest value of `x'. If it is, there will be no value of `x' < `x'[1], and test1 will be 0 throughout. But if any observations have a value of `x' that is smaller than the first, then the value of test1 in the last observation will be the total number of such observations. Similarly, test2 looks at the second observation (corresponding to the second lowest value of ex_ret) and asks whether it is the second smallest value of `x' by counting the number of `x' values that are smaller. Similar reasoning applies to test3 and test4 with regard to the highest values.

If the two lowest values of `x' coincide with the two lowest values of ex_ret, and the two highest values of `x' coincide with the two highest values of ex_ret, then in the final observation test1 will be 0, test2 and test3 will be at most 1 (exactly 1 unless values are tied), and test4 will be 0. So we set `x'_coincide accordingly.

Note: This code will not perform correctly if ex_ret or any of the `x' variables contains missing values. There are other approaches that are more robust to missing values, but they require sorting the data, which will slow things down enormously. The approach here was taken specifically because you have a very large data set and need the code to run as quickly as we can figure out how to make it run.

All of that said, I don't quite grasp the logic of your approach. This correspondence of extreme values is a rather blunt way to see if there is a linear relationship that is being missed, and I think it will misclassify things in both directions. Wouldn't a simple Pearson or Spearman correlation be a better idea? Moreover, given that these four variables lnrvix, r3, dax, and cac are so strongly correlated with each other, I would certainly expect a stepwise regression to throw out things that, on their own, look quite strongly related when there is something else that can substitute for them. I have a lot of objections to stepwise regression, but that isn't one of them.
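As a minimal sketch of the correlation alternative mentioned above, assuming the example data from this post are in memory: the commands below compute whole-sample Pearson and Spearman correlations between ex_ret and each risk factor return. They are not rolling 24-month statistics and are not part of the original code.

```stata
* Sketch only: assumes the -dataex- example above has been loaded

* Pearson correlations of ex_ret with each risk factor return
correlate ex_ret lnrvix r3 dax cac

* Spearman rank correlations, one factor at a time
foreach x of varlist lnrvix r3 dax cac {
    spearman ex_ret `x'
}
```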

Anyway, I hope this proves helpful to you.

For your future example data posts, please install the -dataex- program from SSC (also by Robert Picard!). Run -help dataex- to read the instructions for using it, and make it your one and only way to show example data here on the forum. Using -dataex- makes it possible for those who are helping you to create a complete and faithful replica of your Stata example with a simple copy/paste operation.

Comment
john Abe

Join Date: Sep 2017

Posts: 70
#25

23 Sep 2017, 17:43

Hi Clyde,

Thanks, this is indeed very helpful. You make a very good point about strong correlation among factors. My data set includes factors from different asset classes, so it should not pose any serious problem. I included just a few factors for illustration purposes.

Regarding the correspondence of extreme values, I agree that it is a crude test. But the idea here is to understand whether securities are susceptible to certain factors in the tails of the distribution (left, right, or both) even though they may not show any significant linear dependence. This is the first step towards a more sophisticated model that will include non-linearity.

Best,
John.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#26

23 Sep 2017, 17:54

I see. Thanks for explaining that.
Comment
john Abe

Join Date: Sep 2017

Posts: 70
#27

23 Sep 2017, 18:06

Thinking through this a little more: you are right, looking at low-return and high-return coincidences separately is probably a better idea. It will be a very rare case that both the high and the low returns coincide.

I can think of five possible options:

1. both low returns coincide
2. only one low return coincides
3. both high returns coincide
4. only one high return coincides
5. no coincidence.

Also, it would be very useful to output the return of the security and the return of the risk factor when they coincide.
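The cases listed above could be separated with a variation on program3. The following is only a sketch under assumptions not confirmed in the thread: the program name program4 and the output variables (for example dax_low and dax_high) are hypothetical, and "coincide" is read here as "the month is among the two lowest (or two highest) values of the factor", which is a looser matching rule than the one in program3. Like program3, it is not expected to behave correctly when ex_ret or a factor contains missing values, and ties can inflate the counts.

```stata
capture program drop program4      // hypothetical name, not from the thread
program define program4
    local xlist lnrvix r3 dax cac
    sort ex_ret
    foreach x of local xlist {
        forvalues i = 1/2 {
            // how many factor values lie strictly below the factor value
            // observed in the i-th worst ex_ret month
            quietly count if `x' < `x'[`i']
            local lo`i' = r(N)
            // how many factor values lie strictly above the factor value
            // observed in the i-th best ex_ret month
            quietly count if `x' > `x'[_N + 1 - `i']
            local hi`i' = r(N)
        }
        // 0, 1 or 2: number of the two worst (best) ex_ret months that are
        // also among the two lowest (highest) values of the factor
        gen byte `x'_low  = (`lo1' <= 1) + (`lo2' <= 1)
        gen byte `x'_high = (`hi1' <= 1) + (`hi2' <= 1)
    }
end

rangerun program4, interval(date -23 0) by(id)
```

Under this reading, a value of 2 in dax_low corresponds to option 1, a value of 1 to option 2, and likewise for dax_high and options 3 and 4; option 5 is the case where both are 0.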
Comment
john Abe

Join Date: Sep 2017

Posts: 70
#28

30 Sep 2017, 06:33

Hi Clyde, Robert,

One follow-up question. I have successfully run the step-wise regressions, and the second time around I want to run a simple regression with the factors identified by step-wise (for each period and id) but add one constant factor.

Here's the code that I am using (I made a couple of changes). But it looks like the code ignores the existing factors and just takes the new additional factor in every regression.

// second regression with additional constant variable

capture program drop two
program define two
    local xlist R3 dax cac
    local retained
    foreach v of local xlist {
        if !missing(b_`v') {
            local retained `retained' `v'
        }
    }
    regress ex_ret additional_factor `retained'
    matrix M = r(table)
    gen adj_r2_5 = e(r2_a)
    gen nobs_5 = e(N)
    if e(N) >= 20 {
        foreach v in additional_factor retained {
            local c = colnumb(M, "`v'")
            if !missing(`c') {
                gen b5_`v' = M[1, `c']
                gen se5_`v' = M[2, `c']
                gen t5_`v' = M[3, `c']
                gen pw5_`v' = M[4, `c']
            }
        }
        exit
        else {
            drop _all
        }
    }
end
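A possible reading of the symptom, offered as a sketch rather than a confirmed diagnosis: the second loop iterates over the literal word retained instead of the contents of the local macro, so colnumb() returns missing for it and only the additional_factor results are stored, even though the regression itself includes the retained factors. In addition, the else block sits inside the if block after exit, so it can never run. The version below changes only those two points. It still assumes, as the posted code does, that variables b_R3, b_dax and b_cac from the first pass exist and are missing when a factor was dropped. Note that the if command evaluates b_`v' in the first observation in memory only, and whether that is the intended observation is not checked here.

```stata
capture program drop two
program define two
    local xlist R3 dax cac
    local retained
    foreach v of local xlist {
        // keep a factor only if its first-pass coefficient is non-missing
        // (evaluated in the first observation in memory)
        if !missing(b_`v') {
            local retained `retained' `v'
        }
    }
    regress ex_ret additional_factor `retained'
    matrix M = r(table)
    gen adj_r2_5 = e(r2_a)
    gen nobs_5 = e(N)
    if e(N) >= 20 {
        // loop over the contents of the macro, not the literal word
        foreach v in additional_factor `retained' {
            local c = colnumb(M, "`v'")
            if !missing(`c') {
                gen b5_`v' = M[1, `c']
                gen se5_`v' = M[2, `c']
                gen t5_`v' = M[3, `c']
                gen pw5_`v' = M[4, `c']
            }
        }
    }
    else {
        // too few observations: return no results for this window
        drop _all
    }
end
```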
Comment
