  • #16
    That's a bit clearer, but:

    I want to identify the two best and worst returns of ex_ret and then check if the risk factors also experience the best and worst returns on those exact months, i.e. if they coincide.
    How do you calculate the returns of the risk factors? Also, does the variable id represent a security?

    • #17
      The returns of the risk factors are given (monthly time series); I shared them in one of my earlier posts, and Robert used some of the data to create sample code. And yes, id represents a security.

      • #18
        Sorry for being dense, but I don't see it. Which variables in which post give the returns of risk factors lnRVIX, R3, dax, and cac?

        • #19
          Sorry. I am including a snapshot for a particular security (id), showing ex_ret and the returns of the risk factors.
          id year month newdate Year Month lnRVIX R3 dax cac ex_ret
          28 1994 1 1/1/1994 1994 1 0.234746 0.030602 -0.03937 0.029462 -0.0645
          28 1994 2 2/1/1994 1994 2 0.408289 -0.02416 -0.03944 -0.04119 -0.0162
          28 1994 3 3/1/1994 1994 3 0.507571 -0.04372 0.019861 -0.06952 0.0803
          28 1994 4 4/1/1994 1994 4 0.55013 0.011437 0.052913 0.040943 -0.0282
          28 1994 5 5/1/1994 1994 5 0.270504 0.011018 -0.05266 -0.06005 0.0378
          28 1994 6 6/1/1994 1994 6 0.398908 -0.02738 -0.04811 -0.05238 0.1099
          28 1994 7 7/1/1994 1994 7 0.293058 0.030994 0.059891 0.107803 -0.0238
          28 1994 9 9/1/1994 1994 9 0.244665 -0.02127 -0.09088 -0.09174 0.0237
          28 1994 10 10/1/1994 1994 10 0.24218 0.016536 0.029765 0.014073 0.0159
          28 1994 11 11/1/1994 1994 11 0.215568 -0.03649 -0.01128 0.037351 0.0493
          28 1994 12 12/1/1994 1994 12 0.428878 0.015572 0.028473 -0.04798 0.0081
          28 1995 1 1/1/1995 1995 1 0.249812 0.021911 -0.0405 -0.04389 0.0373
          28 1995 2 2/1/1995 1995 2 0.143285 0.040796 0.040029 -0.01164 -0.0117
          28 1995 4 4/1/1995 1995 4 0.162083 0.026146 0.048554 0.032451 0.0377
          28 1995 5 5/1/1995 1995 5 0.153643 0.036328 0.037814 0.018387 -0.0084
          28 1995 6 6/1/1995 1995 6 0.224073 0.028918 -0.00394 -0.02586 -0.1104
          28 1995 7 7/1/1995 1995 7 0.157864 0.040155 0.06469 0.04051 0.0154
          28 1995 8 8/1/1995 1995 8 0.179134 0.008876 0.00882 -0.01911 0.0101
          28 1995 9 9/1/1995 1995 9 0.174786 0.038749 -0.02291 -0.05045 -0.0337
          28 1995 10 10/1/1995 1995 10 0.156483 -0.00864 -0.00875 0.014334 0.0305
          28 1995 11 11/1/1995 1995 11 0.147585 0.044351 0.034559 0.00837 -0.018
          28 1996 1 1/1/1996 1996 1 0.312375 0.029025 0.09595 0.079892 0.0184
          28 1996 2 2/1/1996 1996 2 0.297906 0.014751 0.00138 -0.01492 0.0065
          28 1996 3 3/1/1996 1996 3 0.251314 0.010052 0.004981 0.027126 -1E-04
          28 1996 4 4/1/1996 1996 4 0.317465 0.018961 0.007796 0.050968 -0.0456
          28 1996 5 5/1/1996 1996 5 0.222875 0.025592 0.014989 -0.01189 -0.0579
          28 1996 6 6/1/1996 1996 6 0.309375 -0.00323 0.007311 0.024326 0.1171
          28 1996 8 8/1/1996 1996 8 0.313601 0.030336 0.028496 -0.01267 -0.0148

          • #20
            I see the risk factors themselves. But I do not see any variables for "returns of the risk factors." What am I missing here?

            • #21
              The returns of the risk factors are in the table (for example, in 1994/01 dax returned -.03937). For every id (security), ex_ret changes every month, but the returns of the risk factors are the same for every security over the whole history. I hope I am not confusing you again.

              • #22
                To be precise, the returns of the risk factors do not depend on the individual security (id).

                • #23
                  So for every security, the return of risk factor "dax" on 1994/01 is -.03937.

                  • #24
                    Oh, so those variables are the returns of the risk factors. They are not the risk factors themselves?

                    Anyway, so I think you want to do this:

                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input byte id int year byte month float date int Year byte Month float(lnrvix r3 dax cac ex_ret)
                    28 1994  1 408 1994  1 .234746 .030602 -.03937 .029462 -.0645
                    28 1994  2 409 1994  2 .408289 -.02416 -.03944 -.04119 -.0162
                    28 1994  3 410 1994  3 .507571 -.04372 .019861 -.06952  .0803
                    28 1994  4 411 1994  4  .55013 .011437 .052913 .040943 -.0282
                    28 1994  5 412 1994  5 .270504 .011018 -.05266 -.06005  .0378
                    28 1994  6 413 1994  6 .398908 -.02738 -.04811 -.05238  .1099
                    28 1994  7 414 1994  7 .293058 .030994 .059891 .107803 -.0238
                    28 1994  9 416 1994  9 .244665 -.02127 -.09088 -.09174  .0237
                    28 1994 10 417 1994 10  .24218 .016536 .029765 .014073  .0159
                    28 1994 11 418 1994 11 .215568 -.03649 -.01128 .037351  .0493
                    28 1994 12 419 1994 12 .428878 .015572 .028473 -.04798  .0081
                    28 1995  1 420 1995  1 .249812 .021911  -.0405 -.04389  .0373
                    28 1995  2 421 1995  2 .143285 .040796 .040029 -.01164 -.0117
                    28 1995  4 423 1995  4 .162083 .026146 .048554 .032451  .0377
                    28 1995  5 424 1995  5 .153643 .036328 .037814 .018387 -.0084
                    28 1995  6 425 1995  6 .224073 .028918 -.00394 -.02586 -.1104
                    28 1995  7 426 1995  7 .157864 .040155  .06469  .04051  .0154
                    28 1995  8 427 1995  8 .179134 .008876  .00882 -.01911  .0101
                    28 1995  9 428 1995  9 .174786 .038749 -.02291 -.05045 -.0337
                    28 1995 10 429 1995 10 .156483 -.00864 -.00875 .014334  .0305
                    28 1995 11 430 1995 11 .147585 .044351 .034559  .00837  -.018
                    28 1996  1 432 1996  1 .312375 .029025  .09595 .079892  .0184
                    28 1996  2 433 1996  2 .297906 .014751  .00138 -.01492  .0065
                    28 1996  3 434 1996  3 .251314 .010052 .004981 .027126 -.0001
                    28 1996  4 435 1996  4 .317465 .018961 .007796 .050968 -.0456
                    28 1996  5 436 1996  5 .222875 .025592 .014989 -.01189 -.0579
                    28 1996  6 437 1996  6 .309375 -.00323 .007311 .024326  .1171
                    28 1996  8 439 1996  8 .313601 .030336 .028496 -.01267 -.0148
                    end
                    format %tm date
                    
                    capture program drop program3
                    program define program3
                    //    IDENTIFY, FOR EACH OF lnrvix-cac, WHETHER OR NOT
                    //    ITS TWO HIGHEST AND TWO LOWEST VALUES COINCIDE WITH
                    //    THE TWO HIGHEST AND TWO LOWEST VALUES OF ex_ret
                        local xlist lnrvix r3 dax cac
                        sort ex_ret
                        local size = _N
                        foreach x of local xlist {
                            gen test1 = sum(`x' < `x'[1])
                            gen test2 = sum(`x' < `x'[2]) 
                            gen test3 = sum(`x' > `x'[_N-1])
                            gen test4 = sum(`x' > `x'[_N])
                            gen `x'_coincide = (test1[_N] == 0 & test2[_N] <= 1 & test3[_N] <= 1 ///
                                & test4[_N] == 0)
                            drop test1 test2 test3 test4
                        }
                        exit
                    end
                    
                    rangerun program3, interval(date -23 0) by(id)
                    First, I cleaned up your example data so that date is a real monthly date variable, not a string that looks like a daily date.

                    The logic of program3 is as follows. The data are sorted in order of ex_ret. Then, for each of the risk factor returns (looped over as `x'), we do four tests. Test 1 asks whether the value of `x' when ex_ret is at its minimum is also the lowest value of `x'. If it is, no observation has `x' < `x'[1], and test1 will be 0 throughout. But if any observations have a value of `x' that is smaller than the first, then the value of test1 in the last observation will be the total number of such observations. Similarly, test2 looks at the second observation (corresponding to the second lowest value of ex_ret) and asks whether it holds the second smallest value of `x' by counting the `x' values that are smaller. Similar reasoning applies to test3 and test4 with regard to the highest values. If the two lowest values of `x' coincide with the two lowest values of ex_ret and the two highest values of `x' coincide with the two highest values of ex_ret, then in the final observation test1 will be 0, test2 and test3 will be at most 1 (exactly 1 unless the two extreme values are tied), and test4 will be 0. So we set `x'_coincide accordingly.
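
                    To make the test logic concrete, here is a minimal sketch that applies the same four tests to a single factor (dax) over all observations of the example data at once, rather than in rolling windows through rangerun. It assumes the example data above are in memory and contain no missing values; the display line is added only for illustration.

                    ```stata
                    * Sketch only: the four tests for one factor, over the whole sample
                    sort ex_ret
                    gen test1 = sum(dax < dax[1])       // dax values below dax at the lowest ex_ret
                    gen test2 = sum(dax < dax[2])       // dax values below dax at the second-lowest ex_ret
                    gen test3 = sum(dax > dax[_N-1])    // dax values above dax at the second-highest ex_ret
                    gen test4 = sum(dax > dax[_N])      // dax values above dax at the highest ex_ret
                    * sum() is a running sum, so the totals sit in the last observation
                    display "test1 = " test1[_N] "  test2 = " test2[_N] ///
                        "  test3 = " test3[_N] "  test4 = " test4[_N]
                    drop test1 test2 test3 test4
                    ```

                    The extremes of dax coincide with those of ex_ret only if test1 and test4 are 0 and test2 and test3 are at most 1.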

                    Note: This code will not perform correctly if ex_ret or any of the `x' variables contains missing values. Other approaches are more robust to missing values, but they require additional sorting of the data, which would slow things down enormously. The approach here was taken specifically because you have a very large data set and need the code to run as quickly as we can figure out how to make it run.
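
                    One way to guard against the missing-value problem is to check the data before calling rangerun. This is a sketch under the assumption that the variable names match the example data above; it only detects missing values and does not handle them.

                    ```stata
                    * Sketch only: stop with an error if any of the variables has a missing value
                    assert !missing(ex_ret, lnrvix, r3, dax, cac)

                    * Or, instead of stopping, count the observations that would be affected
                    count if missing(ex_ret, lnrvix, r3, dax, cac)
                    ```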

                    All of that said, I don't quite grasp the logic of your approach. This correspondence of extreme values is a rather blunt way to see if there is a linear relationship that is being missed, and I think it will misclassify things in both directions. Wouldn't a simple Pearson or Spearman correlation be a better idea? Moreover, given that these four variables lnrvix, r3, dax, and cac are so strongly correlated with each other, I would certainly expect a stepwise regression to throw out things that, on their own, look quite strongly related when there is something else that can substitute for them. I have a lot of objections to stepwise regression, but that isn't one of them.
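
                    A minimal sketch of the correlation alternative, assuming the example data above are in memory. It uses each security's full history rather than rolling 24-month windows, so it is not a drop-in replacement for the rangerun call.

                    ```stata
                    * Sketch only: Pearson and Spearman correlations of ex_ret with the
                    * factor returns, computed separately for each security
                    bysort id: correlate ex_ret lnrvix r3 dax cac
                    bysort id: spearman ex_ret lnrvix r3 dax cac
                    ```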

                    Anyway, I hope this proves helpful to you.

                    For your future example data posts, please install the -dataex- program from SSC (also by Robert Picard!). Run -help dataex- to read the instructions for using it, and make it your one and only way to show example data here on the forum. Using -dataex- makes it possible for those who are helping you to create a complete and faithful replica of your Stata example with a simple copy/paste operation.




                    • #25
                      Hi Clyde,

                      Thanks, this is indeed very helpful. You make a very good point about the strong correlation among factors. My data set includes factors from different asset classes, so it should not pose any serious problem. For illustration purposes, I included just a few factors.

                      Regarding the correspondence of extreme values, I agree that it is a crude test. But the idea here is to understand whether securities are susceptible to certain factors in the tails of the distribution (left, right, or both) even though they may not show any significant linear dependence. This is the first step toward a more sophisticated model that will include non-linearity.

                      Best,
                      John.

                      • #26
                        I see. Thanks for explaining that.

                        • #27
                          Thinking this through a little more: you are right, it is probably better to look at the coincidence of low returns and of high returns separately. It will be very rare for both the high and the low returns to coincide.

                          I can think of five possible options:

                          1. both low returns coincide
                          2. only one low return coincides
                          3. both high returns coincide
                          4. only one high return coincides
                          5. no coincidence

                          Also, it would be very useful to output the return of the security and the return of the risk factor when they coincide.

                          • #28
                            Hi Clyde, Robert,

                            One follow-up question. I have successfully run the stepwise regressions, and the second time around I want to run a simple regression with the factors identified by stepwise (for each period and id) but add one constant factor.

                            Here's the code that I am using (with a couple of changes). But it looks like the code ignores the existing factors and uses only the new additional factor in every regression.

                            // second regression with additional constant variable

                            capture program drop two
                            program define two
                                local xlist R3 dax cac
                                local retained
                                foreach v of local xlist {
                                    if !missing(b_`v') {
                                        local retained `retained' `v'
                                    }
                                }
                                regress ex_ret additional_factor `retained'
                                matrix M = r(table)
                                gen adj_r2_5 = e(r2_a)
                                gen nobs_5 = e(N)
                                if e(N) >= 20 {
                                    foreach v in additional_factor retained {
                                        local c = colnumb(M, "`v'")
                                        if !missing(`c') {
                                            gen b5_`v' = M[1, `c']
                                            gen se5_`v' = M[2, `c']
                                            gen t5_`v' = M[3, `c']
                                            gen pw5_`v' = M[4, `c']
                                        }
                                    }
                                    exit
                                    else {
                                        drop _all
                                    }
                                }
                            end
