Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Time series operators to find optimal lag in a regression

    I am doing a study correlating stock market (S&P 500) valuations to future annual market returns and want to use time series operators to find the lag that produces the best fit. Being new to Stata my first approach was to generate 20 new variables representing forward annual returns for each year 1 through 20. My data is monthly so the variable, for example, for 6-year forward returns is: sp500_6yf_ret = (((f72.sp500/sp500)^(1/6)) -1). I can then do 20 correlations of the valuation variable against each of these 20 return variables and find the time period with the highest correlation. The approach works but is sloppy and the output is limiting. I've searched the documentation and Statlist for help on this but I am stumped. Is there a single time series operator that would allow me to iterate/loop through the S&P500 monthly price data to generate the time lag that best correlates valuation with future annual returns?

  • #2
    You don't really say what you find "sloppy" or "limiting" about what you have done. I'm feeling my telepathic skills are strong today, so I'll guess that what you would like is a loop over 20 years that does all the calculations and just spits out the optimal lag and the corresponding correlation at the end. If so, try this:

    Code:
    local best_so_far -2
    local which
    gen returns = .
    forvalues y = 1/20 {
        quietly {
            local m = 12*`y'
            replace returns = (F`m'.sp500/sp500)^(1/`y') - 1
            corr valuation returns
            local correlation `r(rho)'
            if abs(`correlation') > abs(`best_so_far') {
                local which = `y'
                local best_so_far = `correlation'
            }
        }
    }
    
    display "Strongest correlation = %05.3f `best_so_far'"
    display "Obtained with lag of `which' years"
    Note: I have interpreted "produces the best fit" in a statistical sense. If the strongest correlation (largest magnitude) happens to be negative, that is what you will get. Also, if there are two different lags that produce equally strong (in magnitude) correlations, this code will show you the smaller lag.

    You did not provide sample data, so the code is untested. Beware of errors.

    Comment


    • #3
      Thank you Clyde; looping over the 20 years is exactly what I had in mind. I had to initiate best_so_far to 0 (instead of -2) and small matter... the display for Strongest Correlation doesn't format.
      local best_so_far 0 . local which . gen returns = . (831 missing values generated) . forvalues y = 1/20 { 2. quietly { 3. local m = 12*`y' 4. replace returns = (F`m'.sp500/sp500)^(1/`y') - 1 5. corr log_CAPE returns 6. local correlation `r(rho)' 7. if abs(`correlation') > abs(`best_so_far') { 8. local which = `y' 9. local best_so_far = `correlation' 10. } 11. } 12. } . . . display "Strongest correlation = %05.3f `best_so_far'" Strongest correlation = %05.3f -.8342929884608912 . display "Obtained with lag of `which' years" Obtained with lag of 17 years

      Comment


      • #4
        Sorry about that formatting. The crux of the issue is that "%05.3f" shows up as output instead of formatting the output.

        Comment


        • #5
          Oh, yes, sorry. That line should be
          Code:
          display "Strongest correlation = " %05.3f `best_so_far'

          Comment


          • #6
            Thank you again. Sorry for leaning so heavily on the forum but Stata and programming are quite new to me. Using the above loop to generate the optimal lag with the corresponding correlation, I've then been trying to use the xcorr operator in an attempt to generate a correlogram for this model (xcorr valuation, returns) but run into issues because returns is not a single variable but a dynamic variable replaced 20x as the loop runs. How would I xcorr two variables in a model like this when 1 variable (S&P500 annual returns over 20 different time period) is dynamic?

            Also, as another iteration on this data, I've been exploring rangestat and rangerun to develop a rolling 12-year correlation window between valuation and returns to see how it changes over time. I've tried to reuse code I have found in other rangerun threads, but again the nature of this dataset stumps me. The dataset below has sp500 prices and 2 of several valuation measures I am studying.

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input float(date3 sp500 log_CAPE log_price_to_dividend)
            -120 16.88 2.3749058 2.6863675
            -119 17.21   2.38968 2.6970706
            -118 17.35   2.38968  2.696589
            -117 17.84 2.4141264  2.715929
            -116 18.44  2.438863  2.740569
            -115 18.74 2.4466856  2.748339
            -114 17.38 2.3551776  2.640209
            -113 18.43  2.401525 2.6593375
            -112 19.08 2.4283364  2.663462
            -111 19.87  2.456164 2.6671276
            -110 19.83  2.445819  2.636539
            -109 19.75  2.425687  2.597891
            -108 21.21 2.4765384 2.6556966
            -107    22  2.496506 2.6855774
            -106 21.63  2.471484  2.655371
            -105 21.92  2.480731  2.662132
            -104 21.93 2.4731715  2.649601
            -103 21.55  2.452728   2.62569
            -102 21.93  2.466403  2.649601
            -101 22.89  2.506342 2.7054324
            -100 23.48  2.520917  2.737439
             -99 23.36  2.510412  2.758983
             -98 22.71  2.472328  2.751242
             -97 23.41  2.497329  2.809574
             -96 24.19  2.528126 2.8423495
             -95 23.75  2.514465  2.816926
             -94 23.81  2.514465  2.819449
             -93 23.74  2.504709  2.809487
             -92 23.73  2.501436  2.802097
             -91 24.38  2.521721  2.822199
             -90 25.08  2.539237  2.850507
             -89 25.18  2.540026 2.8544865
             -88 24.78  2.520113  2.838473
             -87 24.26  2.495682  2.824186
             -86 25.03  2.523326 2.8694184
             -85 26.04   2.55955  2.916044
             -84 26.18 2.5657184  2.921406
             -83 25.86  2.554122  2.909108
             -82 25.99  2.551786  2.914122
             -81 24.71  2.498152 2.8636184
             -80 24.84  2.496506  2.861798
             -79 23.95  2.452728 2.8253114
             -78 24.29 2.4638534  2.839408
             -77 24.39  2.461297  2.843516
             -76 23.27  2.410542  2.796508
             -75 23.97  2.432736 2.8191285
             -74  24.5 2.4544475   2.83403
             -73 24.83 2.4638534  2.840489
             -72 25.46  2.484907  2.858672
             -71 26.02  2.503074  2.880429
             -70 26.57  2.519308   2.89452
             -69 27.63  2.558002  2.940466
             -68 28.73  2.588516  2.979505
             -67 28.96  2.592265  2.994352
             -66 30.13   2.62684  3.027085
             -65 30.73   2.64191  3.046803
             -64 31.45 2.6644466 3.0631366
             -63 32.18 2.6823905  3.072569
             -62 33.44 2.7160184 3.0910425
             -61 34.97  2.759377  3.122708
             -60  35.6 2.7719636  3.134091
             -59 36.79 2.7997174  3.166971
             -58  36.5  2.786245 3.1526265
             -57 37.76   2.81481 3.1865644
             -56  37.6  2.804572  3.175928
             -55 39.78  2.854745 3.2322886
             -54 42.69  2.915064 3.2902305
             -53 42.43   2.90252  3.277852
             -52 44.34  2.935982  3.309461
             -51 42.11  2.877512  3.251705
             -50 44.95  2.935982  3.316971
             -49 45.37  2.941276  3.320155
             -48 44.15 2.9063544  3.274769
             -47 44.43   2.90526 3.2632866
             -46 47.49 2.9637256  3.312398
             -45 48.05 2.9637256 3.3126264
             -44 46.54  2.919931  3.263699
             -43 46.27 2.8992214  3.246707
             -42 48.78 2.9370434 3.2939935
             -41 48.49  2.926918  3.277042
             -40 46.84  2.881443  3.236972
             -39 46.24  2.857619  3.240518
             -38 45.76 2.8402474  3.252431
             -37 46.44 2.8449094  3.284276
             -36 45.43 2.8166056 3.2622876
             -35 43.47 2.7625384   3.22395
             -34 44.03  2.766319   3.23675
             -33 45.05  2.780061  3.259652
             -32 46.78  2.809403  3.297334
             -31 47.55 2.8172035 3.3136604
             -30 48.51  2.825537  3.327885
             -29 45.84 2.7644305  3.265541
             -28 43.98   2.71866  3.218421
             -27 41.24  2.649715  3.148429
             -26 40.35  2.620311  3.120978
             -25 40.33 2.6152036   3.11488
             -24 41.12  2.623944 3.1398814
             -23 41.26  2.623218   3.14328
             -22 42.11  2.634045  3.169306
             -21 42.34  2.632608 3.1804185
            end
            format %tmMon_CCYY date3

            Comment


            • #7
              For your first question, the trick is to retain the ultimately selected set of returns in a different variable, here called chosen_returns.

              Code:
              local best_so_far -2
              local which
              gen returns = .
              gen chosen_returns = .
              forvalues y = 1/20 {
                  quietly {
                      local m = 12*`y'
                      replace returns = (F`m'.sp500/sp500)^(1/`y') - 1
                      corr valuation returns
                      local correlation `r(rho)'
                      if abs(`correlation') > abs(`best_so_far') {
                          local which = `y'
                          local best_so_far = `correlation'
                         replace chosen_returns = returns
                      }
                  }
              }
              
              display "Strongest correlation = " %05.3f `best_so_far'
              display "Obtained with lag of `which' years"
              xcorr valuation returns
              For your second question, I interpret it as meaning that you would like to do correlations between the period returns of sp500 (i.e. ratio of sp500 to its immediately preceding periods' value) and log_CAPE, and another correlation of period returns of sp500 to log_price_to_dividend (and perhaps, in your real data, other variables as well) using a rolling 12 year window.

              -rangestat- is indeed a good tool for this. You need to do this inside a small lope over the variables, because -rangestat- always creates the same variable names for correlations, so you can only do them one at a time and must then rename the result variables to indicate which valuation variable is in the correlation.

              Code:
              tsset date3
              gen returns = sp500/L1.sp500 - 1
              foreach v of varlist log_CAPE log_price_to_dividend {
                  rangestat (corr) returns `v', interval(date3 -143 0)
                  rename corr_nobs corr_`v'_nobs
                  rename corr_x corr_`v'
              }
              Note: This code assumes that the rolling 12 year window refers to the 143 months preceding plus the current month. If you had different window boundaries in mind, change the numbers in the -interval()- option accordingly.

              Comment

              Working...
              X