  • Rolling 24 month step-wise regression on panel data

    Hi,

    I am new to Stata. I have monthly panel data consisting of securities (ids) and factors covering 20 years. I am trying to estimate a rolling 24-month stepwise regression and store the adjusted R-squared, t-statistics, p-values, coefficients, etc. for each id and 24-month period.

    I am not able to get the code to work. Any help will be much appreciated.

    xtset id date
    tempfile results
    levelsof id, local(ids)
    foreach id of local ids {
    keep if id==`id'
    quietly rolling: window(24) stepsize(1) saving(C:\Users\John\`results'.dta, replace), ///
    stepwise, pr(.15) pe(.10): regress y x1 x2 x3
    merge id using "`results'", sort update replace nokeep
    drop _merge
    }

    Thanks,
    John

  • #2
    -rolling- is basically incompatible with -stepwise-. -rolling- requires that the same statistics be returned in each run. When doing stepwise regression you are specifically asking for different models for each id. So that isn't going to fly.

    I believe that a solution to your problem could be written based on Robert Picard's -rangerun- program. But it would be somewhat complicated to code, and as I am among those people who view stepwise regression as "voodoo statistics," I'm hesitant. See, for a quick summary, https://www.stata.com/support/faqs/s...ems/index.html. Why do you specifically want to do this with stepwise regression? All of the many drawbacks of stepwise regression are only exacerbated in the context of a rolling calculation. In addition, apart from my own prejudice against the procedure, I've been very active on Statalist and I've never seen anybody seek to do this kind of rolling stepwise regression.


    • #3
      Hi Clyde,

      I understand the drawbacks of stepwise regression. It is part of my research project, where I am trying to replicate an academic study and then 1) show that some of the drawbacks of stepwise regression can be overcome by lasso (and other more sophisticated methods) and 2) incorporate non-linearity into the methodology.

      Any clues/suggestions are much appreciated.

      Thanks,
      John.


      • #4
        Here's an example of how you could do this. Since you don't provide a data example, I have structured this around the Grunfeld data set. And due to the shorter time frame therein, I have used a window of 6 years (from 5 years before through current). You can make the appropriate adjustments to the code for your actual window and variable names.

        Code:
        clear*
        
        capture program drop one_window
        program define one_window
            stepwise, pr(.15) pe(.10): regress mvalue invest kstock time
            matrix M = r(table)
            gen adj_r2 = e(r2_a)
            foreach v of varlist invest kstock time {
                local c = colnumb(M, "`v'")
                if !missing(`c') {
                    gen b_`v' = M[1, `c']
                    gen se_`v' = M[2, `c']
                    gen t_`v' = M[3, `c']
                    gen p_`v' = M[4, `c']
                }
                else {
                    gen b_`v' = .
                    gen se_`v' = .
                    gen t_`v' = .
                    gen p_`v' = .
                }
            }
            exit
        end
        
        webuse grunfeld, clear
        
        rangerun one_window, interval(year -5 0) by(company)
        To run this you will need to install -rangerun- from SSC. I think that in order to use -rangerun- you will also need Robert Picard, Nick Cox & Roberto Ferrer's -rangestat-, also available from SSC.
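        A minimal sketch of the installation step, assuming Stata can reach SSC over the internet. -rangestat- is installed first here on the assumption that -rangerun- depends on it:
        Code:
        * install the two community-contributed packages from SSC
        ssc install rangestat
        ssc install rangerun
        * confirm the installation and read the syntax notes
        help rangerun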


        • #5
          Thank you so much. This is a great help. Let me try to run this with my data. Thanks again. John.


          • #6
            Hi Clyde,

            I am having some problems with the program (it runs but does not update the panel for some reason). Maybe I am not doing it right (I just started using Stata).
            But if I run it without rangerun, it does update the panel. I have downloaded the required packages.

            I open the panel data, create a global xlist (my varlist), and then run the program. Do I need to load the panel data again? This is where I am not so sure.

            my updated program reads

            use "C:\Users\John\hf_database.dta", clear
            xtset id newdate
            sort id newdate
            global xlist x1 x2 x3 x4 x5

            capture program drop one_window
            program define one_window
            stepwise, pr(.15) pe(.10) forward : regress ex_ret $xlist
            matrix M = r(table)
            gen adj_r2 = e(r2_a)
            foreach v of varlist $xlist {
            local c = colnumb(M, "`v'")
            if !missing(`c') {
            gen b_`v' = M[1, `c']
            gen se_`v' = M[2, `c']
            gen t_`v' = M[3, `c']
            gen p_`v' = M[4, `c']
            }
            else {
            gen b_`v' = .
            gen se_`v' = .
            gen t_`v' = .
            gen p_`v' = .
            }
            }
            exit
            end

            use "C:\Users\John\hf_database.dta", clear -> is this still required? (even if I omit this statement it is still not working)

            rangerun one_window, interval(newdate -23 0) by(id)



            Thanks,
            John.



            • #7
              In terms of the mechanics of using rangerun, you should first try on a small sample with the verbose option to understand what's happening. Since you have not posted a sample dataset to work with, I'm going to generate a very crude one that seems to match what you are working with. I don't see the need for globals here so I've substituted locals instead. Clyde's code can also be simplified a bit. Finally, results from the rolling window for the current observation are taken from the last observation when your program terminates. To help make sense of the output, I list the last observation.

              Code:
              clear all
              set seed 23
              set obs 10
              gen long id = _n
              expand 40
              bysort id: gen newdate = _n
              xtset id newdate
              gen ex_ret = runiform()
              forvalues i = 1/5 {
                  gen x`i' = runiform()
              }
              
              program define one_window
                  local xlist x1 x2 x3 x4 x5
                  stepwise, pr(.15) pe(.10) forward : regress ex_ret `xlist'
                  matrix M = r(table)
                  gen adj_r2 = e(r2_a)
                  gen nobs = e(N)
                  foreach v of varlist `xlist' {
                      local c = colnumb(M, "`v'")
                      if !missing(`c') {
                          gen b_`v' = M[1, `c']
                          gen se_`v' = M[2, `c']
                          gen t_`v' = M[3, `c']
                          gen p_`v' = M[4, `c']
                      }
                  }
                  // results for the current window are picked-up from the last observation
                  list in l
              end
              
              keep if id == 1
              rangerun one_window, interval(newdate -23 0) by(id) verbose


              • #8
                Hi Robert,

                Thanks. This is very helpful. I am able to run the program and the panel data is now updated with the regression results.

                But it looks like stepwise is ignoring the 24-month window. Shouldn't it only produce results from month 24 onwards?

                Also note that the newdate variable in Stata has the format 1994m1, 1994m2, ... (somehow it shows as daily when I export to Excel).


                See below the results for one sample id:
                id year month newdate Year Month time svol lnRVIX R3 dax cac MSCI_EAFE Gasoline natgas WTICrude Brent Avg_oil Gold gsci _10YRMUNI Mortgage Barclays_Agg SP500 topix msciworld FTSE100 msciem NASDAQ100 NEKKIE225 ex_ret adj_r2 nobs b_R3 se_R3 t_R3 p_R3 b_smb se_smb t_smb p_smb
                28 1994 1 1/1/1994 1994 1 97 0.170018 0.234746 0.030602 -0.03937 0.029462 0.084761 0.097327 0.514563 0.073996 0.112855 0.199685 -0.03199 0.081403 0.017411 0.011866 0.016247 0.034002 0.132009 0.066122 0.022446 0.018194 0.03947 0.161442 -0.0645 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1994 2 2/1/1994 1994 2 98 0.493731 0.408289 -0.02416 -0.03944 -0.04119 -0.00256 0.03666 -0.22756 -0.04856 -0.0994 -0.08471 0.009385 -0.02173 -0.03867 -0.00834 -0.02085 -0.02715 0.001681 -0.01278 -0.04405 -0.01779 -0.00442 -0.01146 -0.0162 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1994 3 3/1/1994 1994 3 99 0.502945 0.507571 -0.04372 0.019861 -0.06952 -0.04286 -0.0057 -0.15768 0.01931 0.018643 -0.03136 0.020822 -0.01194 -0.05456 -0.03115 -0.0297 -0.0436 -0.03845 -0.04294 -0.06623 -0.09049 -0.07087 -0.04427 0.0803 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1994 4 4/1/1994 1994 4 100 0.545671 0.55013 0.011437 0.052913 0.040943 0.042652 0.120468 0.009852 0.14479 0.132504 0.101904 -0.03502 0.026217 0.015977 -0.00886 -0.00966 0.012824 0.025694 0.031089 0.015938 -0.02 -0.02536 0.032091 -0.0282 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1994 5 5/1/1994 1994 5 101 0.359179 0.270504 0.011018 -0.05266 -0.06005 -0.00553 0.089085 -0.14634 0.08156 0.067227 0.022883 0.031109 0.014884 0.011583 0.004766 -0.00017 0.016418 0.049423 0.002745 -0.04775 0.034225 0.015003 0.063286 0.0378 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1994 6 6/1/1994 1994 6 102 0.427691 0.398908 -0.02738 -0.04811 -0.05238 0.014356 -0.01128 0.177143 0.05847 0.107813 0.083036 0.000516 0.024604 -0.00621 -0.00261 -0.00268 -0.0245 -0.0052 -0.0026 -0.01256 -0.02756 -0.04896 -0.01572 0.1099 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1994 7 7/1/1994 1994 7 103 0.227066 0.293058 0.030994 0.059891 0.107803 0.009837 0.124572 -0.14078 0.048012 0.028431 0.01506 -0.01057 0.017219 0.02424 0.024107 0.024106 0.032826 -0.02144 0.019189 0.059708 0.062178 0.027366 -0.00942 -0.0238 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1994 9 9/1/1994 1994 9 105 0.226541 0.244665 -0.02127 -0.09088 -0.09174 -0.03128 -0.08682 -0.04138 0.043182 0.062112 -0.00573 0.019547 -0.00941 -0.01928 -0.01706 -0.01778 -0.02447 -0.03621 -0.0261 -0.06431 0.011365 -0.01018 -0.05161 0.0237 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1994 10 10/1/1994 1994 10 106 0.230041 0.24218 0.016536 0.029765 0.014073 0.033533 0.233794 0.244604 -0.01089 0.005848 0.118338 -0.02362 0.004501 -0.02097 -0.00069 -0.00109 0.022473 0.004951 0.028628 0.025488 -0.01804 0.04875 0.021764 0.0159 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1994 11 11/1/1994 1994 11 107 0.247198 0.215568 -0.03649 -0.01128 0.037351 -0.04784 -0.16322 -0.03468 -0.00551 0.002907 -0.05013 -0.0039 -0.03272 -0.02738 -0.00376 -0.00269 -0.03642 -0.0405 -0.04319 -0.00288 -0.05199 -0.01992 -0.04572 0.0493 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1994 12 12/1/1994 1994 12 108 0.277318 0.428878 0.015572 0.028473 -0.04798 0.006496 0.078485 0 -0.01606 -0.05971 0.000679 -0.00065 0.033857 0.026309 0.009598 0.008383 0.014825 0.025717 0.009871 -0.00122 -0.08032 -0.00136 0.033941 0.0081 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1995 1 1/1/1995 1995 1 109 0.184747 0.249812 0.021911 -0.0405 -0.04389 -0.03818 -0.00291 -0.15569 0.039955 0.050555 -0.01702 -0.02103 -0.02485 0.037531 0.025706 0.023961 0.025951 -0.06103 -0.01483 -0.02305 -0.10639 0.002622 -0.05442 0.0373 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1995 2 2/1/1995 1995 2 110 0.202359 0.143285 0.040796 0.040029 -0.01164 -0.00261 0.008758 0.056738 0.002165 0.024061 0.02293 0.004804 0.008077 0.040543 0.030529 0.028667 0.038958 -0.07871 0.014773 0.009463 -0.02565 0.067032 -0.0856 -0.0117 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1995 4 4/1/1995 1995 4 112 0.174462 0.162083 0.026146 0.048554 0.032451 0.037875 0.269141 0.070968 0.061522 0.111857 0.128372 -0.00804 0.030622 0.001714 0.016904 0.016749 0.029442 0.018306 0.035048 0.027463 0.044861 0.050117 0.041314 0.0377 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1995 5 5/1/1995 1995 5 113 0.187087 0.153643 0.036328 0.037814 0.018387 -0.01166 -0.05002 0.018072 -0.07269 -0.10312 -0.05194 -0.01145 -0.00845 0.044603 0.037393 0.046256 0.039978 -0.0583 0.008745 0.036063 0.053199 0.039484 -0.08151 -0.0084 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1995 6 6/1/1995 1995 6 114 0.186596 0.224073 0.028918 -0.00394 -0.02586 -0.01727 -0.09234 -0.12426 -0.07945 -0.07123 -0.09182 0.004035 -0.01784 -0.00862 0.006722 0.008698 0.023227 -0.04538 -0.00011 0.001893 0.002958 0.102295 -0.05956 -0.1104 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1995 7 7/1/1995 1995 7 115 0.213975 0.157864 0.040155 0.06469 0.04051 0.062528 -0.10352 -0.0473 0.013809 -0.02657 -0.04089 -0.00674 0.026115 0.020453 0.002012 -0.00264 0.033161 0.116254 0.050231 0.046264 0.022447 0.057339 0.148796 0.0154 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1995 8 8/1/1995 1995 8 116 0.25676 0.179134 0.008876 0.00882 -0.01911 -0.0379 0.102528 0.156028 0.015323 0.011166 0.071262 -0.00209 0.021198 0.01881 0.012189 0.014294 0.002517 0.068601 -0.0221 0.009604 -0.02356 0.013869 0.086325 0.0101 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1995 9 9/1/1995 1995 9 117 0.184015 0.174786 0.038749 -0.02291 -0.05045 0.01979 -0.00451 0.01227 -0.01956 0.010429 -0.00034 0.003793 0.004224 0.008815 0.01034 0.011518 0.0422 0.010487 0.029321 0.014266 -0.00475 0.014408 -0.01127 -0.0337 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1995 10 10/1/1995 1995 10 118 0.194106 0.156483 -0.00864 -0.00875 0.014334 -0.02662 -0.13423 0.084848 0.007412 0.007286 -0.00867 -0.00261 0.010065 0.015863 0.010442 0.015353 -0.00357 -0.01878 -0.01556 0.007947 -0.03828 0.023416 -0.01443 0.0305 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1995 11 11/1/1995 1995 11 119 0.160453 0.147585 0.044351 0.034559 0.00837 0.028085 0.167609 0.22905 0.033956 0.037975 0.117148 0.013325 0.026997 0.018385 0.013395 0.017651 0.043888 0.050415 0.034912 0.040627 -0.01783 -0.00845 0.061728 -0.018 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1996 1 1/1/1996 1996 1 121 0.257208 0.312375 0.029025 0.09595 0.079892 0.004343 -0.04506 -0.04217 -0.0911 -0.11832 -0.07416 0.048314 -0.00162 0.013769 0.008797 0.007776 0.034039 0.02249 0.018268 0.021399 0.071082 0.027055 0.047543 0.0184 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1996 2 2/1/1996 1996 2 122 0.294591 0.297906 0.014751 0.00138 -0.01492 0.00363 0.151304 -0.00629 0.103041 0.111045 0.089775 -0.01084 0.040088 -0.00557 -0.00968 -0.02035 0.009267 -0.03247 0.006274 -0.0057 -0.0159 0.052398 -0.03303 0.0065 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1996 3 3/1/1996 1996 3 123 0.290102 0.251314 0.010052 0.004981 0.027126 0.021489 0.045305 -0.21835 0.093925 0.107429 0.007076 -0.01196 0.065005 -0.01687 -0.00422 -0.00817 0.00963 0.053112 0.016824 0.001029 0.007789 -0.0211 0.063675 -1E-04 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1996 4 4/1/1996 1996 4 124 0.270828 0.317465 0.018961 0.007796 0.050968 0.02932 0.150207 -0.11741 -0.0224 -0.05792 -0.01188 -0.0121 0.057731 -0.00482 -0.00329 -0.00661 0.014738 0.046154 0.023695 0.035981 0.039983 0.093556 0.029638 -0.0456 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1996 5 5/1/1996 1996 5 125 0.266528 0.222875 0.025592 0.014989 -0.01189 -0.01816 -0.17518 0.091743 -0.05632 -0.06967 -0.05236 -0.00204 -0.0079 -0.00383 -0.00341 -0.00238 0.02579 -0.01857 0.001044 -0.01492 -0.00447 0.038486 -0.00386 -0.0579 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1996 6 6/1/1996 1996 6 126 0.26208 0.309375 -0.00323 0.007311 0.024326 0.005873 0.025806 0.10084 0.058169 0.059471 0.061072 -0.02225 0.038798 0.012989 0.016095 0.015816 0.003822 0.019096 0.005236 -0.00667 0.006244 -0.02179 0.026168 0.1171 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069
                28 1996 8 8/1/1996 1996 8 128 0.257048 0.313601 0.030336 0.028496 -0.01267 0.002449 0.004172 -0.181 0.08802 0.112383 0.005895 0.003244 0.032202 3.57E-05 -1.7E-05 -0.00197 0.021092 -0.02573 0.011685 0.052324 0.025597 0.043333 -0.02542 -0.0148 0.542836 28 -1.18254 0.261942 -4.51452 0.000131 -1.02415 0.276894 -3.69873 0.001069


                • #9
                  Please follow the FAQ and present data examples using dataex. To install it, type in Stata's Command window:
                  Code:
                  ssc install dataex
                  Rather than speculating on what's going on, I extracted some variables from what you have posted and adjusted the code in #7 accordingly. This runs on my machine although I have no idea if the results make sense.

                  Code:
                  clear all
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input byte id str9 newdate float(ex_ret svol lnrvix r3 dax cac)
                  28 "1/1/1994"  -.0645 .170018 .234746 .030602 -.03937 .029462
                  28 "2/1/1994"  -.0162 .493731 .408289 -.02416 -.03944 -.04119
                  28 "3/1/1994"   .0803 .502945 .507571 -.04372 .019861 -.06952
                  28 "4/1/1994"  -.0282 .545671  .55013 .011437 .052913 .040943
                  28 "5/1/1994"   .0378 .359179 .270504 .011018 -.05266 -.06005
                  28 "6/1/1994"   .1099 .427691 .398908 -.02738 -.04811 -.05238
                  28 "7/1/1994"  -.0238 .227066 .293058 .030994 .059891 .107803
                  28 "9/1/1994"   .0237 .226541 .244665 -.02127 -.09088 -.09174
                  28 "10/1/1994"  .0159 .230041  .24218 .016536 .029765 .014073
                  28 "11/1/1994"  .0493 .247198 .215568 -.03649 -.01128 .037351
                  28 "12/1/1994"  .0081 .277318 .428878 .015572 .028473 -.04798
                  28 "1/1/1995"   .0373 .184747 .249812 .021911  -.0405 -.04389
                  28 "2/1/1995"  -.0117 .202359 .143285 .040796 .040029 -.01164
                  28 "4/1/1995"   .0377 .174462 .162083 .026146 .048554 .032451
                  28 "5/1/1995"  -.0084 .187087 .153643 .036328 .037814 .018387
                  28 "6/1/1995"  -.1104 .186596 .224073 .028918 -.00394 -.02586
                  28 "7/1/1995"   .0154 .213975 .157864 .040155  .06469  .04051
                  28 "8/1/1995"   .0101  .25676 .179134 .008876  .00882 -.01911
                  28 "9/1/1995"  -.0337 .184015 .174786 .038749 -.02291 -.05045
                  28 "10/1/1995"  .0305 .194106 .156483 -.00864 -.00875 .014334
                  28 "11/1/1995"  -.018 .160453 .147585 .044351 .034559  .00837
                  28 "1/1/1996"   .0184 .257208 .312375 .029025  .09595 .079892
                  28 "2/1/1996"   .0065 .294591 .297906 .014751  .00138 -.01492
                  28 "3/1/1996"  -.0001 .290102 .251314 .010052 .004981 .027126
                  28 "4/1/1996"  -.0456 .270828 .317465 .018961 .007796 .050968
                  28 "5/1/1996"  -.0579 .266528 .222875 .025592 .014989 -.01189
                  28 "6/1/1996"   .1171  .26208 .309375 -.00323 .007311 .024326
                  28 "8/1/1996"  -.0148 .257048 .313601 .030336 .028496 -.01267
                  end
                  
                  * convert string date to monthly date; see help datetime
                  gen mdate = mofd(daily(newdate,"MDY"))
                  format %tm mdate
                  
                  program define one_window
                      local xlist svol lnrvix r3 dax cac
                      stepwise, pr(.15) pe(.10) forward : regress ex_ret `xlist'
                      matrix M = r(table)
                      gen adj_r2 = e(r2_a)
                      gen nobs = e(N)
                      foreach v of varlist `xlist' {
                          local c = colnumb(M, "`v'")
                          if !missing(`c') {
                              gen b_`v' = M[1, `c']
                              gen se_`v' = M[2, `c']
                              gen t_`v' = M[3, `c']
                              gen p_`v' = M[4, `c']
                          }
                      }
                      // results for the current window are picked-up from the last observation
                      list in l
                  end
                  
                  rangerun one_window, interval(mdate -23 0) by(id) verbose
                  The observation count never reaches 24 with this data since you are missing some months. Here's what I get:
                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input byte id float(mdate adj_r2 nobs b_r3 se_r3 t_r3 p_r3)
                  28 408         .  .          .         .          .           .
                  28 409         .  .          .         .          .           .
                  28 410         .  .          .         .          .           .
                  28 411         .  .          .         .          .           .
                  28 412         .  .          .         .          .           .
                  28 413         .  .          .         .          .           .
                  28 414  .4743649  7 -1.5984075  .6310986 -2.5327384   .05235492
                  28 416  .4729702  8  -1.534102  .5684987 -2.6985145  .035646386
                  28 417  .4424066  9 -1.4360656  .5297955 -2.7106035   .03017267
                  28 418  .4657262 10  -1.368374  .4600962  -2.974104  .017759517
                  28 419 .46201375 11  -1.322974  .4272588  -3.096423   .01279611
                  28 420  .3729046 12 -1.1718049  .4267125  -2.746123  .020614965
                  28 421  .3877695 13 -1.0987195  .3746502  -2.932654  .013625584
                  28 423  .3224571 14  -.9956002  .3713743 -2.6808536   .02000534
                  28 424  .3413152 15  -.9799694   .341089 -2.8730605   .01306619
                  28 425  .3312334 16 -1.1705235  .4031653  -2.903334  .011565593
                  28 426 .29094696 17 -1.0615203 .38593575 -2.7505105  .014870933
                  28 427 .29372427 18 -1.0615412 .37368205 -2.8407605  .011804475
                  28 428 .32056385 19 -1.0848535 .35211095  -3.080999  .006774712
                  28 429  .3304922 20 -1.0883465  .3378223  -3.221654  .004731689
                  28 430  .3389274 21 -1.0631093  .3169035  -3.354678 .0033285075
                  28 432  .3086555 21  -.9628926   .305578  -3.151053  .005260134
                  28 433   .408671 21 -1.1289345 .29323348 -3.8499506 .0010793929
                  28 434  .3211538 21 -1.0960817  .3388759  -3.234464   .00436339
                  28 435  .3262756 21 -1.1281804  .3451251  -3.268903  .004038377
                  28 436  .3208686 21 -1.1597493 .35877225   -3.23255   .00438218
                  28 437 .21711117 21 -1.0912902  .4265192  -2.558596   .01920547
                  28 439 .21306625 21 -1.0787785  .4259225  -2.532805   .02028486
                  end
                  format %tm mdate
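
                  Because nobs records how many observations each window actually used, incomplete windows can be screened out after the fact. A minimal sketch, assuming the result variables created by the run above (adj_r2, nobs, and the b_*, se_*, t_* and p_* families) are in memory, and assuming that a "full" window means 24 observations:
                  Code:
                  * blank out results that come from windows with fewer than 24 observations
                  foreach v of varlist adj_r2 b_* se_* t_* p_* {
                      replace `v' = . if nobs < 24
                  }
                  With the example data above this blanks every row, since no window reaches 24 observations; lower the threshold to keep shorter windows.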


                  • #10
                    Hi Robert,

                    Thanks. This makes sense now. A couple of follow-up questions.

                    Is it possible to force one of the factors to always be in the regression? What I am trying to do is this: once I have the stepwise results, I want to run the same regressions with the factors identified by stepwise, but add one constant factor to the mix for every period.
                    If there are fewer than 24 months of data for any id, can we skip the regression, or require, say, at least 18 months of data (is there an option for this)?

                    John.




                    • #11
                      John,

                      -stepwise- has a -lockterm1- option which forces the first predictor to be included in the model. So just add that variable as the first predictor in the regression and use that option.

                      As for not running the regression unless there are at least 18 months data, it is probably easier to just run the stepwise regression and discard the results if there are fewer than 18 months than to check the availability ahead of time (though it can be done that way as well.)

                      So the program would be revised as:

                      Code:
                      program define one_window
                          local xlist svol lnrvix r3 dax cac
                          // lockterm1 is an option of the -stepwise- prefix, so it goes before the colon
                          stepwise, pr(.15) pe(.10) forward lockterm1 : regress ex_ret keep_this_variable `xlist'
                          if e(N) >= 18 {
                              matrix M = r(table)
                              gen adj_r2 = e(r2_a)
                              gen nobs = e(N)
                              foreach v of varlist `xlist' {
                                  local c = colnumb(M, "`v'")
                                  if !missing(`c') {
                                      gen b_`v' = M[1, `c']
                                      gen se_`v' = M[2, `c']
                                      gen t_`v' = M[3, `c']
                                      gen p_`v' = M[4, `c']
                                  }
                              }
                              // results for the current window are picked-up from the last observation
                              list in l
                          }
                          else {
                              drop _all // reject this window
                          }
                      end


                      • #12
                        Hi Robert,

                        Thanks. This is incredibly useful information.

                        One clarification, I don't want to run the stepwise again with the constant factor (it would be very costly). The second time around I want to run simple regression with the factors identified by stepwise (for each period and id) but add one constant factor.

                        Also I am assuming that the stepwise includes an intercept. Can I retrieve the intercept stats and residual (both in sample and predicted for every period) from the regression also.

                        Best,
                        John.


                        • #13
                          Hi Clyde, Robert

                          Thanks for all the help.

                          Another follow up. Since rangerun is so much faster than bystats, can we use the framework to identify if the two worst and best returns of the security matches with that of the independent factors (for every factor we would check if the two best and worst returns of the security happen at the same time for a rolling 24 month period by security). The goal here is to find out if the linear stepwise regression is missing any non-linear relationship (based on evidence from tail correlations).


                          • #14
                            Can I retrieve the intercept stats and residual (both in sample and predicted for every period) from the regression also.
                            Yes, just add _cons to the list of things you extract results from r(table) for. So -foreach v in _cons `xlist' {-. You will now get b__cons se__cons, etc., in addition to the other results.
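                            A minimal sketch of that change, built on the one_window program from #9. The -predict- line and the variable name resid are additions not in the original code; they assume that -predict- works as usual after -stepwise: regress- and that the last observation in each window is the most recent month:
                            Code:
                            capture program drop one_window
                            program define one_window
                                local xlist svol lnrvix r3 dax cac
                                stepwise, pr(.15) pe(.10) forward : regress ex_ret `xlist'
                                matrix M = r(table)
                                gen adj_r2 = e(r2_a)
                                gen nobs = e(N)
                                // _cons is a column of r(table) but not a variable,
                                // so loop with -in- rather than -of varlist-
                                foreach v in _cons `xlist' {
                                    local c = colnumb(M, "`v'")
                                    if !missing(`c') {
                                        gen b_`v' = M[1, `c']
                                        gen se_`v' = M[2, `c']
                                        gen t_`v' = M[3, `c']
                                        gen p_`v' = M[4, `c']
                                    }
                                }
                                // in-sample residuals from the selected model; rangerun
                                // returns the value held by the last observation
                                predict double resid, residuals
                            end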

                            The second time around I want to run simple regression with the factors identified by stepwise (for each period and id) but add one constant factor.
                            So start with the results of the previous, and do this:
                            Code:
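                            capture program drop two
                            program define two
                                local xlist svol lnrvix r3 dax cac
                                local retained
                                // the current observation is the latest month in the window
                                // (mdate is the monthly date from #9; substitute your own)
                                summarize mdate, meanonly
                                local current = r(max)
                                foreach v of local xlist {
                                    // retain `v' only if stepwise kept it for the current window;
                                    // -capture- covers a b_`v' that was never created
                                    capture count if mdate == `current' & !missing(b_`v')
                                    if _rc == 0 & r(N) > 0 {
                                        local retained `retained' `v'
                                    }
                                }
                                regress ex_ret additional_variable `retained'
                                matrix M = r(table)
                                gen adj_r2_2 = e(r2_a)
                                gen nobs_2 = e(N)
                                foreach v in additional_variable `retained' _cons {
                                    local c = colnumb(M, "`v'")
                                    if !missing(`c') {
                                        gen b2_`v' = M[1, `c']
                                        gen se2_`v' = M[2, `c']
                                        gen t2_`v' = M[3, `c']
                                        gen p2_`v' = M[4, `c']
                                    }
                                }
                            end
                            
                            rangerun two, interval(mdate -23 0) by(id)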
                            Again, not tested. Usual caveats apply. The gist of it is that this time instead of doing a stepwise regression each time, we sense which variables are included in the stepwise results by just checking whether the coefficient for that variable is non-missing, and if so we put it in a list of retained variables. Then we regress on the retained variables and the additional variable. From there, the processing is really the same thing as before. The results are named b2_* se2_*, etc., to distinguish them from the earlier results.

                            Another follow up. Since rangerun is so much faster than bystats, can we use the framework to identify if the two worst and best returns of the security matches with that of the independent factors (for every factor we would check if the two best and worst returns of the security happen at the same time for a rolling 24 month period by security). The goal here is to find out if the linear stepwise regression is missing any non-linear relationship (based on evidence from tail correlations).
                            I have no idea whatsoever what you are talking about here. What is a security match? What is a security? What factors? What do you mean by best and worst returns happening at the same time? In fact, what do you mean by returns? You are presuming a knowledge of what the variables in your data set mean, but you have not shared that knowledge, and the variable names are not suggestive, at least not to somebody who isn't in your field.


                            • #15
                              Hi Clyde,

                              Thank you. I will test the code on my data.

                              Really sorry for the confusing note. Let me try again. In this case, ex_ret is the security return and the risk factors are lnrvix, dax, cac, r3, etc. For every 24-month window, I want to identify the two best and two worst returns of ex_ret and then check whether the risk factors also experience their best and worst returns in those exact months, i.e. whether they coincide. This would suggest that ex_ret and the risk factors have a nonlinear relationship that is not captured by linear stepwise regression. I hope this clears things up a little.

                              Best,
                              John.
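
                              The check described above could be sketched in the same -rangerun- framework. This is an untested illustration, not something posted in the thread: the program name tail_match, the output names worst_* and best_*, and the 18-observation minimum are all assumptions, and the variable names are those of the data example in #9. For each window it counts, per factor, how many of the two worst (and two best) months of ex_ret are also among that factor's two worst (best) months:
                              Code:
                              capture program drop tail_match
                              program define tail_match
                                  local xlist svol lnrvix r3 dax cac
                                  // require at least 18 usable returns (assumed threshold)
                                  quietly count if !missing(ex_ret)
                                  if r(N) < 18 {
                                      drop _all   // reject this window, as in #11
                                      exit
                                  }
                                  local n = r(N)
                                  // rank returns within the window: 1 = worst, `n' = best
                                  egen double rk_y = rank(ex_ret), unique
                                  foreach v of local xlist {
                                      quietly count if !missing(`v')
                                      local nx = r(N)
                                      egen double rk_x = rank(`v'), unique
                                      // months in which both series are among their two worst
                                      quietly count if rk_y <= 2 & rk_x <= 2
                                      gen byte worst_`v' = r(N)
                                      // months in which both series are among their two best
                                      quietly count if rk_y > `n' - 2 & rk_x > `nx' - 2 & !missing(rk_y, rk_x)
                                      gen byte best_`v' = r(N)
                                      drop rk_x
                                  }
                                  drop rk_y
                              end

                              rangerun tail_match, interval(mdate -23 0) by(id)
                              Each worst_* and best_* variable takes the value 0, 1 or 2. Ties are broken arbitrarily by the unique option of rank(), so windows with tied returns may need separate handling.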
