  • Out-of-sample regression loops

    Dear all,

    For a research project I am trying to write code that runs an out-of-sample regression.
    This out-of-sample regression has to be done for 7 different variables, so the code also contains a loop over these 7 variables.
    I want the regressions to start with the first 240 observations.

    forvalues i=1(1)7 {
    forvalues t=240(1) 'T-1'{
    reg y x`i', if inrange (time, 1, 't')
    mat tbl=r(table)
    mat b`i'=tbl[1,1]
    mat t`i'=tbl[3,1]
    mat rsq`i'=e(r2)
    }
    }

    I keep getting "invalid syntax", but I'm not sure where the error in the code is. Any ideas?

  • #2
    I don't know if this will completely solve your problems, but in the 2nd and 3rd lines of your code you have a pair of right quotes (') where you need a left quote (`) followed by a right quote (').


    Code:
    forvalues t=240(1) 'T-1'{
    reg y x`i', if inrange (time, 1, 't')
    What is T?
    --
    Bruce Weaver
    Version: Stata/MP 18.5 (Windows)



    • #3
      Welcome to Statalist.

      Reposting your code using CODE delimiters, as requested by the Statalist FAQ (linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post; please review it), makes it apparent that in two places, 'T-1' on the second line and 't' on the third, you have used the incorrect marks to surround your local macros. There are other problems as well, but I think the changes suggested might take you farther.

      Code:
      forvalues i=1(1)7 {
      forvalues t=240(1) 'T-1'{
      reg y x`i', if inrange (time, 1, 't')
      mat tbl=r(table)
      mat b`i'=tbl[1,1]
      mat t`i'=tbl[3,1]
      mat rsq`i'=e(r2)
      }
      }
      The affected lines should read
      Code:
      forvalues t=240(1)`=T-1'{
      reg y x`i', if inrange (time, 1, `t')
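A fuller sketch of the corrected loop, assuming T is simply the number of time periods in the data (taken here from summarize rather than typed in). Besides fixing the quotes, it drops the comma before if (otherwise Stata reads if as an option of regress) and saves one row per predictor and window end with postfile, because the original loop overwrites b`i', t`i' and rsq`i' on every pass of the inner loop. As in the original, each regression uses times 1 to t. The demo data, the variable names var_id, t_end, b, tstat and r2, and the use of postfile are illustrative assumptions, not the poster's setup.

```stata
* Hypothetical demo data (not the poster's data): one series of 300 periods
clear
set seed 2024
set obs 300
gen time = _n
gen y = runiform()
forvalues i = 1/7 {
    gen x`i' = runiform()
}

* T = number of time periods, taken from the data rather than typed in
quietly summarize time
local T = r(max)
local Tm1 = `T' - 1

* Collect one row per (predictor, window end) instead of overwriting results
tempname h
tempfile results
postfile `h' int var_id int t_end double b double tstat double r2 using `results'
forvalues i = 1/7 {
    forvalues t = 240/`Tm1' {
        * no comma before if: a comma starts the options list
        quietly regress y x`i' if inrange(time, 1, `t')
        matrix tbl = r(table)
        * row 1 of r(table) holds coefficients, row 3 the t statistics
        post `h' (`i') (`t') (tbl[1,1]) (tbl[3,1]) (e(r2))
    }
}
postclose `h'

* Inspect the stored results
use `results', clear
list in 1/5
```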



      • #4
        Bruce Weaver
        William Lisowski

        Thanks for your input, guys! The command worked after your hints on T. I had to insert the number of time periods in our data.
        Next, we have to present our findings (expected returns) by comparing them to the actual returns and reporting the R^2.

        This is code I saw someone use to display this kind of information:

        forvalues i=1(1)7 {
        foreach var in b t rsq {
        local x=`var'`i'[1,1]
        disp "var'[i'] = `x'"

        But again, it is not working. New hints on this code or on how to calculate the R^2 are welcome!
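A minimal sketch of a working version of that display loop, assuming the results are kept as scalars named b1-b7, t1-t7 and rsq1-rsq7 (hypothetical names mirroring the matrices in the first post, filled here from a single regression on times 1 to 299 of made-up data). The posted snippet lacks the left quote before var and i in the display line and never closes its two loops.

```stata
* Hypothetical demo data (not the poster's data)
clear
set seed 2024
set obs 300
gen time = _n
gen y = runiform()
forvalues i = 1/7 {
    gen x`i' = runiform()
}

* Store slope, t statistic and R-squared for each predictor as scalars
* (a single window here: times 1 to 299)
forvalues i = 1/7 {
    quietly regress y x`i' if inrange(time, 1, 299)
    matrix tbl = r(table)
    scalar b`i'   = tbl[1,1]
    scalar t`i'   = tbl[3,1]
    scalar rsq`i' = e(r2)
}

* Display them: every macro name needs a left single quote before it,
* and every opening brace needs a closing brace
forvalues i = 1/7 {
    foreach s in b t rsq {
        display "`s'`i' = " %9.4f scalar(`s'`i')
    }
}
```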



        • #5
          Do please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

          The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

          Section 12.1 is particularly pertinent:

          12.1 What to say about your commands and your problem

          Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!
          ...
          Never say just that something "doesn't work" or "didn't work", but explain precisely in what sense you didn't get what you wanted.



          • #6
            I'm pretty sure that an out-of-sample regression makes no sense. It looks like you want to predict out-of-sample returns using regression coefficients from a regression that used past returns. Your loop suggests that you want a recursive window and I'll assume that you want to use all observations up to the preceding time period. The most efficient way to do this is to use rangestat (from SSC). Since you want to have regressions based on a minimum of 240 time periods, you can speed things up by specifying an invalid upper bound if the regression would be based on fewer than 240 periods.
            Code:
            * create a demonstration dataset
            clear all
            set seed 123123
            set obs 2
            gen long id = _n
            expand 360
            bysort id: gen time = _n
            gen y = runiform()
            forvalues i = 1/7 {
                gen x`i' = runiform()
            }
            
            * declare panel data
            xtset id time
            
            * define an invalid upper bound if there are fewer than 240 prior observations
            * otherwise, upper bound is the previous time period
            by id: gen high = cond(_n-1 < 240, -999, time-1)
            
            * perform regressions on recursive window
            foreach v of varlist x* {
              rangestat (reg) y `v', interval(time . high) by(id)
              gen double `v'_expected = b_cons + b_`v' * `v'
              rename reg_r2 `v'_r2
              drop reg_nobs reg_adj_r2 b_* se_*
            }
            You can spot-check the results for any observation with code like the following:
            Code:
            * spot check for x1, id == 2, time == 300
            regress y x1 if id == 2 & inrange(time, 1, 300-1)
            predict p if id == 2 & time == 300
            list id time y x1* p if id == 2 & time == 300
            and the results:
            Code:
            . * spot check for x1, id == 2, time == 300
            . regress y x1 if id == 2 & inrange(time, 1, 300-1)
            
                  Source |       SS           df       MS      Number of obs   =       299
            -------------+----------------------------------   F(1, 297)       =      0.39
                   Model |  .030081969         1  .030081969   Prob > F        =    0.5348
                Residual |  23.1397895       297  .077911749   R-squared       =    0.0013
            -------------+----------------------------------   Adj R-squared   =   -0.0021
                   Total |  23.1698714       298  .077751246   Root MSE        =    .27913
            
            ------------------------------------------------------------------------------
                       y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                      x1 |   .0352716    .056764     0.62   0.535    -.0764391    .1469822
                   _cons |   .4982663   .0314584    15.84   0.000     .4363567    .5601759
            ------------------------------------------------------------------------------
            
            . predict p if id == 2 & time == 300
            (option xb assumed; fitted values)
            (719 missing values generated)
            
            . list id time y x1* p if id == 2 & time == 300
            
                 +--------------------------------------------------------------------+
                 | id   time          y         x1       x1_r2   x1_expe~d          p |
                 |--------------------------------------------------------------------|
            660. |  2    300   .4616739   .6088467   .00129832   .51974132   .5197413 |
                 +--------------------------------------------------------------------+
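The x1_r2 variable above is the in-sample R-squared of each recursive regression. To compare expected with actual returns, one common alternative (not something used in this thread) is an out-of-sample R-squared that compares forecast errors with those of the historical-mean forecast: R2_OS = 1 - sum((y - y_expected)^2) / sum((y - ybar_hist)^2), where ybar_hist is the mean of y over all periods before t and both sums run over the periods that have a forecast. Below is a minimal sketch for x1, assuming rangestat is installed and computing one value per id on demo data like the one above; the names ybar_hist, ysum, sq_model, sq_bench, sse_model, sse_bench and x1_oos_r2 are hypothetical. A negative value means the regression forecasts did worse than the historical mean.

```stata
* requires rangestat from SSC (ssc install rangestat)
* demo data as above, with a single predictor x1
clear all
set seed 123123
set obs 2
gen long id = _n
expand 360
bysort id: gen time = _n
gen y = runiform()
gen x1 = runiform()
xtset id time

* recursive-window forecasts, as in the code above
by id: gen high = cond(_n-1 < 240, -999, time-1)
rangestat (reg) y x1, interval(time . high) by(id)
gen double x1_expected = b_cons + b_x1 * x1

* benchmark: mean of y over all earlier periods within each id
bysort id (time): gen double ysum = sum(y)
by id: gen double ybar_hist = (ysum - y) / (_n - 1) if _n > 1

* squared forecast errors, restricted to periods with a model forecast
gen double sq_model = (y - x1_expected)^2
gen double sq_bench = (y - ybar_hist)^2 if !missing(x1_expected)

* out-of-sample R-squared per id (egen total() ignores missing values)
by id: egen double sse_model = total(sq_model)
by id: egen double sse_bench = total(sq_bench)
gen double x1_oos_r2 = 1 - sse_model / sse_bench
list id x1_oos_r2 if time == 360
```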
