
  • Generating current quarter forecasts using only past data when the number of variables is very large

    Hi

    I have several variables at each week of the quarter (12 weeks per quarter in a stylized calendar).
    I want to run a lasso or ridge regression (elasticnet) to produce forecasts of quarterly GDP at each given week (WK) using ALL past information for a particular week.
    I have done the following to do this:

    Code:
    gen forecast=.
    
    levelsof WK, local(levels)
    
    foreach x of local levels {
    forvalues j=170(1)241 {    
    quietly lasso linear GDP var1-var500 if inrange(quarter_date,85,`j')  & WK==`x'
    estimates store lasso  
    predict tem_`j', postselection
    replace forecast=tem_`j'  if WK==`x'
    drop tem_`j'
    }
    }
    The problem is that in practice the dependent variable GDP is not available during the quarter while all other independent variables are available. Therefore, ideally, the regression should be estimated using all lagged variables (something like: quietly lasso linear lag_rgdp_first lag_var1-lag_var500 if inrange(quarter_date, 85, `j') ) and then the coefficients should be estimated. Once the coefficients are estimated, I can then produce more realistic forecasts that are sort of out-of-sample as follows:

    Code:
    gen forecast = b_cons + b_lagvar1 * var1 + b_lagvar2 * var2 + b_lagvar3 * var3 + ......  if WK==`x'
    In particular, I am not interested in the coefficients as the number of variables is very large anyways but I want to estimate realistic forecasts for the current quarter GDP without using the current quarter GDP variable in the regression. My aim of showing the forecast in the previous code is for you to see how they should be estimated in line with my objective.


    Can anyone help with coding this?

    I look forward to hearing from you

  • #2
    Hi
    It turns out that lasso saves " e(b_postselection) postselection coefficient vector". In this case, how can I estimate the forecast variable using an abbreviation for each coefficient (b_selection) and variable consistent with :

    Code:
     
     gen forecast = b_cons + b_lagvar1 * var1 + b_lagvar2 * var2 + b_lagvar3 * var3 + ......  if WK==`x'



    • #3
      Well, you can apply that coefficient vector to the lagged variables using the -matrix score- command.

      But I don't understand the logic here. If your conceptual model is that the outcome is a function of the lagged predictors, why do you not just use the lagged predictors in the lasso in the first place and then use -predict-. Not only would it be simpler, it would seem to better specify the model you have in mind. What am I missing here?

      Added: Unrelated to your question, I don't understand the logic of your loops in #1. You have nested loops, j inside of x. So the inner loop is executed 72 times with different values of j. But you have only a single forecast variable that is overwritten every time through the loop. So when you exit the inner loop, everything before j = 241 has been discarded. Particularly given how computationally intensive the loop is, why are you doing 72 different things just to get the last result?
      Last edited by Clyde Schechter; 01 Oct 2020, 21:13.
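
A minimal sketch of one way around the overwriting issue, on simulated data rather than the poster's. The variables y, x1-x5 and t are hypothetical stand-ins for GDP, var1-var500 and quarter_date, and Stata 16 or later is assumed for -lasso-. Each pass fits on periods 1 through `j' and keeps only the value for period `j'+1, so earlier forecasts are not overwritten by later passes.

```stata
clear
set seed 12345
set obs 60
gen t = _n                          // hypothetical stand-in for quarter_date
forvalues k = 1/5 {
    gen x`k' = rnormal()            // hypothetical stand-ins for var1-var500
}
gen y = 1 + 2*x1 - x2 + rnormal()   // hypothetical stand-in for GDP

gen forecast = .
forvalues j = 40/59 {
    // fit on periods 1..j only
    quietly lasso linear y x1-x5 if inrange(t, 1, `j'), rseed(1)
    // predict fills every row, using that row's own x values
    quietly predict double tmp, postselection
    // keep only the one-step-ahead value for period j+1
    quietly replace forecast = tmp if t == `j' + 1
    drop tmp
}
count if !missing(forecast)
```

The restriction on -replace- is the design choice that matters here: without it, every pass rewrites all rows and only the last fit survives.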



      • #4
        Thank you, professor @ClydeSchechter. I clarify below.
        But I don't understand the logic here. If your conceptual model is that the outcome is a function of the lagged predictors, why do you not just use the lagged predictors in the lasso in the first place and then use -predict-. Not only would it be simpler, it would seem to better specify the model you have in mind. What am I missing here?
        The outcome variable is quarterly GDP which I want to generate forecasts for using var1-var500 without any look-ahead bias. The var1-var500 are different series released at the end of every week of the quarter ( such as interest rates, stock returns, etc.)

        As a business cycle analyst, if I stand now at the end of week 4 of 2020 q3, for example, and want to forecast GDP of 2020 q3, I have GDP data only up till 2020 q2, but I have var1-var500 data for week 4 of all past quarters UP TILL week 4 of 2020 q3 (i.e. including the current quarter). The var1-var500 series are timely data. Therefore, when forecasting GDP of 2020 q3 in this case, I want to estimate the slope coefficients from a contemporaneous regression of GDP on var1-var500 using all data up till 2020 q2. Then I take those slope coefficients and multiply them by the data for var1-var500 in week 4 of 2020 q3 to generate a forecast for GDP of 2020 q3.

        My example here used week 4 but I want to do this for each week 1 to 12.

        My previous code can be incorrect and this could be the reason for confusion, I apologize for that, but I hope that my objective is clear now. The data is time-series and I am happy to send this if this will help.



        • #5
          Sorry, but I'm afraid I still don't understand. This is out of my field and it doesn't sound like anything I have any experience with. There are many economists on the forum, and it is likely that one of them will recognize what you are trying to do here and be able to quickly jump in and help you to the finish line. I hope one of them will.



          • #6
            Perhaps this can help give some direction. I think it makes sense to create a dummy where the forecast begins and then use replace. Note that the postselection option gives you unpenalized coefficients. Below is for a single unit, but you can generalize to multiple.

            Code:
            use "https://fmwww.bc.edu/RePEc/bocode/x/xthst_sample_dataset.dta", clear
            
            keep if id == 1 
            replace log_rgdpo = .  if year > 2000 // create missing 
            
            gen fcast_dummy = (year > 2000)
            
            forvalues yr = 2000/`=year[_N]-1' {
                lasso linear log_rgdpo  log_ngd if inrange(year, 1960,`yr')
                predict yhat_`yr', postselection
                replace log_rgdpo = yhat_`yr' if year == `=`yr'+1'
            
            }



            • #7
              Thanks a lot, Professor Clyde Schechter, for trying to understand my problem. I will simplify it so that, hopefully, I get more assistance from you as well as from others. Perhaps my following response to Justin Blasongame will help you identify my issue.

              If I understand it correctly, the code in #6 estimates a lasso using data from year 1960 up till 2000 in the first window to make a forecast of log_rgdpo. After that, the one year ahead log_rgdpo is replaced with its forecasted value. Then, the second window will run from 1960 to 2001 and so on. If so, unfortunately, this was not my objective.

              To clarify my problem, I will refer to the same data that Justin Blasongame has used. In this data, suppose that we are in year 2000 now and we want to forecast log_rgdpo. The problem is that log_rgdpo data for year 2000 is not available in real time, but all independent variables (in this case we have only log_ngd, but I have 500 variables in my case) have available data. To make a forecast, a business analyst will need to run a lasso regression of log_rgdpo on log_ngd using all observations from 1960 to 1999 and estimate the slope coefficients from the regression. Next, given that data for the explanatory variables (here only log_ngd) are readily available for year 2000 in real time, the business analyst can then multiply the estimated slope coefficients by the actual observation for the explanatory variables (log_ngd) in 2000 to make a forecast for log_rgdpo in 2000.

              Next year, in 2001, the analyst will have the "actual" data for log_rgdpo for year 2000, but not the actual log_rgdpo for year 2001. Fortunately, as before, the explanatory variables are timely, so we will have the data for log_ngd of year 2001; the analyst then estimates a lasso regression of log_rgdpo on log_ngd using all observations from 1960 to 2000 and re-estimates the slope coefficients. Next, given that data for the explanatory variables (here only log_ngd) are available for year 2001, he can multiply the re-estimated slope coefficients by the actual observation for log_ngd in 2001 to make a forecast for log_rgdpo in 2001, and so on.

              In line with this, I tried to amend the code provided by Justin as following:


              Code:
              use "https://fmwww.bc.edu/RePEc/bocode/x/xthst_sample_dataset.dta", clear
              
              keep if id == 1
              
              gen prediction=.
              
              forvalues yr = 2000/`=year[_N]-1' {
                  lasso linear log_rgdpo  log_ngd if inrange(year, 1960,`yr')
                  predict yhat, postselection
                  replace prediction = yhat if year == `=`yr'+1'
                  drop yhat
              }
              Now I think the forecast for year 2001 for example is based on the estimated slope coefficient on log_ngd using data from 1960-2000 multiplied by the actual value of log_ngd in 2000. This is my problem as I want the forecast of year 2001 to be based on the estimated slope coefficient on log_ngd using data from 1960-2000 BUT multiplied by the actual value of log_ngd in 2001 and not 2000!


              I really hope to get more help



              • #8
                I think I understand what you want now. If you are using version 16, you can use the code below. If you are not, then I suggest you upgrade to version 16 because doing this without frames is an awful mess.

                Code:
                clear*
                use "https://fmwww.bc.edu/RePEc/bocode/x/xthst_sample_dataset.dta", clear
                
                keep if id == 1
                
                frame put log_rgdpo log_ngd year, into(forecasts)
                frame forecasts {
                    replace year = year - 1
                    gen prediction = .
                }
                
                forvalues yr = 2000/`=year[_N]-1' {
                    lasso linear log_rgdpo  log_ngd if inrange(year, 1960,`yr')
                    matrix b = e(b_postselection)
                    frame forecasts {
                       matrix score yhat = b
                       replace prediction = yhat if year == `yr'
                       drop yhat
                    }
                }
                
                frame forecasts: replace year = year + 1
                frlink 1:1 year, frame(forecasts)
                frget prediction, from(forecasts)
                The variable prediction will have what you want.

                Added: I am not enough of an expert in -lasso- to say this confidently, but doing a rolling lasso in each year, so that each year's forecast is potentially based on different predictors, strikes me as questionable. You might want to consult with somebody who really understands -lasso- to find out if this is a legitimate use of the process or not.
                Last edited by Clyde Schechter; 03 Oct 2020, 11:44.
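
A small check, on simulated data, of what -matrix score- does with e(b_postselection). The variables y, x1, x2 and the sample split are hypothetical, and Stata 16 or later is assumed. The model is fit on years up to 2000 only, then the scored values are compared with -predict, postselection- on the later years, which were not in the estimation sample.

```stata
clear
set seed 2020
set obs 50
gen year = 1959 + _n                 // 1960..2009
gen x1 = rnormal()
gen x2 = rnormal()
gen y = 0.5 + 1.5*x1 + rnormal()     // hypothetical outcome

// fit on the early years only
quietly lasso linear y x1 x2 if year <= 2000, rseed(1)
matrix b = e(b_postselection)

// linear combination of b with each row's own x values
matrix score double yhat_score = b
// same quantity through predict, for comparison
quietly predict double yhat_pred, postselection

gen double gap = abs(yhat_score - yhat_pred)
summarize gap if year > 2000
```

If the gap is zero up to rounding, both routes apply the coefficients to the predictor values of the row being forecast. Under that reading, -predict- in the main frame restricted to year == `yr'+1 would give the same number as the frame-based route, though it is worth confirming this on your own data.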



                • #9
                  Dear Professor Clyde Schechter:

                  Thank you so much for your continuing support; really, thank you.

                  I took some time to respond as I had to check the frame commands (they are new to me). I have Stata 16.
                  I think your code is doing what I wanted. I executed it first on the data above and then moved to my actual data.

                  You can think about my actual quarterly data as if they are several datasets together at each week of the quarter. For example, I have data for the same quarterly rgdp_first (my dependent variable) linked with my explanatory variables var1-var30 (for example) at week 1, week 2, week 3... up till week12.
                  Therefore, given that I want to repeat my exercise for data at each week, I had to make some changes to your code so that the loop repeats the estimation process at each week of the quarter.
                  quarter_date is the quarter_date and myWEEK is the week (ranges from 1 to 12 per each quarter).

                  I made the following changes and I hope they are correct:

                  Code:
                  sum quarter_date if myWEEK==1  // this shows me a range of 88 to 241, which are 1982q1 up till 2020q2
                  
                  frame put rgdp_first var1-var30 quarter_date myWEEK, into(forecasts) 
                  frame forecasts {
                      replace quarter_date = quarter_date - 1
                      gen prediction = .
                  }
                  
                  forvalues QTR = 188/`=quarter_date[_N]-1' {
                      foreach w of numlist 1/12 {
                          lasso linear rgdp_first  var1-var30 if inrange(quarter_date, 88,`QTR') & myWEEK==`w'
                          matrix b = e(b_postselection)
                          frame forecasts {
                              matrix score yhat = b
                              replace prediction = yhat if quarter_date == `QTR' & myWEEK==`w'
                              drop yhat
                          }
                      }
                  }
                  
                  frame forecasts: replace quarter_date = quarter_date + 1
                  frlink 1:1 quarter_date myWEEK, frame(forecasts)
                  frget prediction, from(forecasts)

                  Here is also an example of the actual data used

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input float(quarter_date myWEEK) double(var1 var2 var3 var4)
                  88 1  .10359720885753632 .8679591417312622 0                  0
                  89 1  .07843606919050217 .7152390778064728 0  .1811458319425583
                  90 1  .05430242419242859 .6410727500915527 0  .2230774462223053
                  91 1  .12981104850769043 .6597848534584045 0 .23717594146728516
                  92 1   .1568187177181244  .752483606338501 0 .14484082162380219
                  93 1  .19480887055397034 .6877254247665405 0 .15477584302425385
                  94 1 -.22961216419935226 .6739154607057571 0 .27210623025894165
                  95 1  .20222391188144684 .6799057126045227 0 .21608100831508636
                  96 1  .16426868736743927  .607530027627945 0  .2920711189508438
                  97 1  .15438511967658997 .6481877565383911 0  .2926305830478668
                  end
                  format %tq quarter_date
                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input float(quarter_date myWEEK) double(var1 var2 var3 var4)
                   88 2     .1684819906949997  .6942142844200134                   0 .07923194766044617
                   89 2    .08434794098138809  .6936862468719482                   0 .14471320807933807
                   90 2    .04069159924983978  .6852232813835144                   0 .11732598394155502
                   91 2    .09218613058328629  .6661829948425293                   0 .06617940217256546
                   92 2  -.021724693477153778  .7285594642162323                   0 .13520880788564682
                   93 2    .05902031622827053  .6897070109844208                   0 .14763759821653366
                   94 2    .02629457786679268  .6810156106948853                   0 .18668651580810547
                   95 2    .09080712497234344  .6917273700237274                   0 .10976798459887505
                   96 2    .16416621208190918   .580367922782898                   0 .19577175378799438
                   97 2    .16985678672790527   .687131404876709                   0 .19622404128313065
                   98 2     .1510545313358307  .6526841521263123                   0 .16868863999843597
                   99 2    .23583514988422394  .6309177279472351                   0 .25267285108566284
                  100 2    .14274976402521133  .6903561651706696                   0 .17652829736471176
                  
                  end
                  format %tq quarter_date

                  1- Is my adjustment correct?
                  2- To confirm, does the code (e.g. the matrix score) here generate the linear forecast using the postselection coefficients (which will be only for a selected subset of variables) multiplied by the same selected subset of variables? I just wanted to be sure that it does not apply these coefficients to all variables or anything like that.


                  I am not enough of an expert in -lasso- to say this confidently, but doing a rolling lasso in each year, so that each year's forecast is potentially based on different predictors, strikes me as questionable. You might want to consult with somebody who really understands -lasso- to find out if this is a legitimate use of the process or not.
                  Regarding this comment: would it be more legitimate to use standardized (penalized) instead of postselection? I hope to hear from some economists here as well.

                  I thought that postselection or penalized would be fine given that my objective is to use the stream of real-time data each week to make the forecasts; the lasso should pick up the most important variables to make the prediction in case of the former. These variables can come from different industry data too and hence giving lasso the freedom to determine which matters the most during the time period seems logical ?! doesn't it?!
                  For example, during the COVID19 period say in 2020q3, the model will select the most important variables in the nearest period to 2020q3 including data from 2020q2 and 2020q1. That is the way I think about it.

                  I look forward to hearing back. Hope the adjustment makes sense too?!

                  Best regards
                  Lisa




                  • #10
                    1- Is my adjustment correct?
                    Well, here's what your adjustment does. It treats your data set as if it were 12 different data sets that were appended together, one for each value of week from 1 to 12. As best I understand your explanation, that's what you want to do. The estimates for, for example, week 7 in 2020q1 are based on all the data from week 7 of preceding quarters. Data from any other week are not considered in fitting the model or calculating the estimates. Then with those estimates in hand, a forecast is made for week 7 in 2020q2. It escapes me why this is what you want, but if this is what you want, you have it. It should not worry you that it escapes me why this is what you want: the subject matter here is out of my area of knowledge. I can only say that it would be strange, at best, to do something like this with epidemiologic data.

                    2- To confirm, does the code (e.g. the matrix score) here generate the linear forecast using the postselection coefficients (which will be only for a selected subset of variables) multiplied by the same selected subset of variables? I just wanted to be sure that it does not apply these coefficients to all variables or anything like that.

                    Yes.
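
One way to look at this directly is to list the coefficient vector that -matrix score- receives. The sketch below uses simulated data in which only x1 and x2 matter by construction; all variable names are hypothetical, and Stata 16 or later is assumed. -matrix score- can only use the columns present in the vector, so its column names show which variables enter the forecast.

```stata
clear
set seed 7
set obs 200
forvalues k = 1/10 {
    gen x`k' = rnormal()              // hypothetical candidate predictors
}
gen y = 2*x1 - 3*x2 + rnormal()       // only x1 and x2 matter by construction

quietly lasso linear y x1-x10, rseed(1)
matrix b = e(b_postselection)
matrix list b                         // the vector that matrix score would apply

// column names of b are the only variables matrix score can touch
local used : colnames b
display "columns in b: `used'"
display "selected by lasso: `e(allvars_sel)'"
```

Comparing the two displayed lists on your own fit is a quick way to confirm which variables carry a coefficient.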

                    Regarding this comment: would it be more legitimate to use standardized (penalized) instead of postselection? I hope to hear from some economists here as well.
                    This question goes beyond my secure grasp of -lasso-. FWIW, the Stata manuals say that for linear models, this use of the post-selection coefficients is better than the use of the penalized coefficients, but not so for non-linear models. Since you are using a linear model this would seem to be OK. But again, I don't know the theory of lasso well enough to explain why or endorse this beyond appealing to the authority of the manuals.

                    I thought that postselection or penalized would be fine given that my objective is to use the stream of real-time data each week to make the forecasts; the lasso should pick up the most important variables to make the prediction in case of the former. These variables can come from different industry data too and hence giving lasso the freedom to determine which matters the most during the time period seems logical ?! doesn't it?!
                    I don't know if it does or not. The phrase "most important" is doing a lot of work here. What -lasso- does is select a set of predictors that perform well in terms of cross-validation, which, in turn, tends to imply that they will do well in out-of-sample predictions assuming that there are no salient differences in the way the out-of-sample sample arises. Where I get queasy about this is that you are iterating this process over a tower of increasing subsets of the data. As you do this, the model selection will change, partly due to the greater amount of information available, but also partly due to fluctuations in the noise. My intuition is that a better process would be simply to use the model derived from the most complete data set, i.e. from the last iteration. In your most recent revision, you have also added the stratification of the data into 12 disjoint subsets indicated by the week variable. Each week will, in turn, have its own predictive model at each iteration. I don't understand the reason for separating the 12 weeks. Is it to capture some kind of high-frequency temporal variation in your data? But if so, why is week 3 of this quarter considered to be equivalent to week 3 of the next or preceding quarter? My guess is that you have an answer to that question, but the net result is that we have even more models being generated, and my sense is that doing this is probably capitalizing on noise--which is precisely what LASSO is trying to avoid doing. Again, I don't have a deep understanding of this. I'm just offering up my intuitions. I would love to hear from somebody who really understands this well.



                    • #11
                      Dear Professor Clyde Schechter

                      Thank you so much for the continuing support and the very insightful comments you are giving here. I have learned a lot from this post; so thank you.

                      I will try to respond to your comments and raise three related questions:

                      Well, here's what your adjustment does. It treats your data set as if it were 12 different data sets that were appended together, one for each value of week from 1 to 12. As best I understand your explanation, that's what you want to do. The estimates for, for example, week 7 in 2020q1 are based on all the data from week 7 of preceding quarters. Data from any other week are not considered in fitting the model or calculating the estimates. Then with those estimates in hand, a forecast is made for week 7 in 2020q2. It escapes me why this is what you want, but if this is what you want, you have it. It should not worry you that it escapes me why this is what you want: the subject matter here is out of my area of knowledge. I can only say that it would be strange, at best, to do something like this with epidemiologic data.
                      Thanks for confirming this. I explain the rationale behind it as well.
                      Each week's data represent an accumulation of all information released from the start of the quarter up till this week. For example, a variable observation in week 1 represents the sum of losses generated by the energy sector in week 1; the observation for the same variable in week 2 represents the sum of losses generated by the energy sector from week 1 to week 2, and it will be from week 1 to 7 in week 7; and so on. Hence, each week's observation is in fact an accumulation of the quarter's data up till that particular week. Therefore, when I run these regressions, I ask which point of time during the quarter, the variables (e.g. losses) are better predictors for the outcome dependent variable. I then estimate mean square forecast errors using the forecasts generated each week to know when uncertainty around the forecast is decreased the most during the quarter. Hope this makes sense?
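
The weekly comparison described above can be sketched as follows, on simulated data. The numbers are made up purely so the block runs on its own; only the variable names rgdp_first, prediction, quarter_date and myWEEK follow the thread, and sq_err is a hypothetical helper variable.

```stata
clear
set seed 99
set obs 240
gen quarter_date = 188 + floor((_n - 1)/12)   // 20 hypothetical quarters
gen myWEEK = mod(_n - 1, 12) + 1              // weeks 1..12 within each quarter
format %tq quarter_date

// simulated outcome, constant within a quarter like quarterly GDP
gen rgdp_first = rnormal() if myWEEK == 1
bysort quarter_date (myWEEK): replace rgdp_first = rgdp_first[1]

// simulated forecasts whose noise shrinks as the quarter progresses
gen prediction = rgdp_first + rnormal(0, 1/myWEEK)

// squared forecast error, then its mean for each week of the quarter
gen double sq_err = (rgdp_first - prediction)^2
tabstat sq_err, by(myWEEK) statistics(mean n)
```

With real forecasts in prediction, the week at which the mean of sq_err drops the most would be the point in the quarter where the forecast uncertainty falls the most.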

                      Related questions:
                      1- Now, suppose that the data observations are not accumulated in the way I explained above (i.e. each week's data now represent ONLY a given week). How would I adjust my code to include all past observations up till a given week in the quarter? I could not figure this out and would appreciate your help.



                      Where I get queasy about this is that you are iterating this process over a tower of increasing subsets of the data.
                      I agree with you but I was not sure how to change this to a fixed rolling window at least to check the robustness of the results.

                      Related questions:
                      2- I would love to have a fixed rolling window too; say of 100 observations; where I add a new observation each time and drop the oldest observation (e.g. first window is from 1982 q1 up till 2007q1; second window is 1982 q2 to 2007 q2; and so on). How can I adjust my code in #9 to do so (so instead of increasing rolling window; it will be fixed rolling window)?



                      For completeness, I think I can repeat my analysis using an original sample to fit the model and subsequently make forecasts for the rest of the periods without updating the slope coefficients while utilizing the flow of new data in the out of sample period. For example, the model can be fit using a sample from 1982 q1 up till 2007 q1; and then the slope coefficients can then be multiplied with the new observations for var1-var30 in subsequent periods to make forecasts starting from 2007 q2 (i.e. out of sample too). I think in this way the code could be much simpler as following:

                      Code:
                      sum quarter_date if myWEEK==1  // this shows me a range of 88 to 241, which are 1982q1 up till 2020q2

                      * Creating two subsamples: the model is fit on SUBsample 1 and the forecasts are generated for SUBsample 2
                      gen SUBsample=.
                      replace SUBsample=1 if quarter_date<188  // a sample to fit the model
                      replace SUBsample=2  if SUBsample==.   // a sample to evaluate the model (out of sample)
                      
                      gen prediction=.
                      
                      levelsof myWEEK, local(levels)
                      
                      foreach x of local levels {
                          quietly lasso linear rgdp_first var1-var30 if SUBsample == 1 & myWEEK==`x'
                          predict temp if SUBsample==2 & myWEEK==`x', postselection
                          // store this week's out-of-sample forecasts
                          replace prediction=temp  if myWEEK==`x'
                          drop temp
                      }
                      Related question
                      3- Does this code above perform its aim as explained?


                      I look forward to hearing from you so much; thanks!

                      Lisa



                      • #12
                        1- Now, suppose that the data observations are not accumulated in the way I explained above (i.e. each week's data now represent ONLY a given week). How would I adjust my code to include all past observations up till a given week in the quarter? I could not figure this out and would appreciate your help.

                        You would create a new sequential time variable: rather than week starting over at 1 in each new quarter, it would just keep counting up. Then you would do a single loop over that sequential week variable. And you would remove the -if myweek == `w'- parts of your syntax.
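
A minimal sketch of building such a running week counter, assuming exactly 12 weeks per quarter as in the stylized calendar of #1. The three quarters of data are simulated; seq_wk is the name used for the counter in this thread.

```stata
clear
set obs 36
gen quarter_date = 88 + floor((_n - 1)/12)   // three hypothetical quarters
gen myWEEK = mod(_n - 1, 12) + 1             // weeks 1..12, restarting each quarter
format %tq quarter_date

// running week counter, counted from the first quarter in the data
quietly summarize quarter_date
gen seq_wk = (quarter_date - r(min))*12 + myWEEK

isid seq_wk                                  // stops with an error if seq_wk is not unique
list quarter_date myWEEK seq_wk in 11/14
```

Computing the counter from quarter_date and myWEEK, rather than from the row number after sorting, keeps it aligned with the calendar even if some weeks are absent from the data.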

                        2- I would love to have a fixed rolling window too; say of 100 observations; where I add a new observation each time and drop the oldest observation (e.g. first window is from 1982 q1 up till 2007q1; second window is 1982 q2 to 2007 q2; and so on). How can I adjust my code in #9 to do so (so instead of increasing rolling window; it will be fixed rolling window)?
                        With the same sequential week variable (let's call it seq_wk) I referred to in response to question 1, change the -inrange()- expression to the -inrange(seq_wk, `j', `j'-99)-, where `j' is the loop index.

                        3- Does this code above perform its aim as explained?
                        Off hand, it looks like it does. But really you'd have to try it and see if there are any unexpected problems.



                        • #13
                          Dear Professor Clyde Schechter

                          Thank you so much. My apologies for this long post now but I am trying to close the remaining points.

                          You would create a new sequential time variable: rather than week starting over at 1 in each new quarter, it would just keep counting up. Then you would do a single loop over that sequential week variable. And you would remove the -if myweek == `w'- parts of your syntax.
                          Thanks for your reply. I have followed your suggestion and adjusted the code below. I believe that it should now run using all previous observations in an increasing window rolling routine.

                          Code:
                          sort quarter_date myWEEK // this sort is important so I can get the new sequential week variable correctly
                          gen seq_wk= _n
                          
                          sum seq_wk   // this shows from 1 to 1846
                          
                          frame put rgdp_first var1-var30 quarter_date seq_wk, into(forecasts)
                          frame forecasts {
                              replace seq_wk = seq_wk - 1
                              gen prediction = .
                          }
                          
                          forvalues SEQ = 100/`=seq_wk[_N]-1' {
                             lasso linear rgdp_first  var1-var30 if inrange(seq_wk, 1,`SEQ')
                              matrix b = e(b_postselection)
                              frame forecasts {
                                 matrix score yhat = b
                                 replace prediction = yhat if seq_wk == `SEQ'
                                 drop yhat
                              }
                          }
                          
                          
                          frame forecasts: replace seq_wk = seq_wk + 1
                          frlink 1:1 seq_wk , frame(forecasts)
                          frget prediction, from(forecasts)

                          The code runs and produces forecasts starting from seq_wk 101.
                          I hope that this is exactly as you suggested?



                          With the same sequential week variable (let's call it seq_wk) I referred to in response to question 1, change the -inrange()- expression to the -inrange(seq_wk, `j', `j'-99)-, where `j' is the loop index.
                          Thanks a lot. This was your advice for the fixed window of 100 observations in a rolling routine which is based on the new seq_wk variable. I followed that but the code worked only after a small change to the -if inrange- which I hope is correct (below):


                          Code:
                          sort  quarter_date myWEEK
                          
                          gen seq_wk= _n
                          
                          sum seq_wk   // this shows me from 1 to 1846
                          
                          frame put rgdp_first var1-var30 quarter_date seq_wk, into(forecasts)
                          frame forecasts {
                              replace seq_wk = seq_wk - 1
                              gen prediction = .
                          }
                          
                          forvalues SEQ = 100/`=seq_wk[_N]-1' {
                              lasso linear rgdp_first var1-var30 if inrange(seq_wk, `SEQ'-99, `SEQ')
                              matrix b = e(b_postselection)
                              frame forecasts {
                                  matrix score yhat = b
                                  replace prediction = yhat if seq_wk == `SEQ'
                                  drop yhat
                              }
                          }
                          
                          
                          frame forecasts: replace seq_wk = seq_wk + 1
                          frlink 1:1 seq_wk , frame(forecasts)
                          frget prediction, from(forecasts)

                          The code also works and produces results.
                          I hope my adjustment to -inrange()-, made to match my -forvalues- loop, is correct: I have inrange(seq_wk,`SEQ'-99,`SEQ') instead of inrange(seq_wk,`SEQ',`SEQ'-99).


                          For completeness, I have also adjusted the original code in #9 to run the same fixed-window rolling routine on quarter_date within each myWEEK series, rather than on seq_wk, as follows:

                          Code:
                          sum quarter_date if myWEEK==1  // this shows me 88 to 241
                          
                          frame put rgdp_first var1-var30 quarter_date myWEEK, into(forecasts)
                          frame forecasts {
                              replace quarter_date = quarter_date - 1
                              gen prediction = .
                          }
                          
                          forvalues QTR = 187/`=quarter_date[_N]-1' {
                              foreach w of numlist 1/12 {
                                  lasso linear rgdp_first var1-var30 if inrange(quarter_date, `QTR'-99, `QTR') & myWEEK==`w'
                                  matrix b = e(b_postselection)
                                  frame forecasts {
                                      matrix score yhat = b
                                      replace prediction = yhat if quarter_date == `QTR' & myWEEK==`w'
                                      drop yhat
                                  }
                              }
                          }
                          
                          frame forecasts: replace quarter_date = quarter_date + 1
                          frlink 1:1 quarter_date myWEEK, frame(forecasts)
                          frget prediction, from(forecasts)
                          I hope this is right?!
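One thing that may be worth checking with a quarter-based window: a span of 100 quarters holds 100 observations per myWEEK only if no quarter is missing. The -dataex- excerpt in #15 appears to jump from quarter 101 to 103, so a sketch like the following, on hypothetical data with that one gap, counts what a window actually contains. The local name nwin is an assumption made for illustration.

```stata
clear
set obs 101
gen quarter_date = 87 + _n        // hypothetical quarters 88..188
drop if quarter_date == 102       // hypothetical gap, mimicking the excerpt
gen myWEEK = 1
local QTR = 187
count if inrange(quarter_date, `QTR' - 99, `QTR') & myWEEK == 1
local nwin = r(N)                 // observations available to -lasso- in this window
display "window ending at `QTR' holds `nwin' observations"
```

If the count falls short of 100 in the real data, the seq_wk version and the quarter_date version are not using windows of the same size.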

                          Off hand, it looks like it does. But really you'd have to try it and see if there are any unexpected problems.
                          As for this one, I can confirm that the code runs smoothly and produces "myprediction" only for the SUBsample no.2 in each myWEEK series. If it looks consistent with its purpose to you (as described for the out of sample test), then I am happy to go with that.


                          Added:
                          I noted that a change to this code would produce different results for some myWEEK although I thought it would be equivalent. I do not know why?!

                          Code:
                          gen SUBsample=.
                          replace SUBsample=1 if quarter_date<188  
                          replace SUBsample=2  if sample==.   
                          
                          gen myprediction=.
                          
                          levelsof myWEEK, local(levels)
                          
                          foreach x of local levels {
                            
                          quietly lasso linear rgdp_first var1-var30 if SUBsample == 1 & myWEEK==`x' 
                          estimates store lasso  
                          predict temp if myWEEK==`x', postselection 
                          replace prediction=temp  if myWEEK==`x' 
                          drop temp
                          }
                          replace myprediction=. if SUBsample==1 // to keep only the out-of-sample predictions when SUBsample==2

                          I also attach my data (with 30 variables; but I have about 500) in case you want to confirm any of the points discussed.


                          Thank you so much and I look forward to confirming those last points.

                          Lisa

                          Last edited by Lisa Wilson; 06 Oct 2020, 08:27.



                          • #14
                            The codes you are inquiring about appear to be what I had in mind. Your change to -inrange()- is correct, I had mistakenly switched the order of the second and third arguments. Thank you for fixing that.
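As a quick illustration of why the argument order matters: inrange(z, a, b) is true when a <= z <= b, so the lower bound has to come first. The numbers below are arbitrary and chosen only for the example.

```stata
display inrange(150, 101, 200)   // 1: lower bound first, 101 <= 150 <= 200
display inrange(150, 200, 101)   // 0: bounds reversed, nothing satisfies 200 <= z <= 101
```

With the bounds reversed, the -if- condition selects no observations at all, which is consistent with the code running only after the swap.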

                            As for the last code producing different results, I don't know why either. Perhaps if I ran them with example data I could figure it out, but I do not download files from people I don't know. The only data I run from Statalist is that posted with -dataex-, or which I create myself. In any case, you seem to have working code, so stick with it.

                            Added: In the final code block in #13, there is an error:
                            Code:
                            replace SUBsample=2 if sample==.  
                            // SHOULD BE
                            replace SUBsample = 2 if SUBsample == .
                            If there is a variable named sample in your data, then the incorrect version will run but will not appropriately designate SUBsample. If there is no variable named sample, then the code would produce a syntax error, rather than different results.
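A minimal sketch of how one might check which of the two cases applies, using hypothetical ten-row data. -confirm variable- reports through _rc whether a variable that sample could refer to exists, and the corrected line then fills in the second subsample.

```stata
clear
set obs 10
gen quarter_date = 183 + _n              // hypothetical quarters 184..193
gen SUBsample = 1 if quarter_date < 188
capture confirm variable sample          // is there a variable that sample refers to?
display "return code: " _rc              // nonzero in this toy data: no such variable
replace SUBsample = 2 if SUBsample == .  // corrected line from above
tabulate SUBsample
```

In the real dataset, a return code of 0 would mean the original line ran against some other variable and SUBsample should be rebuilt.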

                            Added: Looking further at your code, I see a few more things, such as haphazardly going back and forth between prediction and myprediction as variable names. But once all of those things are fixed, I notice that you did not set the random number seed before the loop in either program. -lasso- is not a deterministic program: by default it chooses the penalty by cross-validation, which assigns observations to folds at random. So the results will not be the same even when running the same code twice. Try setting the random number seed before the loop in both versions of the code (and set it to the same number in both versions!) and I think you will get the same results from both.
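Putting these points together, here is a sketch of what the split-sample loop might look like with a single prediction variable and a fixed seed. The data are simulated purely for illustration (two myWEEK series, five predictors, arbitrary coefficients), and the seed values are arbitrary. -rseed()- is used on -lasso- so that each fit is reproducible on its own; -set seed- before the loop is the alternative described above.

```stata
clear
set seed 2020                                  // arbitrary; only makes the toy data reproducible
set obs 240
gen myWEEK = mod(_n - 1, 2) + 1                // two hypothetical weekly series
gen quarter_date = 88 + floor((_n - 1) / 2)    // quarters 88..207
forvalues k = 1/5 {
    gen var`k' = rnormal()
}
gen rgdp_first = 2 + var1 - 0.5*var2 + rnormal()   // made-up data-generating process

gen byte SUBsample = cond(quarter_date < 188, 1, 2)
gen myprediction = .

levelsof myWEEK, local(levels)
foreach x of local levels {
    // fit on the estimation sample only, with a fixed seed for the CV folds
    quietly lasso linear rgdp_first var1-var5 if SUBsample == 1 & myWEEK == `x', rseed(12345)
    predict double temp if myWEEK == `x', postselection
    // keep only the out-of-sample predictions
    replace myprediction = temp if myWEEK == `x' & SUBsample == 2
    drop temp
}
```

Using the same seed in the other version of the code is what makes the two sets of results comparable.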
                            Last edited by Clyde Schechter; 06 Oct 2020, 11:20.



                            • #15
                              Thanks a lot. This is great. I am glad that all adjustments are appropriate.

                              I am sorry for uploading the data in the previous post. I will then use dataex below as advised.


                              As for the last point:
                              As for the last code producing different results, I don't know why either. Perhaps if I ran them with example data I could figure it out, but I do not download files from people I don't know. The only data I run from Statalist is that posted with -dataex-, or which I create myself.
                              SUBsample was a typo here, sorry. I still get different results when I use the code in post #11 or post #14 for the fixed window without rolling (the SUBsample 1 and 2 codes). I still do not know why, but since they differ, I feel that the one in post #11 is more appropriate because the model is fitted in one sample and, in the same loop, the predictions are produced for the other sample. If you have any suggestions after using the dataex example, please let me know.

                              Code:
                              * Example generated by -dataex-. To install: ssc install dataex
                              clear
                              input double rgdp_first float(quarter_date myWEEK) double(var1 var2 var3 var4 var5 var6 var7 var8 var9 var10)
                              -3.893  88 1   .10359720885753632  .8679591417312622                    0                  0 .015200352296233177  .02626059018075466  .0004999529919587076  .45699819922447205                   0   .0004384829953778535
                               1.669  89 1   .07843606919050217  .7152390778064728                    0  .1811458319425583  .03077547997236252  .02904677204787731   .007437900174409151  .43739956617355347                   0  .00008633000106783584
                                 .76  90 1   .05430242419242859  .6410727500915527                    0  .2230774462223053  .02197185903787613 .029340852051973343  .0005919199902564287  .37148916721343994                   0                      0
                              -2.515  91 1   .12981104850769043  .6597848534584045                    0 .23717594146728516 .020914312452077866  .03222334757447243   .009770149365067482  .39284753799438477                   0                      0
                               3.095  92 1    .1568187177181244   .752483606338501                    0 .14484082162380219  .02387678250670433  .02717658132314682  .0026124389842152596   .4399906396865845                   0                      0
                               8.671  93 1   .19480887055397034  .6877254247665405                    0 .15477584302425385 .017393289133906364                   0                     0   .5208333134651184                   0                      0
                               7.909  94 1  -.22961216419935226  .6739154607057571                    0 .27210623025894165  .03557705506682396  .02814125455915928   .026281065307557583   .4408544450998306                   0  .00036027550231665373
                               4.477  95 1   .20222391188144684  .6799057126045227                    0 .21608100831508636  .01953848823904991  .02884775772690773                     0  .28941744193434715                   0                      0
                               8.338  96 1   .16426868736743927   .607530027627945                    0  .2920711189508438 .041305070742964745 .029682494699954987  .0008042220142669976   .4765091836452484                   0                      0
                               7.476  97 1   .15438511967658997  .6481877565383911                    0  .2926305830478668  .05020279251039028 .029833678156137466   .011925265192985535  .33102890849113464                   0                      0
                               2.662  98 1    .5329873561859131  .5102922320365906                    0 .30713802576065063 .050115130841732025 .027307720854878426   .007056082133203745   .4626304805278778                   0                      0
                               3.922  99 1   .36372411251068115    .56190025806427                    0 .24327513575553894 .051354601979255676  .03234944865107536   .005427088122814894   .3947512209415436                   0                      0
                               1.354 100 1    .2837163358926773   .749171257019043                    0 .18352091312408447  .02868946921080351 .031125779263675213    .02565670758485794  .46175071597099304                   0                      0
                               1.743 101 1   .10795796103775501  .5764660388231277                    0 .22072414308786392 .036747029051184654 .030386864207684994 -.0011646965285763144   .1279873624444008                   0                      0
                               2.353 103 1   .07519269734621048    .59975266456604                    0  .2312362790107727 .032526202499866486  .02421199344098568   .006996465846896172   .4107060432434082                   0                      0
                               3.201 104 1   .04535355046391487    .68744957447052                    0   .259430855512619  .02300809510052204  .02493523247539997  .0004463520017452538  .32220199704170227                   0                      0
                               1.077 105 1 -.061102996580302715     .7769795358181                    0 .06499325484037399  .02364042028784752 .020842844620347023   .002330940536921844  .46841904520988464                   0  .00007400850154226646
                               2.414 106 1  -.03943045064806938  .7835010290145874                    0 .09402298927307129 .021459832787513733 .019999999552965164   .008551880717277527   .1434820592403412                   0                      0
                               1.747 107 1    .1510646566748619  .7343248724937439                    0 .16322628036141396 .030402344651520252 .019729825668036938    .01099458895623684   .3353141322731972                   0                      0
                               4.299 108 1   .12457912415266037  .8115817308425903                    0 .10704068839550018 .037228167057037354 .021879177540540695  .0029602719005197287  .44980013370513916 .029361942782998085  .00002145120015484281
                               2.591 109 1   .09771569073200226  .6026725769042969                    0 .11925965547561646 .029607847332954407 .014843522571027279  .0015474259853363037  .36684149503707886                   0                      0
                               3.838 110 1  .027549786493182182  .8069815039634705                    0 .14030960202217102  .04798216372728348 .023463096469640732    .00684462720528245   .3095238208770752                   0                      0
                               4.151 111 1   .23088725749403238  .7811554968357086                    0 .11151014268398285 .026515944860875607 .021324838511645794    .00821618246845901  .19610353745520115  .00897750910371542                      0
                               2.266 112 1   .06547258608043194  .5904433727264404                    0 .09379125386476517 .045370690524578094 .023462822660803795   .006181740202009678   .3797549307346344                   0                      0
                               3.089 113 1   .04756193794310093  .6583850979804993                    0 .16135266423225403 .029804490506649017  .01990995556116104   .004165769554674625    .319701611995697                   0 .000013627400221594144
                               2.237 114 1   .09916634485125542  .8127434849739075                    0 .11130645498633385  .03339780401438475  .01713610114529729                     0   .3792326897382736                   0                      0
                                1.99 115 1   .09765896201133728  .6433976590633392                    0 .20762329548597336  .03392602130770683 .021280944347381592   .005457831081002951   .3582698851823807                   0                      0
                               5.546 116 1   .18991564214229584  .7861429452896118                    0 .10812115669250488 .040688470005989075 .022817308083176613  .0014135210076346993   .3588153123855591                   0                      0
                               1.676 117 1   .07597751915454865   .746464192867279                    0 .19077809154987335  .04135265573859215   .0227687768638134   .007548762019723654  .33692511916160584                   0                      0
                               2.501 118 1   .07555382326245308  .7181055545806885                    0  .1359294205904007  .03892428241670132 .023404747247695923    .01081382017582655   .3938099145889282                   0                      0
                                .501 119 1   .13181869685649872  .6889693140983582                    0 .22372323274612427  .03696989640593529  .02261929027736187  .0040750689804553986  .38753455877304077                   0                      0
                               2.096 120 1   .17234022915363312  .6527844965457916                    0 .17843560129404068  .04557858593761921 .018132335506379604   .006231090286746621   .3780239373445511                   0                      0
                                1.22 121 1   .04685705155134201  .7141219973564148                    0 .19813643395900726  .03561438247561455  .02557329460978508   .003849942935630679  .37255364656448364                   0                      0
                               1.793 122 1   .07157503068447113  .8004719018936157                    0 .11924289166927338 .032555606216192245  .02298236172646284   .015198738314211369   .3061268925666809                   0                      0
                              -2.131 123 1   .09237945079803467  .6117232441902161                    0 .27132394909858704  .03766055405139923 .027311744168400764   .007830865681171417  .36174243688583374                   0                      0
                              -2.811 124 1  .017031755298376083  .7257599234580994                    0 .09521055221557617  .03505482152104378  .02389690652489662  .0039708660915493965  .37829598784446716                   0                      0
                                .418 125 1   .09774142131209373  .6640038192272186                    0 .21282590925693512  .04348932206630707  .02742959652096033  .0037702819099649787   .3284205049276352                   0                      0
                               2.371 126 1   .08078432828187943   .650404155254364                    0  .1275697946548462 .024283817037940025  .02090592309832573   .008681747131049633   .3930188715457916                   0                      0
                                .297 127 1  -.05550992488861084  .6119781732559204                    0  .2942383885383606 .034678492695093155  .02712562307715416  .0016767090419307351  .35813377797603607                   0                      0
                               1.978 128 1 -.005241314647719264   .670651912689209  .006570980418473482 .13539808616042137  .04129217937588692 .020594733767211437   .004885147558525205   .4438548982143402                   0                      0
                               1.386 129 1  .054290205240249634  .6977491974830627                    0  .1882728785276413  .03506552055478096 .021769292652606964   .004523884505033493   .3839285671710968                   0                      0
                                2.65 130 1    .2013719528913498  .6792399883270264                    0 .15078236162662506 .041025642305612564  .01965649612247944   .005945920012891293  .29619893431663513                   0     .01024586707353592
                                3.79 131 1    .0713081881403923  .5789836049079895                    0  .2009541615843773 .030214705504477024 .019215047359466553   .003684619558043778   .3971231281757355                   0                      0
                               1.799 132 1   .06416945718228817  .6183741390705109                    0  .1964579001069069  .04600469768047333 .018809600733220577  .0019895399746019393    .325138121843338                   0                      0
                               1.577 133 1  .038945429027080536  .7245792746543884                    0 .17572880536317825  .04271678626537323  .01522260531783104                     0   .2972881495952606                   0                      0
                               2.844 134 1   .18857061862945557  .6980202794075012                    0  .0938401147723198  .03057851269841194  .01816791296005249   .007823738269507885  .37899544835090637                   0                      0
                                5.87 135 1   .05807130690664053  .5858350396156311 .0048838527873158455  .1853632926940918  .05419006384909153  .01696584839373827  .0052438718266785145  .42641739547252655                   0 1.7150000160149702e-17
                               2.581 136 1   .04747616872191429  .5992066860198975  .009813749231398106 .17631863057613373  .05229495279490948 .016300395131111145   .010065523907542229  .33336153626441956                   0                      0
                               3.708 137 1   .09270800650119781  .6318810880184174                    0 .20851033180952072  .05601607635617256 .019899999722838402   .006511077517643571  .32100629806518555                   0                      0
                               3.438 138 1   .13145361468195915    .71683070063591                    0 .17200733721256256  .04282259754836559 .014112362638115883  .0034016530262306333  .35266028344631195                   0  8.714999902441083e-17
                               4.532 139 1    .1347259134054184   .590053141117096                    0 .15020708739757538  .05144275538623333 .006003000249620527 .00007990399899426848   .3591795563697815                   0                      0
                               2.819 140 1   .16695279628038406  .6224272847175598   .03994588926434517  .1808386892080307  .05343421921133995 .017306622117757797 -.0016587759600952268  .16771931946277618                   0  5.072699877928244e-06
                                .528 141 1    .1167466901242733  .7045220136642456                    0  .1687823235988617 .029945057816803455 .020154567435383797   .003893760498613119  .36439983546733856                   0                      0
                               4.205 142 1    .0457307081669569  .7971104383468628                    0 .20854897797107697  .05427820608019829 .021749397739768028   .004865823080763221  .34969133138656616                   0                      0
                                .919 143 1   .08842536062002182  .5896275639533997                    0  .2767828702926636  .06112676113843918 .015498272143304348  .0037054189015179873   .3561643958091736                   0                      0
                               2.809 144 1   .22033662348985672  .6471281051635742   .01001484110020101 .08896289020776749 .058701541274785995  .02389395423233509    .01192223746329546  .34272250533103943                   0                      0
                               4.221 145 1   .09141965955495834  .7541283667087555                    0  .1802286058664322  .05427282303571701 .022265059873461723                     0 .004701609374023974                   0                      0
                               2.171 146 1   .10858333855867386  .8345368504524231                    0 .11785241216421127 .042804256081581116  .01834988035261631  .0013616180513054132  .39024388790130615                   0                      0
                               4.717 147 1   .05190311372280121  .6193544864654541                    0 .21795567870140076  .04511875659227371 .009463458321988583   .003704488044604659  .38017991185188293                   0                      0
                               5.611 148 1    .2030649557709694  .6361786723136902                    0  .1691368818283081  .02226417511701584 .009100993163883686    .04478945699520409  .27800899744033813                   0                      0
                               2.163 149 1   .10523955151438713  .6902948021888733                    0 .21191756427288055  .04370664432644844 .019435441121459007   .008797653950750828   .2610534429550171                   0                      0
                               3.521 150 1    .0843629464507103  .6005275249481201                    0 .22459150850772858  .04940836504101753  .01657283864915371 .00045741398935206234   .3636783957481384                   0                      0
                               4.298 151 1   .06331848353147507  .6251218318939209                    0  .1790686547756195  .06168796494603157 .007270379923284054  .0038591409102082253  .34956228733062744                   0                      0
                               4.242 152 1   .20314940810203552 .38100093603134155   .02981366403400898  .3165895491838455   .0676107108592987 .017781784757971764   .009506518254056573  .15135504305362701                   0                      0
                               1.417 153 1   .11338229477405548  .5613337159156799                    0  .2888713479042053  .10094289481639862 .015274013858288527  .0002880815009120852   .3949989676475525                   0                      0
                               3.288 154 1  .037567440420389175  .6337849497795105                    0  .2598651796579361 .051914578303694725 .018722547218203545  .0019192739855498075   .3875839412212372                   0                      0
                               5.585 155 1    .1274329274892807  .6402285099029541                    0 .18100039660930634  .05241824686527252 .015363449230790138   .002670379471965134  .33700065314769745                   0                      0
                               4.492 156 1   .07660535722970963  .5933132767677307                    0  .2436394989490509  .03365837410092354 .013363737612962723  .0012445029569789767   .3740310072898865                   0                      0
                               2.288 157 1  .008643722161650658  .6501864790916443                    0 .22463421523571014  .04334966838359833  .01576370559632778   -.00360892410390079    .288896307349205                   0                      0
                               4.824 158 1   .09292653948068619  .6221055686473846                    0 .23488283157348633 .038743527606129646 .015122951939702034 .00008355799946002662   .3600739687681198                   0                      0
                               5.798 159 1   .17202481627464294  .6948989629745483                    0  .1255931258201599  .04505286365747452   .0161066222935915  .0003437369887251407   .3233668804168701                   0                      0
                               5.391 160 1   .17111922800540924  .6154264211654663                    0 .23937074840068817  .03715255856513977 .017867939546704292  .0014178820420056582  .35340143740177155                   0                      0
                                5.19 161 1   .14491452276706696  .6484046578407288                    0 .15475750714540482  .04579901322722435 .018023522570729256   .002959880046546459   .3430524170398712                   0                      0
                               2.745 162 1    .1797257661819458  .6371322274208069                    0 .17580880969762802  .04000920429825783  .01924380473792553   .005364974960684776  .37707722187042236                   0                      0
                               1.373 163 1   .13142557442188263  .6350573897361755                    0 .09678983688354492  .05056442320346832  .01588251255452633   .005947065073996782   .3361032009124756                   0                      0
                               1.982 164 1   .20648633688688278  .6599540114402771                    0 .20093418657779694  .03066177200525999 .010792119428515434  -.008068176917731762   .3121844232082367                   0                      0
                                .735 165 1   .07610215991735458  .6751998960971832                    0  .2197740077972412  .04126586765050888 .018770478665828705 -.0038820900954306126   .2101116180419922                   0                      0
                               -.355 166 1   .05268929526209831  .5510042309761047                    0  .2525259852409363   .0402456559240818                   0 -.0008551289793103933   .3700152337551117                   0                      0
                                .224 167 1   .04285532049834728  .6148322224617004                    0 .16688518226146698  .03275533113628626  .01130365114659071                     0   .3329286277294159                   0                      0
                               5.836 168 1  .011708948761224747  .6826736032962799                    0  .1422729641199112 .032599423080682755 .011975767556577921  .0018716754857450724   .3480127155780792                   0                      0
                               1.059 169 1 -.058509303256869316  .7254939675331116                    0 .13960833102464676 .029217400588095188  .01698488090187311                     0   .2771835923194885                   0                      0
                               3.137 170 1   -.0752779170870781   .554777055978775                    0  .1890181303024292  .03794896975159645 .017825704999268055   .005039374344050884  .23094233125448227                   0                      0
                                .744 171 1  .038378515280783176  .6608560681343079                    0 .16521789133548737 .028357082046568394 .012213696260005236  .0002512572427804116   .2710488513112068                   0                      0
                               1.598 172 1    .1389252096414566  .6931020617485046                    0 .14471305906772614 .027291756123304367 .015287991613149643   .009609276428818703   .3773792088031769                   0                      0
                                2.37 173 1   .06338148564100266  .6853001117706299                    0  .2152746394276619  .02992277964949608 .014390657655894756 .00021982149337418377   .3162073791027069                   0                      0
                               7.155 174 1   .03466222062706947  .6680843234062195                    0 .13508671522140503  .03630698472261429 .016981156542897224  .0038063640240579844    .371636301279068                   0                      0
                               4.024 175 1   .04526584781706333   .632089227437973                    0 .21830374002456665  .02540651522576809 .008440547157078981   .000534473976586014   .3236040472984314                   0                      0
                               4.158 176 1   .15829606354236603   .672880232334137                    0 .23490315675735474 .027685873210430145  .00797378458082676 -.0013081219512969255   .3599728047847748                   0                      0
                               3.044 177 1   .07944869622588158  .5745872259140015                    0  .1914014220237732   .0305781327188015 .014274755492806435 .00009318799857283011   .3400094509124756                   0                      0
                               3.711 178 1    .1769624426960945  .5787602066993713                    0 .16856808215379715   .0224982975050807 .007568564265966415   .005604488076642156  .34370237588882446                   0                      0
                               3.147 179 1     .104831263422966  .6107438802719116                    0 .16311874985694885 .022654840722680092 .014825529418885708  .0011690480168908834  .33127644658088684                   0                      0
                               3.088 180 1   .11861174553632736   .655663013458252                    0 .19600976258516312  .01839705929160118  .01145879179239273  .0024715319741517305  .34995701909065247                   0                      0
                               3.414 181 1   .13076071441173553  .6196867227554321                    0 .20035666227340698  .03405476827174425 .016175637021660805 -.0037962261121720076  .32215291261672974                   0                      0
                               3.805 182 1   .11045490950345993  .7174596786499023                    0 .12707124650478363 .020971737802028656 .013788902200758457   .007835599593818188   .3570947200059891                   0                      0
                               1.119 183 1   .13088613748550415  .7307280600070953                    0  .1105077937245369 .016526482068002224 .016212495043873787  .0017066604341380298   .3473331481218338                   0                      0
                               4.818 184 1   .08072374761104584  .6167611479759216                    0 .24337022006511688  .02023956272751093 .014527623541653156   .002379472483880818   .3688112199306488                   0                      0
                               2.458 185 1   .07293152809143066  .5901447534561157                    0  .2590447962284088  .02775769867002964 .015072397887706757  .0008079260005615652  .35377658903598785                   0                      0
                               1.583 186 1   .02250332571566105  .5718316435813904                    0  .3262297511100769  .02731901966035366 .017500903457403183  .0003497629950288683    .344684362411499                   0                      0
                               3.473 187 1   .07071228325366974   .632343590259552                    0  .1607406586408615 .017663339152932167  .01722455769777298 .00013998399663250893  .34939810633659363                   0                      0
                                1.26 188 1   .13703425228595734  .5847705602645874                    0 .26505714654922485 .020132657140493393 .013030831702053547  .0014295140281319618  .35979485511779785                   0                      0
                              end
                              format %tq quarter_date
