
  • cvlasso predicting for a non-estimation sample

    I would like to estimate a model for data before 2015, but then predict for data after 2015 (happy to explain why, but there is a good reason). Predict after cvlasso seems incapable of doing that.

    Here's the code:
    cvlasso Y Xs, fe nfolds(10) lopt seed(1234), if year<2016
    predict temp2, lopt ols e, if year > 2016
    (Also tried: predict temp2 if year > 2016, lopt ols e)

    Each time:
    Warning: if condition ignored. Residuals calculated for estimation sample.

    What am I doing wrong? Thanks!

  • #2
    Both the cvlasso and predict commands apparently need correction. Stata's if clause is not an "option", although it is "optional"; it appears before the comma that introduces "options". Note the following output from help cvlasso:
    Code:
       Full syntax
    
            cvlasso depvar regressors [if exp] [in range] [, alpha(numlist) alphacount(int) sqrt adaptive
                  adaloadings(string) adatheta(real) ols lambda(real) lcount(integer) lminratio(real) lmax(real) lopt
                  lse lglmnet notpen(varlist) partial(varlist) psolver(string) ploadings(string) unitloadings prestd fe
                  noftools noconstant tolopt(real) tolzero(real) maxiter(int) nfolds(int) foldvar(varname)
                  savefoldvar(varname) rolling h(int) origin(int) fixedwindow seed(real) plotcv plotopt(string)
                  saveest(string)]
    So try the following.
    Code:
    cvlasso Y Xs if year<2016, fe nfolds(10) lopt seed(1234)
    predict temp2 if year<2016, lopt ols e



    • #3
      William Lisowski's diagnosis is, I think, not quite right here. The actual rule in Stata is that parts of the command occur prior to odd-numbered commas and that options occur after odd-numbered commas; i.e., you are allowed to have more than one comma but need to be careful what you put where. Note, however, that there may be some official commands that do not abide by this (and there are many community-contributed commands that break it). Compare the following:
      Code:
      sysuse auto
      . regress gear price i.foreign if rep78<., vce(robust)
      
      Linear regression                               Number of obs     =         69
                                                      F(2, 66)          =      88.77
                                                      Prob > F          =     0.0000
                                                      R-squared         =     0.6631
                                                      Root MSE          =     .27261
      
      ------------------------------------------------------------------------------
                   |               Robust
        gear_ratio | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
             price |  -.0000584   8.15e-06    -7.17   0.000    -.0000747   -.0000422
                   |
           foreign |
         Domestic  |          0  (base)
          Foreign  |   .7188939   .0744691     9.65   0.000     .5702116    .8675761
                   |
             _cons |    3.13953   .0738505    42.51   0.000     2.992083    3.286977
      ------------------------------------------------------------------------------
      r; t=0.02 21:02:35
      
      . regress gear price i.foreign, vce(robust), if rep78<.
      
      Linear regression                               Number of obs     =         69
                                                      F(2, 66)          =      88.77
                                                      Prob > F          =     0.0000
                                                      R-squared         =     0.6631
                                                      Root MSE          =     .27261
      
      ------------------------------------------------------------------------------
                   |               Robust
        gear_ratio | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
             price |  -.0000584   8.15e-06    -7.17   0.000    -.0000747   -.0000422
                   |
           foreign |
         Domestic  |          0  (base)
          Foreign  |   .7188939   .0744691     9.65   0.000     .5702116    .8675761
                   |
             _cons |    3.13953   .0738505    42.51   0.000     2.992083    3.286977
      ------------------------------------------------------------------------------
      I could continue this by adding another comma and then putting another option after that third comma.
      Last edited by Rich Goldstein; 09 Aug 2021, 19:09.



      • #4
        Yes, unfortunately moving the order around does not help. It seems to be something with the cvlasso command. The identical syntax, but with areg, works fine. It is sort of ironic that cvlasso seems to be incapable of generating fitted values for a non-estimation subsample.



        • #5
          What has not been mentioned before in this topic is that cvlasso is a community-contributed command, available from both SSC and (as an older version) from the Stata Journal.

          First, be sure you have the latest version of cvlasso installed, which is the version from the SSC package lassopack
          Code:
          . ssc install lassopack
          checking lassopack consistency and verifying not already installed...
          installing into /Users/lisowskiw/Library/Application Support/Stata/ado/plus/...
          installation complete.
          
          . which cvlasso
          /Users/lisowskiw/Library/Application Support/Stata/ado/plus/c/cvlasso.ado
          *! cvlasso 1.0.11 27sept2020
          *! lassopack package 1.4.1
          *! authors aa/ms
          and not the version from the Stata Journal package st0594
          Code:
          . net install st0594.pkg, from(http://www.stata-journal.com/software/sj20-1)
          checking st0594 consistency and verifying not already installed...
          installing into /Users/lisowskiw/Library/Application Support/Stata/ado/plus/...
          installation complete.
          
          . which cvlasso
          /Users/lisowskiw/Library/Application Support/Stata/ado/plus/c/cvlasso.ado
          *! cvlasso 1.0.09 28jun2019
          *! lassopack package 1.3
          *! authors aa/ms
          If indeed you have the 2020 version of cvlasso installed, then the output of
          Code:
          ssc describe lassopack
          includes for support the email addresses of the authors Achim Ahrens, Christian Hansen, and Mark Schaffer.



          • #6
            Yes, unfortunately moving the order around does not help. It seems to be something with the cvlasso command. The identical syntax, but with areg, works fine. It is sort of ironic that cvlasso seems to be incapable of generating fitted values for a non-estimation subsample.
            This is, of course, usually possible. Example:

            Code:
            insheet using https://statalasso.github.io/dta/housing.csv,  ///
                clear comma
            cvlasso medv crim-lstat if _n <=400, lopt postres
            predict yhat if _n>400, xb
            The reason it doesn't work here is the fixed effects. Out-of-sample values for u, e, ue, xbu (=xb + u) are currently not supported with fe. This is something we should fix. Fortunately, you have all you need to calculate it yourself:

            Code:
            insheet using https://statalasso.github.io/dta/housing.csv,  ///
                clear comma
            
            xtset rad
            order rad, last
            
            // estimation
            cvlasso medv crim-lstat if _n <=400, fe lopt postres
            mat bhat = e(b)
            
            // predicted values  
            predict xb0, xb // for full sample
            predict u0, u // estimation sample only
            predict xbu0, xbu // estimation sample only
            
            // get fixed effects for full sample
            sort rad
            by rad: egen uhat = mean(u0)
            
            // get predicted values xb + u
            gen xbu = xb0 + uhat
            I will upload a fix to lassopack when I get a chance.
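
            For anyone who needs the out-of-sample residual e rather than xb + u, the same idea can be taken one step further. This is only a sketch under assumptions, not part of lassopack: the variable names insample and ehat are made up for illustration, and it assumes e is defined as in xtreg, fe, i.e. the outcome minus (xb + u). The estimation sample is flagged before anything is sorted, because _n no longer identifies the same observations once the data are sorted by rad.

            ```stata
            insheet using https://statalasso.github.io/dta/housing.csv,  ///
                clear comma

            // flag the estimation sample up front (insample is a hypothetical name)
            gen byte insample = _n <= 400

            xtset rad
            order rad, last

            // estimation on the flagged sample only
            cvlasso medv crim-lstat if insample, fe lopt postres

            // predicted values
            predict xb0, xb // for full sample
            predict u0, u // estimation sample only

            // spread each group's estimated fixed effect to all its observations
            bysort rad: egen uhat = mean(u0)

            // out-of-sample residual, assuming e = y - (xb + u)
            gen ehat = medv - (xb0 + uhat) if !insample
            ```

            Any rad group with no observations in the estimation sample has no estimated fixed effect, so uhat and ehat will be missing for it.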

            the actual rule in Stata is that parts of the command occur prior to odd-numbered commas and that options occur after odd-numbered commas; i.e., you are allowed to have more than one comma but need to be careful what you put where; note, however, that there may be some official commands that do not abide by this (there are many community-contributed commands that break this); compare the following:
            I wasn't aware of this.

            Thanks William Lisowski for tagging me. I missed the conversation since I am not regularly on Statalist.

            By the way: the recommended way of installing lassopack is from GitHub. We only upload every other version to SSC.

            Code:
            net install lassopack, from("https://raw.githubusercontent.com/statalasso/lassopack/master/")
            Last edited by Achim Ahrens; 10 Aug 2021, 12:03.
            --
            Tag me or email me for ddml/pdslasso/lassopack/pystacked related questions. I don't check Statalist.



            • #7
              You're amazing (as are William and Rich)!! Thank you so much. Always a little comforting to know it wasn't something completely obvious. Thanks for an amazing package.
