  • How can I keep a large number of variables in Stata

    Hello,

    I would like to run OLS after variable selection by lasso. The lasso selected about 70 variables. How can I keep these variables and drop the unselected ones, so that I can carry out the next step?

    Thank you very much!

  • #2
    You do not explain "LASSO" so I am left to assume you are using the lasso2 command included in the lassopack package from SSC.

    We see from the output of help lasso2 that the macro e(selected0) includes all the selected independent variables other than the constant. That suggests to me that
    Code:
    lasso2 y x1 x2 ....
    local indep `e(selected0)'
    regress y `indep'
    should accomplish the OLS regression you seek.

    Please take the time to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how best to pose your question.

    The more you help others understand your problem, the more likely others are to be able to help you solve your problem.


    • #3
      Thanks William. Note that lasso2 reports post-lasso (post-estimation OLS) results automatically. Take this example from the help file:
      Code:
      // Load example data
      insheet using https://web.stanford.edu/~hastie/ElemStatLearn/datasets/prostate.data, clear tab
      
      // Estimate coefficient lasso path over (default) list of lambda values.
      lasso2 lpsa lcavol lweight age lbph svi lcp gleason pgg45
      
      // Estimate model selected by EBIC.  Show lasso and post-lasso coefficients.
      // postresults makes sure that coefficients are stored as e() objects
      lasso2, lic(ebic) postres
      If you want to use Stata's regress, you can use e(selected0) as William suggested. But please note that the test statistics reported by regress will generally not be valid.
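      To also drop the unselected variables from the dataset, as post #1 asks, something along these lines may work. It is only a sketch: it reuses the prostate example, assumes lasso2 has just been run with lic(ebic) postres so that e(selected0) is populated, and the local macro name indep is arbitrary. Check that the macro is not empty before running keep, since keep discards the other variables from memory.
      Code:
      // Copy the selected predictors (assumed to be in e(selected0)) into a local macro
      local indep `e(selected0)'
      display "`indep'"
      
      // Keep only the outcome and the selected predictors; all other variables are dropped
      keep lpsa `indep'
      
      // OLS on the selected variables; the caveat about test statistics still applies
      regress lpsa `indep'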
      http://statalasso.github.io/


      • #4
        Thank you Achim Ahrens for sharing your knowledge of the lasso2 command. You can tell I am not familiar with lasso2, and just applied what I know about returned results from estimation commands to address the narrow question of using the variables selected in a subsequent OLS regression, as post #1 asked, without asking myself if that was the right thing to do. In doing so I was concerned about exactly the point you make in your final sentence and am glad to see that there is a correct way to run post-lasso OLS provided as part of the lasso2 command.

        The lesson here is that it's always best to read the documentation provided by the help file carefully and thoroughly.


        • #5
          If the slimmed-down version of the model has 70 predictors, what did the original look like?
