  • Computational time of lasso2 vs. rlasso

    Hello,

    I am using the lassopack package for model building and variable selection and have some questions about the differences between the rlasso and lasso2 commands.

    The lasso2 command takes a long time to converge when a large number of variables are included, while the rlasso command is nearly instantaneous. I am a bit confused why this would be.

    Here's a reproducible illustration:

    Code:
    sysuse auto, clear

    // Create squared and cubed terms of each predictor for demonstration
    foreach var in mpg rep78 headroom trunk weight length turn displacement gear_ratio {
        gen `var'2 = `var'^2
        gen `var'3 = `var'^3
    }

    eststo lasso1: lasso2 price mpg* rep78* headroom* trunk* weight* length* turn* displacement* gear_ratio*
        // takes a while to converge

    eststo lasso2: rlasso price mpg* rep78* headroom* trunk* weight* length* turn* displacement* gear_ratio*, displayall
        // instant
    On a related note, I am trying to wrap my head around the different selected models. In the auto case, we have HUGE differences in the output:

    Code:
        lasso2, lic(ebic) //model selected by EBIC -- no variables selected?
        lasso2, lic(aicc) //model selected by AICc
    Can anyone point me to guidance on how to determine which of these might be best to use in my case?

    Thanks!


  • #2
    Thanks for raising these questions, Jonathan.

    The "short" answer:

    Computational speed

    To answer your question on computational speed, it's important to understand what lasso2 and rlasso (and cvlasso) are doing in the background.

    lasso2 calculates the lasso solution for multiple penalty values (lambdas). lasso2 uses 100 lambdas by default. That is, lasso2 does not just calculate a single beta-hat vector (as e.g. regress would do), but it obtains the coefficient path of beta estimates for a range of lambdas. You can plot the coefficient path using the plotpath option. It looks like this:
    [Figure: lasso2 coefficient path plot (plotpath_housing.jpg)]
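    For instance, you can produce a plot like this yourself with the auto data (a minimal sketch; plotpath(lnlambda) puts log-lambda on the horizontal axis):

    Code:
    sysuse auto, clear
    lasso2 price mpg headroom trunk weight length turn, plotpath(lnlambda)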



    We get one beta-hat for each lambda. As we increase the degree of penalization (from left to right), more and more predictors are removed from the model. lasso2 also reports information criteria (AIC, AICc, BIC, EBIC) and you can select the model preferred by one of these information criteria using the lic() option, as you have demonstrated in your question.

    rlasso, on the other hand, uses an iterative algorithm to estimate a penalty level that is grounded in theory; see Belloni, Chen, Chernozhukov, and Hansen (2014) and other papers. (I won't explain the background of this theory here, as it is all spelled out in the paper linked at the end of this message.)

    Now, about the computational time: lasso2 is much faster than running 100 separate lasso estimations, one lambda at a time, because each previous estimate serves as a "warm start" (initial value) for the next. Nevertheless, the iterative algorithm of rlasso is often faster still, as it requires only a handful of lasso estimations.

    cvlasso, which implements cross-validation, is by far the slowest method, as it requires multiple lasso2 estimations (one per fold for each candidate lambda). This is a well-known drawback of cross-validation, especially for large data sets.
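    To illustrate (a sketch with the auto data; lopt re-estimates using the lambda that minimizes the cross-validation error, and seed() just makes the fold split reproducible):

    Code:
    sysuse auto, clear
    cvlasso price mpg headroom trunk weight length turn, lopt seed(123)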

    Why are the results different?
    • The EBIC and AICc have quite different theoretical properties. EBIC is model selection consistent if the true model is among the candidates, which is a very strong assumption. (Even the existence of a true model is conceptually problematic.) AICc is aimed at minimizing the average squared error, which makes it interesting when your goal is prediction. In general, AICc will tend to select more predictors than EBIC. Side note: AIC and BIC are not recommended if you have "wide" data (i.e., many predictors relative to the sample size).
    • rlasso puts a strong focus on controlling over-fitting, which is why it also tends to produce parsimonious solutions. This makes rlasso especially attractive when you want to do causal inference -- see our sister package pdslasso ("ssc install pdslasso").
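    For example, a post-double-selection regression with pdslasso might look like this (a hedged sketch: here foreign stands in as the causal variable of interest, and the parenthesized varlist holds the high-dimensional controls from which the lasso selects):

    Code:
    sysuse auto, clear
    pdslasso price foreign (mpg headroom trunk weight length turn)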
    Which is the right method to use?

    That depends on the aim of your analysis and your type of data. It's a complex question; just a few notes here:
    • If your aim is prediction, you might want to consider cross-validation or AICc (which often behaves similarly to cross-validation).
    • If your aim is to identify the true model, consider BIC/EBIC, keeping in mind that they rely on strong assumptions.
    • Definitely don't use AIC or BIC if you have very wide data.
    • If you have time-series data, you can use rolling cross-validation (also implemented in cvlasso).
    But none of these are ultimate answers, and I'd recommend having a look at our paper below, which brings me to the long answer:
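    As a sketch of rolling cross-validation on time-series data (assuming the data are tsset; here h(1) sets a one-step-ahead forecast horizon, and the lag lists are just illustrative predictors):

    Code:
    webuse lutkepohl2, clear
    tsset qtr
    cvlasso dln_consump L(1/4).dln_inv L(1/4).dln_inc, rolling h(1)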


    The long answer is in our new working paper (with Christian Hansen and Mark Schaffer), which we uploaded yesterday, so your question comes at the right time. We discuss all of the above questions in much more detail: we present the theory of all three penalization approaches (cross-validation, information criteria, rigorous penalization) and also report some Monte Carlo results.

    You can find it here: https://arxiv.org/abs/1901.05397

    By the way, make sure you update lassopack -- we have just released a new version (v1.2).



    • #3
      Fantastic! Thanks so much for the comprehensive answer. And to anyone else interested in this topic who is new to these techniques: I can't recommend their paper enough. An extremely lucid explanation of some fairly complex ideas.
