  • lassopack & pdslasso: prediction & causal inference using lasso, square-root lasso, adaptive lasso, elastic net or ridge regression

    With thanks to Kit Baum, two new user-written packages by Achim Ahrens, Chris Hansen and Mark Schaffer are now available through the SSC archive: LASSOPACK and PDSLASSO.

    LASSOPACK is a suite of programs for penalized regression methods: the lasso, square-root lasso, adaptive lasso, elastic net, ridge regression and post-estimation OLS. These methods are suitable for the high-dimensional setting where the number of predictors may be large and possibly greater than the number of observations.

    PDSLASSO implements routines for estimating structural parameters in linear models with many controls and/or many instruments. A VCV (variance-covariance matrix) for the estimated coefficients is also reported, making inference and testing possible. PDSLASSO uses the lasso and square-root lasso to select controls and/or instruments from a large set of variables (possibly numbering more than the number of observations), in a setting where the researcher is interested in estimating the causal impact of one or more (possibly endogenous) variables of interest.

    The lasso (Least Absolute Shrinkage and Selection Operator, Tibshirani 1996), the square-root-lasso (Belloni et al. 2011) and the adaptive lasso (Zou 2006) are regularization methods that use L1 norm penalization to achieve sparse solutions: of the full set of predictors, typically most will have coefficients set to zero. Ridge regression (Hoerl & Kennard 1970) relies on L2 norm penalization; the elastic net (Zou & Hastie 2005) uses a mix of L1 and L2 penalization.
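
    For reference, the generic penalized objective behind these estimators can be written as follows (a sketch of the standard textbook form only; scaling conventions differ across implementations, so see the lassopack documentation for the exact parametrization used):

    \[
    \hat{\beta} = \arg\min_{\beta} \; \frac{1}{N}\sum_{i=1}^{N}\left(y_i - x_i'\beta\right)^2
      + \lambda \left[ \alpha \lVert \beta \rVert_1 + (1-\alpha) \lVert \beta \rVert_2^2 \right]
    \]

    with alpha = 1 corresponding to the lasso, alpha = 0 to ridge regression, and 0 < alpha < 1 to the elastic net; the square-root lasso replaces the first term with its square root.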

    LASSOPACK consists of three main programs:
    • lasso2 implements lasso, square-root lasso, elastic net, ridge regression, adaptive lasso and post-estimation OLS.
    • cvlasso supports K-fold cross-validation and rolling h-step ahead cross-validation (for time-series and panel data) to choose the optimal tuning parameters, i.e., the overall penalty level, lambda, and the elastic net parameter, alpha.
    • rlasso implements theory-driven penalization for the lasso and square-root lasso for cross-section and panel data, based on the theory developed in Belloni et al. (2012, 2013, 2014, 2016). In addition, rlasso can report the Chernozhukov et al. (2013) sup-score test of joint significance of the regressors, a test that is suitable for the high-dimensional setting.
    LASSOPACK implements the elastic net and square-root lasso using coordinate descent algorithms. The algorithm (then referred to as "shooting") was first proposed by Fu (1998) for the lasso, and by Van der Kooij (2007) for the elastic net. Belloni et al. (2011) implement coordinate descent for the square-root lasso.
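
    As a quick, minimal illustration of the three commands (a sketch only, using made-up variable names y and x1-x100; see the respective help files for the full syntax and options):
    Code:
    * lasso2: lasso over a grid of lambda values; alpha(0.5) gives an elastic net, sqrt the square-root lasso
    lasso2 y x1-x100
    lasso2 y x1-x100, alpha(0.5)
    lasso2 y x1-x100, sqrt

    * cvlasso: choose lambda by 10-fold cross-validation
    cvlasso y x1-x100, nfolds(10) seed(123)

    * rlasso: theory-driven ("rigorous") penalization
    rlasso y x1-x100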

    The PDSLASSO package includes two commands:
    • pdslasso, which allows for estimating structural parameters in linear models with many controls.
    • ivlasso, which in addition allows for endogenous treatment variables and many instruments.
    PDSLASSO relies on the algorithms for estimating the penalization level that are implemented in rlasso; LASSOPACK therefore needs to be installed for PDSLASSO to run.
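
    As a rough sketch of the intended usage (hypothetical variable names: y is the outcome, d the causal variable of interest, x1-x100 the high-dimensional controls and z1-z50 the candidate instruments; the exact placement of the parenthesized lists should be checked against help pdslasso and help ivlasso):
    Code:
    * pdslasso: exogenous variable of interest d, controls selected from x1-x100
    pdslasso y d (x1-x100), robust

    * ivlasso: endogenous d, instruments selected from z1-z50, controls selected from x1-x100
    ivlasso y (x1-x100) (d = z1-z50), robust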

    For an alternative Stata package to estimate the elastic net, lasso and ridge regression using coordinate descent, see elasticregress by Wilbur Townsend, announced on Statalist here:

    https://www.statalist.org/forums/for...net-regression

  • #2
    An update for LASSOPACK and PDSLASSO is now available on SSC (with thanks to Kit Baum as usual).

    The update fixes a couple of bugs, but the main change is a big speed improvement for the fixed effects option for the various estimators, thanks to Sergio Correia's FTOOLS package. FTOOLS needs to be installed to take advantage of this but the programs will also run without it (albeit more slowly).
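
    For example (a sketch, assuming panel data that has already been xtset; as we read the help files, the fe option applies the within transformation, and ftools is only needed for the speed gain):
    Code:
    ssc install ftools
    * fixed-effects (within) transformation before applying the rigorous lasso
    * (data assumed to be xtset on hypothetical panel/time identifiers beforehand)
    rlasso y x1-x100, fe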



    • #3
      A user (thanks John!) has just pointed out a "bug", namely that ftools needs to be run once on its own first. So after you install ftools, just type once, within Stata,

      ftools

      and then everything should work. It's a permanent fix, so you only need to do it once. No need to do it in each Stata session etc.

      It is a bug, though, because I meant to code it so that this is done automatically. Will see about fixing it in the next update.



      • #4
        And with thanks to Kit Baum as usual, a fix is now up on SSC. It should now be the case that the first call to one of the lasso programs will automatically take care of the initial call to ftools.



        • #5
          A new version of lassopack has been uploaded to SSC, thanks to Kit. The update adds a third method for data-driven model selection.

          The previous version of lassopack supported two methods for selecting the tuning parameter lambda: cross-validation and the theory-driven ("rigorous") penalization developed in a series of articles by Belloni, Chernozhukov and Hansen (e.g. Econometrica, 2012). K-fold cross-validation and rolling cross-validation (for panel and time-series data) are implemented in cvlasso; theory-driven penalization is available in rlasso.

          We have now added a third method: selection of tuning parameters using information criteria. In addition to classical information criteria -- the Akaike information criterion (AIC; Akaike, 1974), the Bayesian information criterion (BIC; Schwarz, 1978) and the corrected AIC (AICc; Sugiura, 1978; Hurvich & Tsai, 1989) -- we also support the Extended BIC (EBIC; Chen & Chen, Biometrika, 2008), which is more suitable in the high-dimensional context where the number of regressors is large relative to the sample size. By default, lasso2 reports the EBIC, but the user can change the displayed criterion using the ic(string) option.

          To estimate the model selected by one of the information criteria, use the lic(string) option as follows:
          Code:
          lasso2 y x*  // estimate the full model for a range of lambda values using the lasso
          lasso2, lic(ebic)  // estimate the model selected by EBIC
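          The displayed criterion can also be switched, e.g. to the AIC, using the ic() option mentioned above (a minimal sketch, reusing the placeholder variables from the example):
          Code:
          lasso2 y x*, ic(aic)  // report the AIC along the lambda path instead of the default EBIC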
          More information and examples can be found in the help file (see "help lasso2").

          We hope that users will find the new feature useful for model selection. Feedback welcome!
          http://statalasso.github.io/



          • #6
            Update now available on SSC (thanks as usual to Kit Baum): rlasso, pdslasso and ivlasso now support survey weights (actually pweights and aweights).
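
            For example (a sketch assuming the standard Stata weight syntax and a hypothetical pweight variable wvar; see the help files for the details):
            Code:
            rlasso y x1-x100 [pweight=wvar]
            pdslasso y d (x1-x100) [pweight=wvar], robust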

            Also - we now have a web page where we put the latest stable version, explanations of how to use the commands, etc.:

            https://statalasso.github.io



            • #7
              Mark,
              Thanks for the lassopack suite of commands.
              I would be very grateful if you or others are able to help me with the following general query regarding calibration of penalized models with a binary outcome:

              For practical use, I would like the prediction model to be calibrated (mean predicted probability from the model = observed event proportion); however, with penalized models this appears not to be the case.

              By way of example, using the prostate data from your GitHub site and defining a binary outcome based upon lpsa:

              clear
              insheet using "https://web.stanford.edu/~hastie/ElemStatLearn/datasets/prostate.data"
              su lpsa,de
              gen binlpsa =0
              replace binlpsa =1 if lpsa >=3
              tab binlpsa

              lasso2 binlpsa lcavol lweight age lbph svi lcp gleason pgg45  // selects lcavol & svi

              predict double xbhat1, xb l(7.67813) noisily
              su xbhat1,de

              gen pr_xbhat1 = 1/(1+exp(-xbhat1))
              su pr_xbhat1,de

              hl binlpsa pr_xbhat1, plot
              brier binlpsa pr_xbhat1

              where the hl command is obtained via net from http://www.homepages.ucl.ac.uk/~ucakgam/stata

              From the Brier output

              Mean probability of outcome:   0.2784
              Mean probability of forecast:  0.5677



              A logistic model using the same variables is calibrated:

              logistic binlpsa lcavol svi
              predict pr_log,pr
              hl binlpsa pr_log
              brier binlpsa pr_log


              Am I correct in understanding that this results from the biased regression coefficients in the penalised model?
              If one is attempting to develop a prediction model across levels of predicted risk (e.g. low, moderate and high), which requires the model to be adequately calibrated (rather than a single-threshold classifier, say sensitivity at 90% specificity), is it acceptable to use penalized models (lasso or elastic net) for variable selection and then a cross-validated logistic model containing only those variables?

              thanks in advance for any feedback

              yours Richard Hiscock



              • #8
                Hi Richard, first a side note:

                lasso2 binlpsa lcavol lweight age lbph svi lcp gleason pgg45  // selects lcavol & svi
                predict double xbhat1, xb l(7.67813) noisily
                It seems you have taken the "optimal" lambda (as selected by EBIC) and copied and pasted it for use with the -predict- command.

                This will give you the result you want (except for rounding), but might be error-prone. For example, you might change your model, but forget to update the lambda value.

                Instead, I would recommend the following usage:

                lasso2 binlpsa lcavol lweight age lbph svi lcp gleason pgg45 // runs lasso for a list of lambda values
                lasso2, lic(ebic) postest // displays and stores beta estimates corresponding to the value of lambda that minimizes EBIC
                predict double xbhat2, xb // obtains predicted values
                The "postest" in the second line asks -lasso2- to store the estimation results of the single-lambda estimation. Otherwise the results of the first -lasso2- call are still active.

                The same can also be achieved in one line:

                lasso2 binlpsa lcavol lweight age lbph svi lcp gleason pgg45, lic(ebic) postest
                predict double xbhat3, xb
                Now, to your question: if you are willing to assume the linear probability model, which states that E[y|x] = P(y=1|x) = x'beta, it might be OK to use the lasso or elastic net with a binary dependent variable, and it might provide some insights. However, there are many reasons why the LPM is a crude simplification and may not be suitable. Some references: https://davegiles.blogspot.com/2012/...ng-linear.html and https://blogs.worldbank.org/impactev...bability-model. There is really no guarantee that the LPM works well in your case. The technically right and clean approach would be to use the logistic lasso. Unfortunately, lassopack doesn't support the logistic lasso at the moment, but it's quite high on our to-do list!

                Achim
                http://statalasso.github.io/



                • #9
                  Richard:

                  I just had a look at plogit (penalised logit) by Tony Brady and Gareth Ambler, available here:

                  net from http://www.homepages.ucl.ac.uk/~ucakgam/stata

                  It's a bit old (2007) and the implementation is (I think) brute-force numerical optimization using Stata's ml rather than coordinate descent or some such, but it might be worth checking out to see if it fits your needs.

                  We'll let people know here when we add logistic lasso to LASSOPACK.

                  Mark

