Lasso logit: error in postselection b

Chris Imler

Join Date: Nov 2019

Posts: 8
#1


01 Dec 2020, 08:41

Hi everyone,

When fitting a lasso logit model (Stata 16.1), I receive the error "error in postselection b" with return code r(430). Return code 430 states:

convergence not achieved
You have estimated a maximum likelihood model, and Stata's
maximization procedure failed to converge to a solution;
see help maximize. Check if the model is identified.

My lasso logit command looks like this:

Code:

lasso logit y c.x1 c.x2 .... i.x13 i.x14 .... i.x20 if (testSet != 2), selection(cv, gridminok) rseed(222)

The categorical variables contain a lot of categories, so they translate into quite a few indicator variables. For example, one variable is a week dummy running from 1 to 52.
I am fitting the model for 40 subsets of my data (referring to different countries for example) and then for 10 different subsets (for my own cross-validation).

Some of the regressions run without a problem. I have the feeling those are the regressions with more data.
The troublesome regressions do work, however, if I exclude some of the categorical variables, such as the week identifier.

Any ideas where the problem comes from and maybe also potential solutions?

Thanks a lot,
Chris

EDIT: I just want to point out that the problem is similar to the one described here: https://www.statalist.org/forums/for...ostselection-b , but no solution is provided there. It seems to be a fairly common problem.
Tags: convergence, lasso logit, lassologit, Maximum-likelihood, regression
DI LIU

Join Date: Sep 2020

Posts: 5
#2

01 Dec 2020, 10:31

Hi Chris,

The error "error in postselection b" is actually raised by -logit- inside
-lasso logit- when computing the post-selection coefficients. This means that
the identification conditions for the conventional maximum likelihood estimator
are not satisfied.

The post-selection coefficients are obtained by running a regular regression such
as -regress- or -logit- on the variables selected by -lasso-. Post-selection
coefficients are used by both -predict- and the inference commands such as
-dslogit- and -pologit-. Some simulation studies show that post-selection
coefficients have better prediction properties than the penalized coefficients,
and that's the reason we compute them.

The error itself indicates that the variables selected by -lasso- form a model
that is numerically unstable at the current sample size, so -logit- cannot
converge when it is run on the selected variables.
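
One possible way to look for such instability, sketched here under assumptions, is to check whether some categories of a regressor show no variation in the outcome within the estimation sample. The variable name week is hypothetical (it stands for the 52-level week variable you describe), and the absence of outcome variation within a category is only one candidate explanation, not something the error message confirms.

```stata
* Hypothetical names: y is the binary outcome, week is the week variable,
* testSet is the hold-out indicator from the original command.
tabulate week y if testSet != 2

* Flag weeks in which y takes a single value in the estimation sample.
bysort week: egen y_min = min(cond(testSet != 2, y, .))
bysort week: egen y_max = max(cond(testSet != 2, y, .))
tabulate week if y_min == y_max & testSet != 2
```

If the last table lists many weeks for the small subsets, the indicators for those weeks are natural suspects when -logit- fails on the selected variables.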

A good alternative would be to try a different selection method such as
-selection(adaptive)- or -selection(plugin)-. In theory, adaptive lasso should
select a sparser model than cross-validation. The plugin method is even more
aggressive than CV or adaptive lasso in terms of selecting sparse models.
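
As a rough sketch of what that could look like with your command: the variable list below is a placeholder (x1, x2, x13 and x14 stand in for the full list elided in your post), and whether either method avoids the convergence failure in your subsets is something you would have to check.

```stata
* Placeholder covariates: substitute the full variable list from the
* original command for x1, x2, x13, x14.

* Adaptive lasso: still uses cross-validation, so keep rseed() for reproducibility.
lasso logit y c.x1 c.x2 i.x13 i.x14 if (testSet != 2), selection(adaptive) rseed(222)
lassocoef

* Plugin: picks lambda from a formula rather than by cross-validation.
lasso logit y c.x1 c.x2 i.x13 i.x14 if (testSet != 2), selection(plugin)
lassocoef
```

Running -lassocoef- after each fit lists the selected variables, which makes it easy to compare how sparse the models are.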

I hope this helps.

Di