  • Lasso logit: error in postselection b

    Hi everyone,

    After fitting a lasso logit model (Stata 16.1), I receive the error "error in postselection b" with return code r(430). Return code 430 states:


    convergence not achieved
    You have estimated a maximum likelihood model, and Stata's
    maximization procedure failed to converge to a solution;
    see help maximize. Check if the model is identified.
    My lasso logit command looks like this:

    Code:
    lasso logit y c.x1 c.x2 .... i.x13 i.x14 .... i.x20 if (testSet != 2), selection(cv, gridminok) rseed(222)
    The categorical variables have a lot of categories, so they translate into quite a few indicator variables. For example, one variable is a week dummy going from 1 to 52.
    I am fitting the model for 40 subsets of my data (referring to different countries, for example), and then for 10 different subsets within each (for my own cross-validation).

    Some of the regressions run without a problem. I have the feeling those are the regressions with more data.
    The troublesome regressions do work, however, if I exclude some of the categorical variables, such as the week identifier.

    Any ideas where the problem comes from and maybe also potential solutions?

    Thanks a lot,
    Chris

    EDIT: I just want to point out that the problem is similar to the one described here: https://www.statalist.org/forums/for...ostselection-b . No solution is provided there, but it seems to be a more common problem.

  • #2
    Hi Chris,

    The error "error in postselection b" is actually raised by -logit- inside
    -lasso logit- when it computes the post-selection coefficients. It means that
    the usual identification conditions for the maximum likelihood estimator are
    not satisfied.

    The post-selection coefficients are obtained by running a regular regression,
    such as -regress- or -logit-, on the variables selected by -lasso-.
    Post-selection coefficients are used both by -predict- and by inference
    commands such as -dslogit- and -pologit-. Some simulation studies show that
    post-selection coefficients have better prediction properties than the
    penalized coefficients, and that's the reason we compute them.
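
    As a rough illustration of the two kinds of coefficients described above, the sketch below requests predictions from each after -lasso logit-. The local macro xvars is a hypothetical, shortened stand-in for the full covariate list in the original post, and phat_pen and phat_post are made-up variable names.

    ```stata
    * Hypothetical, shortened covariate list (the original post elides most of it)
    local xvars c.x1 c.x2 i.x13 i.x14

    lasso logit y `xvars' if (testSet != 2), selection(cv) rseed(222)

    * Predicted probabilities based on the penalized (lasso) coefficients
    predict double phat_pen, pr penalized

    * Predicted probabilities based on the post-selection coefficients,
    * that is, an unpenalized logit fit on the selected variables
    predict double phat_post, pr postselection

    * Show both sets of coefficients for the selected variables
    lassocoef, display(coef, penalized)
    lassocoef, display(coef, postselection)
    ```

    When the post-selection step fails with r(430), only the first command is reached, so this is mainly useful on the subsets where the model does fit.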

    The error itself indicates that the variables selected by -lasso- form a
    numerically unstable model for the current sample size, so -logit- cannot
    converge when it is fit on the selected variables.
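
    One quick check, offered as an assumption rather than a diagnosis of this particular data: a logit with many indicator variables often becomes unstable in small samples when some category contains only one outcome value. A cross-tabulation can reveal such cells. The variable names week and country below are hypothetical placeholders for the week identifier and the subset identifier.

    ```stata
    * week and country are hypothetical names, not taken from the original post
    * Look for weeks in which y takes only one value within a small subset
    tabulate week y if (testSet != 2) & country == 1
    ```

    Rows with a zero in one of the two outcome columns are candidates for the kind of instability described above.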

    A good alternative would be to try a different selection method, such as
    -selection(adaptive)- or -selection(plugin)-. In theory, adaptive lasso should
    select a sparser model than cross-validation. The plugin method is even more
    aggressive than CV or adaptive lasso in selecting sparse models.
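
    A minimal sketch of trying both alternatives on one of the problematic subsets might look like the following. The local macro xvars is again a hypothetical, shortened stand-in for the full covariate list, and the stored estimate names are arbitrary.

    ```stata
    * Hypothetical, shortened covariate list (the original post elides most of it)
    local xvars c.x1 c.x2 i.x13 i.x14

    * Adaptive lasso: uses cross-validation internally, so keep the seed
    lasso logit y `xvars' if (testSet != 2), selection(adaptive) rseed(222)
    estimates store adaptive

    * Plugin lasso: no cross-validation, typically the sparsest of the three
    lasso logit y `xvars' if (testSet != 2), selection(plugin)
    estimates store plugin

    * Compare which variables each method selected and how many
    lassocoef adaptive plugin
    lassoinfo adaptive plugin
    ```

    If either fit completes, the smaller set of selected variables is what allows the post-selection -logit- to converge.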

    I hope this helps.

    Di
