questions about model selection with lassopack

Jing Pan

Join Date: Dec 2018

Posts: 9
#1


27 Dec 2018, 00:50

Dear Stata users,
Sorry to bother you with three simple questions.

1. When we use lassopack to select predictors and a predictor is a categorical variable, should we enter it as is, or add the factor-variable prefix "i." before it?

Should we use this code:
lasso2 AO agec i.sex i.edu3 i.jobm i.incomef i.snec i.dnec1 , plotpath(lambda)
cvlasso AO agec i.sex i.edu3 i.jobm i.incomef i.snec i.dnec1 , lopt seed(123)

Or this code:
lasso2 AO agec sex edu3 jobm incomef snec dnec1 , plotpath(lambda)
cvlasso AO agec sex edu3 jobm incomef snec dnec1 , lopt seed(123)

2. Must we use cvlasso to select the predictors?
After running lasso2, the bottom of the output says: Type "lasso2, lic(ebic)" to run the model selected by EBIC.

My question is: which should model selection be based on, EBIC or the lambda chosen by cross-validation?

3. After we run the lasso and obtain the final model, the p-values for some predictors are greater than 0.05. Is that OK?

Many thanks and best wishes!
Jing Pan
Mark Schaffer

Join Date: Mar 2014

Posts: 324
#2

28 Dec 2018, 12:59

Hi Jing Pan. A very quick answer:

1. Most people would use the factor variable operator i. That would be the case in most settings, not just lasso estimation.

2. No easy short answer here. Depends what you are trying to do. If your goal is prediction, then you'd probably prefer cross-validation with cvlasso. If your goal is model selection, then EBIC has some nice properties.

3. The usual p-values from OLS after model selection by the lasso aren't generally valid, because they ignore the fact that the same data were used to choose the predictors.

--Mark
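A rough way to see why EBIC tends to keep fewer predictors than cross-validation: in a common form of the extended BIC (assumed here; the exact definition and the default value of ξ used by lassopack are given in the lasso2 help file), each lambda on the lasso path is scored as

```latex
\begin{aligned}
\hat\sigma^2(\lambda) &= \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - x_i'\hat\beta(\lambda)\bigr)^2 \\
\mathrm{EBIC}_\xi(\lambda) &= N\log\hat\sigma^2(\lambda) + \mathrm{df}(\lambda)\log N + 2\xi\,\mathrm{df}(\lambda)\log p
\end{aligned}
```

Here N is the number of observations, p the number of candidate predictors, df(λ) the number of selected predictors, and ξ ≥ 0 a tuning constant (ξ = 0 gives the ordinary BIC). Every extra predictor costs log N + 2ξ log p. By contrast, cvlasso with lopt picks the lambda that minimizes the cross-validated mean squared prediction error, with no explicit per-predictor charge, so it often retains more variables.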
Jing Pan

Join Date: Dec 2018

Posts: 9
#3

29 Dec 2018, 17:27

Hi Mark,
Many thanks for your valuable answer. It has really helped me a lot.
If you have time, I have a couple of follow-up questions about your answers.
1. I found that cvlasso selects more predictors than EBIC does, so I chose the predictors selected by EBIC.
2. If the p-values are greater than 0.05, is the model with those predictors acceptable?
Many thanks and best wishes!
Jing Pan
