Can I combine feature selection and feature creation methods?

Van Anh

Join Date: Jan 2019

Posts: 8
#1


13 Jan 2019, 07:09

Hello,
I am working on my thesis, on the topic "Factors determining life satisfaction in the USA". I was given a dataset with 265 variables and more than 2000 observations. My aim is to compare the effects of economic and social problems (which are not already variables in the dataset) on happiness.
First, I would like to use LASSO (lasso2 depvar indepvars, lic(aic)) to choose suitable variables for the regression. Then I would use PCA to combine the selected variables into only 10 factors (including economic and social problems). The last step is to build a model from these factors, with happiness as the dependent variable.
In this case, LASSO selected 98 variables, which is too many to build a model. Therefore, I would like to use PCA both to reduce the dimension and to extract the latent variables I need.
Is this an acceptable way to combine LASSO and PCA? If not, could you please suggest a better method?

Thank you very much!
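A minimal sketch of the proposed three-step pipeline (lasso selection, then PCA on the selected variables, then OLS on the component scores), written in Python/NumPy purely for illustration; in Stata the corresponding steps would be lasso2, pca and regress. Everything here is an assumption: the data are simulated (two hypothetical latent factors labelled "economic" and "social"), the penalty lam=0.1 is picked by hand, and the coordinate-descent lasso minimizes (1/(2n))*RSS + lam*L1, which is not the same lambda scale that lasso2 reports.

```python
import numpy as np

def standardize(A):
    # Center each column and scale it to unit (population) variance.
    return (A - A.mean(axis=0)) / A.std(axis=0)

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1.
    Assumes standardized columns of X and a centered y."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()                          # current residual y - X b
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            rho = X[:, j] @ r / n + old   # uses (1/n)*||x_j||^2 == 1
            b[j] = soft_threshold(rho, lam)
            if b[j] != old:
                r -= X[:, j] * (b[j] - old)
                max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:
            break
    return b

# Hypothetical data: 10 noisy indicators of each latent factor + 20 noise columns.
rng = np.random.default_rng(0)
n = 500
f_econ, f_soc = rng.normal(size=n), rng.normal(size=n)
X_raw = np.column_stack(
    [f_econ + 0.5 * rng.normal(size=n) for _ in range(10)]
    + [f_soc + 0.5 * rng.normal(size=n) for _ in range(10)]
    + [rng.normal(size=n) for _ in range(20)]
)
happiness = f_econ + 0.5 * f_soc + 0.5 * rng.normal(size=n)

# Step 1: lasso selection with a hand-picked penalty.
X = standardize(X_raw)
beta = lasso_cd(X, happiness - happiness.mean(), lam=0.1)
selected = np.flatnonzero(beta)

# Step 2: PCA (via SVD) on the selected, standardized columns only.
Xs = X[:, selected]
_, _, Vt = np.linalg.svd(Xs, full_matrices=False)
k = min(2, len(selected))
scores = Xs @ Vt[:k].T                    # first k principal component scores

# Step 3: OLS of happiness on an intercept and the k component scores.
Z = np.column_stack([np.ones(n), scores])
gamma, *_ = np.linalg.lstsq(Z, happiness, rcond=None)
resid = happiness - Z @ gamma
r2 = 1 - resid @ resid / np.sum((happiness - happiness.mean()) ** 2)
print(f"{len(selected)} variables selected; R^2 on {k} components = {r2:.3f}")
```

Note that the component scores are just linear combinations of whatever lasso kept; whether a component can be read as an "economic" or "social" factor has to be judged from its loadings (the rows of Vt), not assumed.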
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#2

13 Jan 2019, 09:36

Van:
welcome to this forum.
Using PCA as a last step sounds reasonable.
In my opinion, the main risk lies further ahead, in the regression model (endogeneity due to reverse causation): when adjusted for the other predictors, happiness may well contribute to variation in life satisfaction, but the reverse may hold, too.

Kind regards,
Carlo
(Stata 19.0)
Van Anh

Join Date: Jan 2019

Posts: 8
#3

14 Jan 2019, 06:50

Thank you very much for your answer.
So this means I can use PCA after lasso feature selection, right? By the way, should I use cvlasso to choose the value of lambda before running lasso?
Also, here I treat happiness and life satisfaction as the same thing, not as two different variables.
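On the cvlasso question: cvlasso (part of lassopack) chooses lambda by K-fold cross-validation rather than by an information criterion. A rough Python/NumPy sketch of that idea, assuming simulated data, 5 folds, a hand-made lambda grid, and a simple coordinate-descent lasso (objective (1/(2n))*RSS + lambda*L1) as a stand-in for lassopack's own implementation:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1
    (X columns standardized, y centered)."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            b[j] = soft_threshold(X[:, j] @ r / n + old, lam)
            if b[j] != old:
                r -= X[:, j] * (b[j] - old)
                max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:
            break
    return b

def cv_lambda(X, y, lambdas, n_folds=5, seed=0):
    """Pick the lambda with the lowest mean out-of-fold MSE."""
    n = X.shape[0]
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    mse = np.zeros(len(lambdas))
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Standardize with training-fold statistics only, to avoid leakage.
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
        Xtr, Xte = (X[train] - mu) / sd, (X[test] - mu) / sd
        ybar = y[train].mean()
        for i, lam in enumerate(lambdas):
            b = lasso_cd(Xtr, y[train] - ybar, lam)
            mse[i] += np.mean((y[test] - ybar - Xte @ b) ** 2) / n_folds
    return lambdas[np.argmin(mse)], mse

# Hypothetical data: 3 relevant regressors out of 30.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 30))
y = X[:, 0] - X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=300)

lambdas = np.geomspace(1.0, 0.001, 25)   # decreasing grid
best_lam, mse = cv_lambda(X, y, lambdas)
print("lambda with lowest CV MSE:", best_lam)
```

The fold assignment is random, so the chosen lambda can change with the seed; fixing the seed makes the result reproducible.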
Achim Ahrens

Join Date: Jun 2014

Posts: 49
#4

14 Jan 2019, 09:41

Why do you want to combine lasso and PCA? Both lasso and PCA are regularization (dimension reduction) techniques, but they work in different ways and are usually used with a different aim in mind.

The lasso is fine if you want to predict happiness or identify which variables determine happiness (endogeneity issues aside). But consider using EBIC or AICc instead of AIC; both are more appropriate when you have many regressors.
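To see why the choice of criterion matters, here is a hedged Python/NumPy sketch that computes AIC, AICc, BIC and EBIC along a simulated lasso path (n = 100, p = 80). It uses textbook forms with sigma^2 = RSS/n and df = number of nonzero coefficients: AIC = n*log(sigma^2) + 2*df, AICc = n*log(sigma^2) + 2*df*n/(n - df - 1), BIC = n*log(sigma^2) + df*log(n), and EBIC = BIC + 2*gamma*df*log(p). The value gamma = 1 is an assumption (lassopack sets its own default), and lassopack's exact definitions may differ in constants, so check its help files before comparing numbers.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, b0, n_iter=300, tol=1e-8):
    """Coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1,
    warm-started at b0 (X standardized, y centered)."""
    n, p = X.shape
    b = b0.copy()
    r = y - X @ b
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            b[j] = soft_threshold(X[:, j] @ r / n + old, lam)
            if b[j] != old:
                r -= X[:, j] * (b[j] - old)
                max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:
            break
    return b

# Hypothetical data: n = 100 observations, p = 80 candidate regressors, 5 relevant.
rng = np.random.default_rng(2)
n, p = 100, 80
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X[:, :5] @ np.array([1.5, -1.0, 1.0, 0.8, -0.6]) + rng.normal(size=n)
y = y - y.mean()

lam_max = np.max(np.abs(X.T @ y)) / n         # smallest lambda giving b = 0
lambdas = np.geomspace(lam_max, 0.01 * lam_max, 40)
b = np.zeros(p)
rss, df = [], []
for lam in lambdas:                           # warm starts along the path
    b = lasso_cd(X, y, lam, b)
    rss.append(np.sum((y - X @ b) ** 2))
    df.append(np.count_nonzero(b))
rss, df = np.array(rss), np.array(df)

gamma = 1.0                                   # assumed EBIC parameter
base = n * np.log(rss / n)
crit = {
    "AIC": base + 2 * df,
    "AICc": base + 2 * df * n / (n - df - 1),
    "BIC": base + df * np.log(n),
    "EBIC": base + df * np.log(n) + 2 * gamma * df * np.log(p),
}
choice = {name: int(np.argmin(v)) for name, v in crit.items()}
for name, i in choice.items():
    print(f"{name:5s} lambda={lambdas[i]:.4f}  df={df[i]}")
```

Because AICc, BIC and EBIC charge more per selected variable than AIC, they can only pick an equally sparse or sparser model on the same path; with many candidate regressors, AIC tends to admit noise variables, which may partly explain why lic(aic) kept 98 variables.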

If I understand correctly, you are interested in latent factors; so why not apply PCA to the full set of regressors and regress happiness on a subset of those components? In this way PCA does the regularization for you.
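Principal components regression (PCR) as suggested here can be sketched in a few lines of Python/NumPy. This is a hedged illustration on simulated data, not Stata or lassopack code: standardize all regressors, take the SVD, and regress the outcome on the first M component scores. In practice M would be chosen by cross-validation; here it is fixed. With M equal to the number of regressors, PCR reproduces OLS, which serves as a check.

```python
import numpy as np

def pcr_fit(X, y, M):
    """Principal components regression on the first M components.
    Returns (intercept, beta), with beta on the standardized-X scale."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    _, d, Vt = np.linalg.svd(Xs, full_matrices=False)
    Z = Xs @ Vt[:M].T                    # component scores (mutually orthogonal)
    # Orthogonal scores: each coefficient is a separate projection, Z_j'y / d_j^2.
    theta = Z.T @ (y - y.mean()) / d[:M] ** 2
    beta = Vt[:M].T @ theta              # map back to the (standardized) variables
    return y.mean(), beta

# Hypothetical data: 25 regressors driven by 2 latent factors.
rng = np.random.default_rng(3)
n, p = 200, 25
F = rng.normal(size=(n, 2))
X = F @ rng.normal(size=(2, p)) + 0.5 * rng.normal(size=(n, p))
y = F[:, 0] - 0.5 * F[:, 1] + 0.5 * rng.normal(size=n)

b0, beta_2 = pcr_fit(X, y, M=2)          # PCR keeping 2 components
_, beta_all = pcr_fit(X, y, M=p)         # all components: should equal OLS

Xs = (X - X.mean(axis=0)) / X.std(axis=0)
beta_ols, *_ = np.linalg.lstsq(Xs, y - y.mean(), rcond=None)
print(np.allclose(beta_all, beta_ols))   # True
```

One design point to keep in mind: PCA ranks components by how much variance of the regressors they explain, not by how well they predict happiness, so the leading components are not guaranteed to be the most predictive ones.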

You might also want to consider ridge regression: principal components regression and ridge are closely related (see https://web.stanford.edu/~hastie/ElemStatLearn/).
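The relationship can be checked numerically. Writing the centered regressor matrix as X = U D V', the ridge fit shrinks the component of y along the j-th principal direction by d_j^2/(d_j^2 + lambda), while PCR keeps the first M directions with weight 1 and drops the rest. A small Python/NumPy sketch on simulated data; lambda = 50 and M = 3 are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, lam, M = 150, 10, 50.0, 3
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))   # correlated columns
X = X - X.mean(axis=0)                                  # center (intercept not penalized)
y = X[:, 0] - X[:, 1] + rng.normal(size=n)
y = y - y.mean()

U, d, Vt = np.linalg.svd(X, full_matrices=False)
uty = U.T @ y                        # coordinates of y on the principal directions

# Ridge, solved directly: (X'X + lam*I) b = X'y
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# The same ridge fit via the SVD: direction j is shrunk by d_j^2 / (d_j^2 + lam)
ridge_factors = d**2 / (d**2 + lam)
fit_ridge_svd = U @ (ridge_factors * uty)

# PCR with M components: weight 1 on the first M directions, 0 on the rest
pcr_factors = (np.arange(p) < M).astype(float)
fit_pcr = U @ (pcr_factors * uty)

print(np.allclose(X @ b_ridge, fit_ridge_svd))   # True
print("ridge shrinkage:", np.round(ridge_factors, 3))
print("PCR weights:    ", pcr_factors)
```

Ridge therefore shrinks the low-variance directions most but never drops them entirely, whereas PCR discards them outright; that is the sense in which the two methods are closely related.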

--
Tag me or email me for ddml/pdslasso/lassopack/pystacked related questions. I don't check Statalist.