Variable Subset Selection for Logistic Regression

Mike Stage

Join Date: Apr 2018

Posts: 1
#1

06 Apr 2018, 04:19

Hello all,

I want to analyse risk in a peer-to-peer lending environment by modeling prepayment and default using logistic regression. The dataset I am using (after deleting all variables not relevant for prepayment/default) has 25 variables. These variables are continuous (e.g. "income") and categorical (e.g. "purpose", or dummies like "verified"). Now I would like to identify a subset of those variables (let's say a maximum of 10). The available literature does not really explain how variables were picked (it seems they were chosen by logical arguments of relevance).

Commonly used methods like stepwise and best-subset approaches are often criticized (maybe I am wrong here?). I also thought of a PCA approach for mixed data, but as far as I know, filter methods like B1, B2, B3, B4 (Jolliffe) for variable selection are not meant for regression subset selection.
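
To make the stepwise idea concrete, here is a minimal sketch of greedy forward selection for logistic regression. It is an illustration only, not a recommendation: the data are synthetic, the function names (`fit_logit`, `forward_select`) are hypothetical, the model is fitted by plain gradient ascent rather than by a statistics package, and AIC is assumed as the selection criterion.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logit(X, y, cols, steps=200, lr=0.5):
    """Fit intercept + the given columns by gradient ascent; return the log-likelihood."""
    n = len(y)
    w = [0.0] * (len(cols) + 1)  # w[0] is the intercept
    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            feats = [1.0] + [row[c] for c in cols]
            err = target - sigmoid(sum(wi * fi for wi, fi in zip(w, feats)))
            for j, fj in enumerate(feats):
                grad[j] += err * fj
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    ll = 0.0
    for row, target in zip(X, y):
        p = sigmoid(w[0] + sum(wi * row[c] for wi, c in zip(w[1:], cols)))
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        ll += target * math.log(p) + (1 - target) * math.log(1 - p)
    return ll

def forward_select(X, y, max_vars):
    """Greedily add the column that lowers AIC the most; stop when none does."""
    remaining = list(range(len(X[0])))
    chosen = []
    best_aic = 2 * 1 - 2 * fit_logit(X, y, [])  # intercept-only model
    while remaining and len(chosen) < max_vars:
        scores = []
        for c in remaining:
            ll = fit_logit(X, y, chosen + [c])
            n_params = len(chosen) + 2  # intercept + chosen + candidate
            scores.append((2 * n_params - 2 * ll, c))
        aic, c = min(scores)
        if aic >= best_aic:
            break
        best_aic = aic
        chosen.append(c)
        remaining.remove(c)
    return chosen

# Synthetic demo data (assumption): 5 standardized predictors,
# of which only columns 0 and 2 drive the outcome.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y = [1 if rng.random() < sigmoid(2 * row[0] - 2 * row[2]) else 0 for row in X]

chosen = forward_select(X, y, max_vars=2)
print("selected columns:", chosen)
```

The `max_vars` argument plays the role of the "maximum of 10 variables" cap. The sketch assumes numeric, roughly standardized predictors, so a categorical variable such as "purpose" would first have to be expanded into dummies, and the criticisms of stepwise methods apply to it unchanged.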

Does anyone have an idea how to solve this problem and extract a suitable subset of variables for a subsequent regression analysis?

Thank you.
Best regards, Mike Stage

Last edited by Mike Stage; 06 Apr 2018, 04:23.
Nick Cox

Join Date: Mar 2014

Posts: 35755
#2

06 Apr 2018, 04:54

Cross-posted at https://stackoverflow.com/questions/...tic-regression

In my view the question is off-topic there, but regardless of that, our policy on cross-posting is explicit in the FAQ Advice everyone is asked to read before posting: you are asked to tell us about it.