Dear Stata experts,
I am working on a data set where the regressand is binary (whether someone owns a house or not).
Unfortunately I only have 25 successes (and 101 failures). Hence, the guideline of 10 events per independent variable restricts my analysis quite a lot. If I followed the "one in ten rule", for example, I could include no more than two or three variables, although at least seven variables are significant and influential.
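To make the arithmetic behind that limit explicit, here is a small Python sketch. The function name and the default of 10 events per variable are only illustrative choices of mine, not part of any package, and the rule itself is a rough guideline rather than a hard threshold.

```python
def max_predictors(successes: int, failures: int, events_per_variable: int = 10) -> int:
    """Rough upper bound on predictors under an events-per-variable rule of thumb."""
    # The limiting count is the rarer outcome, not the total sample size.
    limiting = min(successes, failures)
    # Integer division: 25 events at 10 per variable leaves room for 2 full predictors.
    return limiting // events_per_variable


print(max_predictors(25, 101))  # 25 // 10 = 2
```

With 25 successes the ratio is 25 / 10 = 2.5, which is why I end up with two, or at a stretch three, variables.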
That's why I thought of "displacing" some variables that didn't make it into the narrow main model, but that I still want to examine, into a separate model.
So in my homeownership analysis I'd run a second regression focussing solely on personal features, omitting other (important main) variables such as the size of the town. This would of course require that these variables are not correlated with the variables in the main model, to avoid omitted variable bias.
Is this a viable way to avoid overfitting, or what would you recommend instead?
Thanks a lot for your time and expertise!
Simon