Stepwise Regression to Maximize goodness of fit

Louise Wang

Join Date: Jul 2025

Posts: 0
#1


04 Nov 2014, 21:16

Hello Statalist members,

I have a question on how to use the stepwise command for logistic regression of a binary variable. I'm trying to find which variables give my model a good Hosmer-Lemeshow goodness of fit. Since my model includes ~15 variables, I'd like to use stepwise.

My current code is the following (I have factor variables):
xi: stepwise, pr(.05): logistic hospital i.race_eth i.revagebin gender hx*

After finding the variables that are significant, I run the Hosmer-Lemeshow goodness-of-fit test:
lfit, group(10) table
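For reference, here is the statistic behind lfit, group(10) in its standard Hosmer-Lemeshow form. The notation is introduced here and is not taken from the thread. Observations are sorted by predicted probability \hat{\pi}_i and split into g groups. For each group k, O_k is the observed number of outcomes equal to 1, E_k is the sum of the predicted probabilities, and n_k is the group size:

```latex
\hat{C} = \sum_{k=1}^{g} \frac{(O_k - E_k)^2}{E_k \left(1 - E_k / n_k\right)},
\qquad E_k = \sum_{i \in \text{group } k} \hat{\pi}_i,
\qquad \hat{C} \,\dot\sim\, \chi^2_{g-2} \text{ under a correctly specified model}
```

With g = 10 the reference distribution is \chi^2_8. The statistic compares observed and expected counts for the model as a whole across risk groups. It does not test whether any individual coefficient is significant.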

My results still show a Hosmer-Lemeshow goodness-of-fit p-value < 0.05. 1) Why is this happening even though each of my variables is significant? 2) Is there a stepwise command that selects variables based on a goodness-of-fit threshold? 3) Is there another option, other than stepwise, that you would recommend?

Thanks for your help!
Louise
Maarten Buis

Join Date: Mar 2014

Posts: 3456
#2

05 Nov 2014, 01:56

Stepwise regression does not have many fans on this list. For the reasons why you should be sceptical see: http://www.stata.com/support/faqs/st...sion-problems/

The Hosmer-Lemeshow test is also problematic; in fact, it was the subject of my very first post on Statalist 10 years and 45 days ago. It has low power to detect deviations from your model. Also, why would you want to base your model selection on statistical tests? We already know that the model is wrong before we do any tests. The real question is: is it close enough to be useful? Statistical tests do not answer that question.

So I would take a step back, look at each variable, and think about why it should be in your model. Is there one explanatory/predictor/independent/right-hand-side/x-variable of primary interest? If yes, then it should definitely be in. Do you think that another explanatory variable influences both your explanatory variable of interest and the explained/predicted/dependent/left-hand-side/y-variable? Then those variables should be in your model, as they are confounding variables. Do you think that your explanatory variable of interest influences another explanatory variable, which in turn influences the explained variable? Those variables should not be in your model. They capture the mechanism through which your variable of interest influences the explained variable, and you obviously don't want to filter out the part of the effect where we know why that effect exists.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------