  • Stepwise Regression to Maximize goodness of fit

    Hello Statalist members,

    I have a question on how to use the stepwise function for logistic regression of a binary variable. I'm trying to see which variables give my model a good Hosmer-Lemeshow goodness of fit. Since my model includes ~15 variables, I'd like to use the stepwise function.

    My current code is the following (I have factor variables):
    xi: stepwise, pr(.05): logistic hospital i.race_eth i.revagebin gender hx*

    After finding the variables that are significant, I run the Hosmer-Lemeshow goodness-of-fit test:
    lfit, group(10) table

    My results still show a H-L goodness-of-fit p-value < 0.05. 1) Why is this happening even though my variables are each significant? 2) Is there a stepwise procedure that selects on a goodness-of-fit threshold? 3) Is there another option, other than the stepwise function, that you would recommend?

    Thanks for your help!
    Louise

  • #2
    Stepwise regression does not have many fans on this list. For the reasons why you should be sceptical see: http://www.stata.com/support/faqs/st...sion-problems/

    The Hosmer-Lemeshow test is also problematic; in fact, it was the subject of my very first post on Statalist 10 years and 45 days ago. It has low power to detect any deviation from your model. Also, why would you want to base your model selection on statistical tests? We already know that the model is wrong before we do any tests. The real question is: is it close enough to be useful? Statistical tests do not answer that question.
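
    As a rough illustration of how much the Hosmer-Lemeshow result can depend on an arbitrary choice, the sketch below fits a logistic model to Stata's shipped lbw example data (not the data from this thread; the covariate list is an assumption made only for the example) and reruns the test with different numbers of groups. estat gof is the current name for the older lfit command used above. linktest and lroc are other post-estimation checks that look at specification and discrimination, rather than reducing fit to a single p-value.

    ```stata
    * Example data shipped with Stata; stands in for the real data set
    webuse lbw, clear

    * Illustrative model only; the covariate choice is an assumption
    logistic low age lwt i.race smoke ptl ht ui

    * Hosmer-Lemeshow test with 10 groups, as in the original post
    estat gof, group(10) table

    * Same model, different grouping: the p-value can shift noticeably
    estat gof, group(8)
    estat gof, group(12)

    * Specification check: a significant _hatsq hints at a misspecified
    * functional form (for example, a missing interaction or nonlinearity)
    linktest

    * Discrimination: area under the ROC curve, without drawing the graph
    lroc, nograph
    ```

    None of these checks settles whether the model is useful; they only point to where it may be worth looking more closely.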

    So I would take a step back, look at each variable, and think about why you believe it should be in your model. Is there one explanatory/predictor/independent/right-hand-side/x-variable of primary interest? If yes, then it should definitely be in. Do you think that another explanatory variable influences both your explanatory variable of interest and the explained/predicted/dependent/left-hand-side/y-variable? Then that variable should be in your model, as it is a confounding variable. Do you think that your explanatory variable of interest influences another explanatory variable, which in turn influences the explained variable? Such variables should not be in your model. They capture the mechanism through which your variable of interest influences the explained variable, and you obviously don't want to filter out the part of the effect for which we know why it exists.
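
    The reasoning above could translate into a model specification along the following lines. This is only a sketch on Stata's shipped lbw example data, and the roles assigned to the variables are hypothetical, chosen to show the pattern rather than to make a substantive claim about birth weight.

    ```stata
    * Example data shipped with Stata; stands in for the real data set
    webuse lbw, clear

    * Hypothetical roles, assumed only for illustration:
    *   smoke      explanatory variable of primary interest
    *   age, race  assumed confounders (thought to affect both smoke and low)
    *   ui         assumed to lie on the path from smoke to low (a mediator),
    *              so it is deliberately left out of the model
    logistic low i.smoke age i.race
    ```

    The variable list here comes from a judgment about which variables are confounders and which are mechanisms, not from a significance threshold.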
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------
