
  • Variable selection for multiple logistic regression

    Based on the now widely cited paper by Heinze et al. https://pmc.ncbi.nlm.nih.gov/articles/PMC5969114/... and the non-rigorous way variable selection is often performed in medical studies...

    I'm left with the idea that, for an observational study, the gold-standard approach is to include in the final model (a) theory-driven variables from previous knowledge and (b) variables of uncertain relevance selected through backward elimination (BE) using AIC, followed by bootstrapping/penalization and reporting of performance metrics.

    With this in mind, I cannot find a command that performs backward elimination using AIC automatically.

    In theory there is a manual way: fit the full model, then remove each candidate variable one at a time (putting it back before removing the next), running estat ic after each fit to see which removal yields the largest decrease in AIC. That variable is then dropped for good, and the process repeats.
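A minimal sketch of the manual loop described above, assuming a binary outcome y, theory-driven covariates age and sex that are always kept, and candidates x1-x5 (all hypothetical names). It relies on estat ic leaving its results in the matrix r(S), with AIC in the fifth column, and it holds the estimation sample fixed so that AIC values are comparable across fits. It stops when no single removal lowers the AIC.

```stata
* Hypothetical variable names: y (binary outcome), age sex (always kept),
* x1-x5 (candidates for elimination)
local forced age sex
local cands  x1 x2 x3 x4 x5

* Fit the full model once and freeze its estimation sample
quietly logit y `forced' `cands'
generate byte insample = e(sample)
quietly estat ic
matrix S = r(S)
scalar aic_cur = S[1,5]          // column 5 of r(S) holds the AIC

local improved 1
while `improved' & "`cands'" != "" {
    local improved 0
    scalar best_aic = aic_cur
    local drop_var ""
    * Try removing each remaining candidate in turn
    foreach v of local cands {
        local trial : list cands - v
        quietly logit y `forced' `trial' if insample
        quietly estat ic
        matrix S = r(S)
        if S[1,5] < scalar(best_aic) {
            scalar best_aic = S[1,5]
            local drop_var `v'
        }
    }
    * Drop the variable whose removal lowers the AIC the most, if any
    if "`drop_var'" != "" {
        local cands : list cands - drop_var
        scalar aic_cur = best_aic
        local improved 1
        display "Dropped `drop_var'; AIC = " %9.3f scalar(aic_cur)
    }
}
display "Retained candidates: `cands'"
logit y `forced' `cands' if insample
```

Because the loop ends as soon as every possible removal would raise the AIC, the number of retained variables falls out of the procedure rather than being fixed in advance.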

    When to stop? My plan is to run the gvselect command to find the number of variables, N, that my final model should have.

    With this in mind:
    1. What's your overall assessment of my plan?
    2. Has any new command been created for automatic BE with AIC?
    3. Are there any suggested alternatives to the manual route?

    Overall: since, to my knowledge, no built-in Stata command currently supports BE with direct AIC tracking, I'm preparing a manual workflow using estat ic to guide model reduction. I'll retain theory-based covariates and evaluate stability with bootstrap inclusion frequencies. Any suggestions for improving this workflow?
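A rough sketch of how bootstrap inclusion frequencies might be tallied, using the same hypothetical names (y, age, sex, x1-x5). As a stand-in for AIC-based BE it uses the built-in stepwise prefix with pr(0.157) and likelihood-ratio tests: for a single-coefficient term, removal lowers the AIC exactly when the LR statistic is below 2, which corresponds to p > 0.157. That equivalence does not carry over to multi-coefficient terms, so treat this as an approximation under that assumption.

```stata
* Hypothetical variable names: y (binary outcome), age sex (always kept),
* x1-x5 (candidates for elimination)
local forced age sex
local cands  x1 x2 x3 x4 x5
local reps   200

foreach v of local cands {
    local n_`v' = 0              // selection counter per candidate
}
local ok = 0                     // resamples where selection ran without error

set seed 12345
forvalues b = 1/`reps' {
    preserve
    bsample                      // resample observations with replacement
    * lockterm1 keeps the first (parenthesized) term in every model
    capture quietly stepwise, pr(0.157) lr lockterm1: logit y (`forced') `cands'
    if _rc == 0 {
        local ok = `ok' + 1
        local kept : colnames e(b)
        foreach v of local cands {
            local isin : list v in kept
            local n_`v' = `n_`v'' + `isin'
        }
    }
    restore
}

* Share of successful resamples in which each candidate was retained
foreach v of local cands {
    local pct = 100 * `n_`v'' / `ok'
    display "`v': " %5.1f `pct' "% of `ok' resamples"
}
```

Candidates retained in only a small share of resamples are the ones whose selection in the original data is least stable.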
    Last edited by Sergio Alzate; 19 May 2025, 23:45.