Help with Lasso Logistic Regression, Cross-Validation, and AUC

    Hi folks.
    I am working with a dataset of 200 subjects and a binary outcome (27 events), and I am screening predictors with a lasso model. I realize that, by the usual events-per-predictor rule of thumb, I can really only include 2-3 predictors, and that's okay. My question is about how to compute the training AUC and the validation AUC correctly. I am not splitting the data into separate training and test sets, just using cross-validation.
    lasso logit total_dose2 c.age c.bmi_calculated i.hdp_presentation i.fg_restrict i.parity i.smoke_status i.diabetes_hx i.new_prior i.gest_cat i.antepartum_asa i.map_cat, nolog rseed(12345)
    predict lasso_model
    roctab total_dose2 lasso_model
    My understanding is that this gives the training (apparent) AUC. Is that right?
    cvauroc total_dose2 lasso_model, kfold(10) seed(12345)
    And that this gives the 10-fold cross-validated AUC, i.e., the validation estimate?
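One way to see why the two numbers track each other: as I understand cvauroc, it refits a logistic regression of total_dose2 on the listed variable within each training fold. Here the only variable is lasso_model, which was already computed from a lasso fit to all 200 subjects, so every held-out fold has already informed the score. With a single predictor and a positive slope, the refitted probability is a monotone function of the score, and AUC depends only on the ranking. The sketch below checks this on the full data. It assumes the post's dataset is in memory; p_lasso, p_refit, auc_score and auc_refit are hypothetical names.

```stata
* Sketch: refitting a one-predictor logit on a score leaves its AUC unchanged.
* Assumes the post's dataset is in memory; p_lasso, p_refit, auc_score and
* auc_refit are hypothetical names chosen for this illustration.
local xvars c.age c.bmi_calculated i.hdp_presentation i.fg_restrict ///
    i.parity i.smoke_status i.diabetes_hx i.new_prior i.gest_cat   ///
    i.antepartum_asa i.map_cat

capture drop p_lasso p_refit

* Lasso fit on all subjects, as in the post; store predicted probabilities
quietly lasso logit total_dose2 `xvars', rseed(12345)
predict double p_lasso, pr

* What cvauroc fits inside each fold: a logit with the score as sole predictor
quietly logit total_dose2 p_lasso
predict double p_refit, pr

* Compare the AUC of the raw score with the AUC of the refitted probability
quietly roctab total_dose2 p_lasso
scalar auc_score = r(area)
quietly roctab total_dose2 p_refit
scalar auc_refit = r(area)
display "AUC of lasso score:    " %6.4f scalar(auc_score)
display "AUC after logit refit: " %6.4f scalar(auc_refit)
```

Inside cvauroc the intercept and slope are re-estimated per fold, so its result can drift slightly from the roctab value, but nothing about the lasso itself (which variables enter, the penalty, the coefficients) is re-estimated, which would explain why the "validation" AUC stays so close to the training AUC.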

    But this doesn't *seem* right: the AUC from roctab and the AUC from cvauroc are almost identical. Is there a more appropriate way to do this in Stata?
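A minimal sketch of one common alternative, assuming the post's dataset is in memory: refit the entire lasso (variable selection plus its own cross-validated choice of lambda) inside each training fold, predict only for the held-out fold, and compute a single AUC from the pooled out-of-fold predictions. The variables u, fold, phat_cv and phat_tmp are hypothetical names, and the folds are stratified on total_dose2 so the 27 events are spread across all 10 folds.

```stata
* Sketch: out-of-fold cross-validated AUC for lasso logit.
* The whole lasso, including its internal CV choice of lambda, is refit
* in each training fold; predictions are made only for the held-out fold.
* Assumes the post's dataset is in memory; u, fold, phat_cv and phat_tmp
* are hypothetical variable names.
local xvars c.age c.bmi_calculated i.hdp_presentation i.fg_restrict ///
    i.parity i.smoke_status i.diabetes_hx i.new_prior i.gest_cat   ///
    i.antepartum_asa i.map_cat
local K 10

capture drop u fold phat_cv phat_tmp
set seed 12345

* Stratified folds: within each outcome level, shuffle the observations
* and deal them round-robin into folds 1..K
generate double u = runiform()
sort total_dose2 u
by total_dose2: generate byte fold = mod(_n - 1, `K') + 1
drop u

generate double phat_cv = .
forvalues k = 1/`K' {
    * Fit the lasso on the other K-1 folds only
    quietly lasso logit total_dose2 `xvars' if fold != `k', rseed(12345)
    * Predict probabilities for the held-out fold only
    quietly predict double phat_tmp if fold == `k', pr
    quietly replace phat_cv = phat_tmp if fold == `k'
    drop phat_tmp
}

* One AUC from the pooled out-of-fold predicted probabilities
roctab total_dose2 phat_cv
```

Pooling is a design choice: with only 2-3 events per held-out fold, per-fold AUCs would be very noisy, so a single AUC over all out-of-fold predictions is usually more stable. Rerunning with a few different seeds shows how much the estimate depends on the fold split.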

    I would *greatly* appreciate any thoughts or considerations folks can provide.