  • Lasso Logit Postestimation/Out of Sample ROC

    Hello.
    I am using lasso logit in Stata 18.0 for the first time, using lasso for prediction.
    I have read the help documents several times as well as the postestimation documents for logistic regression.
    I have also read several questions/responses on this listserv.
    I am still not sure I am performing analyses correctly, specifically related to:
    (1) applying the results of the lasso logit to an independent dataset
    (2) how to obtain sensitivity/specificity of the model that I developed in my training dataset in an independent sample
    I would really appreciate any input.
    Specific questions below.

    I have 3 groups:
    (1) Group 1: a discovery cohort (aka training; sample on which I am developing the logistic regression) (svar==1)
    (2) Group 2: held out set for testing (svar==2)
    (3) Group 3: sample that was not in either training or testing (completely independent sample) (svar==3)

    set seed 1234
    *This was the code I used to generate my training and testing datasets (Groups 1 and 2)*
    splitsample, generate(svar) split(0.75 0.25)

    *Then I imported group 3 and labeled those observations svar==3*

    *I start with this code*
    quietly lasso logit y x1 x2 x3 if svar==1, selection(cv) rseed(1234) folds(10)
    estimates store estim

    *I then predict pr. QUESTION: Is it OK/correct to predict pr for the whole sample (all 3 groups) at once? (I did read about predict with e(sample), but is it correct that it doesn't apply here?)*
    predict pr

    *I test goodness of fit in each group using this command*
    lassogof, over(svar)

    *I test sensitivity/specificity in the groups using this command*
    roctab irbd pr if svar==2
    roctab irbd pr if svar==3
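
    To see sensitivity and specificity at each candidate cutoff, rather than only the area under the curve, roctab can be run with its detail option. A minimal sketch, assuming irbd is the same 0/1 outcome that is called y in the lasso command and that pr already holds the predicted probabilities:

```stata
* List sensitivity and specificity at every observed cutpoint of pr,
* separately for the held-out set and the independent sample.
* Assumes irbd (0/1 outcome), pr, and svar are already in memory.
roctab irbd pr if svar==2, detail
roctab irbd pr if svar==3, detail
```

    Each row of the detail table classifies an observation as positive when pr is at or above that row's cutpoint.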

    *QUESTION: Is it correct that I can interpret the results of roctab as follows: at a predicted probability of xyz (some cutoff of pr that I choose), the sensitivity and specificity of the model I developed on the TRAINING dataset (group 1, svar==1) are xyz in the out-of-sample dataset (group 3)?*
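
    To make the cutoff interpretation concrete, the same quantities can be tabulated by hand for one chosen cutoff. This is only a sketch: the cutoff of 0.5 and the variable name pred_pos are hypothetical, and irbd is assumed to be the 0/1 outcome used in roctab above.

```stata
* 0.5 is a hypothetical cutoff used only for illustration
local cutoff 0.5

* Classify as positive when the predicted probability is at or above the cutoff;
* observations with missing pr stay missing
generate byte pred_pos = (pr >= `cutoff') if !missing(pr)

* Row percentages in the independent sample:
* the irbd==1 row gives sensitivity (percent with pred_pos==1),
* the irbd==0 row gives specificity (percent with pred_pos==0)
tabulate irbd pred_pos if svar==3, row
```

    Because pr comes from the model fit on svar==1 only, nothing in this tabulation is re-estimated on group 3.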

    Side note/question:
    *On this listserv I did find the following code, but I am not sure whether it achieves the same thing; the estat results I get are similar to, though not exactly like, those from roctab. Any comments on whether this code is appropriate to use after lasso logit for what I am trying to achieve?*
    logistic `e(post_sel_vars)' if svar==2
    estat classification
    logistic `e(post_sel_vars)' if svar==3
    estat classification
