Hello.
I am using lasso logit in Stata 18.0 for the first time, using lasso for prediction.
I have read the help documents several times as well as the postestimation documents for logistic regression.
I have also read several questions/responses on this listserve.
I am still not sure I am performing analyses correctly, specifically related to:
(1) applying the results of the lasso logit to an independent dataset
(2) how to obtain sensitivity/specificity of the model that I developed in my training dataset in an independent sample
I would really appreciate any input.
Specific questions below.
I have 3 groups:
(1) Group 1: a discovery cohort (aka training; sample on which I am developing the logistic regression) (svar==1)
(2) Group 2: held out set for testing (svar==2)
(3) Group 3: sample that was not in either training or testing (completely independent sample) (svar==3)
set seed 1234
*This was the code I used to generate my training and testing datasets (Groups 1 and 2)*
splitsample, generate(svar) split(0.75 0.25)
*Then I imported group 3 and labeled those observations svar==3*
*I start with this code*
quietly lasso logit y x1 x2 x3 if svar==1, selection(cv, folds(10)) rseed(1234)
estimates store estim
*I then predict pr. QUESTION: Is it OK/correct to predict pr for the whole sample (all 3 groups) at once? (I did read about restricting predict to e(sample), but is it correct that it doesn't apply here?)*
predict pr
*I test goodness of fit in each group using this command*
lassogof, over(svar)
*I test sensitivity/specificity in the groups using this command*
roctab y pr if svar==2
roctab y pr if svar==3
*QUESTION: Is it correct that I can interpret the results of roctab as follows: at a predicted probability of xyz (some cutoff of pr that I choose), the sensitivity and specificity of the model I developed on the TRAINING dataset (group 1, svar==1) are the values reported for that cutoff in the out-of-sample dataset (group 3)?*
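To make the interpretation I am asking about concrete, here is a minimal sketch of what I have in mind. It assumes a hypothetical cutoff of 0.5, that y is the outcome from the lasso call above, and that pr already holds the predictions from the model fit on svar==1. The variable name pred_pos is made up for illustration.

```stata
* Assumption: 0.5 is only an illustrative cutoff, not one I have chosen
generate byte pred_pos = (pr >= 0.5) if !missing(pr)

* Row percentages in group 3: the y==1 row gives sensitivity (share with pred_pos==1),
* the y==0 row gives specificity (share with pred_pos==0)
tabulate y pred_pos if svar==3, row

* detail lists sensitivity and specificity at every observed cutpoint of pr
roctab y pr if svar==3, detail
```

Nothing is re-estimated here; pr is only evaluated against the observed outcome in group 3.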
Side note/question:
*On this listserve I did find the following code, but I am not sure if it achieves the same thing; the estat results I get are similar to, though not exactly like, those from roctab. Any comments on whether this code is appropriate to use after lasso logit for what I am trying to achieve?*
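estimates restore estim
logistic `e(post_sel_vars)' if svar==2
estat classification
* logistic overwrites e(), so restore the lasso results before reusing e(post_sel_vars)
estimates restore estim
logistic `e(post_sel_vars)' if svar==3
estat classification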
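My worry with the code above is that each logistic call re-estimates the coefficients within group 2 or group 3, so it may not be evaluating the model from the training data at all. The sketch below is what I think an out-of-sample version might look like. It assumes that e(post_sel_vars) lists the outcome followed by the selected predictors, that the all option is what lets estat classification use observations outside the estimation sample (if I am reading the help file correctly), and that 0.5 is only a placeholder cutoff. The local name selvars is made up. Note that this is an unpenalized refit on the selected variables, so it would not necessarily match predictions from predict pr.

```stata
* Sketch only: run as one block in a do-file so the local macro stays defined
estimates restore estim
local selvars `e(post_sel_vars)'   // assumed: outcome first, then selected predictors

* Refit the selected variables on the training group only
logistic `selvars' if svar==1

* Classify the other groups with the training-group coefficients (no refitting)
estat classification if svar==2, all cutoff(0.5)
estat classification if svar==3, all cutoff(0.5)
```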