Hi. I am trying to use lassologit for model selection in data where the number of predictors is larger than the number of observations. This is my first time using a lasso command, and while I have a basic understanding of its mechanics, I am struggling to interpret the Stata results and figure out which predictors are best suited to my outcome measure.
Below is the output from the lassologit and cvlassologit commands. Any guidance on how to interpret these results and identify which predictors work best would be much appreciated. Alternatively, if I've gotten this all wrong, any suggestions on what I should do to identify the best predictors would be welcome. For reference, I am using Stata 15.1 and don't have access to some of the newer commands that simplify this process.
Code:
lassologit Stepone $demo $health $healthsys $econ

Obtaining solution for 50 lambdas
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

  Knot|  ID     Lambda     s      L1-Norm         EBIC Pseudo-R2| Entered/removed
------+---------------------------------------------------------+----------------
     1|   1    5.61988     0      0.18232     30.31641    0.0000| Added _cons.
     2|   2    5.11577     1      0.45126     39.14029    0.0329| Added healthover_tot1516.
     3|   3    4.65687     3      0.88499     57.48661    0.0756| Added povertyperct
      |                                                         | religion_others_2011.
     4|   7    3.19764     4      2.27917     62.72845    0.2267| Added
      |                                                         | rate_acuteresp_mlcase2018.
     5|   9    2.64970     6    624.74804     80.40714    0.2915| Added rate_hepdeaths_2018
      |                                                         | rate_ventilators_public.
     6|  11    2.19565     8   6106.07641     97.11783    0.3882| Added
      |                                                         | rate_influenzacase_2018
      |                                                         | exphealth_percap1516.
     7|  12    1.99870     7   5594.80079     86.03996    0.4296| Removed povertyperct.
     8|  15    1.50764     8   2635.32693     92.90529    0.5271| Added jain_2011.
     9|  16    1.37240     8   1133.65945     91.88012    0.5609| Added
      |                                                         | rate_typoidml_death2018.
      |                                                         | Removed
      |                                                         | rate_ventilators_public.
    10|  17    1.24930    12   1393.10142    130.18702    0.5932| Added pop_perkm2_2019
      |                                                         | muslim_2011
      |                                                         | rate_pneumoniaml_case2018
      |                                                         | rate_hospital_beds_private.
    11|  18    1.13723    12   1704.08816    129.02455    0.6315| Added projtotpop_2019.
    12|  20    0.94236    14   2157.12334    146.57911    0.7004| Added
      |                                                         | rate_acuteresp_femdeath2018.
    13|  21    0.85783    13   2324.96845    135.87923    0.7294| Removed pop_perkm2_2019.
    14|  24    0.64707    14   2799.60647    143.58208    0.7993| Added sexratio_2016.
    15|  27    0.48809    15   3219.14459    151.85042    0.8505| Added projpercturban_2019.
    16|  28    0.44431    14   3387.09811    141.60784    0.8644| Removed
      |                                                         | exphealth_percap1516.
    17|  30    0.36817    15   3804.53081    150.70366    0.8883| Added rate_tb2018.
    18|  48    0.06782    16   7637.99027    157.74653    0.9799| Added pop_perkm2_2019.
    19|  49    0.06174    17   7860.38536    167.51301    0.9817| Added per85_2011.
Use 'long' option for full output.
Type e.g. 'lassologit, lic(ebic)' to run the model selected by EBIC.

. lassologit, lic(ebic)

Use lambda=5.619880267729252 (selected by EBIC).

---------------------------------------------------
         Selected |           Logistic         Post
                  |              Lasso        logit
------------------+--------------------------------
            _cons |         -0.1823216    8.8560519
---------------------------------------------------

. cvlassologit Stepone $demo $health $healthsys $econ, nfolds(10) seed(123) tabfold stratified

     (max) |                               Fold
   Stepone |         1          2          3          4          5          6 |     Total
-----------+------------------------------------------------------------------+----------
         0 |         1          1          1          1          2          1 |        12
         1 |         1          1          1          1          1          1 |        10
-----------+------------------------------------------------------------------+----------
     Total |         2          2          2          2          3          2 |        22

     (max) |                    Fold
   Stepone |         7          8          9         10 |     Total
-----------+--------------------------------------------+----------
         0 |         1          1          1          2 |        12
         1 |         1          1          1          1 |        10
-----------+--------------------------------------------+----------
     Total |         2          2          2          3 |        22

K-fold cross-validation with 10 folds.
Fold 1 2 3 4 5 6 7 8 9 10
          |         Lambda      Deviance      St. err.
----------+---------------------------------------------
         1|      5.6198803     1.4244316     .01966248  ^
         4|       3.681744     1.4152504     .11431231  *
* lopt = the lambda that minimizes loss measure.
  Run model: cvlassologit, lopt
^ lse = largest lambda for which MSPE is within one standard error of the minimum loss.
  Run model: cvlassologit, lse
Use 'long' option for long output.

. cvlassologit, lopt postresults

          |         Lambda      Deviance      St. err.
----------+---------------------------------------------
         1|      5.6198803     1.4244316     .01966248  ^
         4|       3.681744     1.4152504     .11431231  *
* lopt = the lambda that minimizes loss measure.
  Run model: cvlassologit, lopt
^ lse = largest lambda for which MSPE is within one standard error of the minimum loss.
  Run model: cvlassologit, lse
Use 'long' option for long output.

Estimate with lambda=3.682 (lopt).

---------------------------------------------------
         Selected |           Logistic         Post
                  |              Lasso        logit
------------------+--------------------------------
     povertyperct |          0.0071754    0.0289432
religion_oth~2011 |          0.0551081    0.6745619
healthover_t~1516 |         -0.3188299   -2.1661823
            _cons |          1.3770605   10.4341639
---------------------------------------------------

. predict double phat, pr