Hello, I ran lasso in Stata using the following commands:
describe
vl set, categorical(7) uncertain(19) dummy
vl list vluncertain
// Choose baseline variables that make sense
// We then have to classify each vluncertain variable, one by one, into a category: cont/
vl move (ID4 SRH7 CapAsp1 CapAsp2 CapAsp3 CapAsp4 CapAsp5 Agency1 Agency2 Agency3 Agency4 Agency5 Path1 Path2 Path3 Path4 Path5 WindowEd1_s1 WindowEd1_s2 WindowEd1_s3 WindowEd1_s4 WindowEd2_s1 WindowEd2_s2 WindowEd2_s3 WindowEd2_s4 WindowEd3_s1 WindowEd3_s2 WindowEd3_s3 WindowEd3_s4 WindowEd4_s1 WindowEd4_s2 WindowEd4_s3 WindowEd4_s4 WindowEd5_s1 WindowEd5_s2 WindowEd5_s3 WindowEd5_s4 WindowEd6_s1 WindowEd6_s2 WindowEd6_s3 WindowEd6_s4 AspJob2 AspJob3 AspJob6 AspJob7 AspFam5 risk1 age partner_age) vlcontinuous
// The variables that we want to predict should be moved to vlother
vl move (Agency_Index_std_M Path_Index_std_M CapAsp_Index_std_v2_M Hope_Index_std_v2_M AspEd9_M div1_cod_M info_treatment_06_M AspJob2_binary_M AspJob3_binary_M AspJob8_binary_M AspFam4_M AspFam6_M ess_total_M ess_total1_M se1_M se2_M se3_M se4_M wbscore_M pss_score_M) vlother
// Subdivision for our models
* The system-defined variable lists are good for a general division of variables, but we need further
* subdivision for our models. We have four demographic variables, which are all categorical, but we
* want them included in all lasso models. So we create a user-defined variable list containing these variables.
vl create studentdemo = (treated infotreat_M age female father_mother old_sib tasaf male WindowEd1_binary WindowEd2_binary WindowEd3_binary WindowEd4_binary P_BelEd1_binary P_BelJob2_binary S_BelEd1_binary S_BelEd2_binary S_BelJob5_binary AspEd1_binary AspEd2_binary AspEd3_binary_division1 AspEd3_binary_division2 AspEd3_binary_division3 AspEd3_binary_division4 AspJob1_binary AspJob2_binary_HH AspJob2_binary_wage AspJob2_binary_self AspJob6_binary_HH AspJob6_binary_wage AspJob6_binary_self)
vl create factors = vldummy + vlcategorical
vl modify factors = factors - studentdemo
// The vl substitute command allows us to apply factor-variable operators to a variable list. We turn the variables in studentdemo and factors into factor variables.
vl substitute istudentdemo = i.studentdemo
vl substitute ifactors = i.factors
// Split sample into training and testing
splitsample, gen(sample) nsplit(2)
label define svalues 1 "Training" 2 "Testing"
label values sample svalues
label data "Midline lasso with vl"
save "$dfinal/midline_vl", replace
// Reload the saved data and rebuild the variable lists
clear all
set maxvar 32767
use "$dfinal/midline_vl", replace
vl rebuild
set seed 1234
// Fitting lasso (Primary Outcome: Hope_Index_std_v2_M)
lasso linear Agency_Index_std_M ($istudentdemo) $ifactors $vlcontinuous if sample == 1, rseed(1234)
I followed the example in the Stata documentation, but for some reason it is not working. It keeps returning this output:
. // Fitting lasso (Primary Outcome: Hope_Index_std_v2_M)
. lasso linear Agency_Index_std_M ($istudentdemo) $ifactors $vlcontinuous if sample == 1, rseed(1234)
note: 1.male omitted because of collinearity.
note: 1.AspEd3_binary_division4 omitted because of collinearity.
note: 1.AspJob2_binary_selfemp omitted because of collinearity.
note: 1.AspJob6_binary_selfemp omitted because of collinearity.
no observations
r(2000);
Can you tell me what is wrong?