  • Lasso Regression for logit model

    Hello,

    I am attempting to build a model for mortality prediction using lasso regression. So far I have separated the variables into continuous and categorical subsets and have split the data into training and testing samples. However, when I attempt the actual lasso regression, an error occurs. My data set has around 400 observations and 190 variables. I have run the following code so far:

    *lasso regression steps
    *dividing variables into categorical and continuous subsets
    vl set, categorical(6) uncertain(0) dummy
    vl list vlcategorical
    vl list vlother
    vl move (s1 s2 s3 s4 s5) vlother
    vl list vldummy
    vl move (mv2 mv3 mv4 mv5 mv6 mv7 mv8 mv9 mv10 mv11 mv12 mv13 mv14) vlother
    vl list vlcontinuous
    vl list vldummy
    vl list vlcategorical
    vl create factors = vldummy + vlcategorical
    vl substitute ifactors = i.factors
    label data "Survey data with vl"
    save survey_vl
    *splitting sample into Training and Testing
    set seed 1234
    splitsample, generate(sample) nsplit(2)
    label define svalues 1 "Training" 2 "Testing"
    label values sample svalues
    lasso logit mortalityd $ifactors $vlcontinuous if sample == 1, rseed(1234)
    *the number of observations is less than the cross-validation folds r(198);

    The error occurred when I ran the last command. I do not understand what the error means, nor do I understand what cross-validation is. I would really appreciate some help in understanding this and in working out how to rectify it. I am using Stata 17 from my university portal.

    Thank you!
    Barsa
    Last edited by Barsa Saha; 01 Nov 2021, 08:31.

  • #2
    Hello,

    After some research, I understand what cross-validation is and may know what the issue is. I would like to know how I can run a 5-fold cross-validation in the lasso logit regression instead. I would really appreciate some help with this!

    Kind regards,
    Barsa
    Last edited by Barsa Saha; 03 Nov 2021, 05:13.
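
For the 5-fold question above, here is a minimal sketch, assuming the vl macros ($factors, $ifactors, $vlcontinuous) and the sample variable from the first post are still defined in the current session. In Stata's lasso commands the number of folds is set with the folds() suboption of selection(cv); the default is 10. The first two lines are a diagnostic built on the assumption that the r(198) error comes from observations with missing values being dropped: they count how many training observations are complete on every model variable. The variable name nmiss is a hypothetical helper, not something from the original post.

```stata
* Diagnostic (assumption: missing values are shrinking the estimation sample).
* nmiss is a hypothetical helper holding the number of missing values per row.
egen nmiss = rowmiss(mortalityd $factors $vlcontinuous)
count if nmiss == 0 & sample == 1

* Same model as in the post, but with 5-fold instead of the default 10-fold CV.
lasso logit mortalityd $ifactors $vlcontinuous if sample == 1, selection(cv, folds(5)) rseed(1234)
```

If the count is very small, lowering the number of folds may only hide the underlying issue; in that case it is probably worth checking which variables carry most of the missing values before fitting the lasso.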
