  • Do I need to standardize mixed data before doing lasso?

    Hello people,

    I have a large dataset with mixed variable types: binary variables, quantitative variables, and categorical variables with multiple levels (e.g., "How important is your family?": 1 = very important, 2 = important, 3 = not important).
    I would like to use lasso to select suitable variables for a regression. I was advised to standardize all the variables before doing that. However, I am still wondering how I should standardize binary or multi-level variables. Is that even possible?
    Besides, I only know the command: egen float (newvar) = std(var), mean(0) std(1) , which standardizes one variable. How could I do this for a large number of variables in Stata?
    Could you please suggest some more options?

    Thank you very much!

  • #2
    Hi Van,

    Yes, it is recommended that you standardize all of your variables before entering them into the LASSO. LASSO penalizes the magnitude of the coefficients, so the variables need to be on the same scale; otherwise the penalty falls unevenly on variables measured in different units. Here is an example of a simple foreach loop that you can adapt to your data.
    Code:
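    clear
    set obs 100
    * simulate two continuous variables and one binary variable
    gen x1 = rnormal(0, 1)
    gen x2 = rbinomial(1, 0.5)
    gen x3 = rnormal(0, 1)

    * create a standardized copy (mean 0, sd 1) of each variable
    foreach var of varlist x1 x2 x3 {
        egen `var'_std = std(`var'), mean(0) std(1)
        sum `var' `var'_std
    }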
    You can replace x1-x3 with your own variables that you would like to standardize. Hope this helps.
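
For a long list of variables, the same idea can be sketched by hand with a wildcard varlist, which also makes explicit what the standardization does: subtract the mean and divide by the standard deviation. The variable names, the x* wildcard, and the _z suffix below are assumptions for illustration only; substitute your own varlist.

```stata
clear
set seed 12345
set obs 100
gen x1 = rnormal(0, 1)
gen x2 = rbinomial(1, 0.5)
gen x3 = rnormal(5, 2)

* x* matches every existing variable whose name starts with x;
* the list is expanded once, before the new *_z variables are created
foreach var of varlist x* {
    quietly summarize `var'
    gen double `var'_z = (`var' - r(mean)) / r(sd)
}

* each *_z variable should now have mean 0 and sd 1
summarize *_z
```

Computing the z-scores with summarize and generate avoids relying on any particular egen option names, and the loop body stays the same however many variables the varlist contains.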

    • #3
      There are some occasions where you instead want to leave the variables alone. For example, say you have indicator variables for category membership (like brand or geographic region). In this case, standardization would put more penalty on common categories and less penalty on rare categories, which is often undesirable.
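
A small simulated sketch of why this happens, under the assumption that the indicators are coded 0/1: the standard deviation of such an indicator with share p is roughly sqrt(p*(1-p)), so it shrinks as the category becomes rarer. Penalizing the coefficient of a standardized variable amounts to penalizing the original coefficient multiplied by that standard deviation, which is why the rarer category ends up with the lighter penalty. The variable names and shares below are made up for illustration.

```stata
clear
set seed 12345
set obs 1000

* two hypothetical 0/1 indicators: one common, one rare
gen common = runiform() < 0.5
gen rare   = runiform() < 0.05

* the sd acts as the implied penalty weight after standardization
foreach var of varlist common rare {
    quietly summarize `var'
    display "`var': share = " %5.3f r(mean) "   sd = " %5.3f r(sd)
}
```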

      • #4
        If you are using lassopack, please note that lasso2 and cvlasso standardize the predictors by default. If you use rlasso, the standardization is also built in. So you do not have to standardize the predictors manually. You can disable standardization, but this is not recommended unless you know what you are doing. See the help files for more info; they include a few paragraphs on standardization.
        http://statalasso.github.io/
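
As a rough sketch of the defaults described above, assuming lassopack is installed (ssc install lassopack) and using Stata's bundled auto dataset purely as a stand-in for real data: the first call relies on the built-in standardization, while the second adds the unitloadings option, which, as I read the lasso2 help file, sets all penalty loadings to one and thereby switches the standardization off. The choice of variables and of lic(ebic) is illustrative only; check the help file for your installed version.

```stata
* assumes lassopack is installed:  ssc install lassopack
sysuse auto, clear

* default: predictors are standardized internally, no manual step needed
lasso2 price mpg weight length turn foreign, lic(ebic)

* standardization switched off (not recommended unless deliberate)
lasso2 price mpg weight length turn foreign, lic(ebic) unitloadings
```

Comparing which variables are selected in the two runs gives a feel for how much the scaling of the predictors matters for the penalty.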

        • #5
          Achim Ahrens: I am using lassopack (the lasso2 command). Does that mean I do not need to standardize the predictors myself?
          Can I just use the command:
          lasso2 depvar indepvar, lic(ebic)
          without any previous steps?
          Thank you very much for your help!

          • #6
            Yes. lasso2 standardizes all predictors by default, and returns the coefficients in original units.
            http://statalasso.github.io/
