  • Manual calculation of the predicted values after LASSO regression

    Dear all,

    I am using Stata/SE 16.1 for Mac and have created an artificial dataset to illustrate my problem. The dataset includes:
    • v_cont: a continuous variable
    • v_bin_1 - v_bin_8, and outcome: several binary (0/1) variables
    • v_categ: one categorical variable with the values 1, 2, and 3

    I set up and ran a lasso logit regression and obtained the predicted values using the following commands.
    Code:
    vl set v_bin_1-v_bin_8 v_categ v_cont, categorical(3) uncertain(0)
    vl substitute ifactors = i.vlcategorical
    lasso logit outcome $ifactors $vlcontinuous
    predict p_predicted

    As I plan to modify the coefficients to build a score (i.e., multiply them by 10 and round), I was wondering how the -predict- command works. Below are the two Stata commands I ran and the output I got:
    Code:
    estimates store cv
    lassocoef cv, display(coef)

                       cv
    2.v_categ    9.619903
    _cons       -.3400077

    Legend:
      b - base level
      e - empty cell
      o - omitted

    I tried to calculate the predicted values "manually" using the following commands:
    Code:
    gen ln_odds_of_outcome = 9.619903 * (v_categ==2) - .3400077
    gen p_manual = exp(ln_odds_of_outcome)/(1+exp(ln_odds_of_outcome))

    However, the variables p_predicted and p_manual differ strongly when v_categ != 2. Can you help me find my error?

    Thank you in advance.
    Martin

  • #2
    Well, the mystery is why they agree even when v_categ == 2. When you use -predict- after -lasso-, unless you specify otherwise, the penalized coefficients are used. When you use -lassocoef, display(coef)- you get, by default, the standardized coefficients, that is, the penalized coefficients of the standardized variables. These are different from the penalized coefficients on the original scale, so when you use them to calculate predicted values, you get different results. You can override these defaults with appropriate options or sub-options. For your purposes, I think the penalized coefficients are the ones you want.
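
    A minimal sketch of how the two sets of coefficients could be lined up, assuming the lasso logit fit from #1 is still the active estimation result and was stored as cv. The variable names xb_pen, p_pen, and p_check are made up for illustration. It asks both -lassocoef- and -predict- for the penalized coefficients explicitly, then rebuilds the probability from the linear prediction:

    ```stata
    * Sketch only: assumes the lasso logit model from #1 is the active
    * estimation result and has been stored with -estimates store cv-.

    * Display the penalized coefficients on the original (unstandardized) scale
    lassocoef cv, display(coef, penalized)

    * Linear prediction and probability, both based on the penalized coefficients
    predict double xb_pen, xb penalized
    predict double p_pen, pr penalized

    * Rebuild the probability by hand from the linear prediction
    * (p_check is a hypothetical variable name)
    generate double p_check = invlogit(xb_pen)

    * The two probabilities should agree up to rounding error
    assert reldif(p_check, p_pen) < 1e-6 if !missing(p_pen)
    ```

    If the -assert- passes, plugging the coefficients shown by -lassocoef cv, display(coef, penalized)- into your own -gen- statements should reproduce p_pen in the same way.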

    • #3
      Thank you so much!
