General Lasso Confusion

Ethan Nourbash

Join Date: Aug 2021

Posts: 3
#1


27 Aug 2021, 09:03

I ran a lasso regression on my data, and it does not reach the r^2 that a regression of my own design achieved. Does Stata's lasso tune based on r^2? Ideally I would like it to tune by optimizing the adjusted r^2, but I do not know which option to use.
Tags: lasso
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

27 Aug 2021, 09:27

For details on the lasso command and the methodology behind it, see The Stata Lasso Reference Manual PDF included in your Stata installation and accessible through Stata's Help menu.
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#3

27 Aug 2021, 10:14

Dear Ethan

As William suggested, you should really learn more about lasso (or any method) before using it. The R2 is an in-sample measure of goodness-of-fit, which is generally not very interesting. For example, if you keep adding variables to your model the R2 will never fall and will eventually reach 1, but the model will badly overfit the data and become useless. The goal of lasso is to select and estimate a model that predicts well out of sample, something much more interesting (and difficult) than getting a high R2.
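
A small simulation makes the point concrete. This is only a sketch in Python with NumPy, not Stata, and the data-generating process, the sample sizes, and the function names `r2` and `in_and_out_r2` are all assumptions made up for illustration: only the first regressor matters and the rest are pure noise.

```python
import numpy as np

def r2(y, yhat):
    """R^2 = 1 - SSE/SST, with SST measured around the mean of y."""
    return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

def in_and_out_r2(n_train=60, n_test=1000, max_k=50, seed=0):
    """Fit nested OLS models with 1..max_k regressors on the training rows
    and report R^2 on the training rows and on the untouched test rows."""
    rng = np.random.default_rng(seed)
    n = n_train + n_test
    X = rng.normal(size=(n, max_k))
    y = 1.0 * X[:, 0] + rng.normal(size=n)  # assumed truth: one real regressor
    Xtr, Xte = X[:n_train], X[n_train:]
    ytr, yte = y[:n_train], y[n_train:]
    results = []
    for k in range(1, max_k + 1):
        A_tr = np.column_stack([np.ones(n_train), Xtr[:, :k]])
        A_te = np.column_stack([np.ones(n_test), Xte[:, :k]])
        beta, *_ = np.linalg.lstsq(A_tr, ytr, rcond=None)
        results.append((k, r2(ytr, A_tr @ beta), r2(yte, A_te @ beta)))
    return results

if __name__ == "__main__":
    for k, r2_in, r2_out in in_and_out_r2()[::7]:
        print(f"k={k:2d}  in-sample R2={r2_in:6.3f}  out-of-sample R2={r2_out:6.3f}")
```

Calling `in_and_out_r2()` returns one `(k, in-sample R2, out-of-sample R2)` tuple per model size. The in-sample column cannot decrease as regressors are added, whereas nothing forces the holdout column to follow it.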

Best wishes,

Joao
Jackson Monroe

Join Date: Jul 2019

Posts: 60
#4

27 Aug 2021, 13:18

Am I mistaken that lasso is fit via cross-validation using out-of-sample MSE, and that that is a 1:1 function of R^2? In other words, minimizing OOS MSE will maximize OOS R^2, right? The default is doing what Ethan wants.
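
For readers who want to see what tuning on out-of-sample MSE looks like mechanically, here is a minimal sketch in Python with NumPy. It is not Stata's implementation: the coordinate-descent solver, the penalty grid, the fold count, the simulated data, and every function name below are assumptions chosen for illustration. It fits a lasso for each candidate penalty on the training folds, scores it on the held-out fold, and keeps the penalty with the smallest average held-out MSE.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning 0 if |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - b0 - X b||^2 + lam*||b||_1 by cyclic
    coordinate descent. The intercept b0 is left unpenalised."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    col_ss = (Xc ** 2).sum(axis=0) / n
    b = np.zeros(p)
    resid = yc.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            resid += Xc[:, j] * b[j]           # partial residual without x_j
            rho = Xc[:, j] @ resid / n
            b[j] = soft_threshold(rho, lam) / col_ss[j]
            resid -= Xc[:, j] * b[j]
    return y_mean - x_mean @ b, b

def cv_mse(X, y, lambdas, k=5, seed=0):
    """Average held-out MSE over k folds for each penalty in lambdas."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    mse = np.zeros(len(lambdas))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        for i, lam in enumerate(lambdas):
            b0, b = lasso_cd(X[train], y[train], lam)
            err = y[held_out] - (b0 + X[held_out] @ b)
            mse[i] += np.mean(err ** 2) / k
    return mse

def make_data(seed=1, n=120, p=30):
    """Hypothetical data: two real regressors, the rest noise."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)
    return X, y

def demo():
    X, y = make_data()
    lambdas = np.geomspace(1.0, 0.001, 12)
    mse = cv_mse(X, y, lambdas)
    return lambdas, mse, lambdas[np.argmin(mse)]

if __name__ == "__main__":
    lambdas, mse, best = demo()
    print("selected penalty:", best)
```

Nothing in this loop looks at the in-sample fit of the final model, which is why the selected model need not match a hand-built regression on in-sample R^2.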
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#5

28 Aug 2021, 06:10

I may have missed something, but I do not think #1 referred to an out-of-sample R2, did it? Of course, OoS R2 and in-sample R2 are two very different things, and I guess that is the problem.
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#6

28 Aug 2021, 06:24

Based on my own research on out-of-sample predictability, cited below, I think that the in-sample and out-of-sample R-squareds are entirely different statistics, and there is no known relationship between them. So I do not think that they are a one-to-one function of each other at all.
Kolev, Gueorgui I., and Rasa Karapandza. "Out-of-sample equity premium predictability and sample split–invariant inference." Journal of Banking & Finance 84 (2017): 188-201.

Originally posted by Jackson Monroe

Am I mistaken that lasso is fit via cross-validation using out-of-sample MSE, and that that is a 1:1 function of R^2? In other words, minimizing OOS MSE will maximize OOS R^2, right? The default is doing what Ethan wants.
Jackson Monroe

Join Date: Jul 2019

Posts: 60
#7

28 Aug 2021, 11:11

Originally posted by Joao Santos Silva

I may have missed something, but I do not think #1 referred to an out-of-sample R2, did it? Of course, OoS R2 and in-sample R2 are two very different things, and I guess that is the problem.

Ethan asked, "Does stata's lasso tune based on r^2?" Considering OOS R^2 is an R^2, if not the typical one, I would say the answer is yes because MSE is 1 to 1 with R^2. If CV methods do any tuning it will be on OOS data, so his reference to R^2 led me to think he was referring to OOS R^2. Perhaps I put too much gloss on an otherwise unclear question.

To #6: Joro, I agree that in-sample and out-of-sample R^2 don't need to be related; I just assumed #1 was asking about OOS because of the CV nature of the lasso in Stata. Interesting paper, by the way. I didn't follow the notation, but it seemed reasonable that OOS R^2 was negative on portfolio data: too much noise in the predictions. My general point was that MSE is 1 to 1 with R^2 (as it is traditionally defined), and the lasso's penalty is indeed tuned on OOS MSE.
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#8

28 Aug 2021, 13:22

I think you did indeed read a lot into a random shot by the OP :P.

You are right: R-squared = 1 - MSE/(mean squared deviation of the dependent variable from its mean), so as long as the dependent variable (and the sample it is evaluated on) does not change, the two are a one-to-one mapping of each other.
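
The identity is easy to check numerically. The sketch below is in Python with NumPy, and the holdout data, the candidate predictions, and the function names are all hypothetical. It assumes the benchmark in the denominator is the holdout sample's own mean; other out-of-sample R^2 definitions use a different benchmark, but the ranking argument is the same as long as the denominator is shared by every candidate.

```python
import numpy as np

def oos_mse(y, yhat):
    """Mean squared prediction error on the holdout sample."""
    return float(np.mean((y - yhat) ** 2))

def oos_r2(y, yhat):
    """1 - MSE / (mean squared deviation of y from its own mean)."""
    return 1.0 - oos_mse(y, yhat) / float(np.mean((y - y.mean()) ** 2))

def score_candidates(y, predictions):
    """Return arrays of MSE and R^2, one entry per candidate prediction."""
    mses = np.array([oos_mse(y, p) for p in predictions])
    r2s = np.array([oos_r2(y, p) for p in predictions])
    return mses, r2s

def make_candidates(seed=0, n=200):
    """Hypothetical holdout outcome plus four increasingly noisy predictions."""
    rng = np.random.default_rng(seed)
    y = rng.normal(size=n)
    preds = [y + rng.normal(scale=s, size=n) for s in (0.2, 0.5, 1.0, 3.0)]
    return y, preds

if __name__ == "__main__":
    y, preds = make_candidates()
    mses, r2s = score_candidates(y, preds)
    # The same candidate wins under both criteria.
    print("lowest MSE:", int(np.argmin(mses)), " highest R2:", int(np.argmax(r2s)))
```

Because the denominator is the same for every candidate, the one with the lowest MSE is also the one with the highest R^2. A candidate whose MSE exceeds that denominator gets a negative R^2, which is how an out-of-sample R^2 can fall below zero.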
