  • LASSO and subsets of variables

    I am wondering if someone can explain why the LASSO sometimes chooses no variables when many candidate variables are included in a model, yet chooses several when only a subset of those same variables is offered. I don't understand how this can happen.

    Here's an example with data:
    If I run:

    lars en a3 e3 l3 d2 rs sizeavg, a(lasso)

    Then LASSO chooses a3, e3, rs, and sizeavg.

    But if I add age to the candidates and run:

    lars en a3 e3 l3 d2 rs sizeavg age, a(lasso)

    then the LASSO chooses no variables at all.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(a3 e3 l3 d2 age rs sizeavg en)
    35   11 18.6 6   4 1   736  47021.41
    49 11.3   22 6   6 1   670  82373.56
    30   11 13.7 6   5 1   736  58144.61
    31 11.3 14.7 6   5 1   606  47947.14
     8    5   15 1   3 0   504  28261.64
    30    8   13 8   5 1   745  32253.55
    30   12   25 6   6 1   575  70050.37
    11 11.7  4.7 3  15 1   553  67029.66
    30   12   15 6   5 1   755  72386.39
    33 10.4  5.6 6   5 1   692  58593.08
    28 10.6  8.8 6   4 1   755  96065.57
    31 11.3 12.5 6   6 1   660  96112.35
    38   11   13 6   5 1   598  88737.84
    22   11   16 8   4 1   811 75900.375
    31 10.2   12 6   4 1   737  85583.16
    38 11.3   18 6   6 1   670  52297.74
    22 10.6   15 6   3 1   850  104180.3
    34 12.5 10.6 8 4.5 1   696  38965.18
    24  9.6  6.6 8   5 1   593  40038.34
    30 11.6    9 6   6 1   755 123999.98
    30   11   14 6   5 1   648  40243.81
    22   11   11 6   4 1   696  58989.97
    26 12.5  7.6 8   3 1 690.5   81369.8
    38 10.6   19 6   5 1   667  59163.72
    30    8   13 8   5 1   677  76445.72
    30   11   20 6   5 1   710  58538.94
    49 10.2 10.6 6   6 1   688  29591.23
    30    8   19 6   6 1   736 68110.875
    38    8   17 8   5 1   711  82417.18
    30   12   18 8   5 1   774  78911.65
    27 12.8 11.7 6   5 1   625  25993.42
    49  8.5 15.9 6   6 1   788 27619.146
    31 10.4  6.9 6   6 1   692 14386.373
    38   11 12.7 8   5 1   688 13170.033
    30   12   13 6   6 1   760 70203.625
    38    8 22.5 8   5 1   677  57646.82
    38  7.9 12.9 8   5 1   667  45465.28
    30   11   14 8   4 1   738  56925.07
    49 10.2 14.6 6   6 1   688 23870.637
    22   11 12.5 6   4 1   662  92508.52
    30   11   15 8   5 1   763  53747.79
    36   14   17 8   5 1   600 38685.676
    30 11.7   18 6   5 1   667  80511.63
    30   11   16 8   5 1   731 23874.104
    38   11   22 6   5 1   606  44018.17
    30    8   20 6   5 1   732  43041.18
    30   11   15 6   3 1   639 38779.184
    end
    Last edited by Alecia Cassidy; 25 Apr 2017, 12:57.

  • #2
    I'm curious about this, too. At a minimum, one might expect a3 to be kept, as it is statistically significant in a standard regression:

    Code:
    . reg en a3 e3 l3 d2 rs sizeavg age
    
          Source |       SS           df       MS      Number of obs   =        47
    -------------+----------------------------------   F(7, 39)        =      1.73
           Model |  6.9671e+09         7   995306268   Prob > F        =    0.1310
        Residual |  2.2472e+10        39   576195739   R-squared       =    0.2367
    -------------+----------------------------------   Adj R-squared   =    0.0997
           Total |  2.9439e+10        46   639973428   Root MSE        =     24004
    
    ------------------------------------------------------------------------------
              en |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              a3 |  -1134.055   526.9997    -2.15   0.038    -2200.012   -68.09726
              e3 |   975.5635   2677.129     0.36   0.718    -4439.442    6390.569
              l3 |   640.2008   849.8749     0.75   0.456    -1078.834    2359.235
              d2 |  -5508.879   3942.795    -1.40   0.170    -13483.93    2466.177
              rs |   69497.98   46162.81     1.51   0.140    -23875.12    162871.1
         sizeavg |    94.1841   62.70715     1.50   0.141    -32.65308    221.0213
             age |  -1630.529   2688.831    -0.61   0.548    -7069.203    3808.146
           _cons |  -14215.07   50651.83    -0.28   0.780    -116668.1    88237.91
    ------------------------------------------------------------------------------
    When I've tried lars on larger panel data sets with many controls, the LASSO returns sensible results. I suppose that, like all estimation methods, it can produce surprising results for a single draw with a small sample size. But I hope an expert weighs in.
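
    One quick, informal check -- a minimal sketch, assuming the user-written lars (from SSC) is installed and Alecia's data are in memory -- is to rerun the selection on a few bootstrap resamples and watch how much the chosen set moves around:

    Code:
    * Sketch: rerun the lasso on bootstrap resamples and eyeball
    * how the selected set changes from draw to draw
    set seed 12345
    forvalues i = 1/5 {
        preserve
        bsample
        display as text _n "--- bootstrap draw `i' ---"
        lars en a3 e3 l3 d2 rs sizeavg age, a(lasso)
        restore
    }

    If the selected set flips around across draws, that supports the small-sample story.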



    • #3
      Interesting question. I am not the expert on lasso that Jeff wants; my expertise is epsilon above zero in that I have heard of it. But I can draw graphs.

      You can draw a scatter plot matrix for yourself (note the small trick of naming the response last, so that it ends up on the vertical axis in the bottom row, as convention expects). To me it suggests there is little juice in this lemon. I note one indicator variable (rs); I don't know whether that could be awkward, especially with what looks like one outlier.

      A multiple quantile plot can be drawn to show the individual distributions. For more on the multqplot command, see http://www.stata-journal.com/sjpdf.h...iclenum=gr0053

      Code:
      . graph matrix a3 e3 l3 d2 rs sizeavg en
      
      . multqplot a3 e3 l3 d2 rs sizeavg en
      [Attached image: multqplot2.png -- multiple quantile plots of a3 e3 l3 d2 rs sizeavg en]

      Last edited by Nick Cox; 26 Apr 2017, 06:39.



      • #4
        Hi Jeff and Nick,

        Thanks so much for your replies. This is not my full dataset; it's a smaller one that I'm playing around with to learn more about LASSO.

        Nick points out that the dummy variable rs might be the culprit. I probably wouldn't include rs as a candidate variable in real life, since it has too little variation to be useful. Indeed, when I add a little random noise to rs, the same set of variables is chosen whether or not age is in the candidate list. I'm still a bit perplexed as to why this would be a problem for LASSO, though.
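
        For the record, this is roughly what I tried -- the noise scale of 0.01 is arbitrary, and rs_noisy is just my throwaway variable name:

        Code:
        * Roughly the perturbation described above: rs is a 0/1 dummy
        * with very little variation, so jitter it slightly and rerun
        set seed 2017
        generate rs_noisy = rs + rnormal(0, 0.01)
        lars en a3 e3 l3 d2 rs_noisy sizeavg age, a(lasso)
        lars en a3 e3 l3 d2 rs_noisy sizeavg, a(lasso)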

        Alecia



        • #5
          A chink in the LASSO armor? I wonder if it can be reproduced on a larger data set.



          • #6
            Originally posted by Jeff Wooldridge View Post
            A chink in the LASSO armor? I wonder if it can be reproduced on a larger data set.
            I hope it cannot be reproduced on a larger data set. I'll definitely let you know if I reproduce it on my larger data set.



            • #7
               Alecia: I generated some data and was able to reproduce your puzzle in an N = 50 case with 4 regressors. Lasso on 2 or 3 of the regressors picks out a single variable; lasso on all 4 chooses none. But it seems rare, and I could not reproduce the conflict with N = 500, though I did not search over a lot of different data draws.

              Code:
              . lars y x1 x2, a(lasso)
              NOTE: Deleting all matrices
                        ade[3,2]
                         mu[1,1]
                      meanx[1,2]
                         R2[1,3]
                        RSS[1,3]
                         r2[1,1]
                        rss[1,1]
                         cp[1,3]
                      normx[1,2]
                       beta[3,2]
                      sbeta[3,2]
                      error[1,1]
              
              sbeta[3,2]
                         c1         c2
              r1          0          0
              r2  1.2291965          0
              r3  1.5592557  -.3300593
              
              Algorithm is lasso
              
              Cp, R-squared and Actions along the sequence of models
              
               +----------------------------------------+
               | Step |      Cp     | R-square | Action |
               |------+-------------+----------+--------|
               |    1 |     1.1416  |  0.0000  |        |
               |    2 |     1.1375 *|  0.0408  |  +x1   |
               |    3 |     3.0000  |  0.0436  |  +x2   |
               +----------------------------------------+
              * indicates the smallest value for Cp
              
              The coefficient values for the minimum Cp
              
              +-------------------------+
              | Variable |  Coefficient |
              |----------+--------------|
              | x1       |       0.2060 |
              +-------------------------+
              
              . lars y x1 x2 x3 x4, a(lasso)
              NOTE: Deleting all matrices
                        ade[5,4]
                         mu[1,1]
                      meanx[1,4]
                         R2[1,5]
                        RSS[1,5]
                         r2[1,1]
                        rss[1,1]
                         cp[1,5]
                      normx[1,4]
                       beta[5,4]
                      sbeta[5,4]
                      error[1,1]
              
              sbeta[5,4]
                          c1          c2          c3          c4
              r1           0           0           0           0
              r2           0           0           0   .94302486
              r3   .16056008           0           0   1.1035849
              r4   .16164226           0    .0015528   1.1041314
              r5   .43678751  -5.2221903   5.0173428   1.4016385
              
              Algorithm is lasso
              
              Cp, R-squared and Actions along the sequence of models
              
               +----------------------------------------+
               | Step |      Cp     | R-square | Action |
               |------+-------------+----------+--------|
               |    1 |     1.9570 *|  0.0000  |        |
               |    2 |     1.9970  |  0.0392  |  +x4   |
               |    3 |     3.7335  |  0.0445  |  +x1   |
               |    4 |     5.7319  |  0.0445  |  +x3   |
               |    5 |     5.0000  |  0.0992  |  +x2   |
               +----------------------------------------+
              * indicates the smallest value for Cp
              
              The coefficient values for the minimum Cp
              
              +-------------------------+
              | Variable |  Coefficient |
              |----------+--------------|
              +-------------------------+
              
              . reg y x1 x2 x3 x4
              
                    Source |       SS           df       MS      Number of obs   =        50
              -------------+----------------------------------   F(4, 45)        =      1.24
                     Model |  4.93730601         4   1.2343265   Prob > F        =    0.3079
                  Residual |   44.821115        45  .996024778   R-squared       =    0.0992
              -------------+----------------------------------   Adj R-squared   =    0.0192
                     Total |   49.758421        49  1.01547798   Root MSE        =    .99801
              
              ------------------------------------------------------------------------------
                         y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                        x1 |   .0732172   .4536781     0.16   0.873    -.8405374    .9869717
                        x2 |  -.2209018    .136094    -1.62   0.112    -.4950092    .0532057
                        x3 |   .1136642   .0715457     1.59   0.119    -.0304363    .2577646
                        x4 |    .214223   .4177185     0.51   0.611    -.6271053    1.055551
                     _cons |   1.443697   .3003661     4.81   0.000     .8387284    2.048665
              ------------------------------------------------------------------------------



              • #8
                Originally posted by Jeff Wooldridge View Post
                 Alecia: I generated some data and was able to reproduce your puzzle in an N = 50 case with 4 regressors. Lasso on 2 or 3 of the regressors picks out a single variable; lasso on all 4 chooses none. But it seems rare, and I could not reproduce the conflict with N = 500, though I did not search over a lot of different data draws.

                Hi Jeff,
                In your case, were any of the variables binary?



                • #9
                   Actually, I meant to make one binary, but it looks like I made it fractional. One of them is discrete -- x2, I think -- with a binomial distribution from 0 to 20.
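
                   In case anyone wants to poke at this: I don't have the exact do-file handy, but a DGP in this spirit -- weak signal, N = 50, x2 discrete via rbinomial(), x1 fractional where I had meant it to be binary -- should reproduce the flavor. This is a sketch, not my original code:

                   Code:
                   * Sketch of the flavor of the DGP, not the exact code I ran
                   clear
                   set obs 50
                   set seed 54321
                   generate x1 = runiform()            // fractional (meant to be binary)
                   generate x2 = rbinomial(20, 0.3)    // discrete, 0 to 20
                   generate x3 = rnormal(0, 3)
                   generate x4 = runiform()
                   generate y  = 1.5 + 0.2*x4 + rnormal()  // deliberately weak signal
                   lars y x1 x2 x3 x4, a(lasso)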



                  • #10
                    Originally posted by Jeff Wooldridge View Post
                    Actually, I meant to make one binary, but it looks like I make it fractional. One of them is discrete -- x2, I think -- with a binomial distribution from 0 to 20.
                    Wow, so I guess it might be a more general problem with using LASSO in small samples. I'll be more careful in the future!



                    • #11
                      By the way, please say hello to Traviss for me.



                      • #12
                        Originally posted by Jeff Wooldridge View Post
                        By the way, please say hello to Traviss for me.
                        Traviss says hi!



                        • #13
                          Found this in Belloni, Chernozhukov and Hansen, JEP 2014 (p. 40):

                          "Intuitively, reliably distinguishing true predictive power from spurious association
                          becomes more difficult as more variables are considered. This intuition can be
                          seen in the theory of high-dimensional variable selection methods, and the methods
                          work best in simulations when selection is done over a collection of variables that
                          is not overly extensive. It is therefore important that some persuasive economic
                          intuition exists to produce a carefully chosen, well-targeted set of variables to be
                          selected over even when using automatic variable selection methods."

                          This could be why LASSO might select nothing when given more variables.



                          • #14
                            Originally posted by Alecia Cassidy View Post
                            Found this in Belloni, Chernozhukov and Hansen, JEP 2014 (p. 40):

                            "Intuitively, reliably distinguishing true predictive power from spurious association
                            becomes more difficult as more variables are considered. This intuition can be
                            seen in the theory of high-dimensional variable selection methods, and the methods
                            work best in simulations when selection is done over a collection of variables that
                            is not overly extensive. It is therefore important that some persuasive economic
                            intuition exists to produce a carefully chosen, well-targeted set of variables to be
                            selected over even when using automatic variable selection methods."

                            This could be why LASSO might select nothing when given more variables.
                            Hello Alecia,

                             I have the same problem as you. I have over 40 explanatory variables, and the Cp becomes negative!
                             What would you suggest? Eliminating some of the x's? On what criteria would you base the elimination?

                            Best regards,
                            David



                            • #15
                              Hi David,
                              I'm not really qualified to give you advice on this, so I hope someone more qualified will chime in. That said, it looks like eliminating some of the candidate variables based on economic intuition is what Belloni, Chernozhukov and Hansen would recommend.
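
                               If a purely mechanical first pass helps while you work out the economics, a small sketch like this (x1-x40 are placeholders for your candidate names, and the 0.05 cutoff is arbitrary) flags near-constant candidates of the kind that bit me with rs:

                               Code:
                               * Sketch: flag near-constant candidates before running lars;
                               * x1-x40 are placeholder names, 0.05 an arbitrary cutoff
                               foreach v of varlist x1-x40 {
                                   quietly summarize `v'
                                   if r(sd) < 0.05 {
                                       display as text "`v' is nearly constant (sd = " %6.4f r(sd) ")"
                                   }
                               }
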
                              Good luck!
                              Alecia

