I noticed that the postselection coefficients reported by -lasso- differ from the OLS estimates on the lasso-selected regressors when I specify a cluster variable. Without a cluster variable, they match as expected. I didn't expect clustering to break that match, so I'm wondering whether anyone knows why it happens. Is this just a precision issue?
Here's a toy example. I compare the postselection estimate of x1 with the OLS estimate of x1, with and without a cluster variable.
Code:
qui forv s = 1/10 {
    * make up some data
    clear all
    set seed `s'
    set obs 1000
    g y = runiform() /* outcome */
    g clust = int(runiform(1,20)) /* cluster id */
    forv z=1/10 {
        g x`z' = int(runiform(1,5)) /* potential predictors x* */
    }
    replace y = y + 5*x1 + .5*x2
    * fit the lasso with and without clustering, then rerun OLS on the selected regressors
    foreach z in "WITH_CLUSTERING" "WITHOUT_CLUSTERING" {
        if "`z'"=="WITH_CLUSTERING" lasso linear y x*, cluster(clust)
        if "`z'"=="WITHOUT_CLUSTERING" lasso linear y x*
        loc beta_lasso = e(b_postselection)[1,"x1"]
        reg y `e(allvars_sel)'
        loc beta_ols = _b[x1]
        loc dif = `beta_lasso' - `beta_ols'
        noi di "seed=`s'; `z'; lasso estimate=`beta_lasso'; OLS estimate=`beta_ols'; difference=`dif' "
    }
}
Here's the output:
Code:
seed=1; WITH_CLUSTERING; lasso estimate=4.998759942514343; OLS estimate=4.999203211772241; difference=-.0004432692578984
seed=1; WITHOUT_CLUSTERING; lasso estimate=4.999203211772241; OLS estimate=4.999203211772241; difference=0
seed=2; WITH_CLUSTERING; lasso estimate=4.996668209099026; OLS estimate=4.997709674975988; difference=-.0010414658769626
seed=2; WITHOUT_CLUSTERING; lasso estimate=4.997709674975988; OLS estimate=4.997709674975988; difference=0
seed=3; WITH_CLUSTERING; lasso estimate=5.005329955268507; OLS estimate=5.004821214398009; difference=.0005087408704973
seed=3; WITHOUT_CLUSTERING; lasso estimate=5.004821214398009; OLS estimate=5.004821214398009; difference=0
seed=4; WITH_CLUSTERING; lasso estimate=4.994085712037184; OLS estimate=4.995145877055269; difference=-.0010601650180853
seed=4; WITHOUT_CLUSTERING; lasso estimate=4.995145877055269; OLS estimate=4.995145877055269; difference=0
seed=5; WITH_CLUSTERING; lasso estimate=5.005353533242829; OLS estimate=5.002485671712456; difference=.0028678615303726
seed=5; WITHOUT_CLUSTERING; lasso estimate=5.002485671712456; OLS estimate=5.002485671712456; difference=0
seed=6; WITH_CLUSTERING; lasso estimate=4.987126930711644; OLS estimate=4.986695164677461; difference=.0004317660341826
seed=6; WITHOUT_CLUSTERING; lasso estimate=4.986695164677461; OLS estimate=4.986695164677461; difference=0
seed=7; WITH_CLUSTERING; lasso estimate=5.005207996913146; OLS estimate=5.00505189238087; difference=.0001561045322767
seed=7; WITHOUT_CLUSTERING; lasso estimate=5.00505189238087; OLS estimate=5.00505189238087; difference=0
seed=8; WITH_CLUSTERING; lasso estimate=4.999478604561925; OLS estimate=5.00041089181657; difference=-.0009322872546456
seed=8; WITHOUT_CLUSTERING; lasso estimate=5.00041089181657; OLS estimate=5.00041089181657; difference=0
seed=9; WITH_CLUSTERING; lasso estimate=5.009298907208509; OLS estimate=5.009607632973197; difference=-.0003087257646888
seed=9; WITHOUT_CLUSTERING; lasso estimate=5.009607632973197; OLS estimate=5.009607632973197; difference=0
seed=10; WITH_CLUSTERING; lasso estimate=5.021691730222245; OLS estimate=5.023500155842829; difference=-.0018084256205837
seed=10; WITHOUT_CLUSTERING; lasso estimate=5.023500155842829; OLS estimate=5.023500155842829; difference=0
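The differences with clustering are on the order of 1e-3 to 1e-4, which looks too large for floating-point noise. One diagnostic that might narrow this down, sketched below, is to test the guess that -lasso- with cluster() weights each observation by the inverse of its cluster size, so that the postselection fit is a weighted rather than an unweighted least-squares fit. That weighting scheme is an assumption on my part, not something established above, and the variable w is a hypothetical helper introduced only for this check.

```stata
* Sketch only: tests the ASSUMPTION that cluster() in -lasso- weights each
* observation by 1/(cluster size). Same data-generating steps as the toy example.
clear all
set seed 1
set obs 1000
g y = runiform()
g clust = int(runiform(1,20))
forv z=1/10 {
    g x`z' = int(runiform(1,5))
}
replace y = y + 5*x1 + .5*x2

* clustered lasso; keep the selected regressors and the postselection estimate of x1
lasso linear y x*, cluster(clust)
loc sel `e(allvars_sel)'
loc beta_lasso = e(b_postselection)[1,"x1"]

* w (hypothetical helper): inverse of the number of observations in each cluster
bysort clust: g double w = 1/_N

* weighted least squares on the selected regressors
reg y `sel' [aweight=w]
di "lasso - weighted OLS = " `beta_lasso' - _b[x1]

* for reference: clustering only the standard errors leaves the OLS coefficients unchanged
reg y `sel', vce(cluster clust)
di "lasso - OLS with clustered SEs = " `beta_lasso' - _b[x1]
```

If the first difference comes out at or near zero while the second reproduces the gap seen above, that would point to weighting rather than precision as the source of the mismatch. If both differences are nonzero, the guess is wrong and something else is going on.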
