Dear Statalist,
I am using the ivlasso command (part of the pdslasso package by Achim Ahrens, Christian Hansen, and Mark Schaffer) to estimate a regression with high-dimensional exogenous controls, a single continuous endogenous variable, and a single instrumental variable (a dummy).
The command returns results that use either Lasso or Post-Lasso OLS in the different model selection and estimation steps (in addition to results that use Lasso only for model selection). As I understand the methodology, Lasso and Post-Lasso should select the same set of covariates, since Post-Lasso simply applies OLS to the covariates selected by Lasso. In practice, however, the two estimators often select different covariates, which at times leads to quite different final estimates. Judging from the regression output, the divergence seems to arise in the step where the optimal instruments are created. I have looked at the underlying literature but could not work out why this difference can occur.
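To make my expectation concrete, here is a minimal Python sketch of what I mean by "Post-Lasso applies OLS to the covariates selected by Lasso". It is not the ivlasso or rlasso implementation: the simulated data, the fixed penalty `lam`, and the helper names (`lasso_cd`, `ols_on`) are my own assumptions for illustration, and the penalty here is a single fixed number rather than a data-driven one.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t (the Lasso update for one coordinate)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual (b_j added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_ss[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b

def ols_on(X, y, cols):
    """OLS of y on the columns in `cols`, via Gauss-Jordan on the normal equations."""
    n, k = len(X), len(cols)
    A = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in cols]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in cols]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [A[r][m] - f * A[c][m] for m in range(k + 1)]
    return {cols[c]: A[c][k] / A[c][c] for c in range(k)}

# Simulated data (hypothetical): only the first two of ten covariates matter.
random.seed(0)
n, p = 200, 10
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[1] + random.gauss(0, 0.5) for row in X]

lam = 0.3                                   # fixed penalty, chosen by hand
b_lasso = lasso_cd(X, y, lam)               # step 1: Lasso (shrunken coefficients)
selected = [j for j in range(p) if b_lasso[j] != 0.0]
b_post = ols_on(X, y, selected)             # step 2: OLS on the selected columns

print("selected by Lasso:", selected)
print("Lasso coefficients:", [round(b_lasso[j], 3) for j in selected])
print("Post-Lasso coefficients:", [round(b_post[j], 3) for j in selected])
```

In this sketch the Post-Lasso step cannot change which covariates are selected; it only removes the shrinkage from their coefficients. That is the behaviour I expected to see in the ivlasso output as well.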
Could anyone shed some light on this for me or point me toward a relevant paper that addresses this issue? Moreover, how would you interpret or deal with Lasso and Post-Lasso resulting in quite different estimates?
Many thanks for any help,
Kevin