Dear Statalister,
I am currently using the lasso2 command from lassopack, developed by Ahrens, Hansen and Schaffer (https://statalasso.github.io/docs/lassopack/), for variable selection. Specifically, I am applying lasso2 to select the factors most relevant in explaining CEO compensation, using US data from 2000 to 2014.
I declared the data as a panel with firm id and year before calling lasso2. The dependent variable is CEO compensation (in logs), and there are 27 regressors on the right-hand side, each lagged one period. I understand that adding the "fe" option makes lasso2 account for firm fixed effects. However, I am puzzled about year fixed effects, because I obtain different results depending on whether I use (1) i.year or (2) a separate dummy for each year in the regression.
I use the "fe" and "lic(aic)" options, so that the penalty level is chosen by the AIC, and obtain the following results when trying to account for year fixed effects:
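For reference, the panel declaration and one-period lags described above can be sketched as follows. This is only a minimal illustration: the variable name tobinq_l1 is hypothetical, and the regressors in the posted data sample may already be stored in lagged form.

```stata
* Declare the panel: gvkey is the firm id, year is the financial year
xtset gvkey year

* Hypothetical illustration: a one-period lag of a single regressor.
* L. is Stata's time-series lag operator and requires the xtset above.
generate tobinq_l1 = L.tobinq
```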
1. adding "i.year" to the list of regressors:
- The number of regressors chosen by lasso2 is 26 (out of 27). This result is not helpful, since almost all the variables are selected.
- Penalty level is 1.323
2. adding a list of year dummies to the list of regressors: dumyear1-dumyear15 (sample period 2000-2014):
- Number of regressors chosen by lasso2 is 21/27
- Penalty level is much higher: 16.31
I am really confused by these results. I would expect using i.year to be equivalent to using year dummies in the regression.
As another approach, I used the option notpen(i.year) or partial(i.year) so that the year effects are not penalised when calling lasso2, but I got the following error messages:
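One difference between the two specifications may be the base year: tab year, gen(dumyear) creates an indicator for every year, whereas factor-variable notation normally treats one level as the base. The sketch below is a like-for-like check under that assumption, not a confirmed explanation of the gap. It leaves out dumyear1 (assumed to correspond to 2000, which requires all 15 years to be present in the data) and uses the shortened regressor list from this post.

```stata
* Assumes the data from this post are in memory
xtset gvkey year

* Shortened regressor list as posted (the full model has 27 regressors)
local x ceo_tenure ceo_ownership ceo_directorship ceo_age ceo_age_sq ///
    ceo_duality ceo_gender ln_size tobinq roa leverage free_cashflow ///
    rd capex sales_growth ln_firmage ln_segment merger

* Recreate the year indicators: dumyear1 = first year, ..., dumyear15 = last year
capture drop dumyear*
tabulate year, generate(dumyear)

* Leave out dumyear1 as the base year so that the set of year columns
* is comparable with a factor-variable specification that has a base level
lasso2 ceo_totalpay `x' dumyear2-dumyear15, fe lic(aic)
```

With firm fixed effects, the full set of 15 year indicators sums to one for every observation and is therefore collinear after the within transformation, which is why one year is dropped here.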
3. using partial(i.year): syntax error - 0.year in partial(.) but not in list of regressors
r(198);
4. using notpen(i.year): internal _lassopath error - unpenalized 2002.year missing from selected vars
set tolzero(.) or other tolerances smaller or use partial(.) option
r(499);
I have clearly included either the year dummies or i.year in the list of regressors, so I have no idea why I get these error messages. I would highly appreciate any help in clearing this up.
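A variant that could be tried is to pass explicitly generated year dummies to partial() instead of factor-variable notation. This is only a sketch: whether it avoids the errors above is an assumption that has not been verified, and dumyear1 is again left out as the base year.

```stata
* Assumes the data from this post are in memory
xtset gvkey year

* Shortened regressor list as posted (the full model has 27 regressors)
local x ceo_tenure ceo_ownership ceo_directorship ceo_age ceo_age_sq ///
    ceo_duality ceo_gender ln_size tobinq roa leverage free_cashflow ///
    rd capex sales_growth ln_firmage ln_segment merger

* Plain indicator variables rather than i.year
capture drop dumyear*
tabulate year, generate(dumyear)

* Partial out the year dummies so that they are not penalised;
* the same varlist appears among the regressors and in partial()
lasso2 ceo_totalpay `x' dumyear2-dumyear15, ///
    partial(dumyear2-dumyear15) fe lic(aic)
```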
Many thanks for your help.
Below is a sample of my data. I have 27 regressors in total but cannot include them all due to the linesize limit. In the data sample, ceo_totalpay is the total compensation of the CEO (dependent variable), the other variables are independent variables, gvkey is the firm id and year is the financial year.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(ceo_totalpay ceo_tenure ceo_ownership ceo_directorship ceo_age ceo_age_sq ceo_duality ceo_gender ln_size tobinq roa leverage free_cashflow rd capex sales_growth ln_firmage ln_segment merger) double gvkey float year
7.743223 .6931472 .0000905406 0 56 3136 0 0 7.38426 1.5691023 .07805158 .326074 -.02208356 .02687224 .04476306 .2298486 2.772589 2.0794415 0 1034 2000
7.218372 1.0986123 .00011408457 0 57 3249 0 0 7.779052 1.1174034 .035978958 .4437609 .029121887 .03625511 .035668083 .08236733 2.833213 1.94591 0 1034 2001
7.556269 1.3862944 .0004853922 0 58 3364 0 0 7.739326 .8305851 .06212613 .3900251 .14331414 .02920776 .03238679 .26233295 2.890372 1.94591 0 1034 2002
7.973105 1.609438 .001487662 0 59 3481 0 0 7.753309 .9630332 .04531338 .350821 .10936305 .027146727 .018297164 .05405026 2.944439 1.94591 0 1034 2003
7.933617 1.792216 .0024018176 0 60 3600 1 0 7.602822 1.0058173 .03324264 .3501948 .15286067 .0406549 .02460573 .03252562 2.995732 2.0794415 0 1034 2004
7.618988 1.9463015 .002395355 0 61 3721 1 0 7.392268 1.384189 .05913269 .25667757 .13513224 .01659251 .02398633 . 3.0445225 1.7917595 0 1034 2005
8.047317 .9168385 .00322704 1.0986123 51 2601 0 1 7.160974 1.1174711 -.004286718 .24145354 -.015307399 .10887969 .04696526 .10491597 3.135494 1.609438 1 1034 2007
7.939509 1.4859537 0 .6931472 51 2601 0 1 7.458848 1.6002488 .13146882 .08863158 .11346654 0 .04507452 .07843047 3.367296 2.0794415 0 1076 2011
8.30934 .6497534 .000649168 . 70 4900 1 1 7.502699 1.5544815 .14043573 .07806594 -.002483934 0 .03589385 .09809002 3.4011974 2.0794415 0 1076 2012
8.319598 1.0698934 .0009713119 . 71 5041 1 1 7.510527 1.5500143 .12147653 .07810085 .11668974 0 .03182233 .005418458 3.433987 2.1972246 0 1076 2013
8.667651 1.2755923 .001827806 1.0986123 72 5184 0 1 7.806633 1.403948 .07163662 .2466913 -.04355994 0 .019360203 .21954766 3.465736 2.1972246 0 1076 2014
8.983843 1.0986123 .0001799805 0 45 2025 1 1 9.634513 5.338754 .21344067 .10179913 . .08839898 . .04312545 4.1431346 1.94591 0 1078 2000
9.694727 1.3862944 .00023363395 0 46 2116 1 1 10.056055 4.3312244 .1518659 .3128733 . .12482397 . .1847334 4.158883 1.94591 0 1078 2001
10.109052 1.609438 .0003439838 0 47 2209 1 1 10.096547 3.137679 .16244793 .26475123 .04616797 .0688192 .05343961 .08593158 4.1743875 1.94591 1 1078 2002
9.193542 1.7917595 .0003323995 0 48 2304 1 1 10.192993 3.2396975 .1558944 .22420397 .034286458 .06863891 .04666761 .1128604 4.189655 1.94591 1 1078 2003
9.3351965 1.9463015 .0003605419 0 49 2401 1 1 10.267 3.031784 .15487362 .23570412 .05243976 .068680264 .04489905 -.0000276923 4.204693 1.7917595 1 1078 2004
9.606682 2.079784 .0004261123 0 50 2500 1 1 10.279908 2.5880184 .16592997 .2276335 .06536151 .06308271 .04143593 .13250965 4.2195077 1.7917595 0 1078 2005
9.96103 2.197529 .00274604 1.0986123 51 2601 1 1 10.49621 2.68126 .1343412 .3430501 .0533049 .11800682 .036978595 .008458167 4.2341065 1.7917595 0 1078 2006
10.27352 2.302859 .003312108 1.0986123 52 2704 1 1 10.589458 2.743693 .13908334 .3075421 .04053475 .063092455 .04170343 .15295723 4.248495 1.7917595 0 1078 2007
10.130786 2.3983934 .003275916 .6931472 53 2809 1 1 10.655356 2.541137 .1526925 .29140773 .07309864 .06567938 .0303571 .13943355 4.26268 1.7917595 1 1078 2008
end
Here is my Stata code:
Code:
local x ceo_tenure ceo_ownership ceo_directorship ceo_age ceo_age_sq ceo_duality ceo_gender ln_size tobinq roa leverage free_cashflow rd capex sales_growth ln_firmage ln_segment merger

* calling lasso2 using i.year to account for year fixed effects:
qui eststo: lasso2 ceo_totalpay `x' i.year, fe lic(aic) displayall postest

* calling lasso2 using a dummy for each year to account for year fixed effects:
tab year, gen(dumyear)
qui eststo: lasso2 ceo_totalpay `x' dumyear1-dumyear15, fe lic(aic) displayall postest

* in an attempt not to penalise year, I apply the syntax below and get the error message in (3):
qui eststo: lasso2 ceo_totalpay `x' i.year, partial(i.year) fe lic(aic) displayall postest

* or the syntax below, which gives the error message in (4):
qui eststo: lasso2 ceo_totalpay `x' i.year, notpen(i.year) fe lic(aic) displayall postest