Hello everyone,
I hope you are well.
I have a question about the LASSO and ridge methods.
The objective of my study is to investigate the impact of different governance dimensions (board independence, CEO duality, governance index membership, board/committee size, etc.) on earnings management (measured by accruals).
Part 1: LASSO
Here's my basic fixed-effect model:
Code:
xtreg ACC CFO2 DCFO CFOxDCFO CADln2 CADlnxCFO CADlnxDCFO CADlnxCFOxDCFO CEO CEOxCFO CEOxDCFO CEOxCFOxDCFO IND2 INDxCFO INDxDCFO INDxCFOxDCFO AUDln2 AUDlnxCFO AUDlnxDCFO AUDlnxCFOxDCFO COGln2 COGlnxCFO COGlnxDCFO COGlnxCFOxDCFO IGE IGExCFO IGExDCFO IGExCFOxDCFO OWN2 OWNxCFO OWNxDCFO OWNxCFOxDCFO LEVIER2 LEVIERxCFO LEVIERxDCFO LEVIERxCFOxDCFO TAILLEln2 TAILxCFO TAILxDCFO TAILxCFOxDCFO LITIGE LITIGExCFO LITIGExDCFO LITIGExCFOxDCFO, fe robust
I had no a priori basis for choosing among the various governance variables (CAD, IND, etc.), so I included them all. However, when I checked the VIFs, they turned out to be very high, even after centering and standardizing the variables.
I therefore opted for a variable selection method, the LASSO method.
I used the following command:
Code:
rlasso ACC CFO2 DCFO CFOxDCFO CADln2 CADlnxCFO CADlnxDCFO CADlnxCFOxDCFO CEO CEOxCFO CEOxDCFO CEOxCFOxDCFO IND2 INDxCFO INDxDCFO INDxCFOxDCFO AUDln2 AUDlnxCFO AUDlnxDCFO AUDlnxCFOxDCFO COGln2 COGlnxCFO COGlnxDCFO COGlnxCFOxDCFO IGE IGExCFO IGExDCFO IGExCFOxDCFO OWN2 OWNxCFO OWNxDCFO OWNxCFOxDCFO LEVIER2 LEVIERxCFO LEVIERxDCFO LEVIERxCFOxDCFO TAILLEln2 TAILxCFO TAILxDCFO TAILxCFOxDCFO LITIGE LITIGExCFO LITIGExDCFO LITIGExCFOxDCFO, fe
The variables selected by this command are: CFO2 DCFO CEOxCFOxDCFO INDxCFO INDxCFOxDCFO IGExCFO IGExCFOxDCFO LEVIER2 TAILLEln2 TAILxCFO
Next, I forced the inclusion of certain variables that appear in the selected triple interactions but were not themselves selected by LASSO (e.g. CEO).
I then put the retained variables into my fixed-effect model and analyzed the results:
Code:
xtreg ACC CFO2 DCFO CFOxDCFO CEO CEOxCFO CEOxDCFO CEOxCFOxDCFO IND2 INDxCFO INDxDCFO INDxCFOxDCFO IGE IGExCFO IGExDCFO IGExCFOxDCFO LEVIER2 TAILLEln2 TAILxCFO, fe robust
With this new specification, the VIFs are considerably lower, with a mean of 3.
Part 2: Ridge
For robustness, I'd also like to use the Ridge method to reduce the multicollinearity of my model. I confess I don't know how to interpret its results or what to do with them.
I used the following command:
Code:
ridgeregress ACC CFO2 DCFO CFOxDCFO CADln2 CADlnxCFO CADlnxDCFO CADlnxCFOxDCFO CEO CEOxCFO CEOxDCFO CEOxCFOxDCFO IND2 INDxCFO INDxDCFO INDxCFOxDCFO AUDln2 AUDlnxCFO AUDlnxDCFO AUDlnxCFOxDCFO COGln2 COGlnxCFO COGlnxDCFO COGlnxCFOxDCFO IGE IGExCFO IGExDCFO IGExCFOxDCFO OWN2 OWNxCFO OWNxDCFO OWNxCFOxDCFO LEVIER2 LEVIERxCFO LEVIERxDCFO LEVIERxCFOxDCFO TAILLEln2 TAILxCFO TAILxDCFO TAILxCFOxDCFO LITIGE LITIGExCFO LITIGExDCFO LITIGExCFOxDCFO
And here are the results:
Code:
Ridge regression Number of observations = 1,822
R-squared = 0.5094
alpha = 0.0000
lambda = 0.0968
Cross-validation MSE = 0.0323
Number of folds = 10
Number of lambda tested = 100
---------------------------------------------------------------------------------
ACC | Coefficient
----------------+----------------------------------------------------------------
CFO2 | -.4201313
DCFO | .0442865
CFOxDCFO | -.2361166
CADln2 | -.0085005
CADlnxCFO | -.0199712
CADlnxDCFO | -.0186018
CADlnxCFOxDCFO | -.1104472
CEO | .0232471
CEOxCFO | .0026195
CEOxDCFO | -.0896609
CEOxCFOxDCFO | -.5306899
IND2 | .0057495
INDxCFO | -.7414762
INDxDCFO | -.0783018
INDxCFOxDCFO | -.6897435
AUDln2 | -.0184643
AUDlnxCFO | -.0240426
AUDlnxDCFO | -.0593989
AUDlnxCFOxDCFO | -.1078209
COGln2 | .0148307
COGlnxCFO | -.0076942
COGlnxDCFO | -.0097863
COGlnxCFOxDCFO | -.0744067
IGE | .0032314
IGExCFO | -.1649624
IGExDCFO | -.0513084
IGExCFOxDCFO | -.2191058
OWN2 | -.0032892
OWNxCFO | .0702804
OWNxDCFO | -.0259848
OWNxCFOxDCFO | -.2782928
LEVIER2 | -.1592852
LEVIERxCFO | -.0819853
LEVIERxDCFO | -.0316133
LEVIERxCFOxDCFO | .0887082
TAILLEln2 | .0045048
TAILxCFO | .0483754
TAILxDCFO | .0191018
TAILxCFOxDCFO | .0965565
LITIGE | .0031396
LITIGExCFO | -.0606596
LITIGExDCFO | -.0048072
LITIGExCFOxDCFO | .1459947
_cons | -.0316815
---------------------------------------------------------------------------------
I don't have any p-values to tell me which variables are significant in my model.
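For reference, here is how I currently read the "alpha" and "lambda" lines in the output above. This is only my understanding, and it assumes that ridgeregress follows the usual elastic-net parametrization, which I have not verified in its documentation; the scaling of the penalty also differs between packages. With y the dependent variable (ACC here), x_i the vector of regressors for observation i, and n the number of observations:

```latex
\hat{\beta}(\lambda, \alpha)
  = \arg\min_{\beta} \;
    \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - x_i^{\top} \beta \right)^2
    + \lambda \left[ \alpha \lVert \beta \rVert_1
    + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \right]
```

If that reading is right, alpha = 1 gives the LASSO penalty and alpha = 0 leaves only the squared (ridge) penalty, which shrinks coefficients toward zero without setting any of them exactly to zero. That would be consistent with every regressor appearing in the table with a nonzero coefficient.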
My questions are as follows:
Is the LASSO procedure above methodologically sound and correctly implemented?
How should I make use of the Ridge results? Is Ridge relevant here, or is LASSO enough?
Thank you very much for your answers,
Loïc Dubois
