Dear all,
I have an unbalanced panel data set in which N (110 companies) > T (5 years). I first estimated a pooled OLS regression (-regress-). I then estimated panel regressions (-xtreg-) and compared the results as robustness checks. My model is as follows:
ROA = c.Var1_c##i.Industry Var2 Var3 Var4 Var5 i.Year, where Var1 is CEO compensation (Var1_c is its mean-centered version), Var2 to Var5 are control variables, and Industry is a categorical variable (coded 1 to 10 for the different industries).
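It may help to see how this specification behaves on simulated data before reading the interaction output. The sketch below is a minimal illustration under stated assumptions. The data, the variable Industry (coded 1 to 10), and all coefficients are hypothetical stand-ins, not the data behind the output in this post. In c.Var1_c##i.Industry, the main effect of Var1_c is the slope for the base industry. Each Industry#c.Var1_c coefficient is the difference from that base slope. -margins Industry, dydx(Var1_c)- reports the implied slope within each industry, together with its standard error.

```stata
* Hypothetical stand-in data: 100 firms x 5 years, 10 industries
clear
set seed 2021
set obs 100
generate int Company_ID = _n
generate byte Industry = mod(_n - 1, 10) + 1   // 10 firms per industry
generate double u = rnormal(0, .03)            // firm-level effect
expand 5
bysort Company_ID: generate int Year = 2014 + _n
generate double Var1 = 50 + 10*rnormal()       // stand-in for CEO pay
generate double Var2 = rnormal()               // stand-in control
generate double ROA = .2 + .0005*Var1 - .01*Var2 + u + .015*rnormal()

* Mean-center the key regressor
summarize Var1, meanonly
generate double Var1_c = Var1 - r(mean)

* Var1_c main effect = slope in the base industry;
* Industry#c.Var1_c terms = differences from that base slope
regress ROA c.Var1_c##i.Industry Var2 i.Year, vce(cluster Company_ID)

* Industry-specific slopes of Var1_c (main effect + interaction) with SEs
margins Industry, dydx(Var1_c)
```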
First, I winsorized my data at the 5th and 95th percentiles (5 95) to limit the influence of outliers. I checked the OLS assumptions and, as a consequence, transformed some variables (to address linearity) and mean-centered my key independent variable (to reduce multicollinearity with the interaction term). As one would expect, my data show heteroscedasticity (-estat hettest-) and autocorrelation (tested with -gen time = _n-, -tsset time-, and -dwstat-).
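The winsorizing and the autocorrelation check described above can be sketched as follows. The data are hypothetical and simulated: 100 firms, 5 years, and AR(1) errors, none of which come from the post. The winsorizing uses only the built-in -_pctile-.

For the autocorrelation check, the sketch declares the panel with -xtset- so that the lag operator L. stays within each firm. With -gen time = _n- and -tsset time- on data sorted by firm and year, the lag of a firm's first year is the previous firm's last year. -dwstat- would therefore treat the stacked firms as one long series. Regressing the residual on its within-firm lag is only a simple diagnostic. It is not meant as a formal replacement for a panel test.

```stata
* Hypothetical stand-in panel: 100 firms x 5 years with AR(1) errors
clear
set seed 2021
set obs 100
generate int Company_ID = _n
generate double u = rnormal(0, .03)
expand 5
bysort Company_ID: generate int Year = 2014 + _n
generate double Var1 = exp(4 + .5*rnormal())     // right-skewed, like pay
generate double eps = .
bysort Company_ID (Year): replace eps = rnormal(0, .015) if _n == 1
bysort Company_ID (Year): replace eps = .6*eps[_n-1] + rnormal(0, .012) if _n > 1
generate double ROA = .2 + .0002*Var1 + u + eps

* (1) Winsorize at the 5th/95th percentiles of the pooled sample
foreach v of varlist ROA Var1 {
    _pctile `v', percentiles(5 95)
    scalar p05 = r(r1)
    scalar p95 = r(r2)
    replace `v' = scalar(p05) if `v' < scalar(p05)
    replace `v' = scalar(p95) if `v' > scalar(p95) & !missing(`v')
}

* (2) Declare the panel so that lags never cross from one firm to the next
xtset Company_ID Year

* (3) Pooled OLS, then regress the residual on its own within-firm lag
*     (the first year of each firm has no lag and drops out)
regress ROA Var1 i.Year
predict double ehat, residuals
regress ehat L.ehat, vce(cluster Company_ID)
```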
Question 1: How do I account for both autocorrelation AND heteroscedasticity in pooled OLS? I understand that I can use -prais ..., corc- for the former and -regress ..., vce(robust)- for the latter, but I have not found a method that handles both at once.
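One common way to handle heteroscedasticity and within-firm autocorrelation together in pooled OLS is to keep the OLS coefficients and cluster the standard errors by firm. The sketch below is a minimal example on hypothetical simulated data. The persistent firm components u and v are assumptions, chosen so that the difference between robust and cluster-robust standard errors is visible.

```stata
* Hypothetical stand-in panel: 100 firms x 5 years
clear
set seed 2021
set obs 100
generate int Company_ID = _n
generate double u = rnormal(0, .03)      // persistent firm effect in the error
generate double v = rnormal()            // persistent firm component of Var1_c
expand 5
bysort Company_ID: generate int Year = 2014 + _n
generate double Var1_c = v + .5*rnormal()
generate double ROA = .2 + .005*Var1_c + u + .015*rnormal()

* Heteroskedasticity-robust only: treats all 500 firm-years as independent
regress ROA Var1_c i.Year, vce(robust)
estimates store ols_robust

* Cluster-robust by firm: allows heteroskedasticity AND arbitrary
* correlation among each firm's errors across years
regress ROA Var1_c i.Year, vce(cluster Company_ID)
estimates store ols_cluster

* Same coefficient, different standard errors
estimates table ols_robust ols_cluster, keep(Var1_c) b se
```

The coefficients are identical in both runs; only the standard errors change.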
See the result of my pooled OLS regression below:
```
. regress ROA_new c.Var1_c##ib6.IndustryRank Var2 Var3 Var4 Var5 i.Year, vce(robust)

Linear regression                               Number of obs     =        472
                                                F(28, 443)        =      34.84
                                                Prob > F          =     0.0000
                                                R-squared         =     0.4664
                                                Root MSE          =     .03384

-----------------------------------------------------------------------------------------
                        |               Robust
                ROA_new |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------+----------------------------------------------------------------
                 Var1_c |   .0004192   .0001773     2.37   0.018     .0000709    .0007676
                        |
           IndustryRank |
 Communication Services |   .0129088   .0072345     1.78   0.075    -.0013093     .027127
 Consumer Discretionary |   .0188844   .0071505     2.64   0.009     .0048314    .0329375
       Consumer Staples |   .0078738   .0148976     0.53   0.597    -.0214049    .0371525
             Financials |  -.0030436   .0073608    -0.41   0.679    -.0175101    .0114228
            Health Care |   .0181798   .0069513     2.62   0.009     .0045182    .0318415
 Information Technology |   .0339861   .0076653     4.43   0.000     .0189212     .049051
              Materials |   .0007409   .0056497     0.13   0.896    -.0103627    .0118444
            Real Estate |  -.0062964   .0064746    -0.97   0.331    -.0190212    .0064284
              Utilities |  -.0269126   .0061301    -4.39   0.000    -.0389604   -.0148648
                        |
  IndustryRank#c.Var1_c |
 Communication Services |   -.000053   .0002856    -0.19   0.853    -.0006143    .0005083
 Consumer Discretionary |   .0001062   .0002586     0.41   0.682    -.0004021    .0006145
       Consumer Staples |   .0004539   .0005145     0.88   0.378    -.0005572    .0014651
             Financials |  -.0002145   .0001982    -1.08   0.280    -.0006041    .0001751
            Health Care |   .0003999    .000239     1.67   0.095    -.0000699    .0008697
 Information Technology |   .0004263    .000286     1.49   0.137    -.0001358    .0009884
              Materials |   .0004491   .0002949     1.52   0.129    -.0001305    .0010288
            Real Estate |    .000397    .000238     1.67   0.096    -.0000708    .0008648
              Utilities |  -.0001756    .000237    -0.74   0.459    -.0006415    .0002902
                        |
                   Var2 |  -.0339362   .0150897    -2.25   0.025    -.0635926   -.0042799
                   Var3 |   .0479924   .0242645     1.98   0.049     .0003044    .0956803
                   Var4 |  -.0126286   .0016375    -7.71   0.000    -.0158468   -.0094103
                   Var5 |   .0003169   .0011124     0.28   0.776    -.0018693    .0025032
                        |
                   Year |
                   2015 |    .001048    .005721     0.18   0.855    -.0101957    .0122917
                   2016 |    .001855   .0052554     0.35   0.724    -.0084736    .0121835
                   2017 |   .0068407   .0051132     1.34   0.182    -.0032084    .0168898
                   2018 |   .0058702   .0051116     1.15   0.251    -.0041758    .0159161
                   2019 |   .0056374   .0054482     1.03   0.301    -.0050702     .016345
                        |
                  _cons |   .2496132   .0273422     9.13   0.000     .1958766    .3033498
-----------------------------------------------------------------------------------------
```

Question 2: Would you consider this an appropriate model? Am I missing something?

For my panel regressions, I used the same winsorized/transformed data.

Question 3: Is this considered normal practice, or would one use the original data, which are partly non-linear and not normally distributed?

I then ran panel regressions (-xtreg, fe- and -xtreg, re-) and tested for autocorrelation (-xtserial ..., output-, without the categorical variables and the interaction term, because -xtserial- does not accept factor variables) and for heteroscedasticity (-xttest3-). After confirming that both are present in my panel data, and after the Hausman test, I accounted for them by estimating -xtreg ..., re vce(cluster Company_ID)-.

Question 4: Is -xtreg ..., re vce(cluster Company_ID)- the correct way to account for both, or should I use FGLS (-xtgls ..., p(h) c(ar1)-) or panel-corrected standard errors (PCSE, -xtpcse ..., het c(ar1)-)?

Question 5: Would you consider this an appropriate approach? Am I missing something?

See the output below:
```
. xtset Company_ID Year
       panel variable:  Company_ID (unbalanced)
        time variable:  Year, 2014 to 2019, but with gaps
                delta:  1 unit

. xtreg ROA_new c.Var1_c##ib6.IndustryRank Var2 Var3 Var4 Var5 i.Year, re

Random-effects GLS regression                   Number of obs     =        472
Group variable: Company_ID                      Number of groups  =        106

R-sq:                                           Obs per group:
     within  = 0.2685                                         min =          1
     between = 0.3871                                         avg =        4.5
     overall = 0.4329                                         max =          6

                                                Wald chi2(28)     =     187.42
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

-----------------------------------------------------------------------------------------
                ROA_new |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------+----------------------------------------------------------------
                 Var1_c |     .00071   .0001607     4.42   0.000      .000395     .001025
                        |
           IndustryRank |
 Communication Services |   .0097942   .0136338     0.72   0.473    -.0169275     .036516
 Consumer Discretionary |   .0130758   .0124651     1.05   0.294    -.0113553    .0375069
       Consumer Staples |   .0087058   .0202608     0.43   0.667    -.0310046    .0484163
             Financials |  -.0081966   .0166936    -0.49   0.623    -.0409154    .0245223
            Health Care |   .0134478   .0135422     0.99   0.321    -.0130943    .0399899
 Information Technology |   .0324283   .0149377     2.17   0.030     .0031509    .0617057
              Materials |  -.0025767   .0128767    -0.20   0.841    -.0278146    .0226613
            Real Estate |  -.0099463   .0232338    -0.43   0.669    -.0554838    .0355912
              Utilities |  -.0306672   .0206443    -1.49   0.137    -.0711293     .009795
                        |
  IndustryRank#c.Var1_c |
 Communication Services |  -.0000123   .0003093    -0.04   0.968    -.0006185    .0005938
 Consumer Discretionary |  -.0001672   .0002001    -0.84   0.403    -.0005595     .000225
       Consumer Staples |  -.0006347   .0002995    -2.12   0.034    -.0012218   -.0000476
             Financials |   -.000613   .0003204    -1.91   0.056    -.0012409     .000015
            Health Care |  -.0004573   .0002939    -1.56   0.120    -.0010333    .0001188
 Information Technology |  -.0002168   .0003547    -0.61   0.541     -.000912    .0004783
              Materials |   .0002684   .0002352     1.14   0.254    -.0001924    .0007293
            Real Estate |   .0000822   .0006748     0.12   0.903    -.0012404    .0014048
              Utilities |   -.000399   .0003905    -1.02   0.307    -.0011645    .0003664
                        |
                   Var2 |  -.0373825   .0139206    -2.69   0.007    -.0646665   -.0100985
                   Var3 |   .0421203   .0217148     1.94   0.052      -.00044    .0846806
                   Var4 |  -.0106307   .0025466    -4.17   0.000    -.0156219   -.0056396
                   Var5 |  -.0007089   .0008282    -0.86   0.392    -.0023321    .0009143
                        |
                   Year |
                   2015 |   .0009376   .0024598     0.38   0.703    -.0038835    .0057588
                   2016 |   .0003204   .0024665     0.13   0.897    -.0045139    .0051546
                   2017 |   .0038309   .0024417     1.57   0.117    -.0009546    .0086165
                   2018 |   .0013017   .0025037     0.52   0.603    -.0036054    .0062088
                   2019 |  -.0014532   .0027073    -0.54   0.591    -.0067594    .0038531
                        |
                  _cons |   .2299991   .0409079     5.62   0.000      .149821    .3101771
------------------------+----------------------------------------------------------------
                sigma_u |   .0357301
                sigma_e |  .01435012
                    rho |  .86110172   (fraction of variance due to u_i)
-----------------------------------------------------------------------------------------

. est store re1

. xtreg ROA_new c.Var1_c##ib6.IndustryRank Var2 Var3 Var4 Var5 i.Year, fe
note: 1.IndustryRank omitted because of collinearity
note: 2.IndustryRank omitted because of collinearity
note: 3.IndustryRank omitted because of collinearity
note: 4.IndustryRank omitted because of collinearity
note: 5.IndustryRank omitted because of collinearity
note: 7.IndustryRank omitted because of collinearity
note: 8.IndustryRank omitted because of collinearity
note: 9.IndustryRank omitted because of collinearity
note: 10.IndustryRank omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =        472
Group variable: Company_ID                      Number of groups  =        106

R-sq:                                           Obs per group:
     within  = 0.2722                                         min =          1
     between = 0.2660                                         avg =        4.5
     overall = 0.3108                                         max =          6

                                                F(19,347)         =       6.83
corr(u_i, Xb)  = 0.0466                         Prob > F          =     0.0000

-----------------------------------------------------------------------------------------
                ROA_new |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------+----------------------------------------------------------------
                 Var1_c |   .0007539     .00017     4.43   0.000     .0004195    .0010883
                        |
           IndustryRank |
 Communication Services |          0  (omitted)
 Consumer Discretionary |          0  (omitted)
       Consumer Staples |          0  (omitted)
             Financials |          0  (omitted)
            Health Care |          0  (omitted)
 Information Technology |          0  (omitted)
              Materials |          0  (omitted)
            Real Estate |          0  (omitted)
              Utilities |          0  (omitted)
                        |
  IndustryRank#c.Var1_c |
 Communication Services |   .0002298   .0003752     0.61   0.541    -.0005083    .0009678
 Consumer Discretionary |  -.0002387   .0002111    -1.13   0.259    -.0006538    .0001765
       Consumer Staples |  -.0007517   .0003146    -2.39   0.017    -.0013706   -.0001329
             Financials |  -.0006464   .0003789    -1.71   0.089    -.0013916    .0000988
            Health Care |  -.0006811   .0003347    -2.03   0.043    -.0013393   -.0000228
 Information Technology |  -.0003157   .0004111    -0.77   0.443    -.0011243    .0004928
              Materials |   .0002191   .0002482     0.88   0.378    -.0002691    .0007072
            Real Estate |   .0000297   .0007433     0.04   0.968    -.0014321    .0014916
              Utilities |  -.0004243   .0004024    -1.05   0.292    -.0012158    .0003672
                        |
                   Var2 |   -.048388   .0168456    -2.87   0.004    -.0815203   -.0152556
                   Var3 |   .0406569   .0238885     1.70   0.090    -.0063276    .0876414
                   Var4 |  -.0106254   .0056596    -1.88   0.061    -.0217569     .000506
                   Var5 |   -.000814   .0008762    -0.93   0.354    -.0025374    .0009094
                        |
                   Year |
                   2015 |   .0008862   .0024982     0.35   0.723    -.0040273    .0057997
                   2016 |   .0002122   .0025832     0.08   0.935    -.0048685     .005293
                   2017 |   .0035253   .0025814     1.37   0.173    -.0015518    .0086025
                   2018 |   .0008915   .0028079     0.32   0.751    -.0046312    .0064143
                   2019 |  -.0017244   .0031846    -0.54   0.589    -.0079881    .0045392
                        |
                  _cons |   .2380737   .0926853     2.57   0.011      .055778    .4203694
------------------------+----------------------------------------------------------------
                sigma_u |  .03773385
                sigma_e |  .01435012
                    rho |  .87364723   (fraction of variance due to u_i)
-----------------------------------------------------------------------------------------
F test that all u_i=0: F(105, 347) = 22.42                   Prob > F = 0.0000

. est store fe1

. xttest3

Modified Wald test for groupwise heteroskedasticity
in fixed effect regression model

H0: sigma(i)^2 = sigma^2 for all i

chi2 (106)  =    5.2e+31
Prob>chi2 =      0.0000

. xtserial ROA_new c.Var1_c##ib6.IndustryRank Var2 Var3 Var4 Var5 i.Year, output
factor-variable and time-series operators not allowed
r(101);

. xtserial ROA_new Var1_c Var2 Var3 Var4 Var5, output

Linear regression                               Number of obs     =        364
                                                F(5, 95)          =      12.17
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2013
                                                Root MSE          =     .01611

                              (Std. Err. adjusted for 96 clusters in Company_ID)
------------------------------------------------------------------------------
             |               Robust
   D.ROA_new |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      Var1_c |
         D1. |   .0004306    .000123     3.50   0.001     .0001865    .0006747
             |
        Var2 |
         D1. |  -.0765561   .0178377    -4.29   0.000    -.1119683   -.0411439
             |
        Var3 |
         D1. |   .0629007   .0284957     2.21   0.030     .0063295    .1194718
             |
        Var4 |
         D1. |  -.0152056   .0053296    -2.85   0.005    -.0257861   -.0046251
             |
        Var5 |
         D1. |  -.0006843   .0006922    -0.99   0.325    -.0020586      .00069
------------------------------------------------------------------------------

Wooldridge test for autocorrelation in panel data
H0: no first-order autocorrelation
    F(  1,      90) =     11.909
           Prob > F =      0.0009

. hausman fe1 re1

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |      fe1          re1         Difference          S.E.
-------------+----------------------------------------------------------------
      Var1_c |    .0007539       .00071        .0000439        .0000555
IndustryRank#|
    c.Var1_c |
           1 |    .0002298    -.0000123        .0002421        .0002125
           2 |   -.0002387    -.0001672       -.0000715        .0000671
           3 |   -.0007517    -.0006347        -.000117        .0000963
           4 |   -.0006464     -.000613       -.0000334        .0002023
           5 |   -.0006811    -.0004573       -.0002238        .0001601
           7 |   -.0003157    -.0002168       -.0000989        .0002079
           8 |    .0002191     .0002684       -.0000494        .0000795
           9 |    .0000297     .0000822       -.0000525        .0003116
          10 |   -.0004243     -.000399       -.0000253        .0000971
        Var2 |    -.048388    -.0373825       -.0110055        .0094863
        Var3 |    .0406569     .0421203       -.0014635        .0099562
        Var4 |   -.0106254    -.0106307        5.29e-06        .0050543
        Var5 |    -.000814    -.0007089       -.0001051        .0002861
        Year |
        2015 |    .0008862     .0009376       -.0000515        .0004362
        2016 |    .0002122     .0003204       -.0001081        .0007677
        2017 |    .0035253     .0038309       -.0003056        .0008378
        2018 |    .0008915     .0013017       -.0004102        .0012713
        2019 |   -.0017244    -.0014532       -.0002713         .001677
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(19) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                           =       12.71
                Prob>chi2 =      0.8533

. xtreg ROA_new c.Var1_c##ib6.IndustryRank Var2 Var3 Var4 Var5 i.Year, re vce(cluster Company_ID)

Random-effects GLS regression                   Number of obs     =        472
Group variable: Company_ID                      Number of groups  =        106

R-sq:                                           Obs per group:
     within  = 0.2685                                         min =          1
     between = 0.3871                                         avg =        4.5
     overall = 0.4329                                         max =          6

                                                Wald chi2(28)     =     313.61
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                         (Std. Err. adjusted for 106 clusters in Company_ID)
-----------------------------------------------------------------------------------------
                        |               Robust
                ROA_new |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------+----------------------------------------------------------------
                 Var1_c |     .00071   .0002914     2.44   0.015     .0001389     .001281
                        |
           IndustryRank |
 Communication Services |   .0097942   .0152693     0.64   0.521     -.020133    .0397215
 Consumer Discretionary |   .0130758   .0124774     1.05   0.295    -.0113794    .0375311
       Consumer Staples |   .0087058   .0248933     0.35   0.727    -.0400842    .0574959
             Financials |  -.0081966   .0115438    -0.71   0.478     -.030822    .0144289
            Health Care |   .0134478   .0150852     0.89   0.373    -.0161187    .0430143
 Information Technology |   .0324283   .0149935     2.16   0.031     .0030415     .061815
              Materials |  -.0025767   .0120635    -0.21   0.831    -.0262206    .0210673
            Real Estate |  -.0099463   .0086922    -1.14   0.253    -.0269827    .0070901
              Utilities |  -.0306672    .012636    -2.43   0.015    -.0554333    -.005901
                        |
  IndustryRank#c.Var1_c |
 Communication Services |  -.0000123   .0005877    -0.02   0.983    -.0011642    .0011395
 Consumer Discretionary |  -.0001672   .0004009    -0.42   0.677    -.0009529    .0006185
       Consumer Staples |  -.0006347   .0003247    -1.95   0.051    -.0012711    1.69e-06
             Financials |   -.000613   .0003028    -2.02   0.043    -.0012065   -.0000194
            Health Care |  -.0004573   .0003238    -1.41   0.158    -.0010919    .0001773
 Information Technology |  -.0002168   .0003364    -0.64   0.519    -.0008762    .0004425
              Materials |   .0002684   .0004278     0.63   0.530      -.00057    .0011069
            Real Estate |   .0000822   .0003729     0.22   0.826    -.0006487    .0008131
              Utilities |   -.000399   .0003621    -1.10   0.270    -.0011087    .0003107
                        |
                   Var2 |  -.0373825   .0152058    -2.46   0.014    -.0671854   -.0075796
                   Var3 |   .0421203   .0229078     1.84   0.066    -.0027781    .0870188
                   Var4 |  -.0106307   .0026749    -3.97   0.000    -.0158735    -.005388
                   Var5 |  -.0007089   .0009975    -0.71   0.477    -.0026639    .0012461
                        |
                   Year |
                   2015 |   .0009376   .0021478     0.44   0.662     -.003272    .0051473
                   2016 |   .0003204   .0021163     0.15   0.880    -.0038274    .0044682
                   2017 |   .0038309   .0028211     1.36   0.174    -.0016983    .0093602
                   2018 |   .0013017   .0031098     0.42   0.676    -.0047934    .0073969
                   2019 |  -.0014532   .0034477    -0.42   0.673    -.0082105    .0053041
                        |
                  _cons |   .2299991   .0442903     5.19   0.000     .1431917    .3168064
------------------------+----------------------------------------------------------------
                sigma_u |   .0357301
                sigma_e |  .01435012
                    rho |  .86110172   (fraction of variance due to u_i)
-----------------------------------------------------------------------------------------
```
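The three options named in Question 4 can be run side by side on the same data. This shows how much the Var1_c estimate and its standard error move between them. The sketch below is a minimal example on a hypothetical balanced panel. The firm-specific error variances and the AR(1) coefficient of .5 are assumptions, and the panel is balanced only to keep the example simple. Note that -xtgls- and -xtpcse- as written here do not include a firm random effect u_i, whereas -xtreg, re- does.

```stata
* Hypothetical balanced stand-in panel: 100 firms x 5 years,
* AR(1) errors whose variance differs by firm
clear
set seed 2021
set obs 100
generate int Company_ID = _n
generate double u = rnormal(0, .03)            // firm effect
generate double sd = .005 + .02*runiform()     // firm-specific error SD
expand 5
bysort Company_ID: generate int Year = 2014 + _n
generate double Var1_c = rnormal()
generate double eps = .
bysort Company_ID (Year): replace eps = rnormal(0, sd) if _n == 1
bysort Company_ID (Year): replace eps = .5*eps[_n-1] + rnormal(0, sd) if _n > 1
generate double ROA = .2 + .005*Var1_c + u + eps
xtset Company_ID Year

* (a) Random effects with firm-clustered standard errors
xtreg ROA Var1_c i.Year, re vce(cluster Company_ID)
estimates store re_cluster

* (b) FGLS with panel-level heteroskedasticity and a common AR(1)
xtgls ROA Var1_c i.Year, panels(heteroskedastic) corr(ar1)
estimates store fgls

* (c) Panel-corrected SEs, heteroskedastic panels only, common AR(1)
xtpcse ROA Var1_c i.Year, hetonly corr(ar1)
estimates store pcse

estimates table re_cluster fgls pcse, keep(Var1_c) b se
```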
Since Var1_c has a similar significance level and the same sign in pooled OLS and random effects, I would conclude that the pooled OLS results are reasonable and accept or reject my hypothesis on that basis.
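The Var1_c comparison across estimators is easier to read in a single table. The sketch below, again on hypothetical simulated data, stores pooled OLS, RE, and FE fits with firm-clustered standard errors and tabulates Var1_c. Var1_c is deliberately correlated with the firm effect here (an assumption), so the estimators can disagree.

The sketch also shows a cluster-robust Mundlak (correlated random effects) check. This is a commonly used alternative to -hausman- when standard errors are clustered, because -hausman- as run above compares the default, non-robust variance estimates. In the real model, the firm means of all time-varying regressors, including the Var1_c-by-industry interactions, would be added. The single firm mean used here is a simplification.

```stata
* Hypothetical stand-in panel: 100 firms x 5 years; Var1_c is
* correlated with the firm effect u
clear
set seed 2021
set obs 100
generate int Company_ID = _n
generate double u = rnormal(0, .03)
expand 5
bysort Company_ID: generate int Year = 2014 + _n
generate double Var1_c = 10*u + rnormal()
generate double ROA = .2 + .005*Var1_c + u + .015*rnormal()
xtset Company_ID Year

* Same regressors, three estimators, firm-clustered SEs throughout
regress ROA Var1_c i.Year, vce(cluster Company_ID)
estimates store pooled
xtreg ROA Var1_c i.Year, re vce(cluster Company_ID)
estimates store re_cl
xtreg ROA Var1_c i.Year, fe vce(cluster Company_ID)
estimates store fe_cl
estimates table pooled re_cl fe_cl, keep(Var1_c) b se p

* Mundlak check: add the firm mean of the time-varying regressor to the
* RE model; a significant coefficient on it speaks against plain RE
bysort Company_ID: egen double Var1_c_bar = mean(Var1_c)
xtreg ROA Var1_c Var1_c_bar i.Year, re vce(cluster Company_ID)
test Var1_c_bar
```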
Question 6: Would this be a correct way to do this?
Thank you very much for bearing with me so long. I am looking forward to your answers.
Best regards,
Pietro