Industry/Year Fixed Effects (Panel Data)

Wojciech Gulkowski

Join Date: Sep 2020

Posts: 22
#1

17 Sep 2020, 09:34

Hi,

Let me first say that I'm a Stata beginner and would appreciate your help. I have read many threads about the topic already, but I'm still unsure how to apply the mentioned effects specifically to my data set. I'm using Stata 16.1.

I'm investigating the relationship between cash holdings (dependent variable) and litigation risk (independent variable), using another 14 independent variables as controls. My sample consists of S&P 500 companies (excluding financials and utilities, a total of 351 firms) over the period 2010-2019. I've run some tests on my data: the Hausman test (indicated I should use fixed effects), the Breusch-Pagan test (indicated heteroscedasticity) and the Wooldridge test (indicated autocorrelation). Taking these tests into account, I'm using fixed effects and clustered standard errors.

In line with what I read on this forum, I used the i.industry i.year specification, but the problem is that industry is a string variable. Since industry is based on the SIC number (also included in my data), I decided to use the i.sic i.year specification instead. However, as you can see, all of my SIC codes were omitted due to collinearity. Also, I don't understand why i.year starts at 2011 and ends at 2018 rather than running from 2010 to 2019. In any case, I guess that my attempt to incorporate both industry and year fixed effects does not work well and needs some correction. I'm pasting my output from Stata below (I'm sorry for pasting it like this, but there seems to be some limit and dataex didn't work for me).

. xtset id year
panel variable: id (strongly balanced)
time variable: year, 2010 to 2019
delta: 1 unit

.
. xtreg ln_cash lit_risk size mtb lev nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div, fe vce(cluster id)

Fixed-effects (within) regression               Number of obs     =      3,083
Group variable: id                              Number of groups  =        351

R-sq:                                           Obs per group:
     within  = 0.2541                                         min =          1
     between = 0.5638                                         avg =        8.8
     overall = 0.5253                                         max =          9

                                                F(15,350)         =      30.05
corr(u_i, Xb)  = 0.2648                         Prob > F          =     0.0000

                                     (Std. Err. adjusted for 351 clusters in id)
--------------------------------------------------------------------------------
               |               Robust
       ln_cash |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
      lit_risk |   .0010818   .0005395     2.01   0.046     .0000208    .0021429
          size |   .6542184   .0506476    12.92   0.000     .5546065    .7538304
           mtb |   .0520193   .0123956     4.20   0.000     .0276401    .0763986
           lev |  -.8306861   .2475409    -3.36   0.001    -1.317541   -.3438313
           nwc |  -1.271489   .2906379    -4.37   0.000    -1.843105   -.6998721
            rd |  -.2963804   .1064015    -2.79   0.006    -.5056472   -.0871136
        growth |  -.2076923   .1445206    -1.44   0.152    -.4919303    .0765457
            cf |  -.1704632   .2847327    -0.60   0.550    -.7304655    .3895392
     cf_vol_5y |   1.829619   .7456975     2.45   0.015     .3630073    3.296231
industry_sigma |   4.291135   2.156647     1.99   0.047     .0495159    8.532753
           acq |  -2.505081   .2492455   -10.05   0.000    -2.995289   -2.014874
         capex |  -3.471404   .8872684    -3.91   0.000    -5.216452   -1.726355
           ndi |   1.698586    .275659     6.16   0.000      1.15643    2.240743
           nei |   .9345293   .2447038     3.82   0.000     .4532545    1.415804
           div |  -.0070305   .0713867    -0.10   0.922    -.1474314    .1333704
         _cons |   1.013231   .4810271     2.11   0.036     .0671633    1.959298
---------------+----------------------------------------------------------------
       sigma_u |  1.0028185
       sigma_e |  .48667142
           rho |  .80937608   (fraction of variance due to u_i)
--------------------------------------------------------------------------------

. xtreg ln_cash lit_risk size mtb lev nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.industry i.year, fe vce(cluster id)
industry: string variables may not be used as factor variables
r(109);

. xtreg ln_cash lit_risk size mtb lev nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.sic i.year, fe vce(cluster id)
note: 1311.sic omitted because of collinearity
note: 1381.sic omitted because of collinearity
note: 1389.sic omitted because of collinearity
note: 1400.sic omitted because of collinearity
note: 1531.sic omitted because of collinearity
note: 1600.sic omitted because of collinearity
note: 1731.sic omitted because of collinearity
note: 2000.sic omitted because of collinearity
note: 2011.sic omitted because of collinearity
note: 2030.sic omitted because of collinearity
note: 2033.sic omitted because of collinearity
note: 2040.sic omitted because of collinearity
note: 2052.sic omitted because of collinearity
note: 2060.sic omitted because of collinearity
note: 2070.sic omitted because of collinearity
note: 2080.sic omitted because of collinearity
note: 2082.sic omitted because of collinearity
note: 2085.sic omitted because of collinearity
note: 2086.sic omitted because of collinearity
note: 2090.sic omitted because of collinearity
note: 2111.sic omitted because of collinearity
note: 2273.sic omitted because of collinearity
note: 2300.sic omitted because of collinearity
note: 2320.sic omitted because of collinearity
note: 2400.sic omitted because of collinearity
note: 2430.sic omitted because of collinearity
note: 2510.sic omitted because of collinearity
note: 2621.sic omitted because of collinearity
note: 2631.sic omitted because of collinearity
note: 2650.sic omitted because of collinearity
note: 2670.sic omitted because of collinearity
note: 2810.sic omitted because of collinearity
note: 2820.sic omitted because of collinearity
note: 2821.sic omitted because of collinearity
note: 2834.sic omitted because of collinearity
note: 2835.sic omitted because of collinearity
note: 2836.sic omitted because of collinearity
note: 2840.sic omitted because of collinearity
note: 2842.sic omitted because of collinearity
note: 2844.sic omitted because of collinearity
note: 2851.sic omitted because of collinearity
note: 2860.sic omitted because of collinearity
note: 2870.sic omitted because of collinearity
note: 2911.sic omitted because of collinearity
note: 3011.sic omitted because of collinearity
note: 3021.sic omitted because of collinearity
note: 3081.sic omitted because of collinearity
note: 3100.sic omitted because of collinearity
note: 3312.sic omitted because of collinearity
note: 3411.sic omitted because of collinearity
note: 3420.sic omitted because of collinearity
note: 3430.sic omitted because of collinearity
note: 3490.sic omitted because of collinearity
note: 3510.sic omitted because of collinearity
note: 3511.sic omitted because of collinearity
note: 3523.sic omitted because of collinearity
note: 3531.sic omitted because of collinearity
note: 3533.sic omitted because of collinearity
note: 3540.sic omitted because of collinearity
note: 3559.sic omitted because of collinearity
note: 3560.sic omitted because of collinearity
note: 3561.sic omitted because of collinearity
note: 3570.sic omitted because of collinearity
note: 3572.sic omitted because of collinearity
note: 3576.sic omitted because of collinearity
note: 3577.sic omitted because of collinearity
note: 3585.sic omitted because of collinearity
note: 3620.sic omitted because of collinearity
note: 3630.sic omitted because of collinearity
note: 3663.sic omitted because of collinearity
note: 3674.sic omitted because of collinearity
note: 3678.sic omitted because of collinearity
note: 3679.sic omitted because of collinearity
note: 3711.sic omitted because of collinearity
note: 3714.sic omitted because of collinearity
note: 3721.sic omitted because of collinearity
note: 3724.sic omitted because of collinearity
note: 3728.sic omitted because of collinearity
note: 3730.sic omitted because of collinearity
note: 3751.sic omitted because of collinearity
note: 3760.sic omitted because of collinearity
note: 3812.sic omitted because of collinearity
note: 3823.sic omitted because of collinearity
note: 3825.sic omitted because of collinearity
note: 3826.sic omitted because of collinearity
note: 3827.sic omitted because of collinearity
note: 3841.sic omitted because of collinearity
note: 3842.sic omitted because of collinearity
note: 3843.sic omitted because of collinearity
note: 3844.sic omitted because of collinearity
note: 3845.sic omitted because of collinearity
note: 3851.sic omitted because of collinearity
note: 3942.sic omitted because of collinearity
note: 3944.sic omitted because of collinearity
note: 3990.sic omitted because of collinearity
note: 4011.sic omitted because of collinearity
note: 4210.sic omitted because of collinearity
note: 4213.sic omitted because of collinearity
note: 4225.sic omitted because of collinearity
note: 4400.sic omitted because of collinearity
note: 4512.sic omitted because of collinearity
note: 4513.sic omitted because of collinearity
note: 4700.sic omitted because of collinearity
note: 4731.sic omitted because of collinearity
note: 4812.sic omitted because of collinearity
note: 4813.sic omitted because of collinearity
note: 4841.sic omitted because of collinearity
note: 4888.sic omitted because of collinearity
note: 4899.sic omitted because of collinearity
note: 5000.sic omitted because of collinearity
note: 5010.sic omitted because of collinearity
note: 5013.sic omitted because of collinearity
note: 5047.sic omitted because of collinearity
note: 5122.sic omitted because of collinearity
note: 5140.sic omitted because of collinearity
note: 5200.sic omitted because of collinearity
note: 5211.sic omitted because of collinearity
note: 5311.sic omitted because of collinearity
note: 5331.sic omitted because of collinearity
note: 5399.sic omitted because of collinearity
note: 5411.sic omitted because of collinearity
note: 5500.sic omitted because of collinearity
note: 5531.sic omitted because of collinearity
note: 5600.sic omitted because of collinearity
note: 5651.sic omitted because of collinearity
note: 5661.sic omitted because of collinearity
note: 5731.sic omitted because of collinearity
note: 5812.sic omitted because of collinearity
note: 5912.sic omitted because of collinearity
note: 5944.sic omitted because of collinearity
note: 5961.sic omitted because of collinearity
note: 5990.sic omitted because of collinearity
note: 7011.sic omitted because of collinearity
note: 7200.sic omitted because of collinearity
note: 7311.sic omitted because of collinearity
note: 7323.sic omitted because of collinearity
note: 7340.sic omitted because of collinearity
note: 7350.sic omitted because of collinearity
note: 7363.sic omitted because of collinearity
note: 7370.sic omitted because of collinearity
note: 7372.sic omitted because of collinearity
note: 7373.sic omitted because of collinearity
note: 7374.sic omitted because of collinearity
note: 7389.sic omitted because of collinearity
note: 7841.sic omitted because of collinearity
note: 7990.sic omitted because of collinearity
note: 8062.sic omitted because of collinearity
note: 8071.sic omitted because of collinearity
note: 8090.sic omitted because of collinearity
note: 8700.sic omitted because of collinearity
note: 8721.sic omitted because of collinearity
note: 8731.sic omitted because of collinearity
note: 8742.sic omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =      3,083
Group variable: id                              Number of groups  =        351

R-sq:                                           Obs per group:
     within  = 0.2696                                         min =          1
     between = 0.5484                                         avg =        8.8
     overall = 0.5164                                         max =          9

                                                F(23,350)         =      23.08
corr(u_i, Xb)  = 0.1569                         Prob > F          =     0.0000

                                     (Std. Err. adjusted for 351 clusters in id)
--------------------------------------------------------------------------------
               |               Robust
       ln_cash |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
      lit_risk |    .001018   .0005239     1.94   0.053    -.0000123    .0020484
          size |   .7397585   .0579766    12.76   0.000     .6257322    .8537848
           mtb |   .0549462   .0131649     4.17   0.000     .0290541    .0808384
           lev |  -.6847657   .2475307    -2.77   0.006      -1.1716    -.197931
           nwc |  -1.357777   .2877874    -4.72   0.000    -1.923787   -.7917672
            rd |  -.2639219   .1092408    -2.42   0.016    -.4787728   -.0490709
        growth |  -.2644579   .1528391    -1.73   0.084    -.5650565    .0361408
            cf |  -.0113674    .286946    -0.04   0.968    -.5757228    .5529879
     cf_vol_5y |   1.987609   .7245799     2.74   0.006     .5625311    3.412688
industry_sigma |   6.153141   2.241524     2.75   0.006     1.744591    10.56169
           acq |  -2.433709   .2453183    -9.92   0.000    -2.916193   -1.951226
         capex |  -3.205605   .8809773    -3.64   0.000     -4.93828    -1.47293
           ndi |   1.518168   .2765364     5.49   0.000     .9742856     2.06205
           nei |   .7939962   .2483272     3.20   0.002     .3055949    1.282398
           div |   .0180821   .0712251     0.25   0.800     -.122001    .1581652
               |
           sic |
1311 | 0 (omitted)
1381 | 0 (omitted)
1389 | 0 (omitted)
1400 | 0 (omitted)
1531 | 0 (omitted)
1600 | 0 (omitted)
1731 | 0 (omitted)
2000 | 0 (omitted)
2011 | 0 (omitted)
2030 | 0 (omitted)
2033 | 0 (omitted)
2040 | 0 (omitted)
2052 | 0 (omitted)
2060 | 0 (omitted)
2070 | 0 (omitted)
2080 | 0 (omitted)
2082 | 0 (omitted)
2085 | 0 (omitted)
2086 | 0 (omitted)
2090 | 0 (omitted)
2111 | 0 (omitted)
2273 | 0 (omitted)
2300 | 0 (omitted)
2320 | 0 (omitted)
2400 | 0 (omitted)
2430 | 0 (omitted)
2510 | 0 (omitted)
2621 | 0 (omitted)
2631 | 0 (omitted)
2650 | 0 (omitted)
2670 | 0 (omitted)
2810 | 0 (omitted)
2820 | 0 (omitted)
2821 | 0 (omitted)
2834 | 0 (omitted)
2835 | 0 (omitted)
2836 | 0 (omitted)
2840 | 0 (omitted)
2842 | 0 (omitted)
2844 | 0 (omitted)
2851 | 0 (omitted)
2860 | 0 (omitted)
2870 | 0 (omitted)
2911 | 0 (omitted)
3011 | 0 (omitted)
3021 | 0 (omitted)
3081 | 0 (omitted)
3100 | 0 (omitted)
3312 | 0 (omitted)
3411 | 0 (omitted)
3420 | 0 (omitted)
3430 | 0 (omitted)
3490 | 0 (omitted)
3510 | 0 (omitted)
3511 | 0 (omitted)
3523 | 0 (omitted)
3531 | 0 (omitted)
3533 | 0 (omitted)
3540 | 0 (omitted)
3559 | 0 (omitted)
3560 | 0 (omitted)
3561 | 0 (omitted)
3570 | 0 (omitted)
3572 | 0 (omitted)
3576 | 0 (omitted)
3577 | 0 (omitted)
3585 | 0 (omitted)
3620 | 0 (omitted)
3630 | 0 (omitted)
3663 | 0 (omitted)
3674 | 0 (omitted)
3678 | 0 (omitted)
3679 | 0 (omitted)
3711 | 0 (omitted)
3714 | 0 (omitted)
3721 | 0 (omitted)
3724 | 0 (omitted)
3728 | 0 (omitted)
3730 | 0 (omitted)
3751 | 0 (omitted)
3760 | 0 (omitted)
3812 | 0 (omitted)
3823 | 0 (omitted)
3825 | 0 (omitted)
3826 | 0 (omitted)
3827 | 0 (omitted)
3841 | 0 (omitted)
3842 | 0 (omitted)
3843 | 0 (omitted)
3844 | 0 (omitted)
3845 | 0 (omitted)
3851 | 0 (omitted)
3942 | 0 (omitted)
3944 | 0 (omitted)
3990 | 0 (omitted)
4011 | 0 (omitted)
4210 | 0 (omitted)
4213 | 0 (omitted)
4225 | 0 (omitted)
4400 | 0 (omitted)
4512 | 0 (omitted)
4513 | 0 (omitted)
4700 | 0 (omitted)
4731 | 0 (omitted)
4812 | 0 (omitted)
4813 | 0 (omitted)
4841 | 0 (omitted)
4888 | 0 (omitted)
4899 | 0 (omitted)
5000 | 0 (omitted)
5010 | 0 (omitted)
5013 | 0 (omitted)
5047 | 0 (omitted)
5122 | 0 (omitted)
5140 | 0 (omitted)
5200 | 0 (omitted)
5211 | 0 (omitted)
5311 | 0 (omitted)
5331 | 0 (omitted)
5399 | 0 (omitted)
5411 | 0 (omitted)
5500 | 0 (omitted)
5531 | 0 (omitted)
5600 | 0 (omitted)
5651 | 0 (omitted)
5661 | 0 (omitted)
5731 | 0 (omitted)
5812 | 0 (omitted)
5912 | 0 (omitted)
5944 | 0 (omitted)
5961 | 0 (omitted)
5990 | 0 (omitted)
7011 | 0 (omitted)
7200 | 0 (omitted)
7311 | 0 (omitted)
7323 | 0 (omitted)
7340 | 0 (omitted)
7350 | 0 (omitted)
7363 | 0 (omitted)
7370 | 0 (omitted)
7372 | 0 (omitted)
7373 | 0 (omitted)
7374 | 0 (omitted)
7389 | 0 (omitted)
7841 | 0 (omitted)
7990 | 0 (omitted)
8062 | 0 (omitted)
8071 | 0 (omitted)
8090 | 0 (omitted)
8700 | 0 (omitted)
8721 | 0 (omitted)
8731 | 0 (omitted)
8742 | 0 (omitted)
               |
          year |
         2011  |  -.0442112   .0304748    -1.45   0.148     -.104148    .0157256
         2012  |   .0400003   .0355404     1.13   0.261    -.0298994       .1099
         2013  |   .0454666   .0405875     1.12   0.263    -.0343595    .1252927
         2014  |   .0194234   .0475414     0.41   0.683    -.0740794    .1129263
         2015  |  -.0739048   .0506218    -1.46   0.145    -.1734661    .0256564
         2016  |  -.0041762    .051068    -0.08   0.935     -.104615    .0962627
         2017  |  -.0587983   .0542472    -1.08   0.279    -.1654898    .0478933
         2018  |  -.2163473   .0594791    -3.64   0.000    -.3333286    -.099366
               |
         _cons |   .0836588   .5586551     0.15   0.881    -1.015084    1.182402
---------------+----------------------------------------------------------------
       sigma_u |   .9933031
       sigma_e |  .48230923
           rho |  .80921242   (fraction of variance due to u_i)
--------------------------------------------------------------------------------

. dataex
input statement exceeds linesize limit. Try specifying fewer variables
r(1000);

. help dataex

.

I would appreciate help on how best to handle this problem. Do I understand correctly that if I clustered standard errors at the industry level, that would mean I'm including industry fixed effects? And since year is my time variable in xtset, am I also including year fixed effects? However, I'm following the paper by Malm and Kanuri (2016), where they included both industry and year fixed effects but clustered standard errors at the firm level (the id variable in my case). Therefore I would prefer to use this specification: cluster at the firm level, and then add i.year i.industry to my xtreg command.

(I would also appreciate help on how to use dataex. Even when I used only the output for xtset and xtreg, the first two commands, dataex said they exceed the limit.)

Thank you.

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17712

17 Sep 2020, 11:12

Wojcieck:
welcome to this forum.
Some comments about your post:
1) if a given variable is perfectly collinear with the -panelid-, it will be omitted;
2) I'm not clear how you were able to use the Breusch-Pagan test for heteroskedasticity (-estat hettest-), as it is not allowed after -xtreg-:

Code:

. use "https://www.stata-press.com/data/r16/nlswork.dta"
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. xtreg ln_wage c.age##c.age, fe vce(cluster idcode)

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1087                                         min =          1
     between = 0.1006                                         avg =        6.1
     overall = 0.0865                                         max =         15

                                                F(2,4709)         =     507.42
corr(u_i, Xb)  = 0.0440                         Prob > F          =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0539076    .004307    12.52   0.000     .0454638    .0623515
             |
 c.age#c.age |  -.0005973    .000072    -8.30   0.000    -.0007384   -.0004562
             |
       _cons |    .639913   .0624195    10.25   0.000     .5175415    .7622845
-------------+----------------------------------------------------------------
     sigma_u |   .4039153
     sigma_e |  .30245467
         rho |  .64073314   (fraction of variance due to u_i)
------------------------------------------------------------------------------


. estat hettest
estat hettest not valid
r(321);

.

... and the same holds for the -hausman- test with non-default standard errors:

Code:

. quietly xtreg ln_wage c.age##c.age, fe vce(cluster idcode)

. estimates store fe

. xtreg ln_wage c.age##c.age, re vce(cluster idcode)

Random-effects GLS regression                   Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1087                                         min =          1
     between = 0.1015                                         avg =        6.1
     overall = 0.0870                                         max =         15

                                                Wald chi2(2)      =    1258.33
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0590339   .0041049    14.38   0.000     .0509884    .0670795
             |
 c.age#c.age |  -.0006758   .0000688    -9.83   0.000    -.0008107    -.000541
             |
       _cons |   .5479714   .0587198     9.33   0.000     .4328826    .6630601
-------------+----------------------------------------------------------------
     sigma_u |   .3654049
     sigma_e |  .30245467
         rho |  .59342665   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. estimates store re

. hausman fe re
hausman cannot be used with vce(robust), vce(cluster cvar), or p-weighted data
r(198);

.

That said, you can cluster your standard errors on -panelid- and add -i.industry- and -i.time- among your set of predictors.
As far as -i.year- is concerned, one of the years is omitted by default as the reference category, and another one might be omitted due to collinearity.
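
A minimal sketch of how the year dummies can be inspected, run on Stata's -nlswork- example data rather than on the data in this thread. The idea is that -tabulate year if e(sample)- lists the years that actually enter the estimation sample: the earliest of them serves as the base category, and a year with no usable observations (for instance because a regressor is missing there) gets no dummy at all. Whether that is what happened in the original output is an assumption; a maximum of 9 observations per firm in a 10-year panel is at least consistent with it.

```stata
* Sketch on Stata's nlswork example data (not the poster's data set)
webuse nlswork, clear
xtset idcode year

* Panel fixed effects plus year dummies, clustered on the panel id
xtreg ln_wage c.age##c.age i.year, fe vce(cluster idcode)

* Years that actually contribute observations to the estimation sample;
* a year absent from this table cannot get a dummy
tabulate year if e(sample)

* Joint test of the year dummies
testparm i.year
```

Running -tabulate- with -if e(sample)- right after the regression is the quickest way to see whether a missing year dummy reflects the base category, missing data, or collinearity.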

Kind regards,
Carlo
(Stata 19.0)

Wojciech Gulkowski

Join Date: Sep 2020

Posts: 22
#3

18 Sep 2020, 05:56

Dear Carlo,

Thank you for your reply.

1) Ok, understood.

2) As for Breusch-Pagan, I ran it after the re regression with standard errors.

Here is the output:

xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        ln_cash[id,t] = Xb + u[id] + e[id,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                 ln_cash |   2.413241       1.553461
                       e |   .2368491       .4866714
                       u |   .6351271       .7969486

        Test:   Var(u) = 0
                             chibar2(01) =  5791.15
                          Prob > chibar2 =   0.0000

From this thread: https://www.statalist.org/forums/for...chibar2-1-0000
and more specifically, from Jeff's comment: "The Breusch-Pagan statistic tests for the presence of positive serial correlation in the composite error term. That's all it does. "
I understood that xttest0 tests for heteroskedasticity. If I'm wrong please correct me.
I know that you wrote here:
https://www.statalist.org/forums/for...nel-data-model
that estat hettest doesn't work for panel data, which is why I used xttest0, thinking of it as a substitute. I might have mixed it all up. In any case, you also suggested an "eye inspection": comparing the default standard errors with the clustered ones and judging by their difference. What size of difference would make clustered errors the better choice? As a person without much econometrics experience, I find it rather hard to say.

3) As for the Hausman test, I ran it on stored estimates from regressions with default standard errors. That was before I figured out that I need to use robust standard errors. I assumed that my results from the Hausman test would still be valid with clustered errors. Do you think I should run another test (since, as you pointed out, Hausman is not viable) to decide between fe and re regressions with clustered SE?

4) Regarding this: "That said, you can cluster your standard errors on -panelid- and add -i.industry- and -i.time- among your set of predictors." Yes, I tried to do that. As I mentioned in my post, the issue is that industry in my data is a string variable, and when I tried to add such a specification I got:

. xtreg ln_cash lit_risk size mtb lev nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.industry i.year, fe vce(cluster id)
industry: string variables may not be used as factor variables
r(109);

Unless you meant something different than this?

Thank you again for your help.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#4

18 Sep 2020, 07:26

Wojciech (sorry for misspelling your given name in my previous reply):
1) -xttest0- tests for panel-wise effects in the -re- model, not for heteroskedasticity (actually, there are two different Breusch-Pagan tests);
2) the number of clusters should be large enough (no hard and fast rules, though) for clustered standard errors to work properly;
3) you should impose non-default standard errors before comparing the -fe- vs -re- specification (via the community-contributed command -xtoverid-);
4) see -help encode- about how to convert -string- variables to numeric format (-decode- goes the other way, from labeled numeric to string).
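
For turning a string variable into a labeled numeric one, -encode- is the relevant command (-decode- does the reverse). A minimal sketch on made-up toy data; the industry names and the new variable name -industry_numeric- are only illustrative choices:

```stata
* Toy data: 'industry' is a string variable (values are made up)
clear
input str20 industry
"Mining"
"Manufacturing"
"Services"
"Manufacturing"
end

* encode builds a labeled numeric variable from the string
encode industry, generate(industry_numeric)

* The new variable is numeric with value labels, so i.industry_numeric is allowed
describe industry_numeric
label list industry_numeric
tabulate industry_numeric
```

The numeric codes are assigned in alphabetical order of the strings, and the original text is kept as the value label, so regression output still shows the industry names.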

Kind regards,
Carlo
(Stata 19.0)

Wojciech Gulkowski

Join Date: Sep 2020

Posts: 22
#5

18 Sep 2020, 10:53

Hi,

Thank you for your reply. I have some follow-up questions.

1) Since I cannot run -estat hettest- on my panel data, how should I best check whether there is heteroskedasticity in my data? I guess by comparing the default and clustered SE? But again, I'm not sure by what criteria I should judge such a comparison.

2) I have 350 clusters, so I hope that would be enough for them to work properly.

3) I did as you suggested and used -xtoverid- command. Indeed it confirmed again that I should go with fixed effects regression if I'm correct, here is the output:

. xtreg ln_cash lit_risk size mtb lev nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div, re vce(cluster id)

Random-effects GLS regression                   Number of obs     =      3,083
Group variable: id                              Number of groups  =        351

R-sq:                                           Obs per group:
     within  = 0.2498                                         min =          1
     between = 0.6114                                         avg =        8.8
     overall = 0.5644                                         max =          9

                                                Wald chi2(15)     =     708.75
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                                     (Std. Err. adjusted for 351 clusters in id)
--------------------------------------------------------------------------------
               |               Robust
       ln_cash |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
      lit_risk |   .0014052   .0007906     1.78   0.076    -.0001444    .0029549
          size |   .7320445   .0362889    20.17   0.000     .6609195    .8031694
           mtb |     .07182   .0122373     5.87   0.000     .0478354    .0958046
           lev |   -1.09048   .2420424    -4.51   0.000    -1.564874   -.6160855
           nwc |  -1.383627   .2483779    -5.57   0.000    -1.870438   -.8968148
            rd |    .147625   .2203295     0.67   0.503    -.2842129    .5794629
        growth |  -.2238729   .1508359    -1.48   0.138    -.5195059    .0717601
            cf |    .046749   .3007057     0.16   0.876    -.5426233    .6361213
     cf_vol_5y |    2.61963   .6215614     4.21   0.000     1.401392    3.837868
industry_sigma |   2.959833   1.814173     1.63   0.103    -.5958803    6.515547
           acq |  -2.585341   .2542051   -10.17   0.000    -3.083574   -2.087108
         capex |  -4.589984   .7118333    -6.45   0.000    -5.985152   -3.194816
           ndi |   1.814541   .2797496     6.49   0.000     1.266242     2.36284
           nei |   .8086857   .2389388     3.38   0.001     .3403743    1.276997
           div |  -.0365631   .0644991    -0.57   0.571     -.162979    .0898529
         _cons |   .3421295   .3455221     0.99   0.322    -.3350815     1.01934
---------------+----------------------------------------------------------------
       sigma_u |   .7969486
       sigma_e |  .48667142
           rho |  .72837666   (fraction of variance due to u_i)
--------------------------------------------------------------------------------

. xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re robust cluster(id)
Sargan-Hansen statistic 110.397 Chi-sq(15) P-value = 0.0000

I tried to run the test with my final specification as well, but this happened:

. quietly xtreg ln_cash lit_risk size mtb lev nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.industry_numeric i.year, re vce(cluster id)

. xtoverid
1b: operator invalid
r(198);

4) I managed to convert the industry variable from a string to a numeric variable that I named industry_numeric. Again there is full collinearity, and all industry indicators were omitted. My id variable is a number from 1 to 365, one for each company in my sample. So does it mean there is collinearity between id and industry? And more importantly, should I use this specification if all the industry_numeric indicators were omitted? That is, does it make any difference if I just skip -i.industry_numeric-? Here is the output:

. xtreg ln_cash lit_risk_L1 size mtb lev nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.industry_numeric i.year, fe vce(cluster id)
note: 2.industry_numeric omitted because of collinearity
note: 3.industry_numeric omitted because of collinearity
note: 4.industry_numeric omitted because of collinearity
note: 5.industry_numeric omitted because of collinearity
note: 6.industry_numeric omitted because of collinearity
note: 7.industry_numeric omitted because of collinearity
note: 8.industry_numeric omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =      3,105
Group variable: id                              Number of groups  =        350

R-sq:                                           Obs per group:
     within  = 0.2576                                         min =          4
     between = 0.5525                                         avg =        8.9
     overall = 0.5168                                         max =          9

                                                F(23,349)         =      23.20
corr(u_i, Xb)  = 0.1777                         Prob > F          =     0.0000

                                       (Std. Err. adjusted for 350 clusters in id)
----------------------------------------------------------------------------------
                 |               Robust
         ln_cash |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
     lit_risk_L1 |    .001719   .0003873     4.44   0.000     .0009572    .0024807
            size |   .7280128   .0690938    10.54   0.000     .5921203    .8639054
             mtb |   .0481119   .0124148     3.88   0.000     .0236947    .0725292
             lev |   -.756417   .2735459    -2.77   0.006    -1.294423   -.2184112
             nwc |  -1.159551   .2937198    -3.95   0.000    -1.737235   -.5818679
              rd |  -.0306303   .1282287    -0.24   0.811    -.2828284    .2215679
          growth |  -.3528492   .1526895    -2.31   0.021    -.6531565   -.0525418
              cf |   .1878638   .2890736     0.65   0.516    -.3806816    .7564092
       cf_vol_5y |   1.735107   .6622261     2.62   0.009     .4326508    3.037563
  industry_sigma |   5.907335   2.403872     2.46   0.014     1.179437    10.63523
             acq |  -2.463465    .237868   -10.36   0.000      -2.9313    -1.99563
           capex |   -3.41898   .9081676    -3.76   0.000     -5.20515    -1.63281
             ndi |   1.593964   .2837795     5.62   0.000     1.035831    2.152098
             nei |   .7494894    .236749     3.17   0.002     .2838553    1.215124
             div |   .0149903   .0863881     0.17   0.862    -.1549165     .184897
                 |
industry_numeric |
   Construction  |          0  (omitted)
  Manufacturing  |          0  (omitted)
         Mining  |          0  (omitted)
   Retail Trade  |          0  (omitted)
       Services  |          0  (omitted)
 Transportation  |          0  (omitted)
Wholesale Trade  |          0  (omitted)
                 |
            year |
           2012  |   .0907854   .0289685     3.13   0.002     .0338106    .1477601
           2013  |   .1000274   .0357862     2.80   0.005     .0296437    .1704111
           2014  |    .073067   .0475941     1.54   0.126    -.0205403    .1666742
           2015  |  -.0145752   .0505702    -0.29   0.773    -.1140358    .0848854
           2016  |   .0563425   .0520712     1.08   0.280    -.0460703    .1587553
           2017  |   .0040203   .0547841     0.07   0.942    -.1037282    .1117688
           2018  |  -.1512117   .0597149    -2.53   0.012     -.268658   -.0337654
           2019  |  -.1387516   .0657292    -2.11   0.035    -.2680267   -.0094765
                 |
           _cons |   .1717959   .6641346     0.26   0.796    -1.134414    1.478006
-----------------+----------------------------------------------------------------
         sigma_u |  .98210691
         sigma_e |  .49521909
             rho |  .79728316   (fraction of variance due to u_i)
----------------------------------------------------------------------------------

On a side note, the collinearity problem doesn't arise when I use random effects regression.

Many thanks.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#6

19 Sep 2020, 01:40

Wojciech:
1) a visual inspection is the way to go;
2) 350 clusters are actually enough;
3) try:

Code:

quietly xi: xtreg ln_cash lit_risk size mtb lev nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.industry_numeric i.year, re vce(cluster id)

;
4) it seems that there's collinearity between -id- and -i.industry-. What you experienced with the -fe- specification is actually expected, because this estimator wipes out all the time-invariant predictors (usually, a firm belongs to the same industry for the whole time span that the panel dataset stretches over). It is also expected that this nuisance does not creep up when you switch to the -re- estimator, as it gives back coefficients for time-invariant regressors, too.
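
One way to check whether a variable is time-invariant within panels, and to see what -fe- does with it, sketched on Stata's -nlswork- example data. Here -race- plays the role of a variable like industry; that mapping is only an illustration, not the data from this thread.

```stata
* Sketch on Stata's nlswork example data (race stands in for industry)
webuse nlswork, clear
xtset idcode year

* xtsum splits variation into between and within components;
* a within standard deviation of 0 means no change inside any panel
xtsum race

* Stricter check: assert exits with an error if race varies within any idcode
bysort idcode (race): assert race[1] == race[_N]

* Under -fe- the indicators of a time-invariant variable are dropped
* as collinear with the panel effects
xtreg ln_wage c.age##c.age i.race, fe vce(cluster idcode)
```

The same two checks can be pointed at -id- and the industry variable to confirm that no firm changes industry during the sample period.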

Kind regards,
Carlo
(Stata 19.0)
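
A possible way to set up the visual inspection mentioned in point 1), sketched on Stata's -nlswork- example data: estimate the same -fe- model with default and with cluster-robust standard errors and put the two side by side. The stored-estimate names -fe_default- and -fe_cluster- are arbitrary.

```stata
* Sketch on Stata's nlswork example data
webuse nlswork, clear
xtset idcode year

* Same -fe- model with default and with cluster-robust standard errors
quietly xtreg ln_wage c.age##c.age, fe
estimates store fe_default
quietly xtreg ln_wage c.age##c.age, fe vce(cluster idcode)
estimates store fe_cluster

* Coefficients are identical; only the standard errors differ
estimates table fe_default fe_cluster, b(%9.4f) se(%9.4f)
```

The table shows each coefficient with its standard error underneath, so differences between the two columns are easy to spot; the thread gives no numeric threshold for how large a difference matters.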
Wojciech Gulkowski

Join Date: Sep 2020

Posts: 22
#7

12 Oct 2020, 07:35

Hi Carlo,

I took some break from my analysis to focus on other thesis parts, now I'm at it again. As for point 4) does it mean that I cannot use i.industry in my regression command? I followed other researchers in this field who always mentioned to have included industry and year fixed effects.

I did check the correlation between the -id- and -industry_numeric- variables as per your suggestion, but it does not indicate any substantial correlation:

. correlate id industry_numeric
(obs=3,510)

| id indust~c
-------------+------------------
id | 1.0000
industry_n~c | 0.1167 1.0000

So, in the end, my question is whether I can safely use the regression results that I reported in my previous post, or whether their validity is undermined by the fact that the industry variables are omitted.

Thank you.
Chris Boudreaux

Join Date: Jul 2020

Posts: 83
#8

12 Oct 2020, 07:59

Dear Wojciech:

As Carlo points out in post #6, point 4): "usually, the firm belongs to the same industry".

As a result, you will not be able to include both industry and firm-level fixed effects in the same model. I recommend firm-level fixed effects since there are likely unobserved differences between firms you will want to consider.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#9

12 Oct 2020, 08:29

Wojciech:
as Chris helpfully replied, the main issue does not rest on the correlation between -id- and -industry- (by the way: your correlation involves variables, not regression coefficients) but on the way the -fe- estimator works: if, as expected, firms do not migrate to different industries as time goes by, there's no way that you can get a coefficient for -industry- (a time-invariant predictor) under the -fe- assumptions, as the mean of a constant is the constant itself and (constant - its mean = 0).
Instead, it usually makes sense to include -i.timevar- among the set of predictors of your -fe- panel data regression.
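
The arithmetic can be reproduced on a toy dataset. This is a minimal sketch with made-up values; id, industry, industry_mean and industry_within are hypothetical names. It shows that what matters is whether -industry- is constant within each firm, not the output of -correlate-.

```stata
* Hypothetical toy data: 3 firms, 4 years, industry fixed within firm
clear
set obs 3
generate id = _n
generate industry = cond(id == 3, 2, 1)
expand 4
bysort id: generate year = 2009 + _n

* Is industry constant within firm? -assert- stays silent if it is
bysort id (industry): assert industry[1] == industry[_N]

* Within transformation: subtract the firm-specific mean
bysort id: egen industry_mean = mean(industry)
generate industry_within = industry - industry_mean
summarize industry_within
```

Every value of industry_within is zero, so the demeaned predictor carries no variation from which -fe- could estimate a coefficient.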

Kind regards,
Carlo
(Stata 19.0)
Wojciech Gulkowski

Join Date: Sep 2020

Posts: 22
#10

12 Oct 2020, 10:36

Dear Chris and Carlo,

Thank you for your valuable replies. I think I only understand it now, but please let me know if I have it right. Take the example below:

xtreg ln_cash lit_risk size mtb lev nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.industry_numeric i.year, fe vce(cluster id)

So in this specification the "fe" option corresponds to firm-level fixed effects, since -id- identifies a single firm in my sample? That is why I get a collinearity warning when I use both i.industry_numeric and fe at once, that is, both industry and firm-level fixed effects, correct?

@Carlo, by -i.timevar- you mean in my case for example -i.year- which I used?

Chris, I see your point, but another study similar to mine (Malm and Kanuri; 2016) used industry-level fixed effects, so I think I will stick to that, or also analyze firm-specific effects as a separate specification to show how the results change. What do you think? Also, when I changed my specification to include only industry fixed effects and not firm-specific ones, I got a higher R^2 value. It increased from around 0.51 to 0.59. This is my new specification:

. xtreg ln_cash lit_risk size mtb lev nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.industry_numeric i.year, vce(cluster id)

Random-effects GLS regression Number of obs = 3,083
Group variable: id Number of groups = 351

R-sq: Obs per group:
within = 0.2662 min = 1
between = 0.6368 avg = 8.8
overall = 0.5912 max = 9

Wald chi2(30) = 1020.78
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000

(Std. Err. adjusted for 351 clusters in id)
----------------------------------------------------------------------------------
| Robust
ln_cash | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------+----------------------------------------------------------------
lit_risk | .0011825 .0006735 1.76 0.079 -.0001376 .0025026
size | .8180861 .0380493 21.50 0.000 .743511 .8926613
mtb | .0756164 .0134392 5.63 0.000 .049276 .1019567
lev | -.897595 .2408011 -3.73 0.000 -1.369556 -.4256336
nwc | -1.491774 .2565722 -5.81 0.000 -1.994646 -.9889015
rd | .1128436 .2041966 0.55 0.581 -.2873743 .5130615
growth | -.2792344 .1605548 -1.74 0.082 -.593916 .0354473
cf | .168324 .2913766 0.58 0.563 -.4027636 .7394116
cf_vol_5y | 2.742195 .612251 4.48 0.000 1.542205 3.942185
industry_sigma | 5.698774 2.152072 2.65 0.008 1.480791 9.916758
acq | -2.519466 .2470148 -10.20 0.000 -3.003606 -2.035326
capex | -3.633231 .783425 -4.64 0.000 -5.168716 -2.097747
ndi | 1.624915 .2782505 5.84 0.000 1.079554 2.170276
nei | .7045048 .2445688 2.88 0.004 .2251587 1.183851
div | -.0117488 .064654 -0.18 0.856 -.1384682 .1149706
|
industry_numeric |
Construction | .5622842 .3687984 1.52 0.127 -.1605475 1.285116
Manufacturing | .9820577 .3352041 2.93 0.003 .3250697 1.639046
Mining | -.0088663 .5076044 -0.02 0.986 -1.003753 .98602
Retail Trade | .8018587 .361213 2.22 0.026 .0938942 1.509823
Services | .9884298 .3545113 2.79 0.005 .2936004 1.683259
Transportation | .2973984 .4240756 0.70 0.483 -.5337746 1.128571
Wholesale Trade | .4882454 .3743174 1.30 0.192 -.2454033 1.221894
|
year |
2011 | -.0461177 .0306081 -1.51 0.132 -.1061085 .0138731
2012 | .0350944 .0345209 1.02 0.309 -.0325653 .1027541
2013 | .0290506 .0385723 0.75 0.451 -.0465496 .1046509
2014 | -.002016 .0440031 -0.05 0.963 -.0882605 .0842285
2015 | -.0938221 .0461756 -2.03 0.042 -.1843246 -.0033196
2016 | -.0285547 .0480308 -0.59 0.552 -.1226933 .0655839
2017 | -.0947244 .0495961 -1.91 0.056 -.1919309 .0024821
2018 | -.2546612 .0531868 -4.79 0.000 -.3589053 -.150417
|
_cons | -1.452525 .5727337 -2.54 0.011 -2.575062 -.3299872
-----------------+----------------------------------------------------------------
sigma_u | .79412345
sigma_e | .48230923
rho | .73052876 (fraction of variance due to u_i)
----------------------------------------------------------------------------------

So it seems to me that all is fine with this specification and I can safely include it in my thesis. Or would you suggest that I check for any other potential issues?

I really appreciate your insights.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#11

12 Oct 2020, 11:01

Wojciech:
1) the -fe- specification in your first code means firm-wise fixed effects (as the panels are composed of firms in your example).
You get a warning message stating that -i.industry- is perfectly collinear with -firm- because firms belong to the same industry across all the data waves: hence, due to the -fe- machinery, -industry- being a time-invariant predictor, you do not get any coefficient for that regressor.
2) Yes, by -i.timevar- I meant -i.year-.
3) Since you used clustered standard errors, you should check via the community-contributed module -xtoverid- which specification (-fe- or -re-) fits your data better.
4) While for the -fe- specification you should look at the R-within, for the -re- counterpart it is the R-between that is informative.
5) Hunting for the model with the highest possible R-squared is not the way to go, methodologically speaking. Your research efforts should rather aim at giving the fairest and truest view of the data generating process you're investigating.
6) If you had used CODE delimiters, your Stata output would have been more readable.
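
Point 3) could look like the sketch below. It runs on simulated data in which x is built to be correlated with the firm effect u; all names and numbers are hypothetical. -xtoverid- is community-contributed and must be installed first; depending on the installation it may also ask for -ivreg2- and its dependency -ranktest-. As far as I know it predates factor-variable notation, which is why the -xi:- prefix appears in #6, so the sketch uses continuous predictors only.

```stata
* Hypothetical simulated panel: 100 firms, 8 years
clear
set seed 2020
set obs 100
generate id = _n
generate u = rnormal()
expand 8
bysort id: generate year = 2011 + _n
* x is correlated with the firm effect u by construction
generate x = 0.8*u + rnormal()
generate y = 0.5*x + u + rnormal()
xtset id year

* One-time installation (uncomment if needed):
* ssc install xtoverid

* Fit -re- with clustered standard errors, then run the test
xtreg y x, re vce(cluster id)
xtoverid
```

A small p-value rejects the restrictions that -re- imposes and points toward -fe-; a large one gives no evidence against -re-.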

Kind regards,
Carlo
(Stata 19.0)
Wojciech Gulkowski

Join Date: Sep 2020

Posts: 22
#12

12 Oct 2020, 12:36

Thank you Carlo.

Sorry if I'm asking naive questions but could you confirm though that if I use this specification:

. xtreg ln_cash lit_risk_L1 size mtb lev nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.industry_numeric i.year, vce(cluster id)

I will get a regression with industry and year fixed effects? I'm asking because, as you can see in my previous post, in order to run this regression I have to use random-effects GLS regression and not FE regression (for the reasons you explained: it is impossible to use both firm and industry fixed effects at once). So in fact I'm estimating an RE regression with fixed effects obtained through -i.industry_numeric- and -i.year-.

As a side note, I tried to change my panel id to -industry_numeric- so that I could get the industry fixed effects by running the -fe- specification, but I cannot because of:

Code:

. xtset industry_numeric year
repeated time values within panel
r(451);

So, going back to my first question, is it a problem that I'm using RE regression to obtain year and industry fixed effects?

I hope it is clear what I mean. Thank you.
Wojciech Gulkowski

Join Date: Sep 2020

Posts: 22
#13

12 Oct 2020, 14:27

I also have a question about the regression itself, if I may. Other papers examining cash holdings and litigation risk report "OLS regression estimates" when dealing with panel data. If they say "OLS estimates", does it mean they are referring to the simple -regress- command? I would have thought it does not make sense to use -regress- with panel data. I'm trying to replicate the same analysis, so I need to know whether "OLS estimates" could refer to fixed-effects or random-effects regressions for panel data of the -xtreg- kind. I know that GLS and OLS are not the same, but then I don't understand why they would use OLS regression, i.e. the -regress- command. It's maybe a long shot, and I'm not sure whether it is allowed on this forum (hopefully yes), but I'm attaching the two similar studies (which are publicly available) that I'm referring to. If by any chance you could tell me whether I need to run -regress- or -xtreg- to follow the work of these researchers (from my limited understanding there is evidence pointing to both), I would be massively grateful.

Attached Files

The Effects of Securities Class Action Litigation on Corporate Liquidity and Investment Policy (Arena, Julio; 2015).pdf (1.15 MB, 1 view)

Litigation risk and cash holdings (Malm, Kanuri; 2016).pdf (776.2 KB, 1 view)
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#14

13 Oct 2020, 02:41

Wojciech:
# 12: Stata threw an error message saying that observations have repeated time values within panel (this is pretty frequent with financial data: e.g., multiple transactions per diem made by the same broker). The usual fix is to -xtset- your dataset with -panelid- only. However, this fix comes at the cost that you cannot use time-series operators, such as lags and leads.
You can still include -i.year- as a predictor.
That said, while the reason why you prefer going -re- is clear, you should check whether this specification is the right one for your dataset. As you invoked non-default standard errors, you should switch from -hausman- to the community-contributed module -xtoverid-.
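
The fix for # 12 can be sketched as follows on simulated data (all variable names and numbers are hypothetical). Several firms share each industry-year pair, so declaring a time variable fails, while declaring the panel variable alone is accepted.

```stata
* Hypothetical simulated data: 40 firms in 4 industries, 2010-2019
clear
set seed 451
set obs 40
generate id = _n
generate industry = mod(id, 4) + 1
expand 10
bysort id: generate year = 2009 + _n
generate x = rnormal()
generate y = 0.5*x + 0.3*industry + rnormal()

* Fails with r(451): repeated time values within panel
capture noisily xtset industry year
display _rc

* Declaring the panel variable only works
xtset industry

* Year effects still enter as ordinary dummies
xtreg y x i.year, fe
```

The sketch leaves out -vce(cluster id)- on purpose: with -industry- as the panel variable, firms are no longer nested in the way -xtreg, fe- expects for clustering, so that option would need separate thought.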

# 13: I took a look at the first paper you attached (by the way: are you sure that you do not breach any copyright in distributing those articles?). In all likelihood, the authors used pooled OLS (as the Table 5 description reports clustered standard errors).
Even though pooled OLS can be the usual approach in your research field, methodologically speaking it is not the first choice when you have a panel dataset. Besides, the authors mentioned multivariate pooled OLS, whereas they should have stated multiple pooled OLS, as they have one regressand only and simply increased the number of predictors.
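
For reference, pooled OLS with industry and year dummies and firm-clustered standard errors can be sketched with -regress- as below. The data are simulated and all names are hypothetical; whether the cited authors ran exactly this command is an assumption that the papers' tables would have to confirm.

```stata
* Hypothetical simulated panel: 40 firms in 4 industries, 2010-2019
clear
set seed 1571
set obs 40
generate id = _n
generate industry = mod(id, 4) + 1
expand 10
bysort id: generate year = 2009 + _n
generate x = rnormal()
generate y = 0.5*x + 0.3*industry + rnormal()

* Pooled OLS: industry and year dummies, errors clustered by firm
regress y x i.industry i.year, vce(cluster id)

* Joint test of the industry dummies
testparm i.industry
```

No firm effects are removed here, so the industry dummies are estimable; the price is that unobserved firm-level heterogeneity stays in the error term.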

Kind regards,
Carlo
(Stata 19.0)
Wojciech Gulkowski

Join Date: Sep 2020

Posts: 22
#15

13 Oct 2020, 03:35

Thank you for your reply.

Indeed, I checked with the -xtoverid- command and, if I'm not mistaken, it indicates I should be using FE:

Code:

. xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re robust cluster(id)
Sargan-Hansen statistic 137.090  Chi-sq(15) P-value = 0.0000

The paper is available here for everyone to download: https://papers.ssrn.com/sol3/papers....act_id=1571614 so I'm pretty sure I'm not doing anything illegal. As for the research itself, could you have a look at Table 4 (my main interest in that paper)? I'm particularly interested in specifications (1-6), which differ from each other in the fixed effects reported in Table 4. Based on this table, can I assume that the authors used the commands and specifications I type below (shown for my particular case, i.e. my own research)?

Specification 1 (no FE):

Code:

xtreg ln_cash lit_risk size lev mtb nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div, re

Specification 2 (Year FE):

Code:

xtreg ln_cash lit_risk size lev mtb nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.year, fe vce(cluster id)

Specification 3 (Industry FE):

Code:

xtreg ln_cash lit_risk size lev mtb nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.industry_numeric, re vce(cluster id)

Specification 4 (Industry and Year FE):

Code:

xtreg ln_cash lit_risk size lev mtb nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.industry_numeric i.year, re vce(cluster id)

Specification 5 (Firm and Year FE):

Code:

xtreg ln_cash lit_risk size lev mtb nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.year, fe vce(cluster id)

They also explain:

Specification 1 is estimated via ordinary least squares (OLS) without industry and time effects. Specification 2 adds time effects in the form of calendar year dummies. Specifications 3 and 4 include industry-fixed effects, with specification 4 including both industry and year dummies.

I would like to estimate these regressions under different specifications to see how the results change and whether the coefficient signs etc. are consistent across all of them. Please let me know whether the specifications I wrote will give me the intended results in terms of including or excluding fixed effects. Moreover, will my estimates be OLS in all of these specifications?

Thank you so much.

Last edited by Wojciech Gulkowski; 13 Oct 2020, 03:47.
