  • Industry/Year Fixed Effects (Panel Data)

    Hi,

    Let me first say that I'm a Stata beginner and would appreciate your help. I have already read many threads on the topic, but I'm still unsure how to apply these effects to my data set specifically. I'm using Stata 16.1.

    I'm investigating the relationship between cash holdings (dependent variable) and litigation risk (independent variable), using another 14 independent variables as controls. My sample consists of S&P 500 companies (excluding financials and utilities, 351 firms in total) over 2010-2019. I've run some tests on my data: the Hausman test (indicating I should use fixed effects), the Breusch-Pagan test (indicating heteroskedasticity) and the Wooldridge test (indicating autocorrelation). Taking these tests into account, I'm using fixed effects and clustered standard errors.

    In line with what I read on this forum, I used the i.industry i.year specification, but the problem is that industry is a string variable. Since industry is based on the SIC code (also included in my data), I decided to use i.sic i.year instead. However, as you can see, all of my SIC codes were omitted because of collinearity. I also don't understand why the i.year coefficients start in 2011 and end in 2018 rather than running from 2010 to 2019. In any case, I guess my attempt to incorporate both industry and year fixed effects does not work well and needs some correction. I'm pasting my Stata output below (sorry for pasting it like this, but there seems to be a size limit and dataex didn't work for me).

    . xtset id year
    panel variable: id (strongly balanced)
    time variable: year, 2010 to 2019
    delta: 1 unit

    .
    . xtreg ln_cash lit_risk size mtb lev nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div, fe vce(cluster id)

    Fixed-effects (within) regression               Number of obs     =      3,083
    Group variable: id                              Number of groups  =        351

    R-sq:                                           Obs per group:
         within  = 0.2541                                         min =          1
         between = 0.5638                                         avg =        8.8
         overall = 0.5253                                         max =          9

                                                    F(15,350)         =      30.05
    corr(u_i, Xb)  = 0.2648                         Prob > F          =     0.0000

                                         (Std. Err. adjusted for 351 clusters in id)
    --------------------------------------------------------------------------------
                   |               Robust
           ln_cash |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
          lit_risk |   .0010818   .0005395     2.01   0.046     .0000208    .0021429
              size |   .6542184   .0506476    12.92   0.000     .5546065    .7538304
               mtb |   .0520193   .0123956     4.20   0.000     .0276401    .0763986
               lev |  -.8306861   .2475409    -3.36   0.001    -1.317541   -.3438313
               nwc |  -1.271489   .2906379    -4.37   0.000    -1.843105   -.6998721
                rd |  -.2963804   .1064015    -2.79   0.006    -.5056472   -.0871136
            growth |  -.2076923   .1445206    -1.44   0.152    -.4919303    .0765457
                cf |  -.1704632   .2847327    -0.60   0.550    -.7304655    .3895392
         cf_vol_5y |   1.829619   .7456975     2.45   0.015     .3630073    3.296231
    industry_sigma |   4.291135   2.156647     1.99   0.047     .0495159    8.532753
               acq |  -2.505081   .2492455   -10.05   0.000    -2.995289   -2.014874
             capex |  -3.471404   .8872684    -3.91   0.000    -5.216452   -1.726355
               ndi |   1.698586    .275659     6.16   0.000      1.15643    2.240743
               nei |   .9345293   .2447038     3.82   0.000     .4532545    1.415804
               div |  -.0070305   .0713867    -0.10   0.922    -.1474314    .1333704
             _cons |   1.013231   .4810271     2.11   0.036     .0671633    1.959298
    ---------------+----------------------------------------------------------------
           sigma_u |  1.0028185
           sigma_e |  .48667142
               rho |  .80937608   (fraction of variance due to u_i)
    --------------------------------------------------------------------------------

    . xtreg ln_cash lit_risk size mtb lev nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.industry i.year, fe vce(cluster id)
    industry: string variables may not be used as factor variables
    r(109);

    . xtreg ln_cash lit_risk size mtb lev nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.sic i.year, fe vce(cluster id)
    note: 1311.sic omitted because of collinearity
    note: 1381.sic omitted because of collinearity
    note: 1389.sic omitted because of collinearity
    note: 1400.sic omitted because of collinearity
    note: 1531.sic omitted because of collinearity
    note: 1600.sic omitted because of collinearity
    note: 1731.sic omitted because of collinearity
    note: 2000.sic omitted because of collinearity
    note: 2011.sic omitted because of collinearity
    note: 2030.sic omitted because of collinearity
    note: 2033.sic omitted because of collinearity
    note: 2040.sic omitted because of collinearity
    note: 2052.sic omitted because of collinearity
    note: 2060.sic omitted because of collinearity
    note: 2070.sic omitted because of collinearity
    note: 2080.sic omitted because of collinearity
    note: 2082.sic omitted because of collinearity
    note: 2085.sic omitted because of collinearity
    note: 2086.sic omitted because of collinearity
    note: 2090.sic omitted because of collinearity
    note: 2111.sic omitted because of collinearity
    note: 2273.sic omitted because of collinearity
    note: 2300.sic omitted because of collinearity
    note: 2320.sic omitted because of collinearity
    note: 2400.sic omitted because of collinearity
    note: 2430.sic omitted because of collinearity
    note: 2510.sic omitted because of collinearity
    note: 2621.sic omitted because of collinearity
    note: 2631.sic omitted because of collinearity
    note: 2650.sic omitted because of collinearity
    note: 2670.sic omitted because of collinearity
    note: 2810.sic omitted because of collinearity
    note: 2820.sic omitted because of collinearity
    note: 2821.sic omitted because of collinearity
    note: 2834.sic omitted because of collinearity
    note: 2835.sic omitted because of collinearity
    note: 2836.sic omitted because of collinearity
    note: 2840.sic omitted because of collinearity
    note: 2842.sic omitted because of collinearity
    note: 2844.sic omitted because of collinearity
    note: 2851.sic omitted because of collinearity
    note: 2860.sic omitted because of collinearity
    note: 2870.sic omitted because of collinearity
    note: 2911.sic omitted because of collinearity
    note: 3011.sic omitted because of collinearity
    note: 3021.sic omitted because of collinearity
    note: 3081.sic omitted because of collinearity
    note: 3100.sic omitted because of collinearity
    note: 3312.sic omitted because of collinearity
    note: 3411.sic omitted because of collinearity
    note: 3420.sic omitted because of collinearity
    note: 3430.sic omitted because of collinearity
    note: 3490.sic omitted because of collinearity
    note: 3510.sic omitted because of collinearity
    note: 3511.sic omitted because of collinearity
    note: 3523.sic omitted because of collinearity
    note: 3531.sic omitted because of collinearity
    note: 3533.sic omitted because of collinearity
    note: 3540.sic omitted because of collinearity
    note: 3559.sic omitted because of collinearity
    note: 3560.sic omitted because of collinearity
    note: 3561.sic omitted because of collinearity
    note: 3570.sic omitted because of collinearity
    note: 3572.sic omitted because of collinearity
    note: 3576.sic omitted because of collinearity
    note: 3577.sic omitted because of collinearity
    note: 3585.sic omitted because of collinearity
    note: 3620.sic omitted because of collinearity
    note: 3630.sic omitted because of collinearity
    note: 3663.sic omitted because of collinearity
    note: 3674.sic omitted because of collinearity
    note: 3678.sic omitted because of collinearity
    note: 3679.sic omitted because of collinearity
    note: 3711.sic omitted because of collinearity
    note: 3714.sic omitted because of collinearity
    note: 3721.sic omitted because of collinearity
    note: 3724.sic omitted because of collinearity
    note: 3728.sic omitted because of collinearity
    note: 3730.sic omitted because of collinearity
    note: 3751.sic omitted because of collinearity
    note: 3760.sic omitted because of collinearity
    note: 3812.sic omitted because of collinearity
    note: 3823.sic omitted because of collinearity
    note: 3825.sic omitted because of collinearity
    note: 3826.sic omitted because of collinearity
    note: 3827.sic omitted because of collinearity
    note: 3841.sic omitted because of collinearity
    note: 3842.sic omitted because of collinearity
    note: 3843.sic omitted because of collinearity
    note: 3844.sic omitted because of collinearity
    note: 3845.sic omitted because of collinearity
    note: 3851.sic omitted because of collinearity
    note: 3942.sic omitted because of collinearity
    note: 3944.sic omitted because of collinearity
    note: 3990.sic omitted because of collinearity
    note: 4011.sic omitted because of collinearity
    note: 4210.sic omitted because of collinearity
    note: 4213.sic omitted because of collinearity
    note: 4225.sic omitted because of collinearity
    note: 4400.sic omitted because of collinearity
    note: 4512.sic omitted because of collinearity
    note: 4513.sic omitted because of collinearity
    note: 4700.sic omitted because of collinearity
    note: 4731.sic omitted because of collinearity
    note: 4812.sic omitted because of collinearity
    note: 4813.sic omitted because of collinearity
    note: 4841.sic omitted because of collinearity
    note: 4888.sic omitted because of collinearity
    note: 4899.sic omitted because of collinearity
    note: 5000.sic omitted because of collinearity
    note: 5010.sic omitted because of collinearity
    note: 5013.sic omitted because of collinearity
    note: 5047.sic omitted because of collinearity
    note: 5122.sic omitted because of collinearity
    note: 5140.sic omitted because of collinearity
    note: 5200.sic omitted because of collinearity
    note: 5211.sic omitted because of collinearity
    note: 5311.sic omitted because of collinearity
    note: 5331.sic omitted because of collinearity
    note: 5399.sic omitted because of collinearity
    note: 5411.sic omitted because of collinearity
    note: 5500.sic omitted because of collinearity
    note: 5531.sic omitted because of collinearity
    note: 5600.sic omitted because of collinearity
    note: 5651.sic omitted because of collinearity
    note: 5661.sic omitted because of collinearity
    note: 5731.sic omitted because of collinearity
    note: 5812.sic omitted because of collinearity
    note: 5912.sic omitted because of collinearity
    note: 5944.sic omitted because of collinearity
    note: 5961.sic omitted because of collinearity
    note: 5990.sic omitted because of collinearity
    note: 7011.sic omitted because of collinearity
    note: 7200.sic omitted because of collinearity
    note: 7311.sic omitted because of collinearity
    note: 7323.sic omitted because of collinearity
    note: 7340.sic omitted because of collinearity
    note: 7350.sic omitted because of collinearity
    note: 7363.sic omitted because of collinearity
    note: 7370.sic omitted because of collinearity
    note: 7372.sic omitted because of collinearity
    note: 7373.sic omitted because of collinearity
    note: 7374.sic omitted because of collinearity
    note: 7389.sic omitted because of collinearity
    note: 7841.sic omitted because of collinearity
    note: 7990.sic omitted because of collinearity
    note: 8062.sic omitted because of collinearity
    note: 8071.sic omitted because of collinearity
    note: 8090.sic omitted because of collinearity
    note: 8700.sic omitted because of collinearity
    note: 8721.sic omitted because of collinearity
    note: 8731.sic omitted because of collinearity
    note: 8742.sic omitted because of collinearity

    Fixed-effects (within) regression Number of obs = 3,083
    Group variable: id Number of groups = 351

    R-sq: Obs per group:
    within = 0.2696 min = 1
    between = 0.5484 avg = 8.8
    overall = 0.5164 max = 9

    F(23,350) = 23.08
    corr(u_i, Xb) = 0.1569 Prob > F = 0.0000

    (Std. Err. adjusted for 351 clusters in id)
    --------------------------------------------------------------------------------
    | Robust
    ln_cash | Coef. Std. Err. t P>|t| [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
    lit_risk | .001018 .0005239 1.94 0.053 -.0000123 .0020484
    size | .7397585 .0579766 12.76 0.000 .6257322 .8537848
    mtb | .0549462 .0131649 4.17 0.000 .0290541 .0808384
    lev | -.6847657 .2475307 -2.77 0.006 -1.1716 -.197931
    nwc | -1.357777 .2877874 -4.72 0.000 -1.923787 -.7917672
    rd | -.2639219 .1092408 -2.42 0.016 -.4787728 -.0490709
    growth | -.2644579 .1528391 -1.73 0.084 -.5650565 .0361408
    cf | -.0113674 .286946 -0.04 0.968 -.5757228 .5529879
    cf_vol_5y | 1.987609 .7245799 2.74 0.006 .5625311 3.412688
    industry_sigma | 6.153141 2.241524 2.75 0.006 1.744591 10.56169
    acq | -2.433709 .2453183 -9.92 0.000 -2.916193 -1.951226
    capex | -3.205605 .8809773 -3.64 0.000 -4.93828 -1.47293
    ndi | 1.518168 .2765364 5.49 0.000 .9742856 2.06205
    nei | .7939962 .2483272 3.20 0.002 .3055949 1.282398
    div | .0180821 .0712251 0.25 0.800 -.122001 .1581652
    |
    sic |
    1311 | 0 (omitted)
    1381 | 0 (omitted)
    1389 | 0 (omitted)
    1400 | 0 (omitted)
    1531 | 0 (omitted)
    1600 | 0 (omitted)
    1731 | 0 (omitted)
    2000 | 0 (omitted)
    2011 | 0 (omitted)
    2030 | 0 (omitted)
    2033 | 0 (omitted)
    2040 | 0 (omitted)
    2052 | 0 (omitted)
    2060 | 0 (omitted)
    2070 | 0 (omitted)
    2080 | 0 (omitted)
    2082 | 0 (omitted)
    2085 | 0 (omitted)
    2086 | 0 (omitted)
    2090 | 0 (omitted)
    2111 | 0 (omitted)
    2273 | 0 (omitted)
    2300 | 0 (omitted)
    2320 | 0 (omitted)
    2400 | 0 (omitted)
    2430 | 0 (omitted)
    2510 | 0 (omitted)
    2621 | 0 (omitted)
    2631 | 0 (omitted)
    2650 | 0 (omitted)
    2670 | 0 (omitted)
    2810 | 0 (omitted)
    2820 | 0 (omitted)
    2821 | 0 (omitted)
    2834 | 0 (omitted)
    2835 | 0 (omitted)
    2836 | 0 (omitted)
    2840 | 0 (omitted)
    2842 | 0 (omitted)
    2844 | 0 (omitted)
    2851 | 0 (omitted)
    2860 | 0 (omitted)
    2870 | 0 (omitted)
    2911 | 0 (omitted)
    3011 | 0 (omitted)
    3021 | 0 (omitted)
    3081 | 0 (omitted)
    3100 | 0 (omitted)
    3312 | 0 (omitted)
    3411 | 0 (omitted)
    3420 | 0 (omitted)
    3430 | 0 (omitted)
    3490 | 0 (omitted)
    3510 | 0 (omitted)
    3511 | 0 (omitted)
    3523 | 0 (omitted)
    3531 | 0 (omitted)
    3533 | 0 (omitted)
    3540 | 0 (omitted)
    3559 | 0 (omitted)
    3560 | 0 (omitted)
    3561 | 0 (omitted)
    3570 | 0 (omitted)
    3572 | 0 (omitted)
    3576 | 0 (omitted)
    3577 | 0 (omitted)
    3585 | 0 (omitted)
    3620 | 0 (omitted)
    3630 | 0 (omitted)
    3663 | 0 (omitted)
    3674 | 0 (omitted)
    3678 | 0 (omitted)
    3679 | 0 (omitted)
    3711 | 0 (omitted)
    3714 | 0 (omitted)
    3721 | 0 (omitted)
    3724 | 0 (omitted)
    3728 | 0 (omitted)
    3730 | 0 (omitted)
    3751 | 0 (omitted)
    3760 | 0 (omitted)
    3812 | 0 (omitted)
    3823 | 0 (omitted)
    3825 | 0 (omitted)
    3826 | 0 (omitted)
    3827 | 0 (omitted)
    3841 | 0 (omitted)
    3842 | 0 (omitted)
    3843 | 0 (omitted)
    3844 | 0 (omitted)
    3845 | 0 (omitted)
    3851 | 0 (omitted)
    3942 | 0 (omitted)
    3944 | 0 (omitted)
    3990 | 0 (omitted)
    4011 | 0 (omitted)
    4210 | 0 (omitted)
    4213 | 0 (omitted)
    4225 | 0 (omitted)
    4400 | 0 (omitted)
    4512 | 0 (omitted)
    4513 | 0 (omitted)
    4700 | 0 (omitted)
    4731 | 0 (omitted)
    4812 | 0 (omitted)
    4813 | 0 (omitted)
    4841 | 0 (omitted)
    4888 | 0 (omitted)
    4899 | 0 (omitted)
    5000 | 0 (omitted)
    5010 | 0 (omitted)
    5013 | 0 (omitted)
    5047 | 0 (omitted)
    5122 | 0 (omitted)
    5140 | 0 (omitted)
    5200 | 0 (omitted)
    5211 | 0 (omitted)
    5311 | 0 (omitted)
    5331 | 0 (omitted)
    5399 | 0 (omitted)
    5411 | 0 (omitted)
    5500 | 0 (omitted)
    5531 | 0 (omitted)
    5600 | 0 (omitted)
    5651 | 0 (omitted)
    5661 | 0 (omitted)
    5731 | 0 (omitted)
    5812 | 0 (omitted)
    5912 | 0 (omitted)
    5944 | 0 (omitted)
    5961 | 0 (omitted)
    5990 | 0 (omitted)
    7011 | 0 (omitted)
    7200 | 0 (omitted)
    7311 | 0 (omitted)
    7323 | 0 (omitted)
    7340 | 0 (omitted)
    7350 | 0 (omitted)
    7363 | 0 (omitted)
    7370 | 0 (omitted)
    7372 | 0 (omitted)
    7373 | 0 (omitted)
    7374 | 0 (omitted)
    7389 | 0 (omitted)
    7841 | 0 (omitted)
    7990 | 0 (omitted)
    8062 | 0 (omitted)
    8071 | 0 (omitted)
    8090 | 0 (omitted)
    8700 | 0 (omitted)
    8721 | 0 (omitted)
    8731 | 0 (omitted)
    8742 | 0 (omitted)
    |
    year |
    2011 | -.0442112 .0304748 -1.45 0.148 -.104148 .0157256
    2012 | .0400003 .0355404 1.13 0.261 -.0298994 .1099
    2013 | .0454666 .0405875 1.12 0.263 -.0343595 .1252927
    2014 | .0194234 .0475414 0.41 0.683 -.0740794 .1129263
    2015 | -.0739048 .0506218 -1.46 0.145 -.1734661 .0256564
    2016 | -.0041762 .051068 -0.08 0.935 -.104615 .0962627
    2017 | -.0587983 .0542472 -1.08 0.279 -.1654898 .0478933
    2018 | -.2163473 .0594791 -3.64 0.000 -.3333286 -.099366
    |
    _cons | .0836588 .5586551 0.15 0.881 -1.015084 1.182402
    ---------------+----------------------------------------------------------------
    sigma_u | .9933031
    sigma_e | .48230923
    rho | .80921242 (fraction of variance due to u_i)
    --------------------------------------------------------------------------------

    . dataex
    input statement exceeds linesize limit. Try specifying fewer variables
    r(1000);

    . help dataex

    .

    I would appreciate help on how best to handle this problem. Do I understand correctly that clustering standard errors at the industry level would amount to including industry fixed effects? And since year is the time variable in my xtset, am I also including year fixed effects? However, I'm following Malm and Kanuri (2016), who included both industry and year fixed effects but clustered standard errors at the firm level (the id variable in my case). I would therefore prefer the same specification: cluster at the firm level and add i.year i.industry to my xtreg command.

    (I would also appreciate help with dataex. Even when I used it only on the output of xtset and xtreg, the first two commands, dataex said it exceeded the limit.)
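A minimal sketch of the usual way around the r(1000) error above, shown on Stata's shipped nlswork panel rather than the poster's data: -dataex- lists data (not estimation output, which is normally pasted as text, as in the logs above), and its error message itself suggests passing fewer variables. Keeping the panel and time identifiers plus a few model variables, and limiting the number of observations, keeps each generated -input- line short. With the poster's data one would substitute id, year and a handful of the regressors; the variable choices here are only illustrative.

```stata
* Illustration on Stata's example panel, not the poster's data
webuse nlswork, clear

* dataex lists data, not regression output; keep the identifiers
* plus a few variables so each generated input line stays short
dataex idcode year ln_wage tenure in 1/20

* A second call can cover further variables for the same observations
dataex idcode year age ttl_exp in 1/20
```

Repeating the identifiers in each call lets readers merge the pieces back together.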

    Thank you.




  • #2
    Wojcieck:
    welcome to this forum.
    Some comments about your post:
    1) if a given variable is perfectly collinear with the -panelid- it will be omitted;
    2) I'm not clear how you were able to run the Breusch-Pagan test for heteroskedasticity (-estat hettest-), as it is not allowed after -xtreg-:
    Code:
    . use "https://www.stata-press.com/data/r16/nlswork.dta"
    (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
    
    . xtreg ln_wage c.age##c.age, fe vce(cluster idcode)
    
    Fixed-effects (within) regression               Number of obs     =     28,510
    Group variable: idcode                          Number of groups  =      4,710
    
    R-sq:                                           Obs per group:
         within  = 0.1087                                         min =          1
         between = 0.1006                                         avg =        6.1
         overall = 0.0865                                         max =         15
    
                                                    F(2,4709)         =     507.42
    corr(u_i, Xb)  = 0.0440                         Prob > F          =     0.0000
    
                                 (Std. Err. adjusted for 4,710 clusters in idcode)
    ------------------------------------------------------------------------------
                 |               Robust
         ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .0539076    .004307    12.52   0.000     .0454638    .0623515
                 |
     c.age#c.age |  -.0005973    .000072    -8.30   0.000    -.0007384   -.0004562
                 |
           _cons |    .639913   .0624195    10.25   0.000     .5175415    .7622845
    -------------+----------------------------------------------------------------
         sigma_u |   .4039153
         sigma_e |  .30245467
             rho |  .64073314   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    
    . estat hettest
    estat hettest not valid
    r(321);
    
    .
    ... nor is the -hausman- test allowed with non-default standard errors:
    Code:
    . quietly xtreg ln_wage c.age##c.age, fe vce(cluster idcode)
    
    . estimates store fe
    
    . xtreg ln_wage c.age##c.age, re vce(cluster idcode)
    
    Random-effects GLS regression                   Number of obs     =     28,510
    Group variable: idcode                          Number of groups  =      4,710
    
    R-sq:                                           Obs per group:
         within  = 0.1087                                         min =          1
         between = 0.1015                                         avg =        6.1
         overall = 0.0870                                         max =         15
    
                                                    Wald chi2(2)      =    1258.33
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
    
                                 (Std. Err. adjusted for 4,710 clusters in idcode)
    ------------------------------------------------------------------------------
                 |               Robust
         ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .0590339   .0041049    14.38   0.000     .0509884    .0670795
                 |
     c.age#c.age |  -.0006758   .0000688    -9.83   0.000    -.0008107    -.000541
                 |
           _cons |   .5479714   .0587198     9.33   0.000     .4328826    .6630601
    -------------+----------------------------------------------------------------
         sigma_u |   .3654049
         sigma_e |  .30245467
             rho |  .59342665   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    . estimates store re
    
    . hausman fe re
    hausman cannot be used with vce(robust), vce(cluster cvar), or p-weighted data
    r(198);
    
    .
    That said, you can cluster your standard errors on -panelid- and add -i.industry- and -i.time- among your set of predictors.
    As far as -i.year- is concerned, one of the years is omitted by default as the reference category, and another one might be omitted due to collinearity.
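A minimal sketch of both omission mechanisms described above, using Stata's nlswork example panel in place of the poster's data (the choice of variables is an illustrative assumption). race is assumed to be constant within each idcode, so under -fe- its dummies are perfectly collinear with the panel effects and get omitted, much like the firm-level SIC dummies in the original post; the first year in the data serves as the base level of i.year.

```stata
* Illustration on Stata's example panel, not the poster's data
webuse nlswork, clear
xtset idcode year

* race is assumed constant within idcode: under -fe- its dummies are
* collinear with the panel effects and reported as omitted.
* The first year in the data is the base level of i.year.
xtreg ln_wage tenure i.race i.year, fe vce(cluster idcode)

* Clustering on the panel identifier gives one cluster per panel
display "clusters = " e(N_clust) "   panels = " e(N_g)
```

Adding the -allbaselevels- display option to the -xtreg- call should also print the base year explicitly, which makes it easier to see which year is the reference category and which, if any, is dropped for collinearity.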
    Kind regards,
    Carlo
    (Stata 18.0 SE)



    • #3
      Dear Carlo,

      Thank you for your reply.

      1) Ok, understood.

      2) As for Breusch-Pagan, I ran it after the re regression with default standard errors.

      Here is the output:

      xttest0

      Breusch and Pagan Lagrangian multiplier test for random effects

              ln_cash[id,t] = Xb + u[id] + e[id,t]

              Estimated results:
                               |       Var     sd = sqrt(Var)
                      ---------+-----------------------------
                       ln_cash |   2.413241       1.553461
                             e |   .2368491       .4866714
                             u |   .6351271       .7969486

              Test:   Var(u) = 0
                                   chibar2(01) =  5791.15
                                Prob > chibar2 =   0.0000


      From this thread: https://www.statalist.org/forums/for...chibar2-1-0000
      and more specifically, from Jeff's comment: "The Breusch-Pagan statistic tests for the presence of positive serial correlation in the composite error term. That's all it does."
      I understood that xttest0 tests for heteroskedasticity. If I'm wrong, please correct me.
      I know that you wrote here:
      https://www.statalist.org/forums/for...nel-data-model
      that -estat hettest- doesn't work for panel data, which is why I used xttest0 as a substitute. I might have mixed it all up. In any case, you also suggested an "eye inspection": comparing default standard errors with clustered ones and judging by the difference. How large a difference would make clustered errors the better choice? Without much econometrics experience, that is rather hard for me to judge.

      3) As for the Hausman test, I ran it on stored estimates from regressions with default standard errors. That was before I figured out that I needed robust standard errors. I assumed the Hausman results would still be valid with clustered errors. Do you think I should run another test (since, as you pointed out, -hausman- is not viable here) to decide between the fe and re regressions with clustered SEs?

      4) Regarding "That said, you can cluster your standard errors on -panelid- and add -i.industry- and -i.time- among your set of predictors.": yes, I tried to do that, as mentioned in my post. The issue is that industry is a string variable in my data, and when I added that specification I got:

      . xtreg ln_cash lit_risk size mtb lev nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.industry i.year, fe vce(cluster id)
      industry: string variables may not be used as factor variables
      r(109);

      Unless you meant something different than this?
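For the "eye inspection" mentioned in point 2, a minimal sketch that fits the same -fe- model twice, once with default and once with cluster-robust standard errors, and lists them side by side. nlswork and its variables stand in for the poster's data, and the stored-estimate names fe_default and fe_cluster are hypothetical. The sketch does not settle how large a gap should matter; it only makes the comparison easy to read.

```stata
* Illustration on Stata's example panel, not the poster's data
webuse nlswork, clear
xtset idcode year

* Same fixed-effects model with default (non-robust) standard errors
quietly xtreg ln_wage tenure ttl_exp, fe
estimates store fe_default

* Same model with standard errors clustered on the panel identifier
quietly xtreg ln_wage tenure ttl_exp, fe vce(cluster idcode)
estimates store fe_cluster

* Point estimates are identical; only the standard errors differ
estimates table fe_default fe_cluster, b(%9.4f) se(%9.4f)
```

Because the coefficients do not change, any difference in the table comes entirely from the variance estimator.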

      Thank you again for your help.



      • #4
        Wojciech (sorry for misspelling your given name in my previous reply):
        1) -xttest0- tests for panel-wise effects in the -re- model, not for heteroskedasticity (actually, there are two different Breusch-Pagan tests);
        2) the number of clusters should be large enough (no hard and fast rules, though) for clustered standard errors to work properly;
        3) you should impose non-default standard errors before comparing the -fe- and -re- specifications (via the community-contributed command -xtoverid-);
        4) see -help decode- about how to convert -string- variables into numeric format.
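A minimal sketch of the string-to-numeric step in point 4: -encode- creates a numeric copy of a string variable with value labels attached, and -decode- goes the other way (both are documented in the same help entry). nlswork has no string industry variable, so the sketch first builds one from race with -decode-, assuming race carries value labels; race_str and race_num are hypothetical names.

```stata
* Illustration on Stata's example panel, not the poster's data
webuse nlswork, clear
xtset idcode year

* Build a string variable purely for illustration
decode race, generate(race_str)

* encode turns the string into a labelled numeric variable that
* factor-variable notation (i.) accepts
encode race_str, generate(race_num)
label list race_num

* race_num is time-invariant within idcode, so -fe- still omits its levels
xtreg ln_wage tenure i.race_num i.year, fe vce(cluster idcode)
```

Note that -encode- assigns codes in alphabetical order of the strings, so the numeric codes need not match any original coding.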
        Kind regards,
        Carlo
        (Stata 18.0 SE)



        • #5
          Hi,

          Thank you for your reply. I have some follow-up questions.

          1) Since I cannot run -estat hettest- on my panel data, how should I best check for heteroskedasticity? I guess by comparing the default and clustered SEs? But again, I'm not sure by what criteria I should judge such a comparison.

          2) I have 350 clusters, so I hope that is enough for them to work properly.

          3) I did as you suggested and used the -xtoverid- command. If I read it correctly, it again confirmed that I should go with the fixed-effects regression. Here is the output:

          . xtreg ln_cash lit_risk size mtb lev nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div, re vce(cluster id)

          Random-effects GLS regression                   Number of obs     =      3,083
          Group variable: id                              Number of groups  =        351

          R-sq:                                           Obs per group:
               within  = 0.2498                                         min =          1
               between = 0.6114                                         avg =        8.8
               overall = 0.5644                                         max =          9

                                                          Wald chi2(15)     =     708.75
          corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                                               (Std. Err. adjusted for 351 clusters in id)
          --------------------------------------------------------------------------------
                         |               Robust
                 ln_cash |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          ---------------+----------------------------------------------------------------
                lit_risk |   .0014052   .0007906     1.78   0.076    -.0001444    .0029549
                    size |   .7320445   .0362889    20.17   0.000     .6609195    .8031694
                     mtb |     .07182   .0122373     5.87   0.000     .0478354    .0958046
                     lev |   -1.09048   .2420424    -4.51   0.000    -1.564874   -.6160855
                     nwc |  -1.383627   .2483779    -5.57   0.000    -1.870438   -.8968148
                      rd |    .147625   .2203295     0.67   0.503    -.2842129    .5794629
                  growth |  -.2238729   .1508359    -1.48   0.138    -.5195059    .0717601
                      cf |    .046749   .3007057     0.16   0.876    -.5426233    .6361213
               cf_vol_5y |    2.61963   .6215614     4.21   0.000     1.401392    3.837868
          industry_sigma |   2.959833   1.814173     1.63   0.103    -.5958803    6.515547
                     acq |  -2.585341   .2542051   -10.17   0.000    -3.083574   -2.087108
                   capex |  -4.589984   .7118333    -6.45   0.000    -5.985152   -3.194816
                     ndi |   1.814541   .2797496     6.49   0.000     1.266242     2.36284
                     nei |   .8086857   .2389388     3.38   0.001     .3403743    1.276997
                     div |  -.0365631   .0644991    -0.57   0.571     -.162979    .0898529
                   _cons |   .3421295   .3455221     0.99   0.322    -.3350815     1.01934
          ---------------+----------------------------------------------------------------
                 sigma_u |   .7969486
                 sigma_e |  .48667142
                     rho |  .72837666   (fraction of variance due to u_i)
          --------------------------------------------------------------------------------

          . xtoverid

          Test of overidentifying restrictions: fixed vs random effects
          Cross-section time-series model: xtreg re robust cluster(id)
          Sargan-Hansen statistic 110.397 Chi-sq(15) P-value = 0.0000

          I tried to run the test with my final specification as well, but this happened:

          . quietly xtreg ln_cash lit_risk size mtb lev nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.industry_numeric i.year, re vce(cluster id)

          . xtoverid
          1b: operator invalid
          r(198);


          4) I managed to convert the industry variable from string to a numeric variable, which I named industry_numeric. Again there is perfect collinearity and all industry dummies were omitted. My id variable is a number from 1 to 365, one for each company in my sample. Does this mean there is collinearity between id and industry? More importantly, should I keep this specification if all the industry_numeric dummies are omitted? In other words, is there any difference if I just drop -i.industry_numeric-? Here is the output:

          . xtreg ln_cash lit_risk_L1 size mtb lev nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.industry_numeric i.year, fe vce(cluster id)
          note: 2.industry_numeric omitted because of collinearity
          note: 3.industry_numeric omitted because of collinearity
          note: 4.industry_numeric omitted because of collinearity
          note: 5.industry_numeric omitted because of collinearity
          note: 6.industry_numeric omitted because of collinearity
          note: 7.industry_numeric omitted because of collinearity
          note: 8.industry_numeric omitted because of collinearity

          Fixed-effects (within) regression               Number of obs     =      3,105
          Group variable: id                              Number of groups  =        350

          R-sq:                                           Obs per group:
               within  = 0.2576                                         min =          4
               between = 0.5525                                         avg =        8.9
               overall = 0.5168                                         max =          9

                                                          F(23,349)         =      23.20
          corr(u_i, Xb)  = 0.1777                         Prob > F          =     0.0000

                                                 (Std. Err. adjusted for 350 clusters in id)
          ----------------------------------------------------------------------------------
                           |               Robust
                   ln_cash |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -----------------+----------------------------------------------------------------
               lit_risk_L1 |    .001719   .0003873     4.44   0.000     .0009572    .0024807
                      size |   .7280128   .0690938    10.54   0.000     .5921203    .8639054
                       mtb |   .0481119   .0124148     3.88   0.000     .0236947    .0725292
                       lev |   -.756417   .2735459    -2.77   0.006    -1.294423   -.2184112
                       nwc |  -1.159551   .2937198    -3.95   0.000    -1.737235   -.5818679
                        rd |  -.0306303   .1282287    -0.24   0.811    -.2828284    .2215679
                    growth |  -.3528492   .1526895    -2.31   0.021    -.6531565   -.0525418
                        cf |   .1878638   .2890736     0.65   0.516    -.3806816    .7564092
                 cf_vol_5y |   1.735107   .6622261     2.62   0.009     .4326508    3.037563
            industry_sigma |   5.907335   2.403872     2.46   0.014     1.179437    10.63523
                       acq |  -2.463465    .237868   -10.36   0.000      -2.9313    -1.99563
                     capex |   -3.41898   .9081676    -3.76   0.000     -5.20515    -1.63281
                       ndi |   1.593964   .2837795     5.62   0.000     1.035831    2.152098
                       nei |   .7494894    .236749     3.17   0.002     .2838553    1.215124
                       div |   .0149903   .0863881     0.17   0.862    -.1549165     .184897
                           |
          industry_numeric |
              Construction |          0  (omitted)
             Manufacturing |          0  (omitted)
                    Mining |          0  (omitted)
              Retail Trade |          0  (omitted)
                  Services |          0  (omitted)
            Transportation |          0  (omitted)
           Wholesale Trade |          0  (omitted)
                           |
                      year |
                      2012 |   .0907854   .0289685     3.13   0.002     .0338106    .1477601
                      2013 |   .1000274   .0357862     2.80   0.005     .0296437    .1704111
                      2014 |    .073067   .0475941     1.54   0.126    -.0205403    .1666742
                      2015 |  -.0145752   .0505702    -0.29   0.773    -.1140358    .0848854
                      2016 |   .0563425   .0520712     1.08   0.280    -.0460703    .1587553
                      2017 |   .0040203   .0547841     0.07   0.942    -.1037282    .1117688
                      2018 |  -.1512117   .0597149    -2.53   0.012     -.268658   -.0337654
                      2019 |  -.1387516   .0657292    -2.11   0.035    -.2680267   -.0094765
                           |
                     _cons |   .1717959   .6641346     0.26   0.796    -1.134414    1.478006
          -----------------+----------------------------------------------------------------
                   sigma_u |  .98210691
                   sigma_e |  .49521909
                       rho |  .79728316   (fraction of variance due to u_i)
          ----------------------------------------------------------------------------------

          On a side note, the collinearity problem does not arise when I use a random-effects regression.

          Many thanks.



          • #6
            Wojciech:
            1) a visual inspection is the way to go;
            2) 350 clusters are actually enough;
            3) try:
            Code:
            quietly xi: xtreg ln_cash lit_risk size mtb lev nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.industry_numeric i.year, re vce(cluster id)
            ;
            4) it seems that there's collinearity between -id- and -i.industry-. What you experienced with the -fe- specification is actually expected, because this estimator wipes out all time-invariant predictors (usually, the firm belongs to the same industry for the whole time span covered by the panel dataset). It is also expected that this nuisance does not crop up when you switch to the -re- estimator, as it gives back coefficients for time-invariant regressors, too.
            Kind regards,
            Carlo
            (Stata 18.0 SE)
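Point 4) can be made concrete with a minimal Python sketch (not Stata) of the within transformation that -xtreg, fe- applies. The firms, industries and values below are made up for illustration. Subtracting each firm's own mean leaves variation in a time-varying regressor such as -size-, but turns an industry dummy that never changes within a firm into a column of zeros, which is why Stata reports such dummies as omitted.

```python
from collections import defaultdict


def within_transform(ids, values):
    """Subtract each panel's (firm's) own mean from its observations."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i, v in zip(ids, values):
        sums[i] += v
        counts[i] += 1
    return [v - sums[i] / counts[i] for i, v in zip(ids, values)]


# Hypothetical toy panel: 3 firms observed for 3 years each.
firm = [1, 1, 1, 2, 2, 2, 3, 3, 3]
industry = ["Mining"] * 3 + ["Services"] * 3 + ["Mining"] * 3
size = [5.0, 5.5, 6.0, 7.0, 7.2, 7.7, 4.0, 4.4, 4.3]

# Industry dummy (1 = Services); it never changes within a firm.
services = [1.0 if ind == "Services" else 0.0 for ind in industry]

print([round(v, 3) for v in within_transform(firm, size)])  # varies within firms
print(within_transform(firm, services))  # all zeros: nothing left to estimate
```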



            • #7
              Hi Carlo,

              I took a break from my analysis to focus on other parts of my thesis; now I'm back at it. As for point 4), does it mean that I cannot use i.industry in my regression command? I followed other researchers in this field, who always state that they included industry and year fixed effects.

              I did check the correlation between the -id- and -industry_numeric- variables as per your suggestion, but it does not show any significant correlation:

              . correlate id industry_numeric
              (obs=3,510)

                           |       id indust~c
              -------------+------------------
                        id |   1.0000
              industry_n~c |   0.1167   1.0000

              So, in the end, my question is whether I can safely use the regression results I reported in my previous post, or whether their validity is undermined by the fact that the industry variables are omitted.

              Thank you.



              • #8
                Dear Wojciech:

                As Carlo points out in post #6, point 4): "usually, the firm belongs to the same industry".

                As a result, you will not be able to include both industry and firm level fixed effects. I recommend firm-level fixed effects since there are likely unobserved differences between firms you will want to consider.



                • #9
                  Wojciech:
                  as Chris helpfully replied, the main issue does not rest on the correlation between -id- and -industry- (by the way: your -correlate- output refers to the variables, not to the regression coefficients) but on the way the -fe- estimator works: if, as expected, firms do not migrate to different industries as time goes by, there's no way you can get a coefficient for -industry- (a time-invariant predictor) under the -fe- assumptions, as the mean of a constant is the constant itself and (constant - its mean) = 0.
                  Instead, it usually makes sense to include -i.timevar- among the set of predictors of your -fe- panel data regression.
                  Kind regards,
                  Carlo
                  (Stata 18.0 SE)
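The difference between correlation and collinearity can be illustrated with a toy Python example (hypothetical firms and industry codes, not the thread's data). The pairwise correlation between a numeric firm id and a numeric industry code can be exactly zero, yet each industry dummy is still an exact sum of firm dummies. Because the -fe- estimator is equivalent to controlling for every firm dummy, industry dummies add nothing and are dropped, whatever -correlate- shows.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical panel: 6 firms observed in 2 years; no firm switches industry.
firm_industry = {1: 2, 2: 1, 3: 3, 4: 3, 5: 1, 6: 2}
rows = [(f, yr) for f in firm_industry for yr in (2010, 2011)]
ids = [f for f, _ in rows]
codes = [firm_industry[f] for f in ids]

# 1) The numeric firm id and industry code are uncorrelated here ...
print(pearson(ids, codes))  # 0.0

# 2) ... yet every industry dummy is an exact sum of firm dummies.
firm_dummy = {f: [1 if i == f else 0 for i in ids] for f in firm_industry}


def industry_dummy_from_firms(industry):
    """Add up the dummies of all firms that belong to `industry`."""
    members = [f for f, ind in firm_industry.items() if ind == industry]
    return [sum(firm_dummy[f][r] for f in members) for r in range(len(rows))]


for ind in sorted(set(codes)):
    direct = [1 if c == ind else 0 for c in codes]
    print(ind, industry_dummy_from_firms(ind) == direct)  # True for each industry
```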



                  • #10
                    Dear Chris and Carlo,

                    Thank you for your valuable replies. I think I have only now understood it, but please do let me know whether I understand correctly. Let's use the example below:

                    xtreg ln_cash lit_risk size mtb lev nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.industry_numeric i.year, fe vce(cluster id)

                    So in this specification the -fe- option corresponds to firm-level fixed effects, since each -id- is a single firm in my sample? That is why I get a collinearity warning when I use both -i.industry_numeric- and -fe- at once, that is, both industry and firm-level fixed effects, correct?

                    @Carlo, by -i.timevar- do you mean, in my case, -i.year-, which I used?

                    Chris, I see your point, but another study similar to mine (Malm and Kanuri, 2016) used industry-level fixed effects, so I think I would stick to that, or analyze firm-specific effects as well in a separate specification to show how the results change. What do you think? Also, when I changed my specification to include only industry fixed effects and not firm-specific ones, I got a higher R^2 value: it increased from around 0.51 to 0.59. This is my new specification:

                    . xtreg ln_cash lit_risk size mtb lev nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.industry_numeric i.year, vce(cluster id)

                    Random-effects GLS regression                   Number of obs     =      3,083
                    Group variable: id                              Number of groups  =        351

                    R-sq:                                           Obs per group:
                         within  = 0.2662                                         min =          1
                         between = 0.6368                                         avg =        8.8
                         overall = 0.5912                                         max =          9

                                                                    Wald chi2(30)     =    1020.78
                    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                                                           (Std. Err. adjusted for 351 clusters in id)
                    ----------------------------------------------------------------------------------
                                     |                Robust
                             ln_cash |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
                    -----------------+----------------------------------------------------------------
                            lit_risk |   .0011825   .0006735     1.76   0.079    -.0001376    .0025026
                                size |   .8180861   .0380493    21.50   0.000      .743511    .8926613
                                 mtb |   .0756164   .0134392     5.63   0.000      .049276    .1019567
                                 lev |   -.897595   .2408011    -3.73   0.000    -1.369556   -.4256336
                                 nwc |  -1.491774   .2565722    -5.81   0.000    -1.994646   -.9889015
                                  rd |   .1128436   .2041966     0.55   0.581    -.2873743    .5130615
                              growth |  -.2792344   .1605548    -1.74   0.082     -.593916    .0354473
                                  cf |    .168324   .2913766     0.58   0.563    -.4027636    .7394116
                           cf_vol_5y |   2.742195    .612251     4.48   0.000     1.542205    3.942185
                      industry_sigma |   5.698774   2.152072     2.65   0.008     1.480791    9.916758
                                 acq |  -2.519466   .2470148   -10.20   0.000    -3.003606   -2.035326
                               capex |  -3.633231    .783425    -4.64   0.000    -5.168716   -2.097747
                                 ndi |   1.624915   .2782505     5.84   0.000     1.079554    2.170276
                                 nei |   .7045048   .2445688     2.88   0.004     .2251587    1.183851
                                 div |  -.0117488    .064654    -0.18   0.856    -.1384682    .1149706
                                     |
                    industry_numeric |
                        Construction |   .5622842   .3687984     1.52   0.127    -.1605475    1.285116
                       Manufacturing |   .9820577   .3352041     2.93   0.003     .3250697    1.639046
                              Mining |  -.0088663   .5076044    -0.02   0.986    -1.003753      .98602
                        Retail Trade |   .8018587    .361213     2.22   0.026     .0938942    1.509823
                            Services |   .9884298   .3545113     2.79   0.005     .2936004    1.683259
                      Transportation |   .2973984   .4240756     0.70   0.483    -.5337746    1.128571
                     Wholesale Trade |   .4882454   .3743174     1.30   0.192    -.2454033    1.221894
                                     |
                                year |
                                2011 |  -.0461177   .0306081    -1.51   0.132    -.1061085    .0138731
                                2012 |   .0350944   .0345209     1.02   0.309    -.0325653    .1027541
                                2013 |   .0290506   .0385723     0.75   0.451    -.0465496    .1046509
                                2014 |   -.002016   .0440031    -0.05   0.963    -.0882605    .0842285
                                2015 |  -.0938221   .0461756    -2.03   0.042    -.1843246   -.0033196
                                2016 |  -.0285547   .0480308    -0.59   0.552    -.1226933    .0655839
                                2017 |  -.0947244   .0495961    -1.91   0.056    -.1919309    .0024821
                                2018 |  -.2546612   .0531868    -4.79   0.000    -.3589053    -.150417
                                     |
                               _cons |  -1.452525   .5727337    -2.54   0.011    -2.575062   -.3299872
                    -----------------+----------------------------------------------------------------
                             sigma_u |  .79412345
                             sigma_e |  .48230923
                                 rho |  .73052876   (fraction of variance due to u_i)
                    ----------------------------------------------------------------------------------

                    So it seems to me that all is fine with this specification and I can safely include it in my thesis. Or would you suggest that I check for any other potential issues?

                    I really appreciate your insights.



                    • #11
                      Wojciech:
                      1) the -fe- option in your first command means firm-wise fixed effects (as panels are composed of firms in your example).
                      You get a warning message stating that -i.industry- is perfectly collinear with -firm- because firms belong to the same industry across all the data waves: hence, since -industry- is a time-invariant predictor, the -fe- machinery wipes it out and you do not get any coefficient for that regressor.
                      2) Yes, by -i.timevar- I meant -i.year-.
                      3) Since you used clustered standard errors, you should use the community-contributed module -xtoverid- to compare which specification (-fe- or -re-) fits your data better.
                      4) For the -fe- specification you should look at the within R-squared, whereas for its -re- counterpart it is the between R-squared that is informative.
                      5) Hunting for the model with the highest possible R-squared is not the way to go, methodologically speaking. Your research efforts should rather aim at giving the fairest and truest view of the data-generating process you're investigating.
                      6) If you had used CODE delimiters, your Stata output would have been more readable.
                      Kind regards,
                      Carlo
                      (Stata 18.0 SE)
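To see what the three R-squared figures in -xtreg- output measure, here is a minimal Python sketch. It assumes, as we read the Methods and formulas section of Stata's [XT] xtreg manual, that each figure is a squared correlation between the outcome and the linear prediction xb: computed on firm-demeaned data (within), on firm means (between) and on the raw data (overall). The outcome and fitted values below are hypothetical.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def panel_means(ids, values):
    """Mean of `values` within each panel (firm)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i, v in zip(ids, values):
        sums[i] += v
        counts[i] += 1
    return {i: sums[i] / counts[i] for i in sums}


def xtreg_style_r2(ids, y, xb):
    """Within, between and overall R-squared as squared correlations (assumed)."""
    y_bar, xb_bar = panel_means(ids, y), panel_means(ids, xb)
    within = pearson([v - y_bar[i] for i, v in zip(ids, y)],
                     [v - xb_bar[i] for i, v in zip(ids, xb)]) ** 2
    panels = sorted(y_bar)
    between = pearson([y_bar[p] for p in panels],
                      [xb_bar[p] for p in panels]) ** 2
    overall = pearson(y, xb) ** 2
    return within, between, overall


# Hypothetical outcome and fitted values for 3 firms x 3 years.
ids = [1, 1, 1, 2, 2, 2, 3, 3, 3]
y = [1.0, 1.4, 1.1, 3.0, 3.5, 3.2, 5.1, 5.0, 5.6]
xb = [1.2, 1.3, 1.2, 2.9, 3.3, 3.4, 5.0, 5.2, 5.4]

w, b, o = xtreg_style_r2(ids, y, xb)
print(f"within = {w:.4f}  between = {b:.4f}  overall = {o:.4f}")
```

The same fitted model yields three different numbers. Note that the 0.51 and 0.59 quoted in #10 are both overall R-squared figures.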



                      • #12
                        Thank you Carlo.

                        Sorry if I'm asking naive questions, but could you confirm that if I use this specification:

                        . xtreg ln_cash lit_risk_L1 size mtb lev nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.industry_numeric i.year, vce(cluster id)

                        I will get a regression with industry and year fixed effects? I'm asking because, as you can see in my previous post, to perform this regression I have to use random-effects GLS regression rather than FE regression (for the reasons you explained: it is impossible to use both firm and industry fixed effects at once). So in fact I'm estimating an RE regression with fixed effects obtained through -i.industry_numeric- and -i.year-.

                        On a side note, I tried to change my panel variable to -industry_numeric- so that I could get industry fixed effects by running the -fe- specification, but I cannot because of this error:

                        Code:
                        . xtset industry_numeric year
                        repeated time values within panel
                        r(451);
                        So, going back to my first question: isn't it a problem that I'm using an RE regression to obtain year and industry fixed effects?

                        I hope it is clear what I mean. Thank you.



                        • #13
                          Also, if you allow me, I have a question about the regression itself. Other papers examining cash holdings and litigation risk report "OLS regression estimates" when dealing with panel data. When they say "OLS estimates", does it mean they are referring to the simple -regress- command? I think it would not make sense to use -regress- with panel data. I'm trying to run the same analysis, so in order to follow it I need to know whether "OLS estimates" could refer to fixed-effects or random-effects regressions for panel data of the -xtreg- kind. I know that GLS and OLS are not the same, but then I don't understand why they would use OLS regression, i.e. the -regress- command. I know it's maybe a long shot, and I'm not sure whether it is allowed on this forum (hopefully it is), but I'm attaching the two similar studies (which are publicly available) that I'm referring to. If by any chance you could tell me whether I need to run -regress- or -xtreg- to follow the work of these researchers (from my limited understanding, there is evidence pointing to both), I would be massively grateful.



                          • #14
                            Wojciech:
                            # 12: Stata threw an error message saying that there are repeated time values within panel (this is pretty frequent with financial data: e.g., multiple transactions per day made by the same broker). The usual fix is to -xtset- your dataset with the panel variable only. However, this fix comes at the cost that you cannot use time-series commands and operators, such as lags and leads.
                            You can still include -i.year- as a predictor.
                            That said, while the reason why you prefer going -re- is clear, you should check whether this specification is the right one for your dataset. As you invoked non-default standard errors, you should switch from -hausman- to the community-contributed module -xtoverid-.
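The r(451) error in #12 can be reproduced with a minimal Python sketch (hypothetical firms, industries and years). Once the industry code is used as the panel variable, every firm in the same industry contributes a row for the same year, so (panel, year) pairs repeat; with the firm id as panel variable they do not. In Stata itself, -duplicates report industry_numeric year- would flag the same clashes.

```python
from collections import Counter


def repeated_time_values(panel, time):
    """(panel, time) pairs that occur more than once, with their counts."""
    counts = Counter(zip(panel, time))
    return {pair: n for pair, n in counts.items() if n > 1}


# Hypothetical rows: firms 1 and 2 are both in industry 2, firm 3 in industry 5.
firm = [1, 1, 2, 2, 3, 3]
industry = [2, 2, 2, 2, 5, 5]
year = [2010, 2011, 2010, 2011, 2010, 2011]

print(repeated_time_values(firm, year))      # {}: one row per firm-year
print(repeated_time_values(industry, year))  # {(2, 2010): 2, (2, 2011): 2}
```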

                            # 13: I took a look at the first paper you attached (by the way, are you sure that you are not breaching any copyright by distributing those articles?). In all likelihood, the authors used pooled OLS (as the description of Table 5 reports clustered standard errors).
                            Even though pooled OLS may be the usual approach in your research field, methodologically speaking it is not the first choice when you have a panel dataset. Besides, the authors mention multivariate pooled OLS, whereas they should have said multiple pooled OLS, as they have only one regressand and simply increased the number of predictors.
                            Kind regards,
                            Carlo
                            (Stata 18.0 SE)
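As a rough illustration of what "pooled OLS with clustered standard errors" computes (roughly what -regress y x, vce(cluster id)- reports), here is a minimal Python sketch with one regressor, a constant and hypothetical data. All firm-years are stacked into a single OLS fit with no firm effects, and only the sandwich variance groups the residual scores by firm. The finite-sample factor G/(G-1) * (N-1)/(N-K) is our understanding of the adjustment Stata applies for -regress- with clustering; treat the numbers as illustrative, not as a reproduction of any paper's results.

```python
import math
from collections import defaultdict


def pooled_ols_cluster(y, x, cluster):
    """Pooled OLS of y on a constant and x, with cluster-robust standard errors."""
    n, k = len(y), 2
    # (X'X)^-1 for the design matrix [1, x], written out for the 2x2 case.
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy

    # Per-cluster score sums s_g = sum over i in g of [1, x_i] * u_i.
    scores = defaultdict(lambda: [0.0, 0.0])
    for yi, xi, g in zip(y, x, cluster):
        u = yi - b0 - b1 * xi
        scores[g][0] += u
        scores[g][1] += xi * u

    n_clusters = len(scores)
    if n_clusters < 2:
        raise ValueError("need at least two clusters")
    # Finite-sample adjustment G/(G-1) * (N-1)/(N-K).
    q = n_clusters / (n_clusters - 1) * (n - 1) / (n - k)
    # Sandwich V = q * inv * (sum_g s_g s_g') * inv; inv is symmetric, so each
    # diagonal element equals q * sum_g (row_j(inv) . s_g)^2.
    ses = []
    for j in range(k):
        total = sum((inv[j][0] * s[0] + inv[j][1] * s[1]) ** 2
                    for s in scores.values())
        ses.append(math.sqrt(q * total))
    return (b0, b1), tuple(ses)


# Hypothetical firm-year data: 4 firms x 3 years.
firm = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
x = [0.5, 0.7, 0.6, 1.2, 1.1, 1.4, 2.0, 2.2, 1.9, 2.8, 3.1, 3.0]
y = [1.9, 2.6, 2.1, 3.5, 3.1, 4.0, 5.2, 5.9, 5.0, 6.9, 7.6, 7.1]

(b0, b1), (se0, se1) = pooled_ols_cluster(y, x, firm)
print(f"_cons = {b0:.4f} (se {se0:.4f}),  x = {b1:.4f} (se {se1:.4f})")
```

The point estimates are ordinary OLS; clustering only changes the standard errors.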



                            • #15
                              Thank you for your reply.

                              Indeed, I checked with -xtoverid- and, if I'm not mistaken, it indicates that I should be using FE:

                              Code:
                              . xtoverid
                              
                              Test of overidentifying restrictions: fixed vs random effects
                              Cross-section time-series model: xtreg re  robust cluster(id)
                              Sargan-Hansen statistic 137.090  Chi-sq(15)   P-value = 0.0000
                              The paper is available for everyone to download here: https://papers.ssrn.com/sol3/papers....act_id=1571614 so I'm pretty sure I'm not doing anything illegal. As for the research itself, could you have a look at Table 4 (which is my main interest in that paper)? I'm particularly interested in the different specifications (1-6), which differ from each other in the fixed effects used, as reported in Table 4. Referring to this table, can I assume that the authors used the commands and specifications I type below (adapted to my particular case, i.e. my research)?

                              Specification 1 (no FE):
                              Code:
                              xtreg ln_cash lit_risk size lev mtb nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div, re

                              Specification 2 (Year FE):
                              Code:
                              xtreg ln_cash lit_risk size lev mtb nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.year, fe vce(cluster id)

                              Specification 3 (Industry FE):
                              Code:
                              xtreg ln_cash lit_risk size lev mtb nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.industry_numeric, re vce(cluster id)

                              Specification 4 (Industry and Year FE):
                              Code:
                              xtreg ln_cash lit_risk size lev mtb nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.industry_numeric i.year, re vce(cluster id)

                              Specification 5 (Firm and Year FE):
                              Code:
                              xtreg ln_cash lit_risk size lev mtb nwc rd growth cf cf_vol_5y industry_sigma acq capex ndi nei div i.year, fe vce(cluster id)
                              They also explain:
                              Specification 1 is estimated via ordinary least squares (OLS) without industry and time effects. Specification 2 adds time effects in the form of calendar year dummies. Specifications 3 and 4 include industry-fixed effects, with specification 4 including both industry and year dummies.
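The quoted description can be read as adding dummy variables to one pooled regression. Below is a minimal Python sketch (hypothetical data; -lit_risk- stands in for the full list of controls) that lists the regressor columns each of the four specifications would contain, following the convention of Stata's i. factor-variable notation of omitting the lowest level as the base category. It only illustrates the dummy sets; it does not establish which estimator the authors actually used.

```python
def factor_dummies(values, prefix):
    """0/1 columns for every level except the lowest, which serves as base."""
    levels = sorted(set(values))
    return {f"{prefix}{lvl}": [1 if v == lvl else 0 for v in values]
            for lvl in levels[1:]}


def build_design(rows, industry_fe, year_fe):
    """Regressor columns for one specification (constant + lit_risk + dummies)."""
    cols = {"_cons": [1] * len(rows), "lit_risk": [r["lit_risk"] for r in rows]}
    if year_fe:
        cols.update(factor_dummies([r["year"] for r in rows], "year_"))
    if industry_fe:
        cols.update(factor_dummies([r["industry"] for r in rows], "ind_"))
    return cols


# Hypothetical firm-years: (firm, industry code, year, lit_risk).
data = [(1, 1, 2010, 0.2), (1, 1, 2011, 0.3), (2, 2, 2010, 0.5),
        (2, 2, 2011, 0.4), (3, 3, 2010, 0.9), (3, 3, 2011, 0.8)]
rows = [dict(zip(("firm", "industry", "year", "lit_risk"), d)) for d in data]

specs = {1: (False, False), 2: (False, True), 3: (True, False), 4: (True, True)}
for spec, (industry_fe, year_fe) in specs.items():
    print(spec, list(build_design(rows, industry_fe, year_fe)))
```

No firm dummies appear in any of these four designs, which is why industry dummies can be estimated alongside year dummies in a pooled regression.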
                              I would like to estimate regressions with these different specifications in order to see how the results change across specifications and whether the signs of the coefficients etc. are consistent in all of them. Please let me know whether the specifications I wrote will give me the intended results in terms of including or not including fixed effects. Moreover, with these specifications, will my estimates be OLS in all of them?

                              Thank you so much.
                              Last edited by Wojciech Gulkowski; 13 Oct 2020, 03:47.
