Correct modeling of fixed effects regression with Stata (how to cluster?)

Hannes Schneider

Join Date: Mar 2019

Posts: 5
#1

14 Mar 2019, 09:04

Dear forum users,

I'm currently looking for some guidance on the fixed effects modeling for my master's thesis hypothesis. Since my professor is currently not available, I'd be thankful for any insightful suggestions regarding the correct specification of fixed effects models.

My research focuses on the local influence of religiosity on the Corporate Social Responsibility (CSR) strategies of firms. I'm trying to support the hypothesis that the level of religiosity in US states influences the CSR activities of firms. The study design is similar to McGuire et al. (2012), "Does Local Religiosity Impact Corporate Social Responsibility", and Kim et al. (2018), "Local Religiosity and Corporate Social Responsibility"; both are available on SSRN.

The dependent variable is a percentage score (ES_SCORE) which captures CSR performance and is measured at the firm level. The independent variable is a religiosity ratio (REL), defined as the share of religious adherents in a state's total population, and is therefore measured at the state level. In addition, I've collected firm-level and state-level control variables. All variables are observed yearly over an 8-year period. In a last step, I've matched firm-level and state-level data by identifying firm headquarters and assigning demographic/religiosity data to firms according to their location.

The final data set is unbalanced and in long format, with observations over an 8-year period nested within 444 firms. The firms are located in 34 states and belong to 22 industry sectors. The data set does not include time-invariant variables except state_id and firm_id. Thanks a lot if you've made it this far!

My professor hasn’t told me much concerning the empirical modeling but insists on fixed effects. Therefore I’ve specified:

xtset firm_id yearly
xtreg es_score rel firmlevelcontrols democontrols, fe

Here comes the question: what is the correct approach for clustering standard errors? If I don't use the vce(cluster) or robust option, the results are significant at the 5% level. If I cluster on firm or industry (using vce(cluster comp_ID) or vce(cluster industry)), I get similar results (P>|t| = 0.13). By contrast, if I cluster on state (vce(cluster state)), I get significant results at the 10% level.

1. Do you agree with this approach to FE modeling? Is it reasonable to cluster on state, or should I cluster at the firm level? Theoretically, clustering at the industry level is also possible. I've read that in nested data sets, clustering at the highest level of aggregation is plausible.
2. I'm somewhat confused regarding the xtset and vce(cluster) commands. By using 'xtset firm_id yearly' I include firm fixed effects, but not time fixed effects, right? And by using vce(cluster varname) I account for cross-sectional dependence of the errors within state, industry or firm?
3. Which additional tests should I run? (I ran a Hausman test beforehand, which favored FE over RE. Be aware that I'm a Stata beginner, so I'd appreciate it if you could add code.)
4. What are the limitations of this model? I'm aware that xtreg, fe with the vce(cluster state) option should account for autocorrelation and heteroskedasticity, but since the number of clusters is limited (34 for states, 22 for industries) my standard errors might be biased. Also, the firms are not evenly distributed across states or industries (some states contain up to 40 firms, while others include only 4-5), which further increases the bias.
5. What is the correct interpretation of the coefficients, given that FE regression relies on within estimation? Is 'A one standard deviation increase in REL leads to a 0.38 standard deviation increase in ES_SCORE' applicable?
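
Regarding point 2, a minimal sketch of how year fixed effects could be added next to the firm fixed effects, reusing the placeholder names from the command above (firmlevelcontrols and democontrols stand in for the actual variable lists). Note that -xtset- only declares the panel structure; the firm fixed effects come from the fe option of -xtreg-. Clustering on firm_id here is just one of the choices under discussion, not a recommendation.

```stata
* Declare the panel: firm_id is the panel identifier, yearly the time variable
xtset firm_id yearly

* fe absorbs firm fixed effects; i.yearly adds year dummies (time fixed effects)
xtreg es_score rel firmlevelcontrols democontrols i.yearly, fe vce(cluster firm_id)

* Joint test of whether the year dummies are needed
testparm i.yearly
```

If -testparm- rejects the null that all year dummies are zero, keeping the year fixed effects in the specification would be the usual choice.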

Again, thanks a lot. Have a wonderful day!

Last edited by Hannes Schneider; 14 Mar 2019, 09:12.
Hannes Schneider

Join Date: Mar 2019

Posts: 5
#2

14 Mar 2019, 09:10

These are the Stata outputs.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17743
#3

14 Mar 2019, 09:47

Hannes:
unfortunately no Stata output is reported in your last post (hint: please use CODE delimiters, not screenshots. Thanks).
If you have detected/suspect autocorrelation and/or heteroskedasticity, you should cluster your standard errors on -panelid-. With 444 firms, you have enough clusters to give reasonable standard errors. Obviously, clustering should not be a way to get a significant p-value.
Besides, you cannot run -hausman- and then invoke non-default standard errors; instead, invoke them first and then test the -re- specification with the user-written command -xtoverid- (type -search xtoverid- from within Stata to spot and install it). If the -xtoverid- outcome reaches statistical significance, go -fe-.
An example/excerpt of your data, along with the code you typed and the Stata output, would make further replies easier.
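
The sequence described above could be sketched as follows. This assumes -xtoverid- is already installed and reuses the placeholder names from #1 (firm_id, yearly, firmlevelcontrols and democontrols are stand-ins, not actual variable lists).

```stata
* Declare the panel structure
xtset firm_id yearly

* Fit the random-effects model with cluster-robust standard errors first
xtreg es_score rel firmlevelcontrols democontrols, re vce(cluster firm_id)

* Then test the RE specification; a significant statistic points to FE
xtoverid
```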

Kind regards,
Carlo
(Stata 19.0)

Hannes Schneider

Join Date: Mar 2019
Posts: 5
#4

15 Mar 2019, 03:28

Hi Carlo,
thank you for pointing that out. I've followed your advice and will present an example of my data as well as some output from my regressions.
ES_SCORE is the dependent variable; REL_linear is the independent variable.
lnTA, ROA and LEV are firm controls, while age, mf, edu, lnincome, married, minority and lnTP are demographic controls (shown to be correlated with the religiosity ratio; I therefore included them following McGuire et al. (2012): Does Local Religiosity Impact Corporate Social Responsibility).

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float comp_ID str14 state int year double(ES_SCORE REL_linear lnTA ROA LEV age mf edu lnincome married minority lnTP)
43 "New York"       2009 .7824000000000001  .5219489020237461 18.61220152450666  .0148791584440463 .4515106226622299 37.7              .943               .318   10.3298657822245 .461   .329104521564572  16.77598170086138
43 "New York"       2010 .7595500000000002  .5113845603636348 18.78285553596167  .0278881965957743 .4968359497371994 37.7              .937               .321 10.340063655669345 .463  .3362159844807151 16.781050856365052
43 "New York"       2011 .7800499999999999  .5003347319205022 18.82922111857746  .0324134997540907 .4392138878919594 37.8              .938               .325 10.367095774693004 .459  .3384877918075469 16.787276520107326
43 "New York"       2012 .7665000000000001  .4899735949452059 18.83068221386169  .0294195723444074 .4168978378306633   38 .9390000000000001 .32699999999999996 10.376735911946753 .454 .33971618390952735 16.792335662179912
43 "New York"       2013 .6615000000000001 .48000687786448754 18.83233996222386  .0351946571966183 .4027840351946572 38.1               .94 .33199999999999996 10.385357991846698  .45  .3442145408030655 16.796764220462435
43 "New York"       2014             .6241  .4707469461776775 18.87171177633134  .0371927410776277 .3954443828705739 38.1               .94               .332 10.399067550086553 .447  .3501427198582447 16.799857261201506
43 "New York"       2015 .6872999999999999 .46190030882450367 18.88411911900268  .0318521827206784 .3352060042905765 38.1 .9420000000000001  .3420805393354096 10.411388904784872 .445  .3542151866292648   16.8021691398047
43 "New York"       2016 .6972499999999999   .453752915238266 18.86893071888587  .0337576729242385 .3501983303205861 38.2 .9420000000000001 .34740944646148875 10.440331738700673 .459  .3569010964207207   16.8030234447167
44 "Wisconsin"      2009             .3391  .5409632890618606 16.94551766022569 -.0070475648878983 .1390588114736719 37.8              .986               .254  10.18289800930364 .528 .12278289537130631 15.550569861281934
44 "Wisconsin"      2010            .35105  .5355406286690064 16.89675957543349 -.0013948206879445 .1450952730418759 38.1 .9840000000000001               .257 10.189568343620936  .53 .12830840729790471 15.554291629605014
44 "Wisconsin"      2011             .1601  .5306905134624134 16.90310238292803  .0052374504412176 .1683780086650301 38.3              .985               .261 10.210678091375952 .526 .12777081579475552  15.55699586243852
44 "Wisconsin"      2012            .27325  .5258799438916637 16.97198892797578  .0073911341387324 .1422991616688455 38.5              .985               .264 10.219246747559799 .521 .12754335642780767  15.55966728276096
44 "Wisconsin"      2013            .30495  .5210645438566919 17.00297496956638  .0075163083049764 .1580140191159256 38.7              .986               .268  10.22277729773385 .515 .12966317269130492   15.5623899835283
44 "Wisconsin"      2014             .2919  .5163649363132191  17.1047245782985  .0068514856623578 .1863562417608917 38.8              .986               .266 10.236632832397653 .512 .13266984494536999  15.56493160569988
44 "Wisconsin"      2015             .3396   .512233443014134 17.13748509854819  .0064684057067826 .1406217227834682   39              .986  .2782501647896695 10.252029513244672 .509 .13496642440410045 15.566403587239947
44 "Wisconsin"      2016 .3653500000000001  .5077003185737816 17.18759885124845  .0064965494212887 .1322553395644338 39.1              .987 .28373438677320306 10.283737411503228 .526 .13790318965148735 15.568688056666202
64 "North Carolina" 2009 .8033500000000001 .47354802326371387 21.52225796580774 -.0009940183484093 .3439618332936775 36.6              .958               .258  10.10834492616302 .517  .2951992133283144 16.061479372489373
64 "North Carolina" 2010             .7338  .4769370374505692 21.54080041847386 -.0015890263140815 .3327957105561415 37.1              .951               .261 10.116378727385987 .514   .303893528956083 16.074587447640866
64 "North Carolina" 2011 .8201500000000002  .4820274885283361 21.46379600158594  .0000400563268253 .2970019269954369 37.3               .95               .266 10.136819030237827  .51  .3034152353351872 16.083808507710383
64 "North Carolina" 2012            .79705  .4868423202610192  21.5011842205122  .0012669182741247 .2754215098652418 37.4               .95               .268 10.137966613514083 .504  .3022115202568583  16.09332118251785
64 "North Carolina" 2013            .77065  .4914593395285108 21.45077434680144  .0048697604783187 .2385501447642587 37.6               .95               .273 10.137927063592189 .496  .3029297364729189 16.102962926671708
64 "North Carolina" 2014 .8229500000000001  .4961465060415485 21.45384366791442  .0018248983759416 .2290577378773576 37.8               .95               .272 10.150660081649699 .492   .304141622835154 16.112194272025025
64 "North Carolina" 2015            .90035  .5002865033043481 21.47461211187266  .0067952857086249 .2071621037695084   38               .95  .2836037124403761  10.16277015046621  .49 .30527174652193073 16.122263851926427
64 "North Carolina" 2016            .89495  .5036337333948101 21.49729045362023  .0074817521614258 .1895607790909376 38.3 .9470000000000001  .2902812583773744 10.195373277248482  .51 .30761149876046545 16.133643061175512
end

This is the output if I cluster on state.

Code:

. xtset comp_ID year
       panel variable:  comp_ID (strongly balanced)
        time variable:  year, 2009 to 2016
                delta:  1 unit

. xtreg ES_SCORE REL_linear lnTA ROA LEV lnTP age mf edu lnincome married minori
> ty, fe vce(cluster state)

Fixed-effects (within) regression               Number of obs      =      3499
Group variable: comp_ID                         Number of groups   =       444

R-sq:  within  = 0.3784                         Obs per group: min =         4
       between = 0.0041                                        avg =       7.9
       overall = 0.0087                                        max =         8

                                                F(11,33)           =     80.11
corr(u_i, Xb)  = -0.8066                        Prob > F           =    0.0000

                                 (Std. Err. adjusted for 34 clusters in state)
------------------------------------------------------------------------------
             |               Robust
    ES_SCORE |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  REL_linear |   .3800766   .2094661     1.81   0.079    -.0460854    .8062386
        lnTA |   .0441458   .0098538     4.48   0.000      .024098    .0641936
         ROA |   .0497631   .0282956     1.76   0.088    -.0078048     .107331
         LEV |  -.0040003   .0262363    -0.15   0.880    -.0573785    .0493778
        lnTP |   .4405484   .1561815     2.82   0.008     .1227948     .758302
         age |   .0203106   .0195119     1.04   0.305    -.0193867    .0600079
          mf |  -.3180113   .7557162    -0.42   0.677    -1.855527    1.219505
         edu |    3.91578   1.221147     3.21   0.003     1.431338    6.400221
    lnincome |   .3708798    .248566     1.49   0.145    -.1348315    .8765911
     married |   2.793456    .390572     7.15   0.000     1.998831    3.588081
    minority |   .7728751   .4415952     1.75   0.089    -.1255571    1.671307
       _cons |  -14.57876   3.889656    -3.75   0.001    -22.49233   -6.665198
-------------+----------------------------------------------------------------
     sigma_u |  .47428435
     sigma_e |    .083146
         rho |  .97018334   (fraction of variance due to u_i)
------------------------------------------------------------------------------

And this is the output when clustering on firm.

Code:

. xtreg ES_SCORE REL_linear lnTA ROA LEV lnTP age mf edu lnincome married minority, fe vce(cluster company)

Fixed-effects (within) regression               Number of obs      =      3499
Group variable: comp_ID                         Number of groups   =       444

R-sq:  within  = 0.3784                         Obs per group: min =         4
       between = 0.0041                                        avg =       7.9
       overall = 0.0087                                        max =         8

                                                F(11,443)          =     74.77
corr(u_i, Xb)  = -0.8066                        Prob > F           =    0.0000

                              (Std. Err. adjusted for 444 clusters in company)
------------------------------------------------------------------------------
             |               Robust
    ES_SCORE |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  REL_linear |   .3800766   .2534994     1.50   0.135    -.1181343    .8782875
        lnTA |   .0441458   .0097148     4.54   0.000      .025053    .0632387
         ROA |   .0497631   .0281476     1.77   0.078    -.0055563    .1050826
         LEV |  -.0040003   .0236451    -0.17   0.866    -.0504708    .0424701
        lnTP |   .4405484   .2124728     2.07   0.039     .0229685    .8581283
         age |   .0203106    .017299     1.17   0.241    -.0136877    .0543089
          mf |  -.3180113   .8920337    -0.36   0.722    -2.071155    1.435132
         edu |    3.91578   .7498717     5.22   0.000     2.442032    5.389528
    lnincome |   .3708798   .2077685     1.79   0.075    -.0374545    .7792141
     married |   2.793456    .302732     9.23   0.000     2.198487    3.388425
    minority |   .7728751   .4093368     1.89   0.060    -.0316083    1.577358
       _cons |  -14.57876   3.468899    -4.20   0.000     -21.3963   -7.761219
-------------+----------------------------------------------------------------
     sigma_u |  .47428435
     sigma_e |    .083146
         rho |  .97018334   (fraction of variance due to u_i)
------------------------------------------------------------------------------
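
To put the two sets of standard errors for REL_linear side by side, a minimal sketch could look like this. It assumes comp_ID identifies the same firms as the company variable used in the log above; rhs, cl_state and cl_firm are hypothetical names introduced only for this illustration.

```stata
* Regressor list shared by both models (run as one do-file so the local persists)
local rhs REL_linear lnTA ROA LEV lnTP age mf edu lnincome married minority

* Same specification, standard errors clustered on state
xtreg ES_SCORE `rhs', fe vce(cluster state)
estimates store cl_state

* Same specification, standard errors clustered on firm
xtreg ES_SCORE `rhs', fe vce(cluster comp_ID)
estimates store cl_firm

* Point estimates are identical; only standard errors and p-values differ
estimates table cl_state cl_firm, keep(REL_linear) se p
```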


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17743
#5

15 Mar 2019, 03:51

Hannes:
thanks for providing further details the right way.
As per your results, I would stay with the -vce(cluster panelid)- model, which makes more sense given the really high number of clusters in your dataset.

Kind regards,
Carlo
(Stata 19.0)

Hannes Schneider

Join Date: Mar 2019
Posts: 5
#6

15 Mar 2019, 04:10

Furthermore, here's my approach to the Hausman test.

Code:

 
xtreg ES_SCORE REL_linear lnTA ROA LEV lnTP age mf edu lnincome married minority, fe
estimates store fixed
xtreg ES_SCORE REL_linear lnTA ROA LEV lnTP age mf edu lnincome married minority, re
estimates store random

. hausman fixed random

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |     fixed        random       Difference          S.E.
-------------+----------------------------------------------------------------
  REL_linear |    .3800766     .3994458       -.0193692        .0924612
        lnTA |    .0441458     .0586081       -.0144622        .0029781
         ROA |    .0497631     .0742549       -.0244917               .
         LEV |   -.0040003    -.0010938       -.0029065        .0025281
        lnTP |    .4405484     .1059638        .3345846        .1445578
         age |    .0203106     .0669563       -.0466457        .0089685
          mf |   -.3180113     2.935166       -3.253177        .4994923
         edu |     3.91578     1.930038        1.985742        .5133419
    lnincome |    .3708798     .3288518         .042028        .1021415
     married |    2.793456     2.670166        .1232898        .1131671
    minority |    .7728751     .3620802        .4107948        .2172554
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                 chi2(11) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =      123.39
                Prob>chi2 =      0.0000
                (V_b-V_B is not positive definite)

I also tried out implementing xtoverid, using the same approach.

Code:

. xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re   
Sargan-Hansen statistic 241.780  Chi-sq(11)   P-value = 0.0000

Therefore, I assume using FE is the appropriate approach?


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17743
#7

15 Mar 2019, 04:26

Hannes:
if, as I think, you have run -xtoverid- on a -re- model with clustered standard errors, then as per its outcome you're right in preferring the -fe- specification for your dataset. Please note that, with default standard errors, -xtoverid- gives the same results as -hausman- (but -hausman- results with default standard errors are no longer valid if you then impose non-default standard errors).

Kind regards,
Carlo
(Stata 19.0)
Hannes Schneider

Join Date: Mar 2019

Posts: 5
#8

15 Mar 2019, 04:52

Dear Carlo,
Your comments are helping a lot. Thanks! Actually, I've tested both approaches (-xtoverid- with and without clustering at the firm level), and in both cases the results point to FE. Regarding my FE regression results, I'm somewhat disappointed that I have to cluster at the firm level, since that way I can only show 'borderline significance' (P>|t| = 0.135), but I understand your reasoning. Is it safe to assume that if my sample included all US states (then clusters > 50), clustering at the state level would be a valid option, or is this approach just flawed in itself?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17743
#9

15 Mar 2019, 06:03

Hannes:
the general rule is that clustering should occur on -panelid-.
Maybe you can add -i.state- as a predictor (in all likelihood, you'll have a share of US states omitted due to collinearity, but you can give it a try notwithstanding).
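
A minimal sketch of that suggestion, using the variable names from the -dataex- excerpt in #4. Since state is stored as a string there, it has to be converted to a numeric variable before it can be used with the i. prefix; state_num is a hypothetical name for the new variable. If a firm's state never changes over time (as the description in #1 suggests), -xtreg, fe- would be expected to drop the state indicators as collinear with the firm effects.

```stata
* Factor-variable notation needs a numeric variable, so encode the string first
encode state, generate(state_num)

xtset comp_ID year

* Firm fixed effects plus state indicators, standard errors clustered on firm
xtreg ES_SCORE REL_linear lnTA ROA LEV lnTP age mf edu lnincome married minority i.state_num, fe vce(cluster comp_ID)
```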

Kind regards,
Carlo
(Stata 19.0)
