Fixed effect model and Panel data

Thanh Trinh

Join Date: Nov 2019
Posts: 6

25 Nov 2019, 18:32

Hi everyone,

For my master's thesis, I am analyzing the impact of a country's legal system (i.e. common law versus civil law) on the accuracy of security analysts' earnings forecasts. My data cover 628 firms in 16 countries over 5 years. My model is as follows:

EPA_i,t = β₀ + β₁*LegalSyst_i,t + β₂*LnSize_i,t + β₃*Cover_i,t + β₄*Loss_i,t + β₅*Flev_i,t + β₆*Roe_i,t + ε_i,t

where i and t index firm i in year t, and LegalSyst and Loss are dummy variables.

I ran some diagnostic tests and it seems that a fixed effects model is appropriate. The problem is that my variable of interest (LegalSyst) is omitted from the fixed effects model (collinearity + time-invariant, I suppose). Therefore, I cannot examine the effect of the legal system on my dependent variable. I have seen some threads suggesting "hybrid models", but I don't know how to fit them because I have only basic knowledge of econometrics and Stata/SE 16.0.

(1) Are there alternatives that get around the omitted variable so that I can obtain an estimated coefficient for it?

I tried to run "xtset CountryID Year" but I got the message "repeated time values within panel" because I have multiple firms for every Country and Year. Therefore, I went with the following code:

Code:

. xtset EnterpriseID Year
       panel variable:  EnterpriseID (strongly balanced)
        time variable:  Year, 2014 to 2018
                delta:  1 unit

(2) Is this the right panel variable for my analysis, given that I want to control for country effects in my model? If not, how can I do that?

(3) Furthermore, if I want to analyze, for example, 2 common law and 2 civil law countries jointly, should I use "cluster"? If so, could you suggest the syntax? (Note: CountryID is the variable that identifies the country. It takes a value from 1 to 16 depending on the country.)

Code:

. xtreg EPA LegalSyst LnSize Cover Loss Flev Roe, fe
note: LegalSyst omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =      3,140
Group variable: EnterpriseID                    Number of groups  =        628

R-sq:                                           Obs per group:
     within  = 0.0704                                         min =          5
     between = 0.0447                                         avg =        5.0
     overall = 0.0331                                         max =          5

                                                F(5,2507)         =      37.99
corr(u_i, Xb)  = -0.7049                        Prob > F          =     0.0000

------------------------------------------------------------------------------
         EPA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   LegalSyst |          0  (omitted)
      LnSize |   -.021504    .006088    -3.53   0.000     -.033442    -.009566
       Cover |  -.0022359   .0005592    -4.00   0.000    -.0033324   -.0011394
        Loss |   .0692554   .0056121    12.34   0.000     .0582506    .0802602
        Flev |  -.0004474   .0008064    -0.55   0.579    -.0020287    .0011339
         Roe |  -.0012693   .0010576    -1.20   0.230    -.0033431    .0008044
       _cons |   .2085857   .0481035     4.34   0.000      .114259    .3029124
-------------+----------------------------------------------------------------
     sigma_u |  .07222027
     sigma_e |  .06471257
         rho |  .55466326   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(627, 2507) = 2.30                   Prob > F = 0.0000

Code:

estimates store fixed

Code:

. xtreg EPA LegalSyst LnSize Cover Loss Flev Roe, re

Random-effects GLS regression                   Number of obs     =      3,140
Group variable: EnterpriseID                    Number of groups  =        628

R-sq:                                           Obs per group:
     within  = 0.0601                                         min =          5
     between = 0.2990                                         avg =        5.0
     overall = 0.1551                                         max =          5

                                                Wald chi2(6)      =     411.26
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
         EPA |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   LegalSyst |   .0219064   .0046445     4.72   0.000     .0128033    .0310094
      LnSize |   .0045568   .0013606     3.35   0.001     .0018902    .0072235
       Cover |  -.0009433   .0002839    -3.32   0.001    -.0014997   -.0003869
        Loss |   .0848819   .0044507    19.07   0.000     .0761587    .0936052
        Flev |   .0006721   .0007244     0.93   0.353    -.0007476    .0020919
         Roe |  -.0008226   .0009642    -0.85   0.394    -.0027124    .0010673
       _cons |  -.0183203   .0088933    -2.06   0.039    -.0357508   -.0008897
-------------+----------------------------------------------------------------
     sigma_u |  .03070813
     sigma_e |  .06471257
         rho |  .18379327   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Code:

estimates store random

Code:

. hausman fixed random

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |     fixed        random       Difference          S.E.
-------------+----------------------------------------------------------------
      LnSize |    -.021504     .0045568       -.0260608         .005934
       Cover |   -.0022359    -.0009433       -.0012926        .0004818
        Loss |    .0692554     .0848819       -.0156266        .0034186
        Flev |   -.0004474     .0006721       -.0011196        .0003544
         Roe |   -.0012693    -.0008226       -.0004468        .0004344
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(5) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       58.99
                Prob>chi2 =      0.0000

According to the Hausman test, I should use a fixed effects model.

Code:

. xtreg EPA LegalSyst LnSize Cover Loss Flev Roe i.Year,fe
note: LegalSyst omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =      3,140
Group variable: EnterpriseID                    Number of groups  =        628

R-sq:                                           Obs per group:
     within  = 0.0736                                         min =          5
     between = 0.0412                                         avg =        5.0
     overall = 0.0310                                         max =          5

                                                F(9,2503)         =      22.10
corr(u_i, Xb)  = -0.7333                        Prob > F          =     0.0000

------------------------------------------------------------------------------
         EPA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   LegalSyst |          0  (omitted)
      LnSize |  -.0249718   .0068277    -3.66   0.000    -.0383603   -.0115833
       Cover |   -.001942   .0005912    -3.28   0.001    -.0031013   -.0007827
        Loss |   .0696952   .0056099    12.42   0.000     .0586946    .0806957
        Flev |  -.0004988    .000807    -0.62   0.537    -.0020813    .0010837
         Roe |  -.0012968    .001058    -1.23   0.220    -.0033715     .000778
             |
        Year |
       2015  |   .0057079   .0036651     1.56   0.120    -.0014791    .0128949
       2016  |   .0094027   .0036612     2.57   0.010     .0022234     .016582
       2017  |   .0093605    .003829     2.44   0.015     .0018522    .0168688
       2018  |   .0077649   .0039683     1.96   0.050    -.0000166    .0155465
             |
       _cons |   .2259927   .0527099     4.29   0.000     .1226332    .3293522
-------------+----------------------------------------------------------------
     sigma_u |  .07569337
     sigma_e |  .06465357
         rho |  .57817704   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(627, 2503) = 2.31                   Prob > F = 0.0000

Code:

. testparm i.Year

 ( 1)  2015.Year = 0
 ( 2)  2016.Year = 0
 ( 3)  2017.Year = 0
 ( 4)  2018.Year = 0

       F(  4,  2503) =    2.14
            Prob > F =    0.0729

Prob > F is 0.0729, above 0.05, so the year dummies are not jointly significant at the 5% level; time fixed effects do not appear to be needed in this case.

Code:

. xttest3

Modified Wald test for groupwise heteroskedasticity
in fixed effect regression model

H0: sigma(i)^2 = sigma^2 for all i

chi2 (628)  =   9.4e+08
Prob>chi2 =      0.0000

According to this modified Wald test, heteroskedasticity is present.

I would very much appreciate your help. Thanks in advance.

Thanh


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#2

26 Nov 2019, 00:37

Thanh:
just use cluster or robust standard errors.

Kind regards,
Carlo
(Stata 19.0)

Thanh Trinh

Join Date: Nov 2019
Posts: 6

26 Nov 2019, 05:25

Thank you for your response, Carlo. Since heteroskedasticity is present and my variable (LegalSyst) is omitted, I ran this, as you suggested:

Code:

. regress EPA LegalSyst LnSize Cover Loss Flev Roe, cluster (CountryID)

Linear regression                               Number of obs     =      3,140
                                                F(6, 15)          =      48.31
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1558
                                                Root MSE          =     .07224

                             (Std. Err. adjusted for 16 clusters in CountryID)
------------------------------------------------------------------------------
             |               Robust
         EPA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   LegalSyst |    .021816   .0036234     6.02   0.000     .0140929    .0295392
      LnSize |   .0046782    .001279     3.66   0.002     .0019521    .0074044
       Cover |  -.0008315   .0002366    -3.51   0.003    -.0013358   -.0003271
        Loss |   .0920928   .0261716     3.52   0.003     .0363093    .1478762
        Flev |   .0015158   .0011241     1.35   0.198    -.0008803    .0039118
         Roe |  -.0004429   .0011174    -0.40   0.697    -.0028247    .0019388
       _cons |  -.0221708   .0091211    -2.43   0.028     -.041612   -.0027296
------------------------------------------------------------------------------

(1) As I mentioned in #1: how can I proceed if I want to compare 2 pairs of countries in my sample rather than all 16 countries? (Note: CountryID for UK, Ireland, Germany, France = 14, 9, 1, 7, respectively).

Thanks in advance.

Thanh


Carlo Lazzaro

Join Date: Apr 2014
Posts: 17707

26 Nov 2019, 05:48

Thanh:
I can't follow your last post.
You started with -xtreg-, but now you've seemingly switched to -regress-. Why, if you have panel data?
(1) you can simply -flag- the countries you're interested in and then use the -if- qualifier:

Code:

gen flag==1 if CountryID==14 | CountryID==9 |CountryID==1 | CountryID==7
quietly xtreg EPA LegalSyst LnSize Cover Loss Flev Roe i.Year if flag==1,fe robust
estimates store fe
quietly xtreg EPA LegalSyst LnSize Cover Loss Flev Roe i.Year if flag==1,re robust
estimates store re
xtoverid
*-xtoverid- is the community-contributed programme that replaces the -hausman- test when non-default standard errors are invoked under -xtreg-*
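
Since -xtoverid- is not part of official Stata, it has to be installed from SSC before the code above can run. A minimal sketch, assuming an internet connection and the variable names from #1. The -ivreg2- and -ranktest- lines are a precaution based on the assumption that -xtoverid- relies on them, not a confirmed requirement:

Code:

* one-off installation from SSC
ssc install xtoverid
* assumed dependencies; skip if already installed
ssc install ivreg2
ssc install ranktest

* -xtoverid- is run immediately after the -re- fit with clustered standard errors
xtreg EPA LegalSyst LnSize Cover Loss Flev Roe, re vce(cluster CountryID)
xtoverid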

Kind regards,
Carlo
(Stata 19.0)


Thanh Trinh

Join Date: Nov 2019
Posts: 6

26 Nov 2019, 07:11

Please excuse my last post (#3). Indeed, I should use -xtreg-. But when I ran it, my variable (LegalSyst) was still omitted from the fixed effects model.

(1) How can I fix this problem? For information: among the 16 countries, only 2 are common law (i.e. LegalSyst = 1) and 14 are civil law (LegalSyst = 0).

Code:

. xtreg EPA LegalSyst LnSize Cover Loss Flev Roe, fe robust cluster(CountryID)
note: LegalSyst omitted because of collinearity

Fixed-effects (within) regression               Number of obs      =      3140
Group variable: EnterpriseID                    Number of groups   =       628

R-sq:  within  = 0.0704                         Obs per group: min =         5
       between = 0.0447                                        avg =       5.0
       overall = 0.0331                                        max =         5

                                                F(5,15)            =     16.47
corr(u_i, Xb)  = -0.7049                        Prob > F           =    0.0000

                             (Std. Err. adjusted for 16 clusters in CountryID)
------------------------------------------------------------------------------
             |               Robust
         EPA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   LegalSyst |          0  (omitted)
      LnSize |   -.021504   .0131145    -1.64   0.122    -.0494569    .0064489
       Cover |  -.0022359   .0008509    -2.63   0.019    -.0040496   -.0004222
        Loss |   .0692554   .0134871     5.13   0.000     .0405084    .0980024
        Flev |  -.0004474   .0006848    -0.65   0.523    -.0019072    .0010123
         Roe |  -.0012693    .001407    -0.90   0.381    -.0042683    .0017296
       _cons |   .2085857   .1097119     1.90   0.077    -.0252596    .4424311
-------------+----------------------------------------------------------------
     sigma_u |  .07222027
     sigma_e |  .06471257
         rho |  .55466326   (fraction of variance due to u_i)
------------------------------------------------------------------------------

(2) Furthermore, my overall R-sq (0.0331) is relatively small compared to -xtreg- with random effects (0.1551). Is there a way to improve this coefficient of determination?

Code:

. xtreg EPA LegalSyst LnSize Cover Loss Flev Roe, robust cluster(CountryID)

Random-effects GLS regression                   Number of obs     =      3,140
Group variable: EnterpriseID                    Number of groups  =        628

R-sq:                                           Obs per group:
     within  = 0.0601                                         min =          5
     between = 0.2990                                         avg =        5.0
     overall = 0.1551                                         max =          5

                                                Wald chi2(6)      =     190.65
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                             (Std. Err. adjusted for 16 clusters in CountryID)
------------------------------------------------------------------------------
             |               Robust
         EPA |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   LegalSyst |   .0219064   .0036843     5.95   0.000     .0146853    .0291275
      LnSize |   .0045568   .0013257     3.44   0.001     .0019584    .0071553
       Cover |  -.0009433   .0002518    -3.75   0.000    -.0014367   -.0004499
        Loss |   .0848819   .0219644     3.86   0.000     .0418325    .1279314
        Flev |   .0006721   .0007248     0.93   0.354    -.0007484    .0020927
         Roe |  -.0008226   .0013022    -0.63   0.528    -.0033747    .0017296
       _cons |  -.0183203   .0090572    -2.02   0.043     -.036072   -.0005685
-------------+----------------------------------------------------------------
     sigma_u |  .03070813
     sigma_e |  .06471257
         rho |  .18379327   (fraction of variance due to u_i)
------------------------------------------------------------------------------

(3) I don't know why I got an error message when I typed the -flag- command:

Code:

. gen flag==1 if CountryID==14 | CountryID==9 |CountryID==1 | CountryID==7
== invalid name
r(198);

Thanks in advance.

Thanh


Andrew Musau

Join Date: Oct 2014

Posts: 10191
#6

26 Nov 2019, 07:54

For my master's thesis, I am analyzing the impact of a country's legal system (i.e. common law versus civil law) on the accuracy of security analysts' earnings forecasts. My data cover 628 firms in 16 countries over 5 years.

(3) Furthermore, if I want to analyze, for example, 2 common law and 2 civil law countries jointly, should I use "cluster"? If so, could you suggest the syntax? (Note: CountryID is the variable that identifies the country. It takes a value from 1 to 16 depending on the country.)

This suggests that each of your countries implements either common law or civil law, and therefore you cannot estimate that particular coefficient using fixed effects, as it is time-invariant. Hybrid models do not solve this problem, as some would like to think, and you clearly reject random effects. Therefore, go back to your supervisor and rethink your research question from scratch. If you are interested in a purely descriptive model, just run OLS.
Alfonso Sánchez-Peñalver

Join Date: Mar 2014

Posts: 432
#7

26 Nov 2019, 07:56

Your variable is most likely omitted because it is time-invariant: estimation under the fixed effects assumption, with either the least-squares dummy variable or the within estimator, drops time-invariant variables. So if your variable of interest is time-invariant, you have to estimate using either random effects or, as I would suggest, the correlated random effects model.

Before going there, I noticed you said that you tested for fixed versus random effects. Did you do this under homoskedastic errors using the Hausman test? If so, you should redo the test under cluster-robust standard errors, using either (a) a test of the joint significance of the parameters capturing the between effects, run with -test- after a correlated random effects estimation with cluster-robust standard errors, or (b) the user-written command xtoverid (SSC) after a random effects estimation with cluster-robust standard errors. These tests are asymptotically equivalent and robust to heteroskedasticity and correlation within clusters (panels). The Hausman test is not valid with robust standard errors.

For the correlated random effects model here are the links to two presentations by Jeff Wooldridge:

http://conference.iza.org/conference...linear_iza.pdf

http://conference.iza.org/conference...nonlin_iza.pdf

Another reference is

Schunck, Reinhard (2013) "Within and between estimates in random-effects models: Advantages and drawbacks of correlated random effects and hybrid models", The Stata Journal 13(1), pp. 65-76.

This is available at

https://journals.sagepub.com/doi/pdf...867X1301300105
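
As a rough illustration of the correlated random effects (Mundlak) approach and the cluster-robust test described above, something like the following might work. This is only a sketch: the variable names are those from #1, the m_* panel-mean variables are names made up for this example, and clustering on CountryID is an assumption (EnterpriseID is the other candidate).

Code:

* panel means of the time-varying regressors (the m_* names are hypothetical)
foreach v of varlist LnSize Cover Loss Flev Roe {
    bysort EnterpriseID: egen m_`v' = mean(`v')
}

* correlated random effects: -re- plus the panel means, cluster-robust standard errors
xtreg EPA LegalSyst LnSize Cover Loss Flev Roe m_LnSize m_Cover m_Loss m_Flev m_Roe, re vce(cluster CountryID)

* robust alternative to -hausman-: joint significance of the panel means
test m_LnSize m_Cover m_Loss m_Flev m_Roe

LegalSyst stays in the model because the panel means, rather than firm dummies, pick up the correlation between the firm effect and the time-varying regressors. A rejection in the joint test would point towards fixed effects. The LegalSyst coefficient is still identified from between-firm variation only, so it is credible only if the firm effect is unrelated to LegalSyst once the panel means are controlled for.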

Alfonso Sanchez-Penalver
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#8

26 Nov 2019, 08:20

Thanh:
(3) my typo, sorry. It should have been:

Code:

. gen flag=1 if CountryID==14 | CountryID==9 |CountryID==1 | CountryID==7
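
An alternative that may be a little less error-prone is -inlist()-, which gives a 0/1 indicator rather than 1/missing. A sketch, assuming the country codes given in #3 (14, 9, 1, 7); the name flag4 is made up here so that it does not clash with flag. Clustering on CountryID would leave only 4 clusters in this subsample, which is too few for cluster-robust standard errors to be reliable, so the sketch clusters on EnterpriseID instead; that choice is an assumption of the sketch.

Code:

* 1 for the four countries of interest, 0 otherwise (flag4 is a hypothetical name)
generate byte flag4 = inlist(CountryID, 1, 7, 9, 14)
* check that the right countries were picked up
tabulate CountryID flag4

* restrict the estimation sample with the -if- qualifier
xtreg EPA LegalSyst LnSize Cover Loss Flev Roe i.Year if flag4 == 1, re vce(cluster EnterpriseID)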

That said, along the same lines as Andrew's and Alfonso's comments, you're experiencing the fact that -fe- wipes out time-invariant predictors: this is the main drawback of this specification.
As far as R-sqs are concerned, you should look at the within R-sq for the -fe- specification and the between R-sq for the -re- specification.
That said, your dataset shows limited within-panel variation, as you can see from the non-significant coefficients and the low within R-sq.
What did -xtoverid- give you back?

Kind regards,
Carlo
(Stata 19.0)
Thanh Trinh

Join Date: Nov 2019

Posts: 6
#9

27 Nov 2019, 01:36

Thank you Andrew, Alfonso and Carlo for your time and valuable advice. Indeed, the variable (LegalSyst) is time-invariant and hence omitted.

@Andrew : At the beginning, I ran a purely descriptive model with OLS, but I would like to take the analysis further with a robustness test. You're right, I think it would be wise to contact my supervisor to see what he suggests.

@Carlo : The -flag- command works fine for me, thank you. Concerning -xtoverid-: after typing "xtreg EPA LegalSyst LnSize Cover Loss Flev Roe, fe robust cluster(CountryID)" and "xtreg EPA LegalSyst LnSize Cover Loss Flev Roe, re robust cluster(CountryID)" (in #5), I used -test- and -xtoverid-, which gave me the results shown below.

@Alfonso : I have looked at the documents you linked and tried to use a "correlated random effects model", but I am not sure how to implement it correctly in Stata. The Hausman test I ran for -xtreg, fe- and -xtreg, re- was under "homoskedastic errors". Therefore, I tested again with cluster-robust standard errors:

Code:

. test

 ( 1)  LegalSyst = 0
 ( 2)  LnSize = 0
 ( 3)  Cover = 0
 ( 4)  Loss = 0
 ( 5)  Flev = 0
 ( 6)  Roe = 0

           chi2(  6) =  190.65
         Prob > chi2 =    0.0000

Code:

. xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re robust cluster(CountryID)
Sargan-Hansen statistic   5.818  Chi-sq(5)    P-value = 0.3243

(1) From these results, what can we say?

(2) What code or approach would be appropriate for my research question concerning the impact of the legal system on my dependent variable (robustness test, panel data, ...)?

Thank you in advance.

Thanh
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#10

27 Nov 2019, 01:48

Thanh:
the -xtoverid- outcome (P-value = 0.3243, so the null is not rejected) points you towards the -re- specification, which also allows you to estimate coefficients for time-invariant predictors.
As an aside, please note that -cluster(CountryID)- is enough to invoke cluster-robust standard errors (i.e., -robust- is redundant).
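
To illustrate the point about redundancy, here is a sketch using the variables from #5. The -vce(cluster clustvar)- form is the currently documented syntax; -robust- and -cluster()- are older option names that -xtreg- still accepts. All three lines should report the same standard errors.

Code:

* older option names, with and without the redundant -robust-
xtreg EPA LegalSyst LnSize Cover Loss Flev Roe, re robust cluster(CountryID)
xtreg EPA LegalSyst LnSize Cover Loss Flev Roe, re cluster(CountryID)

* current syntax for the same cluster-robust standard errors
xtreg EPA LegalSyst LnSize Cover Loss Flev Roe, re vce(cluster CountryID)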

Last edited by Carlo Lazzaro; 27 Nov 2019, 01:50.

Kind regards,
Carlo
(Stata 19.0)
Thanh Trinh

Join Date: Nov 2019

Posts: 6
#11

28 Nov 2019, 06:20

Hello,

Thank you Carlo for your reply.

As a reminder, I am interested in the impact of the legal system, which depends on the country. Each firm belongs to one and only one country over the T years, but a country can have multiple firms. Therefore, (1) What is the most appropriate "combination" for my case?

(2)

Code:

xtset EnterpriseID Year

or

Code:

xtset CountryID

Note: "xtset CountryID Year" does not work because of "repeated time values within panel"

(3)

Code:

xtreg EPA LegalSyst LnSize Cover Loss Flev Roe, re robust cluster(CountryID)

or

Code:

xtreg EPA LegalSyst LnSize Cover Loss Flev Roe, re robust cluster(EnterpriseID)

(4) What are the differences between these possible combinations?

(5) Last but not least, some papers in my research field also use dummy variables such as "Year" or "Industry". According to the diagnostic tests that I ran, I should use -xtreg, re cluster(VarID)-. If I decide to add Year dummies to my model afterwards, is the code -i.Year- right? After adding it, can we call it a fixed effects model even if we use -xtreg, re cluster(VarID)-?

I am looking forward to more details. Thank you in advance.

Thanh
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#12

28 Nov 2019, 06:46

Thanh:
usually, it's up to posters (not repliers) to give more details.
That said:
(2) if you do not plan to use time-series related commands, such as lags and leads, you can simply:

Code:

xtset EnterpriseID

(3) I would go:

Code:

xtreg EPA LegalSyst LnSize Cover Loss Flev Roe, re robust cluster(CountryID)

As per my previous reply, please note that -robust- is redundant if you invoke -cluster-.
(4) The difference lies in how the standard errors are calculated: the choice of cluster variable affects the standard errors and everything derived from them (test statistics, p-values, confidence intervals), but not the coefficient estimates.
(5) I would rather say that you're investigating whether -i.Year- contributes to explaining variation in the regressand (when adjusted for the remaining predictors).
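
A sketch of point (5), assuming the variable names used earlier in the thread: the year dummies are added to the -re- specification with factor-variable notation and then tested jointly.

Code:

* year dummies via factor-variable notation; 2014 is the omitted base year
xtreg EPA LegalSyst LnSize Cover Loss Flev Roe i.Year, re vce(cluster CountryID)

* joint test of the year dummies
testparm i.Year

The firm effect is still treated as random here; the year dummies only add time fixed effects on top of it.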

Kind regards,
Carlo
(Stata 19.0)