  • Fixed effect model and Panel data

    Hi everyone,

    For my master's thesis, I am analyzing the impact of a country's legal system (i.e. common law versus civil law) on the accuracy of security analysts' earnings forecasts. My data consist of 628 firms in 16 countries over 5 years. My model is as follows:

    $EPA_{i,t} = \beta_0 + \beta_1 LegalSyst_{i,t} + \beta_2 LnSize_{i,t} + \beta_3 Cover_{i,t} + \beta_4 Loss_{i,t} + \beta_5 Flev_{i,t} + \beta_6 Roe_{i,t} + \varepsilon_{i,t}$, where i and t denote firm i in year t, and LegalSyst and Loss are dummy variables.

    I ran some diagnostic tests and it seems that a fixed-effects model is appropriate. The problem is that my variable of interest (LegalSyst) is omitted from the fixed-effects model (because of collinearity, as it is time-invariant, I suppose). Therefore, I cannot examine the effect of the legal system on my dependent variable. I have seen some threads suggesting "hybrid models", but I don't know how to implement them because I have only basic knowledge of econometrics and Stata/SE 16.0.
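
    As a minimal sketch (assuming the variable names used in this thread and a panel declared at the firm level), one way to check whether LegalSyst really is constant within each firm could be:

    ```stata
    * Declare the panel at the firm level
    xtset EnterpriseID Year
    * A within standard deviation of 0 means LegalSyst never changes inside a firm
    xtsum LegalSyst
    * Stops with an error if any firm's LegalSyst differs across years
    bysort EnterpriseID (Year): assert LegalSyst == LegalSyst[1]
    ```

    If the assert passes silently, the variable has no within-firm variation, which is what the fixed-effects (within) estimator needs in order to estimate a coefficient.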

    (1) Are there alternatives that solve the omitted-variable problem so that I can get an estimated coefficient?

    I tried to run "xtset CountryID Year" but I got the message "repeated time values within panel" because I have multiple firms for every country and year. Therefore, I went with the following code:

    Code:
    . xtset EnterpriseID Year
           panel variable:  EnterpriseID (strongly balanced)
            time variable:  Year, 2014 to 2018
                    delta:  1 unit
    (2) Is this panel variable appropriate for my analysis, given that I want to control for country effects in my model? If not, how can I do it?

    (3) Furthermore, if I want, for example, to jointly analyze 2 common law and 2 civil law countries in my sample, should I use "cluster"? If yes, could you suggest the syntax? (Note: CountryID is the variable that identifies the country. It takes a value from 1 to 16 depending on the country.)

    Code:
    . xtreg EPA LegalSyst LnSize Cover Loss Flev Roe, fe
    note: LegalSyst omitted because of collinearity
    
    Fixed-effects (within) regression               Number of obs     =      3,140
    Group variable: EnterpriseID                    Number of groups  =        628
    
    R-sq:                                           Obs per group:
         within  = 0.0704                                         min =          5
         between = 0.0447                                         avg =        5.0
         overall = 0.0331                                         max =          5
    
                                                    F(5,2507)         =      37.99
    corr(u_i, Xb)  = -0.7049                        Prob > F          =     0.0000
    
    ------------------------------------------------------------------------------
             EPA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       LegalSyst |          0  (omitted)
          LnSize |   -.021504    .006088    -3.53   0.000     -.033442    -.009566
           Cover |  -.0022359   .0005592    -4.00   0.000    -.0033324   -.0011394
            Loss |   .0692554   .0056121    12.34   0.000     .0582506    .0802602
            Flev |  -.0004474   .0008064    -0.55   0.579    -.0020287    .0011339
             Roe |  -.0012693   .0010576    -1.20   0.230    -.0033431    .0008044
           _cons |   .2085857   .0481035     4.34   0.000      .114259    .3029124
    -------------+----------------------------------------------------------------
         sigma_u |  .07222027
         sigma_e |  .06471257
             rho |  .55466326   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(627, 2507) = 2.30                   Prob > F = 0.0000
    Code:
    estimates store fixed
    Code:
    . xtreg EPA LegalSyst LnSize Cover Loss Flev Roe, re
    
    Random-effects GLS regression                   Number of obs     =      3,140
    Group variable: EnterpriseID                    Number of groups  =        628
    
    R-sq:                                           Obs per group:
         within  = 0.0601                                         min =          5
         between = 0.2990                                         avg =        5.0
         overall = 0.1551                                         max =          5
    
                                                    Wald chi2(6)      =     411.26
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
             EPA |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       LegalSyst |   .0219064   .0046445     4.72   0.000     .0128033    .0310094
          LnSize |   .0045568   .0013606     3.35   0.001     .0018902    .0072235
           Cover |  -.0009433   .0002839    -3.32   0.001    -.0014997   -.0003869
            Loss |   .0848819   .0044507    19.07   0.000     .0761587    .0936052
            Flev |   .0006721   .0007244     0.93   0.353    -.0007476    .0020919
             Roe |  -.0008226   .0009642    -0.85   0.394    -.0027124    .0010673
           _cons |  -.0183203   .0088933    -2.06   0.039    -.0357508   -.0008897
    -------------+----------------------------------------------------------------
         sigma_u |  .03070813
         sigma_e |  .06471257
             rho |  .18379327   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    Code:
    estimates store random
    Code:
    . hausman fixed random
    
                     ---- Coefficients ----
                 |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
                 |     fixed        random       Difference          S.E.
    -------------+----------------------------------------------------------------
          LnSize |    -.021504     .0045568       -.0260608         .005934
           Cover |   -.0022359    -.0009433       -.0012926        .0004818
            Loss |    .0692554     .0848819       -.0156266        .0034186
            Flev |   -.0004474     .0006721       -.0011196        .0003544
             Roe |   -.0012693    -.0008226       -.0004468        .0004344
    ------------------------------------------------------------------------------
                               b = consistent under Ho and Ha; obtained from xtreg
                B = inconsistent under Ha, efficient under Ho; obtained from xtreg
    
        Test:  Ho:  difference in coefficients not systematic
    
                      chi2(5) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                              =       58.99
                    Prob>chi2 =      0.0000
    According to the Hausman test, I should use a fixed-effects model.

    Code:
    . xtreg EPA LegalSyst LnSize Cover Loss Flev Roe i.Year,fe
    note: LegalSyst omitted because of collinearity
    
    Fixed-effects (within) regression               Number of obs     =      3,140
    Group variable: EnterpriseID                    Number of groups  =        628
    
    R-sq:                                           Obs per group:
         within  = 0.0736                                         min =          5
         between = 0.0412                                         avg =        5.0
         overall = 0.0310                                         max =          5
    
                                                    F(9,2503)         =      22.10
    corr(u_i, Xb)  = -0.7333                        Prob > F          =     0.0000
    
    ------------------------------------------------------------------------------
             EPA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       LegalSyst |          0  (omitted)
          LnSize |  -.0249718   .0068277    -3.66   0.000    -.0383603   -.0115833
           Cover |   -.001942   .0005912    -3.28   0.001    -.0031013   -.0007827
            Loss |   .0696952   .0056099    12.42   0.000     .0586946    .0806957
            Flev |  -.0004988    .000807    -0.62   0.537    -.0020813    .0010837
             Roe |  -.0012968    .001058    -1.23   0.220    -.0033715     .000778
                 |
            Year |
           2015  |   .0057079   .0036651     1.56   0.120    -.0014791    .0128949
           2016  |   .0094027   .0036612     2.57   0.010     .0022234     .016582
           2017  |   .0093605    .003829     2.44   0.015     .0018522    .0168688
           2018  |   .0077649   .0039683     1.96   0.050    -.0000166    .0155465
                 |
           _cons |   .2259927   .0527099     4.29   0.000     .1226332    .3293522
    -------------+----------------------------------------------------------------
         sigma_u |  .07569337
         sigma_e |  .06465357
             rho |  .57817704   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(627, 2503) = 2.31                   Prob > F = 0.0000
    Code:
    . testparm i.Year
    
     ( 1)  2015.Year = 0
     ( 2)  2016.Year = 0
     ( 3)  2017.Year = 0
     ( 4)  2018.Year = 0
    
           F(  4,  2503) =    2.14
                Prob > F =    0.0729
    Prob > F exceeds 0.05, so the year dummies are not jointly significant at the 5% level; time fixed effects do not appear to be needed in this case.

    Code:
    . xttest3
    
    Modified Wald test for groupwise heteroskedasticity
    in fixed effect regression model
    
    H0: sigma(i)^2 = sigma^2 for all i
    
    chi2 (628)  =   9.4e+08
    Prob>chi2 =      0.0000
    According to this modified Wald test, heteroskedasticity is present.


    I would greatly appreciate it if you could help me. Thanks in advance.

    Thanh



  • #2
    Thanh:
    just use cluster or robust standard errors.
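
    As a hedged sketch of what this could look like with the variables from #1 (the choice of clustering level is an assumption that depends on the research design):

    ```stata
    xtset EnterpriseID Year
    * Cluster-robust standard errors at the firm (panel) level
    xtreg EPA LnSize Cover Loss Flev Roe, fe vce(cluster EnterpriseID)
    * Alternatively, cluster at the country level (firms are nested in countries)
    xtreg EPA LnSize Cover Loss Flev Roe, fe vce(cluster CountryID)
    ```

    Either option changes only the standard errors; the coefficient estimates stay the same.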
    Kind regards,
    Carlo
    (Stata 18.0 SE)



    • #3
      Thank you for your response, Carlo. Since heteroskedasticity is present and my variable (LegalSyst) is omitted, I ran the following, as you suggested:

      Code:
      . regress EPA LegalSyst LnSize Cover Loss Flev Roe, cluster (CountryID)
      
      Linear regression                               Number of obs     =      3,140
                                                      F(6, 15)          =      48.31
                                                      Prob > F          =     0.0000
                                                      R-squared         =     0.1558
                                                      Root MSE          =     .07224
      
                                   (Std. Err. adjusted for 16 clusters in CountryID)
      ------------------------------------------------------------------------------
                   |               Robust
               EPA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
         LegalSyst |    .021816   .0036234     6.02   0.000     .0140929    .0295392
            LnSize |   .0046782    .001279     3.66   0.002     .0019521    .0074044
             Cover |  -.0008315   .0002366    -3.51   0.003    -.0013358   -.0003271
              Loss |   .0920928   .0261716     3.52   0.003     .0363093    .1478762
              Flev |   .0015158   .0011241     1.35   0.198    -.0008803    .0039118
               Roe |  -.0004429   .0011174    -0.40   0.697    -.0028247    .0019388
             _cons |  -.0221708   .0091211    -2.43   0.028     -.041612   -.0027296
      ------------------------------------------------------------------------------
      (1) As I mentioned in #1: how can I proceed if I want to compare 2 pairs of countries in my sample rather than all 16 countries? (Note: CountryID for UK, Ireland, Germany, France = 14, 9, 1, 7, respectively.)

      Thanks in advance.

      Thanh



      • #4
        Thanh:
        I can't follow your last post.
        You started with -xtreg-, but now you've seemingly switched to -regress-. Why, if you have panel data?
        (1) you can simply -flag- the countries you're interested in and then use the -if- qualifier:
        Code:
        gen flag==1 if CountryID==14 | CountryID==9 |CountryID==1 | CountryID==7
        quietly xtreg EPA LegalSyst LnSize Cover Loss Flev Roe i.Year if flag==1,fe robust
        estimates store fe
        quietly xtreg EPA LegalSyst LnSize Cover Loss Flev Roe i.Year if flag==1,re robust
        estimates store re
        xtoverid
        *-xtoverid- is the community-contributed programme that replaces the -hausman- test when non-default standard errors are invoked under -xtreg-*
        Kind regards,
        Carlo
        (Stata 18.0 SE)



        • #5
          Please excuse my last post (#3). Indeed, I should use -xtreg-. But when I ran it, my variable (LegalSyst) was still omitted from the fixed-effects model.

          (1) How can I fix this problem? Information: among the 16 countries, only 2 countries are common law (i.e. LegalSyst = 1) and 14 countries are civil law (LegalSyst = 0).

          Code:
          . xtreg EPA LegalSyst LnSize Cover Loss Flev Roe, fe robust cluster(CountryID)
          note: LegalSyst omitted because of collinearity
          
          Fixed-effects (within) regression               Number of obs      =      3140
          Group variable: EnterpriseID                    Number of groups   =       628
          
          R-sq:  within  = 0.0704                         Obs per group: min =         5
                 between = 0.0447                                        avg =       5.0
                 overall = 0.0331                                        max =         5
          
                                                          F(5,15)            =     16.47
          corr(u_i, Xb)  = -0.7049                        Prob > F           =    0.0000
          
                                       (Std. Err. adjusted for 16 clusters in CountryID)
          ------------------------------------------------------------------------------
                       |               Robust
                   EPA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
             LegalSyst |          0  (omitted)
                LnSize |   -.021504   .0131145    -1.64   0.122    -.0494569    .0064489
                 Cover |  -.0022359   .0008509    -2.63   0.019    -.0040496   -.0004222
                  Loss |   .0692554   .0134871     5.13   0.000     .0405084    .0980024
                  Flev |  -.0004474   .0006848    -0.65   0.523    -.0019072    .0010123
                   Roe |  -.0012693    .001407    -0.90   0.381    -.0042683    .0017296
                 _cons |   .2085857   .1097119     1.90   0.077    -.0252596    .4424311
          -------------+----------------------------------------------------------------
               sigma_u |  .07222027
               sigma_e |  .06471257
                   rho |  .55466326   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          (2) Furthermore, my overall R-sq (0.0331) is relatively small compared to -xtreg- with random effects (0.1551). Is there a way to improve this coefficient of determination?

          Code:
          . xtreg EPA LegalSyst LnSize Cover Loss Flev Roe, robust cluster(CountryID)
          
          Random-effects GLS regression                   Number of obs     =      3,140
          Group variable: EnterpriseID                    Number of groups  =        628
          
          R-sq:                                           Obs per group:
               within  = 0.0601                                         min =          5
               between = 0.2990                                         avg =        5.0
               overall = 0.1551                                         max =          5
          
                                                          Wald chi2(6)      =     190.65
          corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
          
                                       (Std. Err. adjusted for 16 clusters in CountryID)
          ------------------------------------------------------------------------------
                       |               Robust
                   EPA |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
             LegalSyst |   .0219064   .0036843     5.95   0.000     .0146853    .0291275
                LnSize |   .0045568   .0013257     3.44   0.001     .0019584    .0071553
                 Cover |  -.0009433   .0002518    -3.75   0.000    -.0014367   -.0004499
                  Loss |   .0848819   .0219644     3.86   0.000     .0418325    .1279314
                  Flev |   .0006721   .0007248     0.93   0.354    -.0007484    .0020927
                   Roe |  -.0008226   .0013022    -0.63   0.528    -.0033747    .0017296
                 _cons |  -.0183203   .0090572    -2.02   0.043     -.036072   -.0005685
          -------------+----------------------------------------------------------------
               sigma_u |  .03070813
               sigma_e |  .06471257
                   rho |  .18379327   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          (3) I don't know why I got an error message when I typed the -flag- command:
          Code:
          . gen flag==1 if CountryID==14 | CountryID==9 |CountryID==1 | CountryID==7
          == invalid name
          r(198);
          Thanks in advance.

          Thanh



          • #6
            For my master's thesis, I am analyzing the impact of a country's legal system (i.e. common law versus civil law) on the accuracy of security analysts' earnings forecasts. My data consist of 628 firms in 16 countries over 5 years.
            (3) Furthermore, if I want, for example, to jointly analyze 2 common law and 2 civil law countries in my sample, should I use "cluster"? If yes, could you suggest the syntax? (Note: CountryID is the variable that identifies the country. It takes a value from 1 to 16 depending on the country.)
            This suggests that your countries either implement common law or civil law, and therefore you cannot estimate that particular coefficient using fixed effects as it is time invariant. Hybrid models do not solve this problem as some would like to think and you clearly reject random effects. Therefore, go back to your supervisor and start from scratch thinking about your research question. If you are interested in a purely descriptive model, just run OLS.



            • #7
              Your variable is most likely to be omitted because it is time invariant, and estimation under the assumption of fixed effects with either least-squares dummy variable or within estimator, will drop time invariant variables. So if your variable of interest is time invariant, then you have to either estimate using random effects, or I suggest the correlated random effects model.
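
              To see why, here is a short derivation in generic notation (the symbols $x_{it}$, $z_i$ and $\alpha_i$ are illustrative and do not come from the original post). Let $z_i$ be a time-invariant regressor such as LegalSyst and $\alpha_i$ the firm effect. Subtracting each firm's time average from its observations gives:

              ```latex
              \begin{align}
              y_{it} &= x_{it}'\beta + \gamma z_i + \alpha_i + \varepsilon_{it} \\
              \bar{y}_i &= \bar{x}_i'\beta + \gamma z_i + \alpha_i + \bar{\varepsilon}_i \\
              y_{it} - \bar{y}_i &= (x_{it} - \bar{x}_i)'\beta + \gamma (z_i - z_i) + (\alpha_i - \alpha_i) + (\varepsilon_{it} - \bar{\varepsilon}_i) \\
                                 &= (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)
              \end{align}
              ```

              The term in $\gamma$ vanishes together with $\alpha_i$, so the within estimator has no information with which to estimate it.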

              Before going there, I noticed you said that you tested for fixed versus random effects. Did you do this with the Hausman test, which assumes homoskedastic errors? If so, you should redo the test with cluster-robust standard errors. There are two options: (a) after estimating the correlated random effects model with cluster-robust standard errors, use test to check the joint significance of the parameters that capture the between effects; or (b) run the user-written command xtoverid (SSC) after a random effects estimation with cluster-robust standard errors. These tests are asymptotically equivalent and robust to heteroskedasticity and to correlation within clusters (panels). The Hausman test is not valid with robust standard errors.
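
              A minimal sketch of the correlated random effects (Mundlak) approach with the variables from #1 might look like this. The mean_* variable names are hypothetical, and the sketch assumes no other variable in the dataset starts with mean_:

              ```stata
              xtset EnterpriseID Year
              * Firm-level means of the time-varying regressors (the Mundlak terms)
              foreach v of varlist LnSize Cover Loss Flev Roe {
                  bysort EnterpriseID: egen mean_`v' = mean(`v')
              }
              * Random effects with the firm means added; LegalSyst stays in the model
              xtreg EPA LegalSyst LnSize Cover Loss Flev Roe mean_*, re vce(cluster CountryID)
              * Joint test of the firm means: rejecting favours fixed over random effects
              test mean_LnSize mean_Cover mean_Loss mean_Flev mean_Roe
              ```

              In this setup the coefficients on the time-varying regressors reproduce the within (fixed-effects) estimates, while the time-invariant LegalSyst still receives a coefficient.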

              For the correlated random effects model here are the links to two presentations by Jeff Wooldridge:

              http://conference.iza.org/conference...linear_iza.pdf

              http://conference.iza.org/conference...nonlin_iza.pdf

              Another reference is

              Schunck, Reinhard (2013) "Within and between estimates in random-effects models: Advantages and drawbacks of correlated random effects and hybrid models", The Stata Journal 13(1), pp. 65-76.

              This is available at

              https://journals.sagepub.com/doi/pdf...867X1301300105
              Alfonso Sanchez-Penalver



              • #8
                Thanh:
                (3) my typo, sorry. It should have been:
                Code:
                . gen flag=1 if CountryID==14 | CountryID==9 |CountryID==1 | CountryID==7
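
                An equivalent and more compact way to build such an indicator, sketched here with the hypothetical variable name flag4, uses -inlist()-, which returns 1 for the listed countries and 0 otherwise:

                ```stata
                * 1 for Germany (1), France (7), Ireland (9), UK (14); 0 for all other countries
                gen byte flag4 = inlist(CountryID, 1, 7, 9, 14)
                * Check which countries were flagged
                tabulate CountryID flag4
                ```

                Unlike -gen flag=1 if ...-, which leaves the other countries missing, this version codes them as 0; both work with the -if flag==1- style of qualifier.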
                That said, along the same lines as Andrew's and Alfonso's comments, you're experiencing -fe-'s hunger for time-invariant predictors: this is the main drawback of this specification.
                As far as R-sqs are concerned, you should look at within and between R-sq for -fe- and -re- specification, respectively.
                That said, your dataset shows a limited within panel variation, as you can see from the non-significant coefficients and the low R-sq within.
                What did -xtoverid- give you back?
                Kind regards,
                Carlo
                (Stata 18.0 SE)



                • #9
                  Thank you Andrew, Alfonso and Carlo for your time and valuable advice. Indeed, the variable (LegalSyst) is time-invariant and hence omitted.

                  @Andrew : At the beginning, I ran a purely descriptive model with OLS, but I would like to go further in the analysis with a robustness test. You're right, I think it would be wise to contact my supervisor to see what he suggests.

                  @Carlo : The -flag- command works fine for me, thank you. Concerning -xtoverid-: after typing "xtreg EPA LegalSyst LnSize Cover Loss Flev Roe, fe robust cluster(CountryID)" and "xtreg EPA LegalSyst LnSize Cover Loss Flev Roe, re robust cluster(CountryID)" (in #5), I used -test- and -xtoverid-, which gave me the results shown below.

                  @Alfonso : I have looked at the documents you linked and tried to use a "correlated random effects model", but I am not sure how to implement it correctly in Stata. I had used the Hausman test for -xtreg, fe- and -xtreg, re- under "homoskedastic errors". Therefore, I tested again with cluster-robust standard errors:

                  Code:
                  . test
                  
                   ( 1)  LegalSyst = 0
                   ( 2)  LnSize = 0
                   ( 3)  Cover = 0
                   ( 4)  Loss = 0
                   ( 5)  Flev = 0
                   ( 6)  Roe = 0
                  
                             chi2(  6) =  190.65
                           Prob > chi2 =    0.0000
                  Code:
                  . xtoverid
                  
                  Test of overidentifying restrictions: fixed vs random effects
                  Cross-section time-series model: xtreg re  robust cluster(CountryID)
                  Sargan-Hansen statistic   5.818  Chi-sq(5)    P-value = 0.3243
                  (1) From these results, what can we say?

                  (2) What code or approach would be appropriate for my research question concerning the impact of the legal system on my dependent variable (robustness test, panel data, ...)?

                  Thank you in advance.

                  Thanh



                  • #10
                    Thanh:
                    -xtoverid- outcome points you towards -re- specification, that allows you to estimate coefficients for time-invariant predictors, too.
                    As an aside, please note that -cluster(CountryID)- is enough to invoke clustered robust standard errors (ie, -robust- is redundant).
                    Last edited by Carlo Lazzaro; 27 Nov 2019, 01:50.
                    Kind regards,
                    Carlo
                    (Stata 18.0 SE)



                    • #11
                      Hello,

                      Thank you Carlo for your reply.

                      As a reminder, I am interested in the impact of the legal system, which depends on the country. Each firm belongs to one and only one country during the T years, but a country can have multiple firms. Therefore, (1) What is the most appropriate "combination" for my case?

                      (2)
                      Code:
                      xtset EnterpriseID Year
                      or
                      Code:
                      xtset CountryID
                      
                      Note: "xtset CountryID Year" does not work because of "repeated time values within panel"
                      (3)
                      Code:
                      xtreg EPA LegalSyst LnSize Cover Loss Flev Roe, re robust cluster(CountryID)
                      or
                      Code:
                       xtreg EPA LegalSyst LnSize Cover Loss Flev Roe, re robust cluster(EnterpriseID)
                      (4) What are the differences between these possible combinations?

                      (5) Last but not least, some papers in my research field also include dummy variables such as "Year" or "Industry". According to the diagnostic tests that I ran, I should use -xtreg, re cluster(VarID)-. If I decide to add year dummies to my model afterwards, is the code -i.Year- right? After adding it, can we call it a fixed-effects model even though we use -xtreg, re cluster(VarID)-?

                      I am looking forward to more details. Thank you in advance.

                      Thanh



                      • #12
                        Thanh:
                        usually, it's up to posters (not repliers) to give more details.
                        That said:
                        (2) if you do not plan to use time-series related commands, such as lags and leads, you can simply:
                        Code:
                        xtset EnterpriseID
                        (3) I would go:
                        Code:
                        xtreg EPA LegalSyst LnSize Cover Loss Flev Roe, re robust cluster(CountryID)
                        As per my previous reply, please note that -robust- is redundant if you invoke -cluster-.
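                        (4) The two options give the same coefficient estimates; they differ only in how the standard errors (and hence test statistics, p-values and confidence intervals) are calculated. Clustering on CountryID allows the errors of all firms in the same country to be correlated, whereas clustering on EnterpriseID allows correlation only within a firm over time.
                        (5) I would rather say that you're investigating whether the -i.Year- dummies contribute to explaining variation in the regressand (when adjusted for the remaining predictors).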
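
                        As a hedged sketch of point (5), using the variable names from this thread, year dummies can be added to the random-effects specification and tested jointly like this:

                        ```stata
                        xtset EnterpriseID Year
                        * Firm effects remain random; i.Year adds one dummy per year (2014 is the base)
                        xtreg EPA LegalSyst LnSize Cover Loss Flev Roe i.Year, re vce(cluster CountryID)
                        * Joint test that all year dummies are zero
                        testparm i.Year
                        ```

                        The year dummies act as time fixed effects, but the firm-level effects are still treated as random, so this remains an -re- specification at the firm level.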
                        Kind regards,
                        Carlo
                        (Stata 18.0 SE)
