
  • Fixed vs. Random model in case of heteroscedasticity and autocorrelation.

    Hello everyone,
    I have a basic doubt regarding panel data regression. In my analysis, when I test for autocorrelation and heteroscedasticity after the fixed-effects (fe) estimation, using the -xtserial- and -xttest3- commands respectively, the results show that both issues are present in my data. In this case it is prescribed to run -fe- with cluster/robust standard errors. My doubt is what comes next: do we still need to apply the Hausman test to decide between fixed and random effects, or just stick with -fe-? If yes, then how, because the Stata output shows that the Hausman test cannot be used with vce(robust/cluster).
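    The testing sequence described above, as a generic sketch (y, x1, x2, and the panel identifier id are placeholder names):
    Code:
    . xtset id year
    . xtreg y x1 x2, fe
    . xttest3             // modified Wald test for groupwise heteroskedasticity
    . xtserial y x1 x2    // Wooldridge test for first-order autocorrelation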

    Thanks!
    Minhaj

  • #2
    Minhaj:
    welcome to this forum.
    You should switch to the community-contributed module -xtoverid-.
    However:
    1) -xtoverid- tests whether -re- is the way to go;
    2) -xtoverid-, glorious but a bit old-fashioned, does not allow -fvvarlist- notation; therefore, categorical variables and interactions must be created by hand;
    3) the following toy example wraps up the points above (note the use of the -xi:- prefix to deal with categorical variables):
    Code:
    . use https://www.stata-press.com/data/r18/nlswork.dta
    (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
    
    . xtset idcode year
    
    Panel variable: idcode (unbalanced)
     Time variable: year, 68 to 88, but with gaps
             Delta: 1 unit
    
    . xi: xtreg ln_wage c.age##c.age i.race, re vce(cluster idcode)
    i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)
    
    Random-effects GLS regression                   Number of obs     =     28,510
    Group variable: idcode                          Number of groups  =      4,710
    
    R-squared:                                      Obs per group:
         Within  = 0.1087                                         min =          1
         Between = 0.1175                                         avg =        6.1
         Overall = 0.1048                                         max =         15
    
                                                    Wald chi2(4)      =    1354.70
    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
    
                                 (Std. err. adjusted for 4,710 clusters in idcode)
    ------------------------------------------------------------------------------
                 |               Robust
         ln_wage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             age |   .0594573   .0041032    14.49   0.000     .0514151    .0674995
                 |
     c.age#c.age |  -.0006835   .0000688    -9.94   0.000    -.0008182   -.0005487
                 |
        _Irace_2 |  -.1237269   .0126612    -9.77   0.000    -.1485424   -.0989114
        _Irace_3 |   .0965773   .0613496     1.57   0.115    -.0236657    .2168203
           _cons |   .5761164   .0586669     9.82   0.000     .4611314    .6911015
    -------------+----------------------------------------------------------------
         sigma_u |  .36094993
         sigma_e |  .30245467
             rho |   .5874941   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    . xtoverid
    cage#c:  operator invalid
    r(198);
    
    . g sq_age=age*age
    
    . xi: xtreg ln_wage age sq_age i.race, re vce(cluster idcode)
    i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)
    
    Random-effects GLS regression                   Number of obs     =     28,510
    Group variable: idcode                          Number of groups  =      4,710
    
    R-squared:                                      Obs per group:
         Within  = 0.1087                                         min =          1
         Between = 0.1175                                         avg =        6.1
         Overall = 0.1048                                         max =         15
    
                                                    Wald chi2(4)      =    1354.70
    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
    
                                 (Std. err. adjusted for 4,710 clusters in idcode)
    ------------------------------------------------------------------------------
                 |               Robust
         ln_wage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             age |   .0594573   .0041032    14.49   0.000     .0514151    .0674995
          sq_age |  -.0006835   .0000688    -9.94   0.000    -.0008182   -.0005487
        _Irace_2 |  -.1237269   .0126612    -9.77   0.000    -.1485424   -.0989114
        _Irace_3 |   .0965773   .0613496     1.57   0.115    -.0236657    .2168203
           _cons |   .5761164   .0586669     9.82   0.000     .4611314    .6911015
    -------------+----------------------------------------------------------------
         sigma_u |  .36094993
         sigma_e |  .30245467
             rho |   .5874941   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    . xtoverid
    
    Test of overidentifying restrictions: fixed vs random effects
    Cross-section time-series model: xtreg re  robust cluster(idcode)
    Sargan-Hansen statistic  64.865  Chi-sq(2)    P-value = 0.0000
    
    .
     In this case, the -xtoverid- outcome points to the -fe- specification.
    Kind regards,
    Carlo
    (Stata 18.0 SE)



    • #3
      Thank you so much, Carlo, for your quick reply:
      Just to make sure I've got it right: considering the autocorrelation and heteroscedasticity issues, your advice is to use -xtoverid- to decide between the FE and RE models. If the outcome (a statistically significant test) supports the FE model, applying clustered or robust standard errors within that model would be the way to go.
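      A minimal sketch of that final step (y, x1, x2, and the panel identifier id are placeholder names):
      Code:
      . xtreg y x1 x2, fe vce(cluster id)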



      • #4
        Minhaj:
        correct.
        Just an aside: unlike the -re- estimator, the -fe- one does not return coefficients for time-invariant predictors, such as -race- in my previous example.
        Kind regards,
        Carlo
        (Stata 18.0 SE)



        • #5
          Thank you.

          I have another query regarding the ongoing panel data modelling that has been perplexing me for some time now. What factors should be considered when choosing between a static panel model and a dynamic panel model? It's evident from existing literature that studies have employed both models. Some recent papers advocate for the dynamic model (GMM), highlighting its effectiveness in addressing endogeneity, autocorrelation, and heteroscedasticity, suggesting its superiority over conventional methods like Fixed and Random Effects models. Given these considerations, wouldn't it be more appropriate to opt for GMM in my case instead of pursuing the standard error correction approach?

          Thanks and regards,
          Minhaj



          • #6
            Minhaj:
            dynamic panel models are way more difficult than their static cousins.
            In addition, they are a wise choice for well-specified econometric issues (quoting the -xtabond- help file, the following sentence seems enlightening in this respect):
            xtabond fits a linear dynamic panel-data model where the unobserved panel-level effects are correlated with the lags of the dependent variable, known as the Arellano-Bond estimator. This estimator is designed for datasets with many panels and few periods, and it requires that there be no autocorrelation in the idiosyncratic errors.
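            As a toy illustration of -xtabond- (using the Arellano-Bond dataset shipped with Stata; the lag choice here is purely illustrative):
            Code:
            . webuse abdata
            . xtabond n w k, lags(1) vce(robust)
            . estat abond    // Arellano-Bond test: AR(2) in first differences should be insignificant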
            Kind regards,
            Carlo
            (Stata 18.0 SE)



            • #7
              Carlo:
              Thank you so much. It was really helpful.



              • #8
                Hello everyone,

                I am performing panel data analysis with ROA as a dependent variable and ESG as an independent variable.

                Some of the previous studies have used the level of ESG and its square in the model to estimate the nonlinear relationship, while others have used lagged ESG and its lagged square to predict the same model. Both have their own theoretical justification.



                My doubt is: will it be appropriate to use all four in the same model (i.e., ESG, ESG^2, L.ESG, and L.ESG^2)?

                Thanks and regards,
                Minhaj



                • #9
                  Minhaj:
                  granted that my last brush with corporate finance dates back 35 years, I would run the two regression models separately and compare their results.
                  In addition, it is difficult to believe that two predictors only make your regression correctly specified (see -linktest-).
                  Kind regards,
                  Carlo
                  (Stata 18.0 SE)



                  • #10
                    Actually, along with the above-mentioned variables, I have some other control variables as well. By the way, it sounds like a great idea to try the two models; I will try to implement it.

                    Thank you


                    • #11
                      Minhaj: You're proposing a nonlinear distributed lag model, and there's nothing wrong with that. But you might have trouble precisely estimating the effects at both lags, especially if you use fixed effects. I would center ESG about, say, its overall mean before squaring, so that the level terms have meaning and are more precisely estimated.

                      It's often the case that x, L.x, and so on appear in time series and panel data analysis. It allows testing persistence of effects.
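                      A sketch of the centering step (ESG, ROA, and the panel identifier id are placeholder names; control variables omitted):
                      Code:
                      . summarize ESG, meanonly
                      . generate c_ESG  = ESG - r(mean)
                      . generate c_ESG2 = c_ESG^2
                      . xtreg ROA c_ESG c_ESG2 L.c_ESG L.c_ESG2, fe vce(cluster id)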

                      JW



                      • #12
                        Thank you, Wooldridge sir.
                        I'm employing a dynamic model (system GMM) based on the literature supporting the persistence in profitability (ROA & ROE) of firms. Would it be advisable to proceed with your suggestion of centering the squared term within my dynamic model?

                        I am curious about how the interpretation of this squared term might differ from that of the plain square of the ESG. My understanding from prior studies is that the squared term primarily defines the U-shaped relationship. Could you please shed some light on this?

                        Thanks and regards, Minhaj



                        • #13
                          So you're including the lagged value of ROA and including lagged ESG? That wasn't clear from your initial post. Centering ESG before squaring ensures that the coefficient on ESG is the average partial effect. The coefficient on ESG^2 won't change if you center about the overall mean (or any other value).



                          • #14
                            Thank you, Professor Wooldridge, for the clarification above.

                            I had one doubt related to your old post (link provided below). There you talked about finding endogenous independent variables in panel data, but the procedure is not explained in that post. Could you please help me with the procedure?

                            https://www.statalist.org/forums/for...u-hausman-test



                            • #15
                              Hello everyone,

                              Continuing the discussion on my previous post, I am seeking guidance on the appropriate sequence of tests and model selection for my panel data analysis. The dataset comprises N=65 entities observed over T=7 time periods, further divided into two groups with 15 and 50 firms, respectively.

                              Initially, I ran Fixed Effects (FE) and Random Effects (RE) models, followed by the Hausman test, which indicated support for the FE model. However, postestimation tests (xttest3, xtserial, and xtcd) following the FE model revealed the presence of autocorrelation, heteroscedasticity, and cross-sectional dependence.

                              Moreover, when I utilized -xtoverid- with RE, it also favored the random-effects model. I've come across suggestions on this platform that, in the presence of autocorrelation, heteroscedasticity, and cross-sectional dependence, -xtscc- should be considered.

                              Now, within -xtscc-, I have two options: pooled OLS and FE. Given the test results above, I'm uncertain about the appropriate choice. Could you advise whether pooled OLS or FE within -xtscc- would be more suitable for my analysis?
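                              For reference, the two -xtscc- specifications in question (ROA, x1, x2 are placeholder names; -xtscc- defaults to pooled OLS):
                              Code:
                              . xtscc ROA x1 x2        // pooled OLS with Driscoll-Kraay standard errors
                              . xtscc ROA x1 x2, fe    // fixed effects with Driscoll-Kraay standard errors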

                              Your insights would be greatly appreciated, and thank you in advance for your help.

