  • Panel Data Tests and Regression Steps

    Dear Statalist,

    I am new to panel data regression analysis and Stata, so please forgive my perhaps basic questions.
    Context:
    I am testing an asset pricing model with portfolio excess returns as the dependent variable and the market excess returns and an ESG factor as independent variables. Simplified, let's call: DV = portfolio return (Ri); IV1 = market factor (RmRf); IV2: ESG factor (ESG).
    The data form a panel of 5 portfolios observed over 13 years, for 65 observations in total.

    The first steps I took were to test the model assumptions (principally heteroskedasticity, multicollinearity, and autocorrelation).
    • Heteroskedasticity:
    Code:
    xtgls Ri RmRf ESG, igls panels(heteroskedastic)
    estimates store hetero
    xtgls Ri RmRf ESG, igls 
    local df = e(N_g) - 1
    lrtest hetero . , df(`df')
    The test showed Prob > chi2 = 0.000, so the data are heteroskedastic.
    • FE vs RE:
    Code:
    xtreg Ri RmRf ESG, fe
    estimates store fe
    xtreg Ri RmRf ESG, re
    estimates store re
    hausman fe re
    The test showed Prob > chi2 = 1.000, so RE will be used.
    • Multicollinearity: the mean VIF is 1.01, so the IVs are essentially uncorrelated with each other.
    • Autocorrelation:
    Code:
    xtserial Ri RmRf ESG
    The test showed Prob > F = 0.9308, so the null of no first-order autocorrelation in the error terms within panels is not rejected.
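
    A minimal sketch of how the VIF figure might be obtained, assuming it came from a pooled OLS fit (the post does not show the command used). -estat vif- is a postestimation command for -regress- and is not available after -xtreg-:

    ```stata
    * Pooled OLS on the same specification (assumption: this is how the VIF was run)
    regress Ri RmRf ESG

    * Variance inflation factor for each regressor, plus the mean VIF
    estat vif
    ```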

    In sum: the data are heteroskedastic. Therefore, I assume I can run panel regressions with robust standard errors using:
    Code:
    xtreg Ri RmRf ESG, robust
    The resulting table:
    Code:
    Random-effects GLS regression                   Number of obs     =         65
    Group variable: ID                              Number of groups  =          5
    
    R-squared:                                      Obs per group:
         Within  = 0.0000                                         min =         13
         Between = 0.0000                                         avg =       13.0
         Overall = 0.7903                                         max =         13
    
                                                    Wald chi2(2)      =      73.92
    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
    
                                         (Std. err. adjusted for 5 clusters in ID)
    ------------------------------------------------------------------------------
                 |               Robust
              Ri | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            RmRf |   1.042645   .1598131     6.52   0.000     .7294168    1.355873
             ESG |    .104679   .2043089     0.51   0.608    -.2957591    .5051172
           _cons |   .0447226   .0077805     5.75   0.000     .0294731    .0599721
    -------------+----------------------------------------------------------------
         sigma_u |          0
         sigma_e |  .10128858
             rho |          0   (fraction of variance due to u_i)
    My questions are the following:
    1. Are my steps correct?
    2. Must I conduct additional tests?
    3. What other methods may I use to evaluate the factor model?
    4. Is it right to conclude that the RmRf factor and the intercept have statistically significant coefficients? (According to the xtreg output above)
    My sincere appreciation for your time and expertise.
    Best regards,
    Nicco

  • #2
    Nicco:
    welcome to this forum.
    If you have a T>N panel dataset, -xtreg- is not the way to go.
    Take a look at -xtgls- (as you already did) and -xtregar-.
    Kind regards,
    Carlo
    (Stata 19.0)
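
    A rough sketch of the -xtregar- suggestion, using the variable names from the first post. It assumes the data are already -xtset- with a time variable, which the earlier -xtserial- run requires. The choice between the re and fe options is only illustrative here:

    ```stata
    * Random-effects model with AR(1) disturbances
    xtregar Ri RmRf ESG, re

    * Fixed-effects counterpart, for comparison
    xtregar Ri RmRf ESG, fe
    ```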

    • #3
      Dear Carlo,
      Thank you for your prompt reply.

      My understanding is that because I have a T>N panel dataset (T = number of time periods = 13, N = number of panels = 5), estimating the model by GLS is the better choice.
      The appropriate code would then be:
      Code:
      xtgls y x1 x2, options
      However, I read somewhere (I cannot seem to find the source, but will look further if needed) that (F)GLS performs well only with a large enough sample, because it requires a consistent estimate of the variance-covariance matrix. I also read that xtregar is used when there is autocorrelation in the error terms (https://www.stata.com/manuals13/xtxtregar.pdf). I don't think this is the case with my data, as Prob > F = 0.9308 when I run:
      Code:
      xtserial Ri RmRf ESG
      Can I not instead use robust standard errors with a simple panel regression?
      Code:
      xtreg y x1 x2, vce(robust)
      Thank you in advance for your time.
      Best regards,
      Nicco

      • #4
        Nicco:
        I'd stick with -xtgls-, which offers options for non-default standard errors.
        Kind regards,
        Carlo
        (Stata 19.0)
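
        As a hedged illustration of what those options can look like, the sketch below uses the variable names from the first post and two error structures that -xtgls- supports through its panels() option. It assumes the data are already -xtset- with a time variable. Which structure is appropriate for 5 portfolios over 13 years is not settled in this thread:

        ```stata
        * Heteroskedastic across portfolios, no cross-portfolio correlation
        xtgls Ri RmRf ESG, panels(heteroskedastic)

        * Heteroskedastic and contemporaneously correlated across portfolios
        xtgls Ri RmRf ESG, panels(correlated)
        ```

        In both calls the reported standard errors follow from the stated error structure rather than from the default assumption of homoskedastic, uncorrelated panels.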
