Problems with Panel Data Analysis

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17709

#16

24 May 2020, 08:22

Samuel:
please read the -xtreg- entry in the Stata .pdf manual.
1) non-default standard errors (SEs) make the calculation of the F-test you mention unfeasible (i.e., too complicated): that's why Stata omits it;
2) you're correct in clustering SEs on -panelid- for both the -fe- and -re- specifications;
3) the -xtoverid- outcome points you toward the -fe- specification; it's equivalent to the -hausman- outcome;
4) -xi:- is needed before -xtreg,re- because -xtoverid- does not support -fvvarlist- notation. Conversely, if you compare -fe- vs -re- with default SEs, you can use -hausman-, which does not support non-default SEs (unlike -xtoverid-) but supports -fvvarlist- notation (unlike -xtoverid-, again).
Please note that you can use the community-contributed programme -xtoverid- with default SEs, too:

Code:

. use "https://www.stata-press.com/data/r16/nlswork.dta"
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. xtreg ln_wage age, re

Random-effects GLS regression                   Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1026                                         min =          1
     between = 0.0877                                         avg =        6.1
     overall = 0.0774                                         max =         15

                                                Wald chi2(1)      =    3140.35
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0185667   .0003313    56.04   0.000     .0179174    .0192161
       _cons |   1.120439   .0112038   100.01   0.000      1.09848    1.142398
-------------+----------------------------------------------------------------
     sigma_u |  .36972456
     sigma_e |  .30349389
         rho |  .59743613   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re  
Sargan-Hansen statistic  17.401  Chi-sq(1)    P-value = 0.0000

. quietly xtreg ln_wage age, fe

. estimate store fe

. quietly xtreg ln_wage age, re

. estimate store re

. hausman fe re

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       fe           re         Difference          S.E.
-------------+----------------------------------------------------------------
         age |    .0181349     .0185667       -.0004318        .0001055
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       16.76
                Prob>chi2 =      0.0000

. estimate store re

. xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re  
Sargan-Hansen statistic  17.401  Chi-sq(1)    P-value = 0.0000

.

That's why -xtoverid- can be a work-around when -hausman- throws the warning message you got.

5) your data do not show evidence of serial correlation. However, since your dataset showed heteroskedasticity and you wisely invoked clustered SEs, stay with them.
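
As a minimal sketch of point 4) with clustered SEs (an illustration only: it assumes the -nlswork- dataset used above, its variables ln_wage, age, year and idcode, and that -xtoverid- can be installed from SSC), the -xi:- prefix creates the year dummies in a form that -xtoverid- can handle:

Code:

* install the community-contributed command once (it may ask for further dependencies)
ssc install xtoverid
use "https://www.stata-press.com/data/r16/nlswork.dta", clear
* -xi:- expands i.year into _Iyear_* dummies instead of factor-variable notation
xi: xtreg ln_wage age i.year, re vce(cluster idcode)
* cluster-robust test of fixed vs random effects
xtoverid

With your own data, replace the variable names and cluster on your panel identifier.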

Kind regards,
Carlo
(Stata 19.0)


Samuel Renhoar

Join Date: May 2020

Posts: 25
#17

24 May 2020, 13:09

Carlo:

Now I understand some of the procedures you have explained so far. However, I have to report everything in my paper and I couldn't find a good format or step-by-step guide for reporting the whole procedure, so I'm sorry, Carlo, if I ask for too much detail on this subject.

1. I understand that I have to test for heteroskedasticity and autocorrelation before deciding whether to use the option vce(cluster panelid) (I assume I have to use this option in almost every panel data estimation I run). Do you think I should report every assumption test, including autocorrelation and heteroskedasticity as well as other classical assumptions such as multicollinearity and normality, before I report my panel data estimates, or just the autocorrelation and heteroskedasticity tests? I ask because some people around me suggest that I don't need to check any of the classical OLS assumptions when doing panel data analysis.

2. Before the procedures you suggested, I had concluded (after reading some references) that, in order to choose between pooled, fixed and random effects, there are three tests I should conduct, i.e. the F test/Chow test, the BP LM test, and the Hausman test. Could you verify whether this is a correct procedure? In particular, is the F test really necessary?
Since I no longer get the F result for fixed effects after using vce(cluster), how can I get the F-test result for u_i, even though I can still run testparm on i.year?

Regarding point 4: so -xi:- before -xtreg- is only necessary for running -xtoverid-, is that right?
Samuel Renhoar

Join Date: May 2020

Posts: 25
#18

24 May 2020, 13:19

Carlo:

Should I write the re command for xtreg as
xtreg Y X1 X2 i.id i.year, re vce(cluster id)
or just
xtreg Y X1 X2 i.year, re vce(cluster id)

I was wondering about this because I don't see why we put dummies only on the "time effect" (i.year) and not on the "individual effect" (i.id). It makes sense for fixed effects, but I cannot find a reason for re.
And can I consider both i.year and i.id as "control variables" in my model?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#19

25 May 2020, 01:43

Samuel:
1) when you test for heteroskedasticity and serial correlation you're actually testing some of the OLS requirements: hence, I do not follow/share the advice you received. Quasi-extreme multicollinearity is usually apparent when you detect "weird" standard errors and CIs that are too wide vs theory. On top of that, most on and off this forum consider quasi-extreme multicollinearity an oversold problem (see https://www.hup.harvard.edu/catalog.php?isbn=9780674175440&content=toc, Chapter 23). Finally, normality is a (weak) requirement concerning the residual distribution only.
2) Please note that nobody on this list has to check methodological details/requirements on behalf of other posters: "it's your research, man!", as one of my academic professors told me many years ago (and, admittedly, he was right). Hence, you have to verify it yourself and/or with your teacher/mentor/supervisor. The reason you do not see the F-test as a footnote after -xtreg,fe- is the non-default standard errors: I pointed this out in one of my previous replies and it's clearly reported in the -xtreg- entry of the Stata .pdf manual, which you were recommended to take a look at (did you?).
3) -xtoverid- code:

Code:

xtreg Y X1 X2 i.year, re vce(cluster id)

is the way to go.
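
A short sketch of the tests mentioned in this thread (illustrative only: it assumes the -nlswork- dataset from my earlier post, not your data, and its variables ln_wage, age, year and idcode):

Code:

use "https://www.stata-press.com/data/r16/nlswork.dta", clear
* default SEs: the footer of the output reports "F test that all u_i=0"
xtreg ln_wage age i.year, fe
* clustered SEs: the same footer F test is no longer reported
xtreg ln_wage age i.year, fe vce(cluster idcode)
* the joint test of the year dummies is still available
testparm i.year
* Breusch-Pagan LM test for random effects, after -re- with default SEs
quietly xtreg ln_wage age i.year, re
xttest0

Whether all of these tests belong in your paper remains your call, as per point 2).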

Kind regards,
Carlo
(Stata 19.0)
Samuel Renhoar

Join Date: May 2020

Posts: 25
#20

25 May 2020, 05:48

Carlo:

I think that's everything I wanted to know.
Thank you for being so helpful, Carlo,
and have a great day.