Time-dependent Variance in Panel Analysis (N~300, T=4)

Kai Kettler

Join Date: Feb 2023

Posts: 5
#1


24 Feb 2023, 06:06

Dear members of Statalist,

I am working intensively with Stata for the first time. I am analyzing a panel dataset (300 individuals with 4 observations each).

I use a random-effects (RE) model, and my errors are both heteroskedastic and autocorrelated. Using robust standard errors kills some of the significance, which is why I keep looking for more efficient regression models.

One thing I found is that the variance of the error terms depends on the observation time t (the wave).

I have two questions regarding this:
Is there a formal test to verify this?

How can I implement a regression (FGLS?) that estimates a different error variance for each wave/time?

Thank you in advance, regards
Kai

Best regards,
Kai
Tags: FGLS, panel, timevariant variance
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#2

24 Feb 2023, 07:09

Kai:
with N=300, you cannot avoid cluster-robust standard errors (SEs).
The significance was not killed by the non-default SEs: it was unreliable when you used the default standard errors.
I think you should stick with your results, which are what they are.

Kind regards,
Carlo
(Stata 19.0)
Kai Kettler

Join Date: Feb 2023

Posts: 5
#3

24 Feb 2023, 07:27

Thank you for your quick reply!

I see your argument and I will probably do that. But I am still curious about a potential solution in order to learn it and further my understanding.

If anyone can provide insight into my two questions I would very much appreciate it.

regards

Best regards,
Kai

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17854

24 Feb 2023, 09:39

Kai:
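1) the wave-specific variance of epsilon also depends on the random effect (-re-); please note that -epsilon- = -delta-, i.e., the idiosyncratic residual is ln_wage minus the linear prediction minus the random effect: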

Code:

. use "https://www.stata-press.com/data/r17/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. xtreg ln_wage c.age##c.age, re vce(cluster idcode)

Random-effects GLS regression                   Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-squared:                                      Obs per group:
     Within  = 0.1087                                         min =          1
     Between = 0.1015                                         avg =        6.1
     Overall = 0.0870                                         max =         15

                                                Wald chi2(2)      =    1258.33
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

                             (Std. err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0590339   .0041049    14.38   0.000     .0509884    .0670795
             |
 c.age#c.age |  -.0006758   .0000688    -9.83   0.000    -.0008107    -.000541
             |
       _cons |   .5479714   .0587198     9.33   0.000     .4328826    .6630601
-------------+----------------------------------------------------------------
     sigma_u |   .3654049
     sigma_e |  .30245467
         rho |  .59342665   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. predict epsilon, e

. bysort year: sum epsilon if year<=72

------------------------------------------------------------------------------------------------------------------------------------------
-> year = 68

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     epsilon |      1,375   -.0229783    .2759894  -1.201086   1.093735

------------------------------------------------------------------------------------------------------------------------------------------
-> year = 69

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     epsilon |      1,223    .0359193    .2521537  -1.089347   1.499997

------------------------------------------------------------------------------------------------------------------------------------------
-> year = 70

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     epsilon |      1,686   -.0019489    .2649476  -1.152775   1.073215

------------------------------------------------------------------------------------------------------------------------------------------
-> year = 71

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     epsilon |      1,851    .0232261    .2669117  -1.359019   1.219603

------------------------------------------------------------------------------------------------------------------------------------------
-> year = 72

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     epsilon |      1,693    .0169544    .2708474  -1.326722   2.313672

------------------------------------------------------------------------------------------------------------------------------------------
<snip>

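. predict fitted, xb

. * assumed step, not in the original log: store the predicted random effect u_i in -re-
. predict re, u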


. g delta=( ln_wage- fitted+re) if year<=72

. drop delta 

. g delta=( ln_wage- fitted-re) if year<=72

. list idcode year ln_wage fitted re delta epsilon in 1/10

     +-------------------------------------------------------------------------+
     | idcode   year    ln_wage     fitted          re       delta     epsilon |
     |-------------------------------------------------------------------------|
  1. |   3195     68   .8285847   1.575498    -.260727   -.4861859   -.4861859 |
  2. |    918     68   1.694751   1.548229    .0869504    .0595722    .0595722 |
  3. |   1758     68   1.403949   1.425635   -.1322301    .1105438    .1105438 |
  4. |   3712     68   1.646895   1.519608    .1080533    .0192339    .0192339 |
  5. |   1234     68   1.549131   1.601415    .1658492   -.2181332   -.2181332 |
     |-------------------------------------------------------------------------|
  6. |   1514     68   2.130894   1.548229    .4970358    .0856294    .0856294 |
  7. |    324     68   1.850236   1.575498    .1255661    .1491726    .1491726 |
  8. |   3795     68   1.434721   1.425635   -.0038257    .0129112    .0129112 |
  9. |   3182     68   1.809414   1.548229   -.0142728    .2754585    .2754585 |
 10. |   3297     68   1.403949   1.519608   -.3863472    .2706883    .2706884 |
     +-------------------------------------------------------------------------+

.

2) a more efficient regression is the one with the most reliable results;

3) what is gained by showing that epsilon has wave-specific variance (as frequently happens, whether or not the panel is balanced), other than highlighting its heteroskedasticity?

4) while I do not think that there's a tool to do what you're after, for a more complex error structure you may want to take a look at -xtgee-.
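
On the question of a formal test, the following is only a sketch of two possible checks, not an established procedure for this model. It reuses the example data above; the variable name eps2 is made up for illustration. The first check regresses the squared residuals on wave dummies, with standard errors clustered on the individual, and tests the dummies jointly. The second uses -robvar- (Levene and Brown-Forsythe tests), which treats observations as independent and is therefore only a rough guide in a panel.

```stata
* Sketch only: same example data and model as above
use "https://www.stata-press.com/data/r17/nlswork.dta", clear
xtreg ln_wage c.age##c.age, re vce(cluster idcode)
predict double epsilon if e(sample), e

* Check 1: squared residuals on wave dummies, clustered on the individual
* (eps2 is a hypothetical variable name)
generate double eps2 = epsilon^2
regress eps2 i.year, vce(cluster idcode)
testparm i.year      // H0: same residual variance in every wave

* Check 2: Levene / Brown-Forsythe tests, assuming independent observations
robvar epsilon, by(year)
```

A small p-value from -testparm- would suggest that the residual variance differs across waves. Because epsilon is itself an estimate, both checks are approximate.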

Kind regards,
Carlo
(Stata 19.0)


Kai Kettler

Join Date: Feb 2023

Posts: 5
#5

25 Feb 2023, 05:01

Hey Carlo, thank you for your answer. Your input is very much appreciated.

Some replies to your comments:

1) Thank you for the input. I reran the "analysis". It changes things, but the "problem" might persist, even if it is less extreme.

2) I get your point, but I think of efficiency in the sense that the estimator has a smaller variance for a given number of data points. I still think that, in some cases, a non-efficient estimator (in that sense) with robust errors will show no significant effect where it would do so with more data. I guess it depends on how substantial (not in the sense of statistical significance) the heteroskedasticity or autocorrelation is. My thinking is that if the assumptions that make the standard procedure efficient are heavily violated, the difference in efficiency between estimators that take those violations into account and standard estimators will be quite relevant. In those cases, relying on standard estimators with robust errors might therefore miss relevant effects. Is this thinking flawed? Furthermore, I don't see why using appropriate estimators to correct these problems would be less reliable than using considerably less efficient estimators. In one case, one might over-specify the model and use irrelevant information to "boost significance"; in the other, one takes the risk of missing relevant information.

I am not trying to deliberately debate you, but am just putting my thoughts down here so that others might profit from them and I might get further clarification.

3) I guess it is just the idea of specifying the heteroskedasticity and using a tailored approach to take it into account. The whole idea is that I consider this especially important in my analysis (motivated not only by the data but also by theory).

4) Thank you for the input. I checked it out.

The Stata help reads: "xtgee fits population-averaged panel-data models. In particular, xtgee fits generalized linear models and allows you to specify the within-group correlation structure for the panels."

I take it that one can only specify the correlation structure. Variances that differ by time period should not be modeled through a correlation structure. I also looked into ARCH/GARCH modeling. There, the variance is conditional on the previous variance, which would not fit this case either.
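
One option that might fit is -mixed-, whose residuals() option accepts by(varname) and so allows a separate residual variance for each category of a variable. The sketch below rests on several assumptions: it uses a random-intercept model fitted by maximum likelihood as a stand-in for the RE model, it runs on the nlswork example data rather than the 300-person panel, and the stored-estimate names homosk and hetero are chosen purely for illustration.

```stata
* Sketch only: nlswork stands in for the actual panel
use "https://www.stata-press.com/data/r17/nlswork.dta", clear
keep if year <= 72          // keep a few waves to mimic a short panel

* Random intercept, one residual variance shared by all waves
mixed ln_wage c.age##c.age || idcode:
estimates store homosk      // hypothetical name

* Random intercept, a separate residual variance for each wave
mixed ln_wage c.age##c.age || idcode:, residuals(independent, by(year))
estimates store hetero      // hypothetical name

* Likelihood-ratio test of H0: equal residual variances across waves
lrtest homosk hetero
```

If the likelihood-ratio test rejects, the wave-specific variances are doing real work in the model. Note that this handles the time-dependent variance only; the remaining within-person correlation is captured by the random intercept alone.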


Best regards,
Kai
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#6

25 Feb 2023, 15:55

Kai:
the main issue seems to rest on the heteroskedasticity and autocorrelation of the epsilon, which, with such a large number of panels, call for cluster-robust standard errors.
Unfortunately, I'm not aware of modules that can help you out with what you're after.
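
Short of a dedicated module, a manual two-step weighted regression is one conceivable route that still keeps cluster-robust standard errors. What follows is a rough sketch under stated assumptions, not a recommended estimator: it uses pooled OLS rather than the RE model, so the wave-specific variance is that of the composite error (individual effect plus epsilon), and the names ehat, ehat2 and sig2_t are hypothetical.

```stata
* Sketch only: pooled two-step FGLS with wave-specific error variances
use "https://www.stata-press.com/data/r17/nlswork.dta", clear
keep if year <= 72

* Step 1: pooled OLS residuals and their mean square within each wave
regress ln_wage c.age##c.age
predict double ehat if e(sample), residuals
generate double ehat2 = ehat^2
bysort year: egen double sig2_t = mean(ehat2)

* Step 2: weight each observation by the inverse of its wave's variance,
* keeping cluster-robust SEs to allow for within-person correlation
regress ln_wage c.age##c.age [aweight = 1/sig2_t], vce(cluster idcode)
```

Any efficiency gain depends on how much the variances actually differ across waves; with similar variances the weighted and unweighted results should be close.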

Kind regards,
Carlo
(Stata 19.0)
