
  • Time-Dependent Variance in Panel Analysis (N ≈ 300, T = 4)

    Dear members of Statlist,

    I am working intensively with Stata for the first time. I am analyzing a panel dataset (300 individuals with 4 observations each).

    I use a random-effects (RE) model, and my errors are both heteroskedastic and autocorrelated. Using robust standard errors kills some of the significance, which is why I keep looking for more efficient regression models.

    One thing I found is that the variance of the error term depends on the observation time (T).

    [Attached image: Scatter_resid_TIME_ID.jpg]
    [Attached image: Summarize_resid_TIME_ID.jpg]


    I have two questions regarding this:
    1. Is there a formal test to verify this problem?
    2. How can I implement a regression (FGLS?) that estimates a different error variance for each wave/time period?
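    For question 1, a common check for equal variances across groups is Levene's test, which Stata's -robvar- reports together with the Brown-Forsythe variants (based on the median and on the 10% trimmed mean). The sketch below assumes the -nlswork- example data that Carlo uses in #4, with -year- standing in for the wave variable, since Kai's data are not available. Two caveats: the test is applied to estimated residuals rather than the true errors, and it assumes independent observations, which repeated measurements on the same individual violate, so its p-values should be read as approximate.

```stata
* Sketch: does the idiosyncratic residual variance differ across waves?
* Assumption: nlswork stands in for the real data; -year- plays the role of the wave.
use "https://www.stata-press.com/data/r17/nlswork.dta", clear

* Random-effects model as in the thread
xtreg ln_wage c.age##c.age, re vce(cluster idcode)

* epsilon = idiosyncratic residual, net of the estimated u_i
predict epsilon, e

* Levene (W0) and Brown-Forsythe (W50, W10) tests of equal variance across years
robvar epsilon if year <= 72, by(year)
```

    With Kai's data, -year- would be replaced by the wave identifier and the -if- restriction dropped.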

    Thank you in advance, regards
    Kai

  • #2
    Kai:
    with N = 300 panels, you cannot avoid cluster-robust standard errors (SEs).
    The significance was not killed by the non-default SEs: it was already unreliable when you used the default standard errors.
    I think you should stick with your results, which are what they are.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Thank you for your quick reply!

      I see your argument, and I will probably do that. But I am still curious about a potential solution, to learn it and to further my understanding.

      If anyone can provide insight into my two questions I would very much appreciate it.

      Best regards,
      Kai



      • #4
        Kai:
        1) the wave-specific variance of -epsilon- also depends on the -re-, that is, on whether the estimated u_i is netted out of the residual (please note that -epsilon- = -delta-):
        Code:
        . use "https://www.stata-press.com/data/r17/nlswork.dta"
        (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
        
        . xtreg ln_wage c.age##c.age, re vce(cluster idcode)
        
        Random-effects GLS regression                   Number of obs     =     28,510
        Group variable: idcode                          Number of groups  =      4,710
        
        R-squared:                                      Obs per group:
             Within  = 0.1087                                         min =          1
             Between = 0.1015                                         avg =        6.1
             Overall = 0.0870                                         max =         15
        
                                                        Wald chi2(2)      =    1258.33
        corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
        
                                     (Std. err. adjusted for 4,710 clusters in idcode)
        ------------------------------------------------------------------------------
                     |               Robust
             ln_wage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                 age |   .0590339   .0041049    14.38   0.000     .0509884    .0670795
                     |
         c.age#c.age |  -.0006758   .0000688    -9.83   0.000    -.0008107    -.000541
                     |
               _cons |   .5479714   .0587198     9.33   0.000     .4328826    .6630601
        -------------+----------------------------------------------------------------
             sigma_u |   .3654049
             sigma_e |  .30245467
                 rho |  .59342665   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        
        . predict epsilon, e
        
        . bysort year: sum epsilon if year<=72
        
        ------------------------------------------------------------------------------------------------------------------------------------------
        -> year = 68
        
            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
             epsilon |      1,375   -.0229783    .2759894  -1.201086   1.093735
        
        ------------------------------------------------------------------------------------------------------------------------------------------
        -> year = 69
        
            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
             epsilon |      1,223    .0359193    .2521537  -1.089347   1.499997
        
        ------------------------------------------------------------------------------------------------------------------------------------------
        -> year = 70
        
            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
             epsilon |      1,686   -.0019489    .2649476  -1.152775   1.073215
        
        ------------------------------------------------------------------------------------------------------------------------------------------
        -> year = 71
        
            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
             epsilon |      1,851    .0232261    .2669117  -1.359019   1.219603
        
        ------------------------------------------------------------------------------------------------------------------------------------------
        -> year = 72
        
            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
             epsilon |      1,693    .0169544    .2708474  -1.326722   2.313672
        
        ------------------------------------------------------------------------------------------------------------------------------------------
        <snip>
        
        . predict fitted, xb
        
        
        
        . g delta=( ln_wage- fitted-re) if year<=72
        
        . list idcode year ln_wage fitted re delta epsilon in 1/10
        
             +-------------------------------------------------------------------------+
             | idcode   year    ln_wage     fitted          re       delta     epsilon |
             |-------------------------------------------------------------------------|
          1. |   3195     68   .8285847   1.575498    -.260727   -.4861859   -.4861859 |
          2. |    918     68   1.694751   1.548229    .0869504    .0595722    .0595722 |
          3. |   1758     68   1.403949   1.425635   -.1322301    .1105438    .1105438 |
          4. |   3712     68   1.646895   1.519608    .1080533    .0192339    .0192339 |
          5. |   1234     68   1.549131   1.601415    .1658492   -.2181332   -.2181332 |
             |-------------------------------------------------------------------------|
          6. |   1514     68   2.130894   1.548229    .4970358    .0856294    .0856294 |
          7. |    324     68   1.850236   1.575498    .1255661    .1491726    .1491726 |
          8. |   3795     68   1.434721   1.425635   -.0038257    .0129112    .0129112 |
          9. |   3182     68   1.809414   1.548229   -.0142728    .2754585    .2754585 |
         10. |   3297     68   1.403949   1.519608   -.3863472    .2706883    .2706884 |
             +-------------------------------------------------------------------------+
        
        .
        2) the more efficient regression is the one that gives the most reliable results;

        3) what is the gain in showing that -epsilon- has a wave-specific variance (as frequently happens, whether the panel is balanced or not), other than highlighting its heteroskedasticity?

        4) while I do not think that there is a tool to do what you're after, for a more complex error structure you may want to take a look at -xtgee-.
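        A minimal sketch of point 4, again assuming the -nlswork- data with -year- as the time variable. With -corr(unstructured)-, -xtgee- estimates a free within-panel correlation for every pair of waves, and -estat wcorrelation- displays the estimated working correlation matrix. Note that the Gaussian GEE still uses a single scale parameter for all waves, so this models the correlation across waves rather than a separate variance per wave.

```stata
* Sketch: population-averaged linear model with unstructured within-panel correlation.
* Assumption: nlswork and -year- stand in for the real panel and wave variables.
use "https://www.stata-press.com/data/r17/nlswork.dta", clear
keep if year <= 72
xtset idcode year

* Gaussian family with identity link gives a linear model; robust SEs as a safeguard
xtgee ln_wage c.age##c.age, family(gaussian) link(identity) corr(unstructured) vce(robust)

* Estimated working correlation matrix across the waves 68-72
estat wcorrelation
```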
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Hey Carlo, thank you for your answer. Your input is very much appreciated.

          Some replies to your comments:

          1) Thank you for the input. I reran the "analysis". It changes things, but the "problem" might persist, even if it is less extreme.

          [Attached image: Scatter_e.jpg]
          [Attached image: Sum_e.jpg]


          2) I get your point, but I think of efficiency in the sense that the estimator has a smaller variance for a given number of data points. I still think that, in some cases, using robust errors with inefficient estimators (in that sense) will show no significant effect, whereas the same effect would become significant with more data. I guess it depends on how severe (not in the sense of statistical significance) the heteroskedasticity or autocorrelation is. My thinking is: if the assumptions that make the standard procedure efficient are heavily violated, the efficiency gap between estimators that account for those violations and the standard estimators will be substantial. In those cases, relying on standard estimators with robust errors might miss relevant effects. Is this thinking flawed? Furthermore, I don't see why using appropriate estimators to correct these problems would be less reliable than using significantly less efficient estimators. In one case, one might over-specify a model and use actually irrelevant information to "boost significance"; in the other case, one risks missing actually relevant information.

          I am not trying to argue with you for its own sake; I am just putting my thoughts down here so that others might profit from them and I might get further clarification.

          3) The idea is simply to specify the form of the heteroskedasticity and use a tailored approach to account for it. I consider this especially important in my analysis (motivated not only by the data but also by theory).

          4) Thank you for the input. I checked it out.

          Stata help reads: "xtgee fits population-averaged panel-data models. In particular, xtgee fits generalized linear models and allows you to specify the within-group correlation structure for the panels."

          I take it that one can only specify the correlation structure, and time-dependent variances should not be modeled through a correlation. I also looked into ARCH/GARCH modeling, where the variance is conditional on previous variances, which would not fit this case either.
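          One route not raised in the thread that may fit question 2 is a linear mixed model: -mixed- accepts a -residuals()- option whose -by()- suboption estimates a separate residual variance for each value of a variable, while -|| idcode:- keeps the random intercept of the RE model. This is maximum likelihood under the assumed variance structure rather than FGLS in the strict sense, though it serves the same purpose. Comparing it with the common-variance model via -lrtest- also gives a formal test for question 1, valid only if the rest of the model is correctly specified. The sketch assumes the -nlswork- data with -year- as the wave variable.

```stata
* Sketch: random intercept plus a separate residual variance for each wave.
* Assumption: nlswork and -year- stand in for the real panel and wave variables.
use "https://www.stata-press.com/data/r17/nlswork.dta", clear
keep if year <= 72

* Heteroskedastic model: one residual variance per year (ML is the -mixed- default)
mixed ln_wage c.age##c.age || idcode:, residuals(independent, by(year))
estimates store het

* Homoskedastic benchmark: a random-intercept model fitted by ML
mixed ln_wage c.age##c.age || idcode:
estimates store hom

* LR test of H0: all wave-specific residual variances are equal
lrtest het hom
```

          If cluster-robust SEs are also wanted, as Carlo recommends, -mixed- accepts -vce(cluster idcode)-, but -lrtest- should not be used on models fitted that way.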


          Best regards,
          Kai



          • #6
            Kai:
            the main issue seems to be the heteroskedasticity and autocorrelation of -epsilon-, which, with such a large number of panels, call for cluster-robust standard errors.
            Unfortunately, I'm not aware of modules that can help you out with what you're after.
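            A textbook result (Aitken's generalization of the Gauss-Markov theorem) puts Kai's efficiency argument and Carlo's recommendation side by side. In the stacked model below, $\Omega$ is the error covariance matrix and $W$ is any symmetric positive-definite weighting matrix: $W = I$ gives pooled OLS, and the RE estimator uses the $W$ implied by a common idiosyncratic variance. Cluster-robust SEs estimate the sandwich in the second line without assuming a form for $\Omega$, while weighting by $\Omega^{-1}$ gives the smallest variance, as in the third line. The last line is an illustration of the structure Kai describes, assuming a balanced panel with $T = 4$ sorted by individual and wave, where $\iota_4$ is a vector of four ones and $\sigma^2_{\varepsilon,t}$ is the wave-$t$ idiosyncratic variance (the counterparts of -sigma_u- and -sigma_e- in the output above).

```latex
\begin{aligned}
&y = X\beta + v, \qquad E(v \mid X) = 0, \qquad \operatorname{Var}(v \mid X) = \Omega \\
&\hat\beta_W = (X'WX)^{-1}X'Wy, \qquad
  \operatorname{Var}(\hat\beta_W \mid X) = (X'WX)^{-1}X'W\,\Omega\,WX\,(X'WX)^{-1} \\
&\operatorname{Var}(\hat\beta_{\Omega^{-1}} \mid X) = (X'\Omega^{-1}X)^{-1}
  \preceq \operatorname{Var}(\hat\beta_W \mid X) \quad \text{for any symmetric p.d. } W \\
&\Omega = I_N \otimes \Sigma, \qquad
  \Sigma = \sigma_u^2\,\iota_4\iota_4' + \operatorname{diag}(\sigma^2_{\varepsilon,1}, \dots, \sigma^2_{\varepsilon,4})
\end{aligned}
```

            The gap between the second and third lines grows as the working $W$ moves away from $\Omega^{-1}$, which is Kai's intuition. The guarantee, however, holds for the true $\Omega$; a feasible version relies on a correctly specified variance model and a large sample, which is where Carlo's concern about reliability applies.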
            Kind regards,
            Carlo
            (Stata 19.0)

