  • Does anyone care about serial correlation (in panels)?

    Perhaps a silly question, but I'm interested to hear whether researchers these days care about serial correlation, especially in panel setups. I remember from time series that this was a big issue: you'd almost always test for white noise and adjust your lag structure until your residuals were free of serial correlation. Yet in just about any modern economics paper I've read (economics is my background, but I'm also interested in other fields' input), people do not seem to care at all. They just use robust or clustered standard errors, state that these are robust to autocorrelation (and heteroskedasticity), and that's it.

    Is that really all there is to it? Does it depend on the dimensions of the panel (number of units N, number of time periods T)? My intuition is that serial correlation plays a role in two ways. The first is through the standard errors, which will not be reliable if you do not account for serial correlation. This issue is generally solved by using robust/cluster() standard errors (a minimal sketch follows at the end of this post). The second is more complicated - is the presence of serial correlation an indication of model misspecification? In other words, can serial correlation tests help you figure out what kind of model you need to properly explain the data? E.g. do you use a static model, one in differences, one with lags of dependent and independent variables? Or will these tests give you a false sense of credibility (i.e. you think your model is correctly specified because your test says it is free of serial correlation, but actually the test isn't that informative in practice)?

    I'd be very interested to hear your opinion.
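
    To make the "just cluster" practice concrete, here is a minimal sketch in Stata (the dataset and specification are purely illustrative, using the built-in -nlswork- example data):

    Code:
webuse nlswork, clear
xtset idcode year
* conventional standard errors assume no serial correlation in the idiosyncratic errors
xtreg ln_wage age ttl_exp tenure, fe
* cluster-robust standard errors allow arbitrary serial correlation within panels
xtreg ln_wage age ttl_exp tenure, fe vce(cluster idcode)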

  • #2
    Jesse:
    in light of what I have (self-)learnt over the years, serial correlation is a nasty issue when you have a small-N, large-T panel dataset.
    If the reverse holds, it is a minor issue and clustering standard errors can be all that is required to deal with it.
    Please consider that I usually work with linear regression models for panel datasets (-xtreg- in Stata).
    Kind regards,
    Carlo
    (Stata 18.0 SE)

    • #3
      is the presence of serial correlation an indication of model misspecification?
      It could be a result of model misspecification. But it could also be a result of:

      (i) omitted variables - if the omitted variable is persistent over time, then successive error terms can be temporally correlated.

      (ii) measurement errors - systematic measurement errors accumulate over time, and this temporal persistence can lead to serial correlation.

      In the case of misspecification, you may be able to modify the functional form and get rid of the serial correlation. So yes, the tests can lead you to investigate whether the correlation is a result of misspecification. However, with measurement errors (which are endemic in cross-country panel data) and in some cases omitted variables, there may be no direct approaches to tackling the issue. The important thing to note is that OLS still remains unbiased and consistent in the presence of serial correlation, so what we are worried about is the estimated variances of our coefficients (their bias makes our hypothesis tests invalid).
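
      As a rough illustration of that last point, here is a small simulated sketch of my own (not from the original post): a static panel where both the regressor and the AR(1) error are persistent but unrelated to each other, so the fixed-effects point estimate is fine while the conventional standard errors are not.

      Code:
* simulate a static panel: persistent x, AR(1) idiosyncratic error, unit fixed effect
clear
set seed 12345
set obs 200                          // 200 units
gen id = _n
gen a = rnormal()                    // unit fixed effect
expand 10                            // T = 10
bysort id: gen year = _n
xtset id year
bysort id (year): gen x = rnormal()
bysort id (year): replace x = 0.8*x[_n-1] + rnormal() if _n > 1
bysort id (year): gen u = rnormal()
bysort id (year): replace u = 0.8*u[_n-1] + rnormal() if _n > 1
gen y = 1 + 0.5*x + a + u
* the point estimate of 0.5 is consistent either way; only the standard errors differ
xtreg y x, fe                        // conventional SEs are too small here
xtreg y x, fe vce(cluster id)        // cluster-robust SEs remain valid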

      • #4
        Dear Jesse,

        As always, it all depends on what you are doing.

        With time series, the focus is often on getting the dynamic specification right and in that case serial correlation is a sign of misspecification.

        With panel data the focus is generally very different and in most cases the models of interest are static. In that case we know that there will be serial correlation, but that is irrelevant for the way we specify the model, and hence people just use robust (clustered) standard errors.

        Best wishes,

        Joao

        • #5
          First of all, thank you all for the responses; they've been very interesting.

          I think my internal confusion comes down to the following sentences:
          The important thing to note is that OLS still remains unbiased and consistent in the presence of serial correlation, [...]
          With panel data the focus is generally very different and in most cases the models of interest are static. In that case we know that there will be serial correlation, but that is irrelevant for the way we specify the model, and hence people just use robust (clustered) standard errors.
          I agree that the mere presence of serial correlation does not in itself lead to biased coefficient estimates. However (as Andrew also noted, and Joao Santos for time series), serial correlation can also be an indicator of model misspecification, which does affect the betas. Below, I have attached a table of three ways to estimate the impact of wages, capital and industry output on employment levels in the UK as an example. This is based on the Arellano-Bond dataset available in Stata (abdata). This is a short panel (N=140, T=9). The first column shows estimates using standard two-way fixed effects (industry and year dummies), the second adds an industry-specific trend, the third is estimated in differences (but still with both fixed effects). Code is at the bottom.
          [Attached image: serialcorrelation.PNG - table of estimates and serial correlation test p-values]


          The estimates are wildly different. The impacts of wages (w) and industrial output (ys) double, while that of capital (k) halves. Even though we are not particularly interested in the dynamics of this model, the choice we make there does affect the outcome. In other words, I feel the phrase "OLS is consistent even in the presence of serial correlation if you use the right standard errors" is a bit misleading. OLS will consistently estimate the parameters of your model under the assumption that it is correctly specified. Serial correlation does not change that, but (I think) its presence does challenge the "correctly specified" assumption (and consequently your beta estimates). Would you agree with that statement? I have no idea if it's actually true; it is based more on intuition than on rigorous econometrics, unfortunately.

          For what it's worth, the bottom four rows show the p-values for four different serial correlation tests, each of which has "no serial correlation" as the null hypothesis.* (A sketch of how these tests can be run follows the code below.) Without any other information, would the estimates in differences then be more credible than the other two?

          Code:
* Arellano-Bond employment data (UK firms, 1976-1984)
use "http://www.stata-press.com/data/r7/abdata.dta", clear
* (1) two-way fixed effects (unit and year dummies)
xtreg n w k ys yr*, fe
* (2) as (1), plus unit-specific linear trends
xtreg n w k ys yr* id#c.year, fe
* (3) estimated in first differences, still with both sets of dummies
xtreg D.n D.(w k ys) i.year, fe
* Available on ssc as -xtqptest-, -xthrtest- and -xtistest- for those interested
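
          For completeness, here is roughly how the four tests can be run. This is from memory of the respective help files, so treat the exact syntax as an approximation and check -help xtserial-, -help xtqptest-, etc. before relying on it.

          Code:
* -xtserial- (Wooldridge's test) is a Stata Journal package; -findit xtserial- locates it
xtserial n w k ys
* the other three tests (-xtqptest-, -xthrtest-, -xtistest-) are the SSC commands
* mentioned above and are run after -xtreg, fe-; see their help files for options
xtreg n w k ys yr*, fe
xtqptest
xthrtest
xtistest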

          • #6
            Interesting discussion. It seems true that (cluster-)robust standard errors are often seen as a panacea in the presence of serial correlation. In the end, it stands or falls with the assumptions we make. With fixed T, under the assumption that the regressors are all strictly exogenous with respect to the idiosyncratic errors, OLS (with fixed effects) is indeed consistent and unbiased, and robust standard errors allow valid inference. While serial correlation can hint at model misspecification, it usually does not tell us in which direction the model is misspecified. It could be neglected trends as in your example, neglected dynamics, omitted variables, measurement error, ...

            Such model misspecification could imply that the strict exogeneity assumption of the regressors is invalid. Standard fixed-effects and differencing procedures would then no longer be consistent. What makes everything even more complicated is the fact that your serial correlation tests also rely on the assumption of strict exogeneity and your test results become invalid if strict exogeneity does not hold.

            In the end, every model is in some way misspecified. More complicated models can help to add robustness by relaxing some assumptions, but this often comes at the cost of efficiency. (For example, you need to estimate many more parameters in your model with industry-specific trends.) More complicated models can even require new assumptions for the consistency of the estimators (for example, when estimating a short-T dynamic panel model with a lagged dependent variable; see the sketch at the end of this post) and we might do more harm than good by moving away from the simple model.

            That said, I am not against model specification tests. Actually, the opposite is true and I personally believe that serial correlation tests can be quite helpful in the process of finding a reasonable model specification. But we should keep the limitations in mind and not blindly follow the results of these tests.
            https://twitter.com/Kripfganz
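
            To illustrate the point about the short-T dynamic panel, here is a small simulated sketch of my own (not part of the original post): once a lagged dependent variable enters the model, the within estimator needs extra assumptions and is in fact biased when T is small (the Nickell bias).

            Code:
* simulate y_it = 0.5*y_i,t-1 + a_i + e_it with T = 8
clear
set seed 2718
set obs 500                          // 500 units
gen id = _n
gen a = rnormal()                    // unit fixed effect
expand 8                             // T = 8
bysort id: gen year = _n
xtset id year
gen double y = .
bysort id (year): replace y = a + rnormal() if _n == 1
bysort id (year): replace y = 0.5*y[_n-1] + a + rnormal() if _n > 1
* FE with a lagged dependent variable: the Nickell bias pulls the coefficient
* on L.y well below the true value of 0.5 when T is this small
xtreg y L.y, fe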

            • #7
              Dear Jesse Wursten,

              Each of your 3 models has a different set of conditioning variables and therefore they lead to very different results; this is natural and has nothing to do with serial correlation. Given a dependent variable y, several models may be correctly specified in the sense that they allow you to answer an interesting question; some models may have serial correlation and others possibly not. So, not having serial correlation is not necessarily an indication that the model will be useful to answer the question we are interested in.

              Best wishes,

              Joao

              • #8
                Originally posted by Joao Santos Silva
                Dear Jesse Wursten,

                Each of your 3 models has a different set of conditioning variables and therefore they lead to very different results; this is natural and has nothing to do with serial correlation.
                Isn't that exactly the question though? If we take the example from above, we might be interested in knowing how strongly employment reacts to changes in wages. All three models try to provide an answer to the same question; they differ only in the way they model the problem. I suppose the thing I'm wondering is whether serial correlation statistics are at all useful in this situation. If this were time series, we would do tests for serial correlation, unit roots and possibly cointegration and adjust our model accordingly. In panel studies this seems to be completely absent. I wonder whether this is just the way it is ("there are no meaningful statistical tests possible in panel studies"), or whether this is a shortcoming of the literature.

                In this situation the results at least point in the same direction, but I've seen fields where small changes to the specification lead to opposite conclusions. I think we definitely should not see statistical tests as some holy grail, mainly for the reasons Sebastian Kripfganz mentioned: tests will always be of limited reliability and are often affected by the same issues polluting your coefficient/variance estimates. But do they offer no value at all? In the end, the choice between models now often seems to be made based on tradition, intuition, vague stories or simply what fits the narrative best...

                • #9
                  Dear Jesse Wursten,

                  I think there are several issues being confounded here.

                  a)
                  I suppose the thing I'm wondering is whether serial correlation statistics are at all useful in this situation.
                  I do not think so: serial correlation is not the issue here. One would need to think carefully about which regressors to include and which ones to exclude from the conditioning set. This is a difficult part of the process because we want to avoid omitted variables bias but we also want to avoid what I call "included variables bias": the bias that results from including regressors we should not be conditioning on. So, what we need to do is to define carefully what we want to estimate; possible serial correlation is not an issue unless part of our objective is to have a model that is dynamically complete.

                  b)
                  If this were time series, we would do tests for serial correlation, unit roots and possibly cointegration and adjust our model accordingly.
                  Indeed. With time-series, the purpose is often to construct dynamically complete models and in that case we cannot have serial correlation. Even if that is not the case, we need to worry about serial correlation because even to compute HAC standard errors we need to know something about the serial correlation.

                  c)
                  In panel studies this seems to be completely absent. I wonder whether this is just the way it is ("there are no meaningful statistical tests possible in panel studies"), or whether this is a shortcoming of the literature.
                  With fixed T panels we do not need to worry about unit roots and cointegration. Moreover, we can compute robust standard errors with little or no knowledge of the serial correlation. Other statistical tests may, however, be useful; see below.

                  d)
                  But do they [specification tests] offer no value at all? In the end, the choice between models now often seems to be made based on tradition, intuition, vague stories or simply what fits the narrative best...
                  I tend to share your feeling that people often just choose the model they like rather than trying to find a suitable model. Specification tests are not sacred but they are useful. For example, I am a great fan of the RESET test (which is not a test for omitted variables!); a rough example of running it follows at the end of this post.

                  e)
                  I think that part of your confusion is caused by the fact that the way econometrics is taught is still heavily influenced by the way econometrics was done in the 1960s. In those times, econometrics was dominated by time-series data and one of the few specification tests available was the Durbin-Watson (DW) test. In that context, serial correlation was a likely indication of a spurious regression and therefore it was a major issue. Unfortunately, most textbooks still reflect this situation and ignore that nowadays econometrics is much broader, so serial correlation may or may not be a problem.

                  Best wishes,

                  Joao
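
                  To make point d) concrete, here is a rough sketch of running a RESET-type check in Stata. The pooled version uses the built-in -estat ovtest-; the fixed-effects variant is a manual approximation of my own (powers of the linear prediction added as regressors), not something prescribed in the post.

                  Code:
use "http://www.stata-press.com/data/r7/abdata.dta", clear
* RESET after a pooled regression: Ramsey's test on powers of the fitted values
regress n w k ys yr*
estat ovtest
* a manual RESET-style check after -xtreg, fe-
xtreg n w k ys yr*, fe
predict double xb if e(sample), xb
gen double xb2 = xb^2
gen double xb3 = xb^3
xtreg n w k ys yr* xb2 xb3, fe vce(cluster id)
test xb2 xb3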

                  • #10
                    Dear Joao Santos Silva

                    Thank you very much for the comments. I am not sure that I fully agree with your view; however, it has been very informative nonetheless.

                    • #11
                      I want to test for serial correlation in a panel data model. When I use -xtserial- I reject H0, but when I use -xtqptest- or -xthrtest- after -xtreg, fe- I cannot reject the H0 of "no first-order serial correlation". What does this mean? Do I have serial correlation or not?
