
  • correlate against lags vs corrgram

    Say I have a time series called y. I generate lags of it using the commands:

    generate ly = L.y
    generate l2y = L2.y
    generate l3y = L3.y

    and so forth. I then produce a table of correlations of the original variable against its lags, using the command:

    correlate y ly l2y l3y...

    and so forth. I would like to know why the correlations reported by the correlate command are different from the autocorrelations reported by the command:

    corrgram y

    Thanks!

  • #2
    I would hazard a guess that correlate by default excludes any observation for which one or more of the listed variables is missing, and thus in your example would omit the first three observations, since L3.y is missing for them. corrgram uses all the observations available for each autocorrelation, so the correlation of y with L.y would omit only the first observation.

    Perhaps pwcorr would replicate the results of corrgram.
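
    The difference in sample handling can be checked by looking at the observation counts each command uses. A minimal sketch on simulated data (the names t, y, and n_casewise and the random-walk series are assumptions for illustration, not the poster's data):

    ```stata
    * Simulated example; t, y, and n_casewise are hypothetical names.
    clear
    set seed 12345
    set obs 100
    generate t = _n
    tsset t
    * A running sum of normal draws gives a random walk
    generate double y = sum(rnormal())

    * correlate drops any observation with a missing value in ANY listed variable,
    * so the first three observations are lost because L3.y is missing there
    correlate y L.y L2.y L3.y
    scalar n_casewise = r(N)
    display "correlate uses N = " n_casewise

    * pwcorr uses every available pair; the obs option prints the count per pair
    pwcorr y L.y L2.y L3.y, obs
    ```

    With 100 observations, correlate works from 97 of them, while pwcorr reports a separate count for each pair of variables.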



    • #3
      Thanks for your reply William. Unfortunately, that doesn't seem to be the case. I tried:

      1) Using pwcorr as you suggested: pwcorr y ly l2y l3y...

      2) Computing pairwise correlations one by one:
      correlate y ly
      correlate y l2y
      correlate y l3y

      The correlations I get are still quite different from the correlations reported under the AC column in the corrgram table. This is most noticeable for distant lags, where, for example:

      correlate y l10y yields: 0.9978
      corrgram reports: 0.8879

      correlate y l20y yields: 0.9964
      corrgram reports: 0.7740

      correlate y l30y yields: 0.9954
      corrgram reports: 0.6606

      correlate y l40y yields: 0.9947
      corrgram reports: 0.5488

      The variable I'm working with is the natural log of US quarterly GDP (2005 USD), and my dta file is available here:

      https://drive.google.com/open?id=0B0...XlFM19PaU1tYzA

      Any further suggestions would be greatly appreciated.



      • #4
        I think you're thinking that autocorrelation is, or should be, calculated as an exact analogue of correlation, namely as

        cov(series, series_displaced) / [sd(series) sd(series_displaced)].

        But it isn't calculated that way in corrgram (or typically in statistical software, so far as I know). See the Methods and formulas section of [TS] corrgram. The numerator, the autocovariance function, is produced by dividing by the sample size, not the number of paired terms in the covariance. And the denominator is not the product of two separate standard deviations: it is a variance for the series as a whole.

        The estimator is used, loosely, because it behaves better, not just in estimating autocorrelation, but also when used in spectrum estimation.

        As you report, the difference is bigger for longer lags.
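
        For reference, the estimator described above can be written as follows, with n the number of observations, v the lag, and \bar{x} the mean of the whole series (the symbols are chosen here for illustration):

        ```latex
        \hat{\rho}_v = \frac{\hat{R}(v)}{\hat{R}(0)}, \qquad
        \hat{R}(v) = \frac{1}{n} \sum_{i=1}^{n-v} (x_i - \bar{x})(x_{i+v} - \bar{x})
        ```

        The sum has only n - v terms but is still divided by n, and both \bar{x} and \hat{R}(0) come from the full series, which tends to shrink the estimate relative to an ordinary correlation as v grows.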



        • #5
          Well, guessing didn't help here. When in doubt, refer to the documentation. Which I did before I wrote, but didn't understand the subtleties until they became unavoidable.

          A look at the documentation for corrgram in the Stata Time-Series Reference Manual PDF included with the Stata installation, in particular at the Methods and formulas section, shows us at least part of the difference. From the definition of the autocovariance, it's clear that the concept of autocorrelation of a time series is more subtle than just computing the correlation between values at different lags.

          Notice that the autocorrelation at lag v of a time series having n values is computed using the mean of all n values, but only n-v pairs of differences from the mean.

          When we use correlate, both the mean and differences are calculated on just n-v values.
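
          The two calculations can be compared side by side. A minimal sketch on a simulated trending series, assuming no gaps or missing values (the variable and scalar names and the data-generating process are hypothetical, chosen only for illustration):

          ```stata
          * Simulated trending series; all names here are hypothetical.
          clear
          set seed 2024
          set obs 200
          generate t = _n
          tsset t
          generate double y = 0.01*t + rnormal(0, 0.02)

          * Mean and deviations use all n observations
          quietly summarize y
          scalar n = r(N)
          scalar ybar = r(mean)
          generate double dev = y - ybar

          * Lag-0 autocovariance: sum of squared deviations divided by n
          generate double dev2 = dev^2
          quietly summarize dev2
          scalar R0 = r(sum)/n

          * Lag-10 autocovariance: only n-10 cross products, still divided by n
          generate double cross10 = dev * L10.dev
          quietly summarize cross10
          scalar R10 = r(sum)/n
          scalar ac_hand = R10/R0

          * Ordinary correlation on the n-10 complete pairs, for comparison
          quietly correlate y L10.y
          scalar rho_pairs = r(rho)

          * corrgram stores the lag-10 autocorrelation in r(ac10)
          quietly corrgram y, lags(10)
          display "by hand: " ac_hand "   corrgram: " r(ac10) "   correlate: " rho_pairs
          ```

          The hand-computed value should match the AC column of corrgram, while the correlate result on the same series stays much closer to 1.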

          Note: crossed with Nick's more technically savvy answer.



          • #6
            I understand now. Thank you both!
