correlate against lags vs corrgram

Alvaro Fuentes

Join Date: May 2014

Posts: 21
#1


02 Feb 2017, 16:21

Say I have a time series called y. I generate lags of it using the commands:

generate ly = L.y
generate l2y = L2.y
generate l3y = L3.y

and so forth. I then produce a table of correlations of the original variable against its lags, using the command:

correlate y ly l2y l3y...

and so forth. I would like to know why the correlations reported by the correlate command are different from the autocorrelations reported by the command:

corrgram y

Thanks!
Tags: autocorrelation, correlogram, lags, Time Series
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

02 Feb 2017, 17:33

I would hazard a guess that correlate by default excludes any observation for which one or more of the variables is missing (casewise deletion). In your example it would therefore omit the first three observations, since L3.y is missing for them. corrgram, by contrast, uses all the observations available for each autocorrelation, so the correlation of y with L.y omits only the first observation.

Perhaps pwcorr would replicate the results of corrgram.
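To make the guess concrete, here is a minimal Python sketch (numpy assumed; this is not Stata) of the two missing-value rules. Casewise deletion, which correlate applies across the whole variable list, is contrasted with pairwise deletion, which pwcorr applies to each pair. The helper names make_lags, casewise_corr and pairwise_corr are hypothetical, and the random-walk data are made up. The sketch only shows how the estimation samples differ. It does not by itself show that this explains the corrgram numbers.

```python
import numpy as np

def make_lags(y, max_lag):
    """Hypothetical helper: return an (n, max_lag + 1) array whose column k
    holds y lagged k periods. As with Stata's L.y, L2.y, ..., the first k
    entries of column k are missing (NaN here)."""
    n = len(y)
    cols = np.full((n, max_lag + 1), np.nan)
    for k in range(max_lag + 1):
        cols[k:, k] = y[: n - k]
    return cols

def casewise_corr(data, i, j):
    """Correlation of columns i and j using only rows complete on ALL columns
    (the rule correlate uses). Returns (r, number of rows used)."""
    keep = ~np.isnan(data).any(axis=1)
    return np.corrcoef(data[keep, i], data[keep, j])[0, 1], int(keep.sum())

def pairwise_corr(data, i, j):
    """Correlation of columns i and j using rows complete on those two columns
    only (the rule pwcorr uses). Returns (r, number of rows used)."""
    keep = ~np.isnan(data[:, i]) & ~np.isnan(data[:, j])
    return np.corrcoef(data[keep, i], data[keep, j])[0, 1], int(keep.sum())

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=50))  # made-up random walk, not the GDP data
data = make_lags(y, 3)              # columns: y, L.y, L2.y, L3.y

r_case, n_case = casewise_corr(data, 0, 1)  # y vs L.y, casewise
r_pair, n_pair = pairwise_corr(data, 0, 1)  # y vs L.y, pairwise
print(n_case, n_pair)
```

With 50 observations and lags up to 3, the y versus L.y correlation uses 47 rows under casewise deletion but 49 under pairwise deletion.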
Alvaro Fuentes

Join Date: May 2014

Posts: 21
#3

03 Feb 2017, 08:58

Thanks for your reply, William. Unfortunately, that doesn't seem to be the case. I tried:

1) Using pwcorr as you suggested: pwcorr y ly l2y l3y...

2) Computing pairwise correlations one by one:
correlate y ly
correlate y l2y
correlate y l3y

The correlations I get are still quite different from the correlations reported under the AC column in the corrgram table. This is most noticeable for distant lags, where, for example:

correlate y l10y yields: 0.9978
corrgram reports: 0.8879

correlate y l20y yields: 0.9964
corrgram reports: 0.7740

correlate y l30y yields: 0.9954
corrgram reports: 0.6606

correlate y l40y yields: 0.9947
corrgram reports: 0.5488

The variable I'm working with is the natural log of US quarterly GDP (2005 USD), and my dta file is available here:

https://drive.google.com/open?id=0B0...XlFM19PaU1tYzA

Any further suggestions would be greatly appreciated.
Nick Cox

Join Date: Mar 2014

Posts: 35701
#4

03 Feb 2017, 09:11

I think you're thinking that autocorrelation is, or should be, calculated as an exact analogue of correlation, namely as

cov(series, series_displaced) / [sd(series) sd(series_displaced)].

But it isn't calculated that way in corrgram (or typically in statistical software, so far as I know). See the Methods and formulas section of [TS] corrgram. The numerator, the autocovariance function, is produced by dividing by the sample size, not by the number of paired terms in the covariance. And the denominator is not the product of two separate standard deviations: it is the variance of the series as a whole.
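As a sketch of the two estimators being contrasted, in our own notation: x_1, ..., x_n is the series and v the lag. The corrgram form follows our reading of the Methods and formulas section, so the manual remains the authority for the exact statement.

```latex
% corrgram-style estimate: one overall mean, divisor n in every autocovariance
\bar x = \frac{1}{n}\sum_{t=1}^{n} x_t, \qquad
\hat\gamma_v = \frac{1}{n}\sum_{t=1}^{n-v} (x_t-\bar x)(x_{t+v}-\bar x), \qquad
\hat\rho_v = \frac{\hat\gamma_v}{\hat\gamma_0}

% what correlate computes: an ordinary Pearson correlation of the n - v complete pairs
\bar x_a = \frac{1}{n-v}\sum_{t=1}^{n-v} x_t, \qquad
\bar x_b = \frac{1}{n-v}\sum_{t=v+1}^{n} x_t, \qquad
r_v = \frac{\sum_{t=1}^{n-v}(x_t-\bar x_a)(x_{t+v}-\bar x_b)}
           {\sqrt{\sum_{t=1}^{n-v}(x_t-\bar x_a)^2 \,\sum_{t=1}^{n-v}(x_{t+v}-\bar x_b)^2}}
```

The 1/n factors cancel in \hat\rho_v. The difference therefore comes from summing n - v cross-products in the numerator but n squared deviations in the denominator, all measured from the single overall mean.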

The estimator is used, loosely, because it behaves better, not just in estimating autocorrelation, but also when used in spectrum estimation.

As you report, the difference is bigger for longer lags.
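The contrast can be checked numerically. Below is a minimal Python sketch, assuming numpy. corrgram_ac and lagged_pearson are hypothetical helper names, and the series is a made-up linear trend rather than the GDP data. corrgram_ac uses one overall mean, n - v cross-products, and the sum of squares of the whole series in the denominator. lagged_pearson is an ordinary Pearson correlation of the n - v pairs, as correlate would compute it.

```python
import numpy as np

def corrgram_ac(x, v):
    """Lag-v autocorrelation in the corrgram style: deviations from the
    full-sample mean, the cross-product sum over the n - v available pairs,
    divided by the sum of squares over all n values (the common 1/n cancels)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.dot(d[: len(x) - v], d[v:]) / np.dot(d, d)

def lagged_pearson(x, v):
    """Pearson correlation of the n - v pairs (x[t], x[t + v]), each column
    centered on its own mean, i.e. what `correlate y l10y` would report."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[: len(x) - v], x[v:])[0, 1]

t = np.arange(200, dtype=float)
x = 0.01 * t  # made-up pure linear trend, a stylized stand-in for a trending series

for v in (10, 20, 30, 40):
    print(v, round(lagged_pearson(x, v), 4), round(corrgram_ac(x, v), 4))
```

On a pure trend the Pearson version equals 1 (up to rounding) at every lag, while the corrgram-style estimate falls steadily as v grows. That is qualitatively the pattern reported above for log GDP.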
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

03 Feb 2017, 09:26

Well, guessing didn't help here. When in doubt, refer to the documentation, which I did before I wrote, but I didn't understand the subtleties until they became unavoidable.

A look at the documentation for corrgram in the Stata Time-Series Reference Manual PDF included with the Stata installation, in particular at the Methods and formulas section, shows us at least part of the difference. Its definition of the autocovariance makes clear that the autocorrelation of a time series is a subtler concept than just the correlation between values at different lags.

Notice that the autocorrelation at lag v of a time series having n values is computed using the mean of all n values, but only n-v pairs of differences from the mean.

When we use correlate, both the means and the differences from them are calculated on just those n-v pairs, with each of the two variables centered on its own mean.
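A tiny made-up series makes the point concrete. Take x = (1, 2, 3, 4, 5) and lag v = 1 (illustrative numbers only). Here \hat\gamma_v denotes the lag-v autocovariance, computed with the mean of all five values and divided by n = 5, and \hat\rho_1 is the resulting autocorrelation:

```latex
\bar x = 3, \qquad x_t-\bar x = (-2,\,-1,\,0,\,1,\,2)

\hat\gamma_0 = \tfrac{1}{5}\,(4+1+0+1+4) = 2, \qquad
\hat\gamma_1 = \tfrac{1}{5}\,\big[(-2)(-1)+(-1)(0)+(0)(1)+(1)(2)\big] = 0.8

\hat\rho_1 = \hat\gamma_1 / \hat\gamma_0 = 0.4
```

Correlating the four pairs (1,2), (2,3), (3,4), (4,5) with correlate, by contrast, gives exactly 1, because the two columns are perfect linear functions of each other.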

Note: crossed with Nick's more technically savvy answer.
Alvaro Fuentes

Join Date: May 2014

Posts: 21
#6

03 Feb 2017, 10:03

I understand now. Thank you both!