  • Stata tip 118 - Orthogonalizing powered and product terms using residual centering

    For models with linear and squared terms, Stata tip 118 recommends regressing the squared term on the linear term, and then regressing the actual outcome Y on the linear term and the residuals from that first-stage regression.


    I'm having trouble seeing how this does anything except fix the reported collinearity diagnostics. Here is what I'm running:

    clear
    set obs 2000
    g x = 5 + runiform()      // x lives in [5,6], so x and x^2 are nearly collinear
    g xx = x*x
    g y = x + xx + rnormal()  // true coefficient of 1 on both x and x^2
    reg xx x                  // first stage: regress the square on the linear term
    predict xxrc, resid       // residual-centered squared term

    reg y x xxrc              // residual-centered specification
    reg y x xx                // raw specification


    I do get a different parameter on the main effect (which I should, since I have changed the zero point), but it is generally further from the coefficient used to simulate the data. It is in fact identical to the parameter on x from running regress y x. I get identical parameter estimates and standard errors on the squared term.
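
    A quick check (the scalar name b_rc is just for illustration) that the two x coefficients agree:

    quietly reg y x xxrc
    scalar b_rc = _b[x]
    quietly reg y x
    display b_rc - _b[x]      // prints zero, up to machine precision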

    What am I missing?

    Phil


  • #2
    I'm having trouble seeing how this does anything except fix the reported collinearity diagnostics.
    [..].
    What am I missing?
    I do not believe it is supposed to do anything else. Like mean-centering, such approaches seem pretty useless to me, as the underlying problem of collinearity is a lack of information. Since neither approach adds information to the data, I wonder what the rationale behind this might be. For how well Stata actually handles collinear data, see Bill Gould's explanation.
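
    As a minimal illustration, using Phil's simulation from above: the fit is the same, only the reported diagnostics move.

    quietly regress y x xx
    estat vif                 // enormous VIFs: x and xx are nearly collinear
    quietly regress y x xxrc
    estat vif                 // VIFs of essentially 1 after residual centering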

    Best
    Daniel



    • #3
      All of these models are equivalent, in the sense that they give you the same predicted values and the same overall fit. One way to think of the differences is that they are just recentering the data to a more convenient (statistically? computationally?) point, moving the zero in both the data space and the parameter space.

      Interesting to note that mean-centering and "residualizing" move you into two different coordinate bases, but leave you with the same statistical judgements.

      reg y x xxrc
      predict y1                  // fitted values, residual-centered model
      reg y x xx
      predict y2                  // fitted values, raw quadratic model

      scatter y1 y2               // identical, so they fall on the 45-degree line

      summarize x
      generate xc = x - r(mean)   // mean-centered x

      reg y c.xc##c.xc            // mean-centered quadratic, factor-variable notation
      predict y3

      graph matrix y1 y2 y3, half // all three sets of fitted values coincide
      Doug Hemken
      SSCC, Univ. of Wisc.-Madison



      • #4
        One downside of residual centering is that it makes it more difficult to interpret your model. Consider
        regress price c.weight##c.weight

        In the original units, the constant term tells you about the price where weight, and therefore weight^2, are zero. Because the parameter estimate for the first-order term depends on where zero is located, the collinearity diagnostics depend on the location of zero as well.

        A nice thing about the above specification is that it lets post-estimation commands see the connection between weight and weight^2, for example margins.

        margins, at(weight=(2000(100)5000))
        marginsplot


        If we center:
        summarize weight
        generate wc = weight - r(mean)
        regress price c.wc##c.wc


        our model is now expressed in deviation units of weight. The constant is now the price where wc and wc^2 are both zero, that is, at the mean of weight. Our collinearity diagnostics now pertain to that point on the weight scale. We can still use margins gracefully. Notice that the plot is identical to the previous plot over the same range (rescaled to deviation units):

        margins, at(wc=(-1000(100)2000))
        marginsplot
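
        A quick check (the phat_* variable names are just for illustration) that the centered and original parameterizations return the same fitted values, observation by observation:

        predict phat_c                           // fitted values, centered model
        quietly regress price c.weight##c.weight
        predict phat_o                           // fitted values, original model
        assert reldif(phat_c, phat_o) < 1e-5     // identical up to float precision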


        Now try the same thing with residual centering:
        generate w2 = weight^2
        regress w2 weight          // first stage: square on linear
        predict w2dev, resid       // residual-centered squared term
        regress price weight w2dev


        Now we have broken the (easy) connection between the first-order term and the second-order term. One is scaled in original units, the other in a new kind of deviation unit: both are in dollars, but zero is in a different place on each scale. There is no point in the data space at which both weight=0 and w2dev=0. So what can the constant mean?

        Now in order to use margins, we have to do a convoluted translation between one scale and the other: in general, w2dev = weight^2 - (-8866487 + 6153.254*weight). Weight and its linear-regression deviation units are still analytically connected, just not in an easily interpretable way. In fact, the w2dev term brings together a constant, a weight term, and a weight^2 term, all by virtue of being able to factor out a common coefficient. The constant from the (second) regression is whatever is left over to balance all of that. It perhaps has some nice mathematical meaning, but I'm not aware of what it is.
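
        Here is a sketch of what that translation looks like in practice (the chosen weight value and the macro names are just for illustration):

        * translate a chosen weight into its matching w2dev value by hand
        quietly regress w2 weight
        local a = _b[_cons]
        local b = _b[weight]
        local w 3000
        local w2d = `w'^2 - (`a' + `b'*`w')
        quietly regress price weight w2dev
        margins, at(weight=`w' w2dev=`w2d')    // predicted price at weight = 3000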

        This is the same awkwardness we run into when we try to standardize the coefficients in a model term-wise instead of variable-wise. We are making use of a legitimate linear transformation of the parameter space, but one that makes it hard to interpret the parameter space and the data space in the same terms. We have broken the tensored relation between the product term and its components.

        To me it seems like residual centering is a technical expedient that might sometimes solve computational difficulties (the final model is equivalent, after all). For interpretation it seems like a hack, giving you pieces of information from what would typically be a couple of different interpretive models (again, equivalent). For use with latent interaction models, it could be a different story.
        Doug Hemken
        SSCC, Univ. of Wisc.-Madison

