
  • Logistic regression: adjust numerical variables by subtracting the sample mean before calculating coefficients?

    I intend to develop a survival prediction model for outcome following major trauma.
    Dependent variable: survival vs. death 30 days after injury. Independent variables: age (numeric), anatomical injury (numeric), physiological derangement on admission (numeric), and pre-injury comorbidity (categorical).

    In a recent publication I have read the following statement: "For numerical variables, the statistical package Stata adjusts the variables by subtracting the sample mean before model coefficients are calculated".

    I have performed the same logistic regression (same data set) in both Stata 11.2 and JMP (SAS) and obtained the same odds ratios and the same coefficients, exactly as expected. I did not do anything to subtract the sample mean from the numerical variables.

    "...subtracting the sample mean before model coefficients are calculated": is that something that happens in Stata (and in JMP, given that I get the same logistic regression results in both packages) regardless of whether the user is aware of such an adjustment? Can anyone help me understand the statement above? Do I have to do anything in Stata to adhere to this, or should I present the regression coefficients for numerical variables as they appear in the output when reporting my new model?
    I would appreciate it if anyone could help me understand.

    Nils Oddvar Skaga
    Last edited by Nils Oddvar Skaga; 08 Dec 2014, 15:26.

  • #2
    I have no idea what your source is talking about. Stata does not do that. What is your source?
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      Hello Richard,

      I have struggled with this topic for hours. I also have access to the statistics package JMP 11.2 from SAS, and posted a similar question in the JMP forum yesterday.

      I now understand that this has to do with centering of continuous data, and that it is recommended in the following situations:

      1. To lessen the correlation between a multiplicative term (interaction or polynomial term) and its component variables.

      2. To make interpretation of parameter estimates easier.
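To see point 1 concretely, here is a small sketch in Python rather than Stata (illustration only, with made-up ages; the same check could be done in Stata with correlate): centering a variable before squaring it sharply reduces the correlation between the variable and its squared term.

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

age = [25, 31, 38, 44, 52, 60, 67, 73]   # hypothetical ages
sq = [a ** 2 for a in age]               # raw squared term

m = statistics.mean(age)
centered = [a - m for a in age]
centered_sq = [c ** 2 for c in centered] # squared term built from the centered variable

print(pearson(age, sq))                  # close to 1: x and x^2 nearly collinear
print(pearson(centered, centered_sq))    # much smaller after centering
```

With all-positive raw values, x and x² move together almost perfectly; after centering, the squared term is roughly symmetric around the mean and the correlation largely vanishes.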

      The answer from Julian Parris in the JMP forum has the following link: https://community.jmp.com/message/213910#213910 That answer gave me new insight.

      Being able to run a more precise Google search ("centering of numerical data"), I also found this useful link today: http://www.theanalysisfactor.com/whe...in-regression/

      Best wishes from

      Nils Oddvar Skaga



      • #4
        noskaga (please, see FAQ 6 about the preference on this forum for real full names and re-register accordingly. Thanks):
        centering variables around their mean (or other meaningful value) is easy in Stata:
        Code:
        sum <yourvariable>
        g mean_cent_var=<yourvariable> - r(mean)
        Kind regards,
        Carlo
        (Stata 19.0)
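As a side note on why the coefficients matched across packages: centering a predictor changes only the intercept, never the slope. A minimal Python sketch with made-up data illustrates this for simple OLS (the slope invariance holds in logistic regression as well); this is an illustration, not Stata output.

```python
from statistics import mean

def ols(xs, ys):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

x = [2.0, 4.0, 5.0, 7.0, 9.0]           # hypothetical predictor
y = [1.1, 2.3, 2.8, 4.2, 5.1]           # hypothetical response

b0, b1 = ols(x, y)                       # raw fit
mx = mean(x)
c0, c1 = ols([xi - mx for xi in x], y)   # same fit with mean-centered predictor

assert abs(b1 - c1) < 1e-9               # slope is identical
assert abs(c0 - mean(y)) < 1e-9          # new intercept = mean of y (prediction at average x)
```

So centering is a reparameterization for interpretation, not a different model: predictions and slopes are unchanged.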



        • #5
          As Carlo notes, centering is easy to do. But it isn't done automatically by Stata. I'll add to his note that if, say, you are only analyzing a subsample, your sum statement should include some sort of if qualifier.

          Centering isn't generally necessary. Collinearity is rarely, if ever, a problem, but if you were having trouble getting the model to converge, centering might help. A more common issue arises when, say, you have squared terms. You may also want to rescale variables (e.g. measure income in thousands of dollars rather than in dollars), both to make the coefficients easier to read and because Stata can have numerical problems if the values get really huge.
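The rescaling point can be checked numerically; a quick Python sketch with made-up incomes (not Stata, just an illustration using the closed-form simple-regression slope): measuring income in thousands multiplies the coefficient by exactly 1000 without changing the fit.

```python
from statistics import mean

def slope(xs, ys):
    """Closed-form slope of a simple linear regression of ys on xs."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

income_dollars = [28_000, 41_500, 55_000, 72_250, 96_000]  # hypothetical incomes
outcome = [0.8, 1.4, 1.9, 2.7, 3.5]                        # hypothetical response

b_dollars = slope(income_dollars, outcome)
b_thousands = slope([v / 1000 for v in income_dollars], outcome)

# Rescaling x by 1/1000 rescales the coefficient by 1000; the model itself is unchanged.
assert abs(b_thousands - 1000 * b_dollars) < 1e-9
```

The coefficient per thousand dollars is simply easier to read (and less likely to print as 0.0000 in output) than the coefficient per dollar.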

          Centering can be an aid to interpretation but there are other ways to achieve the same goals. For a discussion of centering, see

          http://www3.nd.edu/~rwilliam/stats2/l53.pdf
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology



          • #6
            Originally posted by Richard Williams View Post
            For a discussion of centering, see

            http://www3.nd.edu/~rwilliam/stats2/l53.pdf
            That is a very helpful link. I get the impression that you wouldn't center a time variable or any other variable which has a meaningful zero, yes?

            For a logistic regression, would you report just the interaction term, or the main effects of the independent variables as well? If the latter, what is a standard approach to interpreting the main effects? Do you interpret them relative to the interaction?



            • #7
              Thomas Stiles, the key is to have meaningful zero points. The mean isn't always meaningful, and even when it is, there may be other choices that are better.

              Yes, I would report both main effects and interactions. If anything the main effects will be easier to interpret after centering. The handout linked to goes over that. If you are mean-centering, then the main effect of X is the effect of X for an average person.
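The algebra behind that last sentence can be made concrete; a small Python sketch with hypothetical coefficients (illustration only): writing a model y = b0 + b1*x + b2*z + b3*x*z in terms of the centered moderator zc = z - mean(z) shows that the coefficient on x becomes b1 + b3*mean(z), i.e. the effect of x at the average value of z.

```python
from statistics import mean

# Hypothetical coefficients for y = b0 + b1*x + b2*z + b3*x*z
b0, b1, b2, b3 = 0.5, 1.2, -0.7, 0.3
z = [10.0, 12.0, 15.0, 19.0, 24.0]   # hypothetical moderator values
mz = mean(z)

# Substituting z = zc + mz and regrouping gives the centered parameterization:
#   y = (b0 + b2*mz) + (b1 + b3*mz)*x + b2*zc + b3*x*zc
# so after centering z, the "main effect" of x is b1 + b3*mz: the slope of x at average z.
for x in (0.0, 1.0, 2.5):
    for zi in z:
        zc = zi - mz
        y_raw = b0 + b1 * x + b2 * zi + b3 * x * zi
        y_cent = (b0 + b2 * mz) + (b1 + b3 * mz) * x + b2 * zc + b3 * x * zc
        assert abs(y_raw - y_cent) < 1e-9   # identical predictions, reparameterized

print(b1 + b3 * mz)   # main effect of x after centering z
```

The two parameterizations are the same model; centering just moves the point at which the "main effect" is evaluated to the sample mean.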
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
