
  • Logistic regression: adjust numerical variables by subtracting the sample mean before calculating coefficients?

    I intend to develop a survival prediction model for outcome following major trauma.
    Dependent variable: survival vs. death 30 days after injury. Independent variables: age (numeric), anatomical injury (numeric), physiological derangement on admission (numeric), and pre-injury comorbidity (categorical).

    In a recent publication I have read the following statement: "For numerical variables, the statistical package Stata adjusts the variables by subtracting the sample mean before model coefficients are calculated".

    I have performed the same logistic regression (same data set) in both Stata 11.2 and JMP (SAS) and obtained the same odds ratios and the same coefficients, exactly as expected. I did not do anything to subtract the sample mean from the numerical variables.

    "...subtracting the sample mean before model coefficients are calculated": is that something that happens in Stata (and in JMP, given that I get the same logistic regression results in both packages) regardless of whether the user is aware of such an adjustment? Can anyone help me understand the statement above? Do I have to do anything in Stata to adhere to this, or should I present the regression coefficients for numerical variables as they appear in the output when reporting my new model?
    I would appreciate it if anyone could help me understand.

    Nils Oddvar Skaga
    Last edited by Nils Oddvar Skaga; 08 Dec 2014, 15:26.

  • #2
    I have no idea what your source is talking about. Stata does not do that. What is your source?
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      Hello Richard,

      I have struggled with this topic for hours. I also have access to the statistics package JMP 11.2 from SAS, and posted a similar question in the JMP forum yesterday.

      I now understand that this has to do with centering of continuous data, and that it is recommended in the following situations:

      1. To lessen the correlation between a multiplicative term (interaction or polynomial term) and its component variables.

      2. To make interpretation of parameter estimates easier.
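To see point 1 concretely, here is a small sketch in Python rather than Stata (illustration only, with made-up ages; the same check could be done in Stata with correlate): centering a variable before squaring it sharply reduces the correlation between the variable and its squared term.

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

age = [25, 31, 38, 44, 52, 60, 67, 73]   # hypothetical ages
sq = [a ** 2 for a in age]               # raw squared term

m = statistics.mean(age)
centered = [a - m for a in age]
centered_sq = [c ** 2 for c in centered] # squared term built from the centered variable

print(pearson(age, sq))                  # close to 1: x and x^2 nearly collinear
print(pearson(centered, centered_sq))    # much smaller after centering
```

With all-positive raw values, x and x² move together almost perfectly; after centering, the squared term is roughly symmetric around the mean and the correlation largely vanishes.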

      The answer from Julian Parris in the JMP forum has the following link: https://community.jmp.com/message/213910#213910 That answer gave me new insight.

      Being able to run a more precise Google search ("centering of numerical data"), I also found this useful link today: http://www.theanalysisfactor.com/whe...in-regression/

      Best wishes from

      Nils Oddvar Skaga



      • #4
        noskaga (please, see FAQ 6 about the preference on this forum for real full names and re-register accordingly. Thanks):
        centering variables around their mean (or other meaningful value) is easy in Stata:
        Code:
        sum <yourvariable>
        g mean_cent_var=<yourvariable> - r(mean)
        Kind regards,
        Carlo
        (Stata 19.0)
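As a side note on why the coefficients matched across packages: centering a predictor changes only the intercept, never the slope. A minimal Python sketch with made-up data illustrates this for simple OLS (the slope invariance holds in logistic regression as well); this is an illustration, not Stata output.

```python
from statistics import mean

def ols(xs, ys):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

x = [2.0, 4.0, 5.0, 7.0, 9.0]           # hypothetical predictor
y = [1.1, 2.3, 2.8, 4.2, 5.1]           # hypothetical response

b0, b1 = ols(x, y)                       # raw fit
mx = mean(x)
c0, c1 = ols([xi - mx for xi in x], y)   # same fit with mean-centered predictor

assert abs(b1 - c1) < 1e-9               # slope is identical
assert abs(c0 - mean(y)) < 1e-9          # new intercept = mean of y (prediction at average x)
```

So centering is a reparameterization for interpretation, not a different model: predictions and slopes are unchanged.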



        • #5
          As Carlo notes, centering is easy to do. But it isn't done automatically by Stata. I'll add to his note that if, say, you are only analyzing a subsample, your sum statement should include some sort of if qualifier.

          Centering isn't generally necessary. Collinearity is rarely, if ever, a problem, but if you were having trouble getting the model to converge, centering might help. A more common issue arises when, say, you have squared terms. You may also want to rescale variables (e.g. measure income in thousands of dollars rather than in dollars), both to make the coefficients easier to read and because Stata can have numerical problems if the values get really huge.
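The rescaling point can be checked numerically; a quick Python sketch with made-up incomes (not Stata, just an illustration using the closed-form simple-regression slope): measuring income in thousands multiplies the coefficient by exactly 1000 without changing the fit.

```python
from statistics import mean

def slope(xs, ys):
    """Closed-form slope of a simple linear regression of ys on xs."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

income_dollars = [28_000, 41_500, 55_000, 72_250, 96_000]  # hypothetical incomes
outcome = [0.8, 1.4, 1.9, 2.7, 3.5]                        # hypothetical response

b_dollars = slope(income_dollars, outcome)
b_thousands = slope([v / 1000 for v in income_dollars], outcome)

# Rescaling x by 1/1000 rescales the coefficient by 1000; the model itself is unchanged.
assert abs(b_thousands - 1000 * b_dollars) < 1e-9
```

The coefficient per thousand dollars is simply easier to read (and less likely to print as 0.0000 in output) than the coefficient per dollar.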

          Centering can be an aid to interpretation but there are other ways to achieve the same goals. For a discussion of centering, see

          http://www3.nd.edu/~rwilliam/stats2/l53.pdf
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology



          • #6
            Originally posted by Richard Williams View Post
            For a discussion of centering, see

            http://www3.nd.edu/~rwilliam/stats2/l53.pdf
            That is a very helpful link. I get the impression that you wouldn't center a time variable or any other variable which has a meaningful zero, yes?

            For a logistic regression, would you report just the interaction term, or the main effects of the independent variables as well? If the latter, what is a standard approach to interpreting the main effects? Do you interpret them relative to the interaction?



            • #7
              Thomas Stiles, the key is to have meaningful zero points. The mean isn't always meaningful, and even when it is, there may be other choices that are better.

              Yes, I would report both main effects and interactions. If anything the main effects will be easier to interpret after centering. The handout linked to goes over that. If you are mean-centering, then the main effect of X is the effect of X for an average person.
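The algebra behind that last sentence can be made concrete; a small Python sketch with hypothetical coefficients (illustration only): writing a model y = b0 + b1*x + b2*z + b3*x*z in terms of the centered moderator zc = z - mean(z) shows that the coefficient on x becomes b1 + b3*mean(z), i.e. the effect of x at the average value of z.

```python
from statistics import mean

# Hypothetical coefficients for y = b0 + b1*x + b2*z + b3*x*z
b0, b1, b2, b3 = 0.5, 1.2, -0.7, 0.3
z = [10.0, 12.0, 15.0, 19.0, 24.0]   # hypothetical moderator values
mz = mean(z)

# Substituting z = zc + mz and regrouping gives the centered parameterization:
#   y = (b0 + b2*mz) + (b1 + b3*mz)*x + b2*zc + b3*x*zc
# so after centering z, the "main effect" of x is b1 + b3*mz: the slope of x at average z.
for x in (0.0, 1.0, 2.5):
    for zi in z:
        zc = zi - mz
        y_raw = b0 + b1 * x + b2 * zi + b3 * x * zi
        y_cent = (b0 + b2 * mz) + (b1 + b3 * mz) * x + b2 * zc + b3 * x * zc
        assert abs(y_raw - y_cent) < 1e-9   # identical predictions, reparameterized

print(b1 + b3 * mz)   # main effect of x after centering z
```

The two parameterizations are the same model; centering just moves the point at which the "main effect" is evaluated to the sample mean.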
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
