  • Full or partial Standardization?

    Hi everyone,
    Concerning variable standardization, do we necessarily have to do a full standardization (I mean for all numeric variables, except dummies of course)? Or is it possible to standardize only the variables that are particularly large compared to the rest of the variables?
    Thanks

  • #2
    There is no general need to standardize at all. In my opinion it is done way too often for no good reason.

    Best
    Daniel
    Last edited by daniel klein; 23 Feb 2017, 07:55.


    • #3
      Yes, but Gujarati, Wooldridge, etc. talk about standardization: "Sometimes, it is useful to obtain regression results when all variables involved, the dependent as well as all the independent variables, have been standardized." (Wooldridge, 5th ed., p. 189)


      • #4
        Note: "Sometimes" does not mean you should always do that. I am sure Jeff Wooldridge and colleagues give excellent examples of when it is useful or why. I assume sometimes I would agree and sometimes I would not, but this can really only be judged in the context of the research questions.

        Your initial post implies that you are concerned about the estimation of your model. I would guess that StataCorp writes its estimators in a way that handles potential problems that could arise because of huge differences in the scaling of variables. For regress I know that this is so. But Maarten Buis has repeatedly suggested rescaling variables in non-linear models so that they fall in plausible ranges, to facilitate convergence. For that, centering would usually suffice, though.

        Best
        Daniel


        • #5
          I think the most important word in your quote from Wooldridge is "sometimes," and the next most important word is "useful." And I think you should actually think about each variable and consider what would be gained or lost by standardizing it. The general considerations are:

          1. When the variable X has no natural or conventional units of measurement, and is just on an arbitrary scale, then marginal effects are of dubious value because there is no well-understood meaning of a unit change in X. On the other hand, when the variable X does have natural or conventional units, the marginal effect of a standardized variable will be confusing because, while everybody knows what a unit change in X is, a 1 SD change in X is some mysterious number that only you (the data analyst) know!

          2. Sometimes it is desired to compare the coefficients or marginal effects of two variables that are difficult to compare because they are measured in different units or on different scales. In that situation, it may be helpful to standardize those variables to reduce their incommensurability. This approach works best when the variables concerned have distributions that are of similar overall form, but perhaps differ in location and scale. When the variables have substantially different forms for their distributions, the comparison of standardized coefficients (or marginal effects of standardized variables) remains questionable.

          Certainly there is no reason to just arbitrarily standardize every variable in an equation. Each variable should be judged on its own merits.
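To make point 1 concrete, here is a minimal sketch in Python (standard library only) of how a slope changes when the predictor is standardized. The data and the names `age`, `outcome`, `ols_slope` and `standardize` are hypothetical and chosen purely for illustration. It shows that the "per 1 SD" slope is just the "per unit" slope multiplied by the sample SD, so a reader needs that SD to translate the result back into natural units.

```python
import statistics

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def standardize(values):
    """Return z-scores: (value - mean) / sample SD."""
    m, s = statistics.fmean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical data: firm age in years and some outcome (illustration only).
age = [2, 5, 8, 12, 20, 35]
outcome = [1.0, 1.8, 2.1, 3.2, 4.9, 8.1]

raw_slope = ols_slope(age, outcome)               # change in outcome per year
sd_slope = ols_slope(standardize(age), outcome)   # change in outcome per 1 SD of age
sd_age = statistics.stdev(age)                    # what "1 SD" means in years

print(f"per year: {raw_slope:.4f}")
print(f"per SD:   {sd_slope:.4f}  (1 SD = {sd_age:.2f} years in this sample)")
```

The SD is a property of the particular sample, so the same underlying relationship would yield a different "per SD" slope in a sample with a different spread of ages.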

          Added: Crossed with Daniel's post. With regard to rescaling variables in non-linear models, I have to disagree with Daniel's final assertion that centering will suffice for the purpose. The convergence difficulties that nonlinear models sometimes experience when the variables' scales differ by several orders of magnitude require an actual rescaling to overcome, in my experience. I hasten to note, however, that this has nothing to do with standardizing. Rescaling usually means things like changing a variable denominated in dollars to one denominated in millions of dollars or something like that.
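The difference between centering and rescaling can be sketched as follows in Python (standard library only). The figures and variable names are hypothetical. Centering only shifts a variable's location, so its spread relative to the other variables is unchanged; dividing by a constant (dollars to millions of dollars) is what brings the scales closer together.

```python
import statistics

# Hypothetical turnover in dollars and firm age in years (illustration only).
turnover_usd = [1_200_000, 3_500_000, 800_000, 9_100_000, 4_400_000]
age_years = [3, 12, 5, 30, 18]

mean_turnover = statistics.fmean(turnover_usd)
centered = [t - mean_turnover for t in turnover_usd]   # shifts location only
in_millions = [t / 1_000_000 for t in turnover_usd]    # changes the scale

sd_age = statistics.stdev(age_years)
versions = [("raw dollars", turnover_usd),
            ("centered", centered),
            ("millions", in_millions)]
for label, values in versions:
    sd = statistics.stdev(values)
    # The ratio shows how far apart the two variables' scales are.
    print(f"{label:12s} SD = {sd:14.2f}   SD ratio to age = {sd / sd_age:12.2f}")
```

Neither transformation is standardization: the rescaled variable still has a natural interpretation (millions of dollars), which a z-score would not.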

          That said, just to be clear, I am in general agreement with Daniel that standardization is widely overused in practice and that in most circumstances it does more harm than good. I agree that it should only be done when there is a really good reason to do it. It certainly should not be considered "routine" or "automatic."
          Last edited by Clyde Schechter; 23 Feb 2017, 09:51.


          • #6
            Originally posted by Clyde Schechter:
            With regard to rescaling variables in non-linear models, I have to disagree with Daniel's final assertion that centering will suffice for the purpose. The convergence difficulties that nonlinear models sometimes experience when the variables' scales differ by several orders of magnitude require an actual rescaling to overcome, in my experience. I hasten to note, however, that this has nothing to do with standardizing. Rescaling usually means things like changing a variable denominated in dollars to one denominated in millions of dollars or something like that.
            That makes perfect sense. Thanks for pointing this out.

            Best
            Daniel


            • #7
              Very interesting... thank you both for all these clarifications.
              I'm aware that we don't have to standardize automatically. Wooldridge said: "In a standard OLS equation, it is not possible to simply look at the size of different coefficients and conclude that the explanatory variable with the largest coefficient is “the most important.” We just saw that the magnitudes of coefficients can be changed at will by changing the units of measurement of the xj. But, when each xj has been standardized, comparing the magnitudes of the resulting beta coefficients is more compelling."
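The point in that quote can be illustrated with a small Python sketch (standard library only; the data and names are hypothetical). With a single predictor, changing the unit of measurement changes the raw slope by the same factor, while the beta coefficient, computed as the raw slope times SD(x) / SD(y), stays the same.

```python
import statistics

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def beta_coefficient(x, y):
    """Standardized slope: raw slope times SD(x) / SD(y)."""
    return ols_slope(x, y) * statistics.stdev(x) / statistics.stdev(y)

# Hypothetical equity figures and outcome (illustration only).
equity_usd = [250_000, 900_000, 1_400_000, 3_000_000, 5_200_000]
equity_thousands = [e / 1000 for e in equity_usd]   # same variable, new unit
outcome = [0.8, 1.1, 1.9, 2.4, 4.0]

slope_usd = ols_slope(equity_usd, outcome)          # per dollar
slope_k = ols_slope(equity_thousands, outcome)      # per thousand dollars
beta_usd = beta_coefficient(equity_usd, outcome)
beta_k = beta_coefficient(equity_thousands, outcome)

print(f"raw slope per dollar:           {slope_usd:.10f}")
print(f"raw slope per thousand dollars: {slope_k:.10f}")
print(f"beta coefficient (either unit): {beta_usd:.6f} vs {beta_k:.6f}")
```

Unit invariance makes beta coefficients comparable in a mechanical sense only; whether a 1 SD change is a meaningful quantity for a given variable is a separate question.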

              In my model, for example, I have panel data with explanatory variables such as turnover, equity and age of firms. Do I have to standardize just the turnover and equity data, or is there no need?


              • #8
                If these have a "natural" scale that can be interpreted, then I would not standardize. See Clyde's point 1.

                Best
                Daniel


                • #9
                  Well, if your research goal is, for example, to specifically compare the effects of turnover and equity on some outcome(s), then consideration could be given to standardizing both. The two variables clearly are not commensurable, so any direct comparison of unstandardized coefficients would be sensitive to the particular units of measurement for each variable. But think beyond that. If you standardize them, your equity variable will now be denominated in standard deviations rather than in dollars (or other currency units). Similarly for your turnover variable. If you now conclude that a 1 SD change in equity is associated with a larger (resp. smaller) change in expected outcome than is a 1 SD change in turnover, what does that mean? How big a change in equity is a 1 SD change in equity? How big a change in turnover is a 1 SD change in turnover? Even if you can answer those questions satisfactorily, what are the implications of that? Is each of these equally under the control of the firm? If one of them is easily modified and the other is not, then of what importance is it to say that their impacts, in this strange metric, are "equal" or "differ by some ratio"? I do not work in finance or economics, so I do not know the answers to these questions. From my lay perspective, I can only say that this looks like a situation where standardization would only obfuscate rather than clarify the situation, and the very goal of directly comparing their marginal effects seems misguided. But these are questions that you as content expert have to decide, often wisely done in collaboration with colleagues from your own field.

                  Actually, perhaps the best advice you could get would be from the future consumers of your research. Presumably they will be interested in your results to the extent that they can guide their actions based on them. Would standardized or conventional unit results be more understandable? More actionable? If you presented the results both ways, which set of results would they look at and say, "aha, yes, now I understand what to do."

                  Added: Crossed with Daniel again. My response here is basically his terse version elaborated, except that I am leaving open the possibility that I do not understand the situation well enough to be certain.


                  • #10
                    Perfectly clear, and I agree: standardization will complicate any interpretation.
                    Thanks a lot, I know what to do now.
