
  • Does it make sense to do both standardization and logarithm transformation?

    I am running a panel regression in which one of the independent variables is GDP per capita (GDP p.c.). Given that I have decided to standardize GDP p.c., should I also log-transform it?
    If the two can be done together, should the log transformation be done before or after standardization?

    Thank you very much!
    Last edited by Alex Mai; 14 Nov 2017, 10:10.

  • #2
    You can't possibly do a logarithm transformation after standardization. Standardized values have mean 0, so about half of them will be 0 or negative, hence have no logarithm.

    You could standardize after log-transforming, but it is really difficult for me to imagine a situation where that would do anything useful, and it is very easy for me to think of situations where it would make things very confusing. Why are you even thinking of doing this?
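Here is a minimal sketch of the point above. It is written in Python rather than Stata purely for illustration, and the GDP per capita values are made up, not the poster's data. After standardization some values are non-positive and have no logarithm, whereas logging first and then standardizing runs without error.

```python
import math
import statistics

# Hypothetical GDP per capita values (all positive), for illustration only.
gdp_pc = [1200.0, 3500.0, 8000.0, 15000.0, 42000.0]

mean = statistics.mean(gdp_pc)
sd = statistics.stdev(gdp_pc)  # sample SD (n - 1 denominator)

# Standardize: z = (x - mean) / sd. The z-scores have mean 0, so some are negative.
z = [(x - mean) / sd for x in gdp_pc]
negatives = [v for v in z if v <= 0]
print(f"{len(negatives)} of {len(z)} standardized values are <= 0")

# math.log raises ValueError for non-positive input, so log(z) is undefined for those.
for v in z:
    try:
        math.log(v)
    except ValueError:
        print(f"log({v:.3f}) is undefined")

# The reverse order works: log first (all x > 0), then standardize the logs.
log_gdp = [math.log(x) for x in gdp_pc]
log_mean, log_sd = statistics.mean(log_gdp), statistics.stdev(log_gdp)
z_log = [(v - log_mean) / log_sd for v in log_gdp]
```

Whether standardizing the logs is *useful* is a separate question, and the rest of the thread addresses it.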



    • #3
      Alex:
      I do share Clyde's wise concerns.
      Logging the dependent variable makes sense for the usual reasons (interpreting the contribution of each predictor as an approximately 100*b1% change in the dependent variable, ceteris paribus). Logging one or more predictors while keeping the continuous dependent variable in its original metric can work. Logging both the dependent variable (Y) and, say, one predictor (X) is useful for measuring the elasticity of Y with respect to X. But I cannot envisage the need for standardizing.
      More helpful replies would probably follow if you gave some more details about the regression model you have in mind (linear-log, log-log or log-linear?).
      Kind regards,
      Carlo
      (Stata 15.1 SE)
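The sketch below illustrates the log-log and log-linear interpretations mentioned above. It is in Python and uses hypothetical noise-free data generated from Y = 3*X^0.8, an assumed model chosen so that the true elasticity is known. The helper `ols_slope_intercept` is a hypothetical name for a plain least-squares fit, not a Stata command.

```python
import math

def ols_slope_intercept(x, y):
    """Simple least-squares fit of y = a + b*x; returns (b, a)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return b, my - b * mx

# Hypothetical data from Y = 3 * X**0.8 (no noise), so the elasticity is 0.8 by construction.
X = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
Y = [3.0 * x ** 0.8 for x in X]

# Log-log model: log(Y) = a + b*log(X); the slope b is the elasticity of Y with respect to X.
b_loglog, a_loglog = ols_slope_intercept([math.log(x) for x in X],
                                         [math.log(y) for y in Y])
print(f"elasticity estimate: {b_loglog:.3f}")  # recovers 0.8 up to floating-point error

# Log-linear model: log(Y) = a + b*X. A one-unit increase in X multiplies Y by exp(b).
# 100*b is the usual approximation to the % change; 100*(exp(b) - 1) is exact.
b_loglin = 0.05  # hypothetical coefficient
approx_pct = 100 * b_loglin
exact_pct = 100 * (math.exp(b_loglin) - 1)
print(f"approx {approx_pct:.2f}% vs exact {exact_pct:.2f}%")
```

The 100*b approximation is close for small coefficients and drifts as b grows.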



      • #4
        Dear Clyde,
        Many thanks! I see your point (I had overlooked that standardization generates lots of negative values). Because my independent variables vary greatly in scale and unit, a regression on the unstandardized variables gives an extraordinarily big coefficient for one particular variable. For instance, some coefficients are below 0.1, but another one is larger than 5.

        By the way, if the dependent variable has some negative values in the original data, can I standardize it?
        Last edited by Alex Mai; 14 Nov 2017, 10:33.



        • #5
          Standardization isn't the only way to deal with awkward-sized coefficients. A simple change of units often suffices.

          Larger than 5 is not what I would call "extraordinarily big".

          It's sometimes forgotten that, as gradients, coefficients (typically) have different units anyway, namely units of response / units of predictor.
          Last edited by Nick Cox; 14 Nov 2017, 10:42.
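Here is a quick Python sketch of the change-of-units point. The data are hypothetical, with income in US dollars as the predictor and an arbitrary score as the response. Because the coefficient is in units of response per unit of predictor, dividing the predictor by 1,000 (dollars to thousands of dollars) multiplies its coefficient by 1,000. Moving to a smaller unit shrinks it by the same factor.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical data: response in points, predictor in US dollars.
income_usd = [20000.0, 35000.0, 50000.0, 65000.0, 90000.0]
score = [52.0, 55.5, 60.0, 62.5, 70.0]

b_usd = ols_slope(income_usd, score)                       # points per dollar
b_kusd = ols_slope([x / 1000 for x in income_usd], score)  # points per thousand dollars
print(b_usd, b_kusd, b_kusd / b_usd)  # ratio is 1000 up to floating-point rounding
```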



          • #6
            Because my independent variables vary very much in terms of scale and unit, regression based on unstandardized variables gives some extraordinarily big coefficients. For instance, some coeffs are below 0.1, but another one is larger than 5.
            Well, at least with no context, I don't see why this is a problem. But, assuming it is a problem, log transformation is not an appropriate solution. Log transformation should be used when the relationship between predictors and outcome is log-linear rather than linear. The relative sizes of coefficients have nothing to do with that.

            Standardization does not necessarily result in the coefficients of the regression being more comparable in scale. And standardized variables usually just make things confusing. What does it mean to say that a 1 SD increase in, say, the proportion of the country that is literate, has a certain effect on GDP? Unless the audience is intimately familiar with the distribution of proportions of literate people in countries, you are just obfuscating what should be simple and straightforward. Standardizing variables makes sense primarily when the variables in question have no inherent units and are measured on an arbitrary scale. (Even then, it is only sometimes a good idea.)

            If the real problem you have is coefficients with different orders of magnitude (and, again, I don't see why this is necessarily a problem), then the most direct solution would be to rescale some of the variables. So, for example, if one of the predictors is in units of Euros and it has an unsatisfactorily small coefficient, then you might divide that variable by 1,000, say, so that it is now denominated in thousands of Euros, and the coefficient will increase by a factor of 1,000. (Conversely, expressing a predictor in a smaller unit shrinks its coefficient by the same factor.)

            Added: Crossed with #5, where Nick Cox makes some of the same points.
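The relationship between a raw slope and a fully standardized slope can be sketched in Python, using hypothetical literacy (proportion) and GDP per capita figures. In a simple regression the standardized slope equals the raw slope times sd(x)/sd(y). Reading it as "SDs of GDP per 1 SD of literacy" therefore requires the reader to know those SDs, which is the interpretive burden described above. `ols_slope` and `zscore` are hypothetical helper names.

```python
import statistics

def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def zscore(values):
    """Standardize using the sample mean and sample SD (n - 1 denominator)."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical data: literacy as a proportion, GDP per capita in US dollars.
literacy = [0.55, 0.62, 0.70, 0.81, 0.90, 0.97]
gdp_pc = [1800.0, 2600.0, 4100.0, 7900.0, 15000.0, 31000.0]

b_raw = ols_slope(literacy, gdp_pc)                  # dollars per unit of proportion
b_std = ols_slope(zscore(literacy), zscore(gdp_pc))  # SDs of GDP per SD of literacy

# Same information, different units: b_std = b_raw * sd(x) / sd(y).
sx, sy = statistics.stdev(literacy), statistics.stdev(gdp_pc)
print(b_raw, b_std, b_raw * sx / sy)
```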



            • #7
              Dear Nick, thank you! Actually, the variable with the "extraordinarily big" coefficient has the same unit (US dollars) as some other variables. But if I standardize everything, that coefficient becomes less than 1.
              But I am not sure whether I can standardize a dependent variable with negative values.



              • #8
                Yes, you can standardize a variable that has negative values. But, please do think carefully about how you will explain your regression results when you use standardized variables. They are usually very confusing, and completely uninformative to those who are not already intimately familiar with the data.

                By contrast, nobody has any difficulty understanding a regression coefficient denominated in meters rather than cm, or thousands of Euros instead of Euros, etc.



                • #9
                  Thank you again! Actually, when someone else saw the results, their first reaction was that the "extraordinarily big" coefficient is problematic.
                  I will try rescaling; I had not thought of that.
                  One more question: can standardization be done for a variable with negative values? My concern is that standardization changes the original pattern of positive and negative values, so some originally positive values may become negative after standardization.



                  • #10
                    Thank you! I think the issue is that the variable with the "extraordinarily big" coefficient shares the same unit (dollars) with some other variables that have very small coefficients. I know that this may just be the reality, the true result. But I am not able to explain why one particular variable has such a big coefficient, whereas standardization makes the coefficients look more "normal".

                    The standard deviation of my dependent variable is very large (the coefficient of variation is about 4). Do you think standard panel regression techniques can deal with such a dependent variable?

                    Thank you again!



                    • #11
                      My concern is that standardization will change the original system of positive value and negative value and thus some originally positive values may become negative values after standardization.
                      Yes, that will happen, but it doesn't matter. If you take any regression model and just add a sufficiently large constant to the outcome variable so that all values are now positive, the only thing that will change in the results is the constant term. In terms of coefficients that reflect effects, everything remains the same.

                      The standard deviation of my dependent variable is very large (the coefficient of variation is about 4). Do you think standard panel regression techniques can deal with such a dependent variable?
                      Yes, this is not a problem.
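Here is a small Python check of the claim above, using hypothetical data whose outcome has both negative and positive values. Shifting the outcome by a constant moves only the intercept. Fully standardizing it (a shift plus division by its SD) divides the slope by sd(y) without changing its sign. The `ols` helper is a hypothetical name for a plain least-squares fit.

```python
import statistics

def ols(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

# Hypothetical outcome with both negative and positive values.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [-4.0, -1.5, 0.5, 2.0, 5.5, 7.0]

a0, b0 = ols(x, y)

# Adding a large constant makes every outcome positive: only the intercept moves.
c = 100.0
a1, b1 = ols(x, [v + c for v in y])

# Standardizing the outcome changes some signs, but the slope is simply divided by sd(y).
m, s = statistics.mean(y), statistics.stdev(y)
a2, b2 = ols(x, [(v - m) / s for v in y])
print(b0, b1, b2 * s)
```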



                      • #12
                        Dear Clyde,

                        Many thanks again!



                        • #13
                          Dear Clyde,

                          Sorry for bothering you again, but I just have two extra questions about standardization.

                          Firstly, I know that a standardized dummy variable is meaningless. So for a logit regression, is it correct to standardize all numeric independent variables but leave the binary dependent variable and any binary/categorical independent variables unstandardized?

                          Secondly, this may be a stupid question, but people generally say that standardization deals with differences in units. So is it fine to standardize a variable measured in percent? My concern is that a variable in percent does not itself seem to have a "unit".

                          Many thanks!
                          Last edited by Alex Mai; 16 Nov 2017, 12:45.



                          • #14
                            Standardizing a binary variable is not meaningless. It's just unnecessary. A binary predictor is already as easy to think about as it could be, at least so long as values are coded 0 and 1 and there are some of each.

                            Code:
                            . sysuse auto, clear
                            (1978 Automobile Data)
                            
                            . su foreign
                            
                                Variable |        Obs        Mean    Std. Dev.       Min        Max
                            -------------+---------------------------------------------------------
                                 foreign |         74    .2972973    .4601885          0          1
                            
                            . gen foreign2 = (foreign - r(mean)) / r(sd)
                            
                            . tab foreign2
                            
                               foreign2 |      Freq.     Percent        Cum.
                            ------------+-----------------------------------
                              -.6460338 |         52       70.27       70.27
                               1.526989 |         22       29.73      100.00
                            ------------+-----------------------------------
                                  Total |         74      100.00
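The same arithmetic can be reproduced outside Stata. This Python sketch rebuilds the `foreign` indicator from the 52/22 counts in the `tab` output above and standardizes it with the sample SD (n - 1 denominator, matching the `r(sd)` that `summarize` leaves behind). The two distinct standardized values should agree with -.6460338 and 1.526989 to the displayed precision.

```python
import statistics

# Rebuild the foreign indicator from the tabulated counts: 52 domestic (0), 22 foreign (1).
foreign = [0.0] * 52 + [1.0] * 22

mean = statistics.mean(foreign)
sd = statistics.stdev(foreign)  # n - 1 denominator, like r(sd) after summarize

# Standardize: still exactly two distinct values, just relabelled from 0/1.
foreign2 = [(v - mean) / sd for v in foreign]
print(sorted(set(foreign2)))
```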



                            • #15
                              Dear Nick,
                              Thank you! I see your point. So is it fine to use both standardized variables (for numeric variables) and unstandardized variables (a dummy dependent variable, as in logistic regression, or dummy independent variables) in the same regression?
                              And is it also unnecessary to standardize categorical variables (e.g. coded 1, 2, 3), just as with dummy variables?
