  • Log transformation of a centered variable

    I have a set of predictor variables that I centered first, which introduced many negative values for each variable. When I then log transform, all those negative values turn into missing values (obviously, because one cannot take the log of a negative value). My question: is it not possible to take logs of centered variables?

  • #2
    I don't have much experience in using logarithmic transformation of predictors, but, in order to maintain the functional relationship, wouldn't you take the log first and then center?
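
    A minimal sketch of that order of operations, assuming a strictly positive predictor stored under the placeholder name x (not a variable from the original post):

    ```stata
    * Sketch only: x is a hypothetical, strictly positive predictor
    generate double ln_x = ln(x)              // log first; missing only where x <= 0
    summarize ln_x, meanonly                  // leaves the mean in r(mean)
    generate double ln_x_c = ln_x - r(mean)   // then center the logged variable
    ```

    Centering after the log keeps every observation with a positive x, because the subtraction happens on the log scale.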


    • #3
      I tried that- taking log first and then centering. In that case, centering has absolutely no effect on the data, collinearity or regression results.


      • #4
        Originally posted by cherry singhal
        I tried that- taking log first and then centering. In that case, centering has absolutely no effect on the data, collinearity or regression results.
        I'm not sure what you're expecting centering to do.

        I wouldn't center variables in order to somehow affect data or regression results, but rather to aid in interpretation.

        As to collinearity, I would center a variable before making a quadratic term of it in order to help avoid inducing collinearity, for example, but I wouldn't expect two variables that are collinear before centering to be materially affected by subtracting a constant.
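
        Both points can be checked with a quick sketch on the auto data shipped with Stata. The names w2, wc and wc2 are made up for the illustration, and the comments describe what I would expect rather than quoted output:

        ```stata
        sysuse auto, clear
        * A raw variable and its square are almost perfectly correlated
        generate double w2 = weight^2
        correlate weight w2
        * Center first, then square: the correlation should drop sharply
        summarize weight, meanonly
        generate double wc = weight - r(mean)
        generate double wc2 = wc^2
        correlate wc wc2
        * Subtracting a constant leaves the correlation between two different variables unchanged
        correlate weight length
        correlate wc length
        ```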


        • #5
          in general, you should expect that centering will affect the constant only; only if you have a polynomial should you expect an impact on "collinearity"
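
          a rough sketch of that on the auto data shipped with Stata (wc and lc are made-up names):

          ```stata
          sysuse auto, clear
          regress price weight length
          * center both predictors at their means
          summarize weight, meanonly
          generate double wc = weight - r(mean)
          summarize length, meanonly
          generate double lc = length - r(mean)
          * slopes and their standard errors should match the first fit; only _cons moves
          regress price wc lc
          ```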

          further, why center? a lot of people have trouble with negative numbers; an alternative that is often as effective is just to subtract the minimum value from each observation's value (which still has the advantage of making the constant meaningful - sometimes more meaningful than centering using the mean, which may not be very generalizable)

          also, in general, I prefer not to log transform; why do you want to do it here? either poisson regression (see Bill Gould's blog: http://blog.stata.com/tag/poisson-regression/) or a glm is a better bet
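
          a sketch of the two routes, assuming a non-negative outcome y and predictors x1 and x2 (all placeholder names):

          ```stata
          * route 1: log the outcome and use linear regression (drops y <= 0)
          generate double ln_y = ln(y)
          regress ln_y x1 x2
          * route 2: model y directly with a log link; keeps zeros
          * vce(robust) because y need not be a true count
          poisson y x1 x2, vce(robust)
          ```

          in both fits the coefficients are on the log scale, so they are read in much the same way.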


          • #6
            To answer your questions, this is what I am doing:
            1) For my panel dataset, I am running a random coefficient linear regression (OLS) model (hence the log-transformation of the entire equation); specifically, I am running the xtrc program.
            2) All my predictors are linear, but 3 out of 5 predictors are highly correlated and have high VIFs.
            3) The model does not run and throws an error (see one of my other posts on this forum) because of multicollinearity. Based on the error, it seems the multicollinearity is at the panel level.
            4) When I center my variables before taking logs, the model runs successfully and the VIFs go down significantly; however, because centering generates lots of negative values that cannot be log-transformed (understandably) and turn into missing values, I end up losing 95% of my observations.
            So either way I am stuck, and I am trying to find a workaround.

            I would appreciate any further suggestions! Thanks much!



            • #7
              I am quite happy to take logarithms of predictors depending on the situation, but subtracting the mean first really does make no sense whatsoever. I can't see that there is a positive motive for it, and the downside is indefensible.

              The logarithms of negative numbers are defined only as complex numbers, which are not useful statistically, so as you say you lose much of the data. (Losing any at all would be hard to defend!)
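
              A small sketch of what happens, with x as a placeholder for one of the predictors:

              ```stata
              display ln(-1)                             // Stata returns missing (.)
              summarize x, meanonly
              generate double x_c = x - r(mean)          // values below the mean are now negative
              generate double ln_x_c = ln(x_c)           // missing wherever x_c <= 0
              count if missing(ln_x_c) & !missing(x_c)   // observations lost to the log
              ```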

              I am surprised that there is any doubt about that -- this is a point from secondary school mathematics -- but you appear to seek confirmation and my advice is Don't do that then!


              • #8
                Just thinking: is there any particular reason to prefer 'xtrc' over 'mixed'? Have you tried your model with the 'mixed' command? Try 'mixed' without doing anything to your independent variables. If the standard errors are consistently estimated, there is nothing to worry about regarding multicollinearity.

                Probable code for your model (Stata-version:14.2)

                Code:
                * In Stata 12 or earlier, use xtmixed instead of mixed
                mixed outcome ind_var1 ind_var2 ind_var3 || panel_var:


                Roman


                • #9
                  Thank you for the suggestion, Mr. Roman. I am going to try mixed. I also did not know that xtmixed is the same as mixed.

                  To answer your question, I was looking in the Stata manual for panel-data regression models that let you run a "random coefficients model". I came across xtrc and started to use it. My panel data has firms and years, so my "panel_var" is firmID and my "time_var" is year. I just want to consider the level-1 random coefficient for now, which the mixed command would let me do. The reason I want random coefficients is that I assume firm heterogeneity, so the coefficients of the predictors are not fixed across firms over time.

                  I have a question regarding the syntax you provided. I modified the command to use random effects instead of fixed effects (hopefully correctly) -
                  Code:
                   mixed outcome || panel_var: ind_var1 ind_var2 ind_var3


                  But my question is that mixed would consider both intercept and slope as random, while xtrc considers only the intercept as random. Is there a way to specify "only intercept to be considered random" using mixed? Please comment, thanks.



                  Thank you, Nick, for your reply. I was going the wrong way in a desperate attempt to tackle multicollinearity.


                  • #10
                    Originally posted by cherry singhal
                    My panel data has firms and years so my "panel_var" is firmID and my "time_var" is year.
                    Your data suit a mixed model. See the help file for mixed before embarking on any analysis. Type
                    Code:
                     help mixed
                    Originally posted by cherry singhal
                    I just want to consider the level-1 random coefficient for now
                    As suggested, read the help file first. There is no such thing as a level-1 random coefficient. At level 1 we estimate the fixed parameters (coefficients) for our independent variables. At the upper level, we estimate the random intercepts and slopes. There is little scope to discuss all of this here, but any good book or the help file should guide you.

                    Originally posted by cherry singhal
                    The reason I want random coefficient is that I assume firm heterogeneity; and so coefficients of predictors are non-fixed across firms over time.
                    That's right, and that is why you need mixed, which will allow you to fit random slopes of your predictors across firms.

                    Originally posted by cherry singhal
                    I have a question regarding the syntax you provided. I modified the command to use random effects instead of fixed effects (hopefully correctly) -
                    Code:
                     mixed outcome || panel_var: ind_var1 ind_var2 ind_var3
                    Wrong, you need them as fixed first and then as random. The correct code is:

                    Code:
                     mixed outcome ind_var1 ind_var2 ind_var3 || panel_var: ind_var1 ind_var2 ind_var3
                    Fitting several random slopes at once may run into convergence problems. Add them one by one and see how the results change.
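
                    A sketch of the one-by-one approach, using the placeholder names from above; ri and rs1 are made-up names for the stored estimates:

                    ```stata
                    * Step 1: random intercept only
                    mixed outcome ind_var1 ind_var2 ind_var3 || panel_var:
                    estimates store ri
                    * Step 2: add a random slope for one predictor
                    mixed outcome ind_var1 ind_var2 ind_var3 || panel_var: ind_var1, covariance(unstructured)
                    estimates store rs1
                    * Compare the two fits; repeat for the next slope if this one converges
                    lrtest ri rs1
                    ```

                    covariance(unstructured) lets the intercept and slope be correlated; the default keeps them independent.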

                    Originally posted by cherry singhal
                    Is there a way to specify "only intercept to be considered random" using mixed?
                    This contradicts your assumption of heterogeneity. However, if you only want random intercepts, just leave out the random slopes; mixed will then estimate random intercepts only.
                    Code:
                     mixed outcome ind_var1 ind_var2 ind_var3 || panel_var:
                    Above all, I think you need to be clear about the whole subject first. By the way, 'fixed effects' in the panel-data sense means something different: such a model uses only within-cluster variation and ignores between-cluster variation. If that is what you want, that is a different story; mixed won't do that for you.

                    Roman


                    • #11
                      Thank you very much for the detailed response. You are right. I am learning by doing and do not have a formal background in statistics. I am going to follow your suggestions.

                      I might be phrasing things incorrectly, but I am aiming to estimate the following equation for firm i and year t with random coefficients for all inputs as well as the intercept -

                      ln(outcome)it = (B0+B0i) + (B1+B1i)*ln(ind_var1)it + (B2+B2i)*ln(ind_var2)it + Ui + Eit
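
                      In mixed syntax, this equation would look roughly like the sketch below. The ln_ names are made up, firmID is the panel variable mentioned earlier, and the sketch assumes that B0i and Ui are both firm-level shifts of the intercept, so a single random intercept covers them:

                      ```stata
                      generate double ln_outcome = ln(outcome)
                      generate double ln_var1 = ln(ind_var1)
                      generate double ln_var2 = ln(ind_var2)
                      * fixed part: B0, B1, B2; random part by firm: intercept and both slopes
                      mixed ln_outcome ln_var1 ln_var2 || firmID: ln_var1 ln_var2, covariance(unstructured)
                      ```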

                      Thank you again.



                      • #12
                        This is not a fixed-effects equation, so you are fine with mixed. As I said, read the help file and consult a good book. Given the equation, you will need post-estimation predictions after you fit the mixed model, such as predict with the reffects and fitted options.
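
                        For instance, a sketch of those predictions once mixed has been fitted (re and yhat are made-up names):

                        ```stata
                        predict double re*, reffects    // predicted random effects (BLUPs), one variable per random term
                        predict double yhat, fitted     // fixed-portion prediction plus the random-effect contributions
                        ```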

                        Roman


                        • #13
                          Thank you much!
