Log transformation of a centered variable

cherry singhal

Join Date: Nov 2016

Posts: 47
#1

20 Nov 2016, 16:52

I have a set of predictor variables that I centered first, which introduced many negative values for each variable. When I then log-transform, all those negative values turn into missing values (obviously, because one cannot take the log of a negative value). My question is: is it not possible to take logs of centered variables?
Tags: centering, log_transformation, panel data
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#2

20 Nov 2016, 17:13

I don't have much experience in using logarithmic transformation of predictors, but, in order to maintain the functional relationship, wouldn't you take the log first and then center?
cherry singhal

Join Date: Nov 2016

Posts: 47
#3

20 Nov 2016, 17:40

I tried that- taking log first and then centering. In that case, centering has absolutely no effect on the data, collinearity or regression results.
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#4

20 Nov 2016, 18:58

Originally posted by cherry singhal

I tried that- taking log first and then centering. In that case, centering has absolutely no effect on the data, collinearity or regression results.

I'm not sure what you're expecting centering to do.

I wouldn't center variables in order to somehow affect data or regression results, but rather to aid in interpretation.

As to collinearity, I would center a variable before making a quadratic term of it in order to help avoid inducing collinearity, for example, but I wouldn't expect two variables that are collinear before centering to be materially affected by subtracting a constant.
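A minimal sketch of this point in Stata, assuming made-up data (the variables x and z and all numbers are hypothetical, not from this thread). It compares the correlation between a variable and its square before and after centering, and the correlation between two distinct predictors before and after subtracting a constant.

```stata
* Hypothetical data: x = 1, 2, ..., 100 and a second predictor z
clear
set obs 100
generate double x = _n
generate double z = 3 + 2*x + 10*sin(x)

* Center x at its sample mean
summarize x, meanonly
generate double x_c = x - r(mean)

* Quadratic terms built from the raw and from the centered variable
generate double x_sq   = x^2
generate double x_c_sq = x_c^2

correlate x x_sq            // raw x and its square
scalar rho_raw_sq = r(rho)
correlate x_c x_c_sq        // centered x and its square
scalar rho_ctr_sq = r(rho)

correlate x z               // two distinct predictors, x uncentered
scalar rho_raw_xz = r(rho)
correlate x_c z             // same pair after subtracting a constant from x
scalar rho_ctr_xz = r(rho)

display "corr(x, x^2), raw:      " rho_raw_sq
display "corr(x, x^2), centered: " rho_ctr_sq
display "corr(x, z), raw:        " rho_raw_xz
display "corr(x, z), centered:   " rho_ctr_xz
```

In this symmetric toy example the correlation between the centered variable and its square is essentially zero, while the correlation between x and z does not change at all.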
Rich Goldstein

Join Date: Mar 2014

Posts: 4462
#5

20 Nov 2016, 19:02

in general, you should expect that centering will affect the constant only; only if you have a polynomial should you expect an impact on "collinearity"
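A small sketch of that claim, assuming simulated data (the variable names, seed, and coefficients are made up for illustration). The predictor is logged first and then centered; the slope should be identical in both regressions, and only the constant should move.

```stata
* Hypothetical data: a positive predictor x and an outcome y
clear
set seed 12345
set obs 200
generate double x = exp(rnormal(2, 0.5))
generate double y = 1 + 0.8*ln(x) + rnormal(0, 0.3)

* Log first, then center the logged predictor
generate double ln_x = ln(x)
summarize ln_x, meanonly
generate double ln_x_c = ln_x - r(mean)

regress y ln_x
scalar b_raw    = _b[ln_x]
scalar cons_raw = _b[_cons]

regress y ln_x_c
scalar b_ctr    = _b[ln_x_c]
scalar cons_ctr = _b[_cons]

* With a mean-centered predictor the constant equals the mean of y
summarize y, meanonly
scalar ybar = r(mean)

display "slope, uncentered:    " b_raw
display "slope, centered:      " b_ctr
display "constant, uncentered: " cons_raw
display "constant, centered:   " cons_ctr
display "mean of y:            " ybar
```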

further, why center? a lot of people have trouble with negative numbers; an alternative that is often as effective is just to subtract the minimum value from each observation's value (which still has the advantage of making the constant meaningful - sometimes more meaningful than centering using the mean which may not be very generalizable)

also, in general, I prefer not to log transform; why do you want to do it here? either poisson regression (see Bill Gould's blog: http://blog.stata.com/tag/poisson-regression/) or a glm is a better bet
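As a rough sketch of that alternative, assuming simulated data with a strictly positive outcome (names, seed, and coefficients are hypothetical; the gamma family for glm is only one possible choice), the following fits the same log-scale relationship three ways: OLS on the logged outcome, Poisson regression with robust standard errors, and a glm with a log link.

```stata
* Hypothetical data: positive outcome y with a log-linear mean in x
clear
set seed 2016
set obs 2000
generate double x = rnormal()
generate double y = exp(0.5 + 0.3*x) * exp(rnormal(0, 0.5))

* Log-transform the outcome and use OLS
generate double ln_y = ln(y)
regress ln_y x
scalar b_ols = _b[x]

* Poisson regression on the untransformed outcome, robust standard errors
poisson y x, vce(robust)
scalar b_pois = _b[x]

* GLM with a log link (gamma family chosen here only for illustration)
glm y x, family(gamma) link(log) vce(robust)
scalar b_glm = _b[x]

display "slope, OLS on ln(y): " b_ols
display "slope, poisson:      " b_pois
display "slope, glm log link: " b_glm
```

All three slopes estimate the same quantity in this setup; the last two never need the log of the outcome, so zeros in y would not be dropped.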
cherry singhal

Join Date: Nov 2016

Posts: 47
#6

20 Nov 2016, 22:28

To answer your questions, this is what I am doing:
1) For my panel dataset, I am running a random coefficient linear regression (OLS) model (hence the log-transformation of the entire equation); specifically, I am running the xtrc program.
2) All my predictors are linear, but 3 out of 5 predictors are highly correlated and have high VIFs.
3) The model does not run and throws an error (see one of my other posts on this forum) because of multicollinearity. Based on the error, it seems the multicollinearity is at the panel level.
4) When I center my variables before taking logs, the model runs successfully and the VIFs go down significantly; however, because centering generates many negative values that cannot be log-transformed (understandably) and so turn into missing values, I end up losing 95% of my observations.
So either way, I am stuck and I am trying to find a workaround.
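The loss of observations described in point 4 can be reproduced with a tiny sketch, assuming a made-up positive variable x = 1, ..., 100 (hypothetical data, not the poster's). Centering first and then logging turns every value at or below the mean into missing; logging first and then centering loses nothing.

```stata
* Hypothetical strictly positive predictor
clear
set obs 100
generate double x = _n

* Center first, then log: ln() of a nonpositive number is missing in Stata
summarize x, meanonly
generate double x_c = x - r(mean)
generate double ln_of_centered = ln(x_c)
count if missing(ln_of_centered)
scalar n_lost = r(N)

* Log first, then center: no observation is lost
generate double ln_x = ln(x)
summarize ln_x, meanonly
generate double ln_x_c = ln_x - r(mean)
count if missing(ln_x_c)
scalar n_lost_alt = r(N)

display "missing after center-then-log: " n_lost
display "missing after log-then-center: " n_lost_alt
```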

I would appreciate any further suggestions! Thanks much!
Nick Cox

Join Date: Mar 2014

Posts: 35697
#7

21 Nov 2016, 01:50

I am quite happy to take logarithms of predictors depending on the situation, but subtracting the mean first really does make no sense whatsoever. I can't see that there is a positive motive for it, and the downside is indefensible.

The logarithms of negative numbers are defined (as complex numbers) but not useful statistically, so as you say you lose much of the data. (Losing any at all would be hard to defend!)

I am surprised that there is any doubt about that -- this is a point from secondary school mathematics -- but you appear to seek confirmation and my advice is Don't do that then!
Roman Mostazir

Join Date: Apr 2014

Posts: 874
#8

21 Nov 2016, 10:48

Just thinking: is there any particular reason to prefer 'xtrc' over 'mixed'? Have you tried your model with the 'mixed' command? Try 'mixed' without doing anything to your independent variables. If the standard errors are consistently estimated, there is nothing to worry about regarding multicollinearity.

Probable code for your model (Stata version 14.2):

Code:

* If using Stata version <= 12, use xtmixed instead of mixed
mixed outcome ind_var1 ind_var2 ind_var3 || panel_var:

Roman
cherry singhal

Join Date: Nov 2016

Posts: 47
#9

21 Nov 2016, 13:42

Thank you for the suggestion, Mr. Roman. I am going to try mixed. I also did not know that xtmixed is the same as mixed.

To answer your question, I was looking in the Stata manual for panel-data regression commands that fit a "random coefficients model". I came across xtrc and started to use it. My panel data has firms and years so my "panel_var" is firmID and my "time_var" is year. I just want to consider the level-1 random coefficient for now, which the mixed command would let me do. The reason I want random coefficient is that I assume firm heterogeneity; and so coefficients of predictors are non-fixed across firms over time.

I have a question regarding the syntax you provided. I modified the command to use random effects instead of fixed effects (hopefully correctly) -

Code:

mixed outcome || panel_var: ind_var1 ind_var2 ind_var3

My question is this: mixed would treat both the intercept and the slopes as random, while xtrc considers only the intercept as random. Is there a way to specify "only intercept to be considered random" using mixed? Please comment. Thanks.

Thank you, Nick, for your reply. I was going the wrong way in a desperate attempt to tackle multicollinearity.
Roman Mostazir

Join Date: Apr 2014

Posts: 874
#10

21 Nov 2016, 14:23

Originally posted by cherry singhal

My panel data has firms and years so my "panel_var" is firmID and my "time_var" is year.

Your data suit a mixed model. See the help file for mixed before embarking on any analysis. Type

Code:

help mixed

Originally posted by cherry singhal

I just want to consider the level-1 random coefficient for now

As suggested, read the help file first. There is no such thing as a level-1 random coefficient. At level 1 we estimate the fixed parameters (coefficients) for our independent variables. At the upper level, we estimate the random intercepts and slopes. The scope to discuss all of this here is very limited, but any good book or the help file should guide you.

Originally posted by cherry singhal

The reason I want random coefficient is that I assume firm heterogeneity; and so coefficients of predictors are non-fixed across firms over time.

That's right, and that is why you need mixed, which will allow you to fit random slopes of your predictors across firms.

Originally posted by cherry singhal

I have a question regarding the syntax you provided. I modified the command to use random effects instead of fixed effects (hopefully correctly) -

Code:

mixed outcome || panel_var: ind_var1 ind_var2 ind_var3

Wrong: you need them as fixed effects first and then as random effects. The correct code is:

Code:

mixed outcome ind_var1 ind_var2 ind_var3 || panel_var: ind_var1 ind_var2 ind_var3

Having several random slopes may cause convergence problems. Try them one by one and see how the results change.
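One way to go about this, sketched on simulated data (the seed, sample sizes, and coefficients are made up; only the variable names follow the placeholders used in this thread), is to fit a random-intercept model, add a single random slope, and compare the two fits with a likelihood-ratio test.

```stata
* Hypothetical panel: 50 panels with 10 observations each
clear
set seed 4711
set obs 50
generate int panel_var = _n
generate double u0 = rnormal(0, 1)      // panel-specific intercept shift
generate double u1 = rnormal(0, 0.5)    // panel-specific slope shift for ind_var1
expand 10
generate double ind_var1 = rnormal()
generate double ind_var2 = rnormal()
generate double outcome = (1 + u0) + (0.5 + u1)*ind_var1 + 0.3*ind_var2 + rnormal(0, 0.5)

* Step 1: random intercepts only
mixed outcome ind_var1 ind_var2 || panel_var:
estimates store ri

* Step 2: add one random slope
mixed outcome ind_var1 ind_var2 || panel_var: ind_var1
estimates store rs1

* Compare the two nested fits
lrtest ri rs1
scalar p_slope = r(p)
display "LR test p-value for the random slope: " p_slope
```

A further random slope would be added in the same way and compared against rs1. Stata notes that this test is conservative because the null value of a variance lies on the boundary of the parameter space.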

Originally posted by cherry singhal

Is there a way to specify "only intercept to be considered random" using mixed?

This contradicts your assumption of heterogeneity. However, if you only want random intercepts, just omit the random slopes; mixed will then estimate only the random intercepts.

Code:

mixed outcome ind_var1 ind_var2 ind_var3 || panel_var:

Above all, I think you need to be clear about the whole subject first. By the way, the term 'fixed effects' that you mention has a different meaning: a fixed-effects model uses only within-cluster variation and ignores between-cluster variation. If that is what you want, then that is a different story; mixed won't do that for you.

Roman
cherry singhal

Join Date: Nov 2016

Posts: 47
#11

21 Nov 2016, 14:58

Thank you very much for the detailed response. You are right: I am learning by doing and do not have a formal background in statistics. I am going to follow your suggestions.

I might be phrasing things incorrectly, but I am aiming to estimate the following equation for firm i and year t with random coefficients for all inputs as well as the intercept -

$$\ln(\text{outcome})_{it} = (B_0 + B_{0i}) + (B_1 + B_{1i})\,\ln(\text{ind\_var1})_{it} + (B_2 + B_{2i})\,\ln(\text{ind\_var2})_{it} + U_i + E_{it}$$

Thank you again.
Roman Mostazir

Join Date: Apr 2014

Posts: 874
#12

21 Nov 2016, 15:19

This is not a fixed-effects equation; therefore, you are fine with mixed. As I said, read the help file and consult a good book. According to the equation, after you fit the mixed model you will need the postestimation command predict with options such as reffects and fitted.
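A rough sketch of that workflow, assuming simulated firm-year data with a single logged predictor (firmID and year follow the names mentioned earlier in the thread; everything else, including the seed and coefficients, is hypothetical). A second predictor would enter in the same way. In mixed, the random intercept stands in for the firm-level terms B_0i and U_i together, since the two cannot be separated.

```stata
* Hypothetical panel: 40 firms observed for 8 years each
clear
set seed 98765
set obs 40
generate int firmID = _n
generate double b0i = rnormal(0, 0.4)   // firm-specific intercept deviation
generate double b1i = rnormal(0, 0.3)   // firm-specific slope deviation
expand 8
bysort firmID: generate int year = 2000 + _n
generate double ind_var1 = exp(rnormal(1, 0.5))
generate double outcome = exp((1 + b0i) + (0.6 + b1i)*ln(ind_var1) + rnormal(0, 0.2))

* Take logs of the raw (uncentered) variables
generate double ln_outcome  = ln(outcome)
generate double ln_ind_var1 = ln(ind_var1)

* Fixed part first, then the same predictor as a random slope by firm
mixed ln_outcome ln_ind_var1 || firmID: ln_ind_var1

* Predicted random effects: slope deviation first, then intercept deviation
predict double re_slope re_cons, reffects

* Fitted values that include the predicted random effects
predict double ln_outcome_hat, fitted

* Firm-specific slope = fixed slope + predicted slope deviation
generate double slope_i = _b[ln_outcome:ln_ind_var1] + re_slope
summarize slope_i re_cons ln_outcome_hat
```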

Roman
cherry singhal

Join Date: Nov 2016

Posts: 47
#13

21 Nov 2016, 16:28

Thank you much!
