Transformation of X

Paul Leeds

Join Date: Apr 2014

Posts: 63
#1

09 Nov 2014, 18:36

Hi There

I am conducting research in which a panel data model is constructed and all continuous control variables have been transformed to logs, but NOT the main effect!
The results in this case show a highly significant coefficient for the main effect, but when I transform the main effect (X) into logs the relationship disappears.
Can I still keep X without transformation despite the fact that all other continuous independent variables are in logs?

Many Thanks
ben earnhart

Join Date: May 2014

Posts: 1027
#2

09 Nov 2014, 19:03

Yes. It might be highly inappropriate to log that variable.
Paul Leeds

Join Date: Apr 2014

Posts: 63
#3

10 Nov 2014, 06:23

Thanks ben
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2149
#4

10 Nov 2014, 16:02

It would help to know something about the variables. For example, if the main variable is measured in years or as a percent, but the others are things like population or monetary variables, it would be pretty standard to leave the main variable in levels but use logs for the others. And what is the dependent variable?

You can always use an R-squared to choose among functional forms when you have the same number of regressors.
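
As a rough illustration of comparing functional forms by R-squared, the Python sketch below fits the same outcome on one regressor in levels and in logs. The data are made up for illustration (they are not from this thread), and the helper name `r_squared` is hypothetical. Both fits use the same dependent variable and a single regressor, so the two R-squared values are directly comparable.

```python
import math

def r_squared(x, y):
    """R-squared from a simple OLS regression of y on x (with intercept).

    With one regressor this equals the squared correlation of x and y.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Made-up data: tenure in years (1-10) and an outcome roughly linear in tenure.
tenure = list(range(1, 11))
noise = [0.4, -0.3, 0.2, -0.1, 0.3, -0.4, 0.1, 0.2, -0.2, -0.2]
score = [5 + 2 * t + e for t, e in zip(tenure, noise)]

r2_level = r_squared(tenure, score)
r2_log = r_squared([math.log(t) for t in tenure], score)

print(f"R-squared, tenure in levels: {r2_level:.3f}")
print(f"R-squared, tenure in logs:   {r2_log:.3f}")
```

Because the made-up outcome was built to be roughly linear in tenure, the levels specification fits better here. With real data the ranking could go either way, which is the point of making the comparison.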
Paul Leeds

Join Date: Apr 2014

Posts: 63
#5

11 Nov 2014, 07:59

Hi Jeff

Thanks for your reply.

The main variable is tenure in years, ranging from 1 to 10 (it might be more in reality, but this is due to data limitations).
The other variables are mainly monetary such as profit, return and cash flow, among other variables.
The dependent variable is an index from 1 to 30 that represents the quality score of accounting disclosures.

Does that sound reasonable?

Many Thanks
ben earnhart

Join Date: May 2014

Posts: 1027
#6

11 Nov 2014, 08:40

Check the distribution of the # of years. If it's even remotely normal, then don't log it. The reason for logging is mainly to make for a more normal distribution (can have side-benefits for interpretation), but if it were relatively normal to begin with, logging it actively messes things up. -sum varname, detail- and -hist varname- are two common approaches. If it went from significant to non-significant, then I'd be pretty darn certain r^2 is larger for the non-logged version. Sorry I was so brief with my comment before.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2149
#7

11 Nov 2014, 12:35

I don't see what normality should have to do with deciding to take the log of an explanatory variable. For example, we don't worry that a binary variable is not close to normality. The decision to take the log should have to do with interpretation and also fit. If the key variable is in years I would leave it in its level form without logging. As I suspected, the other variables are logs of monetary values, where taking the log to avoid outliers is sensible. I can also understand why you are using a linear model for the ordinal outcome, especially when it ranges over 30 (or so) values. Ordered probit or logit is an alternative, but the partial effects are harder to obtain.
Paul Leeds

Join Date: Apr 2014

Posts: 63
#8

11 Nov 2014, 14:12

Prof Jeff

Thanks a lot for your reply.

Yes, my main concern was that outliers would disturb the model.
Is winsorisation recommended alongside transformation to remedy the effect of outliers? And should we winsorise variables before transforming them, or does it not matter?
Nick Cox

Join Date: Mar 2014

Posts: 35656
#9

11 Nov 2014, 15:20

Paul: It's not especially clear what you're asking but I doubt that any two experienced users of statistics agree exactly on when they would use transformation.

Winsorization as a universal remedy for awkward data points seems to me oversold and over-practised, and it is a little embarrassing to me that I wrote a Stata program that makes it easier. (Someone asked for one ages ago.)

But if logarithmic transformation makes sense you should not need or use Winsorization too. If logarithmic transformation is not needed Winsorization rarely makes sense at all. It's easy to think of occasional exceptions to any rule, however. The bigger point is not to think that data analysis is like chess or football in which rules are universally agreed, or at least someone is officially designated to declare foul play.

(On a more detailed point Winsorization is surely also a kind of transformation, although not a reversible one without access to the original data.)
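
On the ordering question raised earlier in the thread, one fact may help: the logarithm is strictly increasing, so it preserves the ranking of positive values. If Winsorizing is defined as replacing the k smallest values by the next smallest and the k largest by the next largest (an assumption here; cutoffs obtained by interpolating percentiles need not behave the same way), then Winsorizing before or after taking logs gives the same result. A minimal Python sketch with made-up numbers and a hypothetical helper `winsorize_k`:

```python
import math

def winsorize_k(values, k):
    """Replace the k smallest values by the (k+1)-th smallest and the
    k largest by the (k+1)-th largest. Cutoffs are order statistics.
    Assumes len(values) > 2 * k."""
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    return [min(max(v, low), high) for v in values]

# Made-up positive "monetary" values with one very large observation.
profit = [12.0, 15.0, 18.0, 22.0, 30.0, 41.0, 55.0, 90.0, 140.0, 5000.0]

log_after_winsor = [math.log(v) for v in winsorize_k(profit, 1)]
winsor_after_log = winsorize_k([math.log(v) for v in profit], 1)

# The two orderings agree because log preserves the ranking of the values.
print(log_after_winsor == winsor_after_log)
```

This only addresses whether the order of the two steps matters under that definition, not whether Winsorizing is advisable in the first place.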
Paul Leeds

Join Date: Apr 2014

Posts: 63
#10

11 Nov 2014, 15:50

Hi Nick,

As a person coming from a qualitative research background, I would say I am only expecting answers about "good practice" or "commonly used methods of data handling and analysis".
This would only be the starting point for me and many other newcomers like me.
I asked this because I came across many journal articles that seem to employ winsorization and transformation for continuous variables.
Maarten Buis

Join Date: Mar 2014

Posts: 3449
#11

12 Nov 2014, 01:58

A log transformation for years might make sense if you had more years in your data. By log transforming you say that a year more tenure has less influence when you move from say 29 to 30 years of tenure compared with going from say 1 to 2 years of tenure. I can see why something like that could be plausible, and worth investigating. However, if you have fewer years, that flattening of the curve may not be visible yet, and using years untransformed would be just fine. If I were to guess, I would say 10 years is somewhere in the middle: you might see some flattening of the curve already, or you might not. If I understand you correctly, you did not find that flattening, which is a plausible empirical finding.
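
To put numbers on the flattening described above: with log(tenure) as the regressor, the effect of one more year is proportional to the change in log(tenure), which shrinks as tenure grows. A small Python sketch (the helper name `log_step` is hypothetical, and the regression coefficient is left out because it only rescales the comparison):

```python
import math

def log_step(years):
    """Change in log(tenure) when tenure goes from `years` to `years + 1`."""
    return math.log(years + 1) - math.log(years)

for start in (1, 9, 29):
    print(f"{start} -> {start + 1} years: change in log = {log_step(start):.3f}")
```

The step from 1 to 2 years changes the log by about 0.69, the step from 9 to 10 by about 0.11, and the step from 29 to 30 by about 0.03. Whether that much curvature is detectable within a 1 to 10 range is an empirical question.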

If the range of a variable is by design limited, as in your case years of tenure, then outliers tend to be less of a problem. I would still check, but if I found outliers in such a variable, that would surprise me greatly. So much so, that my first assumption would be that I performed the test incorrectly, rather than that there are outliers. So I would not play around with winsorizing such variables before I checked, double checked and triple checked that there are actually outliers. Even then I would probably not use winsorize. Instead I would try to understand why these outliers exist and try to use that information in the model.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Nick Cox

Join Date: Mar 2014

Posts: 35656
#12

12 Nov 2014, 03:28

Paul: Thanks for the clarification. Your wording suggested to me that you were seeking a unanimous or standard view, which doesn't really exist on this topic, or so I would assert. If your background is qualitative, that won't be new to you.

Winsorization is especially curious. There are evidently fields in which it is recommended. Posts in this forum often state or imply that the poster regards it as uncontroversial, but I don't know of a methods text that recommends it. This could easily reflect the literature I know and what I can remember. I can readily buy Winsorizing as one way of getting a resistant or robust mean, but that's not the same story.

I agree with Jeff Wooldridge's emphasis. The main reason for taking logarithms in a regression context is because the functional form implied is more suitable for the data. Sometimes there is theory which implies that we should be taking logarithms, although if you lift the stone, the theory typically grows out of some previous recognition that the functional form is more suitable for the data. Marginal normality is never an assumption in regression.
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#13

13 Nov 2014, 10:20

Perhaps risking commenting outside my field, I think folks in finance tend to winsorize data.

As noted above, you might want to transform a variable when the transformed version makes more sense (e.g., your theory says so) than the untransformed one. I don't think this is controversial. In many cases the difference between one and two has a greater influence on outcomes than the difference between 50 and 51 (e.g., in age or wealth).

However, there is real debate about transforming to handle either highly skewed variables or outliers. In many fields, highly skewed variables like firm size are routinely log transformed. This might fall under the condition noted in the previous paragraph (e.g., difference between 1000 and 5000 population may be more meaningful than the difference between 100,000 and 105,000 population), but I'm not sure that is actually stated very often.

There are also many other ways to handle outliers including various guidelines used to identify unusual observations which are then deleted (e.g., rules based on Cook's D, leverage, etc.) and estimators that are relatively insensitive to outliers.

My, perhaps outdated, understanding of the econometrics is that we don't have a strong consensus on how to handle outliers.
Paul Leeds

Join Date: Apr 2014

Posts: 63
#14

04 Dec 2014, 18:26

Thanks everyone for your contributions.