How to interpret regression results if dependent variable is logged and independent variables are standardized?
In my fixed effects panel regression analysis the DV (Triadic Patent Families) enters in logarithmic form. To counteract very small coefficients because of different scales in my IVs and enable the comparison of relative importance of the individual IVs, these enter in standardized form.
Is this combination possible (DV log; IVs standardized)?
How do I interpret the results?
Yes it is possible. You've done it. The question is whether anyone will ever be able to understand what it means.
The interpretation of, for example, the result for zIPRT2 is that a 1 standard deviation difference in values of zIPRT2 is associated with a -.134033 difference in values of logTPF. Since exp(-.134033) = 0.8745612, this could also be stated as a 1 standard deviation difference in zIPRT2 is associated with multiplying TPF by a factor of 0.875 (to 3 decimal places), which would also sometimes be referred to as a 12.5% decrease in TPF. You can perform similar calculations for all of the coefficients.
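For anyone who wants to repeat that calculation for the other coefficients, here is a minimal sketch in Python. The function names are made up for illustration, and the only input taken from the thread is the coefficient -.134033; it just exponentiates the coefficient and turns the resulting factor into a percent change.

```python
import math

def log_coef_to_factor(beta):
    """Multiplicative change in the outcome per one-unit change in the
    predictor, when the outcome entered the model in natural-log form."""
    return math.exp(beta)

def log_coef_to_pct(beta):
    """The same change expressed as a percent change in the outcome."""
    return 100.0 * (math.exp(beta) - 1.0)

beta = -0.134033  # coefficient on zIPRT2 quoted above
print(f"factor: {log_coef_to_factor(beta):.3f}")        # factor: 0.875
print(f"percent change: {log_coef_to_pct(beta):.1f}%")  # percent change: -12.5%
```

Because the predictor is standardized, "one unit" here means one standard deviation of the original variable.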
So the question becomes, will anybody have any idea what a 1 standard deviation difference in zIPRT2 means? Probably not, except possibly you because you will have worked with the data and have some sense of it. The use of standardized variables is usually a bad idea for just this reason: the results are just mystifying. If you take the position, as I do, that the purpose of statistical analysis is to clarify our understanding of the data, standardized variables are counterproductive--they obfuscate the results.
By the way, the notion that you can "enable the comparison of relative importance of the individual IVs" by using standardized variables is, in general, not true. It is true, in a highly restricted and typically not useful sense, if the IVs in question all have the same distributions except for their scale factors and centers. But that, in turn, is rarely true in the real world. So usually standardization just muddies the waters and accomplishes nothing useful in return.
Thank you very very much for your response Clyde! I am always fascinated by your quick and still elaborate explanations!! That's great!
Actually, from what I have read so far, I would interpret the results in a different way.
Taking into consideration zEdSharet2 as another example, I would state:
the coefficient implies that a one unit increase (1 unit = 1 SD) in the standardized variable (zEdShare) or a 1.241 unit increase in EdShare (unstandardized), respectively, is associated with a 28.1 percent (100*0.281%) increase in TPF-ENVTECH.
No, that's wrong. Many people will interpret it as you have, because there is widespread misunderstanding about interpreting coefficients in regressions with log-transformed outcomes. Don't follow the lemmings into the sea.
The coefficient shown for that variable in #1 is, to three decimal places, 0.281. So a 1 SD increase in EdShare (= 1.24 increase in EdShare in natural units) is associated with a 0.281 increase in lnTPF. A 0.281 increase in lnTPF is the same thing as a multiplication of TPF by exp(0.281) = 1.324, again to 3 decimal places. So you get a 32.4% increase in TPF, not a 28.1% increase. There's nothing complicated here: this is high school algebra.
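Spelled out, the algebra runs as follows. The symbols TPF_0 and TPF_1 are notation introduced here only for illustration: they stand for the values of TPF before and after a 1 SD increase in the predictor, with everything else held fixed.

```latex
\begin{align*}
\ln(\mathrm{TPF}_1) - \ln(\mathrm{TPF}_0) &= 0.281 \\
\ln\!\left(\frac{\mathrm{TPF}_1}{\mathrm{TPF}_0}\right) &= 0.281 \\
\frac{\mathrm{TPF}_1}{\mathrm{TPF}_0} &= e^{0.281} \approx 1.324 \\
100 \times \left(\frac{\mathrm{TPF}_1}{\mathrm{TPF}_0} - 1\right) &\approx 32.4\%
\end{align*}
```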
You were probably taught somewhere along the way that when you have a logged outcome variable, the coefficient translates directly into a percentage change in the outcome, so that a coefficient of 0.281 would be a 28.1% increase. Most of us were given that misinformation at some point in our training. But it is only approximately true, and the approximation is a good one only if the coefficient is less than 0.1 in magnitude, or, if you're willing to accept a little sloppiness, maybe up to 0.15. But when you have a coefficient as large as 0.281, as you can see here, that approximation is very substantially off. Don't use it, except perhaps with the three variables in your output where the coefficient is, in magnitude, less than 0.1.
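To see how quickly the shortcut drifts, here is a rough sketch in Python that compares the "coefficient times 100" reading with the exact figure. The function names and the grid of coefficient values are chosen for illustration; only 0.281 comes from the output discussed in this thread.

```python
import math

def naive_pct(beta):
    """The shortcut: read the coefficient directly as a percent change."""
    return 100.0 * beta

def exact_pct(beta):
    """The exact percent change, 100 * (exp(beta) - 1)."""
    return 100.0 * math.expm1(beta)

# Illustrative coefficient sizes; 0.281 is the one from the thread.
for beta in (0.05, 0.10, 0.15, 0.281):
    gap = exact_pct(beta) - naive_pct(beta)
    print(f"coef {beta:.3f}: naive {naive_pct(beta):5.1f}%  "
          f"exact {exact_pct(beta):5.1f}%  gap {gap:4.1f} points")
```

The gap stays near half a percentage point or less up to a coefficient of 0.1, then grows to more than four points at 0.281.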