
  • Running multiple regression models on different combinations of explanatory variables

    Hi,

    Before I post my question, I would like to mention that I already looked through the forums here on using for loops to run multiple regression models (https://www.statalist.org/forums/for...-the-same-time, https://www.statalist.org/forums/for...le-regressions, https://www.statalist.org/forums/for...ssions-at-once). Unfortunately, the existing discussion posts don't quite answer my question.

    My dependent variable is A. My independent variables are B, C, D, E, F, G, and H. All of these variables have also been log-transformed, so I have ln_A and ln_B through ln_H. Now I need to try different combinations of logged and non-logged variables and run multiple regressions. For instance,

    Model 1: reg A ln_B ln_C D E F G ln_H
    Model 2: reg ln_A B ln_C D ln_E F ln_G ln_H

    and so on...

    As you can tell, there are far too many combinations to run them all by hand (2^8 = 256, I believe, since each of the 8 variables can enter either logged or unlogged). I need to run at least a few different specifications. The reason for doing this exercise is to make sure that the polarity of the coefficients is not an artifact of the log transformation. Is doing it manually the only way, or is there a way to automate this? Please help!

    Thanks in advance!

    Sam
    Last edited by sam khanna; 09 Oct 2019, 21:30. Reason: added tags
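
    One way to automate the loop the question asks about is to encode each logged/unlogged choice as a bit of a counter. This is only a sketch, not a tested solution: it assumes the raw variables A-H and their ln_* counterparts already exist in the dataset, and the `display`/inspection step is a placeholder for whatever results you actually want to keep.

    ```stata
    * Sketch only: loop over every logged/unlogged combination of the seven
    * regressors (B-H) and both forms of the outcome (A, ln_A).
    * Bit `j'-1 of the counter `i' decides whether regressor `j' enters logged.
    local vars B C D E F G H
    forvalues i = 0/127 {
        local rhs ""
        forvalues j = 1/7 {
            local v : word `j' of `vars'
            if mod(floor(`i'/2^(`j'-1)), 2) local rhs `rhs' ln_`v'
            else local rhs `rhs' `v'
        }
        foreach y in A ln_A {
            quietly regress `y' `rhs'
            display "`y' <- `rhs'"    // placeholder: record signs, store estimates, etc.
        }
    }
    ```

    With 7 regressors and 2 outcome forms this fits 256 models, so in practice you would replace the `display` line with something that stores or tabulates the coefficient signs rather than printing every run.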

  • sam khanna
    replied
    Thanks Bruce! That helped a lot. I have 7,633 observations, and the residuals look approximately normally distributed in the plots. So I think that answers my question!



  • Bruce Weaver
    replied
    Hello Sam. I have two questions:
    1. How large is your sample?
    2. Have you looked at residual plots from a model that uses the original variables only?
    Here's what motivates those questions. For the F- and t-tests from OLS regression to be reasonably valid, the sampling distributions of the coefficients need to be approximately normal. A sufficient condition for approximate normality of those sampling distributions is (approximate) normality of the errors (where error = deviation of an actual Y-value from the true regression expression in the population). But as Jeff Wooldridge says in his well-known econometrics textbook, as n increases, normality of the errors becomes less important.* As n increases, the sampling distributions of the coefficients converge on the normal distribution, even if the errors are not normal. (That's why I described normality of the errors as a sufficient rather than a necessary condition.)

    Putting it all together, if your sample size is quite large, or if residual plots suggest that the errors are approximately normal, you probably don't need to bother with log transformations.
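
    The residual check described above might look like this in Stata. This is a sketch only; the variable names are placeholders for the poster's actual data.

    ```stata
    * Sketch: fit the all-raw model, then inspect the residuals
    regress A B C D E F G H
    predict ehat, residuals
    summarize ehat, detail      // skewness and kurtosis of the residuals
    histogram ehat, normal      // histogram with a normal density overlay
    qnorm ehat                  // normal quantile plot
    ```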

    * I put together a few slides summarizing what Jeff says about the assumptions for OLS regression. You can view them here.

    HTH.



  • sam khanna
    replied
    Hi Joseph, thank you for your response. You raise great questions. The reason I log-transformed the variables in the first place was to reduce their skewness; I selected the variables for transformation based on their histograms. Log-transforming all the variables and running the regressions produces results that make sense to me. However, I am concerned this process may not qualify as "robust". So I decided it may be best to run different model specifications and check whether the qualitative relationship between the independent and dependent variables remains the same. Does that make sense?



  • Mike Lacy
    replied
    A command of possible use here, pun intended, is -allpossible-. See -ssc describe allpossible-.



  • Joseph Coveney
    replied
    I can't find it now, but there was once a user-written command for exhaustive specification searches. It's ancient (1990s, maybe), and its name was something like -all-, or started or ended with "all", as I recall.

    But maybe you can help me out on a couple of questions that your post raises in my mind.

    First, logarithmic transformations are monotonic. I wouldn't have expected any volatility in coefficient sign, if that's what you mean by "polarity of the coefficients".

    Second, even if the polarity of the coefficients does change, why would you consider such a phenomenon to be an "artifact of the log transformation", or anything to worry about at all? If the research question calls for logarithmic transformation, then the coefficients you get are what they are, what they are supposed to be.

