  • Running multiple regression models on different combinations of explanatory variables


    Before I post my question, I would like to mention that I already looked through the forums here on using for loops to run multiple regression models. Unfortunately, the existing discussion posts don't quite answer my question.

    My dependent variable is A. My independent variables are B, C, D, E, F, G, and H. All of these variables have also been log-transformed, so I have ln_A and ln_B through ln_H. Now I need to try different combinations of logged and non-logged variables and run multiple regressions. For instance,

    Model 1: reg A ln_B ln_C D E F G ln_H
    Model 2: reg ln_A B ln_C D ln_E F ln_G ln_H

    and so on...

    As you can tell, there are far too many combinations possible (P8,8 precisely?). I need to run at least a few different specifications. The reason for doing this exercise is to make sure that the polarity of the coefficients is not an artifact of the log-transformation. Is doing it manually the only way or is there a way to automate this? Please help!

    Thanks in advance!

    Last edited by sam khanna; 09 Oct 2019, 21:30. Reason: added tags
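
    A minimal sketch of how the specifications described above could be generated in a loop. It assumes that each of the eight variables enters either in levels or in logs, which gives 2^8 = 256 specifications rather than P(8,8), and that the variables are named A-H and ln_A-ln_H as in the post. The bit-mask approach and the local macro names (vars, spec, m, k) are illustrative assumptions, not something proposed in the thread.

    ```stata
    * Sketch only: assumes variables A-H and ln_A-ln_H already exist in memory.
    local vars A B C D E F G H

    * m runs over 0..255; bit k of m decides whether the k-th variable is logged.
    forvalues m = 0/255 {
        local spec
        local k = 0
        foreach v of local vars {
            if mod(floor(`m' / 2^`k'), 2) == 1 {
                local spec `spec' ln_`v'
            }
            else {
                local spec `spec' `v'
            }
            local ++k
        }
        * The first name in spec is the dependent variable (A or ln_A).
        display as text _newline "Model `m': regress `spec'"
        regress `spec'
    }
    ```

    If only the coefficient signs are of interest, the regress line could be prefixed with quietly and the estimates read from _b[varname] after each fit.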

  • #2
    I can't find it now, but there was once a user-written command for exhaustive specification searches. It's ancient (1990s, maybe), and its name was something like -all-, or started or ended with "all", as I recall.

    But maybe you can help me out on a couple of questions that your post raises in my mind.

    First, logarithmic transformations are monotonic. I wouldn't have expected any volatility in coefficient sign, if that's what you mean by "polarity of the coefficients".

    Second, even if the polarity of the coefficients does change, why would you consider such a phenomenon to be an "artifact of the log transformation", or anything to worry about at all? If the research question calls for logarithmic transformation, then the coefficients you get are what they are, what they are supposed to be.


    • #3
      A command of possible use here, pun intended, is -allpossible-. See -ssc describe allpossible-.


      • #4
        Hi Joseph, thank you for your response. You raise great questions. The reason why I log transformed the variables in the first place was to reduce their skewness. I selected the variables for transformation based on their histograms. Log transforming all the variables and running the regressions produces results that make sense to me. However, I am concerned this process may not qualify as "robust". So, I decided it may be best to run different model specifications and check if the qualitative relationship between the independent and dependent variables remains the same. Does that make sense?


        • #5
          Hello Sam. I have two questions:
          1. How large is your sample?
          2. Have you looked at residual plots from a model that uses the original variables only?
          Here's what motivates those questions. For the F- and t-tests from OLS regression to be reasonably valid, the sampling distributions of the coefficients need to be approximately normal. A sufficient condition for approximate normality of those sampling distributions is (approximate) normality of the errors (where error = deviation of an actual Y-value from the true regression expression in the population). But as Jeff Wooldridge says in his well-known econometrics textbook, as n increases, normality of the errors becomes less important.* As n increases, the sampling distributions of the coefficients converge on the normal distribution, even if the errors are not normal. (That's why I described normality of the errors as a sufficient rather than a necessary condition.)
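
          The large-sample argument above can be illustrated with a small simulation. This is a sketch under assumed settings, not part of the original reply: the program name skewsim, the sample size of 500, the 1,000 replications, and the chi-squared error distribution are all hypothetical choices. Note that simulate replaces the data in memory, so it should be run in a fresh session or after saving.

          ```stata
          * Sketch only: skewsim, the sample size, and the error distribution
          * are hypothetical choices made for illustration.
          capture program drop skewsim
          program define skewsim, rclass
              drop _all
              set obs 500
              generate x = rnormal()
              * Errors are centered chi-squared(1): mean zero, strongly right-skewed.
              generate y = 1 + 0.5*x + (rchi2(1) - 1)
              regress y x
              return scalar b = _b[x]
          end

          set seed 12345
          * Warning: simulate replaces the data in memory with the results.
          simulate b = r(b), reps(1000) nodots: skewsim

          * Inspect the sampling distribution of the slope estimates.
          summarize b
          qnorm b
          ```

          If the points in the quantile plot lie close to the reference line, the slope estimates are approximately normal even though the errors are not.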

          Putting it all together, if your sample size is quite large, or if residual plots suggest that the errors are approximately normal, you probably don't need to bother with log transformations.

          * I put together a few slides summarizing what Jeff says about the assumptions for OLS regression. You can view them here.

          Bruce Weaver
          Stata version: 16.1 IC (Windows)
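
          The residual check suggested in #5 could be sketched as follows, assuming the untransformed variables A-H are in memory. The new variable name resid is a hypothetical choice; the commands themselves are standard regress postestimation tools.

          ```stata
          * Sketch only: assumes the untransformed variables A-H are in memory;
          * the variable name resid is a hypothetical choice.
          regress A B C D E F G H

          * Residuals versus fitted values
          rvfplot, yline(0)

          * Save the residuals, then draw a histogram with a normal overlay
          * and a normal quantile plot
          predict double resid, residuals
          histogram resid, normal
          qnorm resid
          ```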


          • #6
            Thanks, Bruce! That helped a lot. I have 7,633 observations, and the residuals look normally distributed in the plots. So, I think that answers my question!