
  • three-way-interaction between two continuous and one binary variable using OLS Stata

    Dear all,
    I'm having difficulty interpreting a three-way interaction term. I'm using OLS, and the three-way interaction term is significant. Y is the continuous response variable, X is the continuous predictor (independent variable), and Z and W are moderator variables, one continuous and one binary. I would like to plot this interaction, as calculating the effect on the dependent variable by hand seems very hard.

    I included all the variables in my regression (along with additional control variables). The three-way interaction term is significant. However, most of the other coefficients are statistically insignificant.
    Y = b0 + b1X + b2Z + b3W + b4XZ + b5XW + b6ZW + b7XZW
    Does this influence the interpretation of the results?

    I would really appreciate it if someone could provide some Stata code to plot this three-way interaction. Unfortunately, I have only found information on three-way interactions consisting entirely of continuous variables. Thank you!
    Last edited by Katharina Mueller; 04 Jan 2015, 12:45. Reason: three-way

  • #2
    Hi, please change your name by contacting the administrator. The etiquette in this forum is that we use our first and last names. Hit the 'contact us' tab to contact the admin. Read the FAQ section on how to post; that will increase your chance of getting a good reply.

    The insignificant variables should be out of the model as they are not adding anything. But do it with caution. Check whether the variable is significant on its own and whether it has an interaction effect with any other variable. If no significance is found, the variable should be out of the model. However, you should not remove the lower-order interactions, even if they are insignificant, when the higher-order interaction is significant.

    For prediction and visualisation of the interactions, "help margins" should help you. Have a go at the "margins" manual. A quick elementary idea can be obtained from this page: . You have not told us which version of Stata you are using (a prerequisite; see the FAQ). If you would like more help, post the exact code you used and the output Stata gave you. Also provide a small sub-sample of your data. Read the FAQ on how to copy and paste Stata code and sample data, and hopefully we can help you out.
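
    As a rough sketch of what the margins route could look like for a continuous-by-continuous-by-binary model: the example below uses the auto dataset that ships with Stata purely as a stand-in, with price, mpg, weight and foreign playing the roles of Y, X, Z and W. These are not the poster's variables, the grid values are arbitrary illustrations, and marginsplot assumes Stata 12 or later.

    ```stata
    * Stand-in data: price = Y, mpg = X, weight = Z, foreign = W (assumption)
    sysuse auto, clear

    * ## expands to all main effects plus every two-way and the three-way term
    regress price c.mpg##c.weight##i.foreign

    * Predicted Y over a grid of X, at three illustrative values of Z,
    * separately for W = 0 and W = 1
    margins foreign, at(mpg=(15(5)35) weight=(2000 3000 4000))

    * X on the horizontal axis, one line per Z value, one panel per W level
    marginsplot, xdimension(mpg) plotdimension(weight) bydimension(foreign)
    ```

    Control variables would simply be added to the regress line; margins then averages over them unless they are fixed with at().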
    Last edited by Roman Mostazir; 04 Jan 2015, 13:02.


    • #3
      The insignificant variables should be out of the model as they are not adding anything.
      I kind of disagree with this statement. Which variables to include in the model should be guided by economic theory and reasoning, not statistical significance. The same goes for interaction effects. As correctly pointed out, removing the lower-order terms messes up the interpretation of the higher-order interactions.

      The given link is a good start. Mitchell (2012) has many examples, and I highly recommend this book if you are planning on doing more research with Stata.

      You might also be interested in coefplot (Jann, SSC), especially if you are using a Stata release prior to version 12.


      Mitchell, Michael N. (2012). Interpreting and Visualizing Regression Models Using Stata. College Station, TX: Stata Press.
      Last edited by daniel klein; 04 Jan 2015, 14:20.


      • #4
        My apologies for being restrictive. Daniel is right about the theory-driven approach. The decision also depends largely on the context in which the analysis is undertaken. For example, in clinical trial outcomes, it is common practice to exclude the insignificant variables as any underlying assumptions bears high risk. On the other hand, in sociological studies such assumptions may provide interesting indications.


        • #5
          Roman, thanks for getting back to this interesting topic. I totally agree that the choice of variables to include in the model depends heavily on the context. However, when faced with an argument like

          [...] exclude the insignificant variables as any underlying assumptions bears high risk.
          I wonder whether people are aware that excluding a variable from a model (post hoc) is also based on a strong(er?) assumption. I never really understood what kind of assumptions they are trying not to make by simply excluding variables.



          • #6
            I think a lot depends on how the analysis evolves. If a variable is largely insignificant, it evidently fails to account for any change in y. If it is significant but becomes insignificant as the model is developed further, it requires further investigation and theoretical support to decide whether or not it belongs in the model. Mathematically, I think the bottom line is to base your decision on whether the function of 'x' is able to drive any change in y.


            • #7
              Just a couple of thoughts that crossed my mind reading this thread. Not exactly on point, but not totally tangential either.

              1. Remember that once you have an interaction term in a model, the main effects terms no longer mean what they mean in a no-interaction model. So if you have, to keep it simple, just Y = b0 + b1X + b2Z + b3X*Z (+ covariates + error), the coefficient b1 is not the effect of X on Y, and b2 is not the effect of Z on Y. In fact, by using an interaction term you are explicitly denying the existence of any such thing as "the effect of X (resp. Z) on Y." Rather, b1 is the effect of X on Y conditional on Z = 0. Similarly, b2 is the effect of Z on Y conditional on X = 0. (Note, by the way, that it follows that if 0 is outside the meaningful range of values of Z (resp. X), then b1 (resp. b2) is really quite meaningless altogether.) The effect of X conditional on Z = z is b1 + b3*z. So, the effect(s) of X on Y are not represented by a single term in the model but are distributed between the X and the X*Z terms. Therefore, to decide whether Y is (bi-)linearly associated with X you cannot just test the X term. You must jointly test the X and X*Z terms for that. In particular, eliminating the X term from the model just because it is not statistically significant could be a serious blunder. In fact, if the X and X*Z terms are jointly significant, you probably should not eliminate either of them even if both of them are, on their own, non-significant.

              (To be clear, what I'm saying here in no way contradicts what Daniel has said about eliminating model terms based on non-significance being a bad idea in general. I'm just pointing out that in this particular case, it's an even worse idea.)
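
              A minimal sketch of the joint test described in point 1, again using the auto dataset shipped with Stata as a stand-in (price = Y, mpg = X, weight = Z are assumptions for illustration, not the original poster's data):

              ```stata
              * Stand-in data: price = Y, mpg = X, weight = Z (assumption)
              sysuse auto, clear

              * Two-way model: Y = b0 + b1*X + b2*Z + b3*X*Z
              regress price c.mpg##c.weight

              * Joint Wald test of H0: b1 = 0 and b3 = 0,
              * i.e. X is unrelated to Y at every value of Z
              test mpg c.mpg#c.weight

              * Effect of X conditional on Z = z, b1 + b3*z, at illustrative z values
              margins, dydx(mpg) at(weight=(2000 3000 4000))
              ```

              The individual t-tests on mpg and on c.mpg#c.weight can both be non-significant while this joint test rejects, which is the situation warned about above.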

              Your situation is even more complicated because you have a three-way interaction, so that the effects of any of the involved variables are spread over even more terms. But the reasoning is analogous.
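
              Written out for the three-way model in the original post, Y = b0 + b1X + b2Z + b3W + b4XZ + b5XW + b6ZW + b7XZW, the analogous expression for the effect of X follows from differentiating with respect to X (W is binary, so the two cases can be listed separately):

              ```latex
              \frac{\partial Y}{\partial X} = b_1 + b_4 Z + b_5 W + b_7 Z W
              % W = 0:  \partial Y / \partial X = b_1 + b_4 Z
              % W = 1:  \partial Y / \partial X = (b_1 + b_5) + (b_4 + b_7) Z
              ```

              So b7 is the amount by which the X*Z interaction differs between the W = 1 and W = 0 groups, and the effect of X involves four coefficients, not one.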

              2. As for understanding and graphing a three-way interaction, that gets complicated. A three-way interaction has several different interpretations. You could see it as representing how Z modifies the X*W interaction, or how W modifies the X*Z interaction, or as a piece of how W and Z jointly modify the X effect, or several other ways of putting this all together. The best way to interpret this (and correspondingly graph it) really depends on what these variables represent in the world. Some of these interpretations could be nonsensical or confusing in some contexts, and ideal in others.

              So, what I'm saying is that this question is not really a statistical one, it is a science-based question that must be answered based on the meaning of the variables and the model in the world.
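
              If, for instance, the reading "how W and Z jointly modify the X effect" is the one that makes sense for the data, one possible sketch is to plot the slope of X at chosen values of Z for each level of W. The auto dataset is used only as a stand-in (price = Y, mpg = X, weight = Z, foreign = W are assumptions), the Z values are arbitrary, and marginsplot assumes Stata 12 or later.

              ```stata
              * Stand-in data: price = Y, mpg = X, weight = Z, foreign = W (assumption)
              sysuse auto, clear
              regress price c.mpg##c.weight##i.foreign

              * Slope of Y on X at three illustrative Z values, for W = 0 and W = 1
              margins foreign, dydx(mpg) at(weight=(2000 3000 4000))

              * Z on the horizontal axis, one line per level of W;
              * non-parallel lines reflect the three-way term
              marginsplot, xdimension(weight)
              ```

              A different interpretation would call for a different margins specification, which is the point made above: the choice depends on what the variables mean.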


              • #8
                It is such an interesting discussion that I just felt I should try to add something.

                With regard to the inclusion or exclusion of variables based on theoretical assumptions or statistical significance, I agree with both arguments.

                But I wish to add a third factor: maybe the inclusion or exclusion of such interactions could be "put to a test" by comparing AICs, BICs, R-square, LR and things like that.
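
                As a hedged illustration of such a comparison, the sketch below fits the model with and without the three-way term and compares them. The auto dataset and its variables are stand-ins only (price = Y, mpg = X, weight = Z, foreign = W), and the stored names threeway and twoway are arbitrary labels.

                ```stata
                * Stand-in data: price = Y, mpg = X, weight = Z, foreign = W (assumption)
                sysuse auto, clear

                * Full model with the three-way interaction
                regress price c.mpg##c.weight##i.foreign
                estimates store threeway

                * Nested model: same terms without the three-way interaction
                regress price c.mpg c.weight i.foreign ///
                    c.mpg#c.weight c.mpg#i.foreign c.weight#i.foreign
                estimates store twoway

                * AIC and BIC side by side
                estimates stats threeway twoway

                * Likelihood-ratio test of the nested model against the full one
                lrtest threeway twoway
                ```

                Only the highest-order term is dropped here, so the lower-order terms that preserve its interpretation stay in both models.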

                Now, changing the subject a little.

                Daniel mentioned an excellent book on the matter (Mitchell, Michael N. (2012). Interpreting and Visualizing Regression Models Using Stata. College Station, TX: Stata Press).

                And I see "M_Kath" describes a three-way interaction. Indeed, according to what was mentioned, it is a continuous by continuous by categorical interaction.

                There is a thought-provoking note in Michael's book, where the author comments on a 3-way interaction between "female", "educ" and "age". The discussion was over the need to include, or not, the lower-order effects (female#age, female#educ, etc). The answer was: "It is important to include these effects in the model to preserve the interpretation of the female#age#educ interaction. However, there is little to gain by trying to interpret these effects".

                Hopefully that was of some help to the discussion.


                Best regards,