Tests with interaction variable

Dario Maimone Ansaldo Patti

Join Date: Aug 2014

Posts: 505
#1

18 Sep 2018, 11:09

Dear All,

I have a question, which is not strictly related to the usage of Stata. Suppose I have a model like the following:

y=c+a1x1+a2(x1)^2+error term

I want to study whether x1 has a non-linear impact on y. This is not a problem, of course. The problem arises if I suspect that the non-linearity emerges because of a second variable, which "mediates" the impact of x1 on y. Hence I think of estimating the following:

y=c+a1x1+a2(x1)^2+b1m1+b2x1*m1+b3(x1)^2*m1+error term

If I rearrange the above equation, I get:

y=c+b1*m1+[a1+b2*m1]*x1+[a2+b3*m1]*(x1)^2+error term

m1 is a continuous variable.

Suppose that I want to test the significance of [a2+b3*m1] to see if the quadratic term remains significant. The problem is that m1 is itself a variable. Hence the significance of the tests changes according to its values.

How can I run the test? I thought of using lincom to get a confidence interval too. But how can I deal with the fact that m1 is a variable in the test? Should I take its mean (I am not sure about this)? Or should I run the test at the min and max values of m1?

Thanks in advance for your help.
Tags: lincom
Clyde Schechter

Join Date: Apr 2014

Posts: 30114
#2

18 Sep 2018, 13:25

So, let's draw a little on your good understanding of how interaction terms work.

The coefficient of x1^2 depends on m1: as you have seen it is a2 + b3*m1. Now, unless you have b3 = 0, there is always a solution to the equation a2 + b3*m1 = 0, namely m1 = -a2/b3. So for that value of m1, the coefficient of x1^2 will necessarily be 0. And a zero coefficient is necessarily not statistically significant. It will also be true that for values of m1 near -a2/b3, that coefficient will not be statistically significant. So the question is whether or not the actual realistic values of m1 in the world include, or are close to the value -a2/b3.
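
As a rough sketch of how you might check this in Stata, assuming the variables are named y, x1 and m1 as in your equations, you can estimate -a2/b3 with -nlcom- and compare it with the observed distribution of m1:

Code:

* fit the fully interacted quadratic model
regress y c.m1##c.x1##c.x1

* value of m1 at which the coefficient of x1^2 is zero: -a2/b3
nlcom (zero_at: -_b[c.x1#c.x1]/_b[c.m1#c.x1#c.x1])

* observed range and percentiles of m1, for comparison
summarize m1, detail

The label zero_at is just a name I made up for the output. Bear in mind that the estimate of a ratio can be very imprecise when b3 is close to zero.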

If -a2/b3 is far from the range of values of m1 that actually occur in the world, then there is no issue of the coefficient of x1^2 being close to zero. If it is inside or near that range, then the values of m1 you have to be concerned with include some where the coefficient of x1^2 is negligible, and if the range extends far away from -a2/b3, it also includes values of m1 where the coefficient of x1^2 is large enough that the x1^2 term contributes meaningfully to your model. I suppose you could, if you really wanted to, use two separate models for these regions of m1, one which is only linear in x1, and the other containing the quadratic. But why bother? After all, in the regions of m1 where the coefficient of x1^2 is close to zero, including the quadratic term doesn't hurt you in any way: it becomes a nearly-zero correction to the model.

So the point is that the question you are asking here is one that should not be asked, because its answer cannot lead you to do anything useful. The significance of the coefficient of x1^2 at any particular value of m1, or averaged over all values of m1, is simply not a reasonable basis for deciding whether to include a quadratic term in the model or not.

Rather you should base that decision first on whether there is a reasonable scientific basis for expecting this kind of non-linearity. If so, fit the model with a quadratic. If you really feel some need to "test" whether the quadratic is needed, I would say look at the most extreme value of the coefficient of x1^2 that realistic values of m1 afford you. If any of those extreme values leads to a contribution of the quadratic term that is substantively meaningful (i.e. not, in practical terms, negligible) then I would retain the quadratic.
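
For example, a minimal sketch with -lincom-, again assuming the variable names from your equations. The values 10 and 50 are hypothetical placeholders: replace them with the realistic extremes of m1 in your data.

Code:

quietly regress y c.m1##c.x1##c.x1

* observed minimum and maximum of m1
summarize m1

* coefficient of x1^2, a2 + b3*m1, at two hypothetical values of m1
lincom _b[c.x1#c.x1] + 10*_b[c.m1#c.x1#c.x1]
lincom _b[c.x1#c.x1] + 50*_b[c.m1#c.x1#c.x1]

Each -lincom- reports the estimate with its standard error and confidence interval, so you can judge whether the quadratic contribution at those values of m1 is large enough to matter in practical terms.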

If you insist on having a formal statistical significance test (which I strongly discourage) you can do a test on all of the terms in the model that incorporate x1^2. So, assuming your regression looks like

Code:

regress y c.m1##c.x1##c.x1

you can run

Code:

test c.x1#c.x1 c.m1#c.x1#c.x1

and if that comes out non-significant you could omit the quadratics from the model. Personally, I dislike using statistical significance tests to select models, but if you insist on it, this is how you would do it.

Note: If you are not familiar with the ## notation in the above code, read -help fvvarlist-.
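
As far as I recall the expansion rules, the ## specification is shorthand for all main effects and interactions, so the regression above should be equivalent to spelling the terms out (duplicates are dropped automatically):

Code:

regress y c.m1 c.x1 c.m1#c.x1 c.x1#c.x1 c.m1#c.x1#c.x1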

Finally, from the pedantry corner: your model involves moderation by m1, not mediation. Mediation is something entirely different, that does not involve interaction terms, and is best analyzed using structural equation modeling.
Dario Maimone Ansaldo Patti

Join Date: Aug 2014

Posts: 505
#3

20 Sep 2018, 13:58

Thanks Clyde