Interaction between variables changes the results fundamentally!
Dear All,
I would appreciate your help on the following please:
The association between y and each of x1 and x2 is negative, but the association between y and the interaction of x1 and x2 is positive. That seems strange! Could somebody explain this to me, please?
There is nothing unusual or surprising about this kind of result. It happens often.
Let's review how interactions work and what they mean. The key thing to remember is that in an interaction model, there is no such thing as "the effect of x1 on y," nor "the effect of x2 on y." Rather, in an interaction model, there are infinitely many different effects of x1 on y, and those effects depend on the value of x2. (And vice versa.)
The coefficient of the interaction term between two continuous variables represents how much the y:x1 slope changes per unit change in x2 (or, equivalently, how much the y:x2 slope changes per unit change in x1). The coefficient of x1 in this kind of model is the slope of the y:x1 relationship conditional on x2 = 0, and the coefficient of x2 is the slope of the y:x2 relationship conditional on x1 = 0.
So when the coefficients of x1 and x2 are both positive and the coefficient of c.x1#c.x2 is negative, it means:
1. When x2 = 0, y is an increasing function of x1 (i.e. the y:x1 slope is positive).
2. As we look at larger values of x2, the y:x1 slope gets smaller. If x2 becomes large enough, the y:x1 slope will eventually cross zero and turn negative (although that might happen beyond the range of values of x2 in the data).
3. When x1 = 0, the y:x2 slope is positive. As we look at larger values of x1, the y:x2 slope gets smaller. If x1 becomes large enough, the y:x2 slope will eventually cross zero and turn negative (although that might happen beyond the range of values of x1 in the data).
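To see points 1 to 3 in action, here is a minimal simulated sketch. The data, the variable names, and the coefficients 2, 3, and -0.5 are all made up for illustration; none of it comes from anyone's real data.

```stata
* Simulated illustration; every number here is invented
clear
set seed 12345
set obs 1000
generate x1 = 10*runiform()
generate x2 = 10*runiform()

* True model: the y:x1 slope is 2 - 0.5*x2 and the y:x2 slope is 3 - 0.5*x1
generate y = 2*x1 + 3*x2 - 0.5*x1*x2 + rnormal()

* Main effects plus the continuous-by-continuous interaction
regress y c.x1##c.x2

* y:x1 slope at several values of x2; it should change sign near x2 = 4
margins, dydx(x1) at(x2 = (0(2)10))

* y:x2 slope at several values of x1; it should change sign near x1 = 6
margins, dydx(x2) at(x1 = (0(2)10))
```

With negative coefficients on x1 and x2 and a positive interaction coefficient, the same logic applies with the directions reversed: the slopes start out negative and rise as the other variable increases.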
As for your post's title "Interaction between variables changes the results fundamentally," all I can say is, of course! The whole point of an interaction term is to provide a more fine-grained look at the relationships in the data and tease out differences that are not apparent in a non-interaction model. If including the interaction term doesn't change anything, then there is no reason to even keep it in the model.
Thanks a lot, Clyde, that was such an elegant explanation! I am writing my first economics paper on the effect of oil rents (% of GDP) on social capital (social trust, as in the question "Can people be trusted?"), and I should investigate whether corruption matters when added to the analysis. Based on your explanation above, can I assume that oil rents have a negative effect on social capital (trust), corruption has a negative effect on social capital, but the interaction of those two has a positive effect on the social capital? This seems counter-intuitive, and that's why I am not sure what to make of the result. I spent a long time double checking my panel data both with software and manually, so I am pretty sure that the data is good. I declared the data to be a panel and ran xtreg y x1 x2 x1*x2, fe r, and that was the result I got. Am I doing something wrong?
I would also like to ask your advice, please, on what would be the best model for my panel data analysis. How can I make my analysis stand out? And do you have any references that you would recommend for this type of analysis?
Thanks a lot for your help I appreciate it!
Regards
Moh'd
Based on your explanation above, can I assume that oil rents have a negative effect on social capital (trust), corruption has a negative effect on social capital, but the interaction of those two has a positive effect on the social capital?
No. None of these conclusions is correct, nor even meaningful. Let's go over how interactions work in the specific context of your results.
When corruption is 0 (which, I imagine, doesn't actually occur in the data, but is a theoretical possibility) the effect of oil rents on social trust is negative: a unit increase in oil rents is associated with a 0.08 decrease in social trust in this situation. But that only holds when corruption is 0. I don't know on what scale your corruption variable is measured. But, just to illustrate, consider now a situation where corruption = 1. Then in that case, the association between oil rents and social trust is still negative: it is -0.08129094 + 0.0672 = -0.01409094, but notice that it is much less negative than in the corruption = 0 case. Now consider another situation, where corruption = 2. Now the slope of the social trust:oil rents relationship is -0.08129094 + 2*0.0672 = +0.05310906. So at this level of corruption, we see that social trust has a positive association with oil rents.
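The same arithmetic can be checked directly in Stata. This is only a sketch: the scalar names b_oil and b_int are made up for illustration, and the two values are typed in by hand from the coefficients quoted above rather than pulled from stored estimates.

```stata
* Coefficients typed in from the numbers quoted above
* b_oil: coefficient of oil rents
* b_int: coefficient of the oil rents by corruption interaction
scalar b_oil = -0.08129094
scalar b_int = 0.0672

* Slope of social trust on oil rents at corruption = 0, 1, and 2
forvalues c = 0/2 {
    display "corruption = `c': slope = " (scalar(b_oil) + `c'*scalar(b_int))
}
```

Changing the range in forvalues lets you try other values of corruption that are plausible on your variable's scale.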
Now, I don't know what the scale of your corruption measure is. Perhaps it only ranges between 0 and 1, so that the corruption = 2 scenario is not even theoretically possible. Perhaps even within the 0 and 1 range, actually observed values are in a more restricted range. Or maybe it runs from 1 to 10, in which case the relationship between oil rents and social trust is just barely negative even at the low end of corruption and becomes massively positive (-0.08129094 + 10*0.0672 = +0.59070906) at the high end. So you need to calculate these social trust:oil rent slopes at a representative range of values of your corruption variable. The -margins- command will help you do that. To illustrate the command, let me assume that your corruption variable ranges from 0 to 10. Then you can follow your regression with:
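A sketch of those commands, assuming the regression was run with factor-variable notation and that the corruption variable is called corr. The names trust and oil_rents are placeholders, so substitute your own variable names, and adjust the at() range to the actual scale of corr.

```stata
* trust and oil_rents are placeholder variable names; corr is the corruption measure
* The data are assumed to be already declared as a panel with xtset
xtreg trust c.oil_rents##c.corr, fe vce(robust)

* Marginal effect of oil rents on trust at corr = 0, 2, 4, 6, 8, 10
margins, dydx(oil_rents) at(corr = (0(2)10))

* Graph those marginal effects, with confidence intervals, against corr
marginsplot
```

Note that -margins- can only work out these slopes if the interaction is specified with the c.oil_rents##c.corr notation. If the product is generated by hand as a separate variable, -margins- has no way of knowing that it depends on oil_rents and corr.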
The output of -margins- will be a table of the various marginal effects of oil rents on social trust at corr = 0, 2, 4, 6, 8, and 10. And the -marginsplot- command will put this all on a nice graph showing the initially negative marginal effect when corr = 0 and how the marginal effect increases as corr increases. I think if you do that and spend some time pondering those results, you will have a better understanding of what your data are telling you.
I spent a long time double checking my panel data both with software and manually, so I am pretty sure that the data is good,
Excellent! This is always a good idea, and one that too few people do.
I would also like to ask your advice, please, on what would be the best model for my panel data analysis. How can I make my analysis stand out? And do you have any references that you would recommend for this type of analysis?
I can tell you that you are correctly using Stata to implement your model, though you have not been interpreting the output correctly. But whether this model is appropriate, or the choice of other models to study this, is a science question within your discipline, and not one that I can advise you about. For that, you need to consult colleagues in your own discipline.
Thank you very much Clyde. I appreciate your help!
In case you are interested, the control of corruption measure comes from the Worldwide Governance Indicators and ranges from -2.5 to 2.5. To turn it into a measure of corruption, we multiply control of corruption by minus one: gen corr = ccorr*-1. https://info.worldbank.org/governance/wgi/#home