
  • Calculating and plotting interaction effects between two dummies

    Dear Stata-users,

    This is my first question and I hope I'm posting it correctly. I need some advice on how to calculate and graph two interaction effects, each between two dummies, in nonlinear models in Stata.

    This is my basic model:
    Code:
    logit y a b i.c i.d i.e, cluster(id)
    Now, the model with two interaction terms c_d and c_e between two dummies:
    Code:
    logit y a b i.c i.d i.e c_d c_e, cluster(id) // created interactions beforehand: gen c_d = c*d ; gen c_e = c*e
    I use this model in an article that I sent to a journal. One reviewer asked me to properly graph and discuss the interaction results since:
    "interaction effects in non-linear models cannot be understood by looking at the coefficients only (see Ai and Norton's publications)".
    To address this I calculate the average marginal effects and graph the results as follows (please bear in mind that I'm new to Stata):

    Code:
    logit y a b i.c i.d i.e c_d c_e, cluster(id)
    margins, dydx(*) predict(pr) post
    coefplot, keep(c_d c_e) xline(0)

    My question is whether this is a sound way of doing this, statistically speaking and with regard to my Stata code. My interpretation of this article https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3447245/ is that it's OK to use -margins- (AMEs) for interaction effects. They also suggest the -inteff- command, but it's not so intuitive to me and the graphs are hard to read. In general, I find interactions between two dummies hard to interpret, but they are necessary for my project.

    What is your advice here? My aim is simply to see the interaction effect when c=1 and d=1 (or e=1) and, no less important, to address the reviewers' concerns.

    Sincerely,
    Ferenc Gulinsky

  • #2
    No, you can't do it with your c_d and c_e variables because -margins- has no way of knowing that those are interactions, so it gets everything wrong. You must use factor-variable notation for the interactions:

    Code:
    logit y a b i.c##i.(d e), cluster(id)
    margins, dydx(*)
    The issue of whether average marginal effects in a logistic model are an appropriate way to express interactions is somewhat controversial. You might want to take a look at other articles published recently in the same journal to see what approach they take.

    FWIW, my approach to this is actually not to look at marginal effects, but to look at actual predicted probabilities. I think this is typically the most understandable way to present such things.

    Code:
    margins c#d#e
    marginsplot
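
    If the marginal-effects framing is still wanted alongside the predicted probabilities, a related sketch (an illustration, not a prescription) is to ask -margins- for the average marginal effect of c within each combination of d and e; the commands below assume the factor-variable logit above has just been run:
    Code:
    * Sketch: average marginal effect of c at each combination of d and e
    margins d#e, dydx(c)
    marginsplot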



    • #3
      Thank you for your reply, Clyde. It was very useful!

      If I choose to go with AMEs, is this an OK way to graph the effects of an interaction between two dummy variables (see the code below)? Note that I want to show the effect of c=1 when d is 0 and when d is 1.

      Code:
      logit y a b i.c##i.d, cluster(id)
      
      margins, dydx(c) at(d=0) at(d=1) post
      marginsplot
      Is it OK to run a dummy variable like this: dydx(c)? It seems that Stata understands that the c variable is a dummy, so when running -marginsplot- it says "average marginal effects when c=1". But I want to be sure that I'm doing this correctly in Stata.


      Best,
      Ferenc
      Last edited by Ferenc Gulinsky; 15 Dec 2019, 08:27.



      • #4
        Yes, it is correct. You can also simplify the -margins- command slightly to get the same result:
        Code:
        margins d, dydx(c) post
        And yes, it is perfectly fine to calculate -margins, dydx(c)- where c is a dichotomous variable. In fact, because you have marked c as a discrete variable by using the i. prefix in your logit command, Stata will know that it should calculate the marginal effect as the discrete difference between E(y | c = 1) and E(y | c = 0), rather than as dE(y)/dc (which would be correct for a continuous variable but not for a dichotomy). So, yes, you are leveraging the strengths of the -margins- command this way.
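
        As a quick check, the discrete difference can also be seen directly. A sketch using the variable names from this thread, with the same data in memory:
        Code:
        * Sketch: the discrete change behind -margins, dydx(c)-
        logit y a b i.c##i.d, cluster(id)
        margins, at(c=0) at(c=1)   // average predicted probabilities with c set to 0, then to 1
        margins, dydx(c)           // the same two quantities, reported as their difference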



        • #5
          Thanks again, Clyde. Very helpful!

          Last question: based on AMEs, can I say something about the relative strength of effects in the model, or do I need to standardize the coefficients? This question does not relate only to the interaction effects; it applies in general. In other words, does the presentation of AMEs in non-linear models allow for interpreting relative effect size?

          All the best,
          Ferenc Gulinsky



          • #6
            There is nothing you can say about the relative strength of effects, whether you standardize the coefficients or not. The notion that standardized coefficients can be compared is a commonly held belief, but it is mistaken.



            • #7
              Dear Clyde: I'm not sure if I follow the logic now.

              Do you mean that if I state the effects in odds ratios or AMEs, for example, I can't tell whether the effect size of x1 is bigger than that of x2 in the same model? Assuming that: 1) x1 has a higher OR/AME than x2, 2) both x1 and x2 are dummy variables, and 3) there are other independent variables in the model, not only dummy variables.

              What can I do, then, to say something about the relative influence of independent variables on the dependent variable in logistic regression?

              One solution, at least for logistic regression, might be the approach in Mood (2010), linked below. For example, Mood states: "To conclude, AME and APE are not (at least not more than marginally) affected by unobserved heterogeneity that is unrelated to the independent variables in the model, and can thus be compared across models, groups, samples, years etc." What I am not sure about here is whether I need to standardize the coefficients before calculating AMEs/APEs or whether it's fine to do the calculations on unstandardized coefficients.


              Mood (2010): https://www.su.se/polopoly_fs/1.3411...%20%281%29.pdf
              Last edited by Ferenc Gulinsky; 16 Dec 2019, 03:31.



              • #8
                Do you mean that if I state the effects in odds ratios or AMEs, for example, I can't tell whether the effect size of x1 is bigger than that of x2 in the same model? Assuming that: 1) x1 has a higher OR/AME than x2, 2) both x1 and x2 are dummy variables, and 3) there are other independent variables in the model, not only dummy variables.

                You can say that one OR is bigger than another, or that one difference in AME is bigger than another. But it doesn't mean that in the real world the one with the bigger OR (AME difference) has greater impact on the outcome. First of all, one of the variables might be something that can be controlled, whereas the other might be immutable. Then there is, more importantly, the fact that your two dichotomous variables probably have different distributions in the population. If one of those variables is nearly always 1, for example, then even if it has a very high odds ratio associated with it, it has very little real-world impact on the outcome because it is nearly a constant and thus accounts for very little of the variation in the outcome.

                In other words, the odds ratio or difference in AME is a measure of strength of association between two variables, but that is not the same as the broader concept of how much one variable affects the other in the world. The strength of association is a part of that, but only one part. The other parts are not things that you get from a logistic (or any other) regression.
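
                A small simulated illustration of this point (hypothetical numbers and variable names, not from this thread):
                Code:
                * Hypothetical simulation: two dummies with the same true coefficient,
                * but x1 is nearly constant (about 98% ones) while x2 is split 50/50
                clear
                set obs 10000
                set seed 12345
                gen x1 = runiform() < 0.98
                gen x2 = runiform() < 0.50
                gen y = rbinomial(1, invlogit(-1 + 1.5*x1 + 1.5*x2))
                logit y i.x1 i.x2, or   // similar odds ratios for x1 and x2 ...
                margins, dydx(*)        // ... and similar AMEs, yet x1 is nearly a constant,
                                        // so it accounts for very little of the variation in y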

                What can I do, then, to say something about the relative influence of independent variables on the dependent variable in logistic regression?

                Nothing. It can't be done and the things you have seen that purport to solve this problem (such as standardized regressions) are just illusory. Influence is, as noted above, a complex construct, of which the strength of association is only one piece.

                What I am not sure about here is whether I need to standardize the coefficients before calculating AMEs/APEs or whether it's fine to do the calculations on unstandardized coefficients.
                Standardizing variables usually accomplishes nothing except obfuscation. The only circumstance where standardization can be helpful is when you are working with a continuous variable that is novel, or at least unfamiliar to the audience you are targeting, so that the notion of a unit change in that variable has no meaning for them. In that case, a one SD change in the variable is no less meaningful than a unit change, and is, in some sense, more familiar. But when you are talking about continuous variables that have conventionally understood scales and units, such as age in years, or height in inches or in meters, or sales volume in dollars, or..... going to standard deviation units just makes things obscure and incomprehensible. Nobody but the person running the analysis will know what one standard deviation of any of those things is. Similarly, for categorical variables, nobody knows what a 1 SD difference in any dichotomous variable means--it's just an artifact of what the probability distribution of that variable is anyway. Whereas everybody understands what a unit change in a dichotomous variable is: it's just the change from one category to its opposite. So standardization here is cryptic to the point of being entirely perverse.

                If you have dichotomous predictors and you want to compare their strength of association with an outcome, take them as they are, do not standardize them, and compare their odds ratios. But forget about influence: it is not possible to estimate that.
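
                For concreteness, a minimal sketch of that comparison (hypothetical dichotomous predictors x1 and x2, taken as they are):
                Code:
                * Minimal sketch: unstandardized dummies, compare odds ratios directly
                logit y i.x1 i.x2, or
                test 1.x1 = 1.x2   // optional Wald test that the two log-odds coefficients are equal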



                • #9
                  Thanks again, Clyde.

                  If you have dichotomous predictors and you want to compare their strength of association with an outcome, take them as they are, do not standardize them, and compare their odds ratios. But forget about influence: it is not possible to estimate that.
                  Would it be possible for you to cite something published that reflects what you stated above? I would be happy to read a bit more about it and indicate this to my reviewers. Thanks!
