Should a non-significant interaction term always be excluded?

Jose Alcides Santos

Join Date: Jul 2017

Posts: 29
#1

04 Aug 2017, 09:08

Dear all,

I used a logit model to estimate the probability of not being in good health by gender (and so gender inequality), conditional on the socioeconomic context. I used a binary variable "top", coded 1 for privileged jobs versus 0 for the rest of the social structure. For gender the binary variable was "fem" (female coded 1 and male coded 0). The sample has more than 86 thousand cases. The syntax was:

svy: logit notgoodhealth top##fem // covariates

Clyde Schechter wrote the following comment in a post: “If the interaction term's coefficient is small, and if there is no, or only very weak, a priori reason to believe that the effects of gender depend on education (and vice versa), then you could consider the no-interaction model to be a reasonable, and simpler, way to view the data”.
I estimated the model without an interactive term between top and female because it was statistically non-significant.
This can be seen in the output:
-----------------------------------------------------------------------------------
                  |             Linearized
    notgoodhealth |      Coef.   Std. Err.      t    P>|t|     [99% Conf. Interval]
------------------+----------------------------------------------------------------
            1.top |  -.8158712   .0759739   -10.74   0.000    -1.011635   -.6201072
            1.fem |   .3289449   .0277096    11.87   0.000      .257545    .4003449
                  |
          top#fem |
             1 1  |  -.1862472   .1216822    -1.53   0.126    -.4997889    .1272945
-----------------------------------------------------------------------------------

I ask: should a non-significant interaction term always be excluded?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#2

04 Aug 2017, 09:51

Jose:
it partly depends on the goal of your research.
In case of a submission to a technical journal, I would probably omit the non-significant interaction.
If my audience were students or people attending a workshop/congress session, I could choose to keep the non-significant interaction in just to spur the discussion.
Obviously, these are personal beliefs (and I might well change my mind about them).
That said, in your case I would also ask myself whether the regression model gives a fair and true view of the data generating process.

Kind regards,
Carlo
(Stata 19.0)
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#3

04 Aug 2017, 10:01

Well, I would definitely not subscribe to a general principle of excluding non-significant interaction terms. I think you need to take it on a case-by-case basis.

In the example you show, the interaction term is not statistically significant. But bear in mind that the statistical power of tests of interaction terms is drawn not from the entire sample size but from the size of the smallest of the intersecting groups it represents. So you have four top X fem groups, and whichever of those has the fewest observations is the one that limits the power of the test. If, say, half of your respondents are top and half are fem, independently of each other, then your test is effectively powered by only 1/4 of your sample.
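A rough way to see the power point above is the large-sample variance of the interaction log-odds in a saturated 2x2 logit, which is the sum over the four cells of 1/(n*p*(1-p)). The sketch below is only an illustration: the total sample size, the cell counts and the outcome proportion are hypothetical, and it ignores the covariates and the survey design used in the actual model.

```python
import math

def interaction_se(cells):
    """Approximate SE of the interaction log-odds in a saturated 2x2 logit.

    cells: four (n, p) pairs, one per top x fem group, where n is the group
    size and p the outcome proportion in that group.
    Assumes simple random sampling and no covariates.
    """
    return math.sqrt(sum(1.0 / (n * p * (1.0 - p)) for n, p in cells))

def main_effect_se(n_total, p):
    """Approximate SE of a two-group log odds ratio with an even split."""
    half = n_total / 2
    return math.sqrt(2.0 / (half * p * (1.0 - p)))

N = 80_000   # hypothetical total sample size
P = 0.25     # hypothetical outcome proportion, identical in every cell

# Balanced case: four equal cells of N/4 each.
balanced = [(N / 4, P)] * 4
se_int = interaction_se(balanced)
se_main = main_effect_se(N, P)
se_ratio = se_int / se_main                 # interaction SE relative to main effect SE
effective_fraction = 1.0 / se_ratio ** 2    # share of the sample "powering" the test

# Unbalanced case (hypothetical counts): one small cell dominates the variance.
unbalanced = [(40_000, P), (35_000, P), (3_000, P), (2_000, P)]
se_unbal = interaction_se(unbalanced)
smallest_n = min(n for n, _ in unbalanced)
share_smallest = (1.0 / (smallest_n * P * (1.0 - P))) / se_unbal ** 2

print(f"balanced:   SE ratio = {se_ratio:.2f}, effective fraction = {effective_fraction:.2f}")
print(f"unbalanced: interaction SE = {se_unbal:.4f}, "
      f"variance share of smallest cell = {share_smallest:.2f}")
```

With four equal cells the interaction standard error comes out at twice that of a main effect from an even split, which is what "powered by 1/4 of the sample" amounts to. With unequal cells, the smallest cell contributes most of the variance.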

So I prefer to look at the magnitude of the interaction coefficient, particularly relative to those of the constituent effects. Here, the effect of fem when top == 0 is 0.33 (to two decimal places). But when top == 1, the effect of fem drops to 0.14. That is over a 50% difference in the effect of fem. Does that matter from a substance/science/practical perspective? If so, I would retain the interaction, notwithstanding its lack of statistical significance. If it doesn't matter (perhaps the effect of fem is not really important in your research question and is included only because of its possible modification of the top effect, or perhaps the effect of fem is important, but for practical purposes 0.14 and 0.33 are equivalent), then the simpler no-interaction model is a reasonable choice.
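The arithmetic behind the 0.33 and 0.14 figures can be checked directly from the coefficients in the output in #1. This is just a worked calculation on the log-odds scale; the variable names are illustrative.

```python
import math

# Coefficients copied from the svy: logit output in #1 (log-odds scale).
b_fem = 0.3289449    # 1.fem: effect of fem when top == 0
b_int = -0.1862472   # top#fem 1 1: interaction coefficient

fem_at_top0 = b_fem           # effect of fem when top == 0
fem_at_top1 = b_fem + b_int   # effect of fem when top == 1

# Relative reduction in the fem effect when moving from top == 0 to top == 1.
relative_drop = 1 - fem_at_top1 / fem_at_top0

# The same two effects expressed as odds ratios.
or_top0 = math.exp(fem_at_top0)
or_top1 = math.exp(fem_at_top1)

print(f"fem effect, top == 0: {fem_at_top0:.2f} (odds ratio {or_top0:.2f})")
print(f"fem effect, top == 1: {fem_at_top1:.2f} (odds ratio {or_top1:.2f})")
print(f"relative drop: {relative_drop:.0%}")
```

The point estimates alone give this drop of more than half; they say nothing about its precision, which is what the non-significant test reflects.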

Another way to look at it is that being in the top = 1, fem = 1 category is associated with a reduction in expected log odds of outcome of 0.19. Is that meaningful? How does 0.19 compare to the overall range of log-odds outcomes in the sample? Is this a meaningful difference that you would want to know about if you were using this model for prediction? Also important, how does the difference of 0.19 on the log odds scale translate to the expected probability scale? Depending on where the distribution of predicted outcome probabilities is, a difference of 0.19 log odds could be a very large difference in probability or could be the difference between 0.9990 and 0.9988.
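To make the last point concrete, the sketch below applies the same 0.19 reduction in log odds at a few baseline probabilities. The baselines are arbitrary illustrations, not values from the survey, and the helper names are hypothetical.

```python
import math

def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Probability corresponding to log odds x."""
    return 1.0 / (1.0 + math.exp(-x))

def shifted_probability(p, delta):
    """Probability after adding delta to the log odds of p."""
    return inv_logit(logit(p) + delta)

DELTA = -0.19  # interaction coefficient from #1, rounded to two decimals

# Arbitrary baseline probabilities chosen for illustration.
for baseline in (0.10, 0.25, 0.50, 0.9990):
    shifted = shifted_probability(baseline, DELTA)
    print(f"baseline {baseline:.4f} -> {shifted:.4f} "
          f"(change {shifted - baseline:+.4f})")
```

Near 0.5 the same shift moves the probability by several percentage points, while near 0.999 the change only shows up in the fourth decimal place.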

Finally, since this is a logit model, I would be concerned with measures of discrimination (ROC curve area) and calibration (Hosmer-Lemeshow or Pearson) of the model with and without this term.

That's how I would approach the issue. Only if all of the above considerations were toss-ups would I allow statistical significance to weigh heavily in my decision about including this term or not.

Added: Crossed with Carlo's response.
Jose Alcides Santos

Join Date: Jul 2017

Posts: 29
#4

04 Aug 2017, 10:49

Carlo and Clyde, thank you for the comments.
I was speculating about the behavior of the variables that make up the interactive term and the interactive term itself in terms of the coefficient size and/or statistical significance.
Clyde's comment shows that there are several aspects involved.
In substantive terms the results are more internally coherent without the interactive term. Gender inequality in health is present at both the top and the bottom of the social structure. In addition, in the gender inequality in health across jobs (top and non-top), income plays a mediating role and education a (partial) suppressor role. Without the income advantage associated with jobs, gender inequality in health would be lower. Without the educational advantage of women, health inequality would be greater. The mediating role of income supplants the suppressor role of education. The results seem to relate better.
However, with the (non-significant) interactive term, gender inequality in health becomes non-significant among the privileged top jobs…
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#5

04 Aug 2017, 10:55

Jose:
having to do with similar stuff, I was wondering whether a risk of reverse causation exists between individual (not necessarily women's) health state and income (other things being equal).

Kind regards,
Carlo
(Stata 19.0)
Jose Alcides Santos

Join Date: Jul 2017

Posts: 29
#6

04 Aug 2017, 11:41

These are the results in probability of not having good health (without the interactive term between top and female):

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [99% Conf. Interval]
-------------+----------------------------------------------------------------
         top |
          0  |   .2461312   .0030647    80.31   0.000     .2382372    .2540253
          1  |   .1247116   .0053904    23.14   0.000     .1108269    .1385964
             |
         fem |
          0  |   .2117345   .0031538    67.14   0.000     .2036109    .2198581
          1  |   .2641054   .0040893    64.58   0.000      .253572    .2746387
             |
     top#fem |
        0 0  |   .2227773   .0033394    66.71   0.000     .2141756     .231379
        0 1  |   .2775324   .0043397    63.95   0.000     .2663542    .2887107
        1 0  |   .1107141   .0050582    21.89   0.000     .0976849    .1237433
        1 1  |   .1440833    .006224    23.15   0.000     .1280512    .1601154
------------------------------------------------------------------------------
Richard Williams

Join Date: Apr 2014

Posts: 5007
#7

04 Aug 2017, 15:51

First off, your output would be much easier to read if you used code tags. See pt. 12 of the FAQ.

You could just as easily say, "Should ANY non-significant term be excluded?" You can't make a global rule like that. A lot of junk terms will increase your standard errors. If the term was pretty dubious in the first place you may want to kill it.

But sometimes you want to include a term precisely to show that it is NOT significant, e.g., people suspect gender is important and your results show it isn't. Also, it is kind of cheating if you test a bunch of terms and you only present the significant ones, maybe making you look a lot smarter when you formed your theory than you really were. If you don't show the insignificant terms in the tables, you might at least mention that they were tested and then dropped because they weren't important.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam

Jose Alcides Santos

Join Date: Jul 2017

Posts: 29
#8

04 Aug 2017, 17:23

Dear Richard Williams, thank you for the comments.
I reproduce below a table with three additional models that aim to explain gender inequality in health by social position.
Gender matters for health, especially when looking at relative differences or prevalence ratios.

Code:

Expected Probability of Not Good Health by Gender and Social Class. Brazil, 2013.

Categories                 Prob. Male   Prob. Female   Abs. Dif.*   Rel. Dif.**
Top Jobs                      0.111        0.144          0.033        1.297
  + Education                 0.157        0.214          0.057        1.363
  + Earnings                  0.165        0.184          0.019        1.155
  + Education & Earnings      0.185        0.223          0.038        1.205
Non top Jobs                  0.223        0.278          0.055        1.246
  + Education                 0.210        0.279          0.069        1.328
  + Earnings                  0.229        0.252          0.023        1.100
  + Education & Earnings      0.218        0.261          0.043        1.197

Notes: + Non-cumulative additional controls. * Absolute difference. ** Relative difference.

Joshua D Merfeld

Join Date: Jun 2015

Posts: 86
#9

04 Aug 2017, 22:21

I just want to post in support of those saying not to make a general rule. Just because something isn't significant doesn't mean it should be excluded. It should very much be a case-by-case decision.
Richard Williams

Join Date: Apr 2014

Posts: 5007
#10

05 Aug 2017, 05:03

Remember too that there are an endless number of interactions, squared terms, logged terms, etc. that could potentially be in your model. You can't include all of them so there should be some sort of theory or reason for the ones that you do (even if it is just to humor some reviewer).

I always encourage my students to try to present theory/counter-theory, so their hypotheses don't seem obvious and the results don't appear to be foregone conclusions. (Also so they don't look stupid if nothing comes out the way they predicted.) That way you can make your results seem interesting no matter how they come out, e.g., you have refuted those claims that there would be an interaction between gender and SES.

Finally, there has been a lot of criticism of the publication process because mostly significant results get printed. So, one study that shows a gender effect may get into print, while 20 others that don't never see the light of day or wind up in lesser journals.

Of course some variables may seem to have no effect because they are flawed measures of a concept, e.g. poorly worded, don't really measure what you want, etc. I might drop those variables but add an explanation about their problems.

Hence, I think insignificant results should at least be acknowledged, and maybe even be included in the tables if they don't drive your other standard errors up too much.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#11

05 Aug 2017, 09:12

I agree with Carlo's, Clyde's and Richard's remarks. They are really food for thought.

I just wish to share a few words.

To start, when a question has "always" or "never", well, there is much room to spot gaps where exceptions could be defended.

Furthermore, when there is much expertise involved, I mean, when skilled professionals deal with the matter, "rules of thumb" are usually given a pass, i.e., other aspects become more relevant.

But we need to "contemplate" the "mean person" , sorry for the pun, let's change it to the "median person".

All in all, in the specific scenario presented in #1, there is much at stake if we insist on including interaction terms in spite of tiny (centered) coefficients, for example, just for the sake of the rationale.

With regard to p-values, "significance" could be somewhat relaxed, so to speak, i.e., there is no reason to "throw away" an interaction term just on account of a p-value = 0.051.

That said, if we include interaction terms only on account of our impressions (instead of the data profile), we may do it out of honesty, but, well, conflicts of interest may bias our decisions.

For example: if getting "non-significant" p value for a given variable is "interesting" or dovetails with the much-desired results, somebody could follow this sort of "modeling", up to the wishful output.

What is more, we'd have turned a blind eye to a precious principle, parsimony, and we would rely on an excessively complex model that could entail different sources of costs, be it in the preventive strategies, let alone the difficulty in terms of a comprehensive approach.

Needless to say, interaction terms can be less pleasant to explain to a broad audience. Thanks to - margins - and - marginsplot - this obstacle can be overcome, but only up to a certain point.

Not to forget, adding (dubious or not strictly necessary) interaction terms would eventually be like adding extra variables, and that may increase the false-positive rate, on one side, and decrease the power, on the other side.

Scylla and Charybdis at once...

On account of these reasons, and just as a general principle, I gather we could avoid including an interaction term, provided: coefficients are irrelevant; p-values are quite large; the likelihood-ratio test shows a non-significant p-value; and AIC as well as BIC won't decrease with such interactions.

Below, a toy example, where the "basic" model gave quite reasonable information (in terms of rationale) but the "extra" interaction terms "spoiled" it all, since they led to a decrease in power (and 3 of 4 coefficients became non-significant), plus the lrtest with high p-values, let alone the increase in AIC and BIC.

This, IMHO, is a situation where I would avoid including an interaction term:

Code:

. sysuse auto
(1978 Automobile Data)

. quiet: logit foreign mpg price gear turn

. estimates store model1

. quiet: logit foreign mpg c.price##c.gear turn

. estimates store model2

. quiet: logit foreign mpg c.price##c.gear##c.turn

. estimates store model3

. lrtest model1 model2

Likelihood-ratio test                                 LR chi2(1)  =      0.07
(Assumption: model1 nested in model2)                 Prob > chi2 =    0.7954

. lrtest model1 model3

Likelihood-ratio test                                 LR chi2(4)  =      7.00
(Assumption: model1 nested in model3)                 Prob > chi2 =    0.1357

. estimates table model1 model2 model3, star drop(_cons) stats(aic bic)

--------------------------------------------------------------
    Variable |    model1          model2          model3
-------------+------------------------------------------------
         mpg | -.15018565      -.13949485      -.10603218
       price |  .00069437*      .00018456       .07756841
  gear_ratio |  6.7847829**     5.6896461       125.52289
        turn | -.75473056*     -.73077683*      11.164306
             |
     c.price#|
c.gear_ratio |                  .00017709      -.01721783
             |
     c.price#|
      c.turn |                                  -.0018575
             |
          c. |
  gear_ratio#|
      c.turn |                                 -2.9765685
             |
     c.price#|
          c. |
  gear_ratio#|
      c.turn |                                  .00040598
-------------+------------------------------------------------
         aic |  34.411885        36.34465       35.407487
         bic |   45.93221       50.169041       56.144073
--------------------------------------------------------------
                      legend: * p<0.05; ** p<0.01; *** p<0.001
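As a cross-check on how these criteria relate to each other, the sketch below recovers the likelihood-ratio statistics, their p-values and the BIC values from the AIC figures reported in the table. It rests on two assumptions not printed in the output: the parameter counts (5, 6 and 9, each including the constant) and the 74 observations of the auto dataset. The chi-square tail function is a hypothetical helper covering only the two degrees of freedom needed here.

```python
import math

# AIC values copied from the estimates table output.
aic = {"model1": 34.411885, "model2": 36.34465, "model3": 35.407487}

# Assumed number of estimated parameters (slopes plus the constant).
k = {"model1": 5, "model2": 6, "model3": 9}

N_OBS = 74  # assumed number of observations in the auto dataset

def deviance(name):
    """-2 * log-likelihood, recovered from AIC = -2*ll + 2*k."""
    return aic[name] - 2 * k[name]

def bic(name):
    """BIC = -2*ll + k*ln(N)."""
    return deviance(name) + k[name] * math.log(N_OBS)

def chi2_sf(x, df):
    """Upper-tail chi-square probability (closed forms for df = 1 and df = 4 only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 4:
        return math.exp(-x / 2.0) * (1.0 + x / 2.0)
    raise ValueError("only df = 1 and df = 4 are implemented in this sketch")

def lr_test(small, large):
    """Likelihood-ratio test of a nested (small) model against a larger one."""
    stat = deviance(small) - deviance(large)
    df = k[large] - k[small]
    return stat, df, chi2_sf(stat, df)

for larger in ("model2", "model3"):
    stat, df, p = lr_test("model1", larger)
    print(f"model1 vs {larger}: LR chi2({df}) = {stat:.2f}, p = {p:.4f}")

for name in aic:
    print(f"{name}: AIC = {aic[name]:.3f}, BIC = {bic(name):.3f}")
```

The recovered values agree with the lrtest and BIC lines above, which shows the four criteria are different penalties applied to the same log-likelihoods.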

Last edited by Marcos Almeida; 05 Aug 2017, 09:47.

Best regards,

Marcos
Jose Alcides Santos

Join Date: Jul 2017

Posts: 29
#12

05 Aug 2017, 12:06

Dear Marcos Almeida, thanks for your comments and suggestions.
Note that in this sample there are 3,164 cases of women who have a job at the top and, among those, 528 cases who are not in good health. In this case, I suppose that the first check proposed by Clyde Schechter would not affect the issue of statistical significance: "Whichever of those groups has the fewest observations is the one that limits the power of the test".
The same problem of statistical significance also exists when using the binary variable: college versus no-college.
Following is an illustration:

Code:

---------------------------------------------------------------------------------
                |             Linearized
  notgoodhealth |      Coef.   Std. Err.      t    P>|t|     [99% Conf. Interval]
----------------+----------------------------------------------------------------
      1.college |  -.8396729   .0695279   -12.08   0.000    -1.018827   -.6605186
          1.fem |   .3776949   .0277782    13.60   0.000     .3061181    .4492717
                |
    college#fem |
           1 1  |  -.0838936   .0905793    -0.93   0.354    -.3172918    .1495046
---------------------------------------------------------------------------------
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#13

06 Aug 2017, 06:07

Lack of statistically significant interactions (plus tiny coefficients) cannot be taken as a curse. Quite on the contrary, generally speaking.

Had the result (without the interaction term) matched the (theoretical) expectations, no further discussion would be needed, I dare say.

Perhaps the unwanted results relate to the way the DV was used or measured, the survey setting parameters, the model on which the study question is grounded, etc.

My impression, specifically in this case: to interact or not to interact, this is surely not the question.

Hopefully that helps.

Best regards,

Marcos