Insignificant interaction term

Lan Chu

Join Date: Aug 2018

Posts: 10
#1

04 Aug 2018, 09:52

Dear Statalist,

I have a question that is not so much about a Stata command as about statistics in general.

I am working on comparing the treatment effect of an intervention on women's empowerment in Uganda and Tanzania. The intervention is exactly the same. In order to do so, I run a regression model in which I include a country dummy variable (1 for Tanzania and 0 for Uganda) and an interaction term between country and treatment in order to capture the heterogeneity of the treatment effect.

The output seems strange to me. Here is the treatment effect in Tanzania (when I run a separate regression for each country):

And below is the treatment effect in Uganda:

From the output I can say that the intervention does not have an impact on the share of time spent on reproductive and productive work in Uganda, but there is a significant impact in Tanzania. When I include the interaction term to see the heterogeneity of the treatment effect, this is what I get:

My question is:

The p-value of the interaction term for productive work is not significant. What can I conclude from this? Does this mean that there is no heterogeneity in the treatment effect between the 2 countries? Looking at the separate models, shouldn't there be a stronger impact in Tanzania than in Uganda?

Thank you very much !

Lan.

Last edited by Lan Chu; 04 Aug 2018, 10:30.
Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#2

04 Aug 2018, 10:27

The difference between a statistically significant effect and a non-statistically significant effect is not, itself, necessarily statistically significant. In fact, the observation that an effect is statistically significant in one subset of the data and not in another is usually quite meaningless altogether. Be that as it may, if you want to make inferences about the difference between Uganda and Tanzania, you must do so based on the interaction terms. Ignore the p-values associated with the effects in Uganda and Tanzania separately. They are a distraction.
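A minimal numeric sketch of the first point above, using made-up coefficients and standard errors (they are not taken from the attached output) and a normal approximation. It treats the two country samples as independent, so the difference in effects has standard error sqrt(se1^2 + se2^2); this is the quantity an interaction term is meant to capture, although a pooled model with other covariates need not reproduce it exactly.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(estimate, se):
    """Two-sided p-value for estimate/se under a normal (z) approximation."""
    z = abs(estimate / se)
    return 2 * (1 - NormalDist().cdf(z))


# Hypothetical treatment effects and standard errors, for illustration only.
b_tz, se_tz = 0.25, 0.10  # stand-in for the Tanzania-only regression
b_ug, se_ug = 0.10, 0.10  # stand-in for the Uganda-only regression

p_tz = two_sided_p(b_tz, se_tz)  # "significant" at the 5% level
p_ug = two_sided_p(b_ug, se_ug)  # "not significant" at the 5% level

# Difference in effects between the two (assumed independent) samples.
diff = b_tz - b_ug
se_diff = sqrt(se_tz**2 + se_ug**2)
p_diff = two_sided_p(diff, se_diff)

print(f"Tanzania:   p = {p_tz:.3f}")
print(f"Uganda:     p = {p_ug:.3f}")
print(f"Difference: p = {p_diff:.3f}")
```

With these invented numbers one effect clears the 5% threshold and the other does not, yet the test of their difference does not come close to significance.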
Lan Chu

Join Date: Aug 2018

Posts: 10
#3

04 Aug 2018, 10:45

Dear Clyde,

Thanks a lot for your reply.

OK, I understand that I need to draw conclusions based on the interaction term. In this case, can I conclude that there is no difference in the treatment effect between the 2 countries in terms of time share over productive work? In the case of reproductive work, the coefficient of the interaction term is negative and significant, meaning that the effect on the time share over reproductive work in Tanzania is smaller than in Uganda.

Is that what I should conclude? And what could be the reason for an insignificant interaction term, assuming that the sample size is big enough?

Thank you so much!

Last edited by Lan Chu; 04 Aug 2018, 10:55.
Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#4

04 Aug 2018, 11:09

In this case, can I conclude that there is no difference in the treatment effect between the 2 countries in terms of time share over productive work?

No, that is not a correct interpretation of the statistical significance. When you have a non-statistically significant result it does not imply that there is "no difference." It implies that the difference is too small relative to the noisiness of the data to conclude whether it is positive, negative, or zero. The difference could be quite large, but your data simply lack the precision to affirm that. Or it could be that there really is no difference. (However, there actually being no difference at all is very unusual in the real world.) So the conclusion would be that the data do not have sufficient information to tell us whether the intervention effects in Uganda and Tanzania are different, nor in which direction the difference goes if there is one. Better data are needed to draw a more specific conclusion.

Better data refers not just to sample size, but to data quality. If the outcome measure is noisy and poorly reproducible, that reduces statistical power. If the outcome measure varies widely among people, that, too, decreases statistical power. So an improved measure of the outcome is often the key. Other approaches to increasing statistical power rely on more precise study designs: for example, matched-pair or within-person designs can greatly reduce outcome variance from extraneous factors. (Evidently a within-person design is not possible when the goal is to compare different countries; I'm just speaking in general terms here.) It is sometimes possible to do stratified sampling in a way that reduces outcome variance (which then requires an analysis that reflects the stratified design).
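To make the trade-off between noise and sample size concrete, here is a rough sketch under strong simplifying assumptions that are not part of the thread: four equally sized cells (treated and control in each of two countries), a common outcome standard deviation, and no covariates. In that idealized setting the interaction is a difference-in-differences of four cell means, so its standard error is sd * sqrt(4 / n_per_cell). The function name and the numbers are hypothetical.

```python
from math import sqrt


def interaction_se(outcome_sd, n_per_cell):
    """SE of a difference-in-differences of four cell means.

    Assumes equal cell sizes, a common outcome SD, and independent
    observations; a real regression with covariates will differ.
    """
    return outcome_sd * sqrt(4 / n_per_cell)


baseline = interaction_se(outcome_sd=1.0, n_per_cell=400)
less_noise = interaction_se(outcome_sd=0.5, n_per_cell=400)   # halve the noise
more_data = interaction_se(outcome_sd=1.0, n_per_cell=1600)   # quadruple the sample

print(f"baseline SE:        {baseline:.3f}")
print(f"half the noise:     {less_noise:.3f}")
print(f"four times the data: {more_data:.3f}")
```

Under these assumptions, halving the outcome's standard deviation buys the same precision as quadrupling the sample, which is why a better outcome measure can matter as much as more observations.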
Lan Chu

Join Date: Aug 2018

Posts: 10
#5

04 Aug 2018, 12:39

Thank you so much for your time and great answer, Clyde!

Regards,

Lan.
Lan Chu

Join Date: Aug 2018

Posts: 10
#6

13 Aug 2018, 04:33

Dear Clyde,

I need to come back to this topic with one further question. If I still want to draw a conclusion about the difference in treatment effect (taking into account that I can do nothing more to improve the data), can I calculate the effect size for the interaction term (based on the regression coefficient and standard error of the interaction term) to draw that conclusion? Perhaps I would see that the effect size is too small (let's say < 0.001) to care about. Does that even make any sense? Thanks a lot!
Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#7

13 Aug 2018, 08:48

Well, it definitely makes a great deal of sense to talk about whether the difference in effect across the countries is large enough to care about. But the statistic you are proposing, coefficient/standard error, is not the right one to do that. And it is not an effect size.

You may be confusing it with Cohen's d, which is the coefficient (or, in this case, difference between coefficients) divided by a standard deviation of the outcome variable. There is some ambiguity about exactly which set of observations to use in calculating the standard deviation for Cohen's d. In the classical version, as presented by Cohen himself, way back when, it was the standard deviation calculated in the control group. Sometimes, however, we see variations of that in which the pooled standard deviation is used. If you are dealing with longitudinal data, it also gets confusing as to whether to look at the between- or within- standard deviation, etc.
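A small sketch of the two denominators mentioned above, using invented data and a hypothetical helper `standardized_difference`. It standardizes a simple difference in group means; applying the same idea to an interaction coefficient would mean dividing that coefficient by the chosen standard deviation. Note that in much of the literature the control-group version is labelled Glass's delta, so it is worth stating which denominator was used.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_difference(treated, control, pooled=False):
    """Mean difference divided by a standard deviation of the outcome.

    pooled=False: divide by the control group's SD.
    pooled=True:  divide by the pooled SD of both groups.
    """
    diff = mean(treated) - mean(control)
    if pooled:
        n_t, n_c = len(treated), len(control)
        pooled_var = (
            (n_t - 1) * stdev(treated) ** 2 + (n_c - 1) * stdev(control) ** 2
        ) / (n_t + n_c - 2)
        sd = sqrt(pooled_var)
    else:
        sd = stdev(control)
    return diff / sd


# Invented outcome values, for illustration only.
treated = [4.0, 6.0, 8.0, 10.0, 12.0]
control = [4.0, 5.0, 6.0, 7.0, 8.0]

d_control = standardized_difference(treated, control)
d_pooled = standardized_difference(treated, control, pooled=True)

print(f"control-group SD: {d_control:.3f}")
print(f"pooled SD:        {d_pooled:.3f}")
```

The two versions diverge whenever the groups have different spreads, as they do in this toy data, where the treated group is more variable and the pooled version is therefore smaller.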
Lan Chu

Join Date: Aug 2018

Posts: 10
#8

13 Aug 2018, 11:21

Yes, you are quite right. I was talking about the effect size Cohen's d. I also use the pooled standard deviation myself when calculating the effect size, but I don't really understand what difference it makes to use the standard deviation of the control group instead.

Anyway, I did a quick search and found that I can use an equivalence test, such as the two one-sided tests (TOST) procedure, with upper and lower equivalence bounds based on the smallest effect size of interest, which I select myself. This test can be used to statistically reject the presence of a large enough difference. Does this also work in my case? And could you please suggest what would be the best value for an effect size that is too small to care about?

Thank you so much! Really appreciate your answers.
Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#9

13 Aug 2018, 12:23

As for re-analyzing the data using an equivalence test, you can do that if you like. I am not familiar with the -tost- package, so I cannot be certain it will do what you want, nor can I help you with it if you encounter difficulties. But from a quick read of its help file it sounds like it might be what you want, with appropriately selected equivalence bounds.
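For readers unfamiliar with the procedure, here is a generic sketch of two one-sided tests under a normal approximation. It is not the -tost- package and makes no claim about how that package works; the function name, the estimates, and the equivalence bounds are all hypothetical.

```python
from statistics import NormalDist


def tost_p_value(estimate, se, lower, upper):
    """TOST p-value under a normal approximation.

    Tests H0: true value <= lower OR true value >= upper.
    Equivalence is claimed only if both one-sided tests reject,
    so the overall p-value is the larger of the two.
    """
    nd = NormalDist()
    p_lower = 1 - nd.cdf((estimate - lower) / se)  # against "value <= lower"
    p_upper = nd.cdf((estimate - upper) / se)      # against "value >= upper"
    return max(p_lower, p_upper)


# Hypothetical interaction coefficient with bounds of +/- 0.10.
p_precise = tost_p_value(estimate=0.02, se=0.03, lower=-0.10, upper=0.10)
p_noisy = tost_p_value(estimate=0.02, se=0.08, lower=-0.10, upper=0.10)

print(f"precise estimate: TOST p = {p_precise:.4f}")
print(f"noisy estimate:   TOST p = {p_noisy:.4f}")
```

The same point estimate supports a claim of equivalence when it is estimated precisely and fails to when it is noisy, and the answer depends entirely on bounds that must be justified on substantive grounds.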

Personally, for this kind of thing I usually take a simple approach. I just look at the upper and lower confidence limits for the interaction coefficient, pick the one that is larger in absolute value, and if that absolute value isn't large enough to care about, I assert that the data do not suggest any materially important difference in effects between the two countries.
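The confidence-limit approach just described can be sketched as follows, with a normal approximation and invented numbers. The helper name and the threshold `smallest_effect_of_interest` are assumptions for illustration; in practice the limits would come straight from the regression output, and the threshold from subject-matter judgment.

```python
from statistics import NormalDist


def worst_case_magnitude(estimate, se, level=0.95):
    """Return (lower, upper, largest absolute confidence limit)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower, upper = estimate - z * se, estimate + z * se
    return lower, upper, max(abs(lower), abs(upper))


# Hypothetical interaction coefficient and standard error.
lower, upper, worst = worst_case_magnitude(estimate=0.02, se=0.03)

# Hypothetical threshold: the smallest difference worth caring about.
smallest_effect_of_interest = 0.10
negligible = worst < smallest_effect_of_interest

print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
print(f"largest plausible magnitude: {worst:.3f}")
print(f"too small to matter at this threshold: {negligible}")
```

If even the most extreme confidence limit falls short of the threshold, the data are compatible only with differences too small to matter; if it does not, the question stays open.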

And could you please suggest what would be the best value for an effect size that is too small to care about?

I'm afraid I can't. That is neither a Stata nor a statistical issue. It is a substantive question in your scientific discipline. If you do not feel confident deciding on your own how much of a difference in your empowerment measure (that is your outcome, correct?) is large enough to care about*, then I suggest you either see what others have done previously in studies using the same outcome measure, or consult with colleagues in your discipline. I haven't worked with this kind of construct in about 3 decades now, and the one empowerment measure I ever did work with was from a study carried out in modern (ca. 1980) US corporate workplaces, so what little experience I have with this would be of no use in your setting.

*This is, of course, a subjective decision. What could be more subjective than what one cares about? Even so, within a scientific discipline, there is often fairly general agreement about what size of difference in effects is large enough to care about. Forming these judgments relies on an understanding of the outcome measures themselves and an appraisal of what different values of the outcome measures imply for study participants' life experiences.
Rich Goldstein

Join Date: Mar 2014

Posts: 4457
#10

13 Aug 2018, 12:44

I just want to second Clyde's suggestion that CIs be used instead of a test. I am not familiar with -tost-, but years ago I wrote my own equivalence test programs (-equim- and -equip- in STB 17; do not download them, as there are bugs which I will not be fixing) and later turned from them to using CIs. I especially like CIs where there is doubt about the appropriate delta.
Lan Chu

Join Date: Aug 2018

Posts: 10
#11

13 Aug 2018, 14:56

Thank you very much Clyde and Rich! I think I know what I should do now !
