
  • Interaction terms (# vs ##) in linear regression and factorial ANOVA

    Hi All,

    I am trying to assess the effect of my IV (h_score) on my DV (lop_score), but I want to see if sex is an effect modifier of this association.

    IV is continuous
    DV is continuous
    Sex (0 = male, 1 = female)

    What I have done so far is the following code:

    regress lop_score h_score
    regress lop_score h_score if sex==0
    regress lop_score h_score if sex==1

    This gives me the crude association between h_score and lop_score, as well as the association within each sex.

    Then I noticed that the slope coefficients differ between the sexes.

    So I then assessed formally for an interaction by running:

    regress lop_score h_score i.sex i.sex#c.h_score

    ------------------------------------------------------------------------------
        lop_score |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------------+---------------------------------------------------------------
          h_score |   .3087371    .074398     4.15   0.000     .1628855    .4545888
              sex |  -.4225724   .1000699    -4.22   0.000    -.6187519   -.2263929
                  |
    sex#c.h_score |
                1 |  -.0682006   .0999594    -0.68   0.495    -.2641637    .1277624
                  |
            _cons |   17.08114   .0748511   228.20   0.000      16.9344    17.22788
    ------------------------------------------------------------------------------



    Then I did a Factorial ANOVA:

    anova lop_score sex##c.h_score


    Number of obs =   5,146    R-squared     = 0.0075
    Root MSE      = 3.45848    Adj R-squared = 0.0069

         Source |  Partial SS      df       MS          F     Prob>F
    ------------+----------------------------------------------------
          Model |   466.29203       3   155.43068    12.99    0.0000
                |
        h_score |   361.16054       1   361.16054    30.19    0.0000
            sex |   213.28778       1   213.28778    17.83    0.0000
    sex#h_score |   5.5680037       1   5.5680037     0.47    0.4951
                |
       Residual |   61503.882   5,142   11.961082
    ------------+----------------------------------------------------
          Total |   61970.174   5,145   12.044737




    I am hoping somebody can clarify for me:
    1. Is running a factorial ANOVA technically the same thing as a linear regression, in terms of the p-value? Interestingly, the p-value for the interaction term's beta coefficient in my linear regression is the same as the Prob>F value for the interaction term in my ANOVA.

    2. What is the difference between # and ## if any?

    3. This is a hybrid interaction, since one term is continuous and the other is categorical; would I interpret this as:
    "for every one-unit increase in h_score, in females, lop_score decreases by 0.068 (95% CI: -.2641637 to .1277624)"?

    4. Is it possible to have an interaction term in a regression that turns out not significant, even when you have run univariable regressions separated by sex and seen that the B coefficients differ from each other?

    Thanks all for any clarification whatsoever; most of this is new to me and I'm trying my best to become as knowledgeable about this as possible.









    Last edited by Alan Jeddi; 06 Jul 2018, 17:28.

  • #2
    1. Yes, they are equivalent.
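    The equivalence can be verified directly on your own data: for a 1-df term, the F statistic in the ANOVA table is the square of the regression t statistic, so the p-values coincide. A sketch, reusing this thread's variable names:

    ```stata
    * The interaction's squared t from -regress- equals the F from -anova-,
    * so the two p-values are identical.
    regress lop_score c.h_score##i.sex
    test 1.sex#c.h_score          // reports F = t^2, same p as Prob>F
    anova lop_score sex##c.h_score
    ```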

    2. a#b causes Stata to include the interaction term between a and b in the model, but it does not include each of a and b separately (so you have to write out a and b separately to have a valid model). a##b causes Stata to include a, and b, and the interaction term.
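    In other words, the following two specifications fit the same model (a sketch using this thread's variables):

    ```stata
    * # gives only the interaction, so main effects must be listed explicitly
    regress lop_score c.h_score i.sex i.sex#c.h_score

    * ## expands to main effects plus interaction -- identical model
    regress lop_score c.h_score##i.sex
    ```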

    3. Well, you don't say whether female is coded 0 or 1. Either way, though, your interpretation is not correct. What you can say is that for sex = 0, a unit difference in h_score is associated with a 0.309 difference in the expected value of lop_score, whereas for sex = 1, a unit difference in h_score is associated with a 0.309 - 0.068 = 0.241 difference in the expected value of lop_score. You can get these numbers more directly and more easily by running -margins sex, dydx(h_score)- after the regression.

    4. Yes, it is possible. It depends on how well your visual perception of the difference between the coefficients aligns with statistical significance. For most people that alignment is not particularly good, so guessing the statistical significance of the difference by looking at the separate outputs is usually a losing game. Then again, in an interaction model, particularly where one of the variables is continuous, the statistical significance of the interaction term is usually unimportant, and often misleading. What really matters is how different the predicted values of the dependent variable are at values of the continuous variable that are important. So, assuming that the most important values of h_score are, for the sake of discussion, 2 through 5, you would be better off looking at
    Code:
    margins sex, at(h_score = (2 3 4 5))
    marginsplot
    and seeing whether the sex = 0 and sex = 1 plots are separated by a meaningful amount.



    • #3
      Originally posted by Clyde Schechter in #2 above.
      Hi Mr. Schechter,
      Thank you for your reply; this is very helpful. Would you know how to interpret a marginsplot?

      Is it the case that if the confidence intervals of the two groups (i.e. sex) overlap, the difference is not significant?

      Here is an example of my output, but I am rather confused. I regressed my DV (iopcc_out) against c.edlevel09 and included an interaction between edlevel09 and sex. The interaction was significant (p = 0.039).

      Then I plotted the margins, and I saw that the confidence intervals overlap. Is this possible while the interaction itself remains significant?





      regress iopcc_out c.edlevel09 sex c.edlevel09#sex

      -------------------------------------------------------------------------------
            iopcc_out |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
      ----------------+--------------------------------------------------------------
            edlevel09 |   .0536598    .071542     0.75   0.453   -.0865931    .1939126
                  sex |   .0066763   .1783541     0.04   0.970   -.3429736    .3563263
      sex#c.edlevel09 |
                    1 |  -.1932287   .0935574    -2.07   0.039   -.3766409   -.0098164
                _cons |   16.90736   .1409995   119.91   0.000    16.63094    17.18378
      -------------------------------------------------------------------------------

      margins, dydx(c.edlevel09) over(sex)

      -------------------------------------------------------------------------------
            edlevel09 |      dy/dx   Std. Err.      t    P>|t|    [95% Conf. Interval]
      ----------------+--------------------------------------------------------------
                  sex |
                    0 |   .0536598    .071542     0.75   0.453   -.0865931    .1939126
                    1 |  -.1395689   .0602886    -2.32   0.021   -.2577603   -.0213775
      -------------------------------------------------------------------------------


      [Attached image: marginsplot of predicted iopcc_out by sex across edlevel09 (Screen Shot 2018-07-07 at 5.26.46 PM.png)]






      • #4
        Yes, it is entirely possible for two things to each be imprecisely estimated from the data, but the difference between them to be precisely estimated. See https://www.cscu.cornell.edu/news/statnews/stnews73.pdf for an explanation and example in a simpler context. The same principles apply to regression slopes.
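        The difference in slopes can also be tested directly, which makes the point concrete: two slope estimates with overlapping confidence intervals can still differ significantly. A sketch with the variable names from this thread:

        ```stata
        * Fit the interaction model, then test the pairwise difference
        * between the sex-specific slopes of edlevel09 directly.
        regress iopcc_out c.edlevel09##i.sex
        margins sex, dydx(edlevel09) pwcompare(effects)
        ```

        The pwcompare(effects) option reports the difference between the two marginal slopes with its own standard error and test, which is the quantity the interaction term is actually about.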

        The margins plot that you did is probably not helpful in any case. It contains no information that isn't directly shown in the -margins- output itself, and using -over(sex)-, while harmless in this very simple model, could give you some very unhelpful statistics (conditional marginal effects, whereas what is usually needed are adjusted marginal effects) if your model included other covariates. It should be -margins sex, dydx(edlevel09)-.

        Let me again emphasize that focusing on statistical significance of an interaction term involving a continuous variable is generally not helpful. Please refer to my advice in numbered paragraph 4 of post #2.
