
  • Interaction terms (# vs ##) in linear regression and factorial ANOVA

    Hi All,

    I am trying to assess the effect of my IV (h_score) on my DV (lop_score), but I want to see if sex is an effect modifier of this association.

    IV is continuous
    DV is continuous
    Sex (0 = male, 1 = female)

    What I have done so far is the following code:

    regress lop_score h_score
    regress lop_score h_score if sex==0
    regress lop_score h_score if sex==1

    This gives me the crude association between h_score and lop_score, as well as the association within each sex.

    Then I noticed that the slope coefficients differ between the sexes.

    So I then assessed formally for an interaction by running:

    regress lop_score h_score i.sex i.sex#c.h_score

    ------------------------------------------------------------------------------
        lop_score |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------------+---------------------------------------------------------------
          h_score |   .3087371    .074398     4.15   0.000     .1628855    .4545888
              sex |  -.4225724   .1000699    -4.22   0.000    -.6187519   -.2263929
                  |
    sex#c.h_score |
                1 |  -.0682006   .0999594    -0.68   0.495    -.2641637    .1277624
                  |
            _cons |   17.08114   .0748511   228.20   0.000      16.9344    17.22788
    ------------------------------------------------------------------------------



    Then I did a Factorial ANOVA:

    anova lop_score sex##c.h_score


    Number of obs =   5,146    R-squared     = 0.0075
    Root MSE      = 3.45848    Adj R-squared = 0.0069

         Source |  Partial SS      df       MS          F     Prob>F
    ------------+----------------------------------------------------
          Model |   466.29203       3   155.43068    12.99    0.0000
                |
        h_score |   361.16054       1   361.16054    30.19    0.0000
            sex |   213.28778       1   213.28778    17.83    0.0000
    sex#h_score |   5.5680037       1   5.5680037     0.47    0.4951
                |
       Residual |   61503.882   5,142   11.961082
    ------------+----------------------------------------------------
          Total |   61970.174   5,145   12.044737




    I am hoping somebody can clarify for me:
    1. Is running a factorial ANOVA technically the same thing as a linear regression, in terms of the p-value? Interestingly, the p-value for the interaction term's beta coefficient in my linear regression is the same as the Prob>F value for the interaction term in my ANOVA.

    2. What is the difference between # and ## if any?

    3. This is a hybrid interaction, since one term is continuous and the other is categorical; would I interpret this as:
    "for every one-unit increase in h_score, in females, lop_score decreases by 0.068 (95% CI: -.2641637 to .1277624)"?

    4. Is it possible to have an interaction term in a regression that turns out not significant, even when you have run univariable regressions separated by sex and seen that the B coefficients differ from each other?

    Thanks all for any clarification whatsoever; most of this is new to me and I'm trying my best to become as knowledgeable about this as possible.









    Last edited by Alan Jeddi; 06 Jul 2018, 17:28.

  • #2
    1. Yes, they are equivalent.
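    The equivalence can be verified directly on your own data: for a 1-df term, the F statistic in the ANOVA table is the square of the regression t statistic, so the p-values coincide. A sketch, reusing this thread's variable names:

    ```stata
    * The interaction's squared t from -regress- equals the F from -anova-,
    * so the two p-values are identical.
    regress lop_score c.h_score##i.sex
    test 1.sex#c.h_score          // reports F = t^2, same p as Prob>F
    anova lop_score sex##c.h_score
    ```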

    2. a#b causes Stata to include the interaction term between a and b in the model, but it does not include each of a and b separately (so you have to write out a and b separately to have a valid model). a##b causes Stata to include a, and b, and the interaction term.
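    In other words, the following two specifications fit the same model (a sketch using this thread's variables):

    ```stata
    * # gives only the interaction, so main effects must be listed explicitly
    regress lop_score c.h_score i.sex i.sex#c.h_score

    * ## expands to main effects plus interaction -- identical model
    regress lop_score c.h_score##i.sex
    ```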

    3. Well, you don't say whether female is coded 0 or 1. Either way, though, your interpretation is not correct. What you can say is that for sex = 0, a unit difference in h_score is associated with a 0.309 difference in the expected value of lop_score, whereas for sex = 1, a unit difference in h_score is associated with a 0.309 - 0.068 = 0.241 difference in the expected value of lop_score. You can get these numbers more directly and more easily by running -margins sex, dydx(h_score)- after the regression.

    4. Yes, it is possible. It depends on how well your visual perception of the difference between the coefficients aligns with statistical significance. For most people that alignment is not particularly good, so guessing the statistical significance of the difference by looking at the separate outputs is usually a losing game. Then again, in an interaction model, particularly where one of the variables is continuous, the statistical significance of the interaction term is usually unimportant, and often misleading. What really matters is how different the predicted values of the dependent variable are at values of the continuous variable that are important. So, assuming that the most important values of h_score are, for the sake of discussion, 2 through 5, you would be better off looking at
    Code:
    margins sex, at(h_score = (2 3 4 5))
    marginsplot
    and seeing whether the sex = 0 and sex = 1 plots are separated by a meaningful amount.



    • #3
      Originally posted by Clyde Schechter in #2 above.
      Hi Mr. Schechter,
      Thank you for your reply; this is very helpful. Would you know how to interpret a marginsplot?

      Is it the case that if the confidence intervals of the two groups (i.e. sex) overlap, the difference is not significant?

      Here is an example of my output, but I am rather confused. I regressed my DV (iopcc_out) against c.edlevel09 and included an interaction between edlevel09 and sex. The interaction was significant (p = 0.039).

      Then I plotted the margins, and I saw that the confidence intervals overlap. Is this possible while the interaction itself remains significant?





      regress iopcc_out c.edlevel09 sex c.edlevel09#sex

      -------------------------------------------------------------------------------
            iopcc_out |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
      ----------------+--------------------------------------------------------------
            edlevel09 |   .0536598    .071542     0.75   0.453   -.0865931    .1939126
                  sex |   .0066763   .1783541     0.04   0.970   -.3429736    .3563263
      sex#c.edlevel09 |
                    1 |  -.1932287   .0935574    -2.07   0.039   -.3766409   -.0098164
                _cons |   16.90736   .1409995   119.91   0.000    16.63094    17.18378
      -------------------------------------------------------------------------------

      margins, dydx(c.edlevel09) over(sex)

      -------------------------------------------------------------------------------
            edlevel09 |      dy/dx   Std. Err.      t    P>|t|    [95% Conf. Interval]
      ----------------+--------------------------------------------------------------
                  sex |
                    0 |   .0536598    .071542     0.75   0.453   -.0865931    .1939126
                    1 |  -.1395689   .0602886    -2.32   0.021   -.2577603   -.0213775
      -------------------------------------------------------------------------------


      [Attached image: marginsplot of predicted iopcc_out by sex across edlevel09 (Screen Shot 2018-07-07 at 5.26.46 PM.png)]






      • #4
        Yes, it is entirely possible for two things to each be imprecisely estimated from the data, but the difference between them to be precisely estimated. See https://www.cscu.cornell.edu/news/statnews/stnews73.pdf for an explanation and example in a simpler context. The same principles apply to regression slopes.
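        The difference in slopes can also be tested directly, which makes the point concrete: two slope estimates with overlapping confidence intervals can still differ significantly. A sketch with the variable names from this thread:

        ```stata
        * Fit the interaction model, then test the pairwise difference
        * between the sex-specific slopes of edlevel09 directly.
        regress iopcc_out c.edlevel09##i.sex
        margins sex, dydx(edlevel09) pwcompare(effects)
        ```

        The pwcompare(effects) option reports the difference between the two marginal slopes with its own standard error and test, which is the quantity the interaction term is actually about.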

        The margins plot that you did is probably not helpful in any case. It contains no information that isn't directly shown in the -margins- output itself, and using -over(sex)-, while harmless in this very simple model, could give you some very unhelpful statistics (conditional marginal effects, whereas what is usually needed are adjusted marginal effects) if your model included other covariates. It should be -margins sex, dydx(edlevel09)-.

        Let me again emphasize that focusing on statistical significance of an interaction term involving a continuous variable is generally not helpful. Please refer to my advice in numbered paragraph 4 of post #2.
