
  • Difference between # and ## with two binary variables

    Hi,

    I was experimenting with interactions between binary variables and got an unexpected result when comparing # and ##. To preface, I understand the difference between # and ## when interacting a categorical/binary variable with a continuous one; it is only the case of two binary variables that confuses me.

    In the code below, I run two regressions, and the coefficient on the interaction term differs between them. As I read them, both regressions include high_mpg and domestic on their own, and the corresponding coefficients are identical, which seems to support that. Yet the interaction coefficient differs. Why is it different?


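    sysuse auto, clear

    * Create 0/1 indicators for each level of foreign; foreign_1 marks domestic cars
    tab foreign, g(foreign_)

    ren foreign_1 domestic

    * Split cars at 20 mpg (low_mpg is created but not used below)
    gen low_mpg = 0
    replace low_mpg = 1 if mpg <= 20
    gen high_mpg = 0
    replace high_mpg = 1 if mpg > 20

    * Interaction only (#) vs. full factorial (##)
    reg price i.high_mpg#i.domestic gear_ratio
    reg price i.high_mpg##i.domestic gear_ratio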



    Results for single #

    reg price high_mpg#domestic gear_ratio

    -------------------------------------------------------------------------------------
                 price | Coefficient  Std. err.        t    P>|t|    [95% conf. interval]
    -------------------+-----------------------------------------------------------------
     high_mpg#domestic |
                   0 1 |   -4427.653   1330.882    -3.33    0.001    -7082.69   -1772.617
                   1 0 |   -2053.396   1393.149    -1.47    0.145   -4832.652    725.8594
                   1 1 |   -4914.193   1322.372    -3.72    0.000   -7552.253   -2276.133
                       |
            gear_ratio |   -3862.311   1040.507    -3.71    0.000   -5938.065   -1786.557
                 _cons |    21517.58   3500.157     6.15    0.000    14534.95     28500.2
    -------------------------------------------------------------------------------------


    Results for double ##

    reg price high_mpg##domestic gear_ratio
    -------------------------------------------------------------------------------------
                 price | Coefficient  Std. err.        t    P>|t|    [95% conf. interval]
    -------------------+-----------------------------------------------------------------
            1.high_mpg |   -2053.396   1393.149    -1.47    0.145   -4832.652    725.8594
            1.domestic |   -4427.653   1330.882    -3.33    0.001    -7082.69   -1772.617
                       |
     high_mpg#domestic |
                   1 1 |    1566.856   1528.869     1.02    0.309   -1483.154    4616.866
                       |
            gear_ratio |   -3862.311   1040.507    -3.71    0.000   -5938.065   -1786.557
                 _cons |    21517.58   3500.157     6.15    0.000    14534.95     28500.2
    -------------------------------------------------------------------------------------

  • #2
    The interpretation is different.

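    In the first regression, each coefficient is a difference from the base cell 0 0 (low mpg, foreign), holding gear_ratio constant. So the 1 1 coefficient is the difference between high-mpg domestic cars and low-mpg foreign cars. The 1 0 and 0 1 coefficients are also differences from 0 0, which is why they match the 1.high_mpg and 1.domestic coefficients in the second regression.
    In the second regression, the interaction coefficient is a difference in differences, (1 1 - 1 0) - (0 1 - 0 0): how much the effect of being domestic rather than foreign differs between high-mpg and low-mpg cars or, equivalently, how much the effect of high rather than low mpg differs between domestic and foreign cars.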
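Writing both models out makes the mapping between the two sets of coefficients explicit. In this sketch, which uses my own notation rather than anything Stata prints, \mu_{hd} is the expected price for high_mpg = h and domestic = d at a fixed gear_ratio, C_{hd} are the cell indicators that # creates, and H and D stand for high_mpg and domestic.

```latex
\begin{aligned}
\text{single \#:}\quad & E[\text{price}] = \alpha + \gamma_{01}C_{01} + \gamma_{10}C_{10} + \gamma_{11}C_{11} + \delta\,\text{gear\_ratio},
  \qquad \gamma_{hd} = \mu_{hd} - \mu_{00} \\
\text{double \#\#:}\quad & E[\text{price}] = \alpha + \beta_H H + \beta_D D + \beta_{HD}\,H D + \delta\,\text{gear\_ratio} \\
& \beta_H = \gamma_{10}, \qquad \beta_D = \gamma_{01}, \qquad
  \beta_{HD} = \gamma_{11} - \gamma_{10} - \gamma_{01} = (\mu_{11} - \mu_{10}) - (\mu_{01} - \mu_{00}) \\
& 1566.856 = -4914.193 - (-2053.396) - (-4427.653)
\end{aligned}
```

The last line plugs in the estimates from the first post: the ## interaction is exactly the single-# 1 1 coefficient minus the 1 0 and 0 1 coefficients, while the constant and the gear_ratio coefficient are unchanged.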

    You can confirm that both models give identical predictions, and hence the same predictive margins for the four cells, by running
    Code:
    margins high_mpg#domestic
    marginsplot
    after each regression.
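Another way to see that the two regressions are the same model is to rebuild each one's coefficients from the other with lincom. This is an untested sketch: it assumes the variables from the first post are still in memory and that the factor-variable coefficients can be referenced with _b[] as written.

```stata
* Single-# model: the ## interaction is the difference in differences
* (1 1 - 1 0) - (0 1 - 0 0), where each cell is measured relative to 0 0
reg price i.high_mpg#i.domestic gear_ratio
lincom _b[1.high_mpg#1.domestic] - _b[1.high_mpg#0.domestic] - _b[0.high_mpg#1.domestic]

* ## model: the 1 1 cell of the single-# model is the sum of the two
* main effects and the interaction
reg price i.high_mpg##i.domestic gear_ratio
lincom _b[1.high_mpg] + _b[1.domestic] + _b[1.high_mpg#1.domestic]
```

With the estimates in the first post, the first lincom should give 1566.856 and the second -4914.193, with standard errors matching the corresponding rows of the other table, since both are exact linear combinations within the same fitted model.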
    Last edited by Hemanshu Kumar; 01 Sep 2022, 10:39.



    • #3
      Great, that makes sense, thanks so much!
