  • Changed coding of binary variable in interaction

    Dear Statalist,


    I calculate the following regression:

    Code:
    reg TOT sex##altKi
    TOT is continuous, altKi is binary.

    Then I calculate the same regression again with:
    Code:
    reg TOT sex##neuKi
    The coding of altKi is 1 and 2
    The coding of neuKi is 0 and 1.


    Of course the direction of the effects changes, because the variable is coded the other way around.
    But what I do not get is that the main effect of sex also changes.

    I attach the output.

    Thank you for your help!

  • #2
    The coding of altKi is 1 and 2
    The coding of neuKi is 0 and 1.
    The numeric values used to code the indicator do not by themselves affect the coefficient estimates, as long as the same category remains the base level. Still, 0/1 coding is preferable, with the variable named after the category coded 1. This makes reading and interpreting the results easier.
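
    As a hedged aside (not part of the original reply): factor-variable notation takes the lowest value as the base by default, so 1 for altKi and 0 for neuKi. A minimal sketch for checking how the two codings line up and for setting the base explicitly, assuming altKi takes the values 1 and 2 as stated above:

    Code:
    * cross-tabulate the two codings to see which altKi value maps to which neuKi value
    tabulate altKi neuKi
    
    * same model, with each altKi level as the base in turn
    reg TOT i.sex##ib1.altKi
    reg TOT i.sex##ib2.altKi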

    Of course the direction of the effects changes, because the variable is coded the other way around.
    But what I do not get is that the main effect of sex also changes.
    The "main effect" represents different things in these two contexts. It is the difference between total for males and females when the indicator is at its set base level. If you change the base, the difference thus changes. Consider:

    Code:
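    * load the example data and create a random binary variable coded 1/2
    sysuse auto, clear
    set seed 09212023
    gen newvar= runiformint(1,2)
    
    * base level of newvar is 1: the foreign coefficient is the difference at newvar==1
    regress price foreign##ib1.newvar
    sum price if foreign & newvar==1
    sum price if !foreign & newvar==1
    * foreign minus domestic mean price at newvar==1 (means copied from the summarize output)
    display 7499.667-5823.607
    
    * base level of newvar is 2: the foreign coefficient is the difference at newvar==2
    regress price foreign##ib2.newvar
    sum price if foreign & newvar==2
    sum price if !foreign & newvar==2
    * foreign minus domestic mean price at newvar==2 (means copied from the summarize output)
    display 5612.769-6362.708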
    Results:

    Code:
    . regress price foreign##ib1.newvar
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(3, 70)        =      0.92
           Model |  24197976.2         3  8065992.06   Prob > F        =    0.4337
        Residual |   610867420        70  8726677.43   R-squared       =    0.0381
    -------------+----------------------------------   Adj R-squared   =   -0.0031
           Total |   635065396        73  8699525.97   Root MSE        =    2954.1
    
    --------------------------------------------------------------------------------
             price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    ---------------+----------------------------------------------------------------
           foreign |
          Foreign  |    1676.06   1131.944     1.48   0.143    -581.5322    3933.651
          2.newvar |   539.1012   821.7534     0.66   0.514    -1099.834    2178.037
                   |
    foreign#newvar |
        Foreign#2  |  -2425.999   1521.904    -1.59   0.115    -5461.341    609.3435
                   |
             _cons |   5823.607   558.2715    10.43   0.000      4710.17    6937.045
    --------------------------------------------------------------------------------
    
    .
    . sum price if foreign & newvar==1
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
           price |          9    7499.667    3197.297       4499      12990
    
    .
    . sum price if !foreign & newvar==1
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
           price |         28    5823.607    3092.334       3291      15906
    
    .
    . display 7499.667-5823.607
    1676.06
    
    .
    .
    .
    . regress price foreign##ib2.newvar
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(3, 70)        =      0.92
           Model |  24197976.2         3  8065992.06   Prob > F        =    0.4337
        Residual |   610867420        70  8726677.43   R-squared       =    0.0381
    -------------+----------------------------------   Adj R-squared   =   -0.0031
           Total |   635065396        73  8699525.97   Root MSE        =    2954.1
    
    --------------------------------------------------------------------------------
             price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    ---------------+----------------------------------------------------------------
           foreign |
          Foreign  |  -749.9391   1017.298    -0.74   0.463    -2778.875    1278.997
          1.newvar |  -539.1012   821.7534    -0.66   0.514    -2178.037    1099.834
                   |
    foreign#newvar |
        Foreign#1  |   2425.999   1521.904     1.59   0.115    -609.3435    5461.341
                   |
             _cons |   6362.708   603.0021    10.55   0.000     5160.059    7565.358
    --------------------------------------------------------------------------------
    
    . 
    . sum price if foreign & newvar==2
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
           price |         13    5612.769    1907.153       3748       9735
    
    . 
    . sum price if !foreign & newvar==2
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
           price |         24    6362.708     3143.32       3667      13594
    
    . 
    . display  5612.769-6362.708
    -749.939
    
    . 
    
    
    .
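
    The second "main effect" can also be recovered from the first model without refitting, as the sum of the foreign coefficient and the interaction term (1676.06 + (-2425.999) = -749.939). A sketch using lincom, with coefficient names that assume foreign is coded 0/1 as in the auto data:

    Code:
    regress price foreign##ib1.newvar
    * foreign-minus-domestic difference in price at newvar==2
    lincom 1.foreign + 1.foreign#2.newvar
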
    More importantly, in the presence of an interaction, you want to use margins to ease the interpretation of the results. Trying to interpret the coefficients on the variables involved in the interaction in isolation is not particularly useful.

    Code:
    reg TOT i.sex##i.altKi
    margins i.sex##i.altKi
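
    To see the two sex differences side by side from a single fitted model, one possible follow-up (a sketch, not from the original post) is:

    Code:
    reg TOT i.sex##i.altKi
    * difference in predicted TOT between the sexes, separately at each level of altKi
    margins altKi, dydx(sex)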
    Last edited by Andrew Musau; 21 Sep 2023, 02:55.



    • #3
      Thank you for your detailed answer. I think I got part of it, but not the full point. You write:
      The "main effect" represents different things in these two contexts. It is the difference between total for males and females when the indicator is at its set base level. If you change the base, the difference thus changes. Consider:
      So in one model the base is 0 and in the other the base is 1. But the point is not the change in values from 0 to 1, but that in one case the base is children and in the other the base is teens? Is that correct? And because the effect of sex differs between children and teens, the main effect of sex differs too?

      Or do I get it wrong here?



      • #4
        In #2, notice that I put "main effect" in quotation marks because that is how you refer to it. As I understand it, the sample contains only kids and teens, so "altKi" and "neuKi" are binary variables in which one level is teens and the other is kids. Changing the base from 0 to 1 therefore changes the base from teens to kids, assuming kids is coded 1. With base 0, the coefficient on sex is the difference in "TOT" between teenage boys and girls; with base 1, it is the difference in "TOT" between young boys and girls (young here meaning kids). These two differences are distinct. If you use margins as I suggested, you will see this. If you want only the effect of gender:

        Code:
        reg TOT i.sex##i.altKi
        margins sex
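
        Note that margins sex reports the predicted "TOT" for each sex, averaged over the sample's observed mix of kids and teens. A sketch for obtaining the difference between the sexes directly, under the same assumptions about the variables:

        Code:
        reg TOT i.sex##i.altKi
        * average difference in predicted TOT between the sexes, averaged over the observed distribution of altKi
        margins, dydx(sex)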
