
  • Multiple omitted categories when interacting?

    Possibly a rudimentary question, but I am unsure why my interaction is behaving this way. When I use the following code,
    Code:
     logit vaccinated male age ib0.race_rc native ib1.marstat numchild h14hhres proxy  work insurance  ib1.wealthquint1 ib0.cenreg masks_toomuch malechild  reduccat##keducat, or allbaselevels
    I receive the following output
    Code:
    . logit vaccinated male age ib0.race_rc native ib1.marstat numchild h14hhres proxy  work insurance  ib1.wealthquint1 ib0.cenreg masks_toomuch malechild  reduccat##keducat, or allbaselevels
    
    Iteration 0:   log likelihood = -1695.4695  
    Iteration 1:   log likelihood = -1512.5555  
    Iteration 2:   log likelihood = -1487.5511  
    Iteration 3:   log likelihood =   -1487.39  
    Iteration 4:   log likelihood =   -1487.39  
    
    Logistic regression                                     Number of obs =  4,225
                                                            LR chi2(31)   = 416.16
                                                            Prob > chi2   = 0.0000
    Log likelihood = -1487.39                               Pseudo R2     = 0.1227
    
    ------------------------------------------------------------------------------------
            vaccinated | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
    -------------------+----------------------------------------------------------------
                  male |   1.194619   .1258765     1.69   0.091     .9717147    1.468657
                   age |   1.032606   .0068076     4.87   0.000     1.019349    1.046035
                       |
               race_rc |
             NH-White  |          1  (base)
             NH-Black  |   1.482211   .2115324     2.76   0.006     1.120551    1.960598
             NH-Other  |   .6913389   .1741085    -1.47   0.143     .4220095    1.132556
             Hispanic  |   2.474591   .5055626     4.43   0.000     1.658065     3.69322
                       |
                native |   .6683936   .1335477    -2.02   0.044      .451813    .9887939
                       |
               marstat |
              Married  |          1  (base)
          Sep/Divorce  |   .6733267   .0915843    -2.91   0.004     .5157598     .879031
              Widowed  |   .4842703   .0656398    -5.35   0.000     .3712896    .6316301
        Never-married  |   .8865486   .2507104    -0.43   0.670      .509317    1.543181
                       |
              numchild |   .9417637   .0249689    -2.26   0.024     .8940753    .9919957
              h14hhres |   .9002094   .0360627    -2.62   0.009     .8322315    .9737398
                 proxy |   .2817702    .082959    -4.30   0.000     .1582282    .5017721
                  work |   .9485741   .1070232    -0.47   0.640     .7603861    1.183337
             insurance |   1.623376   .3575462     2.20   0.028     1.054249    2.499741
                       |
          wealthquint1 |
                    1  |          1  (base)
                    2  |   1.166197   .1657533     1.08   0.279     .8826515    1.540829
                    3  |   1.137073   .1702812     0.86   0.391     .8478471    1.524963
                    4  |   1.289984   .2071102     1.59   0.113       .94172    1.767042
                    5  |   2.729461   .5490092     4.99   0.000     1.840198    4.048454
                       |
                cenreg |
                   NE  |          1  (base)
                   MW  |   .6457031   .1185107    -2.38   0.017     .4506149    .9252525
                South  |   .4683402    .080053    -4.44   0.000     .3350166    .6547215
                 West  |   .5105215   .0963674    -3.56   0.000     .3526461     .739076
                       |
         masks_toomuch |   .2888079    .030219   -11.87   0.000     .2352583    .3545465
             malechild |   .9438558   .0907005    -0.60   0.548     .7818237    1.139469
                       |
              reduccat |
                   HS  |          1  (base)
             Some Col  |   1.199936   .3153607     0.69   0.488     .7168853    2.008474
                 Col+  |   1.658133    .775249     1.08   0.279     .6632017    4.145653
                       |
               keducat |
                   HS  |          1  (base)
             Some Col  |   1.021394   .1725488     0.13   0.900     .7334933    1.422299
                 Col+  |    1.68241    .279711     3.13   0.002     1.214547    2.330501
                       |
      reduccat#keducat |
                HS#HS  |          1  (base)
          HS#Some Col  |          1  (base)
              HS#Col+  |          1  (base)
          Some Col#HS  |          1  (base)
    Some Col#Some Col  |   .8684592   .2829902    -0.43   0.665     .4585458    1.644812
        Some Col#Col+  |   .8416305   .2599964    -0.56   0.577      .459376    1.541965
              Col+#HS  |          1  (base)
        Col+#Some Col  |   .8178248   .4299102    -0.38   0.702     .2918802    2.291479
            Col+#Col+  |   .7731643    .382879    -0.52   0.603     .2929193    2.040778
                       |
                 _cons |   1.207807   .7189185     0.32   0.751     .3761315    3.878422
    ------------------------------------------------------------------------------------
    Note: _cons estimates baseline odds.
    I have multiple omitted categories. I am attempting to interact respondent's education (HS, some college, college+) with their child's education (HS, some college, college+). How do I interpret my coefficients? Is plotting necessary, and if so, should it be a margins plot?

    Part of my confusion is that when I run the code with only the interaction and no main effects, I receive output that I would expect. What is the correct approach and why is there a difference?

    Code:
    logit vaccinated male age ib0.race_rc native ib1.marstat numchild h14hhres proxy  work insurance  ib1.wealthquint1 ib0.cenreg masks_toomuch malechild  reduccat#keducat, or allbaselevels
    Code:
    . logit vaccinated male age ib0.race_rc native ib1.marstat numchild h14hhres proxy  work insurance  ib1.wealthquint1 ib0.cenreg masks_toomuch malechild  reduccat#keducat, or allbaselevels
    
    Iteration 0:   log likelihood = -1695.4695  
    Iteration 1:   log likelihood = -1512.5555  
    Iteration 2:   log likelihood = -1487.5511  
    Iteration 3:   log likelihood =   -1487.39  
    Iteration 4:   log likelihood =   -1487.39  
    
    Logistic regression                                     Number of obs =  4,225
                                                            LR chi2(31)   = 416.16
                                                            Prob > chi2   = 0.0000
    Log likelihood = -1487.39                               Pseudo R2     = 0.1227
    
    ------------------------------------------------------------------------------------
            vaccinated | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
    -------------------+----------------------------------------------------------------
                  male |   1.194619   .1258765     1.69   0.091     .9717147    1.468657
                   age |   1.032606   .0068076     4.87   0.000     1.019349    1.046035
                       |
               race_rc |
             NH-White  |          1  (base)
             NH-Black  |   1.482211   .2115324     2.76   0.006     1.120551    1.960598
             NH-Other  |   .6913389   .1741085    -1.47   0.143     .4220095    1.132556
             Hispanic  |   2.474591   .5055626     4.43   0.000     1.658065     3.69322
                       |
                native |   .6683936   .1335477    -2.02   0.044      .451813    .9887939
                       |
               marstat |
              Married  |          1  (base)
          Sep/Divorce  |   .6733267   .0915843    -2.91   0.004     .5157598     .879031
              Widowed  |   .4842703   .0656398    -5.35   0.000     .3712896    .6316301
        Never-married  |   .8865486   .2507104    -0.43   0.670      .509317    1.543181
                       |
              numchild |   .9417637   .0249689    -2.26   0.024     .8940753    .9919957
              h14hhres |   .9002094   .0360627    -2.62   0.009     .8322315    .9737398
                 proxy |   .2817702    .082959    -4.30   0.000     .1582282    .5017721
                  work |   .9485741   .1070232    -0.47   0.640     .7603861    1.183337
             insurance |   1.623376   .3575462     2.20   0.028     1.054249    2.499741
                       |
          wealthquint1 |
                    1  |          1  (base)
                    2  |   1.166197   .1657533     1.08   0.279     .8826515    1.540829
                    3  |   1.137073   .1702812     0.86   0.391     .8478471    1.524963
                    4  |   1.289984   .2071102     1.59   0.113       .94172    1.767042
                    5  |   2.729461   .5490092     4.99   0.000     1.840198    4.048454
                       |
                cenreg |
                   NE  |          1  (base)
                   MW  |   .6457031   .1185107    -2.38   0.017     .4506149    .9252525
                South  |   .4683402    .080053    -4.44   0.000     .3350166    .6547215
                 West  |   .5105215   .0963674    -3.56   0.000     .3526461     .739076
                       |
         masks_toomuch |   .2888079    .030219   -11.87   0.000     .2352583    .3545465
             malechild |   .9438558   .0907005    -0.60   0.548     .7818237    1.139469
                       |
      reduccat#keducat |
                HS#HS  |          1  (base)
          HS#Some Col  |   1.021394   .1725488     0.13   0.900     .7334933    1.422299
              HS#Col+  |    1.68241    .279711     3.13   0.002     1.214547    2.330501
          Some Col#HS  |   1.199936   .3153607     0.69   0.488     .7168853    2.008474
    Some Col#Some Col  |    1.06439   .2106292     0.32   0.753     .7222025     1.56871
        Some Col#Col+  |   1.699069   .3044057     2.96   0.003     1.195941    2.413863
              Col+#HS  |   1.658133    .775249     1.08   0.279     .6632017    4.145653
        Col+#Some Col  |   1.385074   .3465041     1.30   0.193     .8482579    2.261612
            Col+#Col+  |   2.156864   .4052279     4.09   0.000     1.492453     3.11706
                       |
                 _cons |   1.207807   .7189185     0.32   0.751     .3761315    3.878422
    ------------------------------------------------------------------------------------
    Note: _cons estimates baseline odds.


    Example data

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(vaccinated male) float(age race_rc) double(native marstat) byte(numchild h14hhres) double proxy byte work float insurance byte(wealthquint1 cenreg masks_toomuch malechild)
    1 0 75 0 1 2 2 2 0 0 1 5 0 0 1
    1 0 74 0 1 2 1 1 0 0 1 5 0 0 0
    1 1 85 0 1 1 2 2 0 0 1 5 0 0 0
    1 0 78 0 1 1 2 2 0 0 1 5 0 0 0
    1 0 86 0 1 1 3 2 0 0 1 5 0 0 1
    1 0 86 1 1 4 2 1 0 0 1 1 0 0 1
    0 0 82 0 0 3 3 5 0 0 1 4 0 0 0
    1 0 83 2 0 1 3 2 0 0 1 4 3 1 0
    1 1 81 2 0 1 3 2 0 0 1 4 3 0 0
    1 1 81 0 1 1 1 2 0 0 1 4 3 0 1
    1 0 81 0 1 1 1 2 0 0 1 4 3 0 1
    1 0 81 1 0 3 4 1 0 0 1 3 0 0 1
    1 0 74 1 1 3 5 4 0 1 1 1 2 0 1
    1 1 83 3 1 1 . 2 0 0 1 4 3 0 1
    1 0 72 0 1 1 . 2 0 0 1 4 3 0 1
    1 1 85 0 1 1 4 6 0 0 1 4 3 1 0
    1 0 80 0 1 1 4 6 0 1 1 4 3 0 0
    1 1 89 0 0 1 5 2 0 0 1 5 3 1 1
    0 0 82 1 1 4 1 1 0 1 1 1 3 0 0
    1 0 82 0 1 3 3 2 0 0 1 3 2 0 1
    1 0 81 0 0 2 2 1 0 0 1 3 1 0 0
    1 0 87 0 1 3 4 1 0 0 1 5 1 0 1
    0 0 84 0 1 . . . . . . . . 0 0
    1 1 81 0 1 3 5 2 0 1 1 5 1 0 1
    0 0 91 0 1 3 3 5 1 0 1 4 1 0 0
    1 0 90 0 1 3 4 2 0 0 1 3 1 0 1
    1 1 87 1 1 1 4 2 0 1 1 3 3 0 1
    1 0 65 1 1 1 . . . . . . . 0 1
    1 1 84 0 1 1 3 2 0 0 1 2 2 0 1
    1 0 80 0 1 3 5 1 0 0 1 1 2 0 0
    1 0 89 0 1 1 4 2 0 0 1 4 1 0 1
    1 0 85 1 1 3 8 1 0 0 1 1 1 0 0
    1 0 82 1 0 1 1 3 0 0 1 2 1 0 0
    1 1 85 2 1 1 1 3 0 1 . 2 1 0 0
    1 0 80 0 1 1 5 2 0 0 1 2 1 1 0
    1 1 84 0 1 1 2 2 0 0 1 3 1 0 0
    1 0 78 0 1 1 2 2 0 0 1 3 1 0 0
    1 0 89 0 1 3 4 3 0 0 1 3 1 0 1
    1 0 80 0 1 2 1 1 0 0 1 3 1 1 0
    1 0 90 1 1 3 2 1 0 1 1 3 1 0 0
    1 0 76 1 1 3 3 1 0 0 1 1 1 0 0
    1 0 85 1 1 3 4 1 0 0 1 1 1 0 1
    1 0 83 1 1 3 2 1 0 0 1 1 1 0 0
    1 0 68 0 1 3 1 1 0 0 1 5 3 0 1
    1 1 84 0 1 1 2 2 0 0 . 4 3 . 1
    1 0 78 0 1 1 2 2 0 0 1 4 3 0 1
    0 0 84 0 1 3 7 2 0 0 1 5 3 1 1
    1 0 90 0 1 1 2 2 0 0 1 2 3 0 1
    1 1 90 0 1 1 2 2 0 0 1 2 3 0 1
    0 0 66 1 1 3 . 1 0 0 1 1 3 0 1
    1 1 81 1 1 1 4 2 0 0 1 4 3 0 1
    1 0 71 1 1 1 4 2 0 0 1 4 3 0 1
    1 0 65 1 1 2 2 1 0 1 1 1 2 0 1
    1 0 86 1 1 2 3 2 0 0 0 3 2 0 0
    1 0 86 0 1 3 3 1 0 0 1 3 2 0 0
    1 0 80 1 1 2 3 2 0 1 1 3 2 0 1
    1 0 90 0 1 1 2 2 0 0 1 3 2 0 1
    1 1 82 1 1 2 2 1 0 0 1 4 2 0 0
    1 0 83 0 1 3 3 3 0 0 1 3 2 0 0
    0 0 74 0 1 2 2 3 0 0 1 5 2 1 0
    1 1 87 0 1 1 3 2 0 0 1 3 2 1 1
    1 0 75 0 1 1 3 2 0 0 1 3 2 0 1
    1 1 80 1 1 3 1 1 0 0 1 2 2 0 1
    1 0 72 1 1 3 2 2 0 0 1 2 2 0 1
    1 0 80 0 1 . . . . . . . . 1 0
    1 0 79 1 1 3 3 1 0 0 1 3 2 0 0
    1 0 77 0 1 1 2 2 0 0 1 5 0 0 1
    1 0 80 0 1 2 2 1 0 0 1 5 0 0 0
    0 0 95 0 1 3 3 1 0 0 1 4 0 0 0
    1 0 75 3 0 3 5 1 0 0 1 1 0 0 0
    1 0 84 0 1 3 1 1 0 1 1 4 0 0 1
    1 1 82 3 0 1 1 2 0 0 1 3 2 0 0
    1 0 84 3 0 1 1 2 0 0 1 3 2 0 0
    1 1 90 0 1 3 4 6 0 0 1 4 3 0 1
    1 0 86 0 1 3 7 1 0 0 1 5 1 0 0
    1 0 78 0 1 . . . . . . . . 0 1
    1 0 77 0 1 3 4 2 0 0 1 5 1 0 1
    0 1 87 0 1 1 3 2 0 0 1 2 1 0 1
    0 0 87 0 1 1 3 2 0 0 1 2 1 0 1
    1 1 82 0 1 1 2 2 0 0 1 4 0 0 1
    1 0 80 0 1 1 2 2 0 0 1 4 0 . 1
    1 1 87 0 1 1 3 2 0 0 1 4 0 0 1
    1 0 86 0 1 1 3 2 0 0 1 4 0 0 1
    1 1 80 0 1 1 2 2 0 1 1 4 0 0 1
    1 0 77 0 1 1 2 2 0 0 1 4 0 0 1
    1 1 92 0 1 2 3 1 0 0 1 2 2 0 0
    1 0 87 0 1 3 2 2 0 0 1 3 0 0 1
    1 0 82 0 1 3 4 1 0 0 1 2 0 0 1
    1 1 80 0 1 1 4 2 0 0 1 4 0 0 1
    1 0 79 0 1 1 4 2 0 0 1 4 0 1 1
    1 1 82 0 1 1 2 2 0 0 1 5 2 0 0
    1 0 77 0 1 1 2 2 0 1 1 5 2 0 0
    1 0 93 1 1 . . . . . . . . 1 1
    1 0 82 1 1 3 6 4 0 0 1 2 2 0 1
    1 1 84 0 1 3 3 1 0 0 1 2 2 0 0
    1 0 76 0 1 3 3 2 0 0 1 5 0 0 1
    1 0 80 0 1 2 2 3 0 0 1 2 2 0 0
    1 1 90 1 1 . . . . . . . . 1 1
    1 1 83 0 1 1 3 2 0 0 1 5 1 0 0
    1 0 82 0 1 1 3 2 0 0 1 5 1 0 0
    end
    label values race_rc race
    label def race 0 "NH-White", modify
    label def race 1 "NH-Black", modify
    label def race 2 "NH-Other", modify
    label def race 3 "Hispanic", modify
    label values marstat mar
    label def mar 1 "Married", modify
    label def mar 2 "Sep/Divorce", modify
    label def mar 3 "Widowed", modify
    label def mar 4 "Never-married", modify
    label values cenreg cen
    label def cen 0 "NE", modify
    label def cen 1 "MW", modify
    label def cen 2 "South", modify
    label def cen 3 "West", modify


  • #2
    They are not "omitted": their combinations are already expressed by the separate main-effect ORs (notice that all of the base combinations have HS in them). It is just an illusion created by how you request the base levels. Try these three models; that should clear up your confusion:

    Code:
    sysuse nlsw88, clear
    logit married i.race##i.collgrad, nolog or
    logit married i.race##i.collgrad, nolog base or
    logit married i.race##i.collgrad, nolog allbaselevel or
    Results:

    Code:
    . logit married i.race##i.collgrad, nolog or
    
    Logistic regression                                     Number of obs =  2,246
                                                            LR chi2(5)    =  99.61
                                                            Prob > chi2   = 0.0000
    Log likelihood = -1415.1288                             Pseudo R2     = 0.0340
    
    -------------------------------------------------------------------------------------
                married | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
    --------------------+----------------------------------------------------------------
                   race |
                 Black  |   .3603538   .0400423    -9.19   0.000     .2898305    .4480373
                 Other  |   .9883991   .5297951    -0.02   0.983     .3456821    2.826101
                        |
               collgrad |
          College grad  |   .8985446   .1101404    -0.87   0.383     .7066468    1.142554
                        |
          race#collgrad |
    Black#College grad  |   1.199904   .2994077     0.73   0.465      .735782    1.956788
    Other#College grad  |   .9274257   .8286632    -0.08   0.933     .1609618    5.343617
                        |
                  _cons |   2.428169   .1531286    14.07   0.000     2.145849    2.747632
    -------------------------------------------------------------------------------------
    Note: _cons estimates baseline odds.
    
    . logit married i.race##i.collgrad, nolog base or
    
    Logistic regression                                     Number of obs =  2,246
                                                            LR chi2(5)    =  99.61
                                                            Prob > chi2   = 0.0000
    Log likelihood = -1415.1288                             Pseudo R2     = 0.0340
    
    -------------------------------------------------------------------------------------
                married | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
    --------------------+----------------------------------------------------------------
                   race |
                 White  |          1  (base)
                 Black  |   .3603538   .0400423    -9.19   0.000     .2898305    .4480373
                 Other  |   .9883991   .5297951    -0.02   0.983     .3456821    2.826101
                        |
               collgrad |
      Not college grad  |          1  (base)
          College grad  |   .8985446   .1101404    -0.87   0.383     .7066468    1.142554
                        |
          race#collgrad |
    Black#College grad  |   1.199904   .2994077     0.73   0.465      .735782    1.956788
    Other#College grad  |   .9274257   .8286632    -0.08   0.933     .1609618    5.343617
                        |
                  _cons |   2.428169   .1531286    14.07   0.000     2.145849    2.747632
    -------------------------------------------------------------------------------------
    Note: _cons estimates baseline odds.
    
    . logit married i.race##i.collgrad, nolog allbaselevel or
    
    Logistic regression                                     Number of obs =  2,246
                                                            LR chi2(5)    =  99.61
                                                            Prob > chi2   = 0.0000
    Log likelihood = -1415.1288                             Pseudo R2     = 0.0340
    
    -----------------------------------------------------------------------------------------
                    married | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
    ------------------------+----------------------------------------------------------------
                       race |
                     White  |          1  (base)
                     Black  |   .3603538   .0400423    -9.19   0.000     .2898305    .4480373
                     Other  |   .9883991   .5297951    -0.02   0.983     .3456821    2.826101
                            |
                   collgrad |
          Not college grad  |          1  (base)
              College grad  |   .8985446   .1101404    -0.87   0.383     .7066468    1.142554
                            |
              race#collgrad |
    White#Not college grad  |          1  (base)
        White#College grad  |          1  (base)
    Black#Not college grad  |          1  (base)
        Black#College grad  |   1.199904   .2994077     0.73   0.465      .735782    1.956788
    Other#Not college grad  |          1  (base)
        Other#College grad  |   .9274257   .8286632    -0.08   0.933     .1609618    5.343617
                            |
                      _cons |   2.428169   .1531286    14.07   0.000     2.145849    2.747632
    -----------------------------------------------------------------------------------------
    Note: _cons estimates baseline odds.



    • #3
      Both approaches are correct and the differences between the results are apparent, but not real. The two models are algebraic transformations of each other, and if you know the right algebra, each set of results can be readily calculated from the other. The easiest way to see that you have two different ways of representing the same model here is to run -margins reduccat#keducat- after both, or the -predict- command. You will see that those results are the same (allowing for the possibility of differences in far decimal places due to rounding errors). I would have demonstrated that for you, but I cannot as your example data does not include the key variables reduccat and keducat.
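      To see this for yourself, here is a sketch using the built-in nlsw88 data (since the posted example data lack reduccat and keducat), with race and collgrad standing in for the two education variables. The two margins tables should agree, apart from rounding:

      Code:
      sysuse nlsw88, clear
      * full factorial: main effects plus interaction
      quietly logit married i.race##i.collgrad
      margins race#collgrad
      * cell-means parameterization: interaction only
      quietly logit married i.race#i.collgrad
      margins race#collgrad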

      What you need to understand about these models is that in the first model, you have coefficients corresponding to reduccat and keducat separately, and these are substituting for some of the terms that are marked out as base outcomes in your interaction output. If you count up the total number of non-base terms among reduccat, keducat, and their interaction, you will see that it is exactly the same as the number of non-base terms in the interaction in the model that does not include reduccat and keducat separately. In fact, in the ## model, notice that the odds ratio for Some Col in reduccat is 1.199936. Now look at the odds ratio of Some Col#HS in the other model: it, too, is 1.199936. That is not a coincidence. Each of the separate odds ratios for a level of reduccat alone or keducat alone in the ## model is exactly equal to a corresponding interaction odds ratio in the # model--and that "same" interaction term is marked out as a base level in the ## model.

      Now, the other interaction odds ratios in the # model are not equal to the corresponding interaction odds ratios in the ## model. But they are, in fact, certain products of odds ratios from the ## model. For example, look at Col+#Col+ in the # model: its odds ratio is 2.156864. Now look at Col+#Col+, and the separate Col+ odds ratios for reduccat and keducat, in the ## model. Those are .7731643, 1.658133, and 1.68241, respectively. If you multiply those together you will see that the product is about 2.1568652. The slight difference is due to rounding errors along the way; if we were working with exact, rather than floating-point, arithmetic, the results would be exactly equal. Similar relationships hold for all of the interaction terms in the # model: each is either equal to an odds ratio of reduccat or keducat alone, or is a product of odds ratios from the ## model.
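      You can check the multiplication directly, using the rounded odds ratios displayed in the two output tables (so the product matches only to several decimal places):

      Code:
      display .7731643 * 1.658133 * 1.68241   // approximately 2.156865, matching Col+#Col+ in the # model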

      If you plan to interpret the logistic regression results directly, those of the # model are easier to work with: you do not need to figure out what to multiply with what, and the terms mean what their labels say. By contrast, the ## model's results are more opaque, because the odds ratios shown for reduccat and keducat are not what they appear to be: each applies only when the other variable is at its base category.

      If you work, instead, with -margins- output, then you get the same results either way and don't have to worry about any of this.

      Added: Crossed with #2
      Last edited by Clyde Schechter; 18 Mar 2023, 18:20.



      • #4
        Thanks Clyde, this makes a lot of sense. I went back through the data and it is now clear. I think my confusion stems from the fact that I've always been told to include both the product term and the main effects in the model. Can I now use the # model and compare coefficients to one another? The other complication is that I prefer to impute this data. I learned about mimrgns on Statalist, but I am still not certain whether I can use that command to also examine and graph the interactions.



        • #5
          I think my confusion stems from the fact that I've always been told to include both the product term and main effects in the model.
          Correct, if "do these two variables interact?" is your research question. Essentially, for a categorical-by-categorical interaction, you are just modeling the combinations with (k-1) dummies, where k is the total number of possible combinations. You have 3 x 3 = 9 combinations, so 9 - 1 = 8 dummies in total. You can count them in both approaches in #1: each has 8 dummies to capture everything related to the two education variables.
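          As a sketch with the built-in nlsw88 data (3 races x 2 college-graduation levels = 6 combinations, so 5 dummies), you can confirm that both parameterizations estimate the same number of model terms:

          Code:
          sysuse nlsw88, clear
          quietly logit married i.race##i.collgrad
          display e(df_m)   // 5: 2 (race) + 1 (collgrad) + 2 (interaction)
          quietly logit married i.race#i.collgrad
          display e(df_m)   // also 5: 6 cells minus 1 base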

          While the overall models are the same, the associated p-values test different things. In the first model, the p-value of, say, Col+#Col+ tests whether both having a college degree does anything extra above the individual contributions of parental and child education; in the second model, the p-value tests whether the effect of Col+#Col+ differs from that of HS#HS.

          That is to say, if plotting margins is all you want, then either way is fine. But if you ever want a hypothesis test for the actual interaction, then your first approach (with ##, which includes the main effects) is preferred, as it is easier to test. To show that again with a built-in data set:

          The following testparm statement tests whether there is an interaction between race and college graduation:

          Code:
          sysuse nlsw88, clear
          logit married i.race##i.collgrad, nolog base or
          testparm race#collgrad
          The following testparm statement tests whether the logit of married is the same across all 3 x 2 = 6 combinations. This is NOT a test of the interaction:

          Code:
          logit married i.race#i.collgrad, nolog base or
          testparm race#collgrad
          Lastly, the equivalence shown in #1 is unique to categorical-by-categorical interactions; if one of the variables is continuous, the two models are no longer equivalent. Check out:

          Code:
          logit married c.age##i.collgrad
          logit married c.age#i.collgrad
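          A quick way to see the non-equivalence in this continuous case is to compare the log likelihoods: the # model omits the main effects, so it is a genuinely more restricted model, and its fit differs.

          Code:
          sysuse nlsw88, clear
          quietly logit married c.age##i.collgrad
          display e(ll)
          quietly logit married c.age#i.collgrad
          display e(ll)
          * unlike the categorical-by-categorical case, these differ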



          • #6
            Hi all,

            Thanks for the replies, this really helps. One further question: how should I best test group differences? I know from stratifying my model that children's education affects vaccination behavior only among respondents with a high-school education or less. Should I run the margins of my interaction term and then plot them? Run average marginal effects?



            • #7
              I would probably present these findings with
              Code:
              margins reduccat#keducat
              marginsplot
              margins reduccat, dydx(keducat)
              margins reduccat, dydx(keducat) pwcompare



              • #8
                Thanks Clyde, this works perfectly! One additional question. When I run "margins reduccat, dydx(keducat)" I receive the following output:
                Code:
                 margins reduccat, dydx(keducat)
                
                Average marginal effects                                 Number of obs = 4,225
                Model VCE: OIM
                
                Expression: Pr(vaccinated), predict()
                dy/dx wrt:  1.keducat 2.keducat
                
                ------------------------------------------------------------------------------
                             |            Delta-method
                             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
                -------------+----------------------------------------------------------------
                0.keducat    |  (base outcome)
                -------------+----------------------------------------------------------------
                1.keducat    |
                    reduccat |
                         HS  |   .0027725   .0221331     0.13   0.900    -.0406076    .0461526
                   Some Col  |   -.014765   .0340187    -0.43   0.664    -.0814404    .0519104
                       Col+  |  -.0186124   .0498055    -0.37   0.709    -.1162293    .0790045
                -------------+----------------------------------------------------------------
                2.keducat    |
                    reduccat |
                         HS  |   .0592532   .0196187     3.02   0.003     .0208012    .0977053
                   Some Col  |   .0373556   .0306291     1.22   0.223    -.0226763    .0973875
                       Col+  |   .0235581   .0454043     0.52   0.604    -.0654327    .1125489
                ------------------------------------------------------------------------------
                When I run "margins reduccat, dydx(keducat) pwcompare" with the effects suboption added to see p-values, I receive the following:
                Code:
                . margins reduccat, dydx(keducat) pwcompare( effects  )
                
                Pairwise comparisons of average marginal effects
                
                Model VCE: OIM                                           Number of obs = 4,225
                
                Expression: Pr(vaccinated), predict()
                dy/dx wrt:  1.keducat 2.keducat
                
                -----------------------------------------------------------------------------------
                                  |   Contrast Delta-method    Unadjusted           Unadjusted
                                  |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
                ------------------+----------------------------------------------------------------
                0.keducat         |  (base outcome)
                ------------------+----------------------------------------------------------------
                1.keducat         |
                         reduccat |
                  Some Col vs HS  |  -.0175375   .0404763    -0.43   0.665    -.0968697    .0617947
                      Col+ vs HS  |  -.0213849    .054413    -0.39   0.694    -.1280325    .0852627
                Col+ vs Some Col  |  -.0038474   .0602788    -0.06   0.949    -.1219918     .114297
                ------------------+----------------------------------------------------------------
                2.keducat         |
                         reduccat |
                  Some Col vs HS  |  -.0218976   .0355793    -0.62   0.538    -.0916318    .0478365
                      Col+ vs HS  |  -.0356951   .0490036    -0.73   0.466    -.1317404    .0603502
                Col+ vs Some Col  |  -.0137975   .0542168    -0.25   0.799    -.1200605    .0924655
                -----------------------------------------------------------------------------------
                Note: dy/dx for factor levels is the discrete change from the base level.
                I can interpret the first output, and there is significance in the category I expected. But I am having trouble with the second output, where I no longer see significance. Lastly, is the "margins" command showing just predicted probabilities, while adding "dydx" shows average marginal effects? Thanks again for all the help!



                • #9
                  The two results are looking at different things. The first set of results gives you the marginal effect of being in a given category of keducat, relative to the base category (HS), conditional on your value of reduccat.

                  But the second results give you the differences between marginal effects of being in a given category of keducat, relative to the base category, compared across all possible pairs of categories of reduccat.

                  So, the marginal effect of being in 2.keducat (vs 0.keducat) is about 0.06 if reduccat == "HS".

                  Now, if you want to contrast that with, say, the marginal effect of 2.keducat (vs 0.keducat) when reduccat == "Col+", you could take that marginal effect from the first set of results (about 0.024) and subtract it, which gives about 0.036. But it is easier to look in the second table at the Col+ vs HS row under 2.keducat to find the exact same number (apart from sign), along with its test statistics.
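                  You can check the arithmetic with the rounded dy/dx values shown in the first table:

                  Code:
                  display .0592532 - .0235581   // = .0356951, the Col+ vs HS contrast under 2.keducat, apart from sign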

                  If you are thinking that the difference between a "statistically significant" number and a "not statistically significant" number should, itself, be "statistically significant," then that is just a widespread fallacy. It is one of the reasons that a substantial segment of the statistical community (including me) has largely or totally abandoned the use of the concept of statistical significance altogether. Be that as it may, there is no reason to expect such a difference to be "statistically significant"; sometimes it is, and sometimes it isn't.

                  If you are interested in more information about abandoning the concept of statistical significance, see https://www.tandfonline.com/doi/full...5.2019.1583913 for the "executive summary" and https://www.tandfonline.com/toc/utas20/73/sup1 for all 43 supporting articles. Or https://www.nature.com/articles/d41586-019-00857-9 for the tl;dr.



                  • #10
                    Very interesting, I will take a look. Thanks for all of the help. This solved all of my original problems.



                    • #11
                      As a follow-up, can I do this with imputed data?
                      For example, this is the code before imputation.
                      Code:
                      margins reduccat, dydx(keducat)
                      Can I use mimrgns to do the following? The output looks similar, but are these reliable estimates?

                      Code:
                      mimrgns reduccat, dydx(keducat)

