
  • Multiple omitted categories when interacting?

    Possibly a rudimentary question, but I am unsure why my interaction is behaving this way. When I use the following code,
    Code:
     logit vaccinated male age ib0.race_rc native ib1.marstat numchild h14hhres proxy  work insurance  ib1.wealthquint1 ib0.cenreg masks_toomuch malechild  reduccat##keducat, or allbaselevels
    I receive the following output
    Code:
    . logit vaccinated male age ib0.race_rc native ib1.marstat numchild h14hhres proxy  work insurance  ib1.wealthquint1 ib0.cenreg masks_toomuch malechild  reduccat##keducat, or allbaselevels
    
    Iteration 0:   log likelihood = -1695.4695  
    Iteration 1:   log likelihood = -1512.5555  
    Iteration 2:   log likelihood = -1487.5511  
    Iteration 3:   log likelihood =   -1487.39  
    Iteration 4:   log likelihood =   -1487.39  
    
    Logistic regression                                     Number of obs =  4,225
                                                            LR chi2(31)   = 416.16
                                                            Prob > chi2   = 0.0000
    Log likelihood = -1487.39                               Pseudo R2     = 0.1227
    
    ------------------------------------------------------------------------------------
            vaccinated | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
    -------------------+----------------------------------------------------------------
                  male |   1.194619   .1258765     1.69   0.091     .9717147    1.468657
                   age |   1.032606   .0068076     4.87   0.000     1.019349    1.046035
                       |
               race_rc |
             NH-White  |          1  (base)
             NH-Black  |   1.482211   .2115324     2.76   0.006     1.120551    1.960598
             NH-Other  |   .6913389   .1741085    -1.47   0.143     .4220095    1.132556
             Hispanic  |   2.474591   .5055626     4.43   0.000     1.658065     3.69322
                       |
                native |   .6683936   .1335477    -2.02   0.044      .451813    .9887939
                       |
               marstat |
              Married  |          1  (base)
          Sep/Divorce  |   .6733267   .0915843    -2.91   0.004     .5157598     .879031
              Widowed  |   .4842703   .0656398    -5.35   0.000     .3712896    .6316301
        Never-married  |   .8865486   .2507104    -0.43   0.670      .509317    1.543181
                       |
              numchild |   .9417637   .0249689    -2.26   0.024     .8940753    .9919957
              h14hhres |   .9002094   .0360627    -2.62   0.009     .8322315    .9737398
                 proxy |   .2817702    .082959    -4.30   0.000     .1582282    .5017721
                  work |   .9485741   .1070232    -0.47   0.640     .7603861    1.183337
             insurance |   1.623376   .3575462     2.20   0.028     1.054249    2.499741
                       |
          wealthquint1 |
                    1  |          1  (base)
                    2  |   1.166197   .1657533     1.08   0.279     .8826515    1.540829
                    3  |   1.137073   .1702812     0.86   0.391     .8478471    1.524963
                    4  |   1.289984   .2071102     1.59   0.113       .94172    1.767042
                    5  |   2.729461   .5490092     4.99   0.000     1.840198    4.048454
                       |
                cenreg |
                   NE  |          1  (base)
                   MW  |   .6457031   .1185107    -2.38   0.017     .4506149    .9252525
                South  |   .4683402    .080053    -4.44   0.000     .3350166    .6547215
                 West  |   .5105215   .0963674    -3.56   0.000     .3526461     .739076
                       |
         masks_toomuch |   .2888079    .030219   -11.87   0.000     .2352583    .3545465
             malechild |   .9438558   .0907005    -0.60   0.548     .7818237    1.139469
                       |
              reduccat |
                   HS  |          1  (base)
             Some Col  |   1.199936   .3153607     0.69   0.488     .7168853    2.008474
                 Col+  |   1.658133    .775249     1.08   0.279     .6632017    4.145653
                       |
               keducat |
                   HS  |          1  (base)
             Some Col  |   1.021394   .1725488     0.13   0.900     .7334933    1.422299
                 Col+  |    1.68241    .279711     3.13   0.002     1.214547    2.330501
                       |
      reduccat#keducat |
                HS#HS  |          1  (base)
          HS#Some Col  |          1  (base)
              HS#Col+  |          1  (base)
          Some Col#HS  |          1  (base)
    Some Col#Some Col  |   .8684592   .2829902    -0.43   0.665     .4585458    1.644812
        Some Col#Col+  |   .8416305   .2599964    -0.56   0.577      .459376    1.541965
              Col+#HS  |          1  (base)
        Col+#Some Col  |   .8178248   .4299102    -0.38   0.702     .2918802    2.291479
            Col+#Col+  |   .7731643    .382879    -0.52   0.603     .2929193    2.040778
                       |
                 _cons |   1.207807   .7189185     0.32   0.751     .3761315    3.878422
    ------------------------------------------------------------------------------------
    Note: _cons estimates baseline odds.
    I have multiple omitted categories. I am attempting to interact respondent's education (HS, some college, college+) with their child's education (HS, some college, college+). How do I interpret my coefficients? Is plotting necessary, and if so, should it be a margins plot?

    Part of my confusion is that when I run the code with only the interaction and no main effects, I receive output that I would expect. What is the correct approach and why is there a difference?

    Code:
    logit vaccinated male age ib0.race_rc native ib1.marstat numchild h14hhres proxy  work insurance  ib1.wealthquint1 ib0.cenreg masks_toomuch malechild  reduccat#keducat, or allbaselevels
    Code:
    . logit vaccinated male age ib0.race_rc native ib1.marstat numchild h14hhres proxy  work insurance  ib1.wealthquint1 ib0.cenreg masks_toomuch malechild  reduccat#keducat, or allbaselevels
    
    Iteration 0:   log likelihood = -1695.4695  
    Iteration 1:   log likelihood = -1512.5555  
    Iteration 2:   log likelihood = -1487.5511  
    Iteration 3:   log likelihood =   -1487.39  
    Iteration 4:   log likelihood =   -1487.39  
    
    Logistic regression                                     Number of obs =  4,225
                                                            LR chi2(31)   = 416.16
                                                            Prob > chi2   = 0.0000
    Log likelihood = -1487.39                               Pseudo R2     = 0.1227
    
    ------------------------------------------------------------------------------------
            vaccinated | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
    -------------------+----------------------------------------------------------------
                  male |   1.194619   .1258765     1.69   0.091     .9717147    1.468657
                   age |   1.032606   .0068076     4.87   0.000     1.019349    1.046035
                       |
               race_rc |
             NH-White  |          1  (base)
             NH-Black  |   1.482211   .2115324     2.76   0.006     1.120551    1.960598
             NH-Other  |   .6913389   .1741085    -1.47   0.143     .4220095    1.132556
             Hispanic  |   2.474591   .5055626     4.43   0.000     1.658065     3.69322
                       |
                native |   .6683936   .1335477    -2.02   0.044      .451813    .9887939
                       |
               marstat |
              Married  |          1  (base)
          Sep/Divorce  |   .6733267   .0915843    -2.91   0.004     .5157598     .879031
              Widowed  |   .4842703   .0656398    -5.35   0.000     .3712896    .6316301
        Never-married  |   .8865486   .2507104    -0.43   0.670      .509317    1.543181
                       |
              numchild |   .9417637   .0249689    -2.26   0.024     .8940753    .9919957
              h14hhres |   .9002094   .0360627    -2.62   0.009     .8322315    .9737398
                 proxy |   .2817702    .082959    -4.30   0.000     .1582282    .5017721
                  work |   .9485741   .1070232    -0.47   0.640     .7603861    1.183337
             insurance |   1.623376   .3575462     2.20   0.028     1.054249    2.499741
                       |
          wealthquint1 |
                    1  |          1  (base)
                    2  |   1.166197   .1657533     1.08   0.279     .8826515    1.540829
                    3  |   1.137073   .1702812     0.86   0.391     .8478471    1.524963
                    4  |   1.289984   .2071102     1.59   0.113       .94172    1.767042
                    5  |   2.729461   .5490092     4.99   0.000     1.840198    4.048454
                       |
                cenreg |
                   NE  |          1  (base)
                   MW  |   .6457031   .1185107    -2.38   0.017     .4506149    .9252525
                South  |   .4683402    .080053    -4.44   0.000     .3350166    .6547215
                 West  |   .5105215   .0963674    -3.56   0.000     .3526461     .739076
                       |
         masks_toomuch |   .2888079    .030219   -11.87   0.000     .2352583    .3545465
             malechild |   .9438558   .0907005    -0.60   0.548     .7818237    1.139469
                       |
      reduccat#keducat |
                HS#HS  |          1  (base)
          HS#Some Col  |   1.021394   .1725488     0.13   0.900     .7334933    1.422299
              HS#Col+  |    1.68241    .279711     3.13   0.002     1.214547    2.330501
          Some Col#HS  |   1.199936   .3153607     0.69   0.488     .7168853    2.008474
    Some Col#Some Col  |    1.06439   .2106292     0.32   0.753     .7222025     1.56871
        Some Col#Col+  |   1.699069   .3044057     2.96   0.003     1.195941    2.413863
              Col+#HS  |   1.658133    .775249     1.08   0.279     .6632017    4.145653
        Col+#Some Col  |   1.385074   .3465041     1.30   0.193     .8482579    2.261612
            Col+#Col+  |   2.156864   .4052279     4.09   0.000     1.492453     3.11706
                       |
                 _cons |   1.207807   .7189185     0.32   0.751     .3761315    3.878422
    ------------------------------------------------------------------------------------
    Note: _cons estimates baseline odds.


    Example data

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(vaccinated male) float(age race_rc) double(native marstat) byte(numchild h14hhres) double proxy byte work float insurance byte(wealthquint1 cenreg masks_toomuch malechild)
    1 0 75 0 1 2 2 2 0 0 1 5 0 0 1
    1 0 74 0 1 2 1 1 0 0 1 5 0 0 0
    1 1 85 0 1 1 2 2 0 0 1 5 0 0 0
    1 0 78 0 1 1 2 2 0 0 1 5 0 0 0
    1 0 86 0 1 1 3 2 0 0 1 5 0 0 1
    1 0 86 1 1 4 2 1 0 0 1 1 0 0 1
    0 0 82 0 0 3 3 5 0 0 1 4 0 0 0
    1 0 83 2 0 1 3 2 0 0 1 4 3 1 0
    1 1 81 2 0 1 3 2 0 0 1 4 3 0 0
    1 1 81 0 1 1 1 2 0 0 1 4 3 0 1
    1 0 81 0 1 1 1 2 0 0 1 4 3 0 1
    1 0 81 1 0 3 4 1 0 0 1 3 0 0 1
    1 0 74 1 1 3 5 4 0 1 1 1 2 0 1
    1 1 83 3 1 1 . 2 0 0 1 4 3 0 1
    1 0 72 0 1 1 . 2 0 0 1 4 3 0 1
    1 1 85 0 1 1 4 6 0 0 1 4 3 1 0
    1 0 80 0 1 1 4 6 0 1 1 4 3 0 0
    1 1 89 0 0 1 5 2 0 0 1 5 3 1 1
    0 0 82 1 1 4 1 1 0 1 1 1 3 0 0
    1 0 82 0 1 3 3 2 0 0 1 3 2 0 1
    1 0 81 0 0 2 2 1 0 0 1 3 1 0 0
    1 0 87 0 1 3 4 1 0 0 1 5 1 0 1
    0 0 84 0 1 . . . . . . . . 0 0
    1 1 81 0 1 3 5 2 0 1 1 5 1 0 1
    0 0 91 0 1 3 3 5 1 0 1 4 1 0 0
    1 0 90 0 1 3 4 2 0 0 1 3 1 0 1
    1 1 87 1 1 1 4 2 0 1 1 3 3 0 1
    1 0 65 1 1 1 . . . . . . . 0 1
    1 1 84 0 1 1 3 2 0 0 1 2 2 0 1
    1 0 80 0 1 3 5 1 0 0 1 1 2 0 0
    1 0 89 0 1 1 4 2 0 0 1 4 1 0 1
    1 0 85 1 1 3 8 1 0 0 1 1 1 0 0
    1 0 82 1 0 1 1 3 0 0 1 2 1 0 0
    1 1 85 2 1 1 1 3 0 1 . 2 1 0 0
    1 0 80 0 1 1 5 2 0 0 1 2 1 1 0
    1 1 84 0 1 1 2 2 0 0 1 3 1 0 0
    1 0 78 0 1 1 2 2 0 0 1 3 1 0 0
    1 0 89 0 1 3 4 3 0 0 1 3 1 0 1
    1 0 80 0 1 2 1 1 0 0 1 3 1 1 0
    1 0 90 1 1 3 2 1 0 1 1 3 1 0 0
    1 0 76 1 1 3 3 1 0 0 1 1 1 0 0
    1 0 85 1 1 3 4 1 0 0 1 1 1 0 1
    1 0 83 1 1 3 2 1 0 0 1 1 1 0 0
    1 0 68 0 1 3 1 1 0 0 1 5 3 0 1
    1 1 84 0 1 1 2 2 0 0 . 4 3 . 1
    1 0 78 0 1 1 2 2 0 0 1 4 3 0 1
    0 0 84 0 1 3 7 2 0 0 1 5 3 1 1
    1 0 90 0 1 1 2 2 0 0 1 2 3 0 1
    1 1 90 0 1 1 2 2 0 0 1 2 3 0 1
    0 0 66 1 1 3 . 1 0 0 1 1 3 0 1
    1 1 81 1 1 1 4 2 0 0 1 4 3 0 1
    1 0 71 1 1 1 4 2 0 0 1 4 3 0 1
    1 0 65 1 1 2 2 1 0 1 1 1 2 0 1
    1 0 86 1 1 2 3 2 0 0 0 3 2 0 0
    1 0 86 0 1 3 3 1 0 0 1 3 2 0 0
    1 0 80 1 1 2 3 2 0 1 1 3 2 0 1
    1 0 90 0 1 1 2 2 0 0 1 3 2 0 1
    1 1 82 1 1 2 2 1 0 0 1 4 2 0 0
    1 0 83 0 1 3 3 3 0 0 1 3 2 0 0
    0 0 74 0 1 2 2 3 0 0 1 5 2 1 0
    1 1 87 0 1 1 3 2 0 0 1 3 2 1 1
    1 0 75 0 1 1 3 2 0 0 1 3 2 0 1
    1 1 80 1 1 3 1 1 0 0 1 2 2 0 1
    1 0 72 1 1 3 2 2 0 0 1 2 2 0 1
    1 0 80 0 1 . . . . . . . . 1 0
    1 0 79 1 1 3 3 1 0 0 1 3 2 0 0
    1 0 77 0 1 1 2 2 0 0 1 5 0 0 1
    1 0 80 0 1 2 2 1 0 0 1 5 0 0 0
    0 0 95 0 1 3 3 1 0 0 1 4 0 0 0
    1 0 75 3 0 3 5 1 0 0 1 1 0 0 0
    1 0 84 0 1 3 1 1 0 1 1 4 0 0 1
    1 1 82 3 0 1 1 2 0 0 1 3 2 0 0
    1 0 84 3 0 1 1 2 0 0 1 3 2 0 0
    1 1 90 0 1 3 4 6 0 0 1 4 3 0 1
    1 0 86 0 1 3 7 1 0 0 1 5 1 0 0
    1 0 78 0 1 . . . . . . . . 0 1
    1 0 77 0 1 3 4 2 0 0 1 5 1 0 1
    0 1 87 0 1 1 3 2 0 0 1 2 1 0 1
    0 0 87 0 1 1 3 2 0 0 1 2 1 0 1
    1 1 82 0 1 1 2 2 0 0 1 4 0 0 1
    1 0 80 0 1 1 2 2 0 0 1 4 0 . 1
    1 1 87 0 1 1 3 2 0 0 1 4 0 0 1
    1 0 86 0 1 1 3 2 0 0 1 4 0 0 1
    1 1 80 0 1 1 2 2 0 1 1 4 0 0 1
    1 0 77 0 1 1 2 2 0 0 1 4 0 0 1
    1 1 92 0 1 2 3 1 0 0 1 2 2 0 0
    1 0 87 0 1 3 2 2 0 0 1 3 0 0 1
    1 0 82 0 1 3 4 1 0 0 1 2 0 0 1
    1 1 80 0 1 1 4 2 0 0 1 4 0 0 1
    1 0 79 0 1 1 4 2 0 0 1 4 0 1 1
    1 1 82 0 1 1 2 2 0 0 1 5 2 0 0
    1 0 77 0 1 1 2 2 0 1 1 5 2 0 0
    1 0 93 1 1 . . . . . . . . 1 1
    1 0 82 1 1 3 6 4 0 0 1 2 2 0 1
    1 1 84 0 1 3 3 1 0 0 1 2 2 0 0
    1 0 76 0 1 3 3 2 0 0 1 5 0 0 1
    1 0 80 0 1 2 2 3 0 0 1 2 2 0 0
    1 1 90 1 1 . . . . . . . . 1 1
    1 1 83 0 1 1 3 2 0 0 1 5 1 0 0
    1 0 82 0 1 1 3 2 0 0 1 5 1 0 0
    end
    label values race_rc race
    label def race 0 "NH-White", modify
    label def race 1 "NH-Black", modify
    label def race 2 "NH-Other", modify
    label def race 3 "Hispanic", modify
    label values marstat mar
    label def mar 1 "Married", modify
    label def mar 2 "Sep/Divorce", modify
    label def mar 3 "Widowed", modify
    label def mar 4 "Never-married", modify
    label values cenreg cen
    label def cen 0 "NE", modify
    label def cen 1 "MW", modify
    label def cen 2 "South", modify
    label def cen 3 "West", modify


  • #2
    They are not "omitted": their combinations are already expressed by the separate main-effect ORs (notice that all of the base combinations have HS in them). It is just an illusion created by how you request the base levels. Try these three models; that should clear up your confusion:

    Code:
    sysuse nlsw88, clear
    logit married i.race##i.collgrad, nolog or
    logit married i.race##i.collgrad, nolog base or
    logit married i.race##i.collgrad, nolog allbaselevel or
    Results:

    Code:
    . logit married i.race##i.collgrad, nolog or
    
    Logistic regression                                     Number of obs =  2,246
                                                            LR chi2(5)    =  99.61
                                                            Prob > chi2   = 0.0000
    Log likelihood = -1415.1288                             Pseudo R2     = 0.0340
    
    -------------------------------------------------------------------------------------
                married | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
    --------------------+----------------------------------------------------------------
                   race |
                 Black  |   .3603538   .0400423    -9.19   0.000     .2898305    .4480373
                 Other  |   .9883991   .5297951    -0.02   0.983     .3456821    2.826101
                        |
               collgrad |
          College grad  |   .8985446   .1101404    -0.87   0.383     .7066468    1.142554
                        |
          race#collgrad |
    Black#College grad  |   1.199904   .2994077     0.73   0.465      .735782    1.956788
    Other#College grad  |   .9274257   .8286632    -0.08   0.933     .1609618    5.343617
                        |
                  _cons |   2.428169   .1531286    14.07   0.000     2.145849    2.747632
    -------------------------------------------------------------------------------------
    Note: _cons estimates baseline odds.
    
    . logit married i.race##i.collgrad, nolog base or
    
    Logistic regression                                     Number of obs =  2,246
                                                            LR chi2(5)    =  99.61
                                                            Prob > chi2   = 0.0000
    Log likelihood = -1415.1288                             Pseudo R2     = 0.0340
    
    -------------------------------------------------------------------------------------
                married | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
    --------------------+----------------------------------------------------------------
                   race |
                 White  |          1  (base)
                 Black  |   .3603538   .0400423    -9.19   0.000     .2898305    .4480373
                 Other  |   .9883991   .5297951    -0.02   0.983     .3456821    2.826101
                        |
               collgrad |
      Not college grad  |          1  (base)
          College grad  |   .8985446   .1101404    -0.87   0.383     .7066468    1.142554
                        |
          race#collgrad |
    Black#College grad  |   1.199904   .2994077     0.73   0.465      .735782    1.956788
    Other#College grad  |   .9274257   .8286632    -0.08   0.933     .1609618    5.343617
                        |
                  _cons |   2.428169   .1531286    14.07   0.000     2.145849    2.747632
    -------------------------------------------------------------------------------------
    Note: _cons estimates baseline odds.
    
    . logit married i.race##i.collgrad, nolog allbaselevel or
    
    Logistic regression                                     Number of obs =  2,246
                                                            LR chi2(5)    =  99.61
                                                            Prob > chi2   = 0.0000
    Log likelihood = -1415.1288                             Pseudo R2     = 0.0340
    
    -----------------------------------------------------------------------------------------
                    married | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
    ------------------------+----------------------------------------------------------------
                       race |
                     White  |          1  (base)
                     Black  |   .3603538   .0400423    -9.19   0.000     .2898305    .4480373
                     Other  |   .9883991   .5297951    -0.02   0.983     .3456821    2.826101
                            |
                   collgrad |
          Not college grad  |          1  (base)
              College grad  |   .8985446   .1101404    -0.87   0.383     .7066468    1.142554
                            |
              race#collgrad |
    White#Not college grad  |          1  (base)
        White#College grad  |          1  (base)
    Black#Not college grad  |          1  (base)
        Black#College grad  |   1.199904   .2994077     0.73   0.465      .735782    1.956788
    Other#Not college grad  |          1  (base)
        Other#College grad  |   .9274257   .8286632    -0.08   0.933     .1609618    5.343617
                            |
                      _cons |   2.428169   .1531286    14.07   0.000     2.145849    2.747632
    -----------------------------------------------------------------------------------------
    Note: _cons estimates baseline odds.



    • #3
      Both approaches are correct and the differences between the results are apparent, but not real. The two models are algebraic transformations of each other, and if you know the right algebra, each set of results can be readily calculated from the other. The easiest way to see that you have two different ways of representing the same model here is to run -margins reduccat#keducat- after both, or the -predict- command. You will see that those results are the same (allowing for the possibility of differences in far decimal places due to rounding errors). I would have demonstrated that for you, but I cannot as your example data does not include the key variables reduccat and keducat.
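      To see this for yourself, here is a sketch using the built-in nlsw88 data (since the posted example data lack reduccat and keducat), with race and collgrad standing in for the two education variables. The two margins tables should agree, apart from rounding:

      Code:
      sysuse nlsw88, clear
      * full factorial: main effects plus interaction
      quietly logit married i.race##i.collgrad
      margins race#collgrad
      * cell-means parameterization: interaction only
      quietly logit married i.race#i.collgrad
      margins race#collgrad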

      What you need to understand about these models is that in the first model, you have coefficients corresponding to reduccat and keducat separately, and these are substituting for some of the terms that are marked out as base outcomes in your interaction output. If you count up the total number of non-base terms among reduccat, keducat, and their interaction, you will see that it is exactly the same as the number of non-base terms in the interaction in the model that does not include reduccat and keducat separately. In fact, in the ## model, notice that the odds ratio for Some Col in reduccat is 1.199936. Now look at the odds ratio of Some Col#HS in the other model: it, too, is 1.199936. That is not a coincidence. Each of the separate odds ratios for a level of reduccat alone or keducat alone in the ## model is exactly equal to a corresponding interaction odds ratio in the # model--and that "same" interaction term is marked out as a base level in the ## model.

      Now, the other interaction odds ratios in the # model are not equal to the corresponding interaction odds ratios in the ## model. But they are, in fact, certain products of odds ratios from the ## model. For example, look at Col+#Col+ in the # model: its odds ratio is 2.156864. Now look at Col+#Col+, and the separate Col+ odds ratios for reduccat and keducat, in the ## model. Those are .7731643, 1.658133, and 1.68241, respectively. If you multiply those together you will see that the product is about 2.1568652. The slight difference is due to rounding errors along the way; if we were working with exact, rather than floating-point, arithmetic, the results would be exactly equal. Similar relationships hold for all of the interaction terms in the # model: each is either equal to an odds ratio of reduccat or keducat alone, or is a product of odds ratios from the ## model.
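      You can check the multiplication directly, using the rounded odds ratios displayed in the two output tables (so the product matches only to several decimal places):

      Code:
      display .7731643 * 1.658133 * 1.68241   // approximately 2.156865, matching Col+#Col+ in the # model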

      If you plan to interpret the logistic regression results directly, those of the # model are easier to work with: you do not need to figure out what to multiply with what, and the terms mean what their labels say. By contrast, the ## model's results are more opaque, because the odds ratios shown for reduccat and keducat are not what they appear to be: each applies only when the other variable is at its base category.

      If you work, instead, with -margins- output, then you get the same results either way and don't have to worry about any of this.

      Added: Crossed with #2
      Last edited by Clyde Schechter; 18 Mar 2023, 18:20.



      • #4
        Thanks Clyde, this makes a lot of sense. I went back through the data and it is now clear. I think my confusion stems from the fact that I've always been told to include both the product term and the main effects in the model. Can I now use the # model and compare coefficients to one another? The other complication is that I prefer to impute this data. I learned about mimrgns on Statalist, but I am still not certain whether I can use that command to also examine and graph the interactions.



        • #5
          I think my confusion stems from the fact that I've always been told to include both the product term and main effects in the model.
          Correct, if "do these two variables interact?" is your research question. Essentially, for a categorical-by-categorical interaction, you are just modeling the combinations with (k-1) dummies, where k is the total number of possible combinations. You have 3 x 3 = 9 combinations, so 9 - 1 = 8 dummies in total. You can count them in both approaches in #1: each has 8 dummies to capture everything related to the two education variables.
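          As a sketch with the built-in nlsw88 data (3 races x 2 college-graduation levels = 6 combinations, so 5 dummies), you can confirm that both parameterizations estimate the same number of model terms:

          Code:
          sysuse nlsw88, clear
          quietly logit married i.race##i.collgrad
          display e(df_m)   // 5: 2 (race) + 1 (collgrad) + 2 (interaction)
          quietly logit married i.race#i.collgrad
          display e(df_m)   // also 5: 6 cells minus 1 base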

          While the overall models are the same, the associated p-values test different things. In the first model, the p-value of, say, Col+#Col+ tests whether both having a college degree does anything extra above the individual contributions of parental and child education; in the second model, the p-value tests whether the effect of Col+#Col+ differs from that of HS#HS.

          That is to say, if plotting margins is all you want, then either way is fine. But if you ever want a hypothesis test for the actual interaction, then your first approach (with ##, which includes the main effects) is preferred, as it is easier to test. To show that again with a built-in data set:

          The following testparm statement tests whether there is an interaction between race and college graduation:

          Code:
          sysuse nlsw88, clear
          logit married i.race##i.collgrad, nolog base or
          testparm race#collgrad
          The following testparm statement tests whether the logit of married is the same across all 3 x 2 = 6 combinations. This is NOT a test of the interaction:

          Code:
          logit married i.race#i.collgrad, nolog base or
          testparm race#collgrad
          Lastly, the equivalence shown in #1 is unique to categorical-by-categorical interactions; if one of the variables is continuous, the two models are no longer equivalent. Check out:

          Code:
          logit married c.age##i.collgrad
          logit married c.age#i.collgrad
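          A quick way to see the non-equivalence in this continuous case is to compare the log likelihoods: the # model omits the main effects, so it is a genuinely more restricted model, and its fit differs.

          Code:
          sysuse nlsw88, clear
          quietly logit married c.age##i.collgrad
          display e(ll)
          quietly logit married c.age#i.collgrad
          display e(ll)
          * unlike the categorical-by-categorical case, these differ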



          • #6
            Hi all,

            Thanks for the replies, this really helps. One further question: how should I best test group differences? I know from stratifying my model that children's education affects vaccination behavior only among respondents with a high-school education or less. Should I run the margins of my interaction term and then plot them? Run average marginal effects?



            • #7
              I would probably present these findings with
              Code:
              margins reduccat#keducat
              marginsplot
              margins reduccat, dydx(keducat)
              margins reduccat, dydx(keducat) pwcompare



              • #8
                Thanks Clyde, this works perfectly! One additional question. When I run "margins reduccat, dydx(keducat)" I receive the following output:
                Code:
                 margins reduccat, dydx(keducat)
                
                Average marginal effects                                 Number of obs = 4,225
                Model VCE: OIM
                
                Expression: Pr(vaccinated), predict()
                dy/dx wrt:  1.keducat 2.keducat
                
                ------------------------------------------------------------------------------
                             |            Delta-method
                             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
                -------------+----------------------------------------------------------------
                0.keducat    |  (base outcome)
                -------------+----------------------------------------------------------------
                1.keducat    |
                    reduccat |
                         HS  |   .0027725   .0221331     0.13   0.900    -.0406076    .0461526
                   Some Col  |   -.014765   .0340187    -0.43   0.664    -.0814404    .0519104
                       Col+  |  -.0186124   .0498055    -0.37   0.709    -.1162293    .0790045
                -------------+----------------------------------------------------------------
                2.keducat    |
                    reduccat |
                         HS  |   .0592532   .0196187     3.02   0.003     .0208012    .0977053
                   Some Col  |   .0373556   .0306291     1.22   0.223    -.0226763    .0973875
                       Col+  |   .0235581   .0454043     0.52   0.604    -.0654327    .1125489
                ------------------------------------------------------------------------------
                When I run "margins reduccat, dydx(keducat) pwcompare" with the effects suboption added to see p-values, I receive the following:
                Code:
                . margins reduccat, dydx(keducat) pwcompare( effects  )
                
                Pairwise comparisons of average marginal effects
                
                Model VCE: OIM                                           Number of obs = 4,225
                
                Expression: Pr(vaccinated), predict()
                dy/dx wrt:  1.keducat 2.keducat
                
                -----------------------------------------------------------------------------------
                                  |   Contrast Delta-method    Unadjusted           Unadjusted
                                  |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
                ------------------+----------------------------------------------------------------
                0.keducat         |  (base outcome)
                ------------------+----------------------------------------------------------------
                1.keducat         |
                         reduccat |
                  Some Col vs HS  |  -.0175375   .0404763    -0.43   0.665    -.0968697    .0617947
                      Col+ vs HS  |  -.0213849    .054413    -0.39   0.694    -.1280325    .0852627
                Col+ vs Some Col  |  -.0038474   .0602788    -0.06   0.949    -.1219918     .114297
                ------------------+----------------------------------------------------------------
                2.keducat         |
                         reduccat |
                  Some Col vs HS  |  -.0218976   .0355793    -0.62   0.538    -.0916318    .0478365
                      Col+ vs HS  |  -.0356951   .0490036    -0.73   0.466    -.1317404    .0603502
                Col+ vs Some Col  |  -.0137975   .0542168    -0.25   0.799    -.1200605    .0924655
                -----------------------------------------------------------------------------------
                Note: dy/dx for factor levels is the discrete change from the base level.
                I can interpret the first output, and there is significance in the category I expected. But I am having trouble with the second output, where I no longer see significance. Lastly, is the "margins" command showing just predicted probabilities, while adding "dydx" shows average marginal effects? Thanks again for all the help!



                • #9
                  The two results are looking at different things. The first set of results gives you the marginal effect of being in a given category of keducat, relative to the base category (HS), conditional on your value of reduccat.

                  But the second results give you the differences between marginal effects of being in a given category of keducat, relative to the base category, compared across all possible pairs of categories of reduccat.

                  So, the marginal effect of being in 2.keducat (vs 0.keducat) is about 0.06 if reduccat == "HS".

                  Now, if you want to contrast that with, say, the marginal effect of 2.keducat (vs 0.keducat) when reduccat == "Col+", you could take that marginal effect from the first set of results (about 0.024) and subtract it, which gives about 0.036. But it is easier to look in the second table at the Col+ vs HS row under 2.keducat to find the exact same number (apart from sign), along with its test statistics.
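                  You can check the arithmetic with the rounded dy/dx values shown in the first table:

                  Code:
                  display .0592532 - .0235581   // = .0356951, the Col+ vs HS contrast under 2.keducat, apart from sign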

                  If you are thinking that the difference between a "statistically significant" number and a "not statistically significant" number should, itself, be "statistically significant," then that is just a widespread fallacy. It is one of the reasons that a substantial segment of the statistical community (including me) has largely or totally abandoned the use of the concept of statistical significance altogether. Be that as it may, there is no reason to expect such a difference to be "statistically significant"; sometimes it is, and sometimes it isn't.

                  If you are interested in more information about abandoning the concept of statistical significance, see https://www.tandfonline.com/doi/full...5.2019.1583913 for the "executive summary" and https://www.tandfonline.com/toc/utas20/73/sup1 for all 43 supporting articles. Or https://www.nature.com/articles/d41586-019-00857-9 for the tl;dr.



                  • #10
                    Very interesting, I will take a look. Thanks for all of the help. This solved all of my original problems.



                    • #11
                      As a follow-up, can I do this with imputed data?
                      For example, this is the code before imputation.
                      Code:
                      margins reduccat, dydx(keducat)
                      Can I use mimrgns to do the following? The output looks similar, but are these reliable estimates?

                      Code:
                      mimrgns reduccat, dydx(keducat)

