Different ways of specifying interaction lead to different AMEs which is causing issues with predictnl

Sarah Thea Smith

Join Date: Mar 2023

Posts: 14
#1


16 Mar 2023, 07:16

I am currently trying to get to grips with the predictnl command. I want to present the results of a logistic regression as average marginal effects (AMEs) in my regression table, and because my regression includes an interaction term, I need predictnl to calculate an AME "coefficient" and standard error for the interaction term in the table. In trying to do so, however, I have run into an issue: different ways of specifying the interaction lead to different AMEs.

To illustrate my question/dilemma, I have created the following example. As you can see, all variables in question are dummy variables. I am using StataSE version 16.1.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(outcome dummy1 dummy2)
0 0 0
. 0 0
0 1 0
0 0 0
0 0 1
0 0 1
0 0 1
0 0 1
0 1 .
0 0 1
0 0 0
. . 1
0 0 0
. . 0
1 . 1
. . 0
. 1 0
0 0 1
0 1 1
0 0 .
0 0 .
0 1 0
0 0 1
1 1 0
1 1 .
0 0 0
1 1 0
0 1 0
0 0 0
0 0 0
0 1 0
0 0 1
0 0 1
0 0 1
0 . 0
. . 0
0 . 1
. 0 0
0 0 1
1 0 0
0 0 0
0 1 0
0 1 .
0 0 0
0 0 1
. . 1
1 1 1
. 0 0
0 0 1
0 0 1
0 0 1
1 . 1
0 1 0
0 . 1
. . .
. . 0
1 0 0
0 0 0
0 0 0
0 0 1
. 0 1
0 . 0
0 1 0
1 0 0
0 0 1
0 0 .
0 1 0
0 0 0
0 1 0
0 0 1
0 0 0
0 0 1
0 0 1
0 0 1
1 1 0
0 1 0
0 . 1
. . .
0 0 0
0 0 0
0 . 1
0 0 1
0 0 0
0 0 0
0 0 1
0 0 0
0 0 1
0 0 0
0 0 .
0 1 0
0 0 1
. . 1
0 1 .
. 0 1
1 0 0
1 1 .
1 0 0
. . 1
0 0 1
0 1 0
end
label values outcome _statalist
label values dummy1 _statalist
label values dummy2 _statalist
label def _statalist 0 "0", modify
label def _statalist 1 "1", modify

As far as I can tell, in order to use predictnl, I need to create a new variable for my interaction:

Code:

gen interaction = dummy1*dummy2

Doing so does not change the logistic regression output when I compare models run with this "interaction" variable to models run specifying the interaction with # or ##. However, and this is where I am very confused at this point, I have noticed that different ways of specifying the interaction lead to different AMEs being predicted (I have added the value Stata predicts after each margins command below):

Code:

//SPECIFICATION 1 - using "interaction" variable WITHOUT "i" in front of dummies
logit outcome dummy1 dummy2 interaction
margins, dydx(dummy1) at(dummy2=1) // .1562664

//SPECIFICATION 2 - using "interaction" variable WITH "i" in front of dummies
logit outcome i.dummy1 i.dummy2 interaction
margins, dydx(dummy1) at(dummy2=1) // .195811

//SPECIFICATION 3 - using ##
logit outcome i.dummy1##i.dummy2
margins, dydx(dummy1) at(dummy2=1) // .2918876

//SPECIFICATION 4 - using #
logit outcome i.dummy1 i.dummy2 i.dummy1#i.dummy2
margins, dydx(dummy1) at(dummy2=1) // .2918876

In class, I was taught that all of these ways of specifying the interaction should lead to the same results and they do when it comes to the log-odds predictions, but as you can see, the AMEs are different when I use the "interaction" variable (specifications 1 and 2), and also change depending on whether I add "i." in front of the dummy variables when using the interaction variable. I am inclined to trust the results of specifications 3 and 4 most, as this is how I have usually seen interactions be specified here, and because they yield the same results, but this presents me with the issue that I have to use specification 1 to be able to use predictnl in the way I want:

Code:

logit outcome dummy1 dummy2 interaction

predictnl phat = (_b[dummy1] + _b[interaction]) * ///
                    (1/(1+exp(- (_b[_cons] + _b[dummy1]* dummy1 + _b[dummy2] + _b[interaction]*dummy1))))* ///
                    (1 - (1/(1+exp(-(_b[_cons]+_b[dummy1]*dummy1 + _b[dummy2] + _b[interaction]*dummy1))))) ///
                    - _b[dummy1]*(1/(1+exp(-(_b[_cons]+_b[dummy1]*dummy1))))* ///
                    (1-(1/(1+exp(-(_b[_cons]+_b[dummy1]*dummy1))))), se(phat_se)

sum phat*

(Code taken from page 272 of Karaca-Mandic, P., Norton, E. C. and Dowd, B. (2012). Interaction terms in nonlinear models. Health Services Research, 47, 255–274.)
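
For readers who find the predictnl expression hard to parse, it can be written out as a formula. This is only a transcription of the code above; the notation is introduced here for convenience and is an assumption of this note ($\Lambda$ for the logistic function, $\beta_0, \beta_1, \beta_2, \beta_{12}$ for `_b[_cons]`, `_b[dummy1]`, `_b[dummy2]`, `_b[interaction]`, and $d_1$ for the observed value of dummy1).

```latex
\begin{align*}
\Lambda(z) &= \frac{1}{1 + e^{-z}} \\
p_1 &= \Lambda\bigl(\beta_0 + \beta_1 d_1 + \beta_2 + \beta_{12} d_1\bigr)
      && \text{(dummy2 set to 1)} \\
p_0 &= \Lambda\bigl(\beta_0 + \beta_1 d_1\bigr)
      && \text{(dummy2 set to 0)} \\
\texttt{phat} &= (\beta_1 + \beta_{12})\, p_1 (1 - p_1) \;-\; \beta_1\, p_0 (1 - p_0)
\end{align*}
```

Each term of the form $\beta\, p(1-p)$ is a derivative of the predicted probability with respect to dummy1, so the expression appears to treat dummy1 as if it were continuous. For two dummy variables, the interaction effect on the probability scale is more commonly defined as a double difference in predicted probabilities, which may explain why results from this expression need not agree with other commands.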

My questions are therefore (i) why do the margins results differ across these interaction specifications, (ii) which of the AMEs from the 4 specifications above should I "trust", and (iii) how can I use predictnl while still getting accurate results or, alternatively, how can I calculate an AME "coefficient" and standard error for the interaction term in a different way?

Any help would be much appreciated, thank you!

Last edited by Sarah Thea Smith; 16 Mar 2023, 07:19. Reason: Added tags
Tags: interaction, margins, predictnl, syntax
Andrew Musau

Join Date: Oct 2014

Posts: 10275
#2

16 Mar 2023, 09:24

If you create the interaction manually, margins will get the computation wrong because it does not know that the interaction term is related to the dummies. So #1 and #2 are incorrect. #3 and #4 are equivalent syntaxes.
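
One way to see why the numbers differ is to write out what margins appears to compute for `dydx(dummy1) at(dummy2=1)` in each case. This is a sketch based on how margins is documented to treat factor and continuous variables, not output from the thread; the notation is an assumption of this note ($\Lambda$ is the logistic function, $\beta$ the logit coefficients, $d_{1i}$ the observed dummy1, and $m_i$ the observed value of the hand-made `interaction` variable).

```latex
\begin{align*}
\text{Specifications 3 and 4:}\quad
  & \Lambda(\beta_0 + \beta_1 + \beta_2 + \beta_{12}) - \Lambda(\beta_0 + \beta_2) \\[4pt]
\text{Specification 2:}\quad
  & \frac{1}{N}\sum_{i=1}^{N}\Bigl[\Lambda(\beta_0 + \beta_1 + \beta_2 + \beta_{12} m_i)
    - \Lambda(\beta_0 + \beta_2 + \beta_{12} m_i)\Bigr] \\[4pt]
\text{Specification 1:}\quad
  & \frac{1}{N}\sum_{i=1}^{N} \beta_1\, \Lambda(\eta_i)\bigl(1 - \Lambda(\eta_i)\bigr),
    \qquad \eta_i = \beta_0 + \beta_1 d_{1i} + \beta_2 + \beta_{12} m_i
\end{align*}
```

In specifications 1 and 2 the product term stays at its observed value $m_i$ while dummy1 or dummy2 is changed, so the comparison mixes incompatible values of the three variables. Specification 1 additionally takes a derivative rather than a discrete 0-to-1 change because dummy1 is entered without `i.`.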
Sarah Thea Smith

Join Date: Mar 2023

Posts: 14
#3

16 Mar 2023, 09:42

Thank you for your response!

It seems as if both inteff and predictnl don't work unless I create the interaction manually, although both commands can be used for dummy#dummy interactions as far as I understand. Does anyone know how I can use either of those commands with my variables of interest?
Andrew Musau

Join Date: Oct 2014

Posts: 10275
#4

16 Mar 2023, 09:48

I do not see why not. What you call _b[interaction], i.e., the coefficient on the interaction term can be referenced. See the -coeflegend- option on how to do so.

Code:

logit outcome i.dummy1##i.dummy2, coeflegend

Otherwise, explain why you think predictnl won't work.
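
As a minimal sketch of how the factor-variable coefficient names shown by coeflegend can be used in a nonlinear expression, the example below computes the interaction effect on the probability scale as a double difference with nlcom. The data are simulated for illustration only (the seed and the data-generating coefficients are arbitrary assumptions, not the poster's data), and the expression is valid as written only because the model contains no other covariates.

```stata
* Simulated data for illustration only (not the poster's dataset)
clear
set obs 1500
set seed 12345
gen dummy1  = runiform() < 0.5
gen dummy2  = runiform() < 0.4
gen outcome = runiform() < invlogit(-1 + 0.5*dummy1 + 0.3*dummy2 + 0.4*dummy1*dummy2)

* Factor-variable specification; coeflegend lists the _b[] names
logit outcome i.dummy1##i.dummy2, coeflegend

* Double difference in predicted probabilities:
* [P(1,1) - P(0,1)] - [P(1,0) - P(0,0)], with a delta-method standard error
nlcom (ie: (invlogit(_b[_cons] + _b[1.dummy1] + _b[1.dummy2] + _b[1.dummy1#1.dummy2]) ///
          - invlogit(_b[_cons] + _b[1.dummy2]))                                       ///
         - (invlogit(_b[_cons] + _b[1.dummy1])                                        ///
          - invlogit(_b[_cons])))
```

With additional covariates in the model, the hand-written expression would have to include them as well, which is one reason to prefer margins for this task.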

Sarah Thea Smith

Join Date: Mar 2023
Posts: 14
#5

16 Mar 2023, 10:56

Thank you for your help. I didn't know about the coeflegend option before, and will certainly use it often in future.

Using the syntax you suggest in fact yields the same results as using the manual interaction variable when used in the predictnl code suggested by Karaca-Mandic, Norton and Dowd (2012), which surprised me given the above:

Code:

//USING COEFLEGEND 

logit outcome i.dummy1##i.dummy2, coeflegend


predictnl phat = (_b[1.dummy1] + _b[1.dummy1#1.dummy2]) * ///
                    (1/(1+exp(- (_b[_cons] + _b[1.dummy1]* 1.dummy1 + _b[1.dummy2] + _b[1.dummy1#1.dummy2]*1.dummy1))))* ///
                    (1 - (1/(1+exp(-(_b[_cons]+_b[1.dummy1]*1.dummy1 + _b[1.dummy2] + _b[1.dummy1#1.dummy2]*1.dummy1))))) ///
                    - _b[1.dummy1]*(1/(1+exp(-(_b[_cons]+_b[1.dummy1]*1.dummy1))))* ///
                    (1-(1/(1+exp(-(_b[_cons]+_b[1.dummy1]*1.dummy1))))), se(phat_se)
                    
                    
                    sum phat* // phat = -.0992968 / phat_se = .0538029
 
/*
Variable       Obs         Mean      Std. Dev.        Min          Max
                    
phat          1,518    -.0992968    .092123    -.1569106    .0479085
phat_se    1,518    .0538029    .0489037    .0232185    .1319472
*/                   

//USING MANUAL INTERACTION

gen interaction= dummy1*dummy2     
                
logit outcome dummy1 dummy2 interaction         
            
predictnl phat2 = (_b[dummy1] + _b[interaction]) * ///
                    (1/(1+exp(- (_b[_cons] + _b[dummy1]* dummy1 + _b[dummy2] + _b[interaction]*dummy1))))* ///
                    (1 - (1/(1+exp(-(_b[_cons]+_b[dummy1]*dummy1 + _b[dummy2] + _b[interaction]*dummy1))))) ///
                    - _b[dummy1]*(1/(1+exp(-(_b[_cons]+_b[dummy1]*dummy1))))* ///
                    (1-(1/(1+exp(-(_b[_cons]+_b[dummy1]*dummy1))))), se(phat2_se)


                    sum phat2*

/* 
Variable       Obs         Mean      Std. Dev.        Min          Max
                    
phat2          1,518    -.0992968    .092123    -.1569106    .0479085
phat2_se    1,518    .0538029    .0489037    .0232185    .1319472
*/

Unfortunately, I still seem to be able to use inteff only with the manual interaction variable (I have tried all kinds of other combinations but always get error messages):

Code:

logit outcome dummy1 dummy2 interaction 

                        
inteff outcome dummy1 dummy2 interaction 

/*
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
   _logit_ie |      1,151   -.1516624           0  -.1516624  -.1516624
   _logit_se |      1,151    .0656353           0   .0656353   .0656353
    _logit_z |      1,151   -2.310684           0  -2.310684  -2.310684
*/

But this way of specifying the command clearly leads to incorrect results, as the mean, min and max are identical for each of the estimated values, and the standard deviation is 0. The estimates also differ from the predictnl results. I assume all of this is likely due to the inclusion of the manual interaction variable.

Having read up on the two commands some more, inteff is probably better for my purposes considering my real models contain several control variables, and the predictnl command above confuses me as is (mathematical formulas aren't my strong suit). So if anyone has any idea how I may be able to use inteff while interacting two dummy variables I would love to hear them!

Thank you again!


Andrew Musau

Join Date: Oct 2014
Posts: 10275
#6

16 Mar 2023, 16:48

I think you misunderstand the output of inteff (Stata Journal). That is a really old command and the authors programmed it to calculate the statistics for each observation.

Description

The new command inteff calculates the interaction effect, standard error, and z-statistic for each observation for either logit or probit when two variables
have been interacted. The interacted variables cannot have higher order terms, such as squared terms. The command is designed to be run immediately after
fitting a logit or probit model.

So the interaction effect is given by _logit_ie, its standard error is given by _logit_se and the z-statistic is given by _logit_z. You do not need it with the introduction of margins. Here is how you can recreate its output with margins.

Code:

*CREATE A FAKE DATASET
clear
set obs 1500
set seed 03162023
gen outcome= runiformint(0,1)
gen dummy1= rnormal(1,5)<0
gen dummy2= rnormal(1,2)<0
gen interaction= dummy1*dummy2

*LOGIT + INTEFF
logit outcome dummy1 dummy2 interaction
inteff outcome dummy1 dummy2 interaction

*NOW WITH MARGINS AND FACTOR VARIABLES
logit outcome i.dummy1##i.dummy2
margins dummy2 , dydx(dummy1) pwcompare

Res.:

Code:

*LOGIT + INTEFF

. inteff outcome dummy1 dummy2 interaction
Logit with two dummy variables interacted
(0 observations deleted)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
   _logit_ie |      1,500    .0151134           0   .0151134   .0151134
   _logit_se |      1,500    .0573764           0   .0573764   .0573764
    _logit_z |      1,500    .2634089           0   .2634089   .2634089
 
.
.
. *NOW WITH MARGINS AND FACTOR VARIABLES

.
. margins dummy2 , dydx(dummy1) pwcompare

Pairwise comparisons of conditional marginal effects

Model VCE    : OIM                              Number of obs     =      1,500

Expression   : Pr(outcome), predict()
dy/dx w.r.t. : 1.dummy1

--------------------------------------------------------------
             |   Contrast Delta-method         Unadjusted
             |      dy/dx   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
0.dummy1     |  (base outcome)
-------------+------------------------------------------------
1.dummy1     |
      dummy2 |
     1 vs 0  |   .0151135   .0573764     -.0973421     .127569
--------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the
      base level.
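
The pairwise table above reports the contrast, its standard error and a confidence interval, but not the z statistic that inteff stores in _logit_z. A minimal sketch, assuming the same simulated data as in this post, is to request the effects table from pwcompare; this has not been run against the thread's output, so treat it as a suggestion rather than a verified result.

```stata
* Same simulated data as in the example above (illustration only)
clear
set obs 1500
set seed 03162023
gen outcome = runiformint(0,1)
gen dummy1  = rnormal(1,5) < 0
gen dummy2  = rnormal(1,2) < 0

logit outcome i.dummy1##i.dummy2

* pwcompare(effects) adds the z statistic and p-value to the contrast table
margins dummy2, dydx(dummy1) pwcompare(effects)
```

If the model also contains control variables, the same margins call averages over their observed values by default, so no hand-written formula is needed.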

Last edited by Andrew Musau; 16 Mar 2023, 16:51.


Sarah Thea Smith

Join Date: Mar 2023

Posts: 14
#7

17 Mar 2023, 03:57

Thank you so much, that has just saved me a lot of time, I really appreciate it.
Paul Mao

Join Date: Dec 2022

Posts: 6
#8

02 Sep 2023, 15:19

Originally posted by Andrew Musau

I think you misunderstand the output of inteff (Stata Journal). That is a really old command and the authors programmed it to calculate the statistics for each observation.

So the interaction effect is given by _logit_ie, its standard error is given by _logit_se and the z-statistic is given by _logit_z. You do not need it with the introduction of margins. Here is how you can recreate its output with margins.

Andrew, can the same logic be used for categorical by continuous interaction?

*LOGIT + INTEFF
logit outcome dummy1 continuous1 interaction
inteff outcome dummy1 continuous1 interaction

*NOW WITH MARGINS
logit outcome i.dummy1##c.continuous1
margins dummy1, at(continuous1=(a b c d e))

Would the z statistics and confidence intervals obtained with margins for the continuous values a, b, c, d, e be the same as the ones obtained with the inteff command for these observations? I am trying to understand whether we can use the margins command to show the statistical significance of the interaction effect at different values of the continuous variable.

Andrew Musau

Join Date: Oct 2014
Posts: 10275
#9

03 Sep 2023, 10:51

Originally posted by Paul Mao

*NOW WITH MARGINS
logit outcome i.dummy1##c.continuous1
margins dummy1, at(continuous1=(a b c d e))

Would the z statistics and confidence intervals obtained with margins for the continuous values a, b, c, d, e be the same as the ones obtained with the inteff command for these observations?

Correct. You can similarly extend the example in #6 to show this.

Code:

*CREATE A FAKE DATASET
clear
set obs 1500
set seed 09032023
gen outcome= runiformint(0,1)
gen dummy= rnormal(1,5)<0
gen contvar= runiformint(1,7)
gen interaction= contvar*dummy
*LOGIT + INTEFF
logit outcome contvar dummy interaction
inteff outcome contvar dummy interaction if contvar==3

*NOW WITH MARGINS AND FACTOR VARIABLES
logit outcome c.contvar##i.dummy
margins dummy, dydx(contvar) at(contvar==3) pwcompare

Res.:

Code:

*INTEFF
. inteff outcome contvar dummy interaction if contvar==3
Logit with one continuous and one dummy variable interacted

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   _logit_ie |        214     .018865           0    .018865    .018865
   _logit_se |        214    .0130807           0   .0130807   .0130807
    _logit_z |        214    1.442197           0   1.442197   1.442197

*MARGINS

. margins dummy, dydx(contvar) at(contvar==3) pwcompare

Pairwise comparisons of conditional marginal effects

Model VCE: OIM                                           Number of obs = 1,500

Expression: Pr(outcome), predict()
dy/dx wrt:  contvar
At: contvar = 3

--------------------------------------------------------------
             |   Contrast Delta-method         Unadjusted
             |      dy/dx   std. err.     [95% conf. interval]
-------------+------------------------------------------------
contvar      |
       dummy |
     1 vs 0  |    .018865   .0130807     -.0067728    .0445028
--------------------------------------------------------------

.
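
To look at the interaction effect at several values of the continuous variable rather than only at 3, one option is to loop the same margins call over a list of values. The sketch below reuses the simulated data from this post; the particular values 1, 3, 5 and 7 are arbitrary choices for illustration, and pwcompare(effects) is added so that each table also shows the z statistic and p-value.

```stata
* Same simulated data as in the example above (illustration only)
clear
set obs 1500
set seed 09032023
gen outcome = runiformint(0,1)
gen dummy   = rnormal(1,5) < 0
gen contvar = runiformint(1,7)

logit outcome c.contvar##i.dummy

* One pairwise table per chosen value of contvar
foreach v of numlist 1 3 5 7 {
    margins dummy, dydx(contvar) at(contvar=`v') pwcompare(effects)
}
```

Running margins once per value keeps each table to the single "1 vs 0" contrast of interest, at the cost of some repeated output.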


Paul Mao

Join Date: Dec 2022

Posts: 6
#10

03 Sep 2023, 19:37

Thank you Andrew. It is very clear.