
  • Different ways of specifying interaction lead to different AMEs which is causing issues with predictnl

    I am currently trying to figure out the predictnl command. I want to present the results of a logistic regression as AMEs in my regression table, and because my regression includes an interaction term, I need predictnl to calculate an AME "coefficient" and standard error for the interaction term in the table. In doing so, however, I have found that different ways of specifying the interaction lead to different AMEs.

    To illustrate my question/dilemma, I have created the following example. As you can see, all variables in question are dummy variables. I am using StataSE version 16.1.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(outcome dummy1 dummy2)
    0 0 0
    . 0 0
    0 1 0
    0 0 0
    0 0 1
    0 0 1
    0 0 1
    0 0 1
    0 1 .
    0 0 1
    0 0 0
    . . 1
    0 0 0
    . . 0
    1 . 1
    . . 0
    . 1 0
    0 0 1
    0 1 1
    0 0 .
    0 0 .
    0 1 0
    0 0 1
    1 1 0
    1 1 .
    0 0 0
    1 1 0
    0 1 0
    0 0 0
    0 0 0
    0 1 0
    0 0 1
    0 0 1
    0 0 1
    0 . 0
    . . 0
    0 . 1
    . 0 0
    0 0 1
    1 0 0
    0 0 0
    0 1 0
    0 1 .
    0 0 0
    0 0 1
    . . 1
    1 1 1
    . 0 0
    0 0 1
    0 0 1
    0 0 1
    1 . 1
    0 1 0
    0 . 1
    . . .
    . . 0
    1 0 0
    0 0 0
    0 0 0
    0 0 1
    . 0 1
    0 . 0
    0 1 0
    1 0 0
    0 0 1
    0 0 .
    0 1 0
    0 0 0
    0 1 0
    0 0 1
    0 0 0
    0 0 1
    0 0 1
    0 0 1
    1 1 0
    0 1 0
    0 . 1
    . . .
    0 0 0
    0 0 0
    0 . 1
    0 0 1
    0 0 0
    0 0 0
    0 0 1
    0 0 0
    0 0 1
    0 0 0
    0 0 .
    0 1 0
    0 0 1
    . . 1
    0 1 .
    . 0 1
    1 0 0
    1 1 .
    1 0 0
    . . 1
    0 0 1
    0 1 0
    end
    label values outcome _statalist
    label values dummy1 _statalist
    label values dummy2 _statalist
    label def _statalist 0 "0", modify
    label def _statalist 1 "1", modify
    As far as I can tell, in order to use predictnl, I need to create a new variable for my interaction:

    Code:
     gen interaction = dummy1*dummy2
    Doing so does not change the logistic regression output when I compare models run with this "interaction" variable to models run specifying the interaction with # or ##. However, and this is where I am very confused at this point, I have noticed that different ways of specifying the interaction lead to different AMEs being predicted (I have added the value Stata predicts after each margins command below):

    Code:
    //SPECIFICATION 1 - using "interaction" variable WITHOUT "i" in front of dummies
    logit outcome dummy1 dummy2 interaction
            margins, dydx(dummy1) at (dummy2=1) // .1562664
    
    //SPECIFICATION 2 - using "interaction" variable WITH "i" in front of dummies
    logit outcome i.dummy1 i.dummy2 interaction
            margins, dydx(dummy1) at (dummy2=1) //.195811
            
    //SPECIFICATION 3 - using ##
    logit outcome i.dummy1##i.dummy2
            margins, dydx(dummy1) at (dummy2=1) //.2918876
            
    //SPECIFICATION 4 - using #
    logit outcome i.dummy1 i.dummy2 i.dummy1#i.dummy2
            margins, dydx(dummy1) at (dummy2=1) //.2918876
    In class, I was taught that all of these ways of specifying the interaction should lead to the same results and they do when it comes to the log-odds predictions, but as you can see, the AMEs are different when I use the "interaction" variable (specifications 1 and 2), and also change depending on whether I add "i." in front of the dummy variables when using the interaction variable. I am inclined to trust the results of specifications 3 and 4 most, as this is how I have usually seen interactions be specified here, and because they yield the same results, but this presents me with the issue that I have to use specification 1 to be able to use predictnl in the way I want:

    Code:
        logit outcome dummy1 dummy2 interaction
    
    predictnl phat = (_b[dummy1] + _b[interaction]) * ///
                        (1/(1+exp(- (_b[_cons] + _b[dummy1]* dummy1 + _b[dummy2] + _b[interaction]*dummy1))))* ///
                        (1 - (1/(1+exp(-(_b[_cons]+_b[dummy1]*dummy1 + _b[dummy2] + _b[interaction]*dummy1))))) ///
                        - _b[dummy1]*(1/(1+exp(-(_b[_cons]+_b[dummy1]*dummy1))))* ///
                        (1-(1/(1+exp(-(_b[_cons]+_b[dummy1]*dummy1))))), se(phat_se)
        
                    sum phat*
    (Code taken from page 272 of Karaca-Mandic, P., Norton, E. C. and Dowd, B. (2012). Interaction terms in nonlinear models. Health Services Research, 47, 255–274.)

    My questions are therefore (i) why do the margins results differ across these interaction specifications, (ii) which of the AMEs from the 4 specifications above should I "trust", and (iii) how can I use predictnl while still getting accurate results or, alternatively, how can I calculate an AME "coefficient" and standard error for the interaction term in a different way?

    Any help would be much appreciated, thank you!

    Last edited by Sarah Thea Smith; 16 Mar 2023, 07:19. Reason: Added tags

  • #2
    If you create the interaction manually, margins will get the computation wrong because it does not know that the interaction term is related to the dummies. So #1 and #2 are incorrect. #3 and #4 are equivalent syntaxes.
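
    As a minimal sketch of why this happens, assuming the example data from #1 are in memory: with factor-variable notation, margins switches dummy1 from 0 to 1 and lets the interaction term change with it. With a hand-made interaction variable, margins holds that variable at its observed values while dummy1 changes (and, without the i. prefix, treats dummy1 as continuous and takes a derivative). The hand calculation below should reproduce the margins result only because this toy model has no other covariates.

    ```stata
    * Sketch only: assumes the dataex example from #1 is loaded
    logit outcome i.dummy1##i.dummy2

    * margins moves dummy1 from 0 to 1 and lets the interaction follow
    margins, dydx(dummy1) at(dummy2=1)

    * The same discrete change computed by hand from the coefficients
    display invlogit(_b[_cons] + _b[1.dummy1] + _b[1.dummy2] + _b[1.dummy1#1.dummy2]) ///
          - invlogit(_b[_cons] + _b[1.dummy2])
    ```

    With additional covariates in the model the hand calculation would have to be averaged over the sample, which is what margins does automatically.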



    • #3
      Thank you for your response!

      It seems as if both inteff and predictnl don't work unless I create the interaction manually, although both commands can be used for dummy#dummy interactions as far as I understand. Does anyone know how I can use either of those commands with my variables of interest?



      • #4
        I do not see why not. What you call _b[interaction], i.e., the coefficient on the interaction term can be referenced. See the -coeflegend- option on how to do so.

        Code:
        logit outcome i.dummy1##i.dummy2, coeflegend
        Otherwise, explain why you think predictnl won't work.
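
        As a small illustration of what the legend gives you (a sketch, assuming the example data from #1), the names shown by coeflegend can be used directly in expressions and in post-estimation commands such as lincom:

        ```stata
        * Sketch only: assumes the dataex example from #1 is loaded
        logit outcome i.dummy1##i.dummy2, coeflegend

        * The interaction coefficient, referenced by its legend name
        display _b[1.dummy1#1.dummy2]

        * Log-odds effect of dummy1 when dummy2 == 1
        lincom 1.dummy1 + 1.dummy1#1.dummy2
        ```

        Both results are on the log-odds scale, not the probability scale.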



        • #5
          Thank you for your help, I didn't know about the coeflegend option before, and will certainly use it often in future.

          Using the syntax you suggest in fact yields the same results as using the manual interaction variable when used in the predictnl code suggested by Karaca-Mandic, Norton and Dowd (2012), which surprised me given the above:

          Code:
          //USING COEFLEGEND 
          
          logit outcome i.dummy1##i.dummy2, coeflegend
          
          
          predictnl phat = (_b[1.dummy1] + _b[1.dummy1#1.dummy2]) * ///
                              (1/(1+exp(- (_b[_cons] + _b[1.dummy1]* 1.dummy1 + _b[1.dummy2] + _b[1.dummy1#1.dummy2]*1.dummy1))))* ///
                              (1 - (1/(1+exp(-(_b[_cons]+_b[1.dummy1]*1.dummy1 + _b[1.dummy2] + _b[1.dummy1#1.dummy2]*1.dummy1))))) ///
                              - _b[1.dummy1]*(1/(1+exp(-(_b[_cons]+_b[1.dummy1]*1.dummy1))))* ///
                              (1-(1/(1+exp(-(_b[_cons]+_b[1.dummy1]*1.dummy1))))), se(phat_se)
                              
                              
                              sum phat* // phat = -.0992968 / phat_se = .0538029
           
          /*
          Variable       Obs         Mean      Std. Dev.        Min          Max
                              
          phat          1,518    -.0992968    .092123    -.1569106    .0479085
          phat_se    1,518    .0538029    .0489037    .0232185    .1319472
          */                   
          
          //USING MANUAL INTERACTION
          
          gen interaction= dummy1*dummy2     
                          
          logit outcome dummy1 dummy2 interaction         
                      
          predictnl phat2 = (_b[dummy1] + _b[interaction]) * ///
                              (1/(1+exp(- (_b[_cons] + _b[dummy1]* dummy1 + _b[dummy2] + _b[interaction]*dummy1))))* ///
                              (1 - (1/(1+exp(-(_b[_cons]+_b[dummy1]*dummy1 + _b[dummy2] + _b[interaction]*dummy1))))) ///
                              - _b[dummy1]*(1/(1+exp(-(_b[_cons]+_b[dummy1]*dummy1))))* ///
                              (1-(1/(1+exp(-(_b[_cons]+_b[dummy1]*dummy1))))), se(phat2_se)
          
          
                              sum phat2*
          
          /*
          Variable       Obs         Mean      Std. Dev.        Min          Max
                              
          phat2          1,518    -.0992968    .092123    -.1569106    .0479085
          phat2_se    1,518    .0538029    .0489037    .0232185    .1319472
          */
          Unfortunately, I still seem to only be able to use inteff using the manual dummy variable (I have tried all kinds of other combinations but always get error messages):

          Code:
          logit outcome dummy1 dummy2 interaction 
          
                                  
          inteff outcome dummy1 dummy2 interaction 
          
          /*
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
             _logit_ie |      1,151   -.1516624           0  -.1516624  -.1516624
             _logit_se |      1,151    .0656353           0   .0656353   .0656353
              _logit_z |      1,151   -2.310684           0  -2.310684  -2.310684
          */
          But this way of specifying the command clearly leads to incorrect results, as the mean, min and max are identical for each of the estimated values, and the standard deviation is 0. The estimates also differ from the predictnl results. I assume all of this is likely due to the inclusion of the manual interaction variable.

          Having read up on the two commands some more, inteff is probably better for my purposes, considering that my real models contain several control variables and that the predictnl command above confuses me as it is (mathematical formulas aren't my strong suit). So if anyone has any ideas on how I may be able to use inteff while interacting two dummy variables, I would love to hear them!

          Thank you again!



          • #6
            I think you misunderstand the output of inteff (Stata Journal). That is a really old command and the authors programmed it to calculate the statistics for each observation.


            Description

            The new command inteff calculates the interaction effect, standard error, and z-statistic for each observation for either logit or probit when two variables
            have been interacted. The interacted variables cannot have higher order terms, such as squared terms. The command is designed to be run immediately after
            fitting a logit or probit model.
            So the interaction effect is given by _logit_ie, its standard error is given by _logit_se and the z-statistic is given by _logit_z. You do not need it with the introduction of margins. Here is how you can recreate its output with margins.


            Code:
            *CREATE A FAKE DATASET
            clear
            set obs 1500
            set seed 03162023
            gen outcome= runiformint(0,1)
            gen dummy1= rnormal(1,5)<0
            gen dummy2= rnormal(1,2)<0
            gen interaction= dummy1*dummy2
            
            *LOGIT + INTEFF
            logit outcome dummy1 dummy2 interaction
            inteff outcome dummy1 dummy2 interaction
            
            *NOW WITH MARGINS AND FACTOR VARIABLES
            logit outcome i.dummy1##i.dummy2
            margins dummy2 , dydx(dummy1) pwcompare
            Res.:

            Code:
            *LOGIT + INTEFF
            
            . inteff outcome dummy1 dummy2 interaction
            Logit with two dummy variables interacted
            (0 observations deleted)
            
                Variable |        Obs        Mean    Std. Dev.       Min        Max
            -------------+---------------------------------------------------------
               _logit_ie |      1,500    .0151134           0   .0151134   .0151134
               _logit_se |      1,500    .0573764           0   .0573764   .0573764
                _logit_z |      1,500    .2634089           0   .2634089   .2634089
             
            .
            .
            . *NOW WITH MARGINS AND FACTOR VARIABLES
            
            .
            . margins dummy2 , dydx(dummy1) pwcompare
            
            Pairwise comparisons of conditional marginal effects
            
            Model VCE    : OIM                              Number of obs     =      1,500
            
            Expression   : Pr(outcome), predict()
            dy/dx w.r.t. : 1.dummy1
            
            --------------------------------------------------------------
                         |   Contrast Delta-method         Unadjusted
                         |      dy/dx   Std. Err.     [95% Conf. Interval]
            -------------+------------------------------------------------
            0.dummy1     |  (base outcome)
            -------------+------------------------------------------------
            1.dummy1     |
                  dummy2 |
                 1 vs 0  |   .0151135   .0573764     -.0973421     .127569
            --------------------------------------------------------------
            Note: dy/dx for factor levels is the discrete change from the
                  base level.
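
            For readers who want to see what this contrast is, here is a hedged sketch rather than a definitive implementation: for two dummies, the interaction effect is a double difference in predicted probabilities, and in a model with no other covariates it can be written out with nlcom. The label ie is only an illustrative name. The predictnl expression in #5 appears to be a different quantity (it differentiates with respect to dummy1 instead of taking a discrete difference), which would explain why its mean does not match the inteff value.

            ```stata
            * Sketch only: valid as written when the model has no other covariates
            logit outcome i.dummy1##i.dummy2

            * Double difference in predicted probabilities, with a delta-method SE
            nlcom (ie: (invlogit(_b[_cons] + _b[1.dummy1] + _b[1.dummy2] + _b[1.dummy1#1.dummy2]) ///
                      - invlogit(_b[_cons] + _b[1.dummy2]))                                     ///
                    - (invlogit(_b[_cons] + _b[1.dummy1]) - invlogit(_b[_cons])))
            ```

            With control variables in the model, the margins command above remains the simpler route, since it averages over the sample for you.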
            Last edited by Andrew Musau; 16 Mar 2023, 16:51.



            • #7
              Thank you so much, that has just saved me a lot of time, I really appreciate it.



              • #8
                Originally posted by Andrew Musau
                I think you misunderstand the output of inteff (Stata Journal). That is a really old command and the authors programmed it to calculate the statistics for each observation.



                So the interaction effect is given by _logit_ie, its standard error is given by _logit_se and the z-statistic is given by _logit_z. You do not need it with the introduction of margins. Here is how you can recreate its output with margins.
                Andrew, can the same logic be used for categorical by continuous interaction?

                *LOGIT + INTEFF
                logit outcome dummy1 continuous1 interaction
                inteff outcome dummy1 continuous1 interaction

                *NOW WITH MARGINS
                logit outcome i.dummy1##c.continuous1
                margins dummy1, at(continuous1=(a b c d e))

                Would the z statistics and confidence intervals obtained with margins for the continuous values a, b, c, d, e be the same as the ones obtained with the inteff command for these observations? I am trying to understand whether we can use the margins command to show the statistical significance of the interaction effect at different values of the continuous variable.



                • #9
                  Originally posted by Paul Mao

                  *NOW WITH MARGINS
                  logit outcome i.dummy1##c.continuous1
                  margins dummy1, at(continuous1=(a b c d e))

                  Would the z statistics and confidence intervals obtained with margins for the continuous values a, b, c, d, e be the same as the ones obtained with the inteff command for these observations?
                  Correct. You can similarly extend the example in #6 to show this.

                  Code:
                  *CREATE A FAKE DATASET
                  clear
                  set obs 1500
                  set seed 09032023
                  gen outcome= runiformint(0,1)
                  gen dummy= rnormal(1,5)<0
                  gen contvar= runiformint(1,7)
                  gen interaction= contvar*dummy
                  *LOGIT + INTEFF
                  logit outcome contvar dummy interaction
                  inteff outcome contvar dummy interaction if contvar==3
                  
                  *NOW WITH MARGINS AND FACTOR VARIABLES
                  logit outcome c.contvar##i.dummy
                  margins dummy, dydx(contvar) at(contvar==3) pwcompare
                  Res.:

                  Code:
                  *INTEFF
                  . inteff outcome contvar dummy interaction if contvar==3
                  Logit with one continuous and one dummy variable interacted
                  
                      Variable |        Obs        Mean    Std. dev.       Min        Max
                  -------------+---------------------------------------------------------
                     _logit_ie |        214     .018865           0    .018865    .018865
                     _logit_se |        214    .0130807           0   .0130807   .0130807
                      _logit_z |        214    1.442197           0   1.442197   1.442197
                  
                  *MARGINS
                  
                  . margins dummy, dydx(contvar) at(contvar==3) pwcompare
                  
                  Pairwise comparisons of conditional marginal effects
                  
                  Model VCE: OIM                                           Number of obs = 1,500
                  
                  Expression: Pr(outcome), predict()
                  dy/dx wrt:  contvar
                  At: contvar = 3
                  
                  --------------------------------------------------------------
                               |   Contrast Delta-method         Unadjusted
                               |      dy/dx   std. err.     [95% conf. interval]
                  -------------+------------------------------------------------
                  contvar      |
                         dummy |
                       1 vs 0  |    .018865   .0130807     -.0067728    .0445028
                  --------------------------------------------------------------
                  
                  .
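
                  To look at several values of the continuous variable in one call, one possible variation is sketched below. It assumes the simulated data from this post are in memory, and it swaps pwcompare for the r. contrast operator, which is a substitution for illustration and not the command used above. The values 1, 3, 5 and 7 are arbitrary. Note that dydx() is essential here: without it, margins reports predicted probabilities rather than effects.

                  ```stata
                  * Sketch only: assumes the simulated data from this post are loaded
                  logit outcome c.contvar##i.dummy

                  * r.dummy requests the 1 vs 0 contrast of the marginal effect of
                  * contvar, evaluated at each value listed in at()
                  margins r.dummy, dydx(contvar) at(contvar=(1(2)7))
                  ```

                  Each contrast comes with a delta-method standard error and confidence interval, which is what one would compare against the inteff output restricted to the same value of contvar.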



                  • #10
                    Thank you Andrew. It is very clear.

