  • Continuous interaction variables and margins

    Hi all,

    I've got an analysis setup that includes several interaction terms. All of the variables involved are continuous, but some are bounded (e.g., one variable lies between 0 and 1). I've been looking for a way to obtain marginal effects in Stata, but the "margins" command keeps raising errors about categorical/factor variables. I've searched high and low and cannot find out whether margins can be used to analyze marginal effects for an interaction of two continuous variables.

    To give a sense of what I'm doing, I have institutional variables and production of good X interacted, and I want to show that the marginal effects on my Y variable change with the institutional variable and the production of good X. That is, does the institutional variable change the effect of good X on my Y variable, and by how much?

    My question is: how can one obtain marginal effects in Stata for an interaction term containing two continuous variables? And does bounding one of the variables change the interaction?

  • #2
    Christopher,
    as per the FAQ, please post what you typed and what Stata gave you back. Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      If you have continuous variables, you need to specify the interactions with the c. prefix before each variable. Once you do that, margins should work just fine. Bounds on a variable don't matter to Stata (but might be important for interpretation). If c. doesn't fix everything, then post what you typed and what Stata gave back.



      • #4
        Something like this?

        Code:
        . sysuse auto, clear
        (1978 Automobile Data)
        
        . regress price c.mpg##c.headroom
        
              Source |       SS           df       MS      Number of obs   =        74
        -------------+----------------------------------   F(3, 70)        =      7.14
               Model |   148844287         3  49614762.3   Prob > F        =    0.0003
            Residual |   486221109        70  6946015.85   R-squared       =    0.2344
        -------------+----------------------------------   Adj R-squared   =    0.2016
               Total |   635065396        73  8699525.97   Root MSE        =    2635.5
        
        ----------------------------------------------------------------------------------
                   price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -----------------+----------------------------------------------------------------
                     mpg |  -42.41371   273.6704    -0.15   0.877     -588.232    503.4046
                headroom |   1224.462   1963.955     0.62   0.535    -2692.523    5141.446
                         |
        c.mpg#c.headroom |    -76.377   94.22537    -0.81   0.420    -264.3036    111.5496
                         |
                   _cons |   8119.723   6001.821     1.35   0.180    -3850.533    20089.98
        ----------------------------------------------------------------------------------
        
        . margins, dydx(mpg) at(mpg = (20(10)40) headroom = (2(1)5))
        
        Conditional marginal effects                    Number of obs     =         74
        Model VCE    : OLS
        
        Expression   : Linear prediction, predict()
        dy/dx w.r.t. : mpg
        
        1._at        : mpg             =          20
                       headroom        =           2
        
        2._at        : mpg             =          20
                       headroom        =           3
        
        3._at        : mpg             =          20
                       headroom        =           4
        
        4._at        : mpg             =          20
                       headroom        =           5
        
        5._at        : mpg             =          30
                       headroom        =           2
        
        6._at        : mpg             =          30
                       headroom        =           3
        
        7._at        : mpg             =          30
                       headroom        =           4
        
        8._at        : mpg             =          30
                       headroom        =           5
        
        9._at        : mpg             =          40
                       headroom        =           2
        
        10._at       : mpg             =          40
                       headroom        =           3
        
        11._at       : mpg             =          40
                       headroom        =           4
        
        12._at       : mpg             =          40
                       headroom        =           5
        
        ------------------------------------------------------------------------------
                     |            Delta-method
                     |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
        mpg          |
                 _at |
                  1  |  -195.1677   98.24459    -1.99   0.051    -391.1104    .7749436
                  2  |  -271.5447   60.54361    -4.49   0.000    -392.2951   -150.7943
                  3  |  -347.9217   124.2413    -2.80   0.007    -595.7132   -100.1303
                  4  |  -424.2987   212.0451    -2.00   0.049    -847.2093    -1.38817
                  5  |  -195.1677   98.24459    -1.99   0.051    -391.1104    .7749436
                  6  |  -271.5447   60.54361    -4.49   0.000    -392.2951   -150.7943
                  7  |  -347.9217   124.2413    -2.80   0.007    -595.7132   -100.1303
                  8  |  -424.2987   212.0451    -2.00   0.049    -847.2093    -1.38817
                  9  |  -195.1677   98.24459    -1.99   0.051    -391.1104    .7749436
                 10  |  -271.5447   60.54361    -4.49   0.000    -392.2951   -150.7943
                 11  |  -347.9217   124.2413    -2.80   0.007    -595.7132   -100.1303
                 12  |  -424.2987   212.0451    -2.00   0.049    -847.2093    -1.38817
        ------------------------------------------------------------------------------
        
        .
        And you could do a similar command to get marginal effects of headroom at specified values of mpg and headroom.
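
        Because the model is linear, the marginal effect of mpg is _b[mpg] + _b[c.mpg#c.headroom]*headroom, so it does not depend on mpg itself; that is why rows 1-4 of the table above repeat in rows 5-8 and 9-12. A minimal sketch that drops the redundant mpg grid, checks row 1 by hand (-42.41371 + 2*(-76.377) = -195.1677), and gets the corresponding effects of headroom. The at() grids are only illustrative choices:

        ```stata
        sysuse auto, clear
        regress price c.mpg##c.headroom

        * dy/dx of mpg varies only with headroom, so one at() dimension is enough
        margins, dydx(mpg) at(headroom = (2(1)5))

        * same quantity as row 1 of the margins table:
        * _b[mpg] + 2*_b[c.mpg#c.headroom]
        lincom mpg + 2*c.mpg#c.headroom

        * marginal effect of headroom, which varies only with mpg
        margins, dydx(headroom) at(mpg = (20(10)40))
        ```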



        • #5
          This thread has been quite useful to me. But I have a question: what do the p-values in the dy/dx table mean?



          • #6
            They test the null hypotheses that the corresponding marginal effects are zero. For example, since 1._at corresponds to mpg = 20 and headroom = 2, the first row of
            Code:
            ------------------------------------------------------------------------------
                         |            Delta-method
                         |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
            mpg          |
                     _at |
                      1  |  -195.1677   98.24459    -1.99   0.051    -391.1104    .7749436
                      2  |  -271.5447   60.54361    -4.49   0.000    -392.2951   -150.7943
                      3  |  -347.9217   124.2413    -2.80   0.007    -595.7132   -100.1303
                      4  |  -424.2987   212.0451    -2.00   0.049    -847.2093    -1.38817
                      5  |  -195.1677   98.24459    -1.99   0.051    -391.1104    .7749436
                      6  |  -271.5447   60.54361    -4.49   0.000    -392.2951   -150.7943
                      7  |  -347.9217   124.2413    -2.80   0.007    -595.7132   -100.1303
                      8  |  -424.2987   212.0451    -2.00   0.049    -847.2093    -1.38817
                      9  |  -195.1677   98.24459    -1.99   0.051    -391.1104    .7749436
                     10  |  -271.5447   60.54361    -4.49   0.000    -392.2951   -150.7943
                     11  |  -347.9217   124.2413    -2.80   0.007    -595.7132   -100.1303
                     12  |  -424.2987   212.0451    -2.00   0.049    -847.2093    -1.38817
            ------------------------------------------------------------------------------
            tells us that the marginal effect of mpg on price, conditional on mpg = 20 and headroom = 2, estimated at -195.1677, is not statistically significant at the 0.05 level (although it is a very near miss).

            I hasten to point out that null hypothesis tests about the values of such highly conditional marginal effects are often of no interest, unless they are explicitly part of the research goals. Usually these -margins, dydx()- outputs for continuous by continuous interactions are generated with the intent to graph them so as to get an overall understanding of the model.

            Added: The code box above is just a copy of the -margins, dydx()- output shown in #4.
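
            A minimal sketch of the graphing step, assuming the regression from #4. The headroom grid is an illustrative choice that spans the range of headroom in the auto data:

            ```stata
            sysuse auto, clear
            regress price c.mpg##c.headroom

            * marginal effect of mpg across a grid of headroom values
            quietly margins, dydx(mpg) at(headroom = (1.5(0.5)5))

            * plot the effects with their confidence intervals;
            * the horizontal reference line marks a zero effect
            marginsplot, yline(0)
            ```

            Read this way, the plot shows how the slope of price on mpg changes as headroom increases, which is usually more informative than the individual p-values.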
            Last edited by Clyde Schechter; 23 May 2018, 10:39.



            • #7
              schoolenroll=male+heightforagezscore+(heightforage *male)
              School enrollment is a binary variable. Height-for-age z-score is a continuous variable.

              I want to compare the effect of height on school enrollment for girls versus boys.

              prob schoolenroll i.male HeightZscore i.male#c.HeightZscore
              margins,dydx ( HeightZscoreP) at (male=(1 0)) atmeans

              Clyde Schechter, could you please check whether this is correct?
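
              A hedged sketch of how this kind of specification is usually written with factor-variable notation. It assumes the continuous variable has the same name, HeightZscore, in both commands (the post uses HeightZscore in one and HeightZscoreP in the other) and that male is a 0/1 indicator:

              ```stata
              * i.male##c.HeightZscore expands to
              * i.male c.HeightZscore i.male#c.HeightZscore
              probit schoolenroll i.male##c.HeightZscore

              * marginal effect of height on Pr(enrolled) for girls (male=0) and boys (male=1)
              margins, dydx(HeightZscore) at(male = (0 1))

              * the same effects with the other covariates held at their means
              margins, dydx(HeightZscore) at(male = (0 1)) atmeans
              ```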



              • #8
                Dear Clyde, perhaps you will find this question interesting and can offer some advice.

                We analyse how radiation dose affects a binary outcome (cancer) using logit with a mixture of predictors that interact with each other.
                The effect of radiation dose is of primary interest, but we suspect that dose interacts with sex and with age at exposure. So the Stata command might be written as

                logit cr c.dose##sex c.dose#c.age c.age##c.age sex#c.age

                which gives us the "pure" dose component and the interactions of dose with sex and age, along with the "pure" age and sex coefficients.

                This should be consistent with the basic model (please correct me if I'm wrong)

                y(cr=1)=const + b1*sex + b2*dose + b3*(sex*dose) + b4*age + b5*(age*age) + b6*(age*dose) + b7*(sex*age)

                If dose, sex, and age all interact with one another, can we use the following shorter command to describe the modified effect of dose?

                logit cr c.age##sex c.dose#c.age#c.age#sex

                What is (in theory) the difference between the dose#age#sex,and the dose#age + dose#sex + sex#age components?



                • #9
                  It is almost certainly a serious mis-specification of the model to include interaction of dose or sex with the linear age variable but not the quadratic component. In a quadratic model, the linear component by itself is fairly exotic in its interpretation and I have great difficulty imagining any circumstance where interaction with the linear component alone would make any sense. A valid model for your situation would be:
                  Code:
                  logit cr c.dose##c.age##c.age##i.sex
                  This will expand to include the interactions of both dose and sex with both linear and quadratic age terms, as well as the dose#sex interaction.

                  What is (in theory) the difference between the dose#age#sex,and the dose#age + dose#sex + sex#age components?
                  There is no difference between these. However, as already indicated, because the quadratic component of age is omitted, both versions are equally wrong.



                  • #10
                    Dear Clyde, thank you for the clarification. No doubt the polynomial age interaction must be included in the model.

                    Assuming that
                    There is no difference between these
                    the question is whether the model you provided

                    Code:

                    logit cr c.dose##c.age##c.age##i.sex
                    can be written as follows:

                    Code:
                     
                     logit cr c.dose i.sex c.age c.age#c.age c.dose#i.sex c.dose#c.age c.dose#c.age#c.age i.sex#c.age i.sex#c.age#c.age
                    Here, all possible interactions are accounted for. However, the output (and the dose coeffs) will be quite different in both models.

                    I'm trying to understand the meaning of the # and ## operators: how will dose##sex and dose#sex be reflected in the general model
                    Code:
                     y (cr) = const + b1*sex + b2*age + ... + bi*xk 
                    In other words, what do # and ## mathematically do with the variables?



                    • #11
                      Code:
                      logit cr c.dose i.sex c.age c.age#c.age c.dose#i.sex c.dose#c.age c.dose#c.age#c.age i.sex#c.age i.sex#c.age#c.age
                      // IS, INDEED, EQUIVALENT TO
                      logit cr c.dose##c.age##c.age##i.sex
                      However, the output (and the dose coeffs) will be quite different in both models.
                      No, they should not be different. Please show complete, exact output of both regressions and the exact code that was used to generate them. The order in which the coefficients are presented might be different, but the values of corresponding coefficients should be exactly the same.

                      How will dose##sex and dose#sex be reflected in the general model
                      -dose##sex- is equivalent to -dose sex dose#sex-. In terms of the regression equation, dose#sex corresponds to the dose*sex term, whereas dose##sex corresponds to the dose, sex, and dose*sex terms.
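
                      A minimal sketch of that equivalence on the auto data, with mpg standing in for dose and foreign for sex (the stand-in variables and the stored-estimate names are illustrative assumptions):

                      ```stata
                      sysuse auto, clear

                      * compact form: ## adds both main effects and their product
                      quietly regress price c.mpg##i.foreign
                      estimates store compact

                      * spelled-out form: main effects plus the # product term
                      quietly regress price c.mpg i.foreign c.mpg#i.foreign
                      estimates store spelled

                      * side-by-side coefficients and standard errors
                      estimates table compact spelled, b(%9.3f) se(%9.3f)
                      ```

                      The two columns should agree term by term, since both commands fit the same model.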



                      • #12
                        Dear Clyde, I appreciate your help, thank you! The data sample has 10,000 individual records. The results are shown below:

                        Output of the 1st (short) model @Clyde

                        Code:
                        . logit cr c.dose##c.age##c.age##i.sex,or
                        
                        Iteration 0:   log likelihood =  -4562.032  
                        Iteration 1:   log likelihood = -4216.0365  
                        Iteration 2:   log likelihood = -4148.5849  
                        Iteration 3:   log likelihood = -4141.1218  
                        Iteration 4:   log likelihood = -4140.9693  
                        Iteration 5:   log likelihood = -4140.9692  
                        
                        Logistic regression                                     Number of obs = 10,000
                                                                                LR chi2(11)   = 842.13
                                                                                Prob > chi2   = 0.0000
                        Log likelihood = -4140.9692                             Pseudo R2     = 0.0923
                        
                        ----------------------------------------------------------------------------------------
                                            cr | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
                        -----------------------+----------------------------------------------------------------
                                          dose |    1.20182   .0615702     3.59   0.000     1.087005    1.328762
                                           age |   1.156728   .0241439     6.98   0.000     1.110362    1.205031
                                               |
                                  c.dose#c.age |    .994604   .0017923    -3.00   0.003     .9910974    .9981231
                                               |
                                   c.age#c.age |   .9991984    .000174    -4.60   0.000     .9988574    .9995395
                                               |
                            c.dose#c.age#c.age |   1.000043   .0000156     2.73   0.006     1.000012    1.000073
                                               |
                                         1.sex |   15.30515   11.94568     3.50   0.000     3.314928    70.66446
                                               |
                                    sex#c.dose |
                                            1  |   .7285395   .0752805    -3.07   0.002     .5949738    .8920893
                                               |
                                     sex#c.age |
                                            1  |    .926632   .0247154    -2.86   0.004     .8794352    .9763619
                                               |
                              sex#c.dose#c.age |
                                            1  |   1.010385   .0034988     2.98   0.003     1.003551    1.017266
                                               |
                               sex#c.age#c.age |
                                            1  |   1.000433   .0002241     1.93   0.053     .9999935    1.000872
                                               |
                        sex#c.dose#c.age#c.age |
                                            1  |   .9999203   .0000287    -2.78   0.005     .9998641    .9999765
                                               |
                                         _cons |   .0007362   .0004531   -11.72   0.000     .0002204    .0024598
                        ----------------------------------------------------------------------------------------
                        Note: _cons estimates baseline odds.
                        Please note the extremely high odds ratio for 1.sex (sex = 1, female).

                        and the output of the 2nd (long) model @me

                        Code:
                        logit cr c.dose i.sex c.age c.age#c.age c.dose#i.sex c.dose#c.age c.dose#c.age#c.age i.sex#c.age i.sex#c.age#c.age,or
                        
                        Iteration 0:   log likelihood =  -4562.032  
                        Iteration 1:   log likelihood = -4219.3374  
                        Iteration 2:   log likelihood = -4154.2363  
                        Iteration 3:   log likelihood = -4147.4464  
                        Iteration 4:   log likelihood = -4147.3435  
                        Iteration 5:   log likelihood = -4147.3435  
                        
                        Logistic regression                                     Number of obs = 10,000
                                                                                LR chi2(9)    = 829.38
                                                                                Prob > chi2   = 0.0000
                        Log likelihood = -4147.3435                             Pseudo R2     = 0.0909
                        
                        ------------------------------------------------------------------------------------
                                        cr | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
                        -------------------+----------------------------------------------------------------
                                      dose |   1.090039   .0438321     2.14   0.032     1.007427    1.179424
                                     1.sex |   3.784193   2.479557     2.03   0.042     1.047687    13.66832
                                       age |   1.134298   .0211647     6.75   0.000     1.093565    1.176547
                                           |
                               c.age#c.age |   .9993472   .0001564    -4.17   0.000     .9990407    .9996538
                                           |
                                sex#c.dose |
                                        1  |   1.006671    .005437     1.23   0.218     .9960707    1.017384
                                           |
                              c.dose#c.age |   .9977421   .0013835    -1.63   0.103     .9950342    1.000457
                                           |
                        c.dose#c.age#c.age |   1.000018   .0000118     1.55   0.122     .9999951    1.000041
                                           |
                                 sex#c.age |
                                        1  |   .9688044   .0217323    -1.41   0.158     .9271327    1.012349
                                           |
                           sex#c.age#c.age |
                                        1  |   1.000093   .0001884     0.49   0.623     .9997236    1.000462
                                           |
                                     _cons |   .0013602   .0007444   -12.06   0.000     .0004653     .003976
                        ------------------------------------------------------------------------------------



                        • #13
                          Sorry, my mistake in #11. I didn't read your extended model carefully enough. Compared to c.dose##c.age##c.age##i.sex, it is missing the following terms:

                          c.dose#c.age#i.sex and c.dose#c.age#c.age#i.sex.

                          Those omissions in the second model account for the difference. You need to decide which model is the one you want. In the ## model, dose is fully interacted with combinations of age and sex, whereas in the extended model that omits the two terms shown above, dose is separately interacted with age and with sex. So, in the extended model, the extent to which sex modifies the dose effect is the same at all ages, and the extent to which age modifies the dose effect is the same in both sexes. In the ## model, the extent to which age modifies the dose effect does (i.e. can) differ across the sexes, and the extent to which sex modifies the dose effect varies by age.

                          The ease with which it is possible to mistakenly omit terms like this, and the difficulty of fully understanding a long command with the separate terms written out, is one of the strongest arguments for using the ## notation in preference to the # notation in most circumstances.
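
                          Because the extended model is nested in the ## model, the two can also be compared with a likelihood-ratio test. A sketch, assuming the variables cr, dose, age, and sex from #12 are in memory (the names full and reduced are illustrative). From the log likelihoods reported in #12, the statistic is 2*(-4140.9692 - (-4147.3435)) = 12.7486 on 11 - 9 = 2 degrees of freedom:

                          ```stata
                          * full factorial model
                          quietly logit cr c.dose##c.age##c.age##i.sex
                          estimates store full

                          * extended model from #12, which omits the two sex-specific dose-by-age terms
                          quietly logit cr c.dose i.sex c.age c.age#c.age c.dose#i.sex c.dose#c.age ///
                              c.dose#c.age#c.age i.sex#c.age i.sex#c.age#c.age
                          estimates store reduced

                          * likelihood-ratio test of the two omitted terms
                          lrtest full reduced

                          * the same test by hand from the reported log likelihoods
                          display 2*(-4140.9692 - (-4147.3435))
                          display chi2tail(2, 2*(-4140.9692 - (-4147.3435)))
                          ```

                          Such a test only describes fit; it does not replace the substantive choice between the two models.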
                          Last edited by Clyde Schechter; 18 Apr 2023, 11:19.



                          • #14
                            Well, Clyde, you're welcome to suggest the best ## model in this particular case. I'll test it and (I think it will be useful for readers) show the results.
                            Is that updated model correct?
                            Code:
                            logit cr c.dose##c.age##c.age##i.sex c.dose#c.age#i.sex c.dose#c.age#c.age#i.sex
                            Output:

                            Code:
                             logit cr c.dose##c.age##c.age##i.sex c.dose#c.age#i.sex c.dose#c.age#c.age#i.sex
                            
                            Iteration 0:   log likelihood =  -4562.032  
                            Iteration 1:   log likelihood = -4216.0365  
                            Iteration 2:   log likelihood = -4148.5849  
                            Iteration 3:   log likelihood = -4141.1218  
                            Iteration 4:   log likelihood = -4140.9693  
                            Iteration 5:   log likelihood = -4140.9692  
                            
                            Logistic regression                                     Number of obs = 10,000
                                                                                    LR chi2(11)   = 842.13
                                                                                    Prob > chi2   = 0.0000
                            Log likelihood = -4140.9692                             Pseudo R2     = 0.0923
                            
                            ----------------------------------------------------------------------------------------
                                                cr | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                            -----------------------+----------------------------------------------------------------
                                              dose |    .183837   .0512308     3.59   0.000     .0834266    .2842475
                                               age |   .1455955   .0208726     6.98   0.000     .1046861     .186505
                                                   |
                                      c.dose#c.age |  -.0054106    .001802    -3.00   0.003    -.0089425   -.0018787
                                                   |
                                       c.age#c.age |  -.0008019   .0001742    -4.60   0.000    -.0011433   -.0004606
                                                   |
                                c.dose#c.age#c.age |   .0000425   .0000156     2.73   0.006      .000012    .0000731
                                                   |
                                             1.sex |   2.728189   .7805008     3.50   0.000     1.198436    4.257943
                                                   |
                                        sex#c.dose |
                                                1  |  -.3167135   .1033307    -3.07   0.002    -.5192378   -.1141891
                                                   |
                                         sex#c.age |
                                                1  |  -.0761987   .0266723    -2.86   0.004    -.1284754    -.023922
                                                   |
                                  sex#c.dose#c.age |
                                                1  |   .0103314   .0034629     2.98   0.003     .0035443    .0171185
                                                   |
                                   sex#c.age#c.age |
                                                1  |   .0004327   .0002241     1.93   0.053    -6.46e-06    .0008718
                                                   |
                            sex#c.dose#c.age#c.age |
                                                1  |  -.0000797   .0000287    -2.78   0.005    -.0001359   -.0000235
                                                   |
                                             _cons |   -7.21395   .6154648   -11.72   0.000    -8.420239   -6.007661
                            ----------------------------------------------------------------------------------------



                            • #15
                              When you have c.dose##c.age##c.age##i.sex, you don't need to also mention c.dose#c.age#i.sex c.dose#c.age#c.age#i.sex: Stata will create those for you "for free." Whenever you use the ## operator, you automatically get the interaction term plus the "main" terms. When it's a 3-or-higher level interaction specified with ##s, you get the highest level interaction, all of the "main" terms, and all of the lower level interactions that are included.

                              It's not my place to tell you which model to use: you have to pick the model that corresponds to how you think the real data generating process works. That difference is not a technical coding matter, nor a statistical issue: it's a substantive difference in what you think is going on in the real world. I don't even know what your outcome variable cr is, so I couldn't begin to guess how it relates to those things. Even if I did know what cr is, chances are I do not know enough about it, and about whatever it is you are dosing, to guess how these things all inter-relate. But as the researcher in this area, you must have some understanding that would lean you towards one model or the other. If you believe that the way in which the effect of dose depends on age is also sex-specific, then you want the full-blown -c.dose##c.age##c.age##i.sex- model. If you think that the way in which the effect of dose depends on age is the same for both sexes, even though sex itself modifies the dose effect, then you want a smaller model, which could be specified compactly with -c.dose##(c.age##c.age i.sex)-.
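
                              One way to see exactly which terms each specification generates is to expand the factor-variable list without fitting anything. A sketch using fvexpand on the auto data, with mpg, weight, and foreign as illustrative stand-ins for dose, age, and sex:

                              ```stata
                              sysuse auto, clear

                              * full factorial: every interaction among mpg, the weight quadratic, and foreign
                              fvexpand c.mpg##c.weight##c.weight##i.foreign
                              display "`r(varlist)'"

                              * compact smaller model: mpg interacted separately with the
                              * weight quadratic and with foreign
                              fvexpand c.mpg##(c.weight##c.weight i.foreign)
                              display "`r(varlist)'"
                              ```

                              Comparing the two displayed lists shows which interaction terms the smaller model leaves out.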
