Interaction term with a multiple categorical variable and a continuous variable

Sebas Kalkman

Join Date: Jan 2020
Posts: 14

28 Jan 2020, 16:25

What I am thinking of is a regression with interaction terms: Tobin's Q = ESG + Total Assets + Leverage + year dummy + industry dummy + (ESG * industry dummy)
Tobin's Q = profitability
ESG = measurement of corporate social responsibility
I specifically want to look at whether the effect of ESG on Tobin's Q differs across industries.

My questions:
1. Is it possible to interact a categorical variable with a continuous variable? I have a multi-group categorical variable (9 industries).
2. Why does the significance of the first-order effect (ESG) change?
3. Does it matter that the significance of the first-order effect changes when another reference level is chosen as the omitted base?
4. And what does it mean?

For example:

Code:

 reg tobin_Q ESG TA Lev i.year i.ID ib(3).ID#c.ESG

      Source |       SS           df       MS      Number of obs   =     3,162
-------------+----------------------------------   F(24, 3137)     =     45.30
       Model |  14263.2171        24  594.300715   Prob > F        =    0.0000
    Residual |  41155.7471     3,137  13.1194603   R-squared       =    0.2574
-------------+----------------------------------   Adj R-squared   =    0.2517
       Total |  55418.9642     3,161  17.5320988   Root MSE        =    3.6221

------------------------------------------------------------------------------
     tobin_Q |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ESG |   .0335445   .0268197     1.25   0.211    -.0190415    .0861305
          TA |   7.74e-06   3.32e-07    23.31   0.000     7.09e-06    8.39e-06
         Lev |  -.0006931   .0002678    -2.59   0.010    -.0012182   -.0001681
             |
        year |
       2013  |   .6918609    .223241     3.10   0.002     .2541477    1.129574
       2014  |   .6280345   .2232988     2.81   0.005      .190208    1.065861
       2015  |   .0730515   .2233201     0.33   0.744    -.3648168    .5109199
       2016  |  -.1180459   .2240188    -0.53   0.598    -.5572842    .3211924
       2017  |  -.2589376   .2248478    -1.15   0.250    -.6998012     .181926
             |
          ID |
          1  |  -2.049102   27.69795    -0.07   0.941    -56.35703    52.25883
          2  |   -4.06836   2.690989    -1.51   0.131    -9.344638    1.207918
          4  |  -.5423996   2.303534    -0.24   0.814    -5.058985    3.974186
          5  |   1.567367   2.367521     0.66   0.508    -3.074679    6.209413
          6  |   1.096329   2.611065     0.42   0.675    -4.023238    6.215897
          7  |   1.226059   2.459059     0.50   0.618    -3.595467    6.047586
          8  |   2.271404   2.291336     0.99   0.322    -2.221266    6.764075
          9  |   1.133982   2.342542     0.48   0.628    -3.459088    5.727052
             |
    ID#c.ESG |
          1  |   .0194658   .3365823     0.06   0.954    -.6404781    .6794096
          2  |   .0984976   .0320148     3.08   0.002     .0357255    .1612697
          4  |   .0299245   .0273642     1.09   0.274     -.023729     .083578
          5  |  -.0051829   .0281038    -0.18   0.854    -.0602866    .0499209
          6  |  -.0093211   .0316338    -0.29   0.768    -.0713461     .052704
          7  |  -.0086444   .0291978    -0.30   0.767    -.0658932    .0486043
          8  |  -.0348389   .0273409    -1.27   0.203    -.0884467    .0187689
          9  |  -.0087986   .0279942    -0.31   0.753    -.0636873    .0460902
             |
       _cons |  -1.651635   2.258138    -0.73   0.465    -6.079213    2.775943
------------------------------------------------------------------------------

Code:

 reg tobin_Q ESG TA Lev i.year i.ID ib(2).ID#c.ESG

      Source |       SS           df       MS      Number of obs   =     3,162
-------------+----------------------------------   F(24, 3137)     =     45.30
       Model |  14263.2171        24  594.300715   Prob > F        =    0.0000
    Residual |  41155.7471     3,137  13.1194603   R-squared       =    0.2574
-------------+----------------------------------   Adj R-squared   =    0.2517
       Total |  55418.9642     3,161  17.5320988   Root MSE        =    3.6221

------------------------------------------------------------------------------
     tobin_Q |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ESG |   .1320421   .0175225     7.54   0.000     .0976854    .1663988
          TA |   7.74e-06   3.32e-07    23.31   0.000     7.09e-06    8.39e-06
         Lev |  -.0006931   .0002678    -2.59   0.010    -.0012182   -.0001681
             |
        year |
       2013  |   .6918609    .223241     3.10   0.002     .2541477    1.129574
       2014  |   .6280345   .2232988     2.81   0.005      .190208    1.065861
       2015  |   .0730515   .2233201     0.33   0.744    -.3648168    .5109199
       2016  |  -.1180459   .2240188    -0.53   0.598    -.5572842    .3211924
       2017  |  -.2589376   .2248478    -1.15   0.250    -.6998012     .181926
             |
          ID |
          1  |   2.019258   27.64541     0.07   0.942    -52.18565    56.22417
          3  |    4.06836   2.690989     1.51   0.131    -1.207918    9.344638
          4  |    3.52596   1.539616     2.29   0.022     .5072031    6.544717
          5  |   5.635727   1.632885     3.45   0.001     2.434096    8.837358
          6  |   5.164689   1.969744     2.62   0.009     1.302573    9.026806
          7  |   5.294419    1.76245     3.00   0.003     1.838748    8.750091
          8  |   6.339764   1.520463     4.17   0.000     3.358561    9.320967
          9  |   5.202342   1.596727     3.26   0.001     2.071607    8.333077
             |
    ID#c.ESG |
          1  |  -.0790318   .3359756    -0.24   0.814     -.737786    .5797224
          3  |  -.0984976   .0320148    -3.08   0.002    -.1612697   -.0357255
          4  |  -.0685731   .0183596    -3.73   0.000    -.1045712    -.032575
          5  |  -.1036804   .0194234    -5.34   0.000    -.1417644   -.0655965
          6  |  -.1078186   .0242669    -4.44   0.000    -.1553992   -.0602381
          7  |   -.107142   .0209803    -5.11   0.000    -.1482785   -.0660055
          8  |  -.1333365   .0182929    -7.29   0.000    -.1692037   -.0974692
          9  |  -.1072961   .0192822    -5.56   0.000    -.1451032   -.0694891
             |
       _cons |  -5.719995   1.474049    -3.88   0.000    -8.610193   -2.829796
------------------------------------------------------------------------------

The regressions above are coded almost identically except for one thing: the reference base of the first regression is 3 (Finance industry) and the reference base of the second regression is 2 (Mining industry). Surprisingly, the first-order effect (ESG) is not significant in the first regression (p = 0.21), while in the second regression it is significant (p = 0.00). In addition, almost all of the industry dummies and ESG interaction terms are insignificant in the first regression, while almost all of them are significant in the second.

I am not really sure what this means or how to interpret the two models. F(24, 3137) = 45.30 with p = 0.00, and R2 is the same for both models, indicating that both models are significant. I understand that the main effect reported by Stata refers only to the reference level that was chosen, which is why that parameter changes (sometimes substantially) between models, and that the other effects can be computed from either model's results with exactly the same outcome. But getting the same results with different significance would matter, right? Is there a right way to choose which level should be the reference base?

Thanks in advance,

Sebas Kalkman

Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#2

28 Jan 2020, 17:01

Of course these changes occur. In the first regression the coefficient of ESG is the marginal effect of ESG when ID = 3, whereas in the second regression, the coefficient of ESG is the marginal effect of ESG when ID = 2. The phrase "coefficient of ESG" thus represents two different things in the two models. So unless you have reason to believe that the marginal effect of ESG is the same for both of those two IDs then there would be something wrong if the coefficient of ESG didn't change. If you are thinking that the coefficient of ESG is some kind of "overall" marginal effect of ESG in these models, then you just don't understand what an interaction model is.
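
To make that concrete, here is a minimal sketch, assuming the first regression (base ID = 3) is the active estimation result. It uses -lincom- to rebuild the ESG slope at other levels of ID from that model's coefficients:

Code:

* ESG slope when ID == 2: base-level slope plus the ID 2 interaction
lincom _b[ESG] + _b[2.ID#c.ESG]
* ESG slope when ID == 4: base-level slope plus the ID 4 interaction
lincom _b[ESG] + _b[4.ID#c.ESG]

From the coefficients posted above, the first sum is .0335445 + .0984976 = .1320421, which is exactly the coefficient of ESG in the second regression.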

What doesn't change, however, are the correctly calculated marginal effects. If you run:

Code:

margins ID, dydx(ESG)

you will get the marginal ESG effects at each level of ID, and you will see they are the same for both models.
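
As a sketch of that comparison, assuming the same data and the two commands exactly as posted in #1, the marginal effects can be listed after each parameterization in turn:

Code:

* Model with ID 3 as the base level of the interaction
quietly reg tobin_Q ESG TA Lev i.year i.ID ib(3).ID#c.ESG
margins ID, dydx(ESG)
* Same model with ID 2 as the base level
quietly reg tobin_Q ESG TA Lev i.year i.ID ib(2).ID#c.ESG
margins ID, dydx(ESG)

The two -margins- tables should agree row by row, even though the coefficient tables look different.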

Also unchanging are the predicted values of tobin_Q. Run the -predict- command after each model and you will see that the predicted values are the same for both. (And, consequently, the predictive margins you would get are also the same with both models.) In technical terms, these two models are simply two different parameterizations of the same model. That is why the predictions, and, as you have already noticed, the F statistic and R2 are the same for both. But you have to remember that the coefficients with the same "names" represent different things in the two models, so the coefficients shift around. They shift in ways that ultimately lead to the same model predictions and overall model properties.
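
A minimal sketch of that check, again assuming the commands from #1. The variable names xb_base3 and xb_base2 are made up for illustration, and the tolerance is an arbitrary choice:

Code:

* Linear predictions from the base-3 parameterization
quietly reg tobin_Q ESG TA Lev i.year i.ID ib(3).ID#c.ESG
predict double xb_base3 if e(sample), xb
* Linear predictions from the base-2 parameterization
quietly reg tobin_Q ESG TA Lev i.year i.ID ib(2).ID#c.ESG
predict double xb_base2 if e(sample), xb
* Stops with an error if the two sets of predictions differ beyond rounding
assert reldif(xb_base3, xb_base2) < 1e-8 if e(sample)

If the -assert- passes silently, the predictions are identical up to numerical precision.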

Finally, be aware that the American Statistical Association has recommended that the concept of statistical significance be abandoned. See https://www.tandfonline.com/doi/full...5.2019.1583913 for the "executive summary" and https://www.tandfonline.com/toc/utas20/73/sup1 for all 43 supporting articles. Or https://www.nature.com/articles/d41586-019-00857-9 for the tl;dr. To really understand what your model tells you, focus on effect sizes and their uncertainty (standard errors or confidence intervals) and avoid attributing importance to what is or is not "statistically significant."
Sebas Kalkman

Join Date: Jan 2020

Posts: 14
#3

29 Jan 2020, 02:52

Thank you for your answer Clyde!

What bothers me about this is the following: using one base level or another for a dummy changes my "ability" to make statistically significant predictions/descriptions of the results, while it shouldn't. I understand the difference between statistical significance and importance. What I'm trying to say is that, given the differences when setting a different base level, though the indicator and interaction terms are equivalent, the first-order term (for example: TOBINS Q) may not be. If I have to interpret the results above, how should I come to the same conclusion?

I'm just not sure how to interpret both models if I have to explain them in a paper (if they lead to the same conclusion).
And do the marginal effects at each level of ID tell me something about Tobin's Q?

Thanks in advance,

Sebas Kalkman

Sebas Kalkman

Join Date: Jan 2020
Posts: 14

29 Jan 2020, 04:20

Code:

. margins ID, dydx(ESG)

Average marginal effects                        Number of obs     =      3,162
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : ESG

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ESG          |
          ID |
          1  |   .0530103   .3355349     0.16   0.874    -.6048798    .7109004
          2  |   .1320421   .0175225     7.54   0.000     .0976854    .1663988
          3  |   .0335445   .0268197     1.25   0.211    -.0190415    .0861305
          4  |    .063469   .0055509    11.43   0.000     .0525853    .0743527
          5  |   .0283617   .0084116     3.37   0.001     .0118689    .0448544
          6  |   .0242235   .0167948     1.44   0.149    -.0087064    .0571533
          7  |   .0249001   .0115479     2.16   0.031     .0022578    .0475424
          8  |  -.0012944   .0054789    -0.24   0.813     -.012037    .0094483
          9  |    .024746   .0080703     3.07   0.002     .0089223    .0405696
------------------------------------------------------------------------------

This is the output of the marginal effects. Should I use this in my explanation, or should I use the ESG#ID interactions themselves?

And can this answer my main question: does ESG lead to different increases in Tobin's Q across industries?

Last edited by Sebas Kalkman; 29 Jan 2020, 05:12.

Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#5

29 Jan 2020, 13:54

Quote (from #3): "What I'm trying to say is that, given the differences when setting a different base level, though the indicator and interaction terms are equivalent, the first-order term (for example: TOBINS Q) may not be. If I have to interpret the results above, how should I come to the same conclusion?"

Please re-read my response in #2. The differences you are seeing in the regression output are only because you are misunderstanding the output. There are no differences that have any real meaning between those two models. You are thinking you are seeing real differences because coefficients that are labeled the same in the two models are actually different things. But when you compare apples with apples, there are no differences at all. Because regression coefficients in interaction models are tricky in that way, it is best to ignore the regression output and do your interpretation based on -margins- output. Even if you insist on using statistical significance criteria, you should make them using the -margins- output, not the regression coefficients.
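
One way to base the interpretation on -margins- is sketched below. It assumes either of the two regressions from #1 is the active estimation result; the yline(0) reference line is just an illustrative choice:

Code:

* ESG slope (effect size with confidence interval) in each industry
margins ID, dydx(ESG)
* Plot those slopes with their confidence intervals, with a reference line at zero
marginsplot, yline(0)

Because the marginal effects do not depend on the base level, the table and the plot come out the same whichever parameterization was fitted.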
Sebas Kalkman

Join Date: Jan 2020

Posts: 14
#6

29 Jan 2020, 14:33

Thank you for your answer Clyde! much appreciated