  • How to interpret interaction terms with continuous variables?

    Dear all,

    I am currently writing my thesis and I would really appreciate some help with the interpretation of some results. I am studying the effect of board characteristics on firm performance during COVID. My data is cross-sectional (as of the end of 2019), and I examine the effect of board of directors characteristics on the cumulative abnormal return during February-March 2020. As one of my additional tests, I am using an interaction term between the natural logarithm of board size and the natural logarithm of firm size (measured by market capitalization) to test whether board size has opposing effects for small firms as compared to large firms. This is what my OLS regression looks like:
    [Screenshot: OLS regression output (Untitled.png)]
    I can see that the sign for ln(board size) is positive and significant by itself; however, the sign for its interaction with ln(firm size) is negative and significant. The same holds for CEO duality. How do I interpret these coefficients, though? Does this mean that for small firms the effect is positive, but for large firms the effect is negative? For example, for smaller firms increasing the board size is positively associated with performance, but for larger firms increasing the board size leads to worse performance?

    I would really appreciate some help!
    Thank you in advance!
    Last edited by Marinela Veleva; 25 Mar 2021, 10:48.

  • #2
    The interpretation, Marinela, comes from taking derivatives.

    If you are fitting the model E[Y] = a + b*X + c*W + d*X*W, then the marginal effect of X on E[Y] is not constant, and in particular
    d(E[Y])/dX = b + d*W.
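
    For example, a minimal sketch with hypothetical variable names y, x, and w: if you fit the model with factor-variable notation, -margins- will evaluate this derivative for you at chosen values of W.

    Code:
    regress y c.x##c.w
    * marginal effect of x, evaluated at w = 1, 2, and 3
    margins, dydx(x) at(w=(1 2 3))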



    • #3
      Joro, thank you for the answer. However, I am not very advanced with derivatives, and I am not quite sure how to interpret your comment. Could you elaborate on it in the context of my case? Could I frame it as saying that the effect of board size on firm performance depends on firm size, and is positive for smaller firms while negative for larger firms?



      • #4
        In your case,

        (1) d(CAR)/d(log (Board Size)) = .575 - .081*(log Firm Size).

        (I do not know why you are taking logs of board size and firm size, but I will interpret everything in terms of these log-variables.)

        Now you can summarize log Firm Size: write the following command immediately after you fit the regression you showed above:

        Code:
        summ lC_Size if e(sample), detail
        and now, depending on what values you get from the summarize, you can substitute them into expression (1) and, for example, see what the marginal effect of log Board Size on CAR is at the 5th percentile, the 25th percentile, the 75th percentile, the 95th percentile, etc. of log Firm Size.
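
        Equivalently, if you fit the regression with factor-variable notation, -margins- can evaluate (1) at those percentiles directly. (A sketch only: I am guessing at the rest of your specification, so replace the placeholder comment with your actual controls.)

        Code:
        regress CAR c.lBoard_Size##c.lC_Size CEO_Duality /* your other controls here */
        * marginal effect of log Board Size at percentiles of log Firm Size
        margins, dydx(lBoard_Size) at((p5) lC_Size) at((p25) lC_Size) at((p75) lC_Size) at((p95) lC_Size)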





        • #5
          Joro Kolev , thank you for the helpful comment! I didn't even know about this command, with which I can see what the effect is at different percentiles of the same variable. Having seen this, I have a further question: if I estimate a regression without interaction terms, for example:
          [Screenshot: regression output without interaction terms (Untitled1.png)]
          1) could I use the same code:

          summ var if e(sample), detail

          in order to estimate the simultaneous effect of 4 variables, each at a corresponding percentile, on the return? For example, estimating the return, using the coefficients, for a firm that is simultaneously at the 75th percentile of board size, the 25th percentile of independence, the 25th percentile of gender diversity, and with CEO duality = 0? Something like this:

          summ lBoard_Size Independence_ Gender_Diversity_ CEO_Duality if e(sample), detail

          Is this possible?

          2) When I use such a command, which column am I supposed to take the value from: the first or the second?
          [Screenshot: summarize output (Untitled2.png)]
          3) A side question: is it possible to combine the options ", robust" and ", cluster(var)" in the same regression? I have heteroskedasticity in my models, and since clustering does not solve heteroskedasticity within the clusters, I would like to account for that as well. What could I do?

          Thank you in advance!



          • #6
            3) Yes, it is possible to combine robust and cluster. In fact, cluster automatically implies robust, so if you specify cluster(var), your standard errors will be robust as well, on top of being clustered by var.

            For 1) and 2) you might have misunderstood me.

            In linear regression the marginal effects are constant.

            The derivative in #4 needs to be evaluated at percentiles of log Firm Size (not percentiles of log Board Size).



            • #7
              So for 3), I can directly use the ", cluster(var)" option, and I will get standard errors that are robust to both heteroskedasticity and clustering?

              For 1) and 2), I got the point about the interaction terms and the derivative when I use a regression with interaction terms. However, the regression that I showed afterwards is another regression, for a subsample, where I am just looking at the pure effects of the variables without any interaction. So my question was more about another context, where I wanted to look at a firm which is simultaneously at the mentioned percentiles of each of the 4 variables. You said that the marginal effects are constant; does this mean that I cannot evaluate them in a simultaneous manner? For example, estimating the return for a hypothetical firm at the mentioned percentiles: -0.32 (value at the 75th percentile of Board Size) + 0.07 (value at the 25th percentile of Independence) - 0.33 (value at the 25th percentile of Gender Diversity)... etc.?



              • #8

                3) Yes, you just write , cluster(cluster_var) and your standard errors are robust and clustered by cluster_var.

                1) and 2) Yes, you can do what you are saying. Although the marginal effects are constant when you have linear regression without interactions, of course the Yhat will be different when you use different (values of the) predictors.

                Yhat = a + b*X + c*Z, where the a,b,c are the estimated parameters. You can plug in whatever values you want for the X and Z, and see how this will affect Yhat.

                You might want to look up the help for -predict- and -margins-.
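
                A minimal sketch with hypothetical variable names y, x, and z:

                Code:
                regress y x z
                * predicted outcome at chosen values of the predictors
                margins, at(x=2 z=5)
                * or fitted values for every observation used in the estimation
                predict yhat if e(sample)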




                • #9
                  Thank you! As a very curious researcher on my topic, I have another question:
                  1) I was thinking of doing the clustering; however, the more I read on the forums (and I also checked the Wooldridge book), whenever clustering is mentioned, it seems to be in the context of panel data, time-series data, or pooled regressions. My data is cross-sectional, and after all that reading I am still not sure whether clustering is really applicable in my case. I have 7 industries at the 1-digit level and somewhere around 39 industries at the 2-digit level.
                  2) I am also not sure anymore whether it is better to use 1-digit-level or 2-digit-level industry control variables, because when I use the 2-digit industry controls I am missing the F-statistic of the regression. I checked, and I have singleton dummies, which might be the reason, but I am wondering whether there is any problem with using dummies which take the value of 1 for only one observation (and whether there is a source for that). Also, when I use the 2-digit dummies, the significance of my results changes a lot. I have only 293 cross-sectional observations, and I am wondering whether including 39 2-digit dummies creates some kind of imbalance in such a regression. However, I am not sure whether 1-digit dummies are too aggregate. What would be your advice?

                  Meanwhile, I decided to create a dummy for small firms in the wider sample that I have, with 1087 firms (the first screenshot in #1). This was just to show a clearer effect for small firms only, after doing the interactions with company size. However, when I do the interactions with the dummy for small firms, the interaction between the dummy for CEO duality and the small-firm dummy (both = 1) is omitted, but this is exactly what I want to see the effect of. This is how it looks:
                  [Screenshot: regression output with the small-firm dummy interactions (Untitled3.png)]
                  To summarize:
                  1) Is clustering on 39 clusters reasonable/applicable with simple cross-sectional data with 293 observations?
                  2) For the control variables, what is better in the tradeoff between the 1-digit and 2-digit controls: are 1-digit dummies too aggregate, or do 2-digit dummies create some imbalance, and are singletons to be avoided?
                  3) How can I get the effect of the dummy interaction for CEO duality and small firms = 1?

                  Thank you again!








                  • #10
                    1) Yes, cluster-robust variance is appropriate and needed in any context where there are clusters within which the equation error terms are correlated. In Finance we believe that there is industry-level co-movement of stock returns. If I were you, I would cluster my errors at the 2-digit industry level. (You have too few clusters at the 1-digit industry level, and with so few clusters, 8 in your case at the 1-digit level, correct inference is beyond the scope of a Bachelor's/Master's thesis.)

                    2) My advice would be to keep the dummies at the 1-digit industry level. Industries which have only one observation at the 2-digit level would not contribute to the parameter estimates; if you include dummies at the 2-digit level, you would be throwing out the information coming from those singleton industries.

                    So on 1) and 2), I would recommend that you include dummies at the 1-digit industry level, but cluster at the 2-digit industry level.

                    3) If Stata is dropping variables, it simply means that they are multicollinear in your sample. If they are multicollinear, they are multicollinear; there is nothing anybody can do about this. If I were you, I would control for firm size not with dummies for small firms, but with a continuous variable such as the number of employees or the total value of assets.
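
                    Putting 1) and 2) together, the regression would look something like this (a sketch only; ind1digit and ind2digit are hypothetical names for your 1-digit and 2-digit industry classifications):

                    Code:
                    regress CAR lBoard_Size lC_Size i.ind1digit, cluster(ind2digit)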





                    • #11
                      Joro Kolev , thank you so much for the invaluable advice! Could you clarify one thing: what do you mean by "if you include dummies at the 2 digits level, you would be throwing out the information coming from those singleton industries"? Is there any source I can use to back up my choice of 1-digit dummies? This is just because my supervisor questions everything.

                      Best!




                      Last edited by Marinela Veleva; 29 Mar 2021, 02:57.



                      • #12
                        Marinela, the source is Joro Kolev, who says that the following two are algebraic facts about regression:

                        1.) Fact one: if every industry is a singleton, nobody can estimate anything on top of the constant. Here I tag only one observation per group defined by the variable rep:

                        Code:
                        . sysuse auto, clear
                        (1978 Automobile Data)
                        
                        . keep if !missing(rep)
                        (5 observations deleted)
                        
                        . egen tag = tag(rep)
                        
                        . areg price mpg i.rep if tag, absorb(rep)
                        note: mpg omitted because of collinearity
                        note: 2.rep78 omitted because of collinearity
                        note: 3.rep78 omitted because of collinearity
                        note: 4.rep78 omitted because of collinearity
                        note: 5.rep78 omitted because of collinearity
                        
                        Linear regression, absorbing indicators         Number of obs     =          5
                                                                        F(0, 0)           =       0.00
                                                                        Prob > F          =          .
                                                                        R-squared         =     1.0000
                                                                        Adj R-squared     =          .
                                                                        Root MSE          =          0
                        
                        ------------------------------------------------------------------------------
                               price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                                 mpg |          0  (omitted)
                                     |
                               rep78 |
                                  2  |          0  (omitted)
                                  3  |          0  (omitted)
                                  4  |          0  (omitted)
                                  5  |          0  (omitted)
                                     |
                               _cons |       6921          .        .       .            .           .
                        ------------------------------------------------------------------------------
                        
                        .
                        as you see, I cannot estimate anything but a constant.

                        Now I am going to keep rep==1 and rep==2 as singletons, but the rest of the groups, defined by rep>2, I let be whatever they are (not singletons):

                        Code:
                        . replace tag = tag + 1 if rep>2
                        (59 real changes made)
                        
                        . areg price mpg i.rep if tag, absorb(rep)
                        note: 2.rep78 omitted because of collinearity
                        note: 3.rep78 omitted because of collinearity
                        note: 4.rep78 omitted because of collinearity
                        note: 5.rep78 omitted because of collinearity
                        
                        Linear regression, absorbing indicators         Number of obs     =         61
                                                                        F(1, 55)          =      17.25
                                                                        Prob > F          =     0.0001
                                                                        R-squared         =     0.3416
                                                                        Adj R-squared     =     0.2817
                                                                        Root MSE          =     2573.4
                        
                        ------------------------------------------------------------------------------
                               price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                                 mpg |  -261.2437   62.89928    -4.15   0.000    -387.2967   -135.1908
                                     |
                               rep78 |
                                  2  |          0  (omitted)
                                  3  |          0  (omitted)
                                  4  |          0  (omitted)
                                  5  |          0  (omitted)
                                     |
                               _cons |   11945.14   1392.397     8.58   0.000     9154.718    14735.57
                        ------------------------------------------------------------------------------
                        Now I managed to estimate the slope on mpg. However,

                        2.) Fact two: the slope I estimated on mpg is not determined by the two singleton groups; in fact, the regression above simply disregarded/threw out the singleton groups.

                        I estimate below the regression only for rep>2, that is, I throw out the singleton groups manually, and the slope on mpg is still the same.

                        Code:
                        . areg price mpg i.rep if rep>2, absorb(rep)
                        note: 4.rep78 omitted because of collinearity
                        note: 5.rep78 omitted because of collinearity
                        
                        Linear regression, absorbing indicators         Number of obs     =         59
                                                                        F(1, 55)          =      17.25
                                                                        Prob > F          =     0.0001
                                                                        R-squared         =     0.2431
                                                                        Adj R-squared     =     0.2018
                                                                        Root MSE          =     2573.4
                        
                        ------------------------------------------------------------------------------
                               price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                                 mpg |  -261.2437   62.89928    -4.15   0.000    -387.2967   -135.1908
                                     |
                               rep78 |
                                  4  |          0  (omitted)
                                  5  |          0  (omitted)
                                     |
                               _cons |   11864.94    1398.91     8.48   0.000     9061.463    14668.42
                        ------------------------------------------------------------------------------
                        Note that the observations in the previous two regressions are different, 61 vs 59. And yet the slope on mpg is the same.

                        The same will happen in your regression if you include dummies at the 2-digit industry level. The singleton industries will not contribute to the estimation of your slopes.

                        Finally, it is up to you what you do. If you think it is crucial to include dummies at the 2-digit industry level, do that, and live with the fact that the singleton industries were "silenced" and not allowed to say anything about what your slope parameters are.

                        On the other hand, if you include dummies at the 1-digit industry level, you control for industry at a coarser level, but you allow every industry to speak regarding what your slope estimates are.



