Interpreting effects where the variable is in partly insignificant interaction

Andy Lobo

Join Date: Apr 2016
Posts: 4

20 Apr 2016, 21:31

Hi,

I have a few questions about interpreting variables involved in interactions in my fixed-effects model of a 3-year panel dataset (output below). I am attempting to estimate the effect of a CEO change (succession) on subsequent performance. The variables are: post_roa_avg, post-succession performance; pre_roa_avg, pre-succession performance; ceo_change, a dummy for succession; industry, a categorical variable for 10 different industries; roa_perfcrisis, a dummy for a pre-succession performance crisis; and ceo_outsider2, a dummy for the CEO's level of experience within the firm.

My questions are:

1. Given that industry is time-invariant, so that its main effect drops out, how would I interpret the interaction between industry and CEO change? Would it be the difference in the succession effect for each industry compared with the base industry (i.e., industry 0)?

2. When trying to work out the effect of the CEO change, is it appropriate to include those interactions that are insignificant?

For example, would the effect of CEO change = 5.016801 + (-3.944285) + (-2.510845) + (-3.49886) + (-2.774231) + (-2.42435) + (-6.6174) + (-3.788363) + (-4.518241) + (-5.077814) + (-1.868187) + (-2.446759)? That is, do I sum all the coefficients involved in CEO change (= 1) regardless of their significance?

Or should the effect of CEO change include only the coefficients that are significant, so that the effect of CEO change = (-6.6174) + (-3.788363) + (-4.518241) + (-1.868187) + (-2.446759)? Note that I have used the 10% significance level.

This question applies equally to roa_perfcrisis: its interaction with ceo_change is significant, but roa_perfcrisis on its own is not.

3. On a more general note, what are your thoughts on the trade-off between not "dropping" insignificant variables that "tell a story" (are theoretically important/useful to discuss) and the gains in precision and consistency from "dropping" them? For reference, I have 1,065 observations. Is there any particular best practice that you're aware of? Or is it best to report both a very well-specified, efficient model with only significant variables AND a more detailed model with a wider range of interesting explanatory variables, and then use the more efficient one to make predictions on the size and sign of the effect and the more detailed one to "tell the story"?

Thank you for any help!

Code:

. xtreg post_roa_avg c.pre_roa_avg i.ceo_change##i.industry i.ceo_change##roa_perfcrisis i.ceo_change##i.ceo_outsider2, fe vce(cluster id)
note: 1.industry omitted because of collinearity
note: 2.industry omitted because of collinearity
note: 3.industry omitted because of collinearity
note: 4.industry omitted because of collinearity
note: 5.industry omitted because of collinearity
note: 6.industry omitted because of collinearity
note: 7.industry omitted because of collinearity
note: 8.industry omitted because of collinearity
note: 9.industry omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =      1,065
Group variable: id                              Number of groups  =        355

R-sq:                                           Obs per group:
     within  = 0.0364                                         min =          3
     between = 0.4224                                         avg =        3.0
     overall = 0.2854                                         max =          3

                                                F(15,354)         =       4.12
corr(u_i, Xb)  = 0.4927                         Prob > F          =     0.0000

                                                (Std. Err. adjusted for 355 clusters in id)
-------------------------------------------------------------------------------------------
                          |               Robust
             post_roa_avg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------------+----------------------------------------------------------------
              pre_roa_avg |   .0752473   .1288437     0.58   0.560     -.178148    .3286426
             1.ceo_change |   5.016801   2.203255     2.28   0.023     .6836854    9.349916
                          |
                 industry |
                       1  |          0  (omitted)
                       2  |          0  (omitted)
                       3  |          0  (omitted)
                       4  |          0  (omitted)
                       5  |          0  (omitted)
                       6  |          0  (omitted)
                       7  |          0  (omitted)
                       8  |          0  (omitted)
                       9  |          0  (omitted)
                          |
      ceo_change#industry |
                     1 1  |  -3.944285   2.724114    -1.45   0.149    -9.301766    1.413196
                     1 2  |  -2.510845   2.182444    -1.15   0.251    -6.803031    1.781342
                     1 3  |   -3.49886   2.354727    -1.49   0.138    -8.129874    1.132153
                     1 4  |  -2.774231   2.525508    -1.10   0.273    -7.741117    2.192655
                     1 5  |   -2.42435   2.714372    -0.89   0.372    -7.762674    2.913973
                     1 6  |    -6.6174   2.234403    -2.96   0.003    -11.01177   -2.223027
                     1 7  |  -3.788363    2.22239    -1.70   0.089     -8.15911    .5823843
                     1 8  |  -4.518241   2.173422    -2.08   0.038    -8.792685   -.2437978
                     1 9  |  -5.077814   4.081574    -1.24   0.214      -13.105    2.949367
                          |
         1.roa_perfcrisis |   .2071931   .5603334     0.37   0.712    -.8948078    1.309194
                          |
ceo_change#roa_perfcrisis |
                     1 1  |  -1.868187    1.11897    -1.67   0.096    -4.068852    .3324775
                          |
          1.ceo_outsider2 |   1.960373   .8667372     2.26   0.024     .2557713    3.664975
                          |
 ceo_change#ceo_outsider2 |
                     1 1  |  -2.446759   1.416299    -1.73   0.085    -5.232178    .3386599
                          |
                    _cons |  -.3026603   .1394599    -2.17   0.031    -.5769343   -.0283863
--------------------------+----------------------------------------------------------------
                  sigma_u |  13.631599
                  sigma_e |  3.9832778
                      rho |  .92133106   (fraction of variance due to u_i)
-------------------------------------------------------------------------------------------

Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#2

20 Apr 2016, 21:59

1. Given that industry is time-invariant, so that its main effect drops out, how would I interpret the interaction between industry and CEO change? Would it be the difference in the succession effect for each industry compared with the base industry (i.e., industry 0)?

Yes.
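Written out as an equation, the model in #1 is roughly the following. This is a sketch: C, I, P and O are shorthand introduced here for ceo_change, industry, roa_perfcrisis and ceo_outsider2, y is post_roa_avg, x is pre_roa_avg, and the firm fixed effect alpha_i absorbs the industry main effects.

```latex
\begin{aligned}
y_{it} &= \alpha_i + \beta_0\, x_{it} + \beta_1 C_{it}
          + \sum_{k=1}^{9} \delta_k\, C_{it}\,\mathbf{1}[I_i = k]
          + \theta P_{it} + \gamma\, C_{it} P_{it}
          + \phi\, O_{it} + \lambda\, C_{it} O_{it} + \varepsilon_{it} \\[4pt]
\Delta(k, P, O) &= \beta_1 + \delta_k + \gamma P + \lambda O,
                 \qquad \delta_0 \equiv 0
\end{aligned}
```

Here Delta(k, P, O) is the change in the expected outcome when C switches from 0 to 1 with everything else held fixed; alpha_i, beta_0 x, theta P and phi O cancel in that difference. So delta_k is the succession effect in industry k minus the succession effect in industry 0, holding P and O fixed.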

2. When trying to work out the effect of the CEO change, is it appropriate to include those interactions that are insignificant?

For example, would the effect of CEO change = 5.016801 + (-3.944285) + (-2.510845) + (-3.49886) + (-2.774231) + (-2.42435) + (-6.6174) + (-3.788363) + (-4.518241) + (-5.077814) + (-1.868187) + (-2.446759)? That is, do I sum all the coefficients involved in CEO change (= 1) regardless of their significance?

Or should the effect of CEO change include only the coefficients that are significant, so that the effect of CEO change = (-6.6174) + (-3.788363) + (-4.518241) + (-1.868187) + (-2.446759)? Note that I have used the 10% significance level.

No, nothing like that. There is no such thing as "the effect of CEO change" in an interaction model. There are 10 different effects of ceo_change, one in each industry. (Actually, in your model, with so many interaction terms, it's even more complicated than that.) The gist of it is this: in industry 0 (among firms with roa_perfcrisis = 0 and ceo_outsider2 = 0), the effect of CEO change is 5.02, the coefficient of ceo_change. In industry 1, it is 5.02 - 3.94, the sum of the coefficient of ceo_change and the coefficient of the 1.ceo_change#1.industry interaction. For industry 2, it is 5.02 - 2.51, the sum of the coefficient of ceo_change and the coefficient of the 1.ceo_change#2.industry interaction. But you shouldn't calculate all of these yourself: it is tedious, you will probably make mistakes along the way, and you won't get standard errors and confidence intervals for these effects. Instead, let -margins- do this for you.
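For anyone who wants to check the hand arithmetic behind those point estimates, here is a small Python sketch. It only recombines the coefficients printed in the -xtreg- output in #1 (nothing is re-estimated), the function name ceo_change_effect is hypothetical, and it gives point estimates only: it is not a substitute for -margins-, which also supplies standard errors and confidence intervals.

```python
# Point estimates copied from the -xtreg, fe- output in #1; nothing is re-estimated here.
B_CEO = 5.016801             # 1.ceo_change (industry 0, no crisis, non-outsider CEO)
DELTA = {                    # 1.ceo_change#k.industry, k = 1..9; industry 0 is the base
    1: -3.944285, 2: -2.510845, 3: -3.49886, 4: -2.774231, 5: -2.42435,
    6: -6.6174, 7: -3.788363, 8: -4.518241, 9: -5.077814,
}
GAMMA_CRISIS = -1.868187     # 1.ceo_change#1.roa_perfcrisis
LAMBDA_OUTSIDER = -2.446759  # 1.ceo_change#1.ceo_outsider2


def ceo_change_effect(industry, crisis=0, outsider=0):
    """Point estimate of the ceo_change effect (1 vs 0) in one covariate cell.

    The main effects of roa_perfcrisis and ceo_outsider2 cancel when ceo_change
    is switched from 0 to 1, so only the interaction terms enter.
    """
    delta = 0.0 if industry == 0 else DELTA[industry]
    return B_CEO + delta + GAMMA_CRISIS * crisis + LAMBDA_OUTSIDER * outsider


for k in range(10):
    print(f"industry {k}: {ceo_change_effect(k):+.3f}")
# A cell with a performance crisis and an outsider CEO, for comparison:
print(f"industry 1, crisis, outsider: {ceo_change_effect(1, 1, 1):+.3f}")
```

Note that no single sum over all twelve coefficients corresponds to any real firm: a firm belongs to exactly one industry.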

Code:

margins, dydx(ceo_change) at(industry = (0(1)9)) level(90)

will give you the marginal effect of ceo_change in each of the ten industries, adjusted to the distributions of all the other variables in your model. This takes only seconds, the results will be correct, and they will be accompanied by estimates of uncertainty. Since you want 90% confidence intervals, don't forget to specify -level(90)- as part of the command. For an even fuller exposition of the effects of ceo_change at different levels of the variables it interacts with:

Code:

margins, dydx(ceo_change) at(industry = (0(1)9) roa_perfcrisis = (0 1) ceo_outsider2 = (0 1)) level(90)

But that is a lot of output to plow through and try to make sense of.

If you run just -margins, dydx(ceo_change)- with no -at()- specification, you will get the average marginal effect of ceo_change across industries. This is not any of the sums you were illustrating in your question. Nor is it, in my view, a useful or meaningful statistic: it is entirely conditional on the exact distribution of all the variables in your data set and does not really generalize to anything else. But if you feel compelled to come up with a single summary statistic for the effect of ceo_change, I suppose the average marginal effect is no worse than the other bad alternatives available. Really, though, you should be focusing on the many effects of ceo_change at each level of the other variables it interacts with. If you really believe there is a single meaningful effect, then you shouldn't include interactions in the model in the first place!
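To see why the average marginal effect depends on the sample composition, here is a Python sketch. It assumes that, for a linear prediction like this one, the AME amounts to averaging each observation's cell-specific ceo_change effect over the estimation sample (our reading of what -margins- does, not Stata's code). The two small samples below are entirely hypothetical and are not the poster's data.

```python
# Point estimates from the -xtreg, fe- output in #1.
B_CEO = 5.016801
# Index = industry; entry 0 is the base industry, so its interaction is 0.
DELTA = [0.0, -3.944285, -2.510845, -3.49886, -2.774231, -2.42435,
         -6.6174, -3.788363, -4.518241, -5.077814]
GAMMA_CRISIS = -1.868187
LAMBDA_OUTSIDER = -2.446759


def cell_effect(industry, crisis, outsider):
    """ceo_change effect for one observation's covariate values."""
    return B_CEO + DELTA[industry] + GAMMA_CRISIS * crisis + LAMBDA_OUTSIDER * outsider


def average_marginal_effect(sample):
    """Mean of the cell-specific effects over (industry, crisis, outsider) rows."""
    effects = [cell_effect(*row) for row in sample]
    return sum(effects) / len(effects)


# HYPOTHETICAL samples, made up for illustration.
sample_a = [(0, 0, 0), (1, 0, 1), (6, 1, 0), (8, 0, 0)]
sample_b = [(0, 0, 0)] * 4   # same coefficients, every firm in industry 0

print(round(average_marginal_effect(sample_a), 4))  # 0.1681
print(round(average_marginal_effect(sample_b), 4))  # 5.0168
```

Same coefficients, different mix of firms, and a very different "single" effect, which is exactly why the AME does not generalize beyond the data at hand.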

3. On a more general note, what are your thoughts on the trade-off between not "dropping" insignificant variables that "tell a story" (are theoretically important/useful to discuss) and the gains in precision and consistency from "dropping" them? For reference, I have 1,065 observations. Is there any particular best practice that you're aware of? Or is it best to report both a very well-specified, efficient model with only significant variables AND a more detailed model with a wider range of interesting explanatory variables, and then use the more efficient one to make predictions on the size and sign of the effect and the more detailed one to "tell the story"?

This question defies a brief or simple answer. It depends on the purpose of your model and on the costs associated with things like over-fitting and model imprecision. I will say that, in general, building a model by trying a bunch of variables and then retaining only those that turn out statistically significant is one of the worst ways of selecting variables (though not quite as bad as stepwise variable selection). It is rarely optimal for any purpose. By the way, including variables that are not statistically significant does not necessarily reduce model precision and consistency. If the variable in question is associated with the outcome, even to a non-significant extent, including it adjusts out that variable's separate contribution to the variation in the outcome, which can still improve model precision. It's really only the inclusion of variables that have essentially no relationship to the outcome that makes models inefficient. Those are a few generalities, but the decision about what to include in a model is among the most difficult things we do in statistics. Each situation requires a separate look.
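A small simulation can illustrate the precision point. This is a Python sketch under made-up assumptions: pooled OLS rather than fixed effects, arbitrary effect sizes, and a covariate z that is related to the outcome but generated independently of x. With z in the model, the residual variance shrinks, so the standard error on x should come out smaller.

```python
import math
import random


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def ols_slope_se(y, x, z=None):
    """OLS of y on x (and optionally z) with an intercept.

    Returns the coefficient on x and its conventional (non-robust) standard error.
    """
    n = len(y)
    xc, yc = center(x), center(y)
    sxx, sxy = dot(xc, xc), dot(xc, yc)
    if z is None:
        b = sxy / sxx
        ssr = sum((yi - b * xi) ** 2 for xi, yi in zip(xc, yc))
        return b, math.sqrt(ssr / (n - 2) / sxx)
    zc = center(z)
    szz, sxz, szy = dot(zc, zc), dot(xc, zc), dot(zc, yc)
    det = sxx * szz - sxz ** 2          # determinant of the 2x2 cross-product matrix
    b = (szz * sxy - sxz * szy) / det
    g = (sxx * szy - sxz * sxy) / det
    ssr = sum((yi - b * xi - g * zi) ** 2 for xi, zi, yi in zip(xc, zc, yc))
    return b, math.sqrt(ssr / (n - 3) * szz / det)


random.seed(2016)
n = 1065                                      # same n as the poster's data; all else is made up
x = [random.gauss(0, 1) for _ in range(n)]
z = [random.gauss(0, 1) for _ in range(n)]    # independent of x, related to y
y = [0.3 * xi + 0.5 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

b_without, se_without = ols_slope_se(y, x)
b_with, se_with = ols_slope_se(y, x, z)
print(f"omit z:    b = {b_without:.3f}, se = {se_without:.4f}")
print(f"include z: b = {b_with:.3f}, se = {se_with:.4f}")
```

Changing the 0.5 on z to 0 shows the other case: z then explains nothing, and including it leaves the standard error roughly unchanged at the cost of one degree of freedom.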

Last edited by Clyde Schechter; 20 Apr 2016, 22:03.
Andy Lobo

22 Apr 2016, 05:21

Clyde, thank you so much for your thorough response; you've made my life a lot easier by introducing me to the -margins- command in a fixed-effects model (I had previously come across it only in a logit model).

Given that my model is complicated by the omission of the time-invariant industry variable (see code from #1), does this affect my interpretation?

I would expect the interpretation of Case 1 (_at 1 in the -margins- output below) to be that a CEO change in Industry 0, where the CEO was not an outsider and the firm was not in a performance crisis, "led to" a 5.03% increase in performance compared with a firm in Industry 0, where the CEO was not an outsider and the firm was not in a performance crisis, that had no CEO change. Is the general "industry effect" on performance swallowed up by the individual-specific heterogeneity of the fixed-effects model, and therefore not relevant to my interpretation here? Note that my Y variable (performance) is a profitability ratio, so it is in percentage terms; I don't believe this should affect my interpretation.

Further, are the p-values from this table interpretable in the usual way? That is, a CEO change in Industry 0, with a non-outsider CEO and no performance crisis, caused a significant change in performance (p = 0.023), whereas a CEO change in Industry 0, with a non-outsider CEO but in a performance crisis, did not (p = 0.182)?

I know you mentioned -level(90)-, but I am not particularly interested in changing the confidence level; I was merely using the 10% significance level as a cut-off point when trying to calculate the effects manually (and very confusingly) in #1.

Thanks for your thoughts on Q3

Code:

 

 quietly xtreg post_roa_avg i.ceo_change i1.ceo_change#i.industry i.ceo_change##(i.ceo_outsider2 i.roa_perfcrisis) c.pre_roa_avg i.year, fe vce(cluster id)

. margins, dydx(ceo_change) at(industry = (0(1)9) roa_perfcrisis = (0 1) ceo_outsider2 = (0 1))
(note: continuous option implied because a factor with only one level was specified in the dydx() option)

Average marginal effects                        Number of obs     =      1,065
Model VCE    : Robust

Expression   : Linear prediction, predict()
dy/dx w.r.t. : 1.ceo_change

1._at        : industry        =           0
               ceo_outsider2   =           0
               roa_perfcrisis  =           0

2._at        : industry        =           0
               ceo_outsider2   =           0
               roa_perfcrisis  =           1

3._at        : industry        =           0
               ceo_outsider2   =           1
               roa_perfcrisis  =           0

4._at        : industry        =           0
               ceo_outsider2   =           1
               roa_perfcrisis  =           1


------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.ceo_change |
         _at |
          1  |   5.033075   2.207324     2.28   0.023      .706799    9.359352
          2  |   3.170695   2.376205     1.33   0.182    -1.486581    7.827972
          3  |   2.583992     2.2566     1.15   0.252    -1.838864    7.006847
          4  |   .7216115    2.55727     0.28   0.778    -4.290546    5.733769
          5  |   1.077063   1.886979     0.57   0.568    -2.621348    4.775474
          6  |  -.7853171   1.828574    -0.43   0.668    -4.369256    2.798622
          7  |  -1.372021   1.891413    -0.73   0.468    -5.079123    2.335081
          8  |  -3.234401   2.008421    -1.61   0.107    -7.170833    .7020315
          9  |   2.506304   1.065492     2.35   0.019     .4179778    4.594629
         10  |   .6439235   1.000902     0.64   0.520    -1.317808    2.605655
         11  |   .0572197    1.09421     0.05   0.958    -2.087392    2.201831
         12  |   -1.80516   1.318007    -1.37   0.171    -4.388407    .7780864
         13  |   1.524025   .9391773     1.62   0.105    -.3167283    3.364779
         14  |  -.3383547   1.517731    -0.22   0.824    -3.313053    2.636344
         15  |  -.9250586   1.721191    -0.54   0.591     -4.29853    2.448413
         16  |  -2.787439   2.248837    -1.24   0.215    -7.195078    1.620201
         17  |   2.235081   1.859734     1.20   0.229     -1.40993    5.880092
         18  |   .3727011   1.781249     0.21   0.834    -3.118483    3.863885
         19  |  -.2140027    1.36968    -0.16   0.876    -2.898526     2.47052
         20  |  -2.076383   1.504519    -1.38   0.168    -5.025186    .8724208
         21  |   2.592378   1.879931     1.38   0.168     -1.09222    6.276975
         22  |   .7299975   1.450407     0.50   0.615    -2.112747    3.572742
         23  |   .1432937   1.584997     0.09   0.928    -2.963243     3.24983
         24  |  -1.719086    1.32479    -1.30   0.194    -4.315628    .8774552
         25  |  -1.587346   .6586336    -2.41   0.016    -2.878245   -.2964483
         26  |  -3.449726   .7740216    -4.46   0.000    -4.966781   -1.932672
         27  |   -4.03643      1.513    -2.67   0.008    -7.001856   -1.071004
         28  |   -5.89881   1.768558    -3.34   0.001    -9.365121     -2.4325
         29  |   1.221561   1.140644     1.07   0.284     -1.01406    3.457182
         30  |  -.6408193   .8437225    -0.76   0.448    -2.294485    1.012846
         31  |  -1.227523    .968404    -1.27   0.205     -3.12556    .6705138
         32  |  -3.089903   1.010902    -3.06   0.002    -5.071235   -1.108571
         33  |   .4947036   .9697279     0.51   0.610    -1.405928    2.395335
         34  |  -1.367676   1.407755    -0.97   0.331    -4.126825    1.391472
         35  |   -1.95438   1.097374    -1.78   0.075    -4.105193    .1964329
         36  |   -3.81676   1.708493    -2.23   0.025    -7.165345   -.4681755
         37  |  -.0611375   3.647825    -0.02   0.987    -7.210742    7.088467
         38  |  -1.923518   3.674856    -0.52   0.601    -9.126103    5.279068
         39  |  -2.510221   3.433382    -0.73   0.465    -9.239526    4.219084
         40  |  -4.372601   3.558001    -1.23   0.219    -11.34616    2.600953
------------------------------------------------------------------------------

Clyde Schechter

#4

22 Apr 2016, 09:04

Given that my model is complicated by the omission of the time-invariant industry variable (see code from #1), does this affect my interpretation?

No. It does mean that you can't make any inferences about the effects of industry, but that is an inherent limitation of a fixed-effects model, whether it contains interaction terms or not.

I would expect the interpretation of Case 1 to be that CEO change in Industry 0, where the CEO was not an outsider and the firm was not in a performance crisis "led to" a 5.03% increase in performance compared to a firm in Industry 0, where the CEO was not an outsider and the firm was not in a performance crisis which had no CEO change.

Almost. As a small tweak, to avoid implying causality that you cannot support with observational data, I would say "is associated with" rather than "led to." Also, it isn't a 5.03% increase; it's an increase of 5.03 percentage points. So if, in the absence of a CEO change, the outcome were, for example, 2.0%, then in the presence of a CEO change it would be 2.0 + 5.03 = 7.03%, not 2.0 × 1.0503 = 2.1006%.
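The same arithmetic in a few lines of Python, using the illustrative 2.0% baseline from the example (a hypothetical value, not an estimate):

```python
baseline = 2.0      # hypothetical ROA (%) without a CEO change, as in the example
effect_pp = 5.03    # marginal effect of ceo_change, in percentage points

correct = baseline + effect_pp                 # percentage-point reading
misread = baseline * (1 + effect_pp / 100)     # the "5.03% increase" reading

print(f"{correct:.2f}%")   # 7.03%
print(f"{misread:.4f}%")   # 2.1006%
```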

Is the general "industry effect" on performance swallowed up by the individual-specific heterogeneity of the fixed-effects model, and therefore not relevant to my interpretation here?

Correct.

Note that my Y variable (performance) is a profitability ratio, so it is in percentage terms; I don't believe this should affect my interpretation.

Well, only in terms of units. Since Y is in %, marginal effects on Y are in percentage points.

Further, are the p-values from this table interpretable in the usual way? That is, a CEO change in Industry 0, with a non-outsider CEO and no performance crisis, caused a significant change in performance (p = 0.023), whereas a CEO change in Industry 0, with a non-outsider CEO but in a performance crisis, did not (p = 0.182)?

Well, let's just say that the p-values in the -margins- output table are tests of the null hypotheses that each corresponding marginal effect is zero. So their interpretation is the same as that of any other null hypothesis test p-value. But look, you've got 40 separate tests here, so the nominal p-values can't be taken at face value. If you use the conventional 0.05 significance level, there could well be several type I errors among these outputs. Moreover, the fact that a particular marginal effect is not statistically significantly different from 0, in any context, does not support the conclusion that the effect is actually zero (often paraphrased as "no effect"). It just says that the effect is small enough that, given the level of precision the data support, we can't say with confidence that it isn't zero. Issues of sample size and measurement error loom large in such circumstances. And if you think about it, the null hypothesis of zero effect of CEO change is just a straw man: it is really not plausible that changing CEO would have no effect on a firm's performance. The effect might be small, and it might be positive or negative, but the notion that there would be no effect at all is really rather laughable, isn't it?
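A rough sense of the multiple-testing problem, as a Python sketch. It assumes every one of the 40 true effects is exactly zero and, for the second figure, that the tests are independent. The -margins- estimates share coefficients and so are not independent; treat the numbers as a rough guide only.

```python
n_tests = 40    # number of _at rows in the -margins- output
alpha = 0.05    # conventional significance level

# If every one of the 40 true effects were exactly zero:
expected_false_positives = n_tests * alpha
# Chance of at least one false positive, ASSUMING independent tests:
p_at_least_one = 1 - (1 - alpha) ** n_tests

print(f"expected type I errors: {expected_false_positives:.1f}")  # 2.0
print(f"P(at least one):        {p_at_least_one:.3f}")            # 0.871
```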

I guess my point is that I really don't like p-values in this context and I generally avoid looking at them altogether. I usually approach this kind of data by looking at the magnitude of the marginal effects and their confidence intervals. And, at least in this context, I have banished the phrase "statistically significant" from my vocabulary.

I know you mentioned -level(90)-, but I am not particularly interested in changing the confidence level; I was merely using the 10% significance level as a cut-off point when trying to calculate the effects manually (and very confusingly) in #1.

Understood.
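For reference, -level()- changes only the critical value, not the estimate or its standard error. Here is a Python sketch using the _at 1 row of the -margins- output above (estimate 5.033075, delta-method SE 2.207324). It assumes the normal (z-based) intervals that the table reports. It reproduces the reported 95% interval and shows the narrower 90% one. The helper name normal_ci is hypothetical.

```python
from statistics import NormalDist


def normal_ci(estimate, se, level=0.95):
    """Normal-theory confidence interval, matching the z-based -margins- table."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se


est, se = 5.033075, 2.207324   # _at 1 row of the -margins- output

for level in (0.95, 0.90):
    lo, hi = normal_ci(est, se, level)
    print(f"{level:.0%} CI: [{lo:.4f}, {hi:.4f}]")
```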

Thanks for your thoughts on Q3

You're welcome.

Last edited by Clyde Schechter; 22 Apr 2016, 09:09. Reason: Remove stray material inadvertently pasted in the wrong place.