
  • Interpretation of interaction variable

    Dear all,

    I am analyzing whether a CEO turnover (c. 600 events) has a long-term (5 years) impact on operating performance (OROA). I aim to use the following regression model:

    ∆OROA(t+5)= α + β1FORCED + β2OUTSIDE + β3FORCED*OUTSIDE + β4SIZE + Year Fixed Effects + ε

    ∆OROA(t+5): Change in OROA 5 years after the succession
    FORCED: A dummy equal to one if the succession is categorized as forced; it captures the effect of forced vs voluntary turnover on operating performance.
    OUTSIDE: A dummy equal to one if the successor CEO is recruited from outside the firm; it captures the effect of outside vs inside successors on operating performance.
    FORCED*OUTSIDE: Interaction variable
    SIZE: Firm size, measured as the natural logarithm of sales; it controls for the effect of firm size on operating performance.

    My questions are:
    1. How do I interpret the coefficient for the interaction variable? See attached screenshot for regression results.
    2. Does it make sense to show the results from the same model first without the interaction and second with the interaction variable? The significance changes drastically between the variables.
    All help is appreciated!

  • #2
    There was no screenshot in your post. But just as well--if you read the Forum FAQ you will see that those are strongly discouraged for a number of very good reasons.

    Anyway, let's go over how you interpret all the major coefficients in the output of an interaction model, because I'm willing to speculate that you don't have the interpretation of the non-interaction coefficients right either. (I say that because you probably wouldn't ask the second question if you did.)

    The coefficient of FORCED represents the expected difference in outcome between a forced and an unforced turnover if the turnover is to an insider.

    The coefficient of OUTSIDE represents the expected difference in outcome between a turnover to an outsider and one to an insider if the turnover is unforced.

    The coefficient of the interaction term represents either of the following (they are the same):
    1. The difference between the effect of a forced (vs unforced) turnover when the turnover is to an outsider and that same effect when the turnover is to an insider.
    2. The difference between the effect of turnover to an outsider (vs an insider) when the turnover is forced and that same effect when the turnover is unforced.
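
    To see where these interpretations come from, it can help to write out the expected outcome in each of the four cells. This is only a sketch using the coefficient names from the model in the first post: α here stands in for the intercept plus the SIZE and year terms held at some fixed values, and the shorthand \mu_{fo} (expected outcome with FORCED = f and OUTSIDE = o) is notation introduced for illustration, not something Stata reports.

    ```latex
    \begin{aligned}
    \mu_{00} &= \alpha                                  && \text{(unforced, insider)} \\
    \mu_{10} &= \alpha + \beta_1                        && \text{(forced, insider)} \\
    \mu_{01} &= \alpha + \beta_2                        && \text{(unforced, outsider)} \\
    \mu_{11} &= \alpha + \beta_1 + \beta_2 + \beta_3    && \text{(forced, outsider)} \\[4pt]
    \beta_1 &= \mu_{10} - \mu_{00}, \qquad \beta_2 = \mu_{01} - \mu_{00} \\
    \beta_3 &= (\mu_{11} - \mu_{01}) - (\mu_{10} - \mu_{00})
             = (\mu_{11} - \mu_{10}) - (\mu_{01} - \mu_{00})
    \end{aligned}
    ```

    The last line shows why the two readings of the interaction coefficient are the same number: it is a difference between two differences, and it does not matter which variable you difference over first.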

    These interaction models are somewhat difficult to understand because different coefficients represent different kinds of things. (FORCED and OUTSIDE are effects, whereas the interaction is a difference between effects.) They are also difficult because the FORCED and OUTSIDE coefficients do not represent what their names suggest; they are, instead, conditional on the other being 0. So most consumers of these results will have some difficulty understanding them, and will probably not be very interested in them if they do! Most consumers of these results will be more interested in the actual expected outcomes in each of the four combinations of FORCED and OUTSIDE. You can't get those directly from the regression output. You can get them by running appropriate -lincom- commands after the regression. But that's somewhat tedious and it is easy to make mistakes. Which is why we have the -margins- command. To use -margins-, however, you must be sure to use factor-variable notation in your regression command. So it will be something like this:

    Code:
    regression_command outcome_var i.forced##i.outside perhaps_other_variables, perhaps_options
    margins forced#outside
    The -margins- output will show you the expected outcomes in all four combinations. If you are interested in knowing the effect of a turnover's being forced, depending on whether to an insider or an outsider, you can get those with
    Code:
    margins outside, dydx(forced)
    Similarly, if you want the effect of turnover to an outsider, depending on its forced or unforced nature, that comes from
    Code:
    margins forced, dydx(outside)
    Looking at the results this way is usually more helpful than working in terms of the constructs represented by the regression output.
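
    For comparison, the -lincom- route mentioned above might look like the following sketch. It assumes the regression was run with the factor-variable notation shown earlier, so that the coefficients are named 1.forced, 1.outside and 1.forced#1.outside; the lower-case variable names are the same placeholders used in the commands above.

    ```stata
    * Effect of forced vs unforced turnover when the successor is an insider
    lincom 1.forced
    * Effect of forced vs unforced turnover when the successor is an outsider
    lincom 1.forced + 1.forced#1.outside
    * Effect of outsider vs insider successor when the turnover is unforced
    lincom 1.outside
    * Effect of outsider vs insider successor when the turnover is forced
    lincom 1.outside + 1.forced#1.outside
    ```

    In a linear model these should agree with the corresponding -margins, dydx()- results, so running both is one way to check that you are reading the -margins- output correctly.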

    Turning to your second question, you can now see why it is usually only asked in the absence of understanding. The coefficients of FORCED and OUTSIDE in the non-interaction model are unconditional, one-size-fits-all estimates of the effects of FORCED and OUTSIDE. Consequently they are not estimating the same things as their eponymous coefficients in the interaction model--hence there is absolutely no reason to expect them to be the same, or even close, or even have a passing resemblance to each other.

    Finally, please stop using statistical significance. The American Statistical Association has recommended that the concept be abandoned. See https://www.tandfonline.com/doi/full...5.2019.1583913. Focus on the coefficients and their magnitudes. Examine whether there are any practically important differences between values at the bounds of the confidence intervals. Report the p-values themselves if you like, but do not characterize them as "significant" or "not significant." The use of the significant/not significant dichotomy is just a recipe for confusion and error.

    • #3
      Many thanks - everything you mention makes a lot of sense.

      Two follow-up questions:
      1. I want to be able to make inferences about both the unconditional and the conditional effects of forced vs voluntary and outside vs inside. Does it make sense to have both a non-interaction and an interaction model or can we get all the insights from only using -margins- on the interaction model?
      2. I'm not sure I fully comprehend the difference between the results from
      Code:
      margins Forced##Outside
      and
      Code:
      margins outside, dydx(forced)
      margins forced, dydx(outside)
      Could you please elaborate a bit on that?

      • #4
        1. In the interaction model there is, by definition, no such thing as an unconditional effect of any variable that participates in the interaction. Sometimes, however, people are interested in the effect of, say, FORCED, averaged over the insider and outsider observations. This is something like an unconditional effect, but you have to remember that it is completely dependent on the prevalence of insider and outsider observations in the sample you are analyzing. In a different sample with different proportions of insider and outsider turnovers, the average effect of FORCED will be different. Following the interaction model, you can get this average effect of forced by running:
        Code:
        margins, dydx(FORCED)
        Similarly, if you want the averaged effect of OUTSIDE, you can get that from -margins, dydx(OUTSIDE)- after the interaction model is run.
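
        In a linear model like this one, the averaged effect has a simple closed form, which makes its dependence on the sample composition explicit. As a sketch, with \bar{p} denoting the share of outsider successions and \bar{q} the share of forced successions in the estimation sample (both symbols are introduced here for illustration):

        ```latex
        \begin{aligned}
        \text{average effect of FORCED}  &= (1 - \bar{p})\,\beta_1 + \bar{p}\,(\beta_1 + \beta_3) = \beta_1 + \beta_3\,\bar{p} \\
        \text{average effect of OUTSIDE} &= (1 - \bar{q})\,\beta_2 + \bar{q}\,(\beta_2 + \beta_3) = \beta_2 + \beta_3\,\bar{q}
        \end{aligned}
        ```

        A sample with a different share of outsiders gives a different average effect of FORCED even if β1 and β3 are exactly the same.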

        I think it's important to emphasize that you really do not have the option of choosing between the models with and without interaction. They are models that rely on different assumptions about the world, and they cannot both be true. (Of course, neither one might be true at all, but that's a separate issue.) If the interaction coefficient in your interaction model is large enough to matter for practical purposes, then a model that doesn't contain it is simply mis-specified and should not be used. If, on the other hand, the interaction term coefficient is very close to zero, too small to make any practical difference, then you can simplify things and use the non-interaction model instead. In that case, using the interaction model, though not strictly speaking wrong, is probably just adding some noise to the modeling, and you would be better off sticking to the non-interaction model.

        The point is that if there is, a priori, reason to suspect that there is an interaction, then you must start with the interaction model. If the interaction coefficient is negligibly small, then you can, and usually should, simplify things by going to the non-interaction model. But if the interaction coefficient is not negligible, you really need to stick with the interaction term: the non-interaction model is just wrong.

        2. The output from -margins FORCED#OUTSIDE- consists of four rows. Each of the rows corresponds to one combination of FORCED (0/1) and OUTSIDE (0/1). The number in the column headed "Margin" in a row is the model's prediction of the expected value of your outcome variable under the conditions defined by that combination of FORCED and OUTSIDE.

        By contrast, the output of -margins outside, dydx(forced)- contains two "superrows." The first superrow just identifies the base level of FORCED. Nothing of interest there. The second superrow corresponds to the other value of FORCED, and it contains two rows, one corresponding to OUTSIDE = 0 and the other to OUTSIDE = 1. The column to the right of that is headed "dy/dx" and the numbers in that column are marginal effects. So, in the row with OUTSIDE = 0, the number in that column is the marginal effect of FORCED conditional on OUTSIDE = 0. In other words, it is the expected difference in outcome between a forced and an unforced turnover when OUTSIDE = 0. In the row with OUTSIDE = 1, the number in that column is the expected difference in outcome between a forced and an unforced turnover when OUTSIDE = 1.

        I think the clearest explanation of the -margins- command is to be found in the excellent Richard Williams' https://www3.nd.edu/~rwilliam/stats/Margins01.pdf. I recommend you read that. It contains several worked examples, some of them similar to your own model.

        • #5
          Once again, I very much appreciate your assistance.

          Your post together with Richard Williams' presentation made the interpretation much clearer.

          Thanks!

          • #6
            On a related note, I also want to investigate if there is a meaningful difference between the total turnover group (forced and voluntary combined) and the control group, i.e. test the hypothesis that the control-group adjusted performance for the total turnover group is equal to zero. Should a turnover dummy be included in the regression in some way (I guess not since we already have the FORCED variable) or is it sufficient to do e.g. a Wilcoxon signed-rank test to be able to conclude that the performance difference is not meaningfully different from zero?

            • #7
              If your goal is to contrast the turnover group with the controls, I would not do an analysis including the FORCED variable. Just the Turnover variable.

              I also wouldn't do it with a Wilcoxon signed-rank test because that gives you not "meaningfully different" but "statistically significantly different." In fact, that's really all a Wilcoxon signed-rank test gives you. Unlike, say, ttest or regress, where you get a mean difference and you can ignore the test if you don't think its assumptions are adequately met, a Wilcoxon signed-rank test gives you only a test statistic and a p-value, neither of which shows you what the difference between the groups actually is.

              For an assessment of a meaningful difference you might want to compare the means of the two groups using a ttest (or just -regress outcome i.Turnover-). If you are concerned about the distribution of the outcome variable being too non-normal (and if your sample is too small to overcome this problem, which I doubt) you could do a quantile regression on the median or something like that. (-help qreg-).
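
              As a rough sketch, the three options just mentioned might be run as follows. This assumes a data set that contains both turnover and control firms; "outcome" and "Turnover" are placeholder names for the performance measure and a 0/1 turnover indicator.

              ```stata
              * Placeholder names: outcome = performance measure, Turnover = 0/1 indicator.
              * Difference in group means, with a confidence interval
              ttest outcome, by(Turnover)
              * Same comparison as a regression; the coefficient on 1.Turnover is the mean difference
              regress outcome i.Turnover
              * Difference in medians rather than means
              qreg outcome i.Turnover
              ```

              Note that -ttest- reports the difference as group 0 minus group 1, so its sign is the opposite of the -regress- coefficient on 1.Turnover.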

              • #8
                Thanks!

                So if I understand you correctly, I could test my hypothesis the following way:

                H1: CEO succession does not affect subsequent firm performance.
                - Tested with a t-test comparing the means of the turnover group vs the non-turnover control group

                H2: Whether CEO succession is forced or voluntary does not affect subsequent firm performance.
                H3: Whether a CEO succession candidate is appointed from inside or outside the firm does not affect subsequent firm performance.
                - Both tested by looking at the regression model I posted originally together with the marginal effects analysis

                • #9
                  H1: Correct, assuming there are no other variables you wish to adjust for. In particular, this is only legitimate if you can credibly assume that any other variable that is predictive of subsequent firm performance has the same distribution in the transition and non-transition groups. Otherwise the results will be suspect due to omitted variable (confounding) bias.

                  H2 and H3 cannot be tested in the interaction model. For H2, the problem is that in the interaction model there is no parameter that corresponds to the effect of forced vs voluntary succession. There are instead two separate parameters: the effect of forced vs voluntary succession if the successor is an outsider, and the effect of forced vs voluntary succession if the successor is an insider. But there is no overall parameter corresponding to "the effect of forced vs voluntary succession." The closest you can come to that is testing the average effect of forced vs voluntary succession, averaged over both insiders and outsiders in the particular proportions that insiders and outsiders occur in your data set. Analogous considerations, with the roles of FORCED and OUTSIDE interchanged, apply to H3.

                  H2 and H3 could be done in a non-interaction model. But if the interaction model shows a meaningfully non-zero interaction coefficient, then the non-interaction model is a mis-specification and inferences from it will not be valid.

                  Once again, let me reiterate that you should focus on estimating the size of these various effects, not using statistical significance to test hypotheses about whether these effects exist. Your final conclusions should reflect your best estimates of the sizes of these various effects. The conclusions may be that the effects are meaningfully large, or you might conclude that they are most likely too small to matter, or you may end up concluding that the data are consistent with effects that range between trivially small and meaningfully large. But you should not speak of effects as "existing" or "not existing." That is just the now discredited significance testing in disguise.

                  • #10
                    Thanks for those comments.

                    When performing the regressions it is clear that the interaction variable is of great importance in the model, i.e. it has a very meaningfully non-zero coefficient. Does that mean that I cannot in fact test hypotheses 2 and 3 at all? Or is it rather a matter of formulating them in another way? Or could you argue that the average effect tested in the interaction model is a good indicator for whether e.g. forced vs voluntary impacts firm performance?

                    One question on H1; is there a way to adjust the test to account for the omitted variable bias?

                    • #11
                      Taking your second question first: do a regression analysis and include the omitted variables as covariates. A ttest cannot be adjusted.

                      For your first question, consider this analogy. Suppose I observed the effect of a certain coaching intervention on the performance of children on school basketball teams. And suppose I found that the intervention improves performance in girls, but worsens performance in boys. It would make no sense to then speak of whether the coaching intervention is a good one: there are two different answers to that question. Now, you could average these effects, and if they were about equal in magnitude, and if the study sample had about equal numbers of boys and girls, you would conclude that on balance the intervention has little to no effect on performance. But that would completely obscure the fact that the intervention improves girls' performance and worsens that of boys. It would be a misleading conclusion that would lead to poor decision making about whether and when to use the coaching intervention.

                      Now, you have not actually shown your results, and perhaps the differences in effect of forced turnover between insider and outsider turnovers are not as dramatic as that example. Perhaps the effect is in the same direction in both, but just of a different magnitude. In that case, it would not be as misleading as in my example, but it is still presenting something of an illusion. This is important because whether a turnover will be forced, and whether it is to an insider or an outsider, are both events that the firm can control. Since the turnover to an insider or an outsider is something the firm can choose, it makes no sense at all to then ponder an average effect of forcing the turnover: the firm knows who the successor will be, and should evaluate the effects of forcing a turnover accordingly--not relying on an average of the actual and counterfactual situations.


                      • #12
                        Thanks!

                        To illustrate the results, here is an example from when using OROA 5 years after the succession as the dependent:
                        Code:
                        . quietly reg ΔOROAt5_adjusted i.FORCED##i.OUTSIDE SIZE i.FISCALYEAR, cluster(CompanyID)
                        
                        . margins FORCED##OUTSIDE
                        
                        Predictive margins                              Number of obs     =        132
                        Model VCE    : Robust
                        
                        Expression   : Linear prediction, predict()
                        
                        --------------------------------------------------------------------------------
                                       |            Delta-method
                                       |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        ---------------+----------------------------------------------------------------
                                FORCED |
                                    0  |  -.0059115   .0110512    -0.53   0.594    -.0278018    .0159787
                                    1  |    .039094   .0222139     1.76   0.081    -.0049074    .0830955
                                       |
                               OUTSIDE |
                                    0  |   .0042655   .0134235     0.32   0.751     -.022324    .0308549
                                    1  |    .014756   .0153956     0.96   0.340    -.0157398    .0452518
                                       |
                        FORCED#OUTSIDE |
                                  0 0  |   .0112071   .0140294     0.80   0.426    -.0165825    .0389967
                                  0 1  |  -.0235569   .0162268    -1.45   0.149    -.0556991    .0085853
                                  1 0  |   -.009155   .0290929    -0.31   0.754    -.0667824    .0484724
                                  1 1  |   .0888277   .0335758     2.65   0.009     .0223204    .1553349
                        --------------------------------------------------------------------------------
                        - The results are clearly showing that there is a meaningful positive effect to a forced-outsider succession, but from what I understand (looking at the p-values) we obtain no meaningful effect in any of the other interactions.
                        - Are the four first rows telling us anything of meaning with regards to hypothesis H2 and H3?

                        Some thoughts on your comments:
                        - If I understand you correctly it makes no sense to have a hypothesis regarding forced vs voluntary by itself, since it is clearly related to the decision about insider vs outsider as well?
                        - One of the papers that I'm replicating draws conclusions about the effect of forced vs voluntary, without taking outsider vs insider into account. Can I replicate that in some meaningful way and still add the interaction part or will my conclusion indicate that omitting the interaction made their model plain wrong?
                        - On H1; how do I include the omitted variables as covariates?

                        • #13
                          Clearly the predicted outcome in the forced outsider condition is the largest of all the predicted outcomes, and really by a pretty large margin. The other three conditions are much smaller, and about equal to each other in magnitude, though voluntary insider is slightly positive, and the other two are slightly negative.

                          The first two rows are predicted values for the voluntary and forced conditions averaged over insider and outsider. The second two rows are predicted values for insider and outsider conditions averaged over voluntary and forced. They do not really answer H2 and H3, for the same reasons I outlined in earlier posts in this thread. They are averages that obscure important variation. Looking at those numbers, if you ignored the interactions, you would conclude that neither FORCED nor OUTSIDE makes much difference at all. But you can clearly see in the bottom four rows of the table that there is a huge difference when we have a forced transition to an outsider--an important fact that would be completely hidden by looking at results in the top four rows.
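
                          Because the model is linear, the conditional effects of FORCED can be read off the bottom four rows of the table in #12 by subtraction. The arithmetic below is a sketch worked from those printed margins (so it carries their rounding), with \mu_{fo} as shorthand, introduced here for illustration, for the margin at FORCED = f and OUTSIDE = o:

                          ```latex
                          \begin{aligned}
                          \text{forced vs voluntary, insider:}  \quad & \mu_{10} - \mu_{00} = -0.0091550 - 0.0112071 = -0.0203621 \\
                          \text{forced vs voluntary, outsider:} \quad & \mu_{11} - \mu_{01} = 0.0888277 - (-0.0235569) = 0.1123846 \\
                          \text{interaction:} \quad & 0.1123846 - (-0.0203621) = 0.1327467
                          \end{aligned}
                          ```

                          The first two lines should match the point estimates from -margins OUTSIDE, dydx(FORCED)-, which also supplies the standard errors and confidence intervals that subtraction alone cannot give you.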

                          Regarding inclusion as covariates: you just add them to the list of regression variables. It's exactly what you did with size and fiscalyear in the current regression.

                          • #14
                            Many thanks.

                            Just to confirm that I have understood everything correctly:
                            1. There is no way to reject the null hypotheses 2 and 3 as they are formulated right now. I need to rethink what I really want to, and have the ability to, test and adjust the hypotheses based on that.
                            2. As I cannot credibly assume that any other variable that is predictive of subsequent firm performance has the same distribution in the turnover and non-turnover groups, I need to do a regression analysis with the Turnover variable instead of the Forced variable and interpret the effect and t-statistics from that:
                              Code:
                              reg ΔOROAt5_adjusted TURNOVER SIZE i.FISCALYEAR, cluster(CompanyID)

                            • #15
                              I'm still struggling to figure out how to do point 2.
                              I have the variable ΔOROAt5_adjusted, which is the difference in performance between the two groups, and I want to see if it is different from zero when controlling for SIZE and year fixed effects (i.FISCALYEAR). I have tried different regression specifications but do not end up with anything meaningful. Does it make sense to use this specification:
                              Code:
                              reg ΔOROAt5_adjusted SIZE i.FISCALYEAR, cluster(CompanyID)
                              and interpret the constant in some way?
