  • Collinearity issues

    Hi everyone,

    I am investigating the effect of financial adviser reputation on various dependent variables (CAR, deal premium, time to completion) in the context of M&A. By classifying bidder and target advisers into top-tier and non-top-tier categories, I create dummy variables (bidder adviser top-tier and target adviser top-tier) and interaction terms for adviser reputation. My model is specified as follows (see attachment):

    reg Target_CAR Bidder_Adviser_Top_Tier T_Adviser_Top_Tier (Bidder_Adviser_Top & Target_Adviser_Top) (Bidder_Adviser_Top & Target_Adviser_Non_Top) (Bidder_Adviser_Non_Top & Target_Adviser_Top)

    However, Stata seems to omit two of the interaction terms because of collinearity. Does anyone have an idea why this happens? As far as I know, this model should be estimable, because these terms are not directly linked and I have already excluded the interaction term for non-top-tier bidder and target advisers. Is my model incorrectly specified, or what could have gone wrong?

    Thank you in advance!

    Martin

  • #2
    Bidder advisor and target advisor are each classified into two categories: top tier and bottom tier. So you have, in terms of information (though not necessarily represented this way in your variables) a dichotomous variable for bidder advisor top tier, and another dichotomous variable for target advisor top tier. With two dichotomous variables, the interaction consists of a single dichotomous variable. You have put in three dichotomous variables: so two of them are necessarily redundant. Stata chose to retain the one you call BT_TNT, but, in fact, it could just as well have kept any one of them and dropped the other two.

    Thinking about it less abstractly, if you know B_Adviser_Tier, T_Adviser_Tier, and BT_TNT, then you can directly calculate BT_TT and BNT_TT from them. So BT_TT and BNT_TT are redundant, and Stata, of course, omits them.
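    A short algebraic sketch makes the redundancy concrete. Write B and T for the bidder and target top-tier indicators (B_Adviser_Tier and T_Adviser_Tier, assumed here to be coded 1 = top tier, 0 = non-top tier), and read the hand-made dummies as BT_TT = bidder top/target top, BT_TNT = bidder top/target non-top, BNT_TT = bidder non-top/target top. That reading is an interpretation of the variable names, not something confirmed in the thread:

    ```latex
    \begin{aligned}
    \text{BT\_TT} &= B\,T, \qquad \text{BT\_TNT} = B\,(1-T), \qquad \text{BNT\_TT} = (1-B)\,T,\\
    \text{BT\_TT} &= B - \text{BT\_TNT},\\
    \text{BNT\_TT} &= T - \text{BT\_TT} = T - B + \text{BT\_TNT}.
    \end{aligned}
    ```

    Both omitted dummies are exact linear combinations of columns that remain in the model, so the regressors are perfectly collinear and OLS cannot assign separate coefficients to them.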

    In any case, this is the year 2017, and Stata has had factor-variable notation for several years now. So you shouldn't be wasting your time generating your own interaction terms anyhow. Let Stata do the work for you, and you won't have headaches with things like this:

    Code:
    regress T_CAR2_w i.B_Adviser_Tier##i.T_Adviser_Tier
    Do read -help fvvarlist- and the linked section of the manual for more information about factor-variable notation.
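    As a sketch of what the factor-variable model estimates, write y for the outcome (T_CAR2_w) and B, T for the two tier indicators, assuming they are coded 0/1 with 0 (non-top tier) as the base level. This notation is introduced here for illustration. i.B_Adviser_Tier##i.T_Adviser_Tier expands to both main effects plus their interaction, so there are four coefficients for the four bidder/target combinations:

    ```latex
    \begin{aligned}
    y &= \beta_0 + \beta_1 B + \beta_2 T + \beta_3 B T + \varepsilon\\[4pt]
    E[y \mid B=0,\,T=0] &= \beta_0\\
    E[y \mid B=1,\,T=0] &= \beta_0 + \beta_1\\
    E[y \mid B=0,\,T=1] &= \beta_0 + \beta_2\\
    E[y \mid B=1,\,T=1] &= \beta_0 + \beta_1 + \beta_2 + \beta_3
    \end{aligned}
    ```

    With four cells and four coefficients the model is saturated, which is why only one interaction coefficient can be estimated alongside the constant and the two main effects. The interaction coefficient is a difference in differences: beta_3 = (cell 1,1 minus cell 0,1) minus (cell 1,0 minus cell 0,0), i.e. how much the top-tier bidder adviser effect changes when the target adviser is also top tier.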

    • #3
      Thank you very much for taking the time to help me out Clyde. Your explanation makes a lot of sense.

      Would it be possible to interpret the three interaction terms when the main effects (B_Adviser_Top and T_Adviser_Top) are excluded from the regression? Before running this model, I had already looked at the effect of these two dummy variables, so I already have all the statistics for the model without the interaction terms.

      • #4
        Would it be possible to interpret the three interaction terms when the main effects (B_Adviser_Top and T_Adviser_Top) are excluded from the regression?
        Yes, it is possible, but you shouldn't. Models with interaction terms that don't also contain the corresponding main effects are, in general, mis-specified.

        But let's think about what it would mean in your case. The interaction term 1.B_Adviser_Top#1.T_Adviser_Top will take on the value 1 when, and only when, both the bidder and target advisers are top tier. In all other circumstances that interaction term is zero. So a model with only that interaction term and no main effects is equivalent to running this:

        Code:
        * 1 if both advisers are top tier, 0 otherwise; missing if either tier is missing
        gen both_advisors_top_tier = (B_Adviser_Top == 1) & (T_Adviser_Top == 1) if !missing(B_Adviser_Top, T_Adviser_Top)
        regress T_CAR2_w i.both_advisors_top_tier
        So you would interpret that model exactly the way you would interpret this one. But it is better not to think of it as an interaction model, because in this situation that "interaction" is not functioning as an interaction between two other variables; it's just a shorthand way of writing a more complicated variable that stands on its own.
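        To see what that single-dummy model assumes, write y for T_CAR2_w and B, T for the two top-tier indicators (notation introduced here for illustration, assuming 1 = top tier). The interaction-only model and the group means it implies are:

        ```latex
        \begin{aligned}
        y &= \gamma_0 + \gamma_1\, B T + \varepsilon\\[4pt]
        E[y \mid B=1,\,T=1] &= \gamma_0 + \gamma_1\\
        E[y \mid \text{any other combination of } B, T] &= \gamma_0
        \end{aligned}
        ```

        So gamma_1 is the difference between the mean outcome of deals where both advisers are top tier and the pooled mean of the other three combinations. Compared with a model that also contains B and T, this amounts to forcing both main-effect coefficients to zero, that is, assuming the three remaining bidder/target combinations share a common mean.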

        The reason for making a point of not doing this with interaction notation is that it's easy, when coding, to mistakenly write a model that includes an interaction term and omits one or more of the main effects (or sub-interactions if it is a multi-way interaction). If your brain and eye are in the habit of expecting to always see all of the main effects and sub-interactions when there is an interaction, you are more likely to notice and then fix that error. But if your brain and eye don't form that habit of giving you the gnawing feeling that "something's wrong," then you'll miss it. And this is an error that Stata will not pick up, as it is perfectly legal syntax.
