  • Main effects and interaction terms

    Dear all,

    If the main effect, interaction terms and time dummies are included in the panel regression and if the yearly main effect is constant across panels then why is the inclusion of main effect necessary given that it is captured by time dummies?

    Note: Main effect along with interaction terms are the focus of the research.

    Kind regards.

  • #2
    It isn't necessary. You can omit it if you like. If you don't omit it, Stata will omit it for you due to the colinearity with the time indicators.

    As a matter of good programming practice, it is best not to omit it, and leave it up to Stata. That way, if there is an error in the data so that the yearly main effect is, in fact, not constant across panels, Stata will leave it in, and when you see that in the output you will know you have a data error somewhere. If you just omit mention of it in your command, you won't find out about the problem.

    As for "Main effect along with interaction terms are the focus of the research," two comments:

    1. The main effect is not estimable if it is colinear with the time indicators, so your research goal cannot be achieved.

    2. The "main effect" is usually unimportant anyway, so why should it be the focus of your research? Remember that in an interaction model, the "main effect" is not the effect of anything. It is the effect of that variable conditional on the other variable(s) involved in the interaction being zero. This is typically not an effect that anybody cares about.


    • #3
      Originally posted by Clyde Schechter
      2. The "main effect" is usually unimportant anyway, so why should it be the focus of your research? Remember that in an interaction model, the "main effect" is not the effect of anything. It is the effect of that variable conditional on the other variable(s) involved in the interaction being zero. This is typically not an effect that anybody cares about.
      I very often do care about that, but I need to make sure that 0 means something and is not located outside the range of the data. At the very least it tells me if something went wrong, just like your advice on keeping the main effects in. But also, an interaction effect tells us how much an effect changes. To see whether that is big or small you need to have a baseline to compare it with, that is typically the main effect.

      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------


      • #4
        Thanks for the comments and guidance, Clyde and Maarten.

        In a two-country dataset, a particular quality of a country is included; it is constant across the panels (firms) of that country but varies over years. The country's quality is interacted with another variable (b). Research aim is to identify the effect of country's quality on dependent variable and to ascertain whether the quality strengthens/weakens the association between (b) and the dependent variable. The literature includes the main effect, the interaction term and time dummies. So, in this case, is there any way I can get output for the country's quality? (Also, the country's quality and the variable (b) can be 0.)

        Also, Maarten states "an interaction effect tells us how much an effect changes. To see whether that is big or small you need to have a baseline to compare it with, that is typically the main effect". Does this mean main effect should always be included and let Stata do the rest, in cases where the 0 means something?

        Kind regards.


        • #5
          So I think we need to clarify some things here.

          Research aim is to identify the effect of country's quality on dependent variable and to ascertain whether the quality strengthens/weakens the association between (b) and the dependent variable.
          So we have an outcome, y, a time-dependent variable quality, q, which is constant across panels at any time. We have another variable of interest, b. And there are firms (firm_id), within which data is nested.

          So the outline of the code for an analysis looks something like this:
          Code:
          xtset firm_id year
          regression_command y i.year q##b, fe // or perhaps re
          Let's clarify terminology. The term "main effect" of q in this model, were it estimable, would be represented by the coefficient of q in the output (or some transform of the coefficient such as an odds ratio). The reason I so dislike the terminology "main effect" for this (though nearly everyone uses it, so we can't escape it) is that people commonly misinterpret this as representing some overall effect of q on y. It is not that. It is the effect of q on y conditional on b = 0. I perhaps overreached in stating that this is never of interest: Maarten rightly points out that it can be of use as a base of comparison for year over year differences and the like. But the important point is that it should not be understood as representing "the effect of the country's quality on the dependent variable." Indeed, by using an interaction model, you are stipulating that there is no such thing as "the effect." Rather, for each level of b there is a separate effect of q. So when you talk about "the effect of the country's quality" you necessarily mean something that is not directly represented in the statistical model. I think that most people, when speaking about this, have in mind, loosely, the average marginal effect of q, which can be estimated by running
          Code:
          margins, dydx(q)
          Of course, there are other reasonable candidates for this. It is important to know just what you mean when you say you want to estimate "the effect" of something that has many different effects. This is a subject matter issue, not a statistical one. You need to decide just what it is you are trying to estimate here.
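
          As a sketch of the arithmetic behind that distinction, assume a linear model with q and b both continuous, leaving the firm and year effects aside for brevity. The symbols below are illustrative notation, not names that appear in Stata output:

          ```latex
          \begin{aligned}
          E[y \mid q, b] &= \beta_0 + \beta_q\, q + \beta_b\, b + \beta_{qb}\, q\, b \\
          \frac{\partial E[y \mid q, b]}{\partial q} &= \beta_q + \beta_{qb}\, b \\
          \mathrm{AME}(q) &= \frac{1}{N} \sum_{i=1}^{N} \left( \beta_q + \beta_{qb}\, b_i \right) = \beta_q + \beta_{qb}\, \bar{b}
          \end{aligned}
          ```

          The coefficient of q is the second line evaluated at b = 0, whereas the average marginal effect in the third line evaluates it, under these assumptions, at the sample mean of b.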

          That said, now we must confront the additional, statistical, complication that because q is constant across panels in every year, none of the effects of q are estimable in this model, because q is colinear with the i.year variables. This is a difficult dilemma, and it prevents you from implementing whatever solution you choose to adopt for the substantive dilemma discussed in the preceding paragraphs. You can create the appearance of solving this problem by omitting an additional year indicator. If you were to, for example, place i.year at the end of the predictor varlist, Stata would oblige you by retaining q and dropping two year indicators instead of the usual one. Or you could accomplish the same thing by specifying an incomplete range of year indicators explicitly. For example, if your years range from 2000 to 2015, inclusive, you could specify i(2000/2013).year, or i(2001/2014).year, or something like that instead of i.year. Each of these will give you some kind of coefficient for q. But if you try several of them you will see the problem: each of them will give you a different coefficient for q. And none of those coefficients is any more credible or useful than any of the others. The effects are statistically unidentifiable from the data, and any identification made by imposing some constraint (i.e. by choosing some colinear variable to omit) is conditional specifically on that constraint. So while you can create the illusion of estimating something about q, you cannot achieve the reality of that with this model. The best you can do is try to pick an identifying constraint that is inherently plausible in its own right, and then estimate the effects of q conditional on that constraint.
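
          The non-identification can be sketched algebraically. Assume a linear fixed-effects model in which alpha_i are firm effects, tau_t are year effects (before any base year is dropped) and q_t takes the same value for every firm in year t. The notation is illustrative only:

          ```latex
          \begin{aligned}
          y_{it} &= \alpha_i + \tau_t + \beta_q\, q_t + \beta_b\, b_{it} + \beta_{qb}\, q_t\, b_{it} + \varepsilon_{it} \\
          \tau_t + \beta_q\, q_t &= \left( \tau_t - c\, q_t \right) + \left( \beta_q + c \right) q_t \qquad \text{for any constant } c
          \end{aligned}
          ```

          Every choice of c reproduces exactly the same fitted values, so the data cannot distinguish beta_q from beta_q + c. The interaction coefficient is untouched by this shift, but each effect of q, beta_q + beta_qb * b, inherits the arbitrary c.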

          For example, if we could persuade ourselves that in fact the outcome variable y is not subject to yearly shocks, or that those shocks and the concomitant effects of time-varying panel-invariant variables are small enough to ignore, then we could simply omit years from the model altogether. This is equivalent to identifying the model by imposing the constraint that all of the year coefficients are 0. We don't have to be that extravagant. Perhaps we can convince ourselves that conditions in, say, 2008 were sufficiently similar to those in 2000 (which I'm taking to be the base year for the sake of discussion) that we can impose the constraint _b[2008.year] == _b[2000.year]. Then by representing year as i(2001/2007).year i(2009/2015).year in the code, we achieve that, and we will get estimates about q that are conditional on that much milder belief. In fact, any condition that can be phrased as a linear constraint on the year effects will suffice--the key thing is that it be a credible one. As for what is credible, that, again, is a subject matter issue, not a statistical one.
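
          A small simulation may make the dependence on the constraint easier to see. This is only a sketch: the data are simulated, every variable name and number is made up, and the year indicators are built by hand with tabulate so that it is unambiguous which ones are left out.

          ```stata
          * Simulated illustration: all names and numbers here are hypothetical.
          clear
          set seed 2024
          set obs 50                                  // 50 made-up firms
          generate firm_id = _n
          expand 6                                    // six yearly observations per firm
          bysort firm_id: generate year = 1999 + _n   // years 2000 through 2005
          generate q = (year - 1999)^2 / 10           // varies by year, identical across firms
          generate b = 5 * runiform()
          generate y = 1 + 0.5*b + 0.3*q*b + 0.2*(year - 2000) + rnormal()
          xtset firm_id year
          tabulate year, generate(yr)                 // yr1 is 2000, ..., yr6 is 2005

          * Full set of year indicators: q is colinear with them, so Stata
          * should omit one term (which one can depend on the ordering).
          xtreg y yr2-yr6 c.q##c.b, fe

          * Constraint A: the 2005 effect equals the 2000 (base) effect.
          xtreg y c.q##c.b yr2-yr5, fe
          display "q coefficient under constraint A: " _b[q]

          * Constraint B: the 2001 effect equals the 2000 (base) effect.
          xtreg y c.q##c.b yr3-yr6, fe
          display "q coefficient under constraint B: " _b[q]
          ```

          The two displayed coefficients should differ, even though both regressions fit the same data. Neither is "the" coefficient of q; each is conditional on the constraint that produced it.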


          • #6
            Thank you very much for a comprehensive explanation.

            With the replies, my understanding improved. Also, I reviewed the literature and learnt, to a certain extent, the use of interaction terms in my field.

            I have two research aims. The first ascertains the effect of b on y (here I will use a regression without interaction, so it consists only of y and b), and the second determines whether q has an impact on the effect of b on y (here I will use a regression with interaction, so it consists of y, b, b*q and q). In my field, an effect is ascertained through the simple coefficient value in the output table after the regression command in Stata. The value of q depends on the method used, while b is continuous and starts from 0.

            And q is measured in the following three ways:

            1. It varies from 0 (worst) to 10 (best).

            2. It is binary, where 0 is worst and 1 is best.

            3. It varies from -1 (worst) to +1 (best).

            Results based on the regression consisting of only y and b indicate a positive effect (+ coefficient). Now, if the second regression, where b*q and q are added, indicates that b*q has a negative coefficient, then it can be interpreted that q erodes the positive impact of b.

            I think I should not interpret coefficient of (b) in the second regression given that it represents a value that is conditional on q being 0 which is not meaningful. Hence, (b) should be interpreted only in the first regression. It would be wrong if I were to conclude that b affects y based on second regression, hence, the need to use two regressions to solve two research questions.

            Further, I think the significant value of q in the second regression is meaningful as it will show that q can also independently affect y.

            My sample includes firms (panels) from two countries and q represents country's quality which is constant across panels of each country. Here I will use i.year at the end to ensure Stata drops two of the year indicators which will make q estimable as this is also the case in prior literature.

            In order to be certain that I have understood well, could you please indicate whether my aforementioned interpretations are fine?

            Kind regards.


            • #7
              I agree heartily with most of your interpretations. I have a reservation about one of them:

              Further, I think the significant value of q in the second regression is meaningful as it will show that q can also independently affect y.
              Just as the coefficient of b in the second regression is conditional on q = 0, the coefficient of q in the second regression is conditional on b = 0. Since you say that b = 0 is the minimum value of b, the coefficient of q, being conditional on b = 0, may indeed be meaningful and interesting. But do remember that it is conditional on b = 0 and is not a general estimate of the effect of q. Once again, there is no such thing as "the effect of q" in the second regression. There is a different effect of q for each value of b, and the coefficient of q gives you only the one where b = 0.
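
              To see the different effects side by side, margins can evaluate the effect of one variable at chosen values of the other. The sketch below uses Stata's shipped auto dataset purely as a stand-in, with mpg playing the role of q and weight the role of b; it is not your model or your data.

              ```stata
              * Stand-in example: mpg plays the role of q, weight the role of b.
              sysuse auto, clear
              regress price c.mpg##c.weight

              * The coefficient of mpg is its effect at weight = 0, which lies
              * far outside the observed range of weight in these data.
              * Effects of mpg at weights that actually occur:
              margins, dydx(mpg) at(weight=(2000 3000 4000))
              ```

              Each row of the margins output is a different "effect of q", one per chosen value of b, which is why no single number in the regression table can stand for all of them.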

              Everything else looks right to me.


              • #8
                Thanks for providing comments.

                It is meaningful given that prior literature states that coefficient of q being conditional on b=0 provides evidence that even when b is absent q affects y.

                Again, thanks for all the help and guidance.
