
  • Main effect changes sign

    Step 1: I run a regression with one independent variable, X1, which is of primary interest to my research:

    Y = b0 + b1 X1 + e

    The coefficient b1 is negative and significant at the 5% level.

    Step 2: I run the regression again, adding another variable, X2:

    Y = b0 + b1 X1 + b2 X2 + e

    Now b1 turns positive, and b2 is also positive (and significant).

    Step 3: I add an interaction term:

    Y = b0 + b1 X1 + b2 X2 + b3 X1 X2 + e

    b1 turns negative again (and significant), and b3 is positive (and significant).


    What is going on? Can anyone suggest the story behind this econometric exercise?

  • #2
    Ajay:
    The story is probably trivial: different models give back different results.
    Obviously, interested listers can reply more helpfully if you share what you typed and what Stata gave you back and/or post an example/excerpt of your data via -dataex- (both these recommendations are well covered in the FAQ). Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      If you're calling this econometrics (it looks like mainstream statistics, too, to me), then there is some economic context to supply: what Y, X1 and X2 are, what you expect to happen, and whether Y = Xb is a good idea as a functional form. There is no information about any of that here for an economist to comment on.

      Otherwise, I have found that economists in particular (*) often seem reluctant to look at the data, for some mix of reasons such as:

      1. That's too much like introductory statistics, left behind in my undergraduate courses.

      2. That's subjective, not rigorous.

      3. Test, test, test is the mantra to follow.

      4. Other economists in my field don't do it much.

      5. There is no "says it all" graph about a moderately complicated model; therefore I will not use graphs at all.

      As a geographer, I have no such inhibitions. I would want to see a scatter plot matrix of all three variables, and added-variable plots after each regression.

      With an obvious bias, I recommend favplots from SSC as a cosmetic improvement on the official avplot and avplots commands.
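For readers unfamiliar with added-variable plots: by the Frisch-Waugh-Lovell theorem, the slope on X1 in the step-2 model equals the slope obtained by regressing the residuals of Y on X2 against the residuals of X1 on X2, and the added-variable plot is just a scatter of those two residual series. The sketch below checks this in pure Python on simulated data. It is not Stata code and not how avplot or favplots work internally; the helper names (`slope`, `residualize`, `ols`) and the data-generating values are hypothetical.

```python
import random

def slope(y, x):
    """Slope from a simple regression of y on an intercept and x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def residualize(v, x):
    """Residuals from a simple regression of v on an intercept and x."""
    b = slope(v, x)
    a0 = sum(v) / len(v) - b * sum(x) / len(x)
    return [vi - (a0 + b * xi) for vi, xi in zip(v, x)]

def ols(y, columns):
    """OLS coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + list(r) for r in zip(*columns)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Hypothetical data: X2 raises Y and is negatively correlated with X1
random.seed(1)
n = 2000
x2 = [random.gauss(0, 1) for _ in range(n)]
x1 = [-0.8 * v + random.gauss(0, 1) for v in x2]
y = [1.0 * u + 3.0 * v + random.gauss(0, 1) for u, v in zip(x1, x2)]

# b1 from the full model Y = b0 + b1 X1 + b2 X2 + e
b1_full = ols(y, [x1, x2])[1]

# Added-variable plot for X1: residualize Y and X1 on X2
e_y = residualize(y, x2)    # vertical axis of the plot
e_x1 = residualize(x1, x2)  # horizontal axis of the plot
b1_avplot = slope(e_y, e_x1)

print(f"b1 from full model:       {b1_full:.6f}")
print(f"slope in added-var. plot: {b1_avplot:.6f}")
```

The two printed slopes agree up to floating-point rounding, which is why the added-variable plot is a faithful picture of what b1 is doing after adjusting for X2.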

      The possible diagnoses are endless, but some are

      a. The model is ill-advised.

      b. One or more variables is better considered on a logarithmic scale.

      c. Structure such as a mixture of groups, or outliers, is complicating the model fits.

      (*) often naturally doesn't mean always; here's a shout-out to those economists in the Stata world who agree strongly or largely with the advice to look at the data too.



      • #4
        Given that b2 is positive, X1 and X2 must be negatively correlated with each other. The deeper story is hard to tell without more information on the variables themselves (which, as #3 said, would allow us to put on an economist's hat and answer the question) and/or basic descriptive statistics and/or plots, again as suggested in #3.
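The sign argument follows from the omitted-variable identity for OLS, which holds exactly in any sample. To keep the models apart, the step-1 estimates are written with tildes, the step-2 estimates with hats, and d1 is the slope from an auxiliary regression of X2 on X1 (this notation is introduced here for illustration, not taken from the thread):

```latex
\begin{aligned}
\text{Step 1:}\quad & Y = \tilde b_0 + \tilde b_1 X_1 + \tilde e \\
\text{Step 2:}\quad & Y = \hat b_0 + \hat b_1 X_1 + \hat b_2 X_2 + \hat e \\
\text{Auxiliary:}\quad & X_2 = \hat d_0 + \hat d_1 X_1 + \hat v,
  \qquad \hat d_1 = \frac{\widehat{\operatorname{Cov}}(X_1, X_2)}{\widehat{\operatorname{Var}}(X_1)} \\[4pt]
\text{Identity:}\quad & \tilde b_1 = \hat b_1 + \hat b_2 \, \hat d_1
\end{aligned}
```

With step-1 b1 < 0, step-2 b1 > 0 and b2 > 0, the identity forces b2 * d1 = (step-1 b1) - (step-2 b1) < 0, so d1 < 0: X1 and X2 have a negative sample covariance, hence a negative correlation.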



        • #5
          Actually, there is no story to tell at all. This is all routine statistics. The O.P. raises the question, I think, out of an expectation that b1 should come out more or less the same, or at least with the same sign, in all three models. But that expectation is entirely unfounded.

          A change in b1 between steps 1 and 2 will arise if X2 is a confounder of the relationship between X1 and Y. (That is, if X2 is correlated with both X1 and Y.) Indeed, it is precisely because of this that we add covariates to models: to reduce omitted-variable bias. The value of b1 can change in any way imaginable between steps 1 and 2.
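To see how b1 can flip sign between steps 1 and 2, here is a minimal pure-Python simulation on hypothetical data, not the O.P.'s. By assumption, the true coefficient on X1 is +1, X2 raises Y, and X2 is negatively correlated with X1, so omitting X2 pushes b1 below zero. The `ols` helper, which solves the normal equations directly, and all numeric values are illustrative assumptions. The last line checks the exact in-sample omitted-variable identity, step-1 b1 = step-2 b1 + b2 * d1, where d1 is the slope from regressing X2 on X1.

```python
import random

def ols(y, columns):
    """OLS coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + list(r) for r in zip(*columns)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Hypothetical data: true coefficient on X1 is +1, on X2 is +3,
# and X2 is negatively correlated with X1
random.seed(1)
n = 2000
x2 = [random.gauss(0, 1) for _ in range(n)]
x1 = [-0.8 * v + random.gauss(0, 1) for v in x2]
y = [1.0 * u + 3.0 * v + random.gauss(0, 1) for u, v in zip(x1, x2)]

short_b1 = ols(y, [x1])[1]      # step 1: X2 omitted
_, b1, b2 = ols(y, [x1, x2])    # step 2: X2 included
d1 = ols(x2, [x1])[1]           # auxiliary slope of X2 on X1

print(f"step 1: b1 = {short_b1:.3f}")
print(f"step 2: b1 = {b1:.3f}, b2 = {b2:.3f}")
# Omitted-variable identity: reproduces the step-1 b1 exactly
print(f"b1 + b2 * d1 = {b1 + b2 * d1:.3f}")
```

The step-1 estimate is negative only because it absorbs b2 * d1 (a positive effect of X2 times a negative X1-X2 slope); nothing about the "true" X1 effect has changed.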

          More subtle is the difference between steps 2 and 3. Because the model in step 3 adds an interaction term, the b1 in step 3 isn't even estimating the same thing as the b1 in step 2. The model in step 2 assumes that the Y:X1 relationship is the same regardless of the value of X2. The model in step 3 explicitly assumes that the Y:X1 relationship depends on the value of X2. In step 2, b1 is the (unique) slope of the Y:X1 relationship, holding for all values of X2. By contrast, in step 3's model there is no such thing as the slope of the Y:X1 relationship; there are as many different slopes (b1 + b3 X2) as there are values of X2. And in step 3, b1 is the particular slope of the Y:X1 relationship that applies when, and only when, X2 = 0. (And if X2 = 0 isn't even a possible value of X2, then b1 represents a non-existent entity and is, in its own right, of no interest whatsoever.)
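The point about step 3 can be checked numerically. In that model the slope of Y on X1 at a given X2 is b1 + b3 X2, so re-estimating with X2 centered at some value c leaves b3 unchanged and turns b1 into b1 + b3 c, the slope at X2 = c. The pure-Python sketch below assumes hypothetical simulated data in which X2 is centered around 4, so X2 = 0 lies far outside the data: b1 comes out negative, while the slope at the mean of X2 is positive. The `ols` helper and all numeric values are illustrative, not anything from the thread.

```python
import random

def ols(y, columns):
    """OLS coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + list(r) for r in zip(*columns)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Hypothetical data: true b1 = -1, b2 = 2, b3 = 0.5; X2 is centered near 4
random.seed(2)
n = 2000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(4, 1) for _ in range(n)]
y = [-1.0 * u + 2.0 * v + 0.5 * u * v + random.gauss(0, 1)
     for u, v in zip(x1, x2)]

# Step-3 model: Y = b0 + b1 X1 + b2 X2 + b3 X1*X2 + e
coef = ols(y, [x1, x2, [u * v for u, v in zip(x1, x2)]])

# Same model with X2 centered at its sample mean c
c = sum(x2) / n
x2c = [v - c for v in x2]
coef_c = ols(y, [x1, x2c, [u * v for u, v in zip(x1, x2c)]])

print(f"b1 (slope of X1 at X2 = 0):       {coef[1]:.3f}")
print(f"b3:                               {coef[3]:.3f}")
print(f"b1 after centering (X2 = {c:.2f}): {coef_c[1]:.3f}")
print(f"b1 + b3 * c from the first fit:   {coef[1] + coef[3] * c:.3f}")
```

Centering changes which slope b1 reports, not the fitted model: both regressions span the same columns and produce identical fitted values.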



          • #6
            Thanks, everyone. Appreciate that.
