  • Flip sign after including other predictors

    Hi,

    I am running a fixed-effects regression in Stata (xtreg y x, fe) and I obtain a negative and significant coefficient for x. However, when I add more explanatory variables (xtreg y x z, fe), the coefficient on x becomes positive (and it is still statistically significant).

    What could be causing this? How can I fix it? What does this tell me about the true effect of x on y?
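
    In code form, this is roughly what I am running; the panel and time identifiers below are placeholders, not my actual variable names:

    xtset panel_id year
    xtreg y x, fe        // coefficient on x: negative and statistically significant
    xtreg y x z, fe      // coefficient on x: positive, still statistically significant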

    Thank you

    Marco

  • #2
    Marco:
    posting what you typed and what Stata gave you back (as per the FAQ) increases the chances of spotting what's going on with your data. Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      You mean like this?

      . version 14.1

      .
      . clear *

      . set more off

      . set seed 1345955

      .
      . input byte (x y simpson)

                  x         y   simpson
        1. 1 8 0
        2. 2 9 0
        3. 3 10 0
        4. 4 11 0
        5. 8 1 1
        6. 9 2 1
        7. 10 3 1
        8. 11 4 1
        9. end

      .
      . quietly replace y = y + runiform(0, 0.1)

      .
      . regress y c.x

            Source |       SS           df       MS      Number of obs   =         8
      -------------+----------------------------------   F(1, 6)         =     11.72
             Model |  71.2743725         1  71.2743725   Prob > F        =    0.0141
          Residual |  36.4879943         6  6.08133239   R-squared       =    0.6614
      -------------+----------------------------------   Adj R-squared   =    0.6050
             Total |  107.762367         7  15.3946238   Root MSE        =     2.466

      ------------------------------------------------------------------------------
                 y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                 x |  -.8123718   .2372944    -3.42   0.014     -1.39301   -.2317333
             _cons |   10.93381   1.669514     6.55   0.001     6.848661    15.01897
      ------------------------------------------------------------------------------

      . regress y c.x i.simpson

            Source |       SS           df       MS      Number of obs   =         8
      -------------+----------------------------------   F(2, 5)         =  28005.03
             Model |  107.752748         2  53.8763739   Prob > F        =    0.0000
          Residual |  .009619051         5   .00192381   R-squared       =    0.9999
      -------------+----------------------------------   Adj R-squared   =    0.9999
             Total |  107.762367         7  15.3946238   Root MSE        =    .04386

      ------------------------------------------------------------------------------
                 y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                 x |   1.006989   .0138701    72.60   0.000      .971335    1.042644
         1.simpson |  -14.03507   .1019244  -137.70   0.000    -14.29708   -13.77307
             _cons |   7.035184   .0410285   171.47   0.000     6.929717    7.140651
      ------------------------------------------------------------------------------

      .
      . exit

      end of do-file


      You could try Googling regression sign Simpson's paradox and see if that helps.



      • #4
        There are other possible reasons as well; I gave a couple of citations in #4 of:
        http://www.statalist.org/forums/foru...-fe-regression



        • #5
          Hi, thanks everyone for your help. How can I fix the Simpson's paradox?

          I have expanded my data. Now I obtain the same sign in all regressions, but the statistical significance of my variable of interest changes. I am interested in estimating the effect of diff on y. The dependent variable y is the same in all of the following regressions (all fixed effects with i.year): Stata output.pdf

          1) First I regress y on diff and obtain a negative, not statistically significant coefficient on diff.
          2) I add GDP and obtain the same negative, not statistically significant result as in 1).
          3) I add assets and employee to model 1) (no GDP here) and once again obtain a negative, not statistically significant result.
          4) Finally, I run the regression with all variables (GDP, employee, assets); here I get a negative, statistically significant coefficient on diff (see the sketch below).
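
          In Stata terms, the four specifications are roughly the following; the panel identifier (firm_id) and the exact variable names are placeholders, not necessarily the ones in my data:

          xtset firm_id year
          xtreg y diff i.year, fe                         // model 1
          xtreg y diff gdp i.year, fe                     // model 2
          xtreg y diff assets employee i.year, fe         // model 3
          xtreg y diff gdp assets employee i.year, fe     // model 4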

          The last one (model 4) is actually the model that I am trying to estimate, and it gives me the results I was expecting. However, the fact that the statistical significance changes as I add and remove variables makes me doubt the accuracy of my results.

          What do you recommend? How can I fix this? What is causing this? What should I do in Stata?

          Thank you
          Marco



          • #6
            First of all, are the changes in p-values large, or are we talking about things like 0.08 vs 0.03? If the latter, don't waste another second thinking about this. 0.05 is just an arbitrary magic number anyway.

            If the former, first, there is no reason to expect statistical significance (or even effect sizes) to remain the same, or even approximately the same, when the model is changed. As you observed earlier, the introduction of additional predictors can result in major shifts in effect sizes and even directions if the newly introduced variables are not independent of the earlier variables. Also, remember that introducing additional variables usually means losing more observations due to missing values of the new variable(s), so the estimation sample changes (and shrinks).
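
            One way to check whether the changing estimation sample is responsible is to hold the sample fixed across specifications, for example by estimating the full model first and then restricting the smaller models to its e(sample). A minimal sketch, with placeholder panel and variable names (not necessarily yours):

            xtset firm_id year
            quietly xtreg y diff gdp assets employee i.year, fe
            generate byte insample = e(sample)    // flag the observations used by the full model

            xtreg y diff i.year if insample, fe
            xtreg y diff gdp i.year if insample, fe
            xtreg y diff assets employee i.year if insample, fe
            xtreg y diff gdp assets employee i.year if insample, fe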

            I would focus my energies on whether you have done everything correctly with the final model. If you are satisfied with that, there is no reason to expect any of its results to be consistent with the earlier models. And the p-values are the least likely to be consistent of all the outputs.



            • #7
              Thank you, Dr. Schechter. I thought that the robustness of the results consisted in having the same sign and significance for a variable independently of how many other variables are added to the model.
              How can I conclude that my results are robust if the significance changes (as in my example)?

              Furthermore, how can I fix the Simpson's paradox in case I observe a sign flip?

              Thank you

              Giulia




              • #8
                Robustness to adjustment for other variables is neither necessary, nor, in general, possible, and in many situations not desirable. I don't know what you're getting at here. When I think about analyzing the robustness of an analysis, I think about robustness to things like the handling of missing data, or to substitution of different variables that purport to be measures of the same construct, or, when sensible, consistency of results in different subsets of the data. (For the last, only the effect sizes should be consistent, not the p-values.)

                There is no automatic way to "fix" a Simpson's paradox. Simpson's paradox is simply the observation that when there is a confounding variable, the estimated effects can be very different depending on whether one adjusts for the confounding variable or not. You have to then consider what is the question you are asking in your research. That question, if it is clearly posed and focused, will call for either an adjusted or an unadjusted analysis, not both. The analysis called for by your question is the one that you should use to answer the question. (Thinking carefully about this may also reveal that the question actually calls for some analysis that is altogether different from the ones you have done. Better to discover that sooner than later.) Or, you may have several research questions, one of which calls for an adjusted analysis and the other for an unadjusted one--in that case each question has its own answer, they differ, and there is nothing wrong with that. In the end, it boils down to understanding your research questions and matching the analysis to the question.

                Finally, please help us resolve a different paradox. Sometimes you sign your name as Giulia, and sometimes as Marco. The norm in this community is that we use our real first and last names as our username on the Forum. If you are really Marco Maggio, then thank you for observing that norm. If your real name is Giulia something, or even something else entirely, please press on Contact Us (lower right hand corner of the window) and ask the forum administrator to change your user name accordingly. (You cannot change your user name yourself using the edit functions in the profile; you have to contact the administrators to do that.) Thank you.



                • #9
                  Giulia (Marco? Who is there?):
                  I thought that the robustness of the results consisted in having the same sign and significance for a variable independently of how many other variables are added to the model.
                  Setting aside remote astral convergences, this is not expected to happen, as coefficients express the contribution of each predictor in explaining the variation of <depvar> when adjusted for the remaining <indepvars>.
                  Another topic to consider is what you mean by "robustness".
                  I'm under the impression that you mean something along the lines of a "what if" scenario rather than robustness in the statistical sense (i.e. robustness to heteroskedasticity).
                  If the first meaning is what you have in mind, you can report different regression models in your paper/article (this happens quite frequently in technical journals dealing with biostatistics or econometrics, just to mention the first two research fields that spring to my mind) and comment on the differences in coefficients.
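                  For instance, a minimal sketch of that kind of side-by-side presentation, using Stata's built-in -estimates store- and -estimates table- (the variable names are placeholders taken from your description, not your actual data):

                  quietly xtreg y diff i.year, fe
                  estimates store m1
                  quietly xtreg y diff gdp i.year, fe
                  estimates store m2
                  quietly xtreg y diff assets employee i.year, fe
                  estimates store m3
                  quietly xtreg y diff gdp assets employee i.year, fe
                  estimates store m4
                  estimates table m1 m2 m3 m4, b(%9.3f) se stats(N r2_w) keep(diff gdp assets employee)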
                  Kind regards,
                  Carlo
                  (Stata 19.0)



                  • #10
                    Giulia and I are working together on this paper. She responded to the message while I was coding and signed it by mistake. Sorry for the confusion, and thank you for your valuable help.
                    Best,
                    Marco
