
  • Significant negative coefficient in the regression output, but positive correlation with the dependent variable

    Dear Statalist,
    I have the following problem. I have built a data set for 19 countries with the variables loggdppercapita, logpopulationgrowth, loglaborforcegrowth, loginvestmentgrowth, logimmigrantinflowgrowth (loginflowgrowth), loghighskilledgrowth and loglowskilledgrowth. Each variable is a growth rate derived from a stock variable as ln(X_t) - ln(X_{t-1}). Here is the data set I use:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(loggdpgrowth logpopgrowth loglaborgrowth loglinvestgrowth loginflowgrowth loginflowgrowth2 loglowskilledgrowth loghighkilledgrowth)
               .           .           .           .           .           .             .          .
       .06028897   .05732834   .06224922   .01968714   2.1722233   2.1722233    -1.3355235 -3.0879986
       .14959073   .05809558   .06046182   .24298707    -2.76001    -2.76001    -.06662089   .4202158
       .09659618  .062820606   .09320515   .19788386    1.568616    1.568616      .3559149  .26695347
       .06348623   .07720463    .1038266   .16576183   .22314355   .22314355   -.010850448  .47932905
               .           .           .           .           .           .             .          .
       .07526228   .03461574   .08441805   .10717171           0           0      1.168791    2.07472
       .14096007  .007930947  .001570285      .15937    4.204693    4.204693     .16259895  .19635168
       .05861368   .02663594   .05226456 -.020815495   .28394374   .28394374   -.019749345  .17557235
       .04791661  .016343333    .0580097  -.00143478 -.011299555 -.011299555    -.12120999   .1646051
               .           .           .           .           .           .             .          .
       .02917356    .0547166  .022099247  -.08457112 -.031748697 -.031748697   .0016591722   .2244809
        .1497431   .04710162   .07619868   .26300412  .062520355  .062520355   -.028644534   .1824056
       .07806837   .04890824    .0869417    .2349733   .11441035   .11441035     -.3206632  .28793913
      .005620142   .05107695   .06556274   .12757158   .07796154   .07796154     .00693503   .3235654
               .           .           .           .           .           .             .          .
        .3394656   .07706587   .11756635    .4764142           0           0     .31666145   .6000391
       .13709114   .06652695   .07507706 -.013166876    3.295837    3.295837      .3135707   .5379595
       .14667551   .05926096   .10334109    .2542444   .59598345   .59598345      .1354786   .3308052
       .11585438   .05549186    .1709093   .15916912   .25131443   .25131443      .1539352   .4149731
               .           .           .           .           .           .             .          .
       .09773859  .017820256 -.031167237   .05625751    .8708283    .8708283     .10593564  .40993005
       .12835652  .020097736  .014226177    .2477395   -.2348396   -.2348396     .07165857   .4498853
       .05203145  .014837272  .013086108   .04638309   -.1590647   -.1590647     .11718187   .2010917
       -.0212187   .02338935   .00981166  -.14106275   .46134555   .46134555      .3682835  .13479806
               .           .           .           .           .           .             .          .
      -.04634852    .0240464  -.03919502   -.4533068   .11607217   .11607217     .50009257   .9184534
        .2344336   .01330611   .03876052   .42818555    .1162598    .1162598      .2366593   .4694261
       .11440618  .013411246  .011643566    .1222387   -2.104134   -2.104134       .334099   .3697724
      .018550014   .02210497   .02082898 -.007522303    .9555115    .9555115      .3752247   .4297992
               .           .           .           .           .           .             .          .
       .04644706  .017434595   .02244805  -.08934807   2.8716795   2.8716795    .036629435    .285955
        .1205982   .02275811    .0328019   .21358986    .4605249    .4605249    -.05424777   .3142989
       .04553349  .036539227   .05152243   .09624484    -1.94591    -1.94591    -.10048105  .18320465
      .009541729   .02883284   .03411155   .05423969    .2876821    .2876821      .1657095    .376986
               .           .           .           .           .           .             .          .
       .07290115   .02787105   .06626658    .0577916  -.03922071  -.03922071     .15795243   .6558886
       .08842836   .00650998  .013176122   .07879982  -.15860502  -.15860502     .07480585  .06914639
      .025229994   .00313229  .020898214   -.1585224  -.04800922  -.04800922     -.1021394  .14473702
      .069776036 -.008432408   .01599168     .080361     .151806     .151806    -.03380778   .1980607
               .           .           .           .           .           .             .          .
       .19987343   .02663908    .0857481   .16530387    2.564949    2.564949     .06499214     .40892
        .4266123   .05297505    .1862054    .7533885   1.0986123   1.0986123     .27767527   .5843659
       .16746257   .08913268    .1572594    .4855108   .55594605   .55594605      .3035376   .3958891
      -.05264842   .09186222  .063779965  -.48073375   -.6641597   -.6641597 -7.372538e-06  .52683115
               .           .           .           .           .           .             .          .
        .1260369  .067770004   .04672312   .10999452   .02325686   .02325686     .11719574  .28777838
        .2319701  .065532215   .11905802    .3421726  -2.6741486  -2.6741486     .14159507  .28076795
       .08152428   .06404705    .0938804   .14804362    .9162908    .9162908    .015640428  .10464027
      .035896864   .08604117   .13735537   .01379289    .3364722    .3364722   -.008267311  .11057387
               .           .           .           .           .           .             .          .
       .07999881  .033379447  .069668844  .025712887  -.09662683  -.09662683   -.008489441   .3054137
        .1824714   .02973067   .10023113   .27360034    .1847341    .1847341   -.019474776   .3068378
       .04190752  .024460847   .05188055  -.04068007  -.29170623  -.29170623     .07967682   .3000829
       .04532545  .017946353   .03059125   .02074446  -2.1812243  -2.1812243      -.023702      .1662
               .           .           .           .           .           .             .          .
        .0588692   .09820542   .11040684    .2384224   4.0775375   4.0775375      .6811829 -.15111293
        .0933444   .04895349   .06040147   .06949462  -.20633644  -.20633644     -.2785171   .1476927
        .1283672   .06915012   .12217858    .3688009     .189242     .189242     .05244703   .5398125
     .0010086084   .05111548   .07040765   -.1780086   .03390155   .03390155     .11580537 -.05391967
               .           .           .           .           .           .             .          .
       .15620023   .02737427  .012783254   .12055348   .22314355   .22314355      .1101725   .3036527
       .14815983   .02978316   .07763528   .09917857    .4187103    .4187103     .11148968   .3123993
        .0797666   .02903874   .02147943   .13466525   .10008346   .10008346     .05894842   .3627602
      -.00990866   .05593254   .07236302   .05837389    .4519851    .4519851      .3031179   .5471318
               .           .           .           .           .           .             .          .
       .08019076 .0042937896  .005795905  -.05395782    3.988984    3.988984      .3448909   .4854159
       .17384216  .025963364   .09380315    .3845648   -.8979416   -.8979416     .27870867   .5016804
       .02258182   .02052971   .04713079   -.1490518    .6225296    .6225296     .10483156   .3076473
       .02397121   .00662069  .009369194  -.08729347    .2937611    .2937611     .09294757  .18413346
               .           .           .           .           .           .             .          .
        .0610591   .01371697   .05028479  -.08743752    .6931472    .6931472      .3159425   .4750669
        .1781952  .022002054    .0911762   .37180105     .474458     .474458     .52095103   .5417989
        .0858938   .08083726   .15362947     .300678   .44183275   .44183275       .689962  1.0592467
      -.01209606   .06482909   .10468013   -.2078383   -.4643056   -.4643056     .02241987  .37722975
               .           .           .           .           .           .             .          .
      .004620904  .030844213  -.03747214   -.3560828  -.19290367  -.19290367      .0845315   .3682156
        .1705228   .00510424   .00869255    .3001953   .08167803   .08167803   -.029535193  .25239608
       .11211313  .017592434   .04266719   .13156615   .09352606   .09352606     .06133433  .28912866
        .0407626   .03787499   .04560601   .08385364    .2787134    .2787134     .07152986   .3660699
               .           .           .           .           .           .             .          .
      -.04076827   .04728463  .034074426   -.2003454    3.701302    3.701302    -.01998531   .1903751
       .09268976   .02018538   .02258227   .11438596  -.01242252  -.01242252     .05681875  .21937443
       .03966381  .034591876   .04254594   .03872634   .07232066   .07232066     .02772554  .02215173
       .05918493    .0508291   .07872295   .04153123   -2.056452   -2.056452   -.005469023  .07207627
               .           .           .           .           .           .             .          .
       .06878295  .013385585 -.023309294  -.05225018   -.3022809   -.3022809     .15554643   .3763301
        .1430493  .014942925   .02988996   .14127417    .7503056    .7503056     .14055558   .4326245
       .11347052  .025295086   .03829459   .08205475    .3285041    .3285041     .01264964   .3153004
      -.01882726   .03841027   .04199512  -.10670586   .03922071   .03922071     .06742107   .3780761
               .           .           .           .           .           .             .          .
       .06310392    .0645891  .065743335   .10730036    1.332227    1.332227     -.1118574   .1145305
       .15263852   .05794195   .07326381    .3128823  .067139305  .067139305      .3100808  .22639628
       .07891827   .04624218  .036929216   .11334303  -2.2643638  -2.2643638     .26763058   .3770499
    -.0078561185   .04573817  .030834695   -.1980203   -.6931472   -.6931472     .11593544  .13732281
               .           .           .           .           .           .             .          .
               .           .           .           .           .           .             .          .
               .           .           .           .           .           .             .          .
               .           .           .           .           .           .             .          .
               .           .           .           .           .           .             .          .
    end
    My first regression equation is:

    \[
    \ln(\text{gdppercapitagrowth}) = \beta_0 + \beta_1 \ln(\text{populationgrowth}) + \beta_2 \ln(\text{laborforcegrowth}) + \beta_3 \ln(\text{investmentgrowth}) + \beta_4 \ln(\text{inflowgrowth})
    \]

    where inflowgrowth is the growth rate of immigrant inflows.

    With the first regression I want to test whether immigrant inflows lead to lower capital per worker and hence decrease GDP per capita.

    In the second regression I split the inflow of immigrants by educational attainment into lowskilledgrowth and highskilledgrowth and drop the inflowgrowth variable (the rest stays the same). With the second regression I want to test whether high-skilled immigrants have a larger positive impact on GDP per capita than low-skilled immigrants.
    I then ran the following code:
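Following the notation of the first equation, the second regression can be sketched as below. The coefficient labels \(\beta_4\) and \(\beta_5\) for the two skill groups are assumed here for illustration only:

\[
\ln(\text{gdppercapitagrowth}) = \beta_0 + \beta_1 \ln(\text{populationgrowth}) + \beta_2 \ln(\text{laborforcegrowth}) + \beta_3 \ln(\text{investmentgrowth}) + \beta_4 \ln(\text{lowskilledgrowth}) + \beta_5 \ln(\text{highskilledgrowth})
\]

The hypothesis stated above then reads \(\beta_5 > \beta_4\). After the regression, equality of the two coefficients could be checked with -test lnhighkilledgrowth = lnlowskilledgrowth-, keeping in mind that this is a two-sided test.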
    /* generate Ln growth variables
    gen loggdpgrowth=ln(GDPpercapita)-ln(GDPpercapita[_n-1])
    gen logpopgrowth=ln(Populationtotal)-ln(Populationtotal[_n-1])
    gen lnlaborgrowth=ln(Laborforcetotal)-ln(Laborforcetotal[_n-1])
    gen lnlinvestgrowth=ln(Investmentinphysicalcapital)-ln(Investmentinphysicalcapital[_n-1])
    gen lninflowgrowth=ln(immigrantinflow)-ln(immigrantinflow[_n-1])
    gen lnlowskilledgrowth=ln(lowskilledimmigrants)-ln(lowskilledimmigrants[_n-1])
    gen lnhighkilledgrowth=ln(highskilled)-ln(highskilled[_n-1])
    */

    /* First regression */
    reg loggdpgrowth logpopgrowth lnlaborgrowth lnlinvestgrowth lninflowgrowth
    estat hettest
    reg loggdpgrowth logpopgrowth lnlaborgrowth lnlinvestgrowth lninflowgrowth,vce(robust)
    /* run the Hausman test to choose between random effects and fixed effects */
    xtset country Time
    xtreg loggdpgrowth logpopgrowth lnlaborgrowth lnlinvestgrowth lninflowgrowth,fe
    estimates store fixed
    xtreg loggdpgrowth logpopgrowth lnlaborgrowth lnlinvestgrowth lninflowgrowth,re
    estimates store random
    hausman fixed random

    /* Second regression*/

    reg loggdpgrowth logpopgrowth lnlaborgrowth lnlinvestgrowth lnlowskilledgrowth lnhighkilledgrowth
    estat hettest
    reg loggdpgrowth logpopgrowth lnlaborgrowth lnlinvestgrowth lnlowskilledgrowth lnhighkilledgrowth,vce(robust)
    /* run the Hausman test to choose between random effects and fixed effects */

    xtset country Time
    xtreg loggdpgrowth logpopgrowth lnlaborgrowth lnlinvestgrowth lnlowskilledgrowth lnhighkilledgrowth,fe
    estimates store fixed
    xtreg loggdpgrowth logpopgrowth lnlaborgrowth lnlinvestgrowth lnlowskilledgrowth lnhighkilledgrowth,re
    estimates store random
    hausman fixed random

    Since the Hausman test indicates for both regressions that fixed effects are appropriate, I present the fixed-effects results here: see the attachments.

    What concerns me right now, and is actually giving me quite a headache, is how some of my coefficients can be negative and highly significant while the correlation output indicates a positive relationship between those independent variables and the dependent variable. I would highly appreciate any suggestions and opinions on this problem.


    Best regards

    Nico Peters


  • #2
    I haven't looked at the specifics of your problem. But correlations and regression coefficients can have opposite signs when suppressor effects are present. For a discussion, see

    http://www3.nd.edu/~rwilliam/xsoc63993/l35.pdf
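A minimal simulated sketch of such a sign reversal, using made-up data (y, x1, x2 and the true coefficients are hypothetical, chosen only for illustration): x1 is positively correlated with y, yet once x2 is controlled for, its regression coefficient is negative.

```stata
* Hypothetical simulated data (not from this thread): x1 and x2 are strongly
* positively correlated; x2 raises y, while x1 lowers y holding x2 fixed.
clear
set seed 12345
set obs 1000
generate x2 = rnormal()
generate x1 = 0.9*x2 + sqrt(1 - 0.81)*rnormal()
generate y  = 2*x2 - 1*x1 + rnormal()

* Bivariate view: the correlation of y with x1 is positive,
* because x1 moves together with x2, which strongly raises y
correlate y x1
local r_y_x1 = r(rho)

* Multiple regression: holding x2 constant, the coefficient on x1
* is negative (true value -1)
regress y x1 x2
display "corr(y, x1) = " %6.3f `r_y_x1' "   b[x1] = " %6.3f _b[x1]
```

Nothing is wrong with either number: the correlation answers "how do y and x1 move together?", while the coefficient answers "how does y change with x1 when x2 is held fixed?".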
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    Stata Version: 17.0 MP (2 processor)

    WWW: https://www3.nd.edu/~rwilliam



    • #3
      There are other reasons this may happen as well; a couple of citations can be found in #3 of http://www.statalist.org/forums/foru...ols-regression



      • #4
        As an aside, the following code is probably incorrect:

        Code:
        gen loggdpgrowth=ln(GDPpercapita)-ln(GDPpercapita[_n-1])
        gen logpopgrowth=ln(Populationtotal)-ln(Populationtotal[_n-1])
        gen lnlaborgrowth=ln(Laborforcetotal)-ln(Laborforcetotal[_n-1])
        gen lnlinvestgrowth=ln(Investmentinphysicalcapital)-ln(Investmentinphysicalcapital[_n-1])
        gen lninflowgrowth=ln(immigrantinflow)-ln(immigrantinflow[_n-1])
        gen lnlowskilledgrowth=ln(lowskilledimmigrants)-ln(lowskilledimmigrants[_n-1])
        gen lnhighkilledgrowth=ln(highskilled)-ln(highskilled[_n-1])
        First, at a minimum, these all need to be prefixed with -by country (Time), sort:-. Otherwise, the first observation for each new country in the data set will be based on the final observation of some other country that precedes it in the data set. And if the data are not sorted by Time (within country), then you are basing growth in a given year on some random other year (and maybe even some other country). Also, if there are any gaps in years in your data, then the reference to var[_n-1] will base the calculation on the last year for which data are available in your data set, not on the immediately preceding year. A safer way to do this is:

        Code:
        xtset country Time
        gen loggdpgrowth = ln(GDPpercapita)-ln(L1.GDPpercapita)
        gen logpopgrowth = ln(Populationtotal)-ln(L1.Populationtotal)
        // etc.
        Using the L1 operator after -xtset-ting your data automatically makes Stata ensure that the data are properly sorted, and references to L1.var will always refer to the immediately preceding year in the same country, never to any other year nor any other country.
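A toy comparison of the three approaches, on hypothetical made-up data (two countries, annual data, and no 2002 observation for country 2):

```stata
* Hypothetical toy panel (not Nico's data)
clear
input country year gdp
1 2000 100
1 2001 110
1 2002 121
2 2000 200
2 2001 220
2 2003 242
end

* Naive version: [_n-1] crosses country boundaries and ignores gaps
sort country year
generate g_naive = ln(gdp) - ln(gdp[_n-1])

* by-group version: restarts within each country, but still treats
* 2001 -> 2003 in country 2 as a single step
by country (year), sort: generate g_by = ln(gdp) - ln(gdp[_n-1])

* Lag-operator version: L1.gdp is the value exactly one period earlier
* in the same country, and is missing when that period is absent
xtset country year
generate g_lag = ln(gdp) - ln(L1.gdp)

list, sepby(country) noobs
```

In the listing, g_naive for country 2's 2000 row is computed from country 1's 2002 value, g_by avoids that but still spans the 2002 gap, and g_lag is missing wherever the immediately preceding year is absent.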



        • #5
          Thank you very much, Richard, Rich and Clyde, for the fast responses; they helped me get a grasp of what's going on here. But Clyde, when I use your approach, Stata generates nothing but missing values (100 of them), so the output is just:
          .
          .
          .
          .
          .
          Before that, I had deleted by hand the gaps you are referring to. So what am I doing wrong? I used exactly the syntax you proposed.
          Last edited by Nico Peters; 19 Dec 2016, 17:12.



          • #6
            Well, that suggests that all your previous results were spurious! The code in #4 guarantees that each log growth rate is calculated from the current time and the immediately preceding time. The fact that it gives all missing values implies that you don't in fact have consecutive values of time in your data.

            Now, it may be a matter of units. For example, if you have annual data and your Time variable is denominated in days (i.e. 1jan2000, 1jan2001, 1jan2002, etc.), then when you -xtset- your data as -xtset country Time-, Stata thinks that there are gaps of 364 days (365 in a leap year) between your observations. So if you want Stata to understand that the data are annual, you have to create a separate variable for just the year. (Similar considerations apply if your time unit is monthly.) To see what I'm talking about:

            Code:
            . clear
            
            . set obs 11
            number of observations (_N) was 0, now 11
            
            . gen time = mdy(1, 1, 2000+_n)
            
            . format time %td
            
            . gen country = 1
            
            . gen x = runiform()
            
            . 
            . xtset country time
                   panel variable:  country (strongly balanced)
                    time variable:  time, 01jan2001 to 01jan2011, but with gaps
                            delta:  1 day
            
            . gen l1 = L1.x
            (11 missing values generated)
            
            . 
            . xtset, clear
            
            . 
            . gen year = yofd(time)
            
            . xtset country year
                   panel variable:  country (strongly balanced)
                    time variable:  year, 2001 to 2011
                            delta:  1 unit
            
            . gen l2 = L1.x
            (1 missing value generated)
            
            . 
            . list, noobs clean
            
                     time   country          x   l1   year         l2  
                01jan2001         1   .0674011    .   2001          .  
                01jan2002         1   .3379889    .   2002   .0674011  
                01jan2003         1   .9748848    .   2003   .3379889  
                01jan2004         1   .7264384    .   2004   .9748848  
                01jan2005         1   .0454151    .   2005   .7264384  
                01jan2006         1   .7459667    .   2006   .0454151  
                01jan2007         1   .4961259    .   2007   .7459667  
                01jan2008         1   .7167162    .   2008   .4961259  
                01jan2009         1    .859742    .   2009   .7167162  
                01jan2010         1   .1340756    .   2010    .859742  
                01jan2011         1   .4884419    .   2011   .1340756



            • #7
              Oh, I'm sorry, I forgot to include the country and time variables in my first post. I have 19 countries observed at 5-year intervals from 1990 to 2010:
              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input long country int Time
               1 1990
               1 1995
               1 2000
               1 2005
               1 2010
               2 1990
               2 1995
               2 2000
               2 2005
               2 2010
               3 1990
               3 1995
               3 2000
               3 2005
               3 2010
               4 1990
               4 1995
               4 2000
               4 2005
               4 2010
               5 1990
               5 1995
               5 2000
               5 2005
               5 2010
               6 1990
               6 1995
               6 2000
               6 2005
               6 2010
               7 1990
               7 1995
               7 2000
               7 2005
               7 2010
               8 1990
               8 1995
               8 2000
               8 2005
               8 2010
               9 1990
               9 1995
               9 2000
               9 2005
               9 2010
              10 1990
              10 1995
              10 2000
              10 2005
              10 2010
              11 1990
              11 1995
              11 2000
              11 2005
              11 2010
              12 1990
              12 1995
              12 2000
              12 2005
              12 2010
              13 1990
              13 1995
              13 2000
              13 2005
              13 2010
              14 1990
              14 1995
              14 2000
              14 2005
              14 2010
              15 1990
              15 1995
              15 2000
              15 2005
              15 2010
              16 1990
              16 1995
              16 2000
              16 2005
              16 2010
              17 1990
              17 1995
              17 2000
              17 2005
              17 2010
              18 1990
              18 1995
              18 2000
              18 2005
              18 2010
              19 1990
              19 1995
              19 2000
              19 2005
              19 2010
               .    .
               .    .
               .    .
               .    .
               .    .
              end
              label values country country
              label def country 1 "Australia", modify
              label def country 2 "Austria", modify
              label def country 3 "Canada", modify
              label def country 4 "Chile", modify
              label def country 5 "Denmark", modify
              label def country 6 "Finland", modify
              label def country 7 "France", modify
              label def country 8 "Germany", modify
              label def country 9 "Ireland", modify
              label def country 10 "Luxembourg", modify
              label def country 11 "Netherlands", modify
              label def country 12 "New Zealand", modify
              label def country 13 "Norway", modify
              label def country 14 "Portugal", modify
              label def country 15 "Spain", modify
              label def country 16 "Sweden", modify
              label def country 17 "Switzerland", modify
              label def country 18 "United Kingdom", modify
              label def country 19 "United States", modify

              Or can that kind of formula not be used with my 5-year intervals?
              Thanks for your attention; it helps me a lot :-)



              • #8
                OK. That's similar to the situation I described in #6, but the solution you need is different: in the -xtset- command you need to tell Stata that consecutive observations are at 5 year intervals:

                Code:
                xtset country Time, delta(5)
                Then the commands using the lag operator will know that L1.var means the value of var 5 years earlier (no more and no less).
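A small sketch of the difference, using hypothetical made-up gdp values laid out like the data in #7 (one country, Time in 5-year steps):

```stata
* Hypothetical toy data (not Nico's): one country observed every 5 years
clear
input country Time gdp
1 1990 100
1 1995 120
1 2000 150
1 2005 160
1 2010 200
end

* Without delta(): the default period is 1, so L1.gdp asks for
* 1989, 1994, ... values that do not exist; g1 is all missing
xtset country Time
generate g1 = ln(gdp) - ln(L1.gdp)

* With delta(5): L1.gdp is the value 5 years earlier, so only
* the first period (1990) is missing
xtset country Time, delta(5)
generate g5 = ln(gdp) - ln(L1.gdp)

list, noobs clean
```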



                • #9
                  Thanks, Clyde, that works now. But do you have any suggestions on how I can get rid of this correlation problem, or is that just not possible? I used population growth and investment in physical capital as control variables derived from the Solow growth model, so normally one would expect to see a negative and a positive relationship respectively, right? I'm sorry for asking such basic questions, but this is one of the first times I have analysed data in such depth. It is very nice to get such helpful and instructive advice.



                  • #10
                    You don't have a correlation problem. The problem is just that your expectations are incorrect. There is nothing inconsistent or paradoxical about having positive correlations and negative regression coefficients like that. There is a big difference between a correlation coefficient, which involves only a single predictor, and a multiple regression coefficient, which is estimated holding the other predictors constant, so that one variable can "correct" for the "overshooting" of another variable it is correlated with, and things like that. Do follow the links provided above by Richard Williams and Rich Goldstein. You might also want to check out the Wikipedia page on Simpson's Paradox, which might also be in play here.
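A minimal simulated sketch of how a Simpson's-paradox pattern can arise in a panel like this one, assuming made-up data (y, x, xbar and all coefficients are hypothetical): pooled across countries, y and x are positively correlated, yet the fixed-effects coefficient, which uses only within-country variation, is negative.

```stata
* Hypothetical simulated panel (not Nico's data): 19 countries, 5 periods.
* Countries with a higher average x have a much higher average y,
* but within a country, periods with higher x have lower y.
clear
set seed 2016
set obs 19
generate country = _n
generate xbar = 2*rnormal()            // country-level mean of x
expand 5
bysort country: generate Time = 1985 + 5*_n
generate x = xbar + rnormal()
generate y = 3*xbar - 1*(x - xbar) + 0.5*rnormal()

* Pooled correlation mixes between- and within-country variation
correlate y x
local r_pooled = r(rho)

* Fixed effects keep only within-country variation (true slope -1)
xtset country Time, delta(5)
xtreg y x, fe
display "pooled corr = " %6.3f `r_pooled' "   FE b[x] = " %6.3f _b[x]
```

The positive pooled correlation is driven by differences between countries; the negative fixed-effects slope describes what happens within a country over time, which is the question -xtreg, fe- answers.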



                    • #11
                      Sorry, problem solved.
                      Last edited by Nico Peters; 14 Jan 2017, 12:56.

