
  • Why is my correlation negative when the estimated coefficients are positive?

    Dear Statalist Community

    This should be my very last question for the Statalist community. Thank you to all the experts who have helped me so far. I appreciate it a lot.

    I am trying to explain why the correlation between my dependent variable (government expenditure as a % of GDP) and my independent variable (immigration) is negative, while the estimated coefficients are positive (in every case, whether I include or exclude each of the control variables).

    1. I tried to investigate using
    Code:
    extremes  govex_gdp immi
    and found no outliers. Are there other ways I can find the reason for this?
    2. I also experimented with another regression that uses logs of all my variables. Can I interpret it in terms of standard deviations, e.g. a one-standard-deviation increase in the independent variable contributes ___% of a one-standard-deviation increase in the dependent variable?
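One rough way to look for the source of the discrepancy in question 1 is to compare the pooled correlation with the within-country correlation, because the fixed-effects estimator uses only the variation inside each country. The sketch below does this on simulated panel data; `y`, `x`, `country` and `year` are hypothetical stand-ins for govex_gdp, immi and the panel identifiers, and the check ignores the year effects and other covariates in the full model.

```stata
* Simulated panel (hypothetical data, not the poster's): 30 countries x 20 years
clear
set seed 12345
set obs 30
generate country = _n
* Between countries: countries with higher x have lower average y
generate x_level = 10 * country
generate y_level = 100 - 2 * country
expand 20
bysort country: generate year = 1990 + _n
* Within each country: y rises one-for-one with x
generate x = x_level + rnormal(0, 3)
generate y = y_level + (x - x_level) + rnormal(0, 1)

* Pooled correlation mixes between- and within-country variation
correlate y x
scalar rho_pooled = r(rho)

* Within-country correlation: subtract each country's mean from both variables
bysort country: egen x_bar = mean(x)
bysort country: egen y_bar = mean(y)
generate x_w = x - x_bar
generate y_w = y - y_bar
correlate y_w x_w
scalar rho_within = r(rho)

* The fixed-effects slope uses only the within-country variation
xtset country year
xtreg y x, fe
scalar b_fe = _b[x]

display "pooled r = " %6.3f scalar(rho_pooled) "   within r = " %6.3f scalar(rho_within)
display "FE slope = " %6.3f scalar(b_fe)
```

In this simulation the pooled correlation is strongly negative while the within-country correlation and the fixed-effects slope are positive. With real data, both correlations should be computed on the same observations that `xtreg` uses, since immi is missing for some observations in the `extremes` listing.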




    Code:
    . extremes govex_gdp immi

      +----------------------------+
      |  obs:   govex_~p      immi |
      |----------------------------|
      |  901.   10.20876         . |
      |  586.    10.2687         . |
      |  585.    10.3964         . |
      |  902.   10.46668         . |
      |  722.   10.53066         . |
      +----------------------------+

      +----------------------------+
      |  879.    27.4907     61872 |
      |  871.   27.63227     32272 |
      |  369.   27.68548         . |
      |  182.   27.69099     28223 |
      |  205.   27.93502     51800 |
      +----------------------------+

    Regression without logs:

    Code:
    . xtreg govex_gdp immi depratio unem_perlab urbanpop_pertot pop femalepop_pertot i.year, fe cluster(country)

    Fixed-effects (within) regression               Number of obs      =       657
    Group variable: country                         Number of groups   =        33

    R-sq:  within  = 0.2989                         Obs per group: min =         2
           between = 0.0395                                        avg =      19.9
           overall = 0.0627                                        max =        29

                                                    F(32,32)           =         .
    corr(u_i, Xb)  = -0.7913                        Prob > F           =         .

                                       (Std. Err. adjusted for 33 clusters in country)
    ----------------------------------------------------------------------------------
                     |                Robust
           govex_gdp |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
    -----------------+----------------------------------------------------------------
                immi |   1.14e-06   5.01e-07     2.28   0.029     1.24e-07    2.16e-06
            depratio |   .0329021   .0603795     0.54   0.590    -.0900868    .1558911
         unem_perlab |   .1200443   .0409341     2.93   0.006     .0366643    .2034244
     urbanpop_pertot |   .2526613   .0645373     3.91   0.000     .1212032    .3841194
                 pop |   7.90e-08   1.57e-07     0.50   0.618    -2.40e-07    3.98e-07
    femalepop_pertot |  -.9128621   1.047256    -0.87   0.390    -3.046053    1.220329
                     |
                year |
                1982 |  -.0275117   .1441005    -0.19   0.850    -.3210348    .2660114
                1983 |   .0287356   .3412937     0.08   0.933     -.666457    .7239282
                1984 |   -.672402   .4074508    -1.65   0.109    -1.502352    .1575481
                1989 |  -1.164444   .5968829    -1.95   0.060    -2.380255    .0513666
                1990 |  -.7825131   .6641094    -1.18   0.247     -2.13526    .5702335
                1991 |  -.3335672   .7901333    -0.42   0.676    -1.943016    1.275882
                1992 |  -.4937047    .800642    -0.62   0.542    -2.124559     1.13715
                1993 |  -.1212492   .7635012    -0.16   0.875     -1.67645    1.433952
                1994 |  -.8561286   .7242583    -1.18   0.246    -2.331394    .6191373
                1995 |  -.5426904   .8170002    -0.66   0.511    -2.206865    1.121485
                1996 |  -.8923355   .8015554    -1.11   0.274     -2.52505    .7403794
                1997 |  -1.156174   .8138863    -1.42   0.165    -2.814007    .5016577
                1998 |  -1.249869   .8070755    -1.55   0.131    -2.893828    .3940899
                1999 |  -1.177472   .8301466    -1.42   0.166    -2.868425    .5134814
                2000 |  -1.617467   .8595532    -1.88   0.069     -3.36832    .1333854
                2001 |  -1.411849   .8676217    -1.63   0.113    -3.179137    .3554382
                2002 |  -1.143768   .8905438    -1.28   0.208    -2.957747    .6702102
                2003 |  -.8068445   .9449567    -0.85   0.400    -2.731658    1.117969
                2004 |  -1.323295   .8930714    -1.48   0.148    -3.142422    .4958319
                2005 |   -1.32398   .9220267    -1.44   0.161    -3.202087    .5541268
                2006 |  -1.543219   .9286263    -1.66   0.106    -3.434768    .3483313
                2007 |  -1.860672   .9506458    -1.96   0.059    -3.797074    .0757304
                2008 |  -1.130692   .9602998    -1.18   0.248    -3.086759    .8253748
                2009 |   .1310738   .9863769     0.13   0.895     -1.87811    2.140258
                2010 |  -.3822114   .9845546    -0.39   0.700    -2.387684    1.623261
                2011 |   -1.07942   1.007688    -1.07   0.292    -3.132013    .9731725
                2012 |  -1.391096   1.023083    -1.36   0.183    -3.475048    .6928561
                2013 |  -1.530977   1.035502    -1.48   0.149    -3.640225    .5782712
                     |
               _cons |   44.80689   54.43654     0.82   0.417     -66.0767    155.6905
    -----------------+----------------------------------------------------------------
             sigma_u |  4.7367037
             sigma_e |    1.38224
                 rho |  .92152662   (fraction of variance due to u_i)
    ----------------------------------------------------------------------------------

    Regression with logs:
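The log-log regression output is not shown in the post. As a hedged illustration of question 2 on simulated data (all variable names and the true elasticity of 0.5 are assumptions), the sketch below contrasts the usual elasticity reading of a log-log slope with a standard-deviation-scaled version of the same slope; the two answer different questions.

```stata
* Simulated data (hypothetical): positive x and y with a true elasticity of 0.5
clear
set seed 42
set obs 500
generate x = exp(rnormal(2, 0.6))
generate y = exp(1 + 0.5 * ln(x) + rnormal(0, 0.2))
generate ln_x = ln(x)
generate ln_y = ln(y)

regress ln_y ln_x
* Log-log slope = elasticity: a 1% increase in x goes with roughly a b% change in y
scalar elast = _b[ln_x]

* SD-standardized slope: SDs of ln_y per 1-SD change in ln_x (estimation sample only)
quietly summarize ln_x if e(sample)
scalar sd_x = r(sd)
quietly summarize ln_y if e(sample)
scalar sd_y = r(sd)
scalar beta_std = scalar(elast) * scalar(sd_x) / scalar(sd_y)

display "elasticity = " %6.3f scalar(elast)
display "standardized = " %6.3f scalar(beta_std)
```

`regress` also reports the standardized version directly with its `beta` option. The elasticity is about percentage changes in the original variables, whereas the standardized slope is about standard deviations of the logged variables, so a sentence of the form "one standard deviation contributes ___% of one standard deviation" corresponds to the second number, not the first.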



    Thank you
    Guest
    Last edited by sladmin; 02 May 2018, 08:02. Reason: anonymize poster

  • #2
    Does anyone know, by any chance?



    • #3
      It is simply not true that all your coefficients are positive. Look at femalepop_pertot. That one's negative. Whether the overall correlation between fitted and observed values will be positive or negative in this situation depends on the distributions of the different predictors. It could come out either way and there is no simple way to predict which way it will be.



      • #4
        Dear Clyde

        Sorry for the misunderstanding. I meant that the coefficient on immi is always positive, whether or not the regressions include control variables.
        This seems to contradict the simple scatter plot, which shows a negative correlation between govex_gdp and immi.

        Thank you
        Guest



        • #5
          OK, that's a different question from what I had understood. Nonetheless, it is still your expectation that is off; there is nothing wrong with the analyses, and nothing surprising in the results. It is, in fact, quite common for the relationship between variables X and Y to differ when examined in isolation (the correlation between X and Y) and when adjusted for covariates ("control variables"--I hate that term, because except in experiments you don't actually "control" anything--you just adjust for their effects). Indeed, it is not uncommon for the relationship even to change sign, as it has here. This is a very general phenomenon known as Simpson's paradox (also called the Yule-Simpson paradox). Wikipedia has an excellent page on Simpson's paradox and I recommend you read it. The explanation there relates only to discrete variables, but exactly the same phenomenon applies with continuous variables. In the context of continuous variables it is sometimes referred to as Lord's paradox.
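The sign reversal described above can be reproduced with continuous variables. The sketch below is a minimal simulation under assumed values (the covariate `z` and all coefficients are hypothetical): `z` raises `x` but lowers `y`, so the unadjusted correlation between `y` and `x` is negative even though the coefficient on `x` is positive once `z` is included.

```stata
* Simulated data (hypothetical): z drives both x and y
clear
set seed 2018
set obs 1000
generate z = rnormal()
generate x = z + rnormal(0, 0.5)
* Holding z fixed, y rises with x (true coefficient on x is +1)
generate y = x - 3 * z + rnormal(0, 0.5)

* Unadjusted: the simple correlation between y and x is negative
correlate y x
scalar rho_xy = r(rho)

* Adjusted for z: the coefficient on x is positive
regress y x z
scalar b_x = _b[x]

display "corr(y, x) = " %6.3f scalar(rho_xy)
display "coefficient on x, adjusting for z = " %6.3f scalar(b_x)
```

Under these assumed values the population correlation works out to roughly -0.74, while the adjusted coefficient on `x` is close to +1; the sample estimates will vary slightly with the seed.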

          So do read that Wikipedia page and take it to heart.

