
  • Interpreting inequality decomposition results

    Hello,

    I am trying to apply inequality decomposition techniques to the US Survey of Consumer Finances, using the packages ineqfac (following Shorrocks 1982) and ineqrbd (following Fields 2003).

    I have four questions:

    1) What is the correct interpretation of binary variables in decomposition analysis? E.g.

    ineqrbd logY age educ Dhhsex blackhisp Dmarried kids Dse Dunemp wageinc bussefarminc intdivinc kginc ssretinc transfothinc debtpay [fw=fwt] if year==1989, noconstant noregr i2

    Regression-based decomposition of inequality in logY
    ---------------------------------------------------------------------------
    Decomp. | 100*s_f S_f 100*m_f/m I2_f I2_f/I2(total)
    ---------+-----------------------------------------------------------------
    residual | 10.1209 0.0004 1.9958 25.6742 5826.4009
    age | -11.4448 -0.0005 38.1775 0.0659 14.9631
    educ | 66.9315 0.0029 53.6437 0.0348 7.9077
    Dhhsex | 14.6494 0.0006 -2.0987 1.2746 289.2482
    blackhisp| -8.9524 -0.0004 1.3629 1.9308 438.1765
    Dmarried | -11.6565 -0.0005 1.9960 0.6976 158.3160
    kids | 9.1816 0.0004 4.3095 0.8291 188.1444
    Dse | 0.3671 0.0000 0.0635 4.0128 910.6409
    Dunemp | 8.3495 0.0004 -1.3938 1.2579 285.4603
    wageinc | 21.5828 0.0010 2.1922 1.2276 278.5796
    bussefarminc| -0.0430 -0.0000 -0.0022 563.7818 1.28e+05
    intdivinc| -1.2066 -0.0001 -0.0659 72.3670 1.64e+04
    kginc | 0.5308 0.0000 0.0133 582.2035 1.32e+05
    ssretinc | -0.6790 -0.0000 -0.4728 3.6536 829.1441
    transfothinc| 0.0037 0.0000 0.0025 132.0923 3.00e+04
    debtpay | 2.2650 0.0001 0.2764 3.8934 883.5458
    ---------+-----------------------------------------------------------------
    Total | 100.0000 0.0044 100.0000 0.0044 1.0000
    ---------------------------------------------------------------------------

    What would be the correct interpretation of the contribution of a binary variable such as household sex (Dhhsex)?


    2) Why are the results so sensitive to taking the log of the dependent variable? Is a log transformation preferred?

    ineqrbd income age educ Dhhsex blackhisp Dmarried kids Dse Dunemp wageinc bussefarminc intdivinc kginc ssretinc transfothinc debtpay [fw=fwt] if year==1989, noconstant noregr

    Regression-based decomposition of inequality in income
    ---------------------------------------------------------------------------
    Decomp. | 100*s_f S_f 100*m_f/m CV_f CV_f/CV(total)
    ---------+-----------------------------------------------------------------
    residual | 30.8970 1.2925 -0.8500 -273.3506 -65.3455
    age | 0.0014 0.0001 2.0031 0.3629 0.0867
    educ | -0.3620 -0.0151 -50.5724 -0.2640 -0.0631
    Dhhsex | -0.0192 -0.0008 0.5979 1.5984 0.3821
    blackhisp| 0.0140 0.0006 -0.4948 -1.9605 -0.4687
    Dmarried | -0.2489 -0.0104 9.9806 1.1798 0.2820
    kids | -0.0453 -0.0019 -8.4874 -1.2887 -0.3081
    Dse | 0.2869 0.0120 3.8114 2.8257 0.6755
    Dunemp | -0.0973 -0.0041 3.8557 1.5863 0.3792
    wageinc | 5.8093 0.2430 45.6681 1.5716 0.3757
    bussefarminc| 0.2985 0.0125 0.4484 33.6926 8.0543
    intdivinc| 7.6039 0.3181 7.9572 12.0565 2.8821
    kginc | 30.1608 1.2617 6.2378 34.1966 8.1748
    ssretinc | 0.2871 0.0120 9.5187 2.7098 0.6478
    transfothinc| 0.2246 0.0094 1.1503 16.2856 3.8931
    debtpay | 25.1892 1.0537 69.1754 2.7936 0.6678
    ---------+-----------------------------------------------------------------
    Total | 100.0000 4.1832 100.0000 4.1832 1.0000
    ---------------------------------------------------------------------------


    3) If squared terms are included, how should they be interpreted?

    ineqrbd logY age age2 educ Dhhsex blackhisp Dmarried kids kids2 Dse Dunemp wageinc bussefarminc intdivinc kginc ssretinc transfothinc debtpay [fw=fwt] if year==1989, noconstant noregr i2

    Regression-based decomposition of inequality in logY
    ---------------------------------------------------------------------------
    Decomp. | 100*s_f S_f 100*m_f/m I2_f I2_f/I2(total)
    ---------+-----------------------------------------------------------------
    residual | 21.6590 0.0010 0.8227 74.3799 1.69e+04
    age | -41.5669 -0.0018 138.6583 0.0659 14.9631
    age2 | 62.9746 0.0028 -67.4030 0.2414 54.7799
    educ | 31.1174 0.0014 24.9398 0.0348 7.9077
    Dhhsex | 6.9642 0.0003 -0.9977 1.2746 289.2482
    blackhisp| 0.6091 0.0000 -0.0927 1.9308 438.1765
    Dmarried | -1.2042 -0.0001 0.2062 0.6976 158.3160
    kids | 5.6279 0.0002 2.6416 0.8291 188.1444
    kids2 | -2.3810 -0.0001 -1.1313 1.9253 436.9110
    Dse | 0.5107 0.0000 0.0883 4.0128 910.6409
    Dunemp | -1.8055 -0.0001 0.3014 1.2579 285.4603
    wageinc | 14.8888 0.0007 1.5123 1.2276 278.5796
    bussefarminc| 0.1004 0.0000 0.0050 563.7818 1.28e+05
    intdivinc| 0.1786 0.0000 0.0098 72.3670 1.64e+04
    kginc | 0.6638 0.0000 0.0167 582.2035 1.32e+05
    ssretinc | 0.3625 0.0000 0.2524 3.6536 829.1441
    transfothinc| 0.0202 0.0000 0.0140 132.0923 3.00e+04
    debtpay | 1.2804 0.0001 0.1562 3.8934 883.5458
    ---------+-----------------------------------------------------------------
    Total | 100.0000 0.0044 100.0000 0.0044 1.0000
    ---------------------------------------------------------------------------

    Since the results change quite a bit, does it make sense to include squared terms in the decomposition?


    And finally:

    4) I have obtained drastically different results for factor decomposition using ineqrbd and ineqfac. As can be seen above, ineqrbd returns a larger contribution for wageinc (wage income) than for, e.g., business income (bussefarminc). But this result is reversed with ineqfac:

    ineqfac wageinc bussefarminc intdivinc kginc ssretinc transfothinc [fw=fwt] if year==1989, i2

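    Inequality decomposition by factor components
    ---------------------------------------------------------------------------
    Factor | 100*s_f S_f 100*m_f/m I2_f I2_f/I2(Total)
    ---------+-----------------------------------------------------------------
    wageinc | 6.8806 0.7312 65.3736 1.2350 0.1162
    bussefarminc| 67.5214 7.1760 11.0296 567.5948 53.4071
    intdivinc| 5.0297 0.5345 6.4171 72.6795 6.8387
    kginc | 18.8564 2.0040 5.4555 584.7025 55.0169
    ssretinc | 0.1547 0.0164 8.1420 3.6715 0.3455
    transfothinc| 1.5572 0.1655 3.5822 132.6103 12.4778
    ---------+-----------------------------------------------------------------
    Total | 100.0000 10.6277 100.0000 10.6277 1.0000
    ---------------------------------------------------------------------------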


    I tried to read up on that but it is still not clear to me why the results are so different. I noted that when using income instead of log of income in ineqrbd (see point 2) the results of ineqrbd and ineqfac are more consistent.


    Thank you for your help!

  • #2
    Hi, did you get any help with this? I have exactly the same queries. Many thanks!



    • #3
      Let me try to answer the questions in this post.

      Starting with questions 2 and 3: that you find different regression
      results should not come as a surprise. If you change the dependent
      variable or change a specification of a regression equation you can
      expect to get different results. In general, statisticians favour
      regressions with normally distributed residuals and error terms with
      constant variance (i.e., homoskedasticity). One could try to find a
      transformation of the dependent variable and functional forms for the
      independent variables to achieve that. But one shouldn't expect
      regression results to be insensitive to such transformations.
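
To make the sensitivity concrete, here is a minimal sketch in Python (standard library only). The data are hypothetical and chosen only to be right-skewed, and the helper names (mean, cov, share) are made up for this illustration; it is not ineqrbd itself. With a single regressor, the share b * cov(x,y) / var(y) coincides with the R-squared of the regression, so the remainder goes to the residual.

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Population covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def share(x, y):
    """s_f = b * cov(x, y) / var(y), with b from an OLS of y on a constant and x."""
    b = cov(x, y) / cov(x, x)
    return b * cov(x, y) / cov(y, y)

# Hypothetical data: years of education and a right-skewed income variable.
educ = [8, 10, 12, 12, 14, 16, 16, 18]
income = [9000, 14000, 20000, 26000, 30000, 52000, 75000, 400000]
log_income = [math.log(v) for v in income]

s_level = share(educ, income)    # dependent variable in levels
s_log = share(educ, log_income)  # same regressor, logged dependent variable

print(f"levels: educ {100 * s_level:.1f}%, residual {100 * (1 - s_level):.1f}%")
print(f"logs:   educ {100 * s_log:.1f}%, residual {100 * (1 - s_log):.1f}%")
```

In this toy sample the single very high income dominates var(y) in levels, so the share attributed to educ is clearly smaller there than in logs. Nothing about the regressor changed; only the transformation of the dependent variable did.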

      As for log transformation of income, that is almost universally done
      in income regressions. Likewise, adding a squared independent variable
      to the regression is intended to take possible nonlinearities into
      account, and is also very common practice.
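
A small sketch of what a squared term does to the shares, again in plain Python with hypothetical data (an exactly quadratic age profile with no noise, and illustrative helper names). Because the decomposition is additive, the shares of age and age2 can be summed to give the contribution of the age profile as a whole; the two individual shares can be large and of opposite sign.

```python
def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Population covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def ols2(x1, x2, y):
    """Slopes from an OLS of y on a constant, x1 and x2 (2x2 normal equations)."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Hypothetical hump-shaped profile peaking at age 50; no residual by construction.
age = list(range(25, 70, 5))
age2 = [a * a for a in age]
logy = [0.08 * a - 0.0008 * a * a for a in age]
var_y = cov(logy, logy)

# Specification 1: age only.
b_lin = cov(age, logy) / cov(age, age)
s_lin = b_lin * cov(age, logy) / var_y

# Specification 2: age and age squared.
b1, b2 = ols2(age, age2, logy)
s_age = b1 * cov(age, logy) / var_y
s_age2 = b2 * cov(age2, logy) / var_y

print(f"age only:   age {100 * s_lin:.1f}%, residual {100 * (1 - s_lin):.1f}%")
print(f"with age2:  age {100 * s_age:.1f}%, age2 {100 * s_age2:.1f}%, "
      f"joint {100 * (s_age + s_age2):.1f}%")
```

In this constructed case the linear specification leaves a good part of the inequality in the residual, while the quadratic one attributes all of it to the age profile, split into one share above 100% and one negative share. Reading the two shares separately is not very informative; their sum is.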

      My answer to question 1:

      The contribution to inequality of 'y' (say, log income), attributed to 'x', as computed by ineqrbd (not using
      the Fields option) equals

      s_f = b * cov(x,y) / var(y),

      where b is the regression coefficient for x from the underlying
      regression. Now let x be a dummy variable, defined as being a female
      respondent (x=1) or not (x=0). Then the covariance above amounts to

      cov(x,y) = share(female respondents) * ([average y of female respondents] - [average y of all respondents]).
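
For completeness, the covariance expression for a dummy variable can be derived in a few steps. The symbols are introduced here for the derivation only: p is the share of respondents with x = 1, \bar{y} is the overall mean of y, and \bar{y}_1 and \bar{y}_0 are the means of y among respondents with x = 1 and x = 0.

```latex
\begin{aligned}
\operatorname{cov}(x, y)
  &= E[xy] - E[x]\,E[y] \\
  &= p\,\bar{y}_1 - p\,\bar{y} \\
  &= p\,(\bar{y}_1 - \bar{y}) \\
  &= p\,(1 - p)\,(\bar{y}_1 - \bar{y}_0),
\end{aligned}
\qquad \text{using } \bar{y} = p\,\bar{y}_1 + (1 - p)\,\bar{y}_0 .
```

The last line shows that the sign of cov(x,y) is simply the sign of the raw gap in average y between the two groups, which need not match the sign of the regression coefficient b once other variables are controlled for.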

      Take, for example, the case where b < 0 (so, other things equal, being
      female tends to go with lower y). Then if also average y of females is
      lower than the overall average, both b and cov(x,y) are negative so
      that s_f is positive: a positive share of y-inequality can be
      attributed to y-differences between females and non-females.

      On the other hand, let average female y be higher than overall average y, while
      still b < 0. Then s_f is negative and a negative share of inequality
      can be attributed to 'femality'. This makes sense. Consider adding a
      respondent to the survey with average characteristics for the
      variables other than x. Then if this respondent is female, including
      her reduces (since b < 0) average y among female respondents more than
      it reduces the overall average of y. The result is a reduction of
      y-inequality.
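
Both sign cases can be reproduced with a small sketch in Python (standard library only). Everything here is hypothetical: eight made-up respondents, a second regressor educ standing in for "the other variables", and y generated without noise as 1 + 0.1*educ - 0.2*female, so that b for the dummy is -0.2 in both scenarios. The helper names are illustrative and this is not the ineqrbd code.

```python
def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Population covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def ols2(x1, x2, y):
    """Slopes from an OLS of y on a constant, x1 and x2 (2x2 normal equations)."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def decompose(female, educ):
    """Return (b_female, s_female, s_educ) for y = 1 + 0.1*educ - 0.2*female."""
    y = [1 + 0.1 * e - 0.2 * f for e, f in zip(educ, female)]
    b_f, b_e = ols2(female, educ, y)
    var_y = cov(y, y)
    # Check the dummy-variable identity cov(x, y) = p * (mean_female - mean_all).
    p = mean(female)
    mean_f = sum(v for v, f in zip(y, female) if f == 1) / sum(female)
    assert abs(cov(female, y) - p * (mean_f - mean(y))) < 1e-12
    return b_f, b_f * cov(female, y) / var_y, b_e * cov(educ, y) / var_y

female = [1, 1, 1, 1, 0, 0, 0, 0]
educ_a = [10, 12, 12, 14, 12, 14, 16, 16]  # women less educated: lower average y
educ_b = [14, 16, 16, 18, 10, 12, 12, 14]  # women more educated: higher average y

b_a, sf_a, se_a = decompose(female, educ_a)
b_b, sf_b, se_b = decompose(female, educ_b)

print(f"A: b = {b_a:.2f}, female share {100 * sf_a:.1f}%, educ share {100 * se_a:.1f}%")
print(f"B: b = {b_b:.2f}, female share {100 * sf_b:.1f}%, educ share {100 * se_b:.1f}%")
```

In scenario A the coefficient and the raw gap point the same way and the dummy gets a positive share; in scenario B the raw gap has the opposite sign to b, the dummy gets a negative share, and educ gets more than 100% so that the shares still sum to 100.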

      So to the posted question, or the more general question 'What is the
      correct interpretation of categorical variables in decomposition
      analysis?' I would answer: 'the degree to which average income
      differences across the different categories contribute to or detract
      from overall inequality, over and above the contributions of the other
      factors that are considered'.
