  • How do I obtain and show percentiles of an income variable on a kdensity graph?

    Hello everyone!!

    I have the CPS 2015 data (downloaded from http://admin.nber.org/data/current-p...rvey-data.html;
    81,926 observations and 706 variables), and I want to show the income thresholds of the top
    percentiles (90, 95, 99, 99.5, 99.9 and 99.99%) on the kdensity graph.

    I didn't find the answer in the Graph Editor or in the kdensity help file, nor by searching the web.

    Can someone please help me understand what I am missing?


    I use Stata 14.
    The graph commands:

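    * sum of the family income components, divided by 0.95
    gen f_mar_income_out=(ffrval+fseval+fwsval+fretval+frntval)/0.95
    * drop rows whose value repeats the previous row's (depends on the current sort order)
    drop if f_mar_income_out==f_mar_income_out[_n-1]
    * express income in thousands of dollars
    gen f_mar_income_out_tsnts=f_mar_income_out/1000
    kdensity f_mar_income_out_tsnts, bwidth(100)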

    Summary statistics of the variables used above:
                  Variable |    Obs      Mean  Std. Dev.       Min       Max
    -----------------------+------------------------------------------------
                    ffrval | 81,926  281.0757   5550.639     -9999    750000
                    fseval | 81,926  2572.355   19685.59    -19998   1099999
                    fwsval | 81,926  54379.69   78971.12         0   2199998
                   fretval | 81,926  3139.306   13054.87         0    252000
                   frntval | 81,926  649.1127   5731.067    -19998    199998
    f_mar_income_out_tsnts | 81,926  64.23319   86.21664  -21.0505  2315.787
    Example of the income variable whose percentiles I need to obtain:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float f_mar_income_out_tsnts
    48.42105
    16.842106
    0
    23.157894
    0
    113.68421
    49.47368
    2.588421
    53.68421
    0
    9.473684
    9.368421
    31.578947
    162.10527
    121.05264
    0
    72.63158
    123.6863
    0
    21.05263
    22.10526
    48.42105
    0
    36.842106
    55.38
    17.894737
    0
    1.0526316
    33.68421
    94.73685
    140.00105
    78.94736
    15.68421
    24.210526
    29.473684
    42.10526
    0
    26.31579
    25.26316
    73.68421
    10.736842
    30.526316
    63.15789
    175.78947
    81.05264
    44.21263
    21.89474
    362.5263
    0
    38.42105
    16
    0
    23.157894
    0
    115.58
    101.05264
    63.15789
    32.105263
    52.63263
    32.105263
    87.87369
    134.73685
    154.7379
    110
    199.3158
    37.157894
    8.421053
    0
    63.15789
    0
    12.63158
    165.26315
    249.4737
    0
    18.947369
    0
    10.526316
    126.31579
    10.526316
    36.843155
    32.631577
    19.130526
    12.63158
    3.7894735
    144.21053
    42.10526
    63.15789
    38.94737
    15.789474
    87.10526
    16.842106
    31.578947
    0
    22.989475
    42.10526
    67.36842
    -2.1052632
    42.61895
    0
    40
    end
    Last edited by Yana Volter; 29 Aug 2016, 17:40.

  • #2
    What do you mean by showing percentiles on the kdensity graph?
    The first idea that comes to my mind is to draw xlines (i.e. vertical lines) corresponding to the percentiles you want.
    A short example:
    Code:
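    sysuse auto.dta, clear
    * the detailed summary stores the percentiles in r(p5), r(p50), r(p95), ...
    su weight, de
    local med=r(p50)
    local p5=r(p5)
    local p95=r(p95)
    
    * one vertical line per percentile, each with its own color and pattern
    kdensity weight, xline(`p5', lcolor(gray) lpattern(dash) ) ///
                     xline(`med', lcolor(cranberry) lpattern(dot)) ///
                     xline(`p95', lcolor(maroon) lpattern(dash))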
    But perhaps that's not what you want. In that case, please be more specific: how do you want these values to appear?
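If the threshold values themselves should be readable on the graph, one option is to put the x-axis ticks exactly at the percentiles, so that the tick labels are the thresholds. A minimal sketch, again using the auto data as a stand-in for the income variable; the display format %9.0f is an arbitrary choice (for income in thousands something like %9.1f may read better):

```stata
* Stand-in data: auto's weight plays the role of the income variable.
sysuse auto, clear
summarize weight, detail
local p5  = r(p5)
local med = r(p50)
local p95 = r(p95)

* Dashed vertical lines at the percentiles, and x-axis ticks placed at the
* same values so that the tick labels show the thresholds themselves.
kdensity weight, ///
    xline(`p5' `med' `p95', lcolor(gray) lpattern(dash)) ///
    xlabel(`p5' `med' `p95', format(%9.0f)) ///
    xtitle("Weight (lbs.); ticks at the 5th, 50th and 95th percentiles")
```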

    Best,
    Charlie



    • #3
      If I understand correctly what you want, you can do it with twoway kdensity, overlaying the percentile values with the xline() option after first computing them. Consider something like the following:
      Code:
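      * weighted percentile thresholds (income and weight_var are placeholders)
      _pctile income [aw = weight_var], percentiles(90 95 99 99.5)
      local p90   = r(r1)
      local p95   = r(r2)
      local p99   = r(r3)
      local p99_5 = r(r4)
      
      * weighted density with a vertical line at each threshold
      twoway (kdensity income [aw = weight_var]), ///
          xline(`p90' `p95' `p99' `p99_5')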
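The same approach extends to all six thresholds asked about in the original post. A sketch under stated assumptions: auto's price stands in for income, and wt is a made-up weight variable generated only for illustration. With the CPS file you would substitute your income variable and the appropriate CPS weight.

```stata
* Hypothetical setup: price stands in for income, wt is a made-up weight.
sysuse auto, clear
set seed 12345
generate wt = 0.5 + runiform()

* Weighted thresholds for all six requested percentiles;
* _pctile stores them in r(r1), r(r2), ... in the order requested.
_pctile price [aw = wt], percentiles(90 95 99 99.5 99.9 99.99)
local p90    = r(r1)
local p95    = r(r2)
local p99    = r(r3)
local p99_5  = r(r4)
local p99_9  = r(r5)
local p99_99 = r(r6)

* Weighted kernel density with a dashed vertical line at each threshold.
twoway (kdensity price [aw = wt]), ///
    xline(`p90' `p95' `p99' `p99_5' `p99_9' `p99_99', lpattern(dash)) ///
    ytitle("Density")
```

With only 74 observations the highest percentiles all land at or near the largest prices. With the CPS file they are better separated, although the 99.9th and 99.99th percentiles rest on very few observations.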
      You say that you are using US Current Population Survey data. Your calculations appear to ignore two things:
      (1) You appear not to be using the weights.
      (2) CPS income variables are top-coded (or used to be; there have been some changes in practice recently). The maximum values reported in your summarize output look suspiciously like top-codes to me. If this is the case, simply adding these variables together is not necessarily appropriate. (The income variable that is the sum of the top-coded components is itself top-coded, but in a complicated way: observations with top-coded total income are not necessarily the observations with the greatest total income.) How to deal with top-coded income variables in CPS data has been the subject of a number of recent papers: do a web search on "Burkhauser CPS topcode", and you will find articles in the Review of Income and Wealth, the Review of Economics and Statistics, the Journal of the Royal Statistical Society (Series A), etc.
      In addition,
      (3) you might find it more appropriate to examine the density of log(income) rather than of income itself. It depends on what you are trying to do, e.g. which parts of the distribution you want to focus on.
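For point (3), a sketch of the same idea on the log scale. It again assumes auto's price as a stand-in for income and leaves out weights for brevity. Note that ln() returns missing for zero and negative values, which the CPS income variable does contain, so those observations drop out of both the density and the percentiles.

```stata
* Hypothetical setup: price stands in for income (no weights here).
sysuse auto, clear
* ln() is missing for zero or negative values, so such observations are
* excluded from both the density and the percentiles below.
generate double ln_price = ln(price)

_pctile ln_price, percentiles(90 95 99)
local lp90 = r(r1)
local lp95 = r(r2)
local lp99 = r(r3)

kdensity ln_price, xline(`lp90' `lp95' `lp99', lpattern(dash)) ///
    xtitle("log(price)")

* Back-transform the log-scale thresholds to levels.
display "p90, p95, p99 in levels: " exp(`lp90') ", " exp(`lp95') ", " exp(`lp99')
```

Because ln() is increasing, exp() of each log-scale threshold gives essentially the corresponding threshold in levels among positive values. Small differences can arise when _pctile averages two neighbouring observations.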
