How do I obtain and show percentiles of an income variable on kdensity graph?

Yana Volter

Join Date: Oct 2015
Posts: 2

29 Aug 2016, 17:33

Hello everyone!!

I have the CPS 2015 data (81,926 observations and 706 variables), which I got here: http://admin.nber.org/data/current-p...rvey-data.html
I want to show the income thresholds of the top percentiles
(90, 95, 99, 99.5, 99.9 and 99.99%) on the kdensity graph.

I didn't find the answer in the Graph Editor or the kdensity help file, nor by searching the web.

Can someone please help me understand what I am missing?

**I use Stata 14.
*the graph commands:

gen f_mar_income_out=(ffrval+fseval+fwsval+fretval+frntval)/0.95
drop if f_mar_income_out==f_mar_income_out[_n-1]
gen f_mar_income_out_tsnts=f_mar_income_out/1000
kdensity f_mar_income_out_tsnts, bwidth(100)

*sum of all the variables used above:

Variable                       Obs       Mean   Std. Dev.        Min        Max

ffrval                      81,926   281.0757    5550.639      -9999     750000
fseval                      81,926   2572.355    19685.59     -19998    1099999
fwsval                      81,926   54379.69    78971.12          0    2199998
fretval                     81,926   3139.306    13054.87          0     252000
frntval                     81,926   649.1127    5731.067     -19998     199998
f_mar_income_out_tsnts      81,926   64.23319    86.21664   -21.0505   2315.787

*example of the income variable whose percentiles I need to obtain:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float f_mar_income_out_tsnts
48.42105
16.842106
0
23.157894
0
113.68421
49.47368
2.588421
53.68421
0
9.473684
9.368421
31.578947
162.10527
121.05264
0
72.63158
123.6863
0
21.05263
22.10526
48.42105
0
36.842106
55.38
17.894737
0
1.0526316
33.68421
94.73685
140.00105
78.94736
15.68421
24.210526
29.473684
42.10526
0
26.31579
25.26316
73.68421
10.736842
30.526316
63.15789
175.78947
81.05264
44.21263
21.89474
362.5263
0
38.42105
16
0
23.157894
0
115.58
101.05264
63.15789
32.105263
52.63263
32.105263
87.87369
134.73685
154.7379
110
199.3158
37.157894
8.421053
0
63.15789
0
12.63158
165.26315
249.4737
0
18.947369
0
10.526316
126.31579
10.526316
36.843155
32.631577
19.130526
12.63158
3.7894735
144.21053
42.10526
63.15789
38.94737
15.789474
87.10526
16.842106
31.578947
0
22.989475
42.10526
67.36842
-2.1052632
42.61895
0
40
end

Last edited by Yana Volter; 29 Aug 2016, 17:40.

Charlie Joyez

Join Date: Dec 2014

Posts: 421
#2

30 Aug 2016, 01:49

What do you mean by showing percentiles on the kdensity graph?
The first idea that comes to mind is to draw xlines (i.e. vertical lines) at the percentiles you want.
A short example:

Code:

sysuse auto.dta, clear
su weight, de
local med = r(p50)
local p5 = r(p5)
local p95 = r(p95)
kdensity weight, xline(`p5', lcolor(gray) lpattern(dash)) ///
    xline(`med', lcolor(cranberry) lpattern(dot)) ///
    xline(`p95', lcolor(maroon) lpattern(dash))

But perhaps that's not what you want. In that case, please be more specific: how do you want these values to appear?

Best,
Charlie
Stephen Jenkins

Join Date: Apr 2014

Posts: 1435
#3

30 Aug 2016, 02:02

You can do what (I understand) you want to do by using twoway kdensity and overlaying the percentile values using the xline() option having first created those values. Consider something like the following:

Code:

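* weighted percentiles are returned in r(r1), r(r2), ...
_pctile income [w = weight_var], percentiles(90 95 99 99.5)
local p90 = r(r1)
local p95 = r(r2)
local p99 = r(r3)
local p99_5 = r(r4)
* xline() is a twoway option, so it goes outside the plot's parentheses
twoway ///
    (kdensity income [aw = weight_var]), ///
    xline(`p90' `p95' `p99' `p99_5')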
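
The same approach could be extended to the variable and the six percentiles from the original post. What follows is only a minimal sketch under stated assumptions: it is unweighted, because the thread does not name the weight variable, and the local macro name cuts is made up for illustration. With the 100-observation example data the top percentiles will all be at or near the sample maximum, so the lines only separate clearly in the full data.

```stata
* Sketch only: assumes f_mar_income_out_tsnts is already in memory.
* Unweighted, because the weight variable's name is not given in the thread.
_pctile f_mar_income_out_tsnts, percentiles(90 95 99 99.5 99.9 99.99)

* Collect r(r1) ... r(r6) into one local macro (hypothetical name: cuts)
local cuts
forvalues i = 1/6 {
    local cuts `cuts' `=r(r`i')'
}

* Draw the density and one dashed vertical line per percentile
twoway (kdensity f_mar_income_out_tsnts, bwidth(100)), ///
    xline(`cuts', lpattern(dash) lcolor(gs8))
```

Adding a weight, e.g. [aw = weight_var], to both _pctile and kdensity would keep the lines consistent with a weighted density.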

You say that you are using US Current Population Survey data. Your calculations appear to ignore 2 things.
(1) You appear not to be using the weights.
(2) CPS income variables are top-coded (or used to be -- there have been some changes in practice recently). Those maximum values reported in your summarize output look suspiciously like top-codes to me. If this is the case, simply adding together these variables is not necessarily appropriate. (The income variable that is the sum of the top-coded components is itself top-coded, but in a complicated way. Observations with top-coded total income are not necessarily the observations with the greatest total income.) How to deal with top-coded income variables in CPS data has been the subject of a number of recent papers: do a web search on "Burkhauser CPS topcode", and there are articles in the Review of Income and Wealth, Review of Economics and Statistics, Journal of the Royal Statistical Society (Series A), etc.
In addition,
(3) you might find it more appropriate to examine the density of log(income) rather than income itself. It depends on what you are trying to do, e.g. which parts of the distribution you want to focus on.
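
Point (3) might be sketched roughly as below. This is an illustration under assumptions, not a recommended specification: the variable name ln_inc is hypothetical, the calculation is unweighted, and observations with zero or negative income are left out of the density because they have no logarithm. The percentiles are computed on the original scale over all observations and then logged, which works as long as the chosen percentiles are positive.

```stata
* Sketch only: assumes f_mar_income_out_tsnts is already in memory.
* ln_inc is a hypothetical name; non-positive incomes become missing.
gen double ln_inc = ln(f_mar_income_out_tsnts) if f_mar_income_out_tsnts > 0

* Percentiles on the original scale, then moved to the log scale
_pctile f_mar_income_out_tsnts, percentiles(90 95 99)
local l90 = ln(r(r1))
local l95 = ln(r(r2))
local l99 = ln(r(r3))

* Density of log income with the three percentile lines
twoway (kdensity ln_inc), ///
    xline(`l90' `l95' `l99', lpattern(dash))
```

On the log scale the upper percentiles sit closer together, so the lines show more of the lower and middle parts of the distribution alongside the top.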