top observations in a variable

Joseph Chav

Join Date: Nov 2022

Posts: 7
#1


08 Nov 2022, 17:11

I have a variable, let's say X, with 100+ categories, and I want to see the distribution of Y across these categories. However, I only want the top 10 (or the top 5 percent) in my graph. Is there any way to specify this in the command below?

graph hbar Y, over( X, sort(1) ) by(year)

Thank you
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#2

08 Nov 2022, 17:20

Code:

h egen rank
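As a small illustration of what egen's rank() does here, a sketch on made-up toy data (the variable names cat and n and their values are hypothetical): ranking the negated count gives rank 1 to the most frequent category, and the unique option breaks any ties arbitrarily so that the ranks are distinct.

```stata
* Toy data, invented for illustration: three categories and their counts
clear
input str1 cat n
"a" 30
"b" 50
"c" 10
end

* rank(-n) ranks in descending order of n, so the largest count gets rank 1
egen r = rank(-n), unique
list cat n r
```

Here b (50) gets rank 1, a (30) rank 2 and c (10) rank 3.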

Nick Cox

Join Date: Mar 2014
Posts: 36053

09 Nov 2022, 02:39

The top 10 categories will have more than 5% of the total!

Your question could be read in various ways; in the absence of a data example from you, here is one reading.

In this dataset there are 13 categories and we suppose that we want to show just the 5 most frequent.

Code:

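webuse nlswork, clear

* number of observations in each occupation code
bysort occ_code : gen freq = _N
* tag one observation per category, then rank the tagged ones (1 = most frequent)
egen tag = tag(occ_code)
egen rank = rank(-freq) if tag, unique
* tagged observations first, in rank order
sort rank occ_code
l rank freq occ_code if rank < .

* bars for the 5 most frequent categories only
graph hbar (asis) freq if rank <= 5, over(occ_code, sort(rank)) l1title(Occupation code) ytitle(Frequency) blabel(bar) ysc(alt r(0 12000))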

[Image: top5.png, bar chart of the 5 most frequent occupation codes]
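If "top 5 percent" is meant as a share of the categories rather than a fixed number, the cutoff can be computed from the number of categories. A minimal sketch reusing the nlswork setup above; rounding 5% up with ceil() is an assumption, not something from the thread. With only 13 categories here it keeps just one, whereas with 100+ categories it would keep 5 or more.

```stata
* Sketch: show the top 5 percent of categories rather than a fixed number
* (rounding up with ceil() is an assumption, not from the thread)
webuse nlswork, clear

bysort occ_code : gen freq = _N
egen tag = tag(occ_code)
egen rank = rank(-freq) if tag, unique

* number of distinct nonmissing categories
count if tag
local ncat = r(N)

* 5 percent of the categories, rounded up
local keep = ceil(0.05 * `ncat')
display "showing the top `keep' of `ncat' categories"

graph hbar (asis) freq if rank <= `keep', over(occ_code, sort(rank)) l1title(Occupation code) ytitle(Frequency) blabel(bar)
```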

And here is some more technique, showing the most frequent categories that together account for at most 80% of observations:

Code:

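* running sum of freq down the rank order (tagged observations sort first)
gen cu_freq = sum(freq) if rank < .
* r(max) is then the total number of observations with nonmissing occ_code
su cu_freq, meanonly

* share of observations in categories ranked below this one
gen cu_prob = 1 - cu_freq / r(max)

list rank cu_* occ_code if rank < .

* categories whose cumulative share is at most 80%
graph hbar (asis) freq if cu_prob >= 0.2 & cu_prob < ., over(occ_code, sort(rank)) l1title(Occupation code) ytitle(Frequency) blabel(bar) ysc(alt r(0 12000))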
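The original question also used by(year). A minimal sketch of one way to combine that with the ranking above, assuming the "top" categories are defined by frequency over all years and using ln_wage and occ_code from nlswork as stand-ins for Y and X (both are assumptions). The rank is copied from the tagged observation to every observation in its category, and the question's command is then restricted with if.

```stata
* Sketch: the question's command, restricted to the 5 categories most frequent overall
* (ln_wage stands in for Y and occ_code for X; both are assumptions)
webuse nlswork, clear

bysort occ_code : gen freq = _N
egen tag = tag(occ_code)
egen rank = rank(-freq) if tag, unique

* copy each category's rank to all of its observations;
* the tagged observation sorts first within occ_code because missing ranks sort last
bysort occ_code (rank) : replace rank = rank[1]

* default statistic is the mean of ln_wage within each category, one panel per year
graph hbar ln_wage if rank <= 5, over(occ_code, sort(1)) by(year)
```

With many years this gives many panels, so in practice one might restrict the years shown as well.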


Joseph Chav

Join Date: Nov 2022

Posts: 7
#4

09 Nov 2022, 09:29

Thanks, Jared! Thank you so much, Nick, for your detailed comment and the resources.
