
  • top observations in a variable

    I have a variable, say X, with more than 100 categories, and I want to see the distribution of Y across these categories. However, I only want the top 10, or the top 5 percent, in my graph. Is there any way to specify this in the command below?

    graph hbar Y, over( X, sort(1) ) by(year)

    Thank you

  • #2
    See the rank() function documented under egen:
    Code:
    help egen



    • #3
      The top 10 categories will have more than 5% of the total!

      Your question could be read in various ways; in the absence of a data example from you, here is one reading.

      In this dataset there are 13 categories, and we suppose that we want to show just the 5 most frequent.


      Code:
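      webuse nlswork, clear
      
      // number of observations in each occupation category
      bysort occ_code : gen freq = _N
      // tag one observation per category (missing occ_code is not tagged by default)
      egen tag = tag(occ_code)
      // rank categories by frequency, 1 = most frequent; unique gives tied categories distinct ranks
      egen rank = rank(-freq) if tag, unique 
      sort rank occ_code
      l rank freq occ_code if rank < .
      
      // graph only the 5 most frequent categories
      graph hbar (asis) freq if rank <= 5, over(occ_code, sort(rank)) l1title(Occupation code) ytitle(Frequency) blabel(bar) ysc(alt r(0 12000))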
      [Graph: top5.png]
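The code above ranks categories by frequency, whereas `graph hbar Y` with no statistic shows the mean of Y. A minimal sketch of the same egen tag() and rank() idea applied to the mean of Y instead, assuming "top" means highest mean: here ln_wage stands in for Y and occ_code for X, and the variable names mean_y, tag, and rank are illustrative only.

```stata
webuse nlswork, clear

// hypothetical setup: ln_wage plays the role of Y, occ_code the role of X
// mean of Y within each category of X (graph hbar's default statistic)
bysort occ_code : egen mean_y = mean(ln_wage)
// tag one observation per category; missing occ_code is not tagged
egen tag = tag(occ_code)
// rank categories by mean of Y, 1 = highest mean
egen rank = rank(-mean_y) if tag, unique
// copy each category's rank to all of its observations (missing ranks sort last)
bysort occ_code (rank) : replace rank = rank[1]

// graph Y only for the 5 categories with the highest overall mean
graph hbar ln_wage if rank <= 5, over(occ_code, sort(1) descending) by(year)
```

Because the ranking here pools all years, the same five categories are candidates in every year panel; ranking separately within each year would need a different setup.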

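      And here is some more technique: accumulate the frequencies in rank order and keep the categories that together account for at most 80% of the total.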



      Code:
      gen cu_freq = sum(freq) if rank < . 
      su cu_freq, meanonly
      
      gen cu_prob = 1 - cu_freq / r(max)
      
      list rank cu_* occ_code if rank < . 
      
      
      graph hbar (asis) freq if cu_prob >= 0.2 & cu_prob < ., over(occ_code, sort(rank)) l1title(Occupation code) ytitle(Frequency) blabel(bar) ysc(alt r(0 12000))
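If "top 5 percent" is read as 5 percent of the categories rather than of the observations, one hedged sketch is to count the categories and turn the percentage into a rank cut-off. The local macros pct, ncat, and keep are hypothetical names, and rounding up with ceil() while keeping at least one category are choices made here, not something stated in the thread.

```stata
webuse nlswork, clear

// frequency of each category, one tagged observation per category, rank 1 = most frequent
bysort occ_code : gen freq = _N
egen tag = tag(occ_code)
egen rank = rank(-freq) if tag, unique

// number of distinct non-missing categories
quietly count if tag
local ncat = r(N)

// hypothetical cut-off: top 5 percent of categories, rounded up, at least 1
// (multiply before dividing so exact percentages are not disturbed by rounding)
local pct = 5
local keep = max(1, ceil(`pct' * `ncat' / 100))
display "showing `keep' of `ncat' categories"

graph hbar (asis) freq if rank <= `keep', over(occ_code, sort(rank)) ytitle(Frequency) blabel(bar)
```

With only 13 categories, as in nlswork, 5 percent rounds up to a single bar; the cut-off becomes more useful with 100+ categories, as in the original question.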
      See also

      https://journals.sagepub.com/doi/pdf...6867X221106436

      https://www.stata-journal.com/articl...article=st0496



      • #4

        Thanks, Jared! Thank you so much, Nick, for your detailed comment and the resources.
