  • Removing entries below a certain frequency in an hbar

    I am trying to make a bar graph of the frequency of names within a population, but there are many many names for which there is only one instance.

    Is there a way to exclude all entries that have an occurrence below 5?

    My current idea was to encode the string variable name1, then, by first name, generate the number of instances, and exclude all names with a frequency below 5. This, however, shifts the entire bar chart rightward by 5, so it is not ideal.

    Code:
    encode name1, generate(firstname)
    bys firstname: gen number=sum(1)
    graph hbar (sum) one if number>=5, over(firstname) ytitle(frequency)
    Thank you for your time!

  • #2
    Since you did not include a data example (for future reference: you can do this using dataex), I made up something using another dataset.

    The code below keeps your dataset in its current form. If you don't need to preserve it, you can achieve the same result with the -contract- command, which replaces the data in memory with a dataset of frequencies.

    Code:
    sysuse nlsw88, clear
    tab industry
    bys industry: egen freq = count(idcode)
    graph hbar (count) idcode if freq>=20, over(industry, sort(freq)) ytitle(frequency)
    This drops three industries which have a frequency lower than 20 (the -tab- command above is only to show you the original frequencies of all industries):
    Code:
                   industry |      Freq.     Percent        Cum.
    ------------------------+-----------------------------------
      Ag/Forestry/Fisheries |         17        0.76        0.76
                     Mining |          4        0.18        0.94
               Construction |         29        1.30        2.24
              Manufacturing |        367       16.44       18.68
     Transport/Comm/Utility |         90        4.03       22.72
     Wholesale/Retail trade |        333       14.92       37.63
    Finance/Ins/Real estate |        192        8.60       46.24
        Business/Repair svc |         86        3.85       50.09
          Personal services |         97        4.35       54.44
      Entertainment/Rec svc |         17        0.76       55.20
      Professional services |        824       36.92       92.11
      Public administration |        176        7.89      100.00
    ------------------------+-----------------------------------
                      Total |      2,232      100.00
    The code produces this graph:
    [Image: Screenshot 2022-09-10 at 11.51.04 PM.png]

    Last edited by Hemanshu Kumar; 10 Sep 2022, 12:25.
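
    For reference, the -contract- route mentioned above could look roughly like the sketch below. It reuses the nlsw88 example and the cutoff of 20 from this post; the variable name freq and the use of preserve/restore are my own choices for illustration, not something the original poster's data requires.

    ```stata
    * Sketch of the -contract- alternative (assumes the nlsw88 example data)
    sysuse nlsw88, clear

    * -contract- replaces the data in memory, so keep a copy to return to
    preserve

    * drop observations with missing industry so they do not form their own group
    drop if missing(industry)

    * collapse to one row per industry; the count is stored in freq
    contract industry, freq(freq)

    * keep only industries with at least 20 observations
    keep if freq >= 20

    * each remaining row is one bar; (asis) plots the stored frequency as is
    graph hbar (asis) freq, over(industry, sort(freq)) ytitle(frequency)

    * bring back the original dataset
    restore
    ```

    To adapt this to the data in #1, you would presumably replace industry with name1 and 20 with 5. As for why the original attempt misbehaved: sum() in -generate- is a running sum, so within each name the variable number runs 1, 2, 3, ... rather than holding the group total. The condition number>=5 then most likely discards the first four observations of every name instead of discarding the rare names. Using _N (or egen's count(), as above) gives the group total on every row.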
