I am trying to make a bar graph of the frequency of names within a population, but a great many names occur only once.
Is there a way to exclude all names that occur fewer than 5 times?
My current idea was to encode the string variable name1, then, by first name, generate a count of the number of instances, and then exclude all names with a frequency below 5. This, however, shifts the entire bar chart rightward by 5, so it is not ideal.
Thank you for your time!
Code:
encode name1, generate(firstname)
bys firstname: gen number=sum(1)
graph hbar (sum) one if number>=5, over(firstname) ytitle(frequency)
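
A possible variant, offered only as a sketch and not as a confirmed fix: sum(1) is a running sum, so within each name number takes the values 1, 2, 3, ... and the condition number>=5 drops the first four observations of every name instead of dropping the rare names. Storing the group total with _N may be closer to what is intended. The variable one (a constant equal to 1) is not defined in the code above, so creating it here is an assumption.

Code:
* Assumption: name1 is the string variable holding first names
encode name1, generate(firstname)
* _N under bysort is the total number of observations in each group
bysort firstname: gen number = _N
* Assumption: "one" is a constant used only for counting
gen one = 1
graph hbar (sum) one if number >= 5, over(firstname) ytitle("Frequency")

With this version every observation of a name carries that name's total count, so the if condition keeps or drops whole names.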
