  • How can I display the top five observations (by frequency) in a table?

    Hello,
    I am trying to look at the top five recorded diagnoses (numerically coded) in a very large dataset. But given that the dataset is so large, when I type this command
    tab diagnosiscode, sort
    I get a very long table and I cannot scroll to the top of the table to see the most frequent diagnoses.

    I tried typing
    tab diagnosiscode in 1/5, sort
    but I get an error message

    varlist required
    Could you help me, please? How can I view the five most frequent diagnoses in a very large dataset?

  • #2
    See groups from the Stata Journal.

    A 2017 post here about groups being updated on SSC has an example that does exactly that.
    Last edited by Nick Cox; 19 Mar 2022, 15:54.


    • #3
      On your other problem
      I get a very long table and I cannot scroll to the top of the table
      see the output of
      Code:
      help scrollbufsize
      for advice on how to increase how far back you can scroll in Stata's Results window. There are better ways to accomplish what you want in this case, as Nick discussed, but being able to scroll back farther can also be useful for solving other problems.


      • #4
        Many thanks and appreciation Nick Cox

        I checked the groups command and it worked perfectly well.

        This is the syntax for my request:

        groups diagnosiscode , select(5) order(h)
        For those interested in reading the article on groups, here is the link:
        https://journals.sagepub.com/doi/pdf...867X1701700314

        Thank you again


        • #5
          Hi William Lisowski

          Thank you for sharing this information. I tried to increase how far I can scroll back in Stata's results window using this command:

          set scrollbufsize #2000000
          but I got an error message:

          # found where number expected
          r(198);
          Can you help me, please?


          • #6
            Remove the number sign.
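
            A minimal sketch of the corrected command, using the value tried in #5 without the number sign. As far as I know, the new buffer size is applied only the next time Stata is launched; treat that detail as an assumption and check help scrollbufsize.

            ```stata
            * The # in the syntax diagram is a placeholder: type a number in its place
            set scrollbufsize 2000000
            ```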


            • #7
              Danah Abdul -

              The answer from Leonardo Guizzetti is right on target.

              The Stata documentation standard is to use italics to highlight something (a parameter, a variable name, a number) that you are meant to replace rather than copy. It is most unfortunate, then, that the documentation for scrollbufsize tells us
              Code:
                      set scrollbufsize #
              
                      10000 <= # <= 2000000
              where the difference between their italicized number sign and a standard number sign is pretty much impossible to see unless the two of them are side by side:
              Code:
              normal     #
              in italics #
              The next time I give this advice I'll try to remember to give an example.

              Added in edit: And it's depressing to realize, having written the above, that anyone less than a third of my age probably knows the character "#" that I call the "number sign" as the "hash" character.
              Last edited by William Lisowski; 19 Mar 2022, 18:30.


              • #8
                Originally posted by William Lisowski:
                Added in edit: And it's depressing to realize, having written the above, that anyone less than a third of my age probably knows the character "#" that I call the "number sign" as the "hash" character.
                I used to call this the pound sign, but then I'd just get blank looks from people. It's funny how many names this symbol has, and that they correlate reasonably well with one's age (or at least experience with technology).


                • #9
                  Google "Jargon File ASCII" for a collection of names for symbols. That source is a little old. I too call # hash, and that seems common too. Full disclosure: I am a bit younger than William.


                  • #10
                    http://catb.org/jargon/html/A/ASCII.html is the link implied in my previous post.


                    • #11
                      Hi all,

                      I am relatively new to Stata and am using Stata MP 18.0 for Mac.

                      Has the groups command been discontinued? I am trying to look at the top ten recorded diagnoses (coded as string) in a large dataset.

                      I tried to use the groups command with the following code to see the top 10 diagnoses:
                      groups diagnosis1 , select(10) order(h)
                      I received the following error message:
                      command groups is unrecognized
                      r(199);
                      When I tried to search for the command in the manual using the following code, I received the same error message as above.
                      help groups
                      Alternatively, is there any workaround, e.g. using the table command, to view the top ten diagnoses in descending order?

                      Any help would be appreciated, thank you.


                      • #12
                        As mentioned briefly in #2, groups is a community-contributed command written by me and (although of older vintage) published recently through the Stata Journal. It must be installed before it can be used.

                        Further, any minor difficulty in installing it can also be blamed on me, in this sense. Somewhere StataCorp claims rights to all the words in the English language as possible command names; well, not quite that, but there is official (= company) advice against using ordinary English words as command names. In choosing groups as a command name, I deliberately ignored that advice, as I was unwilling to invent some weird Klingon-sounding name for the command. But groups is unsurprisingly a commonly used word for other reasons, and so any search is likely to yield many false positives.

                        The solution is easy, but it depends on one otherwise unpredictable detail. If you missed, or have forgotten, the lesson in Hogwarts, the spell needed is st0496:

                        Code:
                        . search st0496, entry
                        
                        Search of official help files, FAQs, Examples, and Stata Journals
                        
                        SJ-18-1 st0496_1  . . . . . . . . . . . . . . . . . Software update for groups
                                (help groups if installed)  . . . . . . . . . . . . . . . .  N. J. Cox
                                Q1/18   SJ 18(1):291
                                groups exited with an error message if weights were specified;
                                this has been corrected
                        
                        SJ-17-3 st0496  . . . . .  Speaking Stata: Tables as lists: The groups command
                                (help groups if installed)  . . . . . . . . . . . . . . . .  N. J. Cox
                                Q3/17   SJ 17(3):760--773
                                presents command for listing group frequencies and percents and
                                cumulations thereof; for various subsetting and ordering by
                                frequencies, percents, and so on; for reordering of columns;
                                and for saving tabulated data to new datasets
                        This shows that the 2017 paper is a longer account, and that is the paper linked in #4 of this thread, but the files should be downloaded using the 2018 link that this search will reveal.
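
                        A minimal sketch of the installation step, assuming Stata has internet access. The first route follows the Stata Journal entry found by the search; the second uses SSC, which #2 mentions as another home for groups. Whether the SSC copy matches the 2018 Stata Journal files is an assumption worth checking.

                        ```stata
                        * Route 1: run the search, then click st0496_1 in the results
                        * and follow the install link shown there
                        search st0496, entry

                        * Route 2 (assumption: the SSC copy is current enough for your needs)
                        ssc install groups

                        * Either way, confirm that Stata can now find the command
                        which groups
                        ```

                        Once which groups reports a file location, help groups and the command itself should no longer return r(199).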

                        See also https://journals.sagepub.com/doi/pdf...6867X221106436 for some technique to do with "the largest five" where "five" is just an example, and not the principle.
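
                        On the workaround asked about in #11: a minimal sketch using only official commands (contract, gsort, list) rather than groups. The toy data and the variable name n are made up for illustration; with real data, skip the input step and substitute your own variable. A string variable such as diagnosis1 should work the same way.

                        ```stata
                        * Toy data standing in for the real dataset (values are made up)
                        clear
                        input diagnosiscode
                        101
                        101
                        101
                        205
                        205
                        310
                        end

                        * Work on a temporary copy so the original data come back afterwards
                        preserve
                        * One observation per distinct code, with its frequency in n
                        contract diagnosiscode, freq(n)
                        * Most frequent first; ties broken by code
                        gsort -n diagnosiscode
                        * Top five, or fewer if there are fewer than five distinct codes
                        list in 1/`=min(5, _N)', noobs
                        restore
                        ```

                        After contract, each observation is one distinct code, which is why in 1/5 selects table rows here. In tab diagnosiscode in 1/5, by contrast, the in qualifier restricts tabulate to the first five observations of the data, not to the first five rows of the table.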
