  • counting the number of categories of a given variable

    Hi,

    Is there a command I could use to count how many categories a given variable in my data set has, rather than having to count them manually?

    Thanks in advance

  • #2
    Are the categories indicated in another variable? If that variable were called "category", you could type

    Code:
    codebook category
    and it would report the number of unique values held by the variable "category", which tells you how many categories there are. There is also a "count unique" command, but I forget its name.
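
    As a minimal sketch of what that looks like, here is the same idea on the auto dataset that ships with Stata. The variable rep78 is only a stand-in for "category" (an assumption for illustration); the "unique values" line of the output is the count of categories, and missing values are reported separately. The compact option gives the same count in a one-line-per-variable table.

    Code:
    * rep78 stands in for the poster's variable (illustrative assumption)
    sysuse auto, clear
    * full report: see the "unique values" and "missing" entries
    codebook rep78
    * shorter table with a Unique column for each variable listed
    codebook rep78 foreign, compact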

    • #3
      Or you can install Nick Cox's -distinct- from the Stata Journal (-findit distinct.ado-, follow the first link, and then click on "click here to install"). Then -distinct my_variable_name- will do it for you.
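
      A short sketch of that route, assuming you are happy to install from SSC instead of the Stata Journal link (to my knowledge -distinct- is also available there) and using rep78 from the auto dataset as a stand-in variable:

      Code:
      * one-time install; assumes SSC access (alternative to the -findit- route)
      ssc install distinct
      sysuse auto, clear
      * rep78 is a stand-in for my_variable_name
      distinct rep78
      * the number of distinct values is left behind in r(ndistinct)
      display "Number of distinct values of rep78 is: " r(ndistinct)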

      And, you can also do it from first principles:

      Code:
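      * flag the first observation within each distinct value of my_variable
      by my_variable, sort: gen counter = 1 if _n == 1
      * running total of the flags; sum() treats missing as zero
      replace counter = sum(counter)
      display "Number of distinct values of my_variable is: " counter[_N]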
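
      A variation on the same first-principles idea, offered only as a sketch: -egen- has a tag() function that marks one observation per distinct value, so counting the tagged observations gives the number of categories. The variable rep78 from the auto dataset and the name tag are placeholders chosen for illustration. Note that the by: code above treats missing as a group of its own, whereas tag() leaves missing values out unless you add its missing option.

      Code:
      sysuse auto, clear
      * tag is 1 for exactly one observation per distinct nonmissing value of rep78
      egen byte tag = tag(rep78)
      * count leaves the number of tagged observations in r(N)
      count if tag
      display "Number of distinct values of rep78 is: " r(N)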

      • #4
        Following Clyde's pointer: http://www.stata-journal.com/sjpdf.h...iclenum=dm0042 contains a discussion of this territory.

        • #5
          Christina:
          If your categorical variable is -label-led, you can take advantage of it:
          Code:
          . sysuse auto.dta
          (1978 Automobile Data)
          . label list origin
          origin:
                     0 Domestic
                     1 Foreign
          Kind regards,
          Carlo
          (Stata 19.0)

          • #6
            As Carlo will know, the fact that value labels have been associated with a variable still leaves open the possibilities that

            * some labelled values do not occur in the data

            * values in the data have no associated labels.

            Often the easiest way to get at the number of distinct values present is to fire up tabulate and retrieve the number of rows as r(r). tabulate refuses to produce tables with very large numbers of rows, which is one of various reasons for distinct (SJ).
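
            For instance, a minimal sketch with the auto dataset (rep78 is just an example variable): run tabulate quietly and read r(r) afterwards. By default missing values are not counted as a row; adding the missing option counts them as a category of their own.

            Code:
            sysuse auto, clear
            * quietly suppresses the table; r(r) holds the number of rows
            quietly tabulate rep78
            display "Number of distinct values of rep78 is: " r(r)
            * same again, with missing treated as one more category
            quietly tabulate rep78, missing
            display "Counting missing as a category: " r(r)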

            • #7
              Nick is correct.
              My habit is to label categorical variables whenever feasible, including a -label define...- entry that assigns a label to 9999, the code I use for missing values.
              Sometimes, when I lose my bearings during long Stata sessions, I type -label list <whatever>- before -tabulate-, just to be sure that I have labelled all that I could; this approach sometimes eases my mind and sometimes puts me on the right path to sniff out the culprit.
              Obviously, this differs from calculating how many observations belong to each category: hence, my previous reply was not that helpful for the original poster.

              Kind regards,
              Carlo
              (Stata 19.0)

              • #8
                Thank you all very much. Both codebook and distinct work perfectly for what I want to do.