  • count from levelsof

    Dear Stata users,

    I have a dataset, part of which is shown below:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int redovar str4 astkom2
    2000 "1402"
    1996 "1480"
    1996 "1480"
    1997 "0191"
    1998 "1283"
    1996 "0882"
    1997 "0191"
    1997 "0680"
    1996 ""
    1997 "1880"
    1996 "1280"
    1996 ""
    1997 "0191"
    1999 "1490"
    1997 ""
    1996 "1480"
    end


    I want to calculate the number of unique values of the variable "astkom2" for each year ("redovar" in this case) and then store that count, by year, in a new variable. The levelsof command shows the unique values, but it would be helpful if someone could tell me how to calculate that count and store it as a new variable. As you can see, some values repeat. This is because the data are individual-level data (the individual id is omitted from the excerpt above for simplicity).

    Thanks in advance!

    Zariab Hossain
    Uppsala University

  • #2
    There are official commands that do this (e.g., -codebook-, -inspect-), and there are also user-written commands available (e.g., -distinct-; use -search- to find and install it).

    Added: note that -codebook- does not save the number of distinct values, but the other two do.
    Last edited by Rich Goldstein; 08 Aug 2023, 05:57.
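    A saved count can be pushed into a new variable by looping over years. The sketch below uses official -levelsof-, which the question already mentions and which leaves the number of distinct values it found in r(r); by default it skips missing (empty) values. The variable names ndistinct and onerow are illustrative only, and the data are the excerpt from the question.

```stata
* Example data from the question
clear
input int redovar str4 astkom2
2000 "1402"
1996 "1480"
1996 "1480"
1997 "0191"
1998 "1283"
1996 "0882"
1997 "0191"
1997 "0680"
1996 ""
1997 "1880"
1996 "1280"
1996 ""
1997 "0191"
1999 "1490"
1997 ""
1996 "1480"
end

* levelsof leaves the number of distinct values it found in r(r);
* by default it skips missing (empty) values of astkom2
generate ndistinct = .
quietly levelsof redovar, local(years)
foreach y of local years {
    quietly levelsof astkom2 if redovar == `y'
    quietly replace ndistinct = r(r) if redovar == `y'
}

* show one observation per year
egen onerow = tag(redovar)
list redovar ndistinct if onerow, noobs
```

    A loop like this is easy to read but runs one pass over the data per year; the tag-and-count method in #3 does the whole job in two commands.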



    • #3
      Following @Rich Goldstein's hint, note the 2008 survey at https://journals.sagepub.com/doi/pdf...867X0800800408

      Although the distinct command does what it was intended to do, for your purposes focus on p.563 of the paper, which gives a direct method using a technique worth knowing and requires no software download.

      It is a two-step method: tag each distinct occurrence just once, then add up the tags as desired. The sum of 1s and 0s is precisely the number of 1s, which is also called counting.
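      The same tag-then-count idea can also be sketched without -egen-, using by: and the observation number _n: within each year-value group only the first observation gets a 1, and a running sum within each year then counts those 1s. This is a variant written for illustration; the names first and ndistinct are hypothetical.

```stata
* Example data from the question
clear
input int redovar str4 astkom2
2000 "1402"
1996 "1480"
1996 "1480"
1997 "0191"
1998 "1283"
1996 "0882"
1997 "0191"
1997 "0680"
1996 ""
1997 "1880"
1996 "1280"
1996 ""
1997 "0191"
1999 "1490"
1997 ""
1996 "1480"
end

* Step 1: sort into year-value groups and flag only the first
* observation in each group; empty strings are never flagged
bysort redovar astkom2: generate byte first = _n == 1 & !missing(astkom2)

* Step 2: add up the flags within each year; sum() is a running sum,
* so the last observation in each year holds the total
by redovar: generate ndistinct = sum(first)
by redovar: replace ndistinct = ndistinct[_N]

list, sepby(redovar)
```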

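      Note that the code ignores missing (empty) values. If that is not what you want, omit the if qualifier and also specify the missing option of tag(), which by default skips observations with missing values.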
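      If an empty astkom2 should count as one more distinct value, a minimal sketch is to drop the if qualifier and ask egen's tag() to include missing values through its missing option. The names tag_all and wanted_all are illustrative; with the question's excerpt, 1996 and 1997 would then each have one more distinct value than when empty strings are ignored.

```stata
* Example data from the question
clear
input int redovar str4 astkom2
2000 "1402"
1996 "1480"
1996 "1480"
1997 "0191"
1998 "1283"
1996 "0882"
1997 "0191"
1997 "0680"
1996 ""
1997 "1880"
1996 "1280"
1996 ""
1997 "0191"
1999 "1490"
1997 ""
1996 "1480"
end

* tag one observation per distinct year-value combination; the missing
* option lets combinations with an empty astkom2 be tagged as well
egen tag_all = tag(astkom2 redovar), missing

* count the tags within each year
egen wanted_all = total(tag_all), by(redovar)

sort redovar astkom2
list, sepby(redovar)
```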

      See also Section 2 of the paper, which strongly urges use of the term distinct, as already mentioned. The word unique still has a primary sense of occurring once only, and is (in our view) better avoided. All your English teachers who squawked at misuse or abuse of the word are thereby respected.



      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int redovar str4 astkom2
      2000 "1402"
      1996 "1480"
      1996 "1480"
      1997 "0191"
      1998 "1283"
      1996 "0882"
      1997 "0191"
      1997 "0680"
      1996 ""
      1997 "1880"
      1996 "1280"
      1996 ""
      1997 "0191"
      1999 "1490"
      1997 ""
      1996 "1480"
      end 
      
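      * tag one observation for each distinct non-missing astkom2 within each year
      egen tag = tag(astkom2 redovar) if !missing(astkom2)
      
      * add up the tags (1s and 0s) within each year
      egen wanted = total(tag), by(redovar)
      
      sort redovar astkom2 
      
      list, sepby(redovar)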
      
           +----------------------------------+
           | redovar   astkom2   tag   wanted |
           |----------------------------------|
        1. |    1996               0        3 |
        2. |    1996               0        3 |
        3. |    1996      0882     1        3 |
        4. |    1996      1280     1        3 |
        5. |    1996      1480     1        3 |
        6. |    1996      1480     0        3 |
        7. |    1996      1480     0        3 |
           |----------------------------------|
        8. |    1997               0        3 |
        9. |    1997      0191     0        3 |
       10. |    1997      0191     1        3 |
       11. |    1997      0191     0        3 |
       12. |    1997      0680     1        3 |
       13. |    1997      1880     1        3 |
           |----------------------------------|
       14. |    1998      1283     1        1 |
           |----------------------------------|
       15. |    1999      1490     1        1 |
           |----------------------------------|
       16. |    2000      1402     1        1 |
           +----------------------------------+



      • #4
        The simplest way is to use the -distinct- command here. Use this code: distinct astkom2



        • #5
          #4 Bilal Ahmad: The OP wants to create a variable, and distinct does not do that. The latest update of distinct does include a companion command, distinctgen. The method in #3 requires no downloads, an imperative for some users.



          • #6
            Thanks a lot Nick for your great advice as always. I solved the problem.

