Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generate a count variable capturing # of times a value occurs in categorical variables

    Dear Stata users,

    I have a panel data set with 14 string variables each of which indicates whether this type of service is provided (one variable per service), and if it is, by what means.
    These variables can take values "NP", "Comb", "Other gov", "Own gov", "Private", "RDC".

    I would like to create a count variable "total" indicating the number of services provided for each observations. This variable would most likely count the number of "NP"s in all 14 variable for a given id-year, and subtract that from 14 (because 14 is the maximum number of services provided, subtracting the number of those not provided would give me what I need). I have not been able to find any way of doing this in Stata, so I wanted to ask for advice. Please let me know if this information is insufficient or unclear.

    Thank you!

  • #2
    Please check out the help for -dataex- and supply a minimal example of your data that illustrates your problem, plus how do you want the solution to look like.

    Otherwise if you want to have a short at a solution yourself, check out the -egenmore- package

    egen count = rcount(x), cond(@=="NP")

    I have the feeling that this might do what you want, but I cannot be sure before seeing your actual data.

    Comment


    • #3
      Your description implies a

      an identifier -- not explained otherwise

      year variable -- presumably numeric

      string variables.

      So, this may help:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(id year) str4(frog toad newt)
      1 2017 "frog" "toad" "newt"
      1 2018 "frog" ""     "newt"
      2 2017 "frog" "toad" ""    
      2 2018 ""     "toad" "newt"
      3 2017 ""     "toad" "newt"
      3 2018 ""     ""     "newt"
      end
      
      . egen distinct = rowsvals(frog toad newt)
      
      . list, sepby(id)
      
           +-------------------------------------------+
           | id   year   frog   toad   newt   distinct |
           |-------------------------------------------|
        1. |  1   2017   frog   toad   newt          3 |
        2. |  1   2018   frog          newt          2 |
           |-------------------------------------------|
        3. |  2   2017   frog   toad                 2 |
        4. |  2   2018          toad   newt          2 |
           |-------------------------------------------|
        5. |  3   2017          toad   newt          2 |
        6. |  3   2018                 newt          1 |
           +-------------------------------------------+
      Here rowsvals() is a function which must be installed using ssc install egenmore.

      Discussions include

      FAQ . . . . . . . . . Counting distinct strings across a set of variables
      . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
      11/06 How do I count the number of distinct strings
      across a set of variables?

      https://www.stata.com/support/faqs/d...tinct-strings/

      SJ-9-1 pr0046 . . . . . . . . . . . . . . . . . . . Speaking Stata: Rowwise
      (help rowsort, rowranks if installed) . . . . . . . . . . . N. J. Cox
      Q1/09 SJ 9(1):137--157
      shows how to exploit functions, egen functions, and Mata
      for working rowwise; rowsort and rowranks are introduced

      https://www.stata-journal.com/sjpdf....iclenum=pr0046 (especially Section 7)

      If that's not an answer, please follow FAQ Advice #12 and give us an explicit data example rather than a verbal description.

      I don't need to spell out how to subtract from 3, or 14, or any other number.

      Comment


      • #4
        See Michael, how different people might interpret the same description in different ways...

        I interpreted what you said as you having 14 variables each of which might take the values frog toad newt, and then you want 3 new variables frogcount, toadcount, and newtcount...

        In short you need to provide a data example with -dataex- if you want to get reasonable advice.
        "

        Comment


        • #5
          Thank you for the prompt replies. I see where the misunderstanding is. Here is a sample of my data by FAQ Advice #12:

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input long id int year str11(g1_acc g1_arch g1_land g1_gis g1_tc g1_le g1_cr) str9 g1_pp str11(g1_ta g1_tb g1_td g1_ub g1_vr g1_elect)
          1 2000 "Own govt." "NP"        "NP"          "NP"      "NP"        "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "NP"        "Own govt." ""         
          1 2001 "Own govt." "Own govt." "NP"          "NP"      "NP"        "Own govt." "Own govt." "Own govt." "Own govt." "Private"   "Own govt." "NP"        "Own govt." ""         
          1 2002 "Own govt." "Own govt." "NP"          "NP"      "NP"        "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "NP"        "Own govt." ""         
          1 2003 "Own govt." "Own govt." "NP"          "NP"      "NP"        "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "NP"        "Own govt." ""         
          1 2004 "Own govt." "Own govt." "NP"          "Private" "NP"        "Own govt." "Own govt." "Own govt." "Own govt." "Private"   "Private"   "NP"        "Own govt." ""         
          1 2005 "Own govt." "Own govt." "NP"          "Private" "NP"        "Own govt." "Own govt." "Own govt." "Own govt." "Private"   "Private"   "NP"        "Own govt." ""         
          1 2006 "Own govt." "Own govt." "NP"          "Private" "NP"        "Own govt." "Own govt." "Own govt." "Own govt." "Private"   "Private"   "NP"        "Own govt." ""         
          1 2007 "Own govt." "Comb."     "NP"          "NP"      "NP"        "Own govt." "Own govt." "Own govt." "Own govt." "Private"   "Own govt." "Own govt." "Own govt." ""         
          1 2008 "Own govt." "Comb."     "NP"          "NP"      "NP"        "Own govt." "Own govt." "Own govt." "Own govt." "Private"   "Own govt." "Own govt." "Own govt." ""         
          1 2009 "Own govt." "Comb."     "Own govt."   "Comb."   "NP"        "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." ""         
          1 2010 "Own govt." "Comb."     "Own govt."   "Comb."   "NP"        "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." ""         
          1 2011 "Own govt." "Comb."     "Own govt."   "Comb."   "NP"        "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." ""         
          1 2012 "Own govt." "Comb."     "Own govt."   "Comb."   "NP"        "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." ""         
          1 2013 "Own govt." "Comb."     "Own govt."   "Comb."   "NP"        "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." ""         
          1 2014 "Own govt." "Comb."     "Own govt."   "Comb."   "NP"        "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "NP"        "Own govt." ""         
          1 2015 "Own govt." "Comb."     "Own govt."   "Comb."   "NP"        "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "NP"        "Own govt." "Own govt."
          2 2000 "Own govt." "NP"        "NP"          "RDC"     "NP"        "Own govt." "Comb."     "Own govt." "Private"   "Private"   "Own govt." "NP"        "Own govt." ""         
          2 2001 "Own govt." "Private"   "Other govt." "RDC"     "NP"        "Own govt." "Own govt." "Own govt." "Own govt." "Private"   "Own govt." "NP"        "Own govt." ""         
          2 2002 "Own govt." "Private"   "Comb."       "RDC"     "NP"        "Own govt." "Own govt." "Own govt." "Own govt." "Private"   "Comb."     "NP"        "Own govt." ""         
          2 2003 "Own govt." "Own govt." "RDC"         "RDC"     "NP"        "Own govt." "Comb."     "Own govt." "Own govt." "Private"   "Private"   "NP"        "Own govt." ""         
          2 2004 "Own govt." "NP"        "Comb."       "RDC"     "Own govt." "Own govt." "Comb."     "Own govt." "Own govt." "Private"   "Own govt." "NP"        "Own govt." ""         
          2 2005 "Own govt." "NP"        "NP"          "RDC"     "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "Private"   "Private"   "NP"        "Own govt." ""         
          2 2006 "Own govt." "NP"        "NP"          "RDC"     "Own govt." "Own govt." "Own govt." "Own govt." "Own govt." "Private"   "Private"   "NP"        "Own govt." ""         
          2 2007 "Own govt." "Comb."     "Comb."       "RDC"     "NP"        "Own govt." "Comb."     "Own govt." "Own govt." "Private"   "Private"   "NP"        "Own govt." ""         
          2 2008 "Own govt." "Comb."     "Comb."       "RDC"     "NP"        "Own govt." "Comb."     "Own govt." "Own govt." "Private"   "Private"   "NP"        "Own govt." ""         
          end
          label values id id
          label def id 1 "Appling", modify
          label def id 2 "Atkinson", modify
          I would like to generate a variable "total" that will count the number of times "NP" shows up in each row. Thus, in row 1 "total" would be equal to "5" as there are 5 "NP"s. I do not need unique values in each row, i only need the count of specific values by row. Hopefully that helps.

          Comment


          • #6
            Thanks for the data example. As Joro says, it really is a good idea.

            I would find this easier if we had numeric variables. So all we have to do is
            Code:
            encode
            consistently and that is a known problem. In the code below comments flag community-contributed commands which must be installed for this code to work. Naturally, there are other ways to do it.

            Code:
            * ssc install multencode
            multencode g1*, gen(G1-G14)
            
            label list g1_acc
            
            forval j = 1/6 {
                 egen count`j' = anycount(G*), values(`j')
                 label var count`j' "`: label (G`j') `j''"
            }
            
            describe count*
            
            * ssc install tab_chi
            tabm count* , transpose
            Results:

            Code:
            . label list g1_acc
            g1_acc:
                       1 Comb.
                       2 NP
                       3 Other govt.
                       4 Own govt.
                       5 Private
                       6 RDC
            
            . describe count*
            
                          storage   display    value
            variable name   type    format     label      variable label
            --------------------------------------------------------------------------------------------------------------------------------------------------
            count1          byte    %8.0g                 Comb.
            count2          byte    %8.0g                 NP
            count3          byte    %8.0g                 Other govt.
            count4          byte    %8.0g                 Own govt.
            count5          byte    %8.0g                 Private
            count6          byte    %8.0g                 RDC
            
            
            . tabm count* , transpose
            
                       |                             variable
                values |     Comb.         NP  Other gov  Own govt.    Private        RDC |     Total
            -----------+------------------------------------------------------------------+----------
                     0 |        10          0         24          0         10         16 |        60
                     1 |         4          5          1          0          4          8 |        22
                     2 |         9          8          0          0          8          1 |        26
                     3 |         2          7          0          0          3          0 |        12
                     4 |         0          4          0          0          0          0 |         4
                     5 |         0          1          0          3          0          0 |         4
                     6 |         0          0          0          2          0          0 |         2
                     7 |         0          0          0          7          0          0 |         7
                     8 |         0          0          0          4          0          0 |         4
                     9 |         0          0          0          3          0          0 |         3
                    10 |         0          0          0          6          0          0 |         6
            -----------+------------------------------------------------------------------+----------
                 Total |        25         25         25         25         25         25 |       150

            Comment


            • #7
              Thank you so much, Nick, this is beautiful! It also gave me other counts, which I will need in the future. I really appreciate all the help!

              Comment


              • #8
                This is simpler, with no community-contributed stuff. The basic idea is similar to that underlying the rcount() function mentioned in #2

                Code:
                tokenize `" "Comb." "NP" "Other govt." "Own govt." "Private" "RDC" "' 
                
                forval j = 1/6 { 
                    gen count`j' = 0 
                    label var count`j' "``j''" 
                    quietly foreach v of var g1* { 
                        replace count`j' = count`j' + (`v' == "``j''") 
                    }
                }

                Comment


                • #9
                  As usual Nick's code above is authoritative and more than worthy of studying if one aspires to be an advanced Stata user.

                  Yet the -egenmore- function again contributed by Nick, and mentioned in #2, would do the trick automatically for a rudimentary Stata user:

                  Code:
                  . ssc install egenmore
                  checking egenmore consistency and verifying not already installed...
                  
                  
                  . egen npcount = rcount( g1_acc- g1_elect), cond(@=="NP")
                  
                  
                  . list id year npcount, clean
                  
                               id   year   npcount  
                    1.    Appling   2000         5  
                    2.    Appling   2001         4  
                    3.    Appling   2002         4  
                    4.    Appling   2003         4  
                    5.    Appling   2004         3  
                    6.    Appling   2005         3  
                    7.    Appling   2006         3  
                    8.    Appling   2007         3  
                    9.    Appling   2008         3  
                   10.    Appling   2009         1  
                   11.    Appling   2010         1  
                   12.    Appling   2011         1  
                   13.    Appling   2012         1  
                   14.    Appling   2013         1  
                   15.    Appling   2014         2  
                   16.    Appling   2015         2  
                   17.   Atkinson   2000         4  
                   18.   Atkinson   2001         2  
                   19.   Atkinson   2002         2  
                   20.   Atkinson   2003         2  
                   21.   Atkinson   2004         2  
                   22.   Atkinson   2005         3  
                   23.   Atkinson   2006         3  
                   24.   Atkinson   2007         2  
                   25.   Atkinson   2008         2  
                  
                  .


                  Comment


                  • #10
                    I wrote that function rcount()

                    Code:
                    *! 1.0.0 NJC 12 February 2001
                    but as the help for egenmore spells out, I now don't recommend it over a loop. I don't know what beginners might now think. The @ syntax is a little quirky.

                    Comment


                    • #11
                      Originally posted by Nick Cox View Post
                      I wrote that function rcount()

                      Code:
                      *! 1.0.0 NJC 12 February 2001
                      but as the help for egenmore spells out, I now don't recommend it over a loop. I don't know what beginners might now think. The @ syntax is a little quirky.
                      I have ended up recommending -egen, rcount- second or third time in the last three days. Hence I conclude that it is a useful function that does something that cannot be done easily in other ways. Even if one understands loops to a point of being able to write smoothly, without much pausing and doubting, and without any errors the loop in #8, the loop is still 8 lines of code. I feel the same way regarding both -egen, xtile- and -egen, rcount-, whatever the disadvantages in terms of speed or quirkiness, I rather write 1 line of code than 8 lines of code (unless I have very good reasons to insist on speed or non-quirky code).

                      And I actually do not find the @ syntax of -egen, rcount- quirky at all. It reminds me of the good old way in which we used to loop in Stata 7, with -for- and then referring to what I would call today a macro by X, Y, Z.

                      Comment


                      • #12
                        There is a tiny impasse then. I no longer recommend my own progeny in this particular case and you do. That's fine.

                        (It's hard to guess what fraction of users go back to Stata 7. I guess totally wildly 1%.)

                        Comment


                        • #13
                          Nick, your progeny looks beautiful, as far as I am concerned.

                          Otherwise in my view looping and graphing in Stata 7 was wicked. I dont teach Stata anymore, but if I were teaching Stata today, in defiance to the establishment I would teach Stata 7 looping and graphing even in today's versions.

                          Here is how the task would have been done with a loop in Stata 7:

                          Code:
                          . egen npcount = rcount( g1_acc- g1_elect), cond(@=="NP")
                          
                          . gen npcount7 = 0
                          
                          . for var g1_acc- g1_elect: replace npcount7 = npcount7 + (X=="NP")
                          
                          
                          *** omitted Stata output ***
                          
                          
                          . assert npcount== npcount7
                          
                          .
                          You might call the syntax of Stata 7 quirky, but to me the syntax:

                          Code:
                          . gen npcount7 = 0
                          
                          . for var g1_acc- g1_elect: replace npcount7 = npcount7 + (X=="NP")
                          looks wicked compared to the modern proper ways of doing it that you presented in #8.

                          Comment


                          • #14
                            for is completely undocumented now (the command had nothing to do with for in Mata, for those reading).

                            Those for loops were good when they worked but for a start they didn't mesh with local macro syntax. I remember answering many, many questions debugging people's loops and that is a part of my past I don't miss one bit.

                            Comment


                            • #15
                              Hi,

                              I have a similar problem. I think, first step would be to change to panel (by idhh) but then I am not sure how to go about it.

                              I have occupational information for 3 generation. I want to do the following

                              1. How many (count) 1 "Professionals" from gen1_ocp is professional in gen2h_ocp. And then put the count value in the variable value_ocp, replace source_ocp==1 "Professionals" and dest_ocp==1 "Professionals", and lastly replace from_ocp==1 to_ocp==2 as from_ocp is gen1_ocp and to_ocp gen2h_ocp.

                              if it was gen2h_ocp and gen3_ocp then it would be from_ocp==2 to_ocp==3

                              If gen1_ocp==1 & gen2h_ocp==2 then count that value and replace value_ocp, replace source_ocp==1 and dest_ocp==2, from_ocp==1 to_ocp==2

                              and continue this for all the combinations of 6 occupational categories


                              ----------------------- copy starting from the next line -----------------------
                              Code:
                              * Example generated by -dataex-. For more info, type help dataex
                              clear
                              input str10 idhh float(gen2h_ocp gen1_ocp gen3_ocp source_ocp dest_ocp from_ocp value_ocp to_ocp)
                              "0102010101" 6 5 4 . . . . .
                              "0102010601" 3 3 6 . . . . .
                              "0102010701" 6 6 6 . . . . .
                              "0102011301" 4 1 2 . . . . .
                              "0102011601" 3 2 6 . . . . .
                              "0102011701" 4 4 3 . . . . .
                              "0102012001" 3 3 2 . . . . .
                              "0102020101" 6 6 6 . . . . .
                              "0102020201" 6 3 2 . . . . .
                              "0102020401" 3 3 6 . . . . .
                              "0102020901" 4 4 1 . . . . .
                              "0102021101" 3 3 3 . . . . .
                              "0102021301" 3 3 3 . . . . .
                              "0102021401" 3 3 3 . . . . .
                              "0102021501" 3 3 3 . . . . .
                              "0102021601" 6 6 6 . . . . .
                              "0102021901" 4 2 6 . . . . .
                              "0102022001" 3 3 3 . . . . .
                              "0102030201" 3 3 4 . . . . .
                              "0102030401" 6 3 1 . . . . .
                              "0102030701" 4 4 4 . . . . .
                              "0102030802" 2 2 3 . . . . .
                              "0102031001" 3 6 6 . . . . .
                              "0102031101" 3 3 2 . . . . .
                              "0102031401" 6 3 1 . . . . .
                              "0102031701" 4 3 4 . . . . .
                              "0102040501" 3 4 5 . . . . .
                              "0102040801" 3 3 2 . . . . .
                              "0102040901" 5 5 4 . . . . .
                              "0102041201" 3 3 1 . . . . .
                              "0102041302" 3 3 3 . . . . .
                              "0102041303" 2 3 2 . . . . .
                              "0102041501" 3 3 2 . . . . .
                              "0102050401" 3 3 4 . . . . .
                              "0102050501" 6 3 6 . . . . .
                              "0102050601" 3 3 6 . . . . .
                              "0102050901" 2 2 5 . . . . .
                              "0102051001" 3 4 6 . . . . .
                              "0102051301" 6 6 4 . . . . .
                              "0102051501" 6 6 5 . . . . .
                              "0102051601" 6 6 6 . . . . .
                              "0102051701" 6 6 3 . . . . .
                              "0102051901" 6 4 4 . . . . .
                              "0102060201" 6 3 1 . . . . .
                              "0102061001" 5 5 1 . . . . .
                              "0102061501" 3 3 3 . . . . .
                              "0102070101" 2 2 3 . . . . .
                              "0102070201" 1 2 1 . . . . .
                              "0102070601" 3 3 6 . . . . .
                              "0102071401" 3 3 2 . . . . .
                              "0102071501" 3 3 1 . . . . .
                              "0102080201" 6 6 6 . . . . .
                              "0102080301" 3 3 6 . . . . .
                              "0102081001" 2 3 1 . . . . .
                              "0102081301" 6 3 3 . . . . .
                              "0102081501" 3 3 6 . . . . .
                              "0103010201" 5 6 2 . . . . .
                              "0103010401" 2 2 1 . . . . .
                              "0103010801" 5 5 5 . . . . .
                              "0103010901" 5 5 6 . . . . .
                              "0103040601" 5 5 5 . . . . .
                              "0103050301" 3 3 4 . . . . .
                              "0103050501" 5 3 6 . . . . .
                              "0103050601" 3 3 6 . . . . .
                              "0103051501" 5 5 5 . . . . .
                              "0103060101" 5 5 2 . . . . .
                              "0103071401" 4 5 1 . . . . .
                              "0103080901" 2 2 2 . . . . .
                              "0103090609" 4 4 2 . . . . .
                              "0103091401" 1 4 1 . . . . .
                              "0105010101" 3 6 5 . . . . .
                              "0105010601" 6 6 6 . . . . .
                              "0105010701" 3 6 3 . . . . .
                              "0105010801" 3 6 6 . . . . .
                              "0105011101" 3 1 2 . . . . .
                              "0105011301" 3 3 6 . . . . .
                              "0105011401" 3 6 4 . . . . .
                              "0105011501" 3 6 3 . . . . .
                              "0105011601" 3 3 2 . . . . .
                              "0105020701" 6 4 3 . . . . .
                              "0105020801" 4 6 3 . . . . .
                              "0105021101" 6 3 3 . . . . .
                              "0105021301" 4 6 3 . . . . .
                              "0105021401" 3 6 3 . . . . .
                              "0105021701" 3 3 1 . . . . .
                              "0105021901" 3 3 1 . . . . .
                              "0105022001" 3 6 3 . . . . .
                              "0105040201" 6 3 6 . . . . .
                              "0105040301" 2 3 3 . . . . .
                              "0105040701" 3 3 6 . . . . .
                              "0105041201" 3 3 3 . . . . .
                              "0105041301" 3 3 3 . . . . .
                              "0105041401" 3 6 6 . . . . .
                              "0105041701" 3 3 3 . . . . .
                              "0105041801" 3 3 6 . . . . .
                              "0105041901" 3 3 1 . . . . .
                              "0105042001" 3 6 2 . . . . .
                              "0105050501" 6 6 6 . . . . .
                              "0105050601" 3 3 3 . . . . .
                              "0105050801" 2 3 3 . . . . .
                              end
                              label values gen2h_ocp Occupational_codes
                              label values gen1_ocp Occupational_codes
                              label values gen3_ocp Occupational_codes
                              label def Occupational_codes 1 "Professionals", modify
                              label def Occupational_codes 2 "Clerical", modify
                              label def Occupational_codes 3 "Farmers", modify
                              label def Occupational_codes 4 `""High skilled" "labor""', modify
                              label def Occupational_codes 5 `""Low skilled" "labor""', modify
                              label def Occupational_codes 6 `""Agri. &" "other labor""', modify
                              ------------------ copy up to and including the previous line ------------------

                              Please let me know if anything is unclear.

                              Anustup

                              Comment

                              Working...
                              X