Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Count unique appearance of values across multiple variables in wide format

    Hello!

    It has been a while since I have used Stata and I need a bit of help.

    I have a wide dataset with variables activ1-active27. Across these variables, the values I am interested in are 3001-3008 (these values represent specific categories of activities a participant has engaged in).

    I would like to count:
    1) the unique appearance of 3001-3008 across activ1-active27 for each idpriv
    2) the total count of any value within the range 3001-3008 across activ1-activ27 for each idpriv

    I have tried a number of loops without success. While I would love to try -egenmore-, I cannot add packages because I am working on a disconnected server.

  • #2
    Hope this is what you want:
    Code:
    /*For the question 1)*/
    foreach n of numlist 3001/3008 {
     by idpriv: egen count`n'=anycount(activ1-active27), values(`n')
     }
    
    /*For the question 2)*/
    by idpriv: egen total_count=anycount(activ1-active27), values(3001/3008)
    edit: sorry I can't delete the answer, but the egen anycount function may not be combined with by, you can use Clyde Schechter's code instead.
    Last edited by Chen Samulsion; 26 Mar 2023, 20:59. Reason: wrong

    Comment


    • #3
      2) the total count of any value within the range 3001-3008 across activ1-activ27 for each idpriv
      The -egen, anycount()- function is part of official Stata and does not require -egenmore-.
      Code:
      egen wanted1 = anycount(activ1-activ27), values(3001(1)3008)
      I'm not sure what you mean when you say you want this "for each idpriv." The above will count the number of occurrences of any value 3001 through 3008 in all of the variables activ1-activ27 in each observation. If you have multiple observations per idpriv, then it isn't clear what you want. Perhaps you want to then take the total of each of those:
      Code:
      by idpriv: egen wanted2 = total(wanted1)
      I have not responded to your first request because I do not understand what it means.

      Added: Crossed with #2. I don't think the code proposed there will work. According to the -egen- help file, -egen, anycount()- cannot be combined with -by-.
      Last edited by Clyde Schechter; 26 Mar 2023, 20:56.

      Comment


      • #4
        I suspect that Chen Samulsion is pointing in the right direction but we lack a data example -- and as explained in FAQ Advice #12 a fake example with the right flavour can be just as helpful as real data. See https://www.statalist.org/forums/help#stata

        The word unique can mean something more or less, erm, unique to each problem. See e.g. Section 2 of https://journals.sagepub.com/doi/pdf...867X0800800408

        Please consider this and then tell us if your real problem is qualitatively different. For example, are there multiple observations for each identifier?

        Code:
        clear 
        set obs 3
        gen idpriv = _n 
        
        forval j = 1/3 {
            gen activ`j' = runiformint(1, 9)
        }
        
        forval a = 4/6  {
            egen any`a' = anycount(activ*), value(`a')
        }
        
        egen all = rowtotal(any*)
        
        list 
             +--------------------------------------------------------------+
             | idpriv   activ1   activ2   activ3   any4   any5   any6   all |
             |--------------------------------------------------------------|
          1. |      1        1        6        7      0      0      1     1 |
          2. |      2        5        8        8      0      1      0     1 |
          3. |      3        5        5        8      0      2      0     2 |
             +--------------------------------------------------------------+

        Comment


        • #5
          Thank you all for your help. It is much appreciated!

          For the initial suggestion, I received an "invalid name" error. That's my fault for not including a data example.


          Re: the data, it looks exactly like what Nick shared above (without value labels attached).

          idpriv activ1 activ2 activ3 Unique Music Grps Total Music Grps
          1 3001 5031 3001 1 2
          2 1011 3008 3002 2 1
          3 2031 3005 . 1 1
          • activ1 is the first club activity a high schooler participated in, activ2 is the second, and so on for activ3-activ27
          • Labels for 3001-3008 are unique music-related groups (band, chorus, etc)
          I want to count unique music groups for each high schooler (unique appearances of 3001-3008). I also want to check if the unique count matches the total count for those groups.

          Thank you again!

          Comment


          • #6
            Identifier 2 looks like 2 and 2 to me.

            If you check out the anymatch() function of egen as well as anycount() and rowtotal() I think you have all the tools you need.

            Code:
            help egen

            Comment


            • #7
              Thank you!

              Comment


              • #8
                Please help me count unique appearance of ID across variable. If the ID appears multiple times, i want to count only once for any combination. I want to later aggregate the created column and find a count of how many unique Investor_ID are therein my data set

                I tried to use #6, my code was as below and i got values 1 in the new colum totalIDs that is created. So i get a huge aggregate for different combinations. My aim is to count once the var investor_ID for any combination of the other variable.
                Code:
                 egen totaIDs = anycount(investor_id), values (1(1)113)
                My sample data as given by dataex is as below
                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input int investor_id byte investor_type str3 ticker float qdate
                  2 2 "PUG" 206
                  4 3 "BGK" 234
                  4 3 "PUG" 191
                  4 3 "BGK" 234
                  4 3 "BGK" 234
                  5 1 "BGK" 237
                  5 1 "BGK" 236
                  5 1 "BGK" 235
                  5 1 "BGK" 235
                  5 1 "BGK" 236
                  5 1 "BGK" 238
                  5 1 "BGK" 238
                  5 1 "BGK" 238
                  5 1 "BGK" 236
                  5 1 "BGK" 235
                  5 1 "BGK" 235
                  7 2 "PUG" 209
                  7 2 "BGK" 208
                 15 1 "BGK" 236
                 15 1 "BGK" 236
                 15 1 "BGK" 214
                 15 1 "BGK" 231
                 15 1 "BGK" 238
                 15 1 "BGK" 231
                 15 1 "BGK" 207
                 15 1 "BGK" 237
                 15 1 "BGK" 239
                 15 1 "BGK" 240
                 15 1 "BGK" 239
                 34 5 "PUG" 208
                 35 1 "BGK" 228
                 35 1 "BGK" 224
                 35 1 "BGK" 221
                 35 1 "BGK" 225
                 35 1 "BGK" 225
                 40 5 "BGK" 203
                 40 5 "BGK" 207
                 40 5 "BGK" 207
                 40 5 "BGK" 207
                 40 5 "BGK" 208
                 41 2 "PUG" 191
                 41 2 "BGK" 208
                 41 2 "PUG" 191
                 50 6 "PUG" 206
                 56 2 "PUG" 206
                 56 2 "BGK" 208
                 70 2 "BGK" 238
                 71 2 "BGK" 239
                 71 2 "BGK" 233
                 71 2 "BGK" 239
                 71 2 "BGK" 233
                 71 2 "BGK" 233
                 71 2 "BGK" 233
                 71 2 "BGK" 239
                 72 5 "PUG" 206
                 72 5 "PUG" 206
                 72 5 "PUG" 206
                 72 5 "PUG" 206
                 73 2 "PUG" 206
                 74 6 "PUG" 206
                 81 3 "PUG" 206
                 82 4 "BGK" 207
                 83 4 "PUG" 206
                 83 4 "BGK" 200
                 86 4 "PUG" 206
                 93 7 "PUG" 206
                 96 2 "BGK" 237
                 97 2 "PUG" 206
                 97 2 "BGK" 230
                 97 2 "BGK" 208
                107 2 "BGK" 208
                107 2 "PUG" 206
                109 6 "PUG" 193
                109 6 "BGK" 190
                110 2 "BGK" 208
                113 7 "PUG" 206
                end
                format %tq qdate
                I will be grateful for your kind feedback

                Comment


                • #9
                  #8 is nothing to do with the thread focus on data in wide format (layout). You have long layout, and solutions are quite different.

                  The number of distinct investors is obtainable from

                  Code:
                  tab investor_id 
                  
                  di r(r) 
                  provided it's not too large. Otherwise consult this 2008 paper and its updates to see if it offers a solution. dm0042 is thus a search term to find related threads here.

                  Code:
                  . search dm0042, entry
                  
                  Search of official help files, FAQs, Examples, and Stata Journals
                  
                  SJ-20-4 dm0042_3  . . . . . . . . . . . . . . . . Software update for distinct
                          (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
                          Q4/20   SJ 20(4):1028--1030
                          sort() option has been added
                  
                  SJ-15-3 dm0042_2  . . . . . . . . . . . . . . . . Software update for distinct
                          (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
                          Q3/15   SJ 15(3):899
                          improved table format and display of large numbers of
                          observations
                  
                  SJ-12-2 dm0042_1  . . . . . . . . . . . . . . . . Software update for distinct
                          (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
                          Q2/12   SJ 12(2):352
                          options added to restrict output to variables with a minimum
                          or maximum of distinct values
                  
                  SJ-8-4  dm0042  . . . . . . . . . . . .  Speaking Stata: Distinct observations
                          (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
                          Q4/08   SJ 8(4):557--568
                          shows how to answer questions about distinct observations
                          from first principles; provides a convenience command

                  Comment

                  Working...
                  X