Count the number of unique observation conditional on variable

stanley schindler

Join Date: May 2019

Posts: 9
#1

Count the number of unique observation conditional on variable

14 May 2019, 11:35

Hi Statlist users,

I am currently having an issue counting the unique number of observation conditional on another variable.

In particular, I want to do the following:

Session_district Facility_id Count

1001 A 2

1001 B 3

1001 A 2

1002 B 3

1002 A 2

1003 B 3

1003 B 3

1001 C 1

1001 C 1

1001 C 1

For each facility_id, I want to count the number of observation based on the unique value of session_district.
Hence, as illustrated in the example, although facility_id will occur distinctly 3 times, because there are only two unique session_district, the count would be 2.

I tried code:
egen count = nvals(session_district), by(facility_id) would give me count for distinct observations by session_district, i.e. facility_id A would get count equal to 3, instead of 2, which is what I want.

Could someone provide some help with this? I could not really find the answer to this elsewhere.
Thanks!
Tags: None

William Lisowski

Join Date: Dec 2014
Posts: 10150

14 May 2019, 12:19

Welcome to Statalist.

Code:

cls
* Example generated by -dataex-. To install: ssc install dataex
clear
input int session_district str1 facility_id byte count
1001 "A" 2
1001 "B" 3
1001 "A" 2
1002 "B" 3
1002 "A" 2
1003 "B" 3
1003 "B" 3
1001 "C" 1
1001 "C" 1
1001 "C" 1
end
egen tocount = tag(facility_id session_district)
bysort facility_id (session_district) : egen newcount = total(tocount)
list, sepby(facility_id) abbreviate(16) noobs\

Code:

. list, sepby(facility_id) abbreviate(16) noobs

  +-------------------------------------------------------------+
  | session_district   facility_id   count   tocount   newcount |
  |-------------------------------------------------------------|
  |             1001             A       2         0          2 |
  |             1001             A       2         1          2 |
  |             1002             A       2         1          2 |
  |-------------------------------------------------------------|
  |             1001             B       3         1          3 |
  |             1002             B       3         1          3 |
  |             1003             B       3         1          3 |
  |             1003             B       3         0          3 |
  |-------------------------------------------------------------|
  |             1001             C       1         0          1 |
  |             1001             C       1         1          1 |
  |             1001             C       1         0          1 |
  +-------------------------------------------------------------+

Comment

Nick Cox

Join Date: Mar 2014

Posts: 35724
#3

14 May 2019, 12:23

Your example is very puzzling. You show a data example with a variable Count and also

Code:

egen count = nvals(session_district), by(facility_id)

as supposedly what produced it. In principle the result of the nvals() function from egenmore (from SSC, as you are asked to explain) will always be constant for each distinct value of the by() argument.

Correcting for inconsistencies in upper and lower case I get this, which seems right to me:

Code:

clear input Session_district str1 Facility_id Count 1001 A 2 1001 B 3 1001 A 2 1002 B 3 1002 A 2 1003 B 3 1003 B 3 1001 C 1 1001 C 1 1001 C 1 end sort Session_district Facility_id egen wanted = nvals(Facility_id), by(Session_district) list, sepby(Session_district) +--------------------------------------+ | Sessio~t Facili~d Count wanted | |--------------------------------------| 1. | 1001 A 2 3 | 2. | 1001 A 2 3 | 3. | 1001 B 3 3 | 4. | 1001 C 1 3 | 5. | 1001 C 1 3 | 6. | 1001 C 1 3 | |--------------------------------------| 7. | 1002 A 2 2 | 8. | 1002 B 3 2 | |--------------------------------------| 9. | 1003 B 3 1 | 10. | 1003 B 3 1 | +--------------------------------------+

Please note from http://www.statalist.org/forums/help#stata key advice for questions here:

* Show a data example using dataex

* Show the exact code you used, not a rewriting

* Explain community-contributed commands you refer to.

https://www.stata-journal.com/sjpdf....iclenum=dm0042 makes a case that "unique" is never the best word unless you're referring to values that occur precisely once: "distinct" is recommended instead.

Note: William Lisowski uses in #2 a method also recommended in dm0042, as making use in two lines of "official" egen functions, rather than downloading a community-contributed one. But naturally it is the same answer either way.

Last edited by Nick Cox; 14 May 2019, 12:26.
1 like
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35724

14 May 2019, 13:29

I was confused in #3 by the sort order in #1, which seemed haphazard. Sorry about that. But if this is what you wanted then it matches your result, so I still don't understand the question, but for a different reason.

Code:

     +--------------------------------------+
     | Sessio~t   Facili~d   Count   wanted |
     |--------------------------------------|
  1. |     1001          A       2        2 |
  2. |     1001          A       2        2 |
  3. |     1002          A       2        2 |
     |--------------------------------------|
  4. |     1001          B       3        3 |
  5. |     1002          B       3        3 |
  6. |     1003          B       3        3 |
  7. |     1003          B       3        3 |
     |--------------------------------------|
  8. |     1001          C       1        1 |
  9. |     1001          C       1        1 |
 10. |     1001          C       1        1 |
     +--------------------------------------+

Comment

Nick Cox

Join Date: Mar 2014
Posts: 35724

15 May 2019, 01:37

#4 was incomplete. Here's the whole story:

Code:

clear
input Session_district    str1 Facility_id    Count
1001    A    2
1001    B    3
1001    A    2
1002    B    3
1002    A    2
1003    B    3
1003    B    3
1001    C    1
1001    C    1
1001    C    1
end

sort  Facility_id Session_district
egen wanted = nvals(Session_district), by(Facility_id) 

list, sepby(Facility_id) 

     +--------------------------------------+
     | Sessio~t   Facili~d   Count   wanted |
     |--------------------------------------|
  1. |     1001          A       2        2 |
  2. |     1001          A       2        2 |
  3. |     1002          A       2        2 |
     |--------------------------------------|
  4. |     1001          B       3        3 |
  5. |     1002          B       3        3 |
  6. |     1003          B       3        3 |
  7. |     1003          B       3        3 |
     |--------------------------------------|
  8. |     1001          C       1        1 |
  9. |     1001          C       1        1 |
 10. |     1001          C       1        1 |
     +--------------------------------------+

Announcement

Count the number of unique observation conditional on variable

Comment

Comment

Comment

Comment