Counter of unique observations contingent on a second variable

Filipp Sabitzer

Join Date: Jun 2018

Posts: 18
#1

05 Jul 2018, 04:12

Hi all,

I have a question regarding building a counter as a new variable. If we consider the following table (I tried to use dataex, is this a correct use of it?):

input int(A B C)
1 1 1
1 2 2
1 2 2
1 3 3
1 3 3
1 3 3
1 4 4
1 4 4
2 5 1
2 5 1
2 6 2
2 5 2
2 7 3
end

The first two variables are as given in my dataset. I added the third variable manually; it is the variable I would like to create. It should count the unique values of variable B within each value of variable A. For example, variable A takes the value 1 in the first eight observations. For these observations I would like a counter that starts at 1 and increases by one each time a new unique value of B appears. The same process starts again with the next value of A, 2: the counter restarts at 1 and again increases by one for each new unique value of B.

I would appreciate any help. Thank you.
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#2

05 Jul 2018, 04:24

Code:

gen uniqueCounter = 1
bysort A (B): replace uniqueCounter = uniqueCounter[_n-1] + 1 * (B != B[_n-1]) if _n != 1

Does that do what you want? The second line operates within each value of A and sorts the data by B. The counter equals its value in the previous row (_n-1), plus one if the current value of B differs from the value of B in the previous row (B[_n-1]). The second line does nothing for the first observation within each A-group, because there B[_n-1] would be missing (that's the "if _n != 1" part).
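
To see why the first observation in each group needs special handling, here is a minimal sketch on a cut-down dataset. The variable names first and prevB are made up for illustration and are not part of the code above. Under by:, _n restarts at 1 in each group and subscripts such as B[_n-1] are interpreted within the group, so B[_n-1] is missing for the first observation of each A.

```stata
clear
input int(A B)
1 1
1 2
2 5
2 5
end

* flag the first observation within each value of A
bysort A (B): gen first = _n == 1

* value of B in the previous row of the same A-group (missing in the first row)
bysort A (B): gen prevB = B[_n-1]

list, sepby(A)
```

In the listing, prevB should be missing exactly where first is 1.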

Nick Cox

Join Date: Mar 2014
Posts: 35807
#3

05 Jul 2018, 04:48

This is longer but may be a little easier to think through than Jesse's code.

The trick is to tag the first occurrence of each distinct B within A with 1 (other observations being tagged with 0) and then to add them up.

The trickery is to get the sort order right. This wasn't my first attempt.

In a real problem you may have an identifier already, say a time variable. Given the example, I had to invent one.

Code:

clear

input int(A B C)
1 1 1
1 2 2
1 2 2
1 3 3
1 3 3
1 3 3
1 4 4
1 4 4
2 5 1
2 5 1
2 6 2
2 5 2
2 7 3
end

gen long id = _n
bysort A B (id): gen wanted = _n == 1
bysort A (id): replace wanted = sum(wanted)

list, sepby(A)


     +-------------------------+
     | A   B   C   id   wanted |
     |-------------------------|
  1. | 1   1   1    1        1 |
  2. | 1   2   2    2        2 |
  3. | 1   2   2    3        2 |
  4. | 1   3   3    4        3 |
  5. | 1   3   3    5        3 |
  6. | 1   3   3    6        3 |
  7. | 1   4   4    7        4 |
  8. | 1   4   4    8        4 |
     |-------------------------|
  9. | 2   5   1    9        1 |
 10. | 2   5   1   10        1 |
 11. | 2   6   2   11        2 |
 12. | 2   5   2   12        2 |
 13. | 2   7   3   13        3 |
     +-------------------------+

See also the FAQ How do I calculate the number of distinct values seen so far?
https://www.stata.com/support/faqs/d...stinct-values/

Last edited by Nick Cox; 05 Jul 2018, 05:37.


Romalpa Akzo

Join Date: Oct 2017

Posts: 369
#4

05 Jul 2018, 06:28

1. Jesse's solution could be written in one line, which may also make the logic easier to explain.

Code:

bys A (B): gen D = sum(B != B[_n-1])

2. The output of this solution differs from Nick's in observation 12, which leaves me unsure what Filipp is looking for.
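
The difference can be checked directly with a short sketch that runs both approaches on the data from #1. It assumes the hand-made third column is called C and uses an observation identifier id, as in #3. Because bysort A (B) re-sorts the data, the original order is restored with sort id before comparing.

```stata
clear
input int(A B C)
1 1 1
1 2 2
1 2 2
1 3 3
1 3 3
1 3 3
1 4 4
1 4 4
2 5 1
2 5 1
2 6 2
2 5 2
2 7 3
end

* remember the original order of the observations
gen long id = _n

* approach from #3: tag the first occurrence of each B within A, then cumulate
bysort A B (id): gen wanted = _n == 1
bysort A (id): replace wanted = sum(wanted)

* one-line approach: sort on B within A, count changes in B
bysort A (B): gen D = sum(B != B[_n-1])

* back to the original order, then show where the two results disagree
sort id
list id A B C wanted D if D != wanted
```

On this data only observation 12 is listed: sorting on B moves the late B = 5 next to the earlier ones, so D is 1 there, whereas wanted (like C) is 2 because two distinct values of B have been seen by that point.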
Nick Cox

Join Date: Mar 2014

Posts: 35807
#5

05 Jul 2018, 07:11

For why people should (usually) say "distinct", not "unique" see https://www.stata-journal.com/sjpdf....iclenum=dm0042 p.558
Romalpa Akzo

Join Date: Oct 2017

Posts: 369
#6

05 Jul 2018, 08:38

Very “enjoyable” is your paper, Nick sensei.

And your solution at #3, given Filipp's description at #1, is not merely an easier one, as you said, but exactly the right solution for Filipp. Jesse's and mine go in a different (if not wrong) direction.
Nick Cox

Join Date: Mar 2014

Posts: 35807
#7

06 Jul 2018, 02:24

Romalpa: Thanks for your very nice remarks.