Count Unique Sub-Group Values within Group in Long Data

Rebecca Ivester

Join Date: Jul 2019

Posts: 36
#1

02 May 2025, 00:59

I cannot for the life of me figure out how to use a counter or egen or _n to create the variable "Cond_Num_Within_Person," depicted in the third column, below. Help!

I want the counter to (1) reset at each new person and (2) increment only when a new condition is seen within that individual. I have manually entered the values I am hoping for in column three.

Person_ID   Condition   Cond_Num_Within_Person
Abby        T           1
Abby        T           1
Abby        T           1
Abby        C           2
Ben         T           1
Ben         T           1
Ben         T           1
Carl        T           1
Carl        C           2
Carl        C           2
Carl        C           2
Tags: counter, _n

Nick Cox

Join Date: Mar 2014
Posts: 35725

02 May 2025, 02:08

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str4 person_id str1 condition byte cond_num_within_person
"Abby" "T" 1
"Abby" "T" 1
"Abby" "T" 1
"Abby" "C" 2
"Ben"  "T" 1
"Ben"  "T" 1
"Ben"  "T" 1
"Carl" "T" 1
"Carl" "C" 2
"Carl" "C" 2
"Carl" "C" 2
end


gen long id = _n
bysort person_id (id) : gen wanted = sum(condition != condition[_n-1])

list, sepby(person_id condition)

Code:

     +----------------------------------------------+
     | person~d   condit~n   cond_n~n   id   wanted |
     |----------------------------------------------|
  1. |     Abby          T          1    1        1 |
  2. |     Abby          T          1    2        1 |
  3. |     Abby          T          1    3        1 |
     |----------------------------------------------|
  4. |     Abby          C          2    4        2 |
     |----------------------------------------------|
  5. |      Ben          T          1    5        1 |
  6. |      Ben          T          1    6        1 |
  7. |      Ben          T          1    7        1 |
     |----------------------------------------------|
  8. |     Carl          T          1    8        1 |
     |----------------------------------------------|
  9. |     Carl          C          2    9        2 |
 10. |     Carl          C          2   10        2 |
 11. |     Carl          C          2   11        2 |
     +----------------------------------------------+

Notes:

1. The extra id variable is there to ensure that when you sort by identifier the sort order is otherwise preserved. You may (indeed should) have a variable indicating time or order that you could use instead.

2. You want to bump up a count whenever the condition changes. Within groups of observations, you can compare each value with the previous one. That works for the first observation of each person too: if the observation number _n is 1, then _n - 1 is 0, and Stata evaluates any reference to condition[0] as an empty string "", which differs from "T" or "C", so the count starts at 1. If empty (missing) strings were possible values of condition in a person's first observation, the count would start at 0 instead, and you would need slightly more complicated code.
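To make the last point in note 2 concrete, here is a minimal sketch of one way to guard against empty strings in condition. The toy data are made up for illustration and are not from the original question. The only change is to force a count of 1 at each person's first observation with _n == 1, rather than relying on the comparison with condition[0].

```stata
clear
* hypothetical data: Abby starts with empty values of condition
input str4 person_id str1 condition
"Abby" ""
"Abby" ""
"Abby" "T"
"Ben"  "T"
"Ben"  "C"
end

gen long id = _n

* count 1 at the first observation of each person regardless of its value,
* then add 1 each time condition differs from the previous observation
bysort person_id (id) : gen wanted = sum((_n == 1) | (condition != condition[_n-1]))

list, sepby(person_id)
```

With the one-line code from the first answer, Abby's counter would run 0 0 1 on these data, because "" equals condition[0]; with the guard it runs 1 1 2. Whether an empty string should count as a condition in its own right is a separate decision that depends on your data.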


Nick Cox

Join Date: Mar 2014

Posts: 35725
#3

02 May 2025, 02:28

https://www.stata-journal.com/articl...article=dm0029 surveys several principles here.

tsspell from SSC is a pertinent command, but it requires some extra steps to be applicable in your case.

I would strongly recommend the term distinct here. In contrast, unique still carries the primary meaning of occurring once only, which is not at all the key point here. For that distinction belaboured at length, see Section 2 of https://journals.sagepub.com/doi/epd...867X0800800408
Nick Cox

Join Date: Mar 2014

Posts: 35725
#4

02 May 2025, 03:12

A variant of the problem is that you only want to increment the counter if this is a condition never experienced before by that person. So anyone going C T C or T C T would be numbered 1 2 1, not 1 2 3. If that is what you want, please flag with an extended data example.
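In case the variant is what is wanted, here is a rough sketch of one possible approach. The example data and the names first_seen and wanted2 are invented for illustration. The idea is to flag the first occurrence of each condition within a person, number those first occurrences in order of appearance, and then copy that number to every later occurrence of the same condition.

```stata
clear
* hypothetical data: people who return to an earlier condition
input str4 person_id str1 condition
"Dana" "C"
"Dana" "T"
"Dana" "C"
"Evan" "T"
"Evan" "C"
"Evan" "T"
end

gen long id = _n

* 1 for the first time a person experiences a given condition, 0 otherwise
bysort person_id condition (id) : gen byte first_seen = (_n == 1)

* running number of distinct conditions seen so far, in original order
bysort person_id (id) : gen wanted2 = sum(first_seen)

* give every occurrence of a condition the number assigned at its first occurrence
bysort person_id condition (id) : replace wanted2 = wanted2[1]

sort id
list, sepby(person_id)
```

On these data both Dana (C T C) and Evan (T C T) end up with 1 2 1. As before, id stands in for whatever time or order variable you really have.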