  • Count Unique Sub-Group Values within Group in Long Data

    I cannot for the life of me figure out how to use a counter, egen, or _n to create the variable "Cond_Num_Within_Person" shown in the third column below. Help!

    I want the counter to (1) reset at each new person and (2) increment only when a new condition is seen within that individual. I have manually entered what I hope for in column three.
    Person_ID   Condition   Cond_Num_Within_Person
    Abby        T           1
    Abby        T           1
    Abby        T           1
    Abby        C           2
    Ben         T           1
    Ben         T           1
    Ben         T           1
    Carl        T           1
    Carl        C           2
    Carl        C           2
    Carl        C           2

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str4 person_id str1 condition byte cond_num_within_person
    "Abby" "T" 1
    "Abby" "T" 1
    "Abby" "T" 1
    "Abby" "C" 2
    "Ben"  "T" 1
    "Ben"  "T" 1
    "Ben"  "T" 1
    "Carl" "T" 1
    "Carl" "C" 2
    "Carl" "C" 2
    "Carl" "C" 2
    end
    
    
    gen long id = _n
    bysort person_id (id) : gen wanted = sum(condition != condition[_n-1])
    
    list, sepby(person_id condition)
    Code:
         +----------------------------------------------+
         | person~d   condit~n   cond_n~n   id   wanted |
         |----------------------------------------------|
      1. |     Abby          T          1    1        1 |
      2. |     Abby          T          1    2        1 |
      3. |     Abby          T          1    3        1 |
         |----------------------------------------------|
      4. |     Abby          C          2    4        2 |
         |----------------------------------------------|
      5. |      Ben          T          1    5        1 |
      6. |      Ben          T          1    6        1 |
      7. |      Ben          T          1    7        1 |
         |----------------------------------------------|
      8. |     Carl          T          1    8        1 |
         |----------------------------------------------|
      9. |     Carl          C          2    9        2 |
     10. |     Carl          C          2   10        2 |
     11. |     Carl          C          2   11        2 |
         +----------------------------------------------+
    Notes:

    1. The extra id variable is there to ensure that when you sort by identifier the sort order is otherwise preserved. You may (indeed should) have a variable indicating time or order that you could use instead.

    2. You want to bump up a count whenever the condition changes. Within groups of observations, you can compare each value with the previous one. That works for the first observation of each person too: under by: the observation number _n is 1 there, so _n - 1 is 0, and Stata evaluates any reference to condition[0] as an empty string "", which differs from "T" or "C". If empty or missing strings were possible values of condition in a person's first observation, you would need slightly more complicated code.
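
    As a minimal sketch of what that more complicated code might look like, one option is to count the first observation of each person explicitly rather than relying on the comparison with condition[0]. The data below are made up for illustration, and the sketch assumes that an empty string should be treated as a condition in its own right.

    Code:
    clear
    input str4 person_id str1 condition
    "Abby" ""
    "Abby" ""
    "Abby" "T"
    "Ben"  "T"
    "Ben"  ""
    end

    gen long id = _n
    * count the first observation of each person unconditionally,
    * then count every later change in condition
    bysort person_id (id) : gen wanted = sum((_n == 1) | (condition != condition[_n-1]))

    list, sepby(person_id)

    With the original one-line comparison, Abby's counter would start at 0 here, because "" is not different from condition[0].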


    • #3
      https://www.stata-journal.com/articl...article=dm0029 surveys several principles here.

      tsspell from SSC is a pertinent command, but it requires some extra steps to be applicable in your case.

      I would strongly recommend the term distinct here. In contrast, unique still carries the primary meaning of occurring once only, which is not at all the key point here. For that distinction belaboured at length, see Section 2 of https://journals.sagepub.com/doi/epd...867X0800800408
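
      If what is wanted is simply the number of distinct conditions per person, rather than a running counter, a common idiom is to tag one observation for each person-condition pair and total the tags. This is a sketch on made-up data; the variable names tag and ndistinct are illustrative only.

      Code:
      clear
      input str4 person_id str1 condition
      "Abby" "T"
      "Abby" "T"
      "Abby" "C"
      "Ben"  "T"
      "Ben"  "T"
      end

      * tag exactly one observation for each person-condition pair
      egen byte tag = tag(person_id condition)
      * number of distinct conditions per person, repeated on every row
      egen ndistinct = total(tag), by(person_id)

      list, sepby(person_id)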


      • #4
        A variant of the problem is that you want to increment the counter only if this is a condition never experienced before by that person. So anyone going C T C or T C T would get 1 2 1, not 1 2 3. If that is what you want, please flag with an extended data example.
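
        A hedged sketch of one way to handle that variant, assuming each condition should carry the number it was given when that person first experienced it. The data and the variable names first, seen and wanted2 are made up for illustration.

        Code:
        clear
        input str4 person_id str1 condition
        "Dora" "C"
        "Dora" "T"
        "Dora" "C"
        "Ed"   "T"
        "Ed"   "C"
        "Ed"   "T"
        end

        gen long id = _n
        * flag the first time each condition occurs within each person
        bysort person_id condition (id) : gen byte first = _n == 1
        * running count of distinct conditions seen so far, in the original order
        bysort person_id (id) : gen seen = sum(first)
        * every occurrence of a condition gets the count from its first occurrence
        bysort person_id condition (id) : gen wanted2 = seen[1]

        sort id
        list person_id condition wanted2, sepby(person_id)

        Both Dora (C T C) and Ed (T C T) end up with 1 2 1.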
