Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Counting and flagging distinct strings

    Apologies if this is covered elsewhere, it seems straightforward but I'm unable to get to a suitable solution.

    I have a dataset of panel data with multiple observations for each instance of name. Within these there may be multiple instances of id, I am trying to make a flag / count of these, ie trying to generate col flag:

    id name flag
    12 mark 1
    12 mark 1
    12 mark 1
    14 mark 2
    14 mark 2
    14 mark 2
    12 luke 1
    12 luke 1
    12 luke 1
    12 luke 1
    14 luke 2
    14 luke 2
    14 luke 2
    15 luke 3
    15 luke 3

  • #2
    Maybe someone has a better idea. My first take

    Code:
    egen id_name = tag(name id)
    bys name: egen ids_per_name = total(id_name)
    Last edited by Marc Kaulisch; 26 Jun 2025, 08:09.

    Comment


    • #3
      Thanks Marc. This nearly worked. In the end I used the following to create a second dataset and merged m:1 back on to the original, if useful for anyone else in the future:

      keep name id
      gen name_id= name + id
      duplicates drop name_id, force
      bysort name: gen x=_n

      Comment


      • #4
        Thanks for providing the column of output you want, because honestly otherwise it wasn't clear to me what count you needed.

        I don't think #2 achieves what you want. Here is some code that does what you need and shows the difference with #2:

        Code:
        clear
        input byte id str4 name byte flag
        12 mark 1
        12 mark 1
        12 mark 1
        14 mark 2
        14 mark 2
        14 mark 2
        12 luke 1
        12 luke 1
        12 luke 1
        12 luke 1
        14 luke 2
        14 luke 2
        14 luke 2
        15 luke 3
        15 luke 3
        end
        
        gen x = _n // I'll use this just to restore the original sort order below
        
        * code from #2
        egen id_name = tag(name id)
        bys name: egen ids_per_name = total(id_name)
        
        * my solution is basically just this one line
        bys name (id): gen wanted = sum(id != id[_n-1])
        
        sort x
        drop x id_name
        which produces:
        Code:
        . list , noobs sepby(id)
        
          +--------------------------------------+
          | id   name   flag   ids_pe~e   wanted |
          |--------------------------------------|
          | 12   mark      1          2        1 |
          | 12   mark      1          2        1 |
          | 12   mark      1          2        1 |
          |--------------------------------------|
          | 14   mark      2          2        2 |
          | 14   mark      2          2        2 |
          | 14   mark      2          2        2 |
          |--------------------------------------|
          | 12   luke      1          3        1 |
          | 12   luke      1          3        1 |
          | 12   luke      1          3        1 |
          | 12   luke      1          3        1 |
          |--------------------------------------|
          | 14   luke      2          3        2 |
          | 14   luke      2          3        2 |
          | 14   luke      2          3        2 |
          |--------------------------------------|
          | 15   luke      3          3        3 |
          | 15   luke      3          3        3 |
          +--------------------------------------+

        Comment

        Working...
        X