Counting and flagging distinct strings

John Guzek

Join Date: Apr 2018

Posts: 4
#1

Counting and flagging distinct strings

26 Jun 2025, 07:48

Apologies if this is covered elsewhere, it seems straightforward but I'm unable to get to a suitable solution.

I have a dataset of panel data with multiple observations for each instance of name. Within these there may be multiple instances of id, I am trying to make a flag / count of these, ie trying to generate col flag:

id name flag
12 mark 1
12 mark 1
12 mark 1
14 mark 2
14 mark 2
14 mark 2
12 luke 1
12 luke 1
12 luke 1
12 luke 1
14 luke 2
14 luke 2
14 luke 2
15 luke 3
15 luke 3
Tags: None
Marc Kaulisch

Join Date: Jan 2016

Posts: 184
#2

26 Jun 2025, 08:05

Maybe someone has a better idea. My first take

Code:

egen id_name = tag(name id) bys name: egen ids_per_name = total(id_name)

Last edited by Marc Kaulisch; 26 Jun 2025, 08:09.
Comment
John Guzek

Join Date: Apr 2018

Posts: 4
#3

26 Jun 2025, 08:32

Thanks Marc. This nearly worked. In the end I used the following to create a second dataset and merged m:1 back on to the original, if useful for anyone else in the future:

keep name id
gen name_id= name + id
duplicates drop name_id, force
bysort name: gen x=_n
Comment

Hemanshu Kumar

Join Date: Mar 2015
Posts: 1409

26 Jun 2025, 08:46

Thanks for providing the column of output you want, because honestly otherwise it wasn't clear to me what count you needed.

I don't think #2 achieves what you want. Here is some code that does what you need and shows the difference with #2:

Code:

clear
input byte id str4 name byte flag
12 mark 1
12 mark 1
12 mark 1
14 mark 2
14 mark 2
14 mark 2
12 luke 1
12 luke 1
12 luke 1
12 luke 1
14 luke 2
14 luke 2
14 luke 2
15 luke 3
15 luke 3
end

gen x = _n // I'll use this just to restore the original sort order below

* code from #2
egen id_name = tag(name id)
bys name: egen ids_per_name = total(id_name)

* my solution is basically just this one line
bys name (id): gen wanted = sum(id != id[_n-1])

sort x
drop x id_name

which produces:

Code:

. list , noobs sepby(id)

  +--------------------------------------+
  | id   name   flag   ids_pe~e   wanted |
  |--------------------------------------|
  | 12   mark      1          2        1 |
  | 12   mark      1          2        1 |
  | 12   mark      1          2        1 |
  |--------------------------------------|
  | 14   mark      2          2        2 |
  | 14   mark      2          2        2 |
  | 14   mark      2          2        2 |
  |--------------------------------------|
  | 12   luke      1          3        1 |
  | 12   luke      1          3        1 |
  | 12   luke      1          3        1 |
  | 12   luke      1          3        1 |
  |--------------------------------------|
  | 14   luke      2          3        2 |
  | 14   luke      2          3        2 |
  | 14   luke      2          3        2 |
  |--------------------------------------|
  | 15   luke      3          3        3 |
  | 15   luke      3          3        3 |
  +--------------------------------------+

Announcement

Counting and flagging distinct strings

Comment

Comment

Comment