Add some kind of identifier to observations when some of them have the same value

enrique labrada

Join Date: Nov 2023

Posts: 6
#1


09 Nov 2023, 20:25

How could I add an identifier to observations when some of them have the same value?

For example,

idhh rel
11 Jefe(a)
11 Conyuge
11 Hijo
11 Hijo
11 Nieto(a)
11 Nieto(a)

Here idhh = household ID and rel = relationship. I need to add a suffix or tag to the observations that have the same value, for example Hijo01, Hijo02 or Nieto(a)01, Nieto(a)02. I can't do it one by one, because my dataset contains 83,000 observations and many households have the same pattern.

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17724

10 Nov 2023, 00:12

Enrique:
welcome to this forum.
Do you mean something along the following lines?

Code:

. bysort idhh: gen wanted=1 if rel=="Hijo" | rel=="Nieto" | rel=="Nieto(a)"


. list

     +-------------------------+
     | idhh       rel   wanted |
     |-------------------------|
  1. |   11      Jefe        . |
  2. |   11   Conyuge        . |
  3. |   11      Hijo        1 |
  4. |   11      Hijo        1 |
  5. |   11     Nieto        1 |
     |-------------------------|
  6. |   11     Nieto        1 |
     +-------------------------+

.

Kind regards,
Carlo
(Stata 19.0)


Nick Cox

Join Date: Mar 2014

Posts: 35754
#3

10 Nov 2023, 01:21

See also

Code:

help duplicates
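
As a minimal sketch of where that help file leads, assuming the variables are named idhh and rel as in #1 (dup is a made-up name for the new variable), the following flags which observations share a household and relationship without yet numbering them:

```stata
* Sketch only: idhh and rel are the names used in #1; dup is hypothetical.
* Summarise how many observations are repeated on (idhh, rel).
duplicates report idhh rel

* dup holds the number of other observations with the same idhh and rel:
* 0 for unique observations, 1 for each member of a pair, and so on.
duplicates tag idhh rel, generate(dup)

list idhh rel dup, sepby(idhh)
```

This only identifies the repeated observations; telling them apart still needs a within-group counter.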
enrique labrada

Join Date: Nov 2023

Posts: 6
#4

11 Nov 2023, 17:11

Thank you Carlo and Nick!! But how could I differentiate the observations when they are the same? For example how could I get:

From this:
idhh rel
11 Jefe(a)
11 Conyuge
11 Hijo
11 Hijo
11 Nieto(a)
11 Nieto(a)

To this:

idhh rel
11 Jefe(a)
11 Conyuge
11 Hijo01
11 Hijo02
11 Nieto(a)01
11 Nieto(a)02

Is there a way to add a suffix such as 01, 02, 03 when several observations have the same value?

Last edited by enrique labrada; 11 Nov 2023, 17:13.

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17724

12 Nov 2023, 02:20

Enrique:
do you mean something along the following lines?

Code:

. bysort idhh: gen wanted=1 if rel=="Hijo" | rel=="Nieto" | rel=="Nieto(a)"

. bysort idhh rel: gen count=sum( wanted)

. egen final=concat( rel count)

. list

     +------------------------------------------+
     | idhh      rel   wanted   count     final |
     |------------------------------------------|
  1. |   11   Conyug        .       0   Conyug0 |
  2. |   11     Hijo        1       1     Hijo1 |
  3. |   11     Hijo        1       2     Hijo2 |
  4. |   11     Jefe        .       0     Jefe0 |
  5. |   11    Nieto        1       1    Nieto1 |
     |------------------------------------------|
  6. |   11    Nieto        1       2    Nieto2 |
     +------------------------------------------+


drop wanted count

.

Kind regards,
Carlo
(Stata 19.0)


enrique labrada

Join Date: Nov 2023

Posts: 6
#6

12 Nov 2023, 11:10

Thank you very much Carlo!! I think with that syntax I can solve my problem!!
enrique labrada

Join Date: Nov 2023

Posts: 6
#7

12 Nov 2023, 16:10

Hello!! How could I remove the "." characters from the observations?

idhh relationship
11 Conyuge..
11 Hijo1.
11 Hijo2.
11 Jefe(a)..
11 Nieto.1
11 Nieto.2

I could do it manually in the editor, but I have 83,000 observations, so it would be hard to do it that way.

thank you!!
Rich Goldstein

Join Date: Mar 2014

Posts: 4485
#8

12 Nov 2023, 18:06

Because of the way you provided your example (please read about -dataex- in the FAQ), I can't be sure that the following is what you want, but see

Code:

help subinstr()
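
A minimal sketch of how that function could be applied here, assuming the variable is a string variable named relationship as shown in #7:

```stata
* Sketch only: assumes relationship is a string variable, as in #7.
* Replace every "." with an empty string; the final argument "."
* (missing) asks subinstr() to replace all occurrences.
replace relationship = subinstr(relationship, ".", "", .)
```

Under that assumption, a value such as Nieto.1 would become Nieto1 and Conyuge.. would become Conyuge.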

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17724

13 Nov 2023, 00:56

Enrique:
I echo Rich's wise advice to be clearer about your issue(s) in your first post, so that interested listers can help you out with a comprehensive solution (if any exists).
Just for fun: in some parts of Italy we say that "things should not be presented by instalments".
That said, you may want to try something along the following lines:

Code:

. split relationship, p(.)

. list

     +----------------------------------------+
     | idhh   relatio~p   relati~1   relati~2 |
     |----------------------------------------|
  1. |   11   Conyuge..    Conyuge            |
  2. |   11      Hijo1.      Hijo1            |
  3. |   11      Hijo2.      Hijo2            |
  4. |   11        Jefe       Jefe            |
  5. |   11     Nieto.1      Nieto          1 |
     |----------------------------------------|
  6. |   11     Nieto.2      Nieto          2 |
     +----------------------------------------+

. drop relationship relationship2

. list

     +-----------------+
     | idhh   relati~1 |
     |-----------------|
  1. |   11    Conyuge |
  2. |   11      Hijo1 |
  3. |   11      Hijo2 |
  4. |   11       Jefe |
  5. |   11      Nieto |
     |-----------------|
  6. |   11      Nieto |
     +-----------------+

.

Kind regards,
Carlo
(Stata 19.0)


Nick Cox

Join Date: Mar 2014

Posts: 35754
#10

13 Nov 2023, 01:55

I agree with Carlo Lazzaro and Rich Goldstein that your existing and desired variables are not very clear. This may help, but if your rel is really a numeric variable with value labels, you will need something else.
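
For that numeric-with-value-labels case, one possible sketch is to decode the variable into a string first. This assumes the variables are named idhh and rel as in #1; rel_str is a made-up name:

```stata
* Sketch only: applies if describe shows rel as numeric with a value label.
describe rel

* decode creates a string variable holding the value labels;
* rel_str is a hypothetical name. decode fails if rel is already a string.
decode rel, generate(rel_str)

* Number repeated relationships within each household with a 2-digit suffix.
bysort idhh rel_str : gen rel2 = cond(_N == 1, rel_str, rel_str + strofreal(_n, "%02.0f"))
```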

Indeed #1 and #2 ask for suffixes such as 01 and 02, but #7 asks for 1 and 2.

Code:

clear
input dhh str42 rel
11 Jefe(a)
11 Conyuge
11 Hijo
11 Hijo
11 Nieto(a)
11 Nieto(a)
end

bysort dhh rel : gen rel2 = cond(_N == 1, rel, rel + strofreal(_n, "%02.0f"))

list, sepby(rel)

     +-------------------------+
     | dhh       rel      rel2 |
     |-------------------------|
  1. |  11   Conyuge   Conyuge |
     |-------------------------|
  2. |  11      Hijo    Hijo01 |
  3. |  11      Hijo    Hijo02 |
     |-------------------------|
  4. |  11      Jefe      Jefe |
     |-------------------------|
  5. |  11     Nieto   Nieto01 |
  6. |  11     Nieto   Nieto02 |
     +-------------------------+

If you don't want the leading zeros, just add strofreal(_n). If you want an extra space, which in my view would look better, add " " + strofreal(_n) -- and so on.

Leading zeros have a point for large families as otherwise Nieto10 would sort before Nieto2 and so on.
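
The two variants described above could be sketched as follows, using the same dhh and rel example data; rel3 and rel4 are made-up names:

```stata
* Sketch only: rel3 and rel4 are hypothetical variable names.
* Suffix without leading zeros: Hijo1, Hijo2, ...
bysort dhh rel : gen rel3 = cond(_N == 1, rel, rel + strofreal(_n))

* Suffix preceded by a space: "Hijo 1", "Hijo 2", ...
bysort dhh rel : gen rel4 = cond(_N == 1, rel, rel + " " + strofreal(_n))

list dhh rel rel3 rel4, sepby(rel)
```

Note that within a group of observations that are identical on dhh and rel, which one receives which number depends on the sort order, so the numbering is arbitrary unless a further variable is added to the sort.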

Last edited by Nick Cox; 13 Nov 2023, 02:14.
