Remove observations when there are more than 2 rows with the same ID

Jess Milligan

Join Date: Apr 2019

Posts: 6
#1

03 Apr 2019, 15:41

Hi,

I have a data set in which a household is uniquely identified by 2 variables: conglome and vivienda.

Within a household, codperso identifies individuals.

A variable p210 tells me if a person has a spouse living in the same household.

I would like to delete all observations belonging to households with more than 2 individuals who have a spouse in the household, i.e. households with more than one married couple living in them.

I have done the following so far:

. sort conglome vivienda
. quietly by conglome vivienda: gen dup = cond(_N==1,0,_n) if p210==1
. tabulate dup

dup      Freq.    Percent      Cum.
  0         15       1.13      1.13
  1        618      46.64     47.77
  2        618      46.64     94.42
  3         41       3.09     97.51
  4         33       2.49    100.00

Essentially, for any household that has dup reaching 3 or 4, I want to delete all observations in that household (not just the observations for which dup == 3 | dup == 4).

Could anyone advise on a solution?

Thank you!
Clyde Schechter

Join Date: Apr 2014

Posts: 30174
#2

03 Apr 2019, 16:19

Code:

by conglome vivienda, sort: egen to_drop = max(inlist(dup, 3, 4))
drop if to_drop
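
For anyone following along, the code above works in two steps. inlist(dup, 3, 4) evaluates to 1 on rows where dup is 3 or 4 and to 0 otherwise (including rows where dup is missing). egen's max(), computed by household, then copies the largest of those values to every row of the household, so to_drop is 1 for the whole household as soon as any member has dup equal to 3 or 4.

One thing to keep in mind: under by:, _n numbers all rows in the group, not only those satisfying the if qualifier, so dup reflects a person's position within the household rather than a running count of people with a spouse. A possible alternative, sketched here on made-up toy data, counts the individuals with p210 == 1 per household directly. The toy values, the coding of 0 for "no spouse in the household", and the variable name n_spouse are assumptions for illustration only.

```stata
* Toy data: hypothetical values, not taken from the original data set
clear
input conglome vivienda codperso p210
1 1 1 1
1 1 2 1
1 1 3 0
2 1 1 1
2 1 2 1
2 1 3 1
2 1 4 1
3 1 1 0
end

* Per household, count the individuals who report a spouse in the household
* (p210 == 1 evaluates to 0 when p210 is missing, so missings are not counted)
by conglome vivienda, sort: egen n_spouse = total(p210 == 1)

* Drop every row of any household with more than 2 such individuals
drop if n_spouse > 2

list, sepby(conglome vivienda)
```

In this toy example household (2, 1) has four individuals with p210 == 1, so all of its rows are dropped, while households (1, 1) and (3, 1) are kept in full.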
Jess Milligan

Join Date: Apr 2019

Posts: 6
#3

04 Apr 2019, 07:35

That worked perfectly - thank you so much Clyde!