
  • Remove observations when there are more than 2 rows with the same ID

    Hi,

    I have a data set in which each household is uniquely identified by two variables: conglome and vivienda.

    Within a household, codperso identifies individuals.

    A variable p210 tells me if a person has a spouse living in the same household.

    I would like to delete all observations belonging to households with more than 2 individuals who have a spouse in the household, i.e. households with more than one married couple living in them.

    I have done the following so far:

    . sort conglome vivienda
    . quietly by conglome vivienda: gen dup = cond(_N==1,0,_n) if p210==1
    . tabulate dup

            dup |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |         15        1.13        1.13
              1 |        618       46.64       47.77
              2 |        618       46.64       94.42
              3 |         41        3.09       97.51
              4 |         33        2.49      100.00

    Essentially, for any household that has dup reaching 3 or 4, I want to delete all observations in that household (not just the observations for which dup == 3 | dup == 4).

    Could anyone advise on a solution?

    Thank you!



  • #2
    Code:
    by conglome vivienda, sort: egen to_drop = max(inlist(dup, 3, 4))
    drop if to_drop
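
A possible alternative, sketched under the assumption that p210 == 1 codes "spouse lives in the household". As defined in #1, dup stores _n, the observation's position within the household, rather than a count of members with a spouse present, so its values can depend on how observations happen to be ordered within each household. Counting the qualifying members directly avoids that dependence. The toy data and the variable name n_spouse are made up for illustration; only conglome, vivienda, codperso and p210 come from the thread.

```stata
* Toy data: variable names follow the thread, values are invented.
clear
input conglome vivienda codperso p210
1 1 1 1
1 1 2 1
1 1 3 0
2 1 1 1
2 1 2 1
2 1 3 1
2 1 4 1
3 1 1 0
end

* Per household, count individuals whose spouse lives in the household.
* Assumption: p210 == 1 means "spouse present"; a missing p210 counts as 0
* because the comparison p210 == 1 evaluates to false.
bysort conglome vivienda: egen n_spouse = total(p210 == 1)

* Drop every observation in households with more than 2 such individuals.
drop if n_spouse > 2
```

In the toy data, household (2, 1) has four members with p210 == 1, so all of its rows are dropped, while the other two households are kept in full.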



    • #3
      That worked perfectly - thank you so much Clyde!
