  • Deleting duplicate ids w.r.t. girls in a household

    Hi everyone,

    I have data on 590 households in long format. Each household has 1 to 5 girls, but every household has five rows (girl = 1 to 5), so my dataset has many missing values wherever a household has fewer than five girls. Is there a way to keep only the observations for girls who actually live in the household?

    My variables include:

    id = household id
    girl = girl number within the household (1-5)
    nr_cr = total number of girls in the household
    female_name_ = girl's name
    female_old_ = girl's age

    I'm looking for something like

    id girl nr_cr female_name age

    299 1 1 aisha 19
    300 1 2 chiara 11
    300 2 2 doreen 12
    301 1 3 ananda 11
    301 2 3 annie 18
    301 3 3 chelsea 13

    Any help in this regard will be appreciated.

    Thanks.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id byte(girl nr_cr) str24 female_name_ byte female_old_
    60 1 1 "Nantumbwe Joan"     14
    60 2 1 ""                    .
    60 3 1 ""                    .
    60 4 1 ""                    .
    60 5 1 ""                    .
    61 1 2 "Nakato Silvia"      15
    61 2 2 "Babirye"            13
    61 3 2 ""                    .
    61 4 2 ""                    .
    61 5 2 ""                    .
    62 1 2 "Namakadde winnie"   18
    62 2 2 "NSuna Annet"        19
    62 3 2 ""                    .
    62 4 2 ""                    .
    62 5 2 ""                    .
    63 1 1 "Namutebi Mariam"    14
    63 2 1 ""                    .
    63 3 1 ""                    .
    63 4 1 ""                    .
    63 5 1 ""                    .
    64 1 3 "Nakayemba shakira"  17
    64 2 3 "Shadia Najjuko"     15
    64 3 3 "Sauda Nakasaga"     13
    64 4 3 ""                    .
    64 5 3 ""                    .
    65 1 1 "Nambatya Catherine" 19
    65 2 1 ""                    .
    65 3 1 ""                    .
    65 4 1 ""                    .
    65 5 1 ""                    .
    end


  • #2
    In your example data, age is missing whenever the name is missing, but this may not hold in the full data set. What do you want to do if only one of the two is missing? Here is some code that drops an observation only if both variables are missing:
    Code:
    * missing() also catches extended missing values (.a-.z), unlike ==.
    drop if missing(female_name_) & missing(female_old_)
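    A possible alternative, assuming nr_cr is recorded correctly for every household: since nr_cr counts the girls in each household, any row with girl greater than nr_cr should be an empty slot and can be dropped directly. The sketch below runs on two households copied from the -dataex- example above (replace the input block with your full data). It first lists rows where exactly one of name and age is missing so they can be reviewed, then uses isid and assert to check the result.

```stata
* Two households copied from the -dataex- example in the question
clear
input float id byte(girl nr_cr) str24 female_name_ byte female_old_
61 1 2 "Nakato Silvia"      15
61 2 2 "Babirye"            13
61 3 2 ""                    .
61 4 2 ""                    .
61 5 2 ""                    .
64 1 3 "Nakayemba shakira"  17
64 2 3 "Shadia Najjuko"     15
64 3 3 "Sauda Nakasaga"     13
64 4 3 ""                    .
64 5 3 ""                    .
end

* Review rows where exactly one of name and age is missing
list id girl nr_cr female_name_ female_old_ if missing(female_name_) != missing(female_old_)

* Keep only the girl slots that exist according to nr_cr (assumes nr_cr is correct)
drop if girl > nr_cr

* Checks: id and girl should uniquely identify observations,
* and every remaining girl should have a name recorded
isid id girl
assert !missing(female_name_)
```

    On these two households the drop removes 5 of the 10 rows. If the assert fails on the full data, some households have fewer named girls than nr_cr suggests, which is worth investigating before choosing between this approach and dropping on missing values.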
