  • counting number of siblings using by and _n

    Hi there,

    I'm working on a hierarchical dataset in which individuals are nested within households. I would like to count the number of brothers/sisters each individual has in the household. For each person, I have the person's ID, the father's and mother's IDs within the household, and gender.

    Code:
    clear
    input double(pid fid code gender codef codem)
    110001101 110001 101 1 201 202
    110001102 110001 102 0 204 205
    110002101 110002 101 1 201 202
    110002102 110002 102 0 203   .
    110002103 110002 103 0 101 102
    110003101 110003 101 1 201 202
    110003102 110003 102 0 203 204
    110003103 110003 103 0 101 102
    110005101 110005 101 0 102 103
    110005102 110005 102 1 201 104
    110005103 110005 103 0 202 203
    110005104 110005 104 0 204 205
    110006101 110006 101 1 201 202
    110006102 110006 102 0 203 204
    110006103 110006 103 0 101 102
    110007101 110007 101 0 102 104
    110007102 110007 102 1 201   .
    110007103 110007 103 1 102 104
    110007104 110007 104 0 202   .
    110009101 110009 101 0 201 202
    end
    
    label var pid "Individual ID"
    label var fid "household ID"
    label var code "ID within household"
    label var gender "gender"
    label var codef "father ID within household"
    label var codem "mother ID within household"


    To count brothers, for example, I'd like to first identify as a brother anyone in the household who is male and shares the same father ID or mother ID, and then count the number of brothers each person has in the household.

    I'd like to do this without looping through the data. Here's my try:

    Code:
    bysort fid  : g sib_male=sum((gender[_n+1]==1)  &((codef==codef[_n+1] & codef<. ) |  ///
                                                      (codem==codem[_n+1] & codem <. ) ) )
    bysort fid : replace sib_male=sib_male[_N]

    However, the code seems to be incorrect, as it assigns everyone in the family the same number of brothers.

    Can anyone help me figure out the correct way to do this? Many thanks!
    Last edited by Donghui Wang; 06 Dec 2018, 09:15.

  • #2
    On the assumption that you want to count only full brothers (i.e. the brothers have the same mother and the same father), this will do it:

    Code:
    clear
    input double(pid fid code gender codef codem)
    110001101 110001 101 1 201 202
    110001102 110001 102 0 204 205
    110002101 110002 101 1 201 202
    110002102 110002 102 0 203   .
    110002103 110002 103 0 101 102
    110003101 110003 101 1 201 202
    110003102 110003 102 0 203 204
    110003103 110003 103 0 101 102
    110005101 110005 101 0 102 103
    110005102 110005 102 1 201 104
    110005103 110005 103 0 202 203
    110005104 110005 104 0 204 205
    110006101 110006 101 1 201 202
    110006102 110006 102 0 203 204
    110006103 110006 103 0 101 102
    110007101 110007 101 0 102 104
    110007102 110007 102 1 201   .
    110007103 110007 103 1 102 104
    110007104 110007 104 0 202   .
    110009101 110009 101 0 201 202
    end
    
    label var pid "Individual ID"
    label var fid "household ID"
    label var code "ID within household"
    label var gender "gender"
    label var codef "father ID within household"
    label var codem "mother ID within household"
    
    isid fid code
    
    //    CREATE A FILE OF ALL MALES
    preserve
    keep if gender == 1
    drop pid gender
    rename code* code*_brother 
    tempfile males
    save `males'
    restore
    
    
    //    MATCH EACH PERSON WITH ALL MALES IN THE SAME HOUSEHOLD
    joinby fid using `males', unmatched(master)
    //    REMOVE SELF-MATCHES AND MATCHES WITH FATHER
    replace code_brother = . if inlist(code_brother, codef, code)
    //    REMOVE UNLESS BOTH PARENTS ARE SHARED (FULL BROTHERS ONLY)
    replace code_brother = . if (codef != codef_brother) | (codem != codem_brother)
    collapse (count) num_brothers = code_brother (firstnm) gender codef codem, by(fid code)

    If you also want to count half-brothers (i.e. they share at least one parent, but not necessarily both), change the | in -replace code_brother = . if (codef != codef_brother) | (codem != codem_brother)- to &.
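
    One caveat with the half-brother variant: Stata treats two missing values as equal, so two people whose mother's (or father's) code is missing for both would be treated as sharing that parent. In the example data this affects person 104 in household 110007, who would be matched to person 102 because codem is missing for both. Below is a minimal sketch of one way to guard against this. It assumes the data and the tempfile `males' created above are still available in the same do-file run, and it replaces everything from -joinby- onward. The variables same_father and same_mother are hypothetical helper names, not part of the code above.

    ```stata
    //    MATCH EACH PERSON WITH ALL MALES IN THE SAME HOUSEHOLD
    joinby fid using `males', unmatched(master)
    //    REMOVE SELF-MATCHES AND MATCHES WITH FATHER
    replace code_brother = . if inlist(code_brother, codef, code)
    //    FLAG A SHARED PARENT ONLY WHEN THE PARENT CODE IS NOT MISSING
    //    (same_father and same_mother are illustrative helper variables)
    generate byte same_father = (codef == codef_brother) & !missing(codef)
    generate byte same_mother = (codem == codem_brother) & !missing(codem)
    //    KEEP MATCHES WITH AT LEAST ONE KNOWN SHARED PARENT
    replace code_brother = . if !(same_father | same_mother)
    collapse (count) num_brothers = code_brother (firstnm) gender codef codem, by(fid code)
    ```

    To restrict this to full brothers with both parents known, the last -replace- condition could be changed to !(same_father & same_mother).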



    • #3
      Thanks a lot!
