  • Extract individual information from hierarchical data when the number of individuals is uncertain

    Hey, Statalisters. I have run into a problem with a survey dataset that has a household-individual hierarchical structure. The data look like this:
    Code:
    hhid relhead isparents     char_ind  edu_parents_fa  edu_parents_mo····
    50     1        0                           .
    50     3        1                           .
    50     3        1                           .
    50     5        0                           .
    67     1        0                           1
    67     3        1                           1
    67     4        0                           1
    67     5        0                           1
    end
    where hhid is the household ID; relhead is the relationship to the household head, in which 3 denotes the head's parents who live in the household; isparents is a user-generated dummy that flags those parents; and char_ind stands for the full set of individual characteristics of household members.
    What I want to do is extract the demographic information of the head's parents. What complicates matters is that one or both parents may not live in the sampled household, in which case their information is still asked and recorded in separate variables (columns), say char_parents_mo. For example, in household 67 only one parent lives in the household, and the other parent's information is recorded in the char_parents_# variables. In household 50, by contrast, both parents live in the household, their information is recorded in the corresponding char_ind variables, and the char_parents_# variables for this household contain missing values. But I don't know in advance which case applies to each household: both parents live in, one lives in, or none lives in. So how can I extract the parents' information and create household-level variables (say, "the highest educational attainment of the head's parents for household i")?

    The associated question in the questionnaire looks like this (the value for either parent can be missing):
                            mother    father
    Education attainment
    One strategy that comes to mind is
    Code:
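    // count how many parents live in the household (total() sums the dummy; count() would only count nonmissing rows)
    bys hhid : egen num_parents = total(isparents)
    // number the co-resident parents within each household (egen's group() cannot be combined with by:), then identify father and mother
    bys hhid (isparents) : gen order_parents = sum(isparents) if isparents==1
    gen isfather = (isparents==1&gender==1)
    gen ismother = (isparents==1&gender==2)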
    *========================
    *=== Both parents live in  ===
    *========================
    bys hhid:gen edu_parents_fa=edu_ind if isfather==1&num_parents==2 // father's education attainment
    bys hhid:gen edu_parents_mo=edu_ind if ismother==1&num_parents==2
    bys hhid :gen pubservant_parents_fa = (pubservant==1) if isfather==1&num_parents==2 // check if any parent is public servant
    bys hhid: gen pubservant_parents_mo = (pubservant==1) if ismother==1&num_parents==2
    ······
    
    *========================
    *=== One parent lives in  ===
    *========================
    bys hhid:gen edu_parents_fa1=edu_ind if isfather==1&num_parents==1
    bys hhid:gen edu_parents_mo1=edu_ind if ismother==1&num_parents==1
    bys hhid:gen  pubservant_parents_fa1 = (pubservant==1) if isfather==1&num_parents==1
    bys hhid:gen  pubservant_parents_mo1= (pubservant==1) if ismother==1&num_parents==1
    ······
    
    *================
    *== None live in   ==
    *================
    /*trivial*/
    
    *=====================
    *== Gen group variable ==
    *=====================
    egen highest_par_edu = rowmax(edu_parents_*)
    // compare and determine the highest education attainment of the parents (rowmax() is an egen function, so egen rather than gen)
    // Problem: we don't know in advance which parent (mother or father) lives in the household, so creating variables that share a prefix and comparing them with a wildcard can fail because a variable of that name is already defined. In the second case (only one parent lives in the household) the varlist edu_parents_* may also contain more than two variables
    
    gen pubservant_par= (pubservant_parents_fa==1|pubservant_parents_mo==1)
    // determine whether there is a public servant
    This seems like it could work (I haven't tried it, as the data are kept on a separate computer for confidentiality reasons, so I need to write the code in advance). But it looks clumsy. I wonder whether more elegant code could achieve the same goal. Moreover, some households may have a record for only one parent (probably due to recording error); how should I deal with those?
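
    A more compact version of the same idea can be sketched as follows. This is only an untested illustration under stated assumptions: the toy data, the codes gender 1 = male and 2 = female, the placement of the survey value in edu_parents_mo for household 67, and the new variable names edu_fa_in, edu_mo_in, edu_fa, edu_mo are all hypothetical. It spreads each co-resident parent's education to every row of the household, falls back on the separately recorded column when that parent does not live in, and then takes the maximum.

    ```stata
    * Hypothetical toy data that mimic the structure described above
    clear
    input hhid relhead isparents gender edu_ind edu_parents_fa edu_parents_mo
    50 1 0 1 4 . .
    50 3 1 1 2 . .
    50 3 1 2 3 . .
    50 5 0 2 1 . .
    67 1 0 1 3 . 1
    67 3 1 1 2 . 1
    67 4 0 2 3 . 1
    67 5 0 1 1 . 1
    80 1 0 2 2 . .
    end

    * Education of the co-resident father/mother, copied to every row of the
    * household. max() ignores missing values, so a household without a
    * co-resident father (or mother) gets missing here.
    bysort hhid: egen edu_fa_in = max(cond(isparents == 1 & gender == 1, edu_ind, .))
    bysort hhid: egen edu_mo_in = max(cond(isparents == 1 & gender == 2, edu_ind, .))

    * Prefer the co-resident record; otherwise use the separately recorded column
    gen edu_fa = cond(!missing(edu_fa_in), edu_fa_in, edu_parents_fa)
    gen edu_mo = cond(!missing(edu_mo_in), edu_mo_in, edu_parents_mo)

    * Highest attainment of the two parents; missing only if both are missing
    gen highest_par_edu = max(edu_fa, edu_mo)
    ```

    Because no branch depends on num_parents, the three cases (both, one, or no parent living in) do not need separate code, and a household with a record for only one parent simply gets that parent's value. The same pattern would apply to pubservant.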

    Thank you so much for your valuable suggestions.


    Last edited by Zhang_Lu; 27 Feb 2015, 21:36.

  • #2
    Cross-posted at http://stackoverflow.com/questions/2...rarchical-data

    Please see the FAQ Advice for our policy on cross-posting, which is that you should tell us about it.



    • #3
      Well, I apologize for that. Maybe you can help delete this thread here.



      • #4
        No; I can't delete anything except occasionally my own posts. And, as said, the policy is just that we should know about it, as we now do. Cross-posting is not itself out of order.



        • #5
          Exactly what survey is this--please provide a link to documentation if possible.
          Steve Samuels
          Statistical Consulting
          [email protected]

          Stata 14.2
