Hey, Statalisters, I have run into a problem with a survey dataset that has a hierarchical household-individual structure. A sample of the data is given in the first code block,
where hhid is the household id; relhead is the relationship to the household head, of which 3 denotes the head's parents who live in the household; isparents is a user-generated dummy that flags those parents; and char_ind stands for the full set of individual characteristics of household members.
What I want to do is extract the demographic information of the head's parents. What complicates things is that a parent may not live in the sampled household; their information is still asked and recorded in separate variables (columns), say char_parents_mo. For example, in household 67 only one parent lives in the household, and the other parent's information is recorded in char_parents_#. In household 50, by contrast, both parents live in the household, their information is recorded in the corresponding char_ind variables, and the char_parents_# variables for this household are missing. However, I do not know in advance which case applies to each household: both parents live in, one lives in, or none lives in. So how can I extract the parents' information and create a household-level variable (say, "the highest education attainment of the head's parents for household i")?
The associated question in the questionnaire looks like the small table further down; both values can be missing.
One strategy that comes to mind is the code in the last block of this post.
It seems it could work (I haven't tried it, as the data are kept on a separate computer for confidentiality reasons, so I need to write the code beforehand). But it looks clumsy. I wonder whether some neater code could achieve the same goal. Moreover, there may also be households with a record for only one parent (probably due to a recording error); how should I deal with those?
Thank you so much for your valuable suggestion.
Code:
hhid relhead isparents char_ind edu_parents_fa edu_parents_mo····
50 1 0 .
50 3 1 .
50 3 1 .
50 5 0 .
67 1 0 1
67 3 1 1
67 4 0 1
67 5 0 1
end
                     | mother | father |
Education attainment |        |        |
Code:
// detect how many parents live in the household
// (total() sums the 0/1 dummy; count() would only count non-missing values)
bys hhid: egen num_parents = total(isparents)
// give parents an order, or further, identify father and mother
// (egen's group() cannot be combined with by, so the parents are numbered with _n)
bys hhid isparents: gen order_parents = _n if isparents == 1
gen isfather = (isparents == 1 & gender == 1)
gen ismother = (isparents == 1 & gender == 2)

*============================
*=== Both parents live in ===
*============================
// suffix 2 is used because edu_parents_fa and edu_parents_mo already exist in the data
gen edu_parents_fa2 = edu_ind if isfather == 1 & num_parents == 2   // father's education attainment
gen edu_parents_mo2 = edu_ind if ismother == 1 & num_parents == 2
gen pubservant_parents_fa = (pubservant == 1) if isfather == 1 & num_parents == 2   // check if any parent is a public servant
gen pubservant_parents_mo = (pubservant == 1) if ismother == 1 & num_parents == 2
// ······

*===========================
*=== One parent lives in ===
*===========================
gen edu_parents_fa1 = edu_ind if isfather == 1 & num_parents == 1
gen edu_parents_mo1 = edu_ind if ismother == 1 & num_parents == 1
gen pubservant_parents_fa1 = (pubservant == 1) if isfather == 1 & num_parents == 1
gen pubservant_parents_mo1 = (pubservant == 1) if ismother == 1 & num_parents == 1
// ······

*====================
*=== None live in ===
*====================
/* trivial */

*==========================
*=== Gen group variable ===
*==========================
// compare and determine the highest education attainment of parents
// (rowmax() is an egen function, not a gen function; it ignores missing values)
egen highest_par_edu = rowmax(edu_parents_*)
// Problem: we don't know in advance which parent (mother or father) lives in the household.
// Reusing an existing variable name in gen fails with "variable ... already defined",
// and in the one-parent case the varlist edu_parents_* may contain more than two variables.
// The generated values also sit only on the parent's own row, so they still need to be
// spread within hhid before the result is truly household-level.
// determine whether there is a public servant
gen pubservant_par = (pubservant_parents_fa == 1 | pubservant_parents_mo == 1)
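For the household-level maximum, a more compact variant might be something like the sketch below. It is untested on the real data. The names edu_fa_in, edu_mo_in, edu_fa_out and edu_mo_out are made up for illustration, and the sketch assumes that gender is coded 1 = male and 2 = female, that education codes increase with attainment, and that edu_parents_fa and edu_parents_mo hold the separately reported values for parents who do not live in the household.

```stata
* Education of co-resident parents, copied to every row of the household.
* cond() returns edu_ind only on the relevant parent's row and missing elsewhere;
* egen's max() ignores missing values, so the result is missing only when
* no such parent lives in the household.
bysort hhid: egen edu_fa_in = max(cond(isparents == 1 & gender == 1, edu_ind, .))
bysort hhid: egen edu_mo_in = max(cond(isparents == 1 & gender == 2, edu_ind, .))

* Separately reported education of parents, also copied to every row
* (assumption: these columns may be filled on only some rows of the household)
bysort hhid: egen edu_fa_out = max(edu_parents_fa)
bysort hhid: egen edu_mo_out = max(edu_parents_mo)

* Highest attainment across whichever of the four sources is non-missing
egen highest_par_edu = rowmax(edu_fa_in edu_mo_in edu_fa_out edu_mo_out)
```

Because max() and rowmax() both skip missing values, the same lines would cover households with two, one or no co-resident parents, and a household with only one parent's record would simply take that parent's value.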