Matching characteristics of parents with children in the same household

Tharcisio Leone

Join Date: Sep 2019

Posts: 37
#1


27 Jan 2021, 08:32

Hello all,

Using household survey panel data, I am trying to match children's characteristics (age, gender, education) with the characteristics of their fathers and mothers. Let me explain the data first.

'IDHouse' identifies each household, within which there are multiple persons identified by 'A001A'. Each individual is given a unique id, 'ID', based on 'IDHouse' and 'A001A', and each id has information for T=5 (July-November), where 'V1013' is the time variable (t). Some characteristics of an id change over time, for example whether the id had a positive COVID19 test result in month t.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(ID IDHouse) byte(A001A V1013 A003 A005) float(WoInc covid19)
1 1  1  5 2 5 1600 .
1 1  1  6 2 5 1600 .
1 1  1  7 2 5 1700 .
1 1  1  8 2 5 1700 1
1 1  1  9 2 5 1700 1
1 1  1 10 2 5 1700 1
1 1  1 11 2 5 1700 1
2 1  5  5 2 5  800 .
2 1  5  6 2 5  800 .
2 1  5  7 2 5 1045 .
2 1  5  8 2 5 1045 .
2 1  5  9 2 5 1045 .
2 1  5 10 2 5 1045 .
2 1  5 11 2 5 1045 .
3 1  5  5 2 5    0 .
3 1  5  6 2 5    0 .
3 1  5  7 2 5    0 .
3 1  5  8 2 5    0 1
3 1  5  9 2 5    0 1
3 1  5 10 2 5    0 1
3 1  5 11 2 5    0 1
4 1 10  5 1 2    0 .
4 1 10  6 1 2    0 .
4 1 10  7 1 2    0 .
4 1 10  8 1 2    0 .
4 1 10  9 1 2    0 .
4 1 10 10 1 2    0 .
4 1 10 11 1 2    0 .
5 2  1  5 1 7 3000 .
5 2  1  6 1 7 2000 .
5 2  1 11 1 7 3000 1
6 3  1  5 1 2    0 .
6 3  1  6 1 2    0 .
6 3  1  7 1 2    0 .
6 3  1  8 1 2    0 .
6 3  1  9 1 2    0 .
6 3  1 10 1 2    0 .
6 3  1 11 1 2    0 .
7 3  2  5 2 5 1000 .
7 3  2  6 2 5 1000 .
7 3  2  7 2 5 1000 1
7 3  2  8 2 5 1000 1
7 3  2  9 2 5 1000 1
7 3  2 10 2 5 1045 1
7 3  2 11 2 5 1400 1
8 3  4  5 2 7    0 .
8 3  4  6 2 7    0 .
8 3  4  7 2 7    0 .
8 3  4  8 2 7    0 .
8 3  4  9 2 7    0 .
end

Essentially, I will work only with children (values 4, 5 and 6 of 'A001A') and their parents (values 1, 2 and 3 of 'A001A').
My Stata code should then identify the children and parents within each household and create new variables that attach to each child the characteristics of their parents, such as gender, education, income and infection with COVID19 ('A003', 'A005', 'WoInc' and 'covid19', respectively).

Any suggestions?
Many thanks for any assistance received.
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#2

27 Jan 2021, 11:00

I'm confused by your data. Why are there three parents instead of 2? Also, you refer to A001A ranging from 1 to 6, but in the data the values are 1, 2, 4, 5, and 10. So how do I correctly identify parents and children?

And I'm not entirely sure what you want the end result to look like. I think you want a single observation for each child, containing that child's original personal information plus additional variables showing the gender, education, income and covid19 status of each of that child's parents. Is that correct?

Vincent Li

Join Date: Dec 2016
Posts: 57
#3

28 Jan 2021, 01:00

I'm not sure I understand you correctly. I would split the data set into a children data set and a parents data set and then merge them. The code looks like this:

Code:

use  "your original data set",clear

keep if A001A>=4&A001A<=6    //to create children data set
rename * *_c
rename IDHouse_c IDHouse
rename V1013_c V1013

save "children data set",replace

use  "your original data set",clear

keep if A001A>=1&A001A<=3     //to create parent data set
rename * *_p
rename IDHouse_p IDHouse
rename V1013_p V1013

save "parent data set",replace

use "children data set",clear
merge m:m IDHouse V1013 using "parent data set"
sort ID_c V1013    //to make it convenient when checking whether the information from parent data set is completely merged 
drop if _merge==2
drop _merge

save "a new data set"

I don't know whether you want the child's time variable (V1013) to correspond to the parent's. Also, -merge m:m- can give wrong results, so we should be careful.

I look forward to other, better solutions.


Tharcisio Leone

#4

28 Jan 2021, 02:14

Hello all,

many thanks for your replies.
I apologise for the confusion. The dataset is quite complex.

1. Three parents instead of 2.
A001A = 1 for the Head of household
A001A = 2 for the Partner of Head with different gender (heterosexual marriage)
A001A = 3 for the Partner of Head with the same gender (homosexual marriage)

2. Range of A001A
In the dataset, A001A ranges between 1 and 19, because we have different persons inside the household (Son/Daughter, Household servant, etc.)

3. How do I correctly identify parents and children?
Parents are the heads of the household and their partners (A001A = 1,2 and 3);
Children are the children of the head and his/her partner (A001A = 4), the children of the head only (A001A = 5), and the children of the partner only (A001A = 6).

4. Result to look like
Yes, you are right. I already have all this information for every household member, one line per member and month. I now need to create (only for the children) new variables containing the characteristics of their parents (gender, education, income and covid19 status).
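
For reference, the role coding described in points 1 to 3 can be written as two indicator variables. This is only a sketch: the names is_parent and is_child are made up for illustration, and it assumes the example data from #1 are in memory.

```stata
* Hypothetical helper flags (names are illustrative, not part of the survey)
gen byte is_parent = inrange(A001A, 1, 3)   // head (1) or partner of head (2, 3)
gen byte is_child  = inrange(A001A, 4, 6)   // child of head and/or partner (4, 5, 6)

* Number of child observations per household
tabulate IDHouse if is_child
```

Members with other codes (for example A001A = 10 in the example data) get 0 on both flags.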

@Vincent Li
Your code has duplicated the observations of children from 'IDHouse'=3. Note that in the original data we have 19 observations of children (14 for 'IDHouse'=1 and 5 for 'IDHouse'=3). But after the merge we have 24 observations of children (14 for 'IDHouse'=1 and 10 for 'IDHouse'=3).

Best Regards

Tharcisio Leone

#5

28 Jan 2021, 04:09

Hello everyone,

I think I got it.

Code:

// Matching characteristics of parents with children
by IDHouse V1013, sort: egen Head_sex = total(cond(A001A == 1, A003, .)) // Sex of Head

by IDHouse V1013, sort: egen Head_edu = total(cond(A001A == 1, A005, .)) // Education Head
by IDHouse V1013, sort: egen ParDS_edu = total(cond(A001A == 2, A005, .)) // Education Partner of Head (different Sex)
by IDHouse V1013, sort: egen ParSS_edu = total(cond(A001A == 3, A005, .)) // Education Partner of Head (same Sex)

by IDHouse V1013, sort: egen Head_covid = total(cond(A001A == 1, covid19, .)) // Head had COVID19
by IDHouse V1013, sort: egen ParDS_covid = total(cond(A001A == 2, covid19, .)) // Partner of Head (different Sex) had COVID19
by IDHouse V1013, sort: egen ParSS_covid = total(cond(A001A == 3, covid19, .)) // Partner of Head (same Sex) had COVID19
by IDHouse V1013, sort: egen MemHH_covid = total(cond(A001A>=7, covid19, .)) // Any other household member had COVID19
replace MemHH_covid=1 if MemHH_covid>=1 // Convert to a dummy

But I am not satisfied with the code. Any ideas on how to use foreach or forvalues here?
I look forward to other, better solutions.
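
One caveat with the -egen, total()- calls above: by default total() returns 0, not missing, when every value in the group is missing. A household-month with no same-sex partner, or with a head whose covid19 is missing, therefore gets 0. The sketch below contrasts the default with the -missing- option of total(). It assumes the example data from #1 are in memory; the variable names Head_covid0 and Head_covid1 are illustrative only.

```stata
* Default: all-missing groups are set to 0
by IDHouse V1013, sort: egen Head_covid0 = total(cond(A001A == 1, covid19, .))

* With -missing-: all-missing groups stay missing
by IDHouse V1013, sort: egen Head_covid1 = total(cond(A001A == 1, covid19, .)), missing

* Compare the two versions for household 1
list IDHouse V1013 A001A covid19 Head_covid0 Head_covid1 if IDHouse == 1, sepby(V1013)
```

Whether 0 or missing is the right value depends on how covid19 is coded in the full survey, so this is a choice to make deliberately rather than a correction.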


Clyde Schechter

#6

28 Jan 2021, 13:04

Why would you want to replace these with -foreach- or -forvalues-? You came up with great, efficient, transparent, elegant code here, and you want to replace it with mediocre code that will run slower and be harder to understand? When in Stata you have a choice between -by- and -foreach/forvalues-, always go with -by-. And when you don't have the choice, if you are working with a large data set and what you want to do cannot be done with -by-, think about using the user-written -runby- command (by Robert Picard and me, available from SSC), which is like -by- for blocks of code instead of single commands.

Regarding #3, fortunately Tharcisio Leone recognized that what the -merge m:m- command produced was data salad, not usable results. -merge m:m- is a trap for the unwary. It puts data sets together in a way that is almost never what is wanted. (I have been using Stata daily since 1994 and in all that time I have only once encountered a situation where what -merge m:m- does would be useful; even then, there was a better way.) It produces results that look like a successful -merge- if you don't look too closely, but careful inspection will almost always reveal that the match-ups it makes are the wrong ones. Frankly, the following rule is close to exceptionless:
If you are thinking of using -merge m:m- either 1) you don't understand your data structure correctly, or 2) you really need -joinby- or -cross-, not -merge-.
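
To illustrate the -joinby- alternative, here is a minimal sketch, assuming the example data from #1 are in memory. The _p suffix and the tempfile name parents are illustrative choices, not part of the original posts. -joinby- forms every child-parent pair within IDHouse and V1013, so a child with two parents in the household gets two observations per month; that is by design, not accidental duplication.

```stata
* Sketch only: assumes the example data from #1 are in memory.
preserve
keep if inrange(A001A, 1, 3)                      // parents: head and partners
keep IDHouse V1013 A001A A003 A005 WoInc covid19
rename (A001A A003 A005 WoInc covid19) (A001A_p A003_p A005_p WoInc_p covid19_p)
tempfile parents
save `parents'
restore

keep if inrange(A001A, 4, 6)                      // children only
* Pair each child-month with every parent observed in the same household and month.
* unmatched(master) keeps children without a parent record and adds _merge.
joinby IDHouse V1013 using `parents', unmatched(master)
sort ID V1013 A001A_p
```

The result has one row per child, month and parent. If one row per child and month is wanted, with separate variables for each parent, the -by:- and -egen- approach posted earlier in this thread gets there more directly.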
Tharcisio Leone

#7

29 Jan 2021, 07:00

Dear Clyde,

thank you very much for the valuable feedback.
I have 8 other variables that I want to match with the children. Since each variable has to be matched for A001A==1, A001A==2 and A001A==3, I would need to write 24 lines of code using egen XX = total(cond(...)).

Loops would therefore let me run the same command for several variables at once without having to write separate lines of code.
I do not have as much experience with Stata as you, but in my opinion most loops are easy to check, and they would keep my do-file concise and clean by minimizing the space taken up by repetitive commands. They are also safer than repeating code.

I did not know about your user-written -runby- command. But I will try to use it in my specification.

Last edited by Tharcisio Leone; 29 Jan 2021, 07:06.
Clyde Schechter

#8

29 Jan 2021, 11:44

OK, I misunderstood your purpose and intent with regard to loops. I did not realize you were interested in doing the same thing with other variables. Here's how to loop over the variables of interest. You will need to expand the list of macros at the top of the code to cover each variable and provide the appropriate suffix for the result variable.

Code:

local A003 sex
local A005 edu
local covid19 covid

foreach v of varlist A003 A005 covid19 {
    by IDHouse V1013, sort: egen Head_``v'' = total(cond(A001A == 1, `v', .))
    by IDHouse V1013, sort: egen ParDS_``v'' = total(cond(A001A == 2, `v', .))
    by IDHouse V1013, sort: egen ParSS_``v'' = total(cond(A001A == 3, `v', .))
}

by IDHouse V1013, sort: egen MemHH_covid = total(cond(A001A>=7, covid19, .)) // Any other household member had COVID19
replace MemHH_covid=1 if MemHH_covid>=1 // Convert to a dummy

The code for MemHH is not part of the loop because it is idiosyncratic and does not follow the pattern.

It is, in principle, possible to further "loopify" this code by looping over the values of A001A (1 to 3) within the loop over variables. But I recommend against it because it will make the code very opaque, and, in general, I tend to avoid making loops that will only iterate 2 or 3 times: it's just simpler to write the separate commands.

Note: no sample data provided, so code is untested. Beware of typos or other errors.
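
For completeness, the further "loopified" version mentioned above might look like the sketch below. It is shown only to make the trade-off concrete, not as a recommendation. The locals p1, p2 and p3 are illustrative names that map each value of A001A to a variable prefix, and the code assumes the example data from #1 are in memory and that none of the result variables exist yet.

```stata
* Suffix for each source variable
local A003 sex
local A005 edu
local covid19 covid

* Prefix for each parent role (hypothetical local names)
local p1 Head
local p2 ParDS
local p3 ParSS

foreach v of varlist A003 A005 covid19 {
    forvalues k = 1/3 {
        * `p`k'' resolves to Head, ParDS or ParSS; ``v'' resolves to sex, edu or covid
        by IDHouse V1013, sort: egen `p`k''_``v'' = total(cond(A001A == `k', `v', .))
    }
}
```

The doubly nested macro references are what make this version harder to read than three explicit commands per variable.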