Creating variables to characterize parents with household survey data

Mateo Jimenez

Join Date: Mar 2022

Posts: 3
#1

23 Jun 2022, 14:18

Dear Stata community,

I am conducting a study on child labor in my country, using household survey data. Since my regression will be run only on the units of analysis (i.e. minors, identified by the dummy child), I would like to create variables that carry relevant information about their parents, such as age, race, education, etc. My data has the usual format, with both household and individual identifiers (hh_id and id, respectively), and variables that identify each child's mother and father (or guardian) by that person's own ID (below, I include father_id):

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str22 hh_id byte(id gender age) float child int father_id
"111-00416110273-A-0021" 1 1 42 0 997
"111-00416110273-A-0021" 2 2 36 0 997
"111-00416110273-A-0021" 3 1 19 0   1
"111-00416110273-A-0021" 4 2 13 1   1
"111-00416110273-A-0021" 5 1  3 0   1
"111-00416110273-A-0021" 6 2 86 0 997
"111-00416110273-A-0031" 1 1 44 0 997
"111-00416110273-A-0031" 2 2 32 0 997
"111-00416110273-A-0031" 3 2 11 1   1
"111-00416110273-A-0031" 4 2  3 0   1
"111-00416110273-A-0051" 1 1 41 0 997
"111-00416110273-A-0051" 2 2 31 0 997
"111-00416110273-A-0051" 3 2 14 1   1
"111-00416110273-A-0051" 4 2 13 1   1
"111-00416110273-A-0051" 5 1  8 1   1
end
label values gender gender
label def gender 1 "1.men", modify
label def gender 2 "2.women", modify

Note that if the father or mother is not present in the household, this is recorded as '997'.

My question is how I can create a variable that, for example, shows the father's age for each child, allowing for the possibility that different children within a household have different fathers. Thank you all in advance for any comments or suggestions.

Daniel Schaefer

Join Date: Mar 2020
Posts: 842
#2

23 Jun 2022, 15:35

Hi Mateo,

I am hoping someone else has an elegant way to do this! In the meantime, I believe this works:

Code:

* set 997 to missing.
replace father_id = . if father_id == 997
gen father_age = .
* loop through each child:
forv child_n = 1/`=_N'{
    * loop through each father:
    forv father_n = 1/`=_N'{
        * find a matching child/father pair.
        if father_id[`child_n'] == id[`father_n'] ///
        & hh_id[`child_n'] == hh_id[`father_n'] ///
        & child[`child_n'] == 1{
            *assign father's values to child observation.
            replace father_age = age[`father_n'] in `child_n'
        }
    }
}

Can we safely assume that the father is always the member of the household with id == 1? If so, that might be a clue to a more elegant solution...

Ali Atia

Join Date: May 2020
Posts: 737
#3

23 Jun 2022, 15:37

Here is one way:

Code:

preserve
keep hh_id id age //add additional variables here
rename (id age /*add additional variables here*/) father_=
tempfile temp
save `temp'
restore
merge m:1 hh_id father_id using `temp', keep(1 3) nogen

Additional variables can be pulled in as needed by adding them on lines 2 and 3 of the code above. For instance, to add a 'race' variable, change those two lines to the following:

Code:

keep hh_id id age race //add additional variables here
rename (id age race /*add additional variables here*/) father_=

Mateo Jimenez

Join Date: Mar 2022

Posts: 3
#4

24 Jun 2022, 14:56

Thank you for your suggestions, Daniel and Ali.

#2 did work, but it took a long time (I am dealing with 30K+ observations), whereas #3 did not run. After browsing this forum, I found what I wanted in the following post:

https://www.statalist.org/forums/for...-per-household

Based on Nick's comment there, this is my code:

Code:

rangestat father_age= age, int(id father_id father_id) by(hh_id)
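For readers new to rangestat, here is an annotated sketch of how a call of this kind can be read. It is not a copy of the line above: the statistic (max) is spelled out as my own choice, and the example data from #1 is assumed to be in memory with no existing father_age variable. Within each household, the interval option selects the observations whose id lies between father_id and father_id, i.e. exactly the father, so any statistic over that single observation returns his age.

```stata
* rangestat is community-contributed and must be installed once:
*   ssc install rangestat

* For each observation, look within the same hh_id at all observations
* whose id falls in the interval [father_id, father_id] and take the
* maximum of age over them. With one matching person, that is his age.
* No household member has id 997, so those observations get missing.
rangestat (max) father_age = age, interval(id father_id father_id) by(hh_id)

* Spot check the children.
list hh_id id age father_id father_age if child == 1, sepby(hh_id)
```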

It seems, though, that the most popular solution for matching ids within a household is to use frames, which I am barely familiar with. Nevertheless, I appreciate your help!
Daniel Schaefer

Join Date: Mar 2020

Posts: 842
#5

24 Jun 2022, 15:09

Excellent, thank you Mateo, for following up and re-posting the best solution.

Yes, the amount of time my solution will take grows quadratically with respect to the number of observations in your dataset. Glad to know there is a generic and computationally efficient way to do this!
Daniel Schaefer

Join Date: Mar 2020

Posts: 842
#6

24 Jun 2022, 15:17

As an addendum, what Ali Atia does is, in a sense, similar to the way one might use frames to do this. At the risk of oversimplifying, Ali essentially creates two datasets - one for fathers and one for children - then merges them using the relevant ids. One could easily imagine doing this with frames, which basically provide a convenient way to keep multiple datasets in memory at once.
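A minimal sketch of what the frames version might look like, under some assumptions: Stata 16 or later, the example data from #1 in memory, hh_id and id jointly unique, and no existing frame or variable named fathers (that name is hypothetical, chosen only for illustration).

```stata
* Copy the lookup variables into a second frame.
frame put hh_id id age, into(fathers)

* Link each observation to the person in the same household whose id
* equals this observation's father_id. The link variable is named
* fathers by default. Observations with no match stay unlinked.
frlink m:1 hh_id father_id, frame(fathers hh_id id)

* Pull the linked person's age into the current frame.
frget father_age = age, from(fathers)

* Clean up the helper frame and the link variable.
frame drop fathers
drop fathers
```

Compared with the tempfile approach, nothing is written to disk and the current data is never replaced, which is the main convenience frames offer here.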

Last edited by Daniel Schaefer; 24 Jun 2022, 15:24.