  • Creating variables to characterize parents with household survey data

    Dear Stata community,

    I am conducting a study on child labor in my country using household survey data. Since my regression will be run only on the units of analysis (i.e., minors, identified by the dummy child), I want to create variables that carry relevant information about their parents, such as age, race, and education. My data have the usual format, with household and individual identifiers (hh_id and id, respectively) and variables that identify each child's mother and father (or guardian) by their own id (below, I include father_id):
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str22 hh_id byte(id gender age) float child int father_id
    "111-00416110273-A-0021" 1 1 42 0 997
    "111-00416110273-A-0021" 2 2 36 0 997
    "111-00416110273-A-0021" 3 1 19 0   1
    "111-00416110273-A-0021" 4 2 13 1   1
    "111-00416110273-A-0021" 5 1  3 0   1
    "111-00416110273-A-0021" 6 2 86 0 997
    "111-00416110273-A-0031" 1 1 44 0 997
    "111-00416110273-A-0031" 2 2 32 0 997
    "111-00416110273-A-0031" 3 2 11 1   1
    "111-00416110273-A-0031" 4 2  3 0   1
    "111-00416110273-A-0051" 1 1 41 0 997
    "111-00416110273-A-0051" 2 2 31 0 997
    "111-00416110273-A-0051" 3 2 14 1   1
    "111-00416110273-A-0051" 4 2 13 1   1
    "111-00416110273-A-0051" 5 1  8 1   1
    end
    label values gender gender
    label def gender 1 "1.men", modify
    label def gender 2 "2.women", modify
    Note that if the father or mother is not present in the household, this is recorded as '997'.

    My question is: how can I create a variable that shows, for example, the father's age for each child, allowing for the possibility that different children within a household have different fathers? Thank you all in advance for any comments or suggestions.

  • #2
    Hi Mateo,

    I am hoping someone else has an elegant way to do this! In the meantime, I believe this works:

    Code:
    * set 997 to missing.
    replace father_id = . if father_id == 997
    gen father_age = .
    * loop through each child:
    forv child_n = 1/`=_N'{
        * loop through each father:
        forv father_n = 1/`=_N'{
            * find a matching child/father pair.
            if father_id[`child_n'] == id[`father_n'] ///
            & hh_id[`child_n'] == hh_id[`father_n'] ///
            & child[`child_n'] == 1{
                *assign father's values to child observation.
                replace father_age = age[`father_n'] in `child_n'
            }
        }
    }
    Can we safely assume that the father is always the member of the household with id == 1? If so, that might be a clue to a more elegant solution...
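    If that assumption did hold, one possible shortcut is sketched below. This is only a sketch: the assumption that the father is always id == 1 is not confirmed anywhere in this thread, and father_age_alt is a hypothetical variable name chosen so it does not clash with father_age. The toy data are a three-row subset of the example in #1.

```stata
* Toy subset of the example data in #1 (first household only).
clear
input str22 hh_id byte(id age) float child int father_id
"111-00416110273-A-0021" 1 42 0 997
"111-00416110273-A-0021" 2 36 0 997
"111-00416110273-A-0021" 4 13 1   1
end

* ASSUMPTION (unconfirmed): the father is always the member with id == 1.
* Sort each household by id so that member 1 comes first, then copy that
* member's age to every observation that names id 1 as its father.
* The id[1] == 1 check guards against households with no member 1.
bysort hh_id (id): gen father_age_alt = age[1] if father_id == 1 & id[1] == 1
```

    This avoids the nested loop entirely, but it gives wrong or missing results for any child whose father has a different id, so it is no substitute for a general id match.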


    • #3
      Here is one way:
      Code:
      preserve
      keep hh_id id age //add additional variables here
      rename (id age /*add additional variables here*/) father_=
      tempfile temp
      save `temp'
      restore
      merge m:1 hh_id father_id using `temp', keep(1 3) nogen
      Additional variables can be pulled in as needed by adding them to the keep and rename commands on lines 2 and 3. For instance, to add a 'race' variable, change those lines to the following:

      Code:
      keep hh_id id age race //add additional variables here
      rename (id age race /*add additional variables here*/) father_=


      • #4
        Thank you for your suggestions, Daniel and Ali.

        #2 worked, but it was very slow (I am dealing with 30K+ observations), whereas #3 did not run. After browsing this forum, I found what I wanted in the following post:

        https://www.statalist.org/forums/for...-per-household

        Based on Nick's comment there, this is my code:

        Code:
        rangestat father_age= age, int(id father_id father_id) by(hh_id)
        That said, it seems the most popular solution for matching ids within a household is to use frames, which I am barely familiar with. Nevertheless, I appreciate your help!
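        For readers wondering why the rangestat call does the matching, here is a small annotated sketch. It assumes rangestat (a community-contributed command) has been installed from SSC, spells out the (mean) statistic explicitly, and uses a four-row subset of the example data in #1 as toy input.

```stata
* rangestat is community-contributed; install once with: ssc install rangestat
* Toy subset of the example data in #1 (two households).
clear
input str22 hh_id byte(id age) float child int father_id
"111-00416110273-A-0021" 1 42 0 997
"111-00416110273-A-0021" 4 13 1   1
"111-00416110273-A-0031" 1 44 0 997
"111-00416110273-A-0031" 3 11 1   1
end

* For each observation, average age over members of the same household whose
* id lies in the interval [father_id, father_id]. At most one member can
* match, so the "mean" is simply that member's age.
rangestat (mean) father_age = age, interval(id father_id father_id) by(hh_id)

* Rows coded 997 match nobody in the household and are left missing.
list hh_id id age father_id father_age, sepby(hh_id)
```

        Because the interval bounds are variables, each observation gets its own lookup, which is how different children in one household can be matched to different fathers.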


        • #5
          Excellent, thank you Mateo, for following up and re-posting the best solution.

          Yes, the amount of time my solution will take grows quadratically with respect to the number of observations in your dataset. Glad to know there is a generic and computationally efficient way to do this!


          • #6
            As an addendum: what Ali Atia does is, in a sense, similar to the way one might use frames to do this. At the risk of oversimplifying, Ali essentially creates two datasets, one for fathers and one for children, and then merges them using the relevant ids. One could easily imagine doing this with frames, which provide a convenient way to keep multiple datasets in memory at once.
            Last edited by Daniel Schaefer; 24 Jun 2022, 15:24.
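            A minimal sketch of what a frames version might look like follows. It is not taken from this thread: it assumes Stata 16 or later, the frame name fathers is hypothetical, and the toy data are a four-row subset of the example in #1.

```stata
* Requires Stata 16 or later. Toy subset of the example data in #1.
clear
input str22 hh_id byte(id age) float child int father_id
"111-00416110273-A-0021" 1 42 0 997
"111-00416110273-A-0021" 4 13 1   1
"111-00416110273-A-0031" 1 44 0 997
"111-00416110273-A-0031" 3 11 1   1
end

* Copy the identifying variables into a second frame (hypothetical name).
frame put hh_id id age, into(fathers)

* In that frame, rename id so the key matches father_id in the main frame.
frame fathers: rename (id age) (father_id father_age)

* Link each observation to the row of its father; rows coded 997 find no
* match and are left unlinked.
frlink m:1 hh_id father_id, frame(fathers)

* Pull the father's age across the link, then clean up the link and frame.
frget father_age, from(fathers)
drop fathers
frame drop fathers
```

            To carry over more parental characteristics, one would add them to the frame put and rename lines and list them in frget, much as with the keep and rename lines in #3.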
