  • Matching characteristics of parents with children in the same household

    Hello all,

    Using household survey panel data, I am trying to match each child's characteristics (age, gender, education) with those of their father and mother. Let me explain the data first.

    'IDHouse' identifies each household, within which there are multiple persons identified by 'A001A'. Each individual is given a unique identifier, 'ID', based on 'IDHouse' and 'A001A', and each ID has information for T=5 (July-November), where 'V1013' is the time variable (t). Some characteristics of an ID change over time, for example whether the person tested positive for COVID19 in month t.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(ID IDHouse) byte(A001A V1013 A003 A005) float(WoInc covid19)
    1 1  1  5 2 5 1600 .
    1 1  1  6 2 5 1600 .
    1 1  1  7 2 5 1700 .
    1 1  1  8 2 5 1700 1
    1 1  1  9 2 5 1700 1
    1 1  1 10 2 5 1700 1
    1 1  1 11 2 5 1700 1
    2 1  5  5 2 5  800 .
    2 1  5  6 2 5  800 .
    2 1  5  7 2 5 1045 .
    2 1  5  8 2 5 1045 .
    2 1  5  9 2 5 1045 .
    2 1  5 10 2 5 1045 .
    2 1  5 11 2 5 1045 .
    3 1  5  5 2 5    0 .
    3 1  5  6 2 5    0 .
    3 1  5  7 2 5    0 .
    3 1  5  8 2 5    0 1
    3 1  5  9 2 5    0 1
    3 1  5 10 2 5    0 1
    3 1  5 11 2 5    0 1
    4 1 10  5 1 2    0 .
    4 1 10  6 1 2    0 .
    4 1 10  7 1 2    0 .
    4 1 10  8 1 2    0 .
    4 1 10  9 1 2    0 .
    4 1 10 10 1 2    0 .
    4 1 10 11 1 2    0 .
    5 2  1  5 1 7 3000 .
    5 2  1  6 1 7 2000 .
    5 2  1 11 1 7 3000 1
    6 3  1  5 1 2    0 .
    6 3  1  6 1 2    0 .
    6 3  1  7 1 2    0 .
    6 3  1  8 1 2    0 .
    6 3  1  9 1 2    0 .
    6 3  1 10 1 2    0 .
    6 3  1 11 1 2    0 .
    7 3  2  5 2 5 1000 .
    7 3  2  6 2 5 1000 .
    7 3  2  7 2 5 1000 1
    7 3  2  8 2 5 1000 1
    7 3  2  9 2 5 1000 1
    7 3  2 10 2 5 1045 1
    7 3  2 11 2 5 1400 1
    8 3  4  5 2 7    0 .
    8 3  4  6 2 7    0 .
    8 3  4  7 2 7    0 .
    8 3  4  8 2 7    0 .
    8 3  4  9 2 7    0 .
    end
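A quick check on a few rows copied from the -dataex- output above suggests that ID (together with V1013), not the pair IDHouse/A001A, is the person identifier: in household 1, two members share A001A == 5. This is a minimal sketch; the variable dup is an illustrative name.

```stata
* Rows copied from the -dataex- example above (household 1, month 5)
clear
input float(ID IDHouse) byte(A001A V1013)
1 1  1 5
2 1  5 5
3 1  5 5
4 1 10 5
end

* ID and V1013 together identify each person-month
* (isid stops with an error if they do not)
isid ID V1013

* IDHouse, A001A and V1013 do not: IDs 2 and 3 both have A001A == 5
duplicates tag IDHouse A001A V1013, generate(dup)
list ID IDHouse A001A V1013 dup
```

On the full data, running -isid ID V1013- before any matching is a cheap way to confirm the panel structure.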

    Essentially, I will work only with children (values 4, 5 and 6 of 'A001A') and their parents (values 1, 2 and 3 of 'A001A').
    My Stata code should therefore identify the children and parents within each household and create new variables giving each child the characteristics of their parents, such as gender, education, income and COVID19 infection ('A003', 'A005', 'WoInc' and 'covid19', respectively).

    Any suggestions?
    Many thanks for any assistance received.

  • #2
    I'm confused by your data. Why are there three parents instead of 2? Also, you refer to A001A ranging from 1 to 6, but in the data the values are 1, 2, 4, 5, and 10. So how do I correctly identify parents and children?

    And I'm not entirely sure what you want the end result to look like. I think you want a single observation for each child, containing that child's original personal information plus additional variables showing the gender, education, income and covid19 status of each of that child's parents. Is that correct?



    • #3
      I'm not sure I understand you correctly, but I would split the data into a children data set and a parents data set and then merge them. The code is like this:
      Code:
      use  "your original data set",clear
      
      keep if A001A>=4&A001A<=6    //to create children data set
      rename * *_c
      rename IDHouse_c IDHouse
      rename V1013_c V1013
      
      save "children data set",replace
      
      use  "your original data set",clear
      
      keep if A001A>=1&A001A<=3     //to create parent data set
      rename * *_p
      rename IDHouse_p IDHouse
      rename V1013_p V1013
      
      save "parent data set",replace
      
      use "children data set",clear
      merge m:m IDHouse V1013 using "parent data set"
      sort ID_c V1013    //to make it convenient when checking whether the information from parent data set is completely merged 
      drop if _merge==2
      drop _merge
      
      save "a new data set"
      I don't know whether you want the child's time variable (V1013) to correspond to the parent's. Also, -merge m:m- can give wrong results, so we should be careful.

      I look forward to other, better solutions.
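Before any -merge-, it can help to check whether the key is unique in the using file. The sketch below, on rows copied from the example (month 5 only), shows that the parent file is not unique on IDHouse V1013, which is why -merge m:m- yields more child rows than there should be here. The scalar name rc_parent_key is illustrative.

```stata
* Rows copied from the -dataex- example (households 1 and 3, month 5)
clear
input float(ID IDHouse) byte(A001A V1013 A005)
1 1 1 5 5
2 1 5 5 5
6 3 1 5 2
7 3 2 5 5
8 3 4 5 7
end

* Keep the parents only, as in the "parent data set" above
keep if inrange(A001A, 1, 3)

* A -merge m:1- on IDHouse V1013 would require this key to be unique.
* It is not: household 3 has a head (A001A == 1) and a partner (A001A == 2)
capture isid IDHouse V1013
scalar rc_parent_key = _rc
display "isid return code: " scalar(rc_parent_key)
duplicates report IDHouse V1013
```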



      • #4
        Hello all,

        Many thanks for your replies.
        I apologise for the confusion. The dataset is quite complex.

        1. Three parents instead of 2.
        A001A = 1 for the Head of household
        A001A = 2 for the Partner of Head with different gender (heterosexual marriage)
        A001A = 3 for the Partner of Head with the same gender (homosexual marriage)

        2. Range of A001A
        In the dataset, A001A ranges between 1 and 19, because we have different persons inside the household (Son/Daughter, Household servant, etc.)

        3. How do I correctly identify parents and children?
        Parents are the heads of the household and their partners (A001A = 1,2 and 3);
        Children are the children of the head and his/her partner (A001A = 4), the children of the head only (A001A = 5), and the children of the partner only (A001A = 6).

        4. Result to look like
        Yes, you are right. I already have all this information for all household members, each on its own line. Now I need to create, for the children only, new variables containing the characteristics of their parents (gender, education, income and covid19 status).
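The role definitions above translate directly into indicator variables. The sketch below, on rows copied from the example (month 5), also records which parent figure each child is linked to under these definitions: a child with A001A == 5 is a child of the head only, and one with A001A == 6 of the partner only. The names is_parent, is_child, link_head and link_partner are illustrative.

```stata
* Rows copied from the -dataex- example (month 5)
clear
input float(ID IDHouse) byte(A001A V1013)
1 1  1 5
2 1  5 5
3 1  5 5
4 1 10 5
6 3  1 5
7 3  2 5
8 3  4 5
end

* Parents: head (1) and partner of different (2) or same (3) sex
generate byte is_parent = inrange(A001A, 1, 3)
* Children: of head and partner (4), of head only (5), of partner only (6)
generate byte is_child = inrange(A001A, 4, 6)

* Which parent figure(s) a child belongs to; 0 for everybody else
generate byte link_head    = inlist(A001A, 4, 5)
generate byte link_partner = inlist(A001A, 4, 6)

list ID IDHouse A001A is_parent is_child link_head link_partner, sepby(IDHouse)
```

If only a child's own parent figures should count, these flags could be used to set the partner's (or head's) values to missing for children with link_partner == 0 (or link_head == 0).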



        @Vicent Li
        Your code has duplicated the observations of children from 'IDHouse'=3. Note that in the original data we have 19 observations of children (14 for 'IDHouse'=1 and 5 for 'IDHouse'=3), but after the merge we have 24 (14 for 'IDHouse'=1 and 10 for 'IDHouse'=3). Household 3 has both a head and a partner, so each child observation there is repeated once per parent.
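One way to avoid both -merge m:m- and the duplication is to make the parent file unique on IDHouse V1013 first, by reshaping it wide with one set of columns per A001A role, and then use -merge m:1-. Below is a sketch of that approach on rows copied from the example; resulting names such as A0051 (education of A001A == 1) simply come from appending the role code to the variable name.

```stata
* Rows copied from the -dataex- example (months 5 and 7)
clear
input float(ID IDHouse) byte(A001A V1013 A003 A005) float(WoInc covid19)
1 1 1 5 2 5 1600 .
2 1 5 5 2 5  800 .
3 1 5 5 2 5    0 .
6 3 1 5 1 2    0 .
7 3 2 5 2 5 1000 .
8 3 4 5 2 7    0 .
6 3 1 7 1 2    0 .
7 3 2 7 2 5 1000 1
8 3 4 7 2 7    0 .
end
tempfile full parents
save `full'

* Parent file: one row per household-month, one column set per A001A role.
* ID is dropped because it varies within household-month.
keep if inrange(A001A, 1, 3)
keep IDHouse V1013 A001A A003 A005 WoInc covid19
reshape wide A003 A005 WoInc covid19, i(IDHouse V1013) j(A001A)
save `parents'

* Child file, then a many-to-one merge: each child row gets exactly one match
use `full', clear
keep if inrange(A001A, 4, 6)
merge m:1 IDHouse V1013 using `parents', keep(master match) nogenerate
sort ID V1013
list ID IDHouse V1013 A0051 A0052 WoInc1 WoInc2
```

-reshape wide- itself stops with an error if a household-month has two people with the same role, which doubles as a data check.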

        Best Regards



        • #5
          Hello everyone,

          I think I got it.

          Code:
          // Matching characteristics of parents with children
          by IDHouse V1013, sort: egen Head_sex = total(cond(A001A == 1, A003, .)) // Sex of Head
          
          by IDHouse V1013, sort: egen Head_edu = total(cond(A001A == 1, A005, .)) // Education Head
          by IDHouse V1013, sort: egen ParDS_edu = total(cond(A001A == 2, A005, .)) // Education Partner of Head (different Sex)
          by IDHouse V1013, sort: egen ParSS_edu = total(cond(A001A == 3, A005, .)) // Education Partner of Head (same Sex)
          
          by IDHouse V1013, sort: egen Head_covid = total(cond(A001A == 1, covid19, .)) // Head had COVID19
          by IDHouse V1013, sort: egen ParDS_covid = total(cond(A001A == 2, covid19, .)) // Partner of Head (different Sex) had COVID19
          by IDHouse V1013, sort: egen ParSS_covid = total(cond(A001A == 3, covid19, .)) // Partner of Head (same Sex) had COVID19
          by IDHouse V1013, sort: egen MemHH_covid = total(cond(A001A>=7, covid19, .)) // Any other household member had COVID19
          replace MemHH_covid=1 if MemHH_covid>=1 // Transform into a dummy
          But I am not satisfied with the code. Any ideas on how to use foreach or forvalues here?
          I look forward to other, better solutions.
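One detail of this approach worth knowing: when nobody in a household-month has the role in question, -egen total()- returns 0, not missing. A child in a household with no partner of the head therefore gets ParDS_edu == 0. The -missing- option of -egen total()- returns missing instead. A sketch on rows copied from the example, where household 1 has no A001A == 2; ParDS_edu_m is an illustrative name.

```stata
* Rows copied from the -dataex- example (month 5); household 1 has no
* partner of the head (A001A == 2), household 3 does
clear
input float(ID IDHouse) byte(A001A V1013 A005)
1 1 1 5 5
2 1 5 5 5
3 1 5 5 5
6 3 1 5 2
7 3 2 5 5
8 3 4 5 7
end

* As in #5: with no A001A == 2 row in the group, total() gives 0
by IDHouse V1013, sort: egen ParDS_edu = total(cond(A001A == 2, A005, .))

* With -missing-, a group whose values are all missing gets missing instead
by IDHouse V1013, sort: egen ParDS_edu_m = total(cond(A001A == 2, A005, .)), missing

list IDHouse A001A A005 ParDS_edu ParDS_edu_m, sepby(IDHouse)
```

For the covid19 variables, 0 may well be the intended value, so the option is probably most relevant for education, income and similar variables.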



          • #6
            Why would you want to replace these with -foreach- or -forvalues-? You came up with great, efficient, transparent, elegant code here, and you want to replace it with mediocre code that will run slower and be harder to understand? When in Stata you have a choice between -by- and -foreach/forvalues-, always go with -by-. And when you don't have the choice, if you are working with a large data set and what you want to do cannot be done with -by-, think about using the user-written -runby- command (by Robert Picard and me, available from SSC), which is like -by- for blocks of code instead of single commands.

            Regarding #3, fortunately Tharcisio Leone recognized that what the -merge m:m- command produced was data salad, not usable results. -merge m:m- is a trap for the unwary. It puts data sets together in a way that is almost never what is wanted. (I have been using Stata daily since 1994 and in all that time I have only once encountered a situation where what -merge m:m- does would be useful; even then, there was a better way.) It produces results that look like a successful -merge- if you don't look too closely, but careful inspection will almost always reveal that the match-ups it makes are the wrong ones. Frankly, the following rule is close to exceptionless:
            If you are thinking of using -merge m:m- either 1) you don't understand your data structure correctly, or 2) you really need -joinby- or -cross-, not -merge-.
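For the child/parent case in this thread, -joinby- pairs each child with every parent in the same household and month, giving one row per child-parent pair. Unlike the -merge m:m- result, every pairing here is a genuine child-parent pair, so the repeated child rows are intentional. A sketch on rows copied from the example; the _p suffixes are illustrative.

```stata
* Rows copied from the -dataex- example (month 5)
clear
input float(ID IDHouse) byte(A001A V1013 A005)
1 1 1 5 5
2 1 5 5 5
3 1 5 5 5
6 3 1 5 2
7 3 2 5 5
8 3 4 5 7
end
tempfile full parents
save `full'

* Parent file with suffixed names so they do not collide with the child's
keep if inrange(A001A, 1, 3)
rename (ID A001A A005) (ID_p A001A_p A005_p)
save `parents'

* Child file; -joinby- forms every child x parent pair within IDHouse V1013
use `full', clear
keep if inrange(A001A, 4, 6)
joinby IDHouse V1013 using `parents'
sort ID A001A_p
list ID A001A ID_p A001A_p A005_p, sepby(ID)
```

If one row per child is wanted afterwards, the long result could be reshaped wide on A001A_p within ID and V1013.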



            • #7
              Dear Clyde,

              thank you very much for the valuable feedback.
              I have 8 other variables that I want to match with the children. Since for each variable I need the match for A001A==1, A001A==2 and A001A==3, I would have to write 24 lines of code using egen XX = total(cond(...)).

              A loop would therefore let me run the same command for several variables at once without writing separate lines of code.
              I do not have as much experience with Stata as you, but in my opinion most loops are easy to check, and they keep a do-file concise and clean by minimizing repetitive commands. They are also safer than repeating code.

              I did not know about your user-written -runby- command, but I will try it in my specification.
              Last edited by Tharcisio Leone; 29 Jan 2021, 07:06.



              • #8
                OK, I misunderstood your purpose and intent with regard to loops. I did not realize you were interested in doing the same thing with other variables. Here's how to loop over the variables of interest. You will need to expand the list of macros at the top of the code to cover each variable and provide the appropriate suffix for the result variable.

                Code:
                local A003 sex
                local A005 edu
                local covid19 covid
                
                foreach v of varlist A003 A005 covid19 {
                    by IDHouse V1013, sort: egen Head_``v'' = total(cond(A001A == 1, `v', .))
                    by IDHouse V1013, sort: egen ParDS_``v'' = total(cond(A001A == 2, `v', .))
                    by IDHouse V1013, sort: egen ParSS_``v'' = total(cond(A001A == 3, `v', .))
                }
                
                
                by IDHouse V1013, sort: egen MemHH_covid = total(cond(A001A>=7, covid19, .)) // Any other household member had COVID19
                replace MemHH_covid=1 if MemHH_covid>=1 // Transform into a dummy
                The code for MemHH is not part of the loop because it is idiosyncratic and does not follow the pattern.
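If covid19 is coded 1 for a positive test and missing otherwise, as in the example data, the two-step MemHH_covid code can be collapsed into a single -egen max()- of a 0/1 condition. The sketch uses household 1 in month 8 from the example plus one hypothetical household (IDHouse 99, with an A001A == 7 member who tested positive) that is not in the data, included only so that both outcomes appear. MemHH_covid_1 is an illustrative name.

```stata
* Household 1, month 8, copied from the -dataex- example, plus a
* HYPOTHETICAL household 99 (not in the data) to show the positive case
clear
input float(ID IDHouse) byte(A001A V1013) float(covid19)
 1  1  1 8 1
 2  1  5 8 .
 3  1  5 8 1
 4  1 10 8 .
91 99  1 8 .
92 99  7 8 1
end

* Two-step version from the post above
by IDHouse V1013, sort: egen MemHH_covid = total(cond(A001A >= 7, covid19, .))
replace MemHH_covid = 1 if MemHH_covid >= 1

* One-step version: max() of a 0/1 condition is 1 if any row in the group
* meets it. A missing covid19 makes (covid19 == 1) false, so it counts as 0
by IDHouse V1013, sort: egen MemHH_covid_1 = max(A001A >= 7 & covid19 == 1)

list IDHouse A001A covid19 MemHH_covid MemHH_covid_1, sepby(IDHouse)
```

The one-step form also stays correct if covid19 ever takes values other than 1 and missing (e.g. a code for a negative test), which the total() version would count as positive.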

                It is, in principle, possible to further "loopify" this code by looping over the values of A001A (1 to 3) within the loop over variables. But I recommend against it because it will make the code very opaque, and, in general, I tend to avoid making loops that will only iterate 2 or 3 times: it's just simpler to write the separate commands.

                Note: no sample data provided, so code is untested. Beware of typos or other errors.
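Since the code is untested, a couple of checks on the data before running it may catch problems that total(cond()) would otherwise hide: two people with the same role in one household-month would have their values silently added, and a child in a household-month with no head would get 0 rather than missing. A sketch on rows copied from the example; has_head is an illustrative name.

```stata
* Rows copied from the -dataex- example (month 5)
clear
input float(ID IDHouse) byte(A001A V1013)
1 1 1 5
2 1 5 5
3 1 5 5
6 3 1 5
7 3 2 5
8 3 4 5
end

* Check 1: no role 1-3 appears twice in a household-month; otherwise
* total(cond(A001A == k, ...)) would add two people's values together
preserve
keep if inrange(A001A, 1, 3)
isid IDHouse V1013 A001A   // stops with an error if the check fails
restore

* Check 2: count children in household-months without a head, for whom
* the Head_* variables would be 0 rather than missing
by IDHouse V1013, sort: egen byte has_head = max(A001A == 1)
count if inrange(A001A, 4, 6) & !has_head
```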
