Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • problem trying to match ID from different data sets.

    Hi all,

    I need to link children to their parents through an id number (called household_number in the data). I have a data set for children an also for parents.
    I added some screenshots which shows the data I use. My goal is that I can link every child to his/her parents, and to delete id's with no children. Is there anyone who can help me out?
    Thanks in advance!
    Attached Files

  • #2
    Sorry for the large screenshots, but here is my data. I need to link Household_number to household_number. Because I want to match birthweight of the child with the other variables.
    For instance, household number 6 exists out of 4 persons. Now I want to link the child with household_number 6 to the rows of Household_number 6. So I can link the birthweight with parents education level etc.

    input byte(Sex Race Age Relation_dad Relation_mom) long Household_number byte(Health_status Education_level Total_earnings_year) long household_number int birth_weight
    2 2 79 . . 1 3 9 . 4 3743
    2 2 55 . . 4 3 14 . 6 4310
    1 2 49 . . 4 3 15 7 8 3856
    1 2 16 1 1 4 3 10 . 10 3431
    1 2 37 . . 6 3 21 11 13 3714
    2 2 36 . . 6 2 18 11 18 2835
    1 2 9 1 1 6 1 3 . 30 3941
    1 2 7 1 1 6 1 0 . 38 3459
    1 2 29 . . 8 2 16 8 45 3856
    2 2 34 . . 8 1 15 8 47 2694
    1 2 3 1 1 8 1 96 . 50 2637
    1 4 75 . . 9 5 10 . 52 3317
    1 2 39 . . 10 2 18 11 57 3743
    2 2 33 . . 10 2 15 . 58 3346
    2 2 3 1 1 10 2 96 . 76 2949
    1 2 0 1 1 10 2 96 . 92 3516
    2 3 54 . . 13 2 15 11 99 4083
    1 3 10 . 1 13 1 5 . 100 3204
    1 2 85 . . 15 3 14 . 105 3261
    2 2 85 . . 15 2 14 . 119 4111
    1 3 85 . . 17 4 18 . 124 3431
    2 3 85 . . 17 4 18 . 125 2949
    2 3 60 1 1 17 3 18 99 127 3516
    1 2 31 . . 18 1 19 10 128 3204
    2 2 28 . . 18 1 19 8 132 2864
    2 2 1 1 1 18 1 96 . 151 4083
    1 2 65 . . 19 2 21 11 154 1446
    2 2 67 . . 19 2 19 . 157 3232
    1 2 68 . . 20 4 16 . 162 3346
    2 2 66 . . 20 2 18 11 166 3629
    1 3 76 . . 21 4 14 . 171 2523
    2 2 67 . . 22 4 14 . 204 2977
    1 2 69 . . 23 1 18 1 208 3884
    2 2 73 . . 23 1 19 . 212 2949
    1 2 65 . . 24 3 14 2 215 3487
    1 1 54 . . 25 3 10 97 217 3090
    2 2 53 . . 25 3 10 97 219 3119
    1 2 30 . . 27 2 16 9 222 3346
    2 2 32 . . 27 2 16 9 238 2041
    1 2 14 2 1 27 2 8 . 245 3402
    1 2 9 2 1 27 2 5 . 246 2694
    1 2 40 . . 29 1 15 10 250 3402
    1 2 46 . . 30 1 15 7 256 3232
    2 2 44 . . 30 2 16 99 260 2977
    1 2 15 1 1 30 1 9 . 268 2977
    2 2 43 . . 31 2 15 . 270 3998
    1 2 50 . . 31 2 14 . 275 3799
    2 1 20 . . 32 4 14 . 278 2864
    1 1 26 . . 32 2 14 7 283 2608
    2 2 52 . . 33 2 15 4 284 2495
    1 2 55 . . 33 3 15 8 289 3289
    2 3 38 . . 38 1 19 9 302 3232
    1 3 14 . 1 38 1 8 . 305 3629
    2 4 48 . . 40 1 19 8 314 3261
    2 1 61 . . 41 4 6 . 315 3629
    2 1 84 . . 43 4 6 . 322 3516
    2 2 38 . . 45 1 15 97 323 3629
    1 2 40 . . 45 2 15 97 328 2495
    1 2 18 1 1 45 1 11 97 334 3714

    Comment


    • #3
      Thanks for switching over to -dataex- in #2. Unfortunately, you only posted one of the data sets, and you don't even say which one it is, though I guess it's the children because there is a birthweight variable.

      Nevertheless, I think I see what your situation is. I believe you have the following situation: you have a household data set in which each household has a single observation, and you have another data set with information about children that may contain several children in each household, or perhaps just one, or perhaps none at all.

      Code:
      use household_data_set, clear
      
      merge 1:m household_number using children_data_set, keep(match) nogenerate
      will combine the data set, pairing up each child observation with the corresponding household observation. Also, because of the -keep(match)- observation, any households with no children will be removed from the data set.

      Correction: I see that I read too quickly and glossed over the distinction between household_number and Household_number. (I noticed it but thought it was just a typo.) But now I see in your -dataex- example that those are two different variables. So the above will be incorrect.

      Also, it appears that you actually have three data sets. One is just a list of households, another is a data set of parents, and the last is a data set of children. Any given household can have more than one parent, and more than one child. And you want to match up the children with a given household_number to the parents having that same value as their Household_number. Try this instead:

      Code:
      use child_data_set, clear
      clonevar link = household_number
      tempfile holding
      save `holding'
      use parent_data_set, clear
      clonevar link = Household_number
      joinby link using `holding' , unmatched(using)
      Added: You may be different, but I think I would go insane working with data that has one id called household_number and another called Household_number. It would be very hard to keep straight which of those was which. So I would change those names to something that would tell me.
      Last edited by Clyde Schechter; 07 Apr 2021, 15:27.

      Comment


      • #4
        Dear Mr Schechter,

        Thank you for responding. I have changed the variable names of household_number to HHX_Family and HHX_child. My goal is indeed to match the child through HHX with the family. This because I have to match the birthweight of the child with the socioeconomic status of the parents. I have added both datasets below. I still could not find the right way to match the HHX of the different data sets.
        In my last post, I already put the dataset of the child in the dataset of the family by myself. But I think it is better to merge them through Stata.
        The one below is the dataset of the family as a whole, including the child. the last data set is the data set of the child alone with its birthweight.
        So my final question is; if it is possible to merge those sets and link the HHX of both sets with each other, and drop the ones that don't match.

        Thank you in advance!

        With kind regards,
        Jord

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte(FPX Sex Race Age Relation_dad Relation_mom) long HHX_Family byte(Health_status Education_level Total_earnings_year)
        1 2 2 55 . .  4 3 14  .
        2 1 2 49 . .  4 3 15  7
        3 1 2 16 1 1  4 3 10  .
        1 1 2 37 . .  6 3 21 11
        2 2 2 36 . .  6 2 18 11
        3 1 2  9 1 1  6 1  3  .
        4 1 2  7 1 1  6 1  0  .
        1 1 2 29 . .  8 2 16  8
        2 2 2 34 . .  8 1 15  8
        3 1 2  3 1 1  8 1 96  .
        1 1 2 39 . . 10 2 18 11
        2 2 2 33 . . 10 2 15  .
        3 2 2  3 1 1 10 2 96  .
        4 1 2  0 1 1 10 2 96  .
        1 2 3 54 . . 13 2 15 11
        2 1 3 10 . 1 13 1  5  .
        3 2 3 60 1 1 17 3 18 99
        1 1 2 31 . . 18 1 19 10
        2 2 2 28 . . 18 1 19  8
        3 2 2  1 1 1 18 1 96  .
        1 1 2 65 . . 19 2 21 11
        1 1 2 65 . . 24 3 14  2
        1 1 1 54 . . 25 3 10 97
        2 2 2 53 . . 25 3 10 97
        1 1 2 30 . . 27 2 16  9
        2 2 2 32 . . 27 2 16  9
        3 1 2 14 2 1 27 2  8  .
        4 1 2  9 2 1 27 2  5  .
        1 1 2 40 . . 29 1 15 10
        1 1 2 46 . . 30 1 15  7
        2 2 2 44 . . 30 2 16 99
        3 1 2 15 1 1 30 1  9  .
        1 2 2 43 . . 31 2 15  .
        2 1 2 50 . . 31 2 14  .
        1 2 1 20 . . 32 4 14  .
        2 1 1 26 . . 32 2 14  7
        1 2 2 52 . . 33 2 15  4
        2 1 2 55 . . 33 3 15  8
        1 2 3 38 . . 38 1 19  9
        2 1 3 14 . 1 38 1  8  .
        1 2 4 48 . . 40 1 19  8
        1 2 1 61 . . 41 4  6  .
        1 2 2 38 . . 45 1 15 97
        2 1 2 40 . . 45 2 15 97
        3 1 2 18 1 1 45 1 11 97
        4 2 2 13 1 1 45 1  7  .
        5 1 2  9 1 1 45 2  2  .
        1 1 2 61 . 1 46 4 14  .
        1 2 2 59 . . 47 4 14  .
        2 2 2 17 . 1 47 4 10  .
        3 1 2 17 . 1 47 4 10  .
        1 2 2 40 . . 50 2 15  .
        2 1 2 46 . . 50 2 16 11
        3 1 2 13 1 1 50 2  7  .
        4 1 2 10 1 1 50 2  4  .
        5 1 2  5 1 1 50 2  0  .
        1 1 2 33 . . 51 1 15 97
        2 2 2 38 . . 51 1 18 97
        1 1 2 43 . . 52 3 19 97
        2 2 1 42 . . 52 1 19 97
        3 2 2  8 1 1 52 1  2  .
        4 2 2  5 1 1 52 1  0  .
        1 2 2 27 . . 57 3 17  9
        2 1 2 33 . . 57 1 14 10
        3 1 2  5 . 1 57 1  0  .
        1 1 2 26 . . 58 1 14  5
        2 2 5 24 . . 58 1 11  .
        3 2 5  4 1 1 58 1 96  .
        4 2 5  1 1 1 58 1 96  .
        1 2 3 65 . . 59 3 14 99
        2 1 3 33 . 1 59 3 15 99
        1 1 2 60 . . 61 3 16 99
        2 2 2 57 . . 61 4 14  .
        1 2 3 49 . . 66 3 15  9
        1 2 2 20 . . 67 1 15  3
        1 1 2 49 . . 68 2 19  8
        1 2 4 47 . . 72 1 18  9
        2 2 4 26 . 1 72 1 17 10
        3 1 4 23 . 1 72 1 16  9
        4 1 4 53 . . 72 1 14  7
        1 2 2 63 . . 75 2 15  8
        1 2 4 28 . . 76 2 13  .
        2 1 2 28 . . 76 3 13 99
        3 1 2 11 2 1 76 1  5  .
        4 2 2  3 1 1 76 1 96  .
        5 1 2  2 1 1 76 1 96  .
        1 1 2 65 . . 77 5 15  .
        2 2 2 65 . . 77 3 14  .
        1 2 2 64 . . 82 1 14  .
        2 1 2 61 . . 82 2 14  7
        3 1 2 32 1 1 82 2 15  .
        1 2 2 27 . . 87 2 18  2
        1 2 2 25 . . 87 1 18  6
        1 2 1 57 . . 89 2  7  5
        2 1 2 45 . . 91 1 18 11
        1 2 3 52 . . 91 3 19 99
        1 2 2 49 . . 92 3 18  .
        2 1 2 56 . . 92 2 18 11
        3 1 2 14 1 1 92 1  7  .
        4 2 2 11 1 1 92 1  5  .
        end
        ----------------------------------------

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte FPX long HHX_Child int birth_weight
        3   4 3743
        3   6 4310
        3   8 3856
        4  10 3431
        2  13 3714
        3  18 2835
        3  30 3941
        2  38 3459
        5  45 3856
        2  47 2694
        5  50 2637
        3  52 3317
        3  57 3743
        4  58 3346
        3  76 2949
        3  92 3516
        2  99 4083
        4 100 3204
        3 105 3261
        5 119 4111
        4 124 3431
        5 125 2949
        3 127 3516
        4 128 3204
        3 132 2864
        3 151 4083
        3 154 1446
        5 157 3232
        5 162 3346
        4 166 3629
        4 171 2523
        3 204 2977
        3 208 3884
        3 212 2949
        3 215 3487
        4 217 3090
        2 219 3119
        4 222 3346
        5 238 2041
        3 245 3402
        3 246 2694
        4 250 3402
        6 256 3232
        3 260 2977
        2 268 2977
        4 270 3998
        4 275 3799
        3 278 2864
        4 283 2608
        2 284 2495
        4 289 3289
        4 302 3232
        2 305 3629
        2 314 3261
        6 315 3629
        2 322 3516
        4 323 3629
        5 328 2495
        4 334 3714
        3 337 3317
        7 349 3913
        4 364 3402
        3 368 3998
        4 387 3572
        2 392 3090
        3 401 3487
        5 411 3629
        4 417 3289
        3 421 4083
        3 428 1162
        7 442 4763
        3 445 3402
        4 451 3204
        3 471 3232
        3 475 4111
        4 479 3998
        2 480 3487
        3 486 2977
        2 492 2722
        2 502 2268
        5 503 3176
        3 507 1985
        4 515 4990
        3 521 1758
        4 526 3629
        4 538 3346
        3 549 3374
        3 550 3062
        3 568 3714
        8 570 3431
        4 573 3856
        5 575 1815
        4 576 2949
        3 577 4054
        3 579 4281
        3 592 3913
        3 602 3176
        3 605 2438
        2 614 2949
        3 622 4281
        end

        Comment


        • #5
          Code:
          use child_data, clear
          clonevar link = HHX_Child
          tempfile holding
          save `holding'
          
          use household_data, clear
          clonevar link = HHX_Family
          merge m:1 link using `holding', keep(match) nogenerate
          works with the example data. Now, I am a bit surprised that in your child data set there is only one child per household. That seems unusual, and is perhaps just a coincidence that holds in the example but not in the full child data set. If that is the case, the code shown will fail at the -merge- command. That would call for replacing the -merge- command with:

          Code:
          joinby link using `holding', unmatched(none)
          Note: If you are using version 16 and there really is only one child per household, this can be done a bit more cleanly using frames:

          Code:
          use household_data, clear
          frame create kids
          frame kids: use child_data
          frlink m:1 HHX_Family, frame(kids HHX_Child)
          keep if !missing(kids)


          Comment


          • #6
            Dear Mr Schechter,

            Thank you for responding. It works now! They indeed only interviewed 1 child per family for the child data. That is why there is only 1 child per family in the data of the child.
            I have another question now, regarding to the children in the family. I want to investigate if sex is also an independent variable that may influence birth weight. For this reason, I need to link the birth weight given in the child data to the right child in the family data (there may be more than 1 child in the family). For example, look at the family with hhx of 6. They have 2 children, of which one of them gives information about birthweight. Through the FPX (person number within family), we can link the right child to the child in the family. So, the 3th child here is the one which is interviewed for birth weight (FPX 3). Is there any possible way to drop the children that are not interviewed for their birth_weight? So drop the child with HHX of 6 and FPX 4?
            Thank you for responding in advance!

            With kind regards,

            Jord

            (below I added the data merged, like you did in the previous message).


            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input byte(FPX_FAMILY Sex Race Age Relation_dad Relation_mom) long HHX_Family byte(Health_status Education_level Total_earnings_year) long link byte FPX_CHILD long HHX_Child int birth_weight
            1 2 2 55 . .   4 3 14  .   4 3   4 3743
            2 1 2 49 . .   4 3 15  7   4 3   4 3743
            3 1 2 16 1 1   4 3 10  .   4 3   4 3743
            1 1 2 37 . .   6 3 21 11   6 3   6 4310
            2 2 2 36 . .   6 2 18 11   6 3   6 4310
            3 1 2  9 1 1   6 1  3  .   6 3   6 4310
            4 1 2  7 1 1   6 1  0  .   6 3   6 4310
            1 1 2 29 . .   8 2 16  8   8 3   8 3856
            2 2 2 34 . .   8 1 15  8   8 3   8 3856
            3 1 2  3 1 1   8 1 96  .   8 3   8 3856
            1 1 2 39 . .  10 2 18 11  10 4  10 3431
            2 2 2 33 . .  10 2 15  .  10 4  10 3431
            3 2 2  3 1 1  10 2 96  .  10 4  10 3431
            4 1 2  0 1 1  10 2 96  .  10 4  10 3431
            1 2 3 54 . .  13 2 15 11  13 2  13 3714
            2 1 3 10 . 1  13 1  5  .  13 2  13 3714
            1 1 2 31 . .  18 1 19 10  18 3  18 2835
            2 2 2 28 . .  18 1 19  8  18 3  18 2835
            3 2 2  1 1 1  18 1 96  .  18 3  18 2835
            1 1 2 46 . .  30 1 15  7  30 3  30 3941
            2 2 2 44 . .  30 2 16 99  30 3  30 3941
            3 1 2 15 1 1  30 1  9  .  30 3  30 3941
            1 2 3 38 . .  38 1 19  9  38 2  38 3459
            2 1 3 14 . 1  38 1  8  .  38 2  38 3459
            1 2 2 38 . .  45 1 15 97  45 5  45 3856
            2 1 2 40 . .  45 2 15 97  45 5  45 3856
            3 1 2 18 1 1  45 1 11 97  45 5  45 3856
            4 2 2 13 1 1  45 1  7  .  45 5  45 3856
            5 1 2  9 1 1  45 2  2  .  45 5  45 3856
            1 2 2 59 . .  47 4 14  .  47 2  47 2694
            2 2 2 17 . 1  47 4 10  .  47 2  47 2694
            3 1 2 17 . 1  47 4 10  .  47 2  47 2694
            1 2 2 40 . .  50 2 15  .  50 5  50 2637
            2 1 2 46 . .  50 2 16 11  50 5  50 2637
            3 1 2 13 1 1  50 2  7  .  50 5  50 2637
            4 1 2 10 1 1  50 2  4  .  50 5  50 2637
            5 1 2  5 1 1  50 2  0  .  50 5  50 2637
            1 1 2 43 . .  52 3 19 97  52 3  52 3317
            2 2 1 42 . .  52 1 19 97  52 3  52 3317
            3 2 2  8 1 1  52 1  2  .  52 3  52 3317
            4 2 2  5 1 1  52 1  0  .  52 3  52 3317
            1 2 2 27 . .  57 3 17  9  57 3  57 3743
            2 1 2 33 . .  57 1 14 10  57 3  57 3743
            3 1 2  5 . 1  57 1  0  .  57 3  57 3743
            1 1 2 26 . .  58 1 14  5  58 4  58 3346
            2 2 5 24 . .  58 1 11  .  58 4  58 3346
            3 2 5  4 1 1  58 1 96  .  58 4  58 3346
            4 2 5  1 1 1  58 1 96  .  58 4  58 3346
            1 2 4 28 . .  76 2 13  .  76 3  76 2949
            2 1 2 28 . .  76 3 13 99  76 3  76 2949
            3 1 2 11 2 1  76 1  5  .  76 3  76 2949
            4 2 2  3 1 1  76 1 96  .  76 3  76 2949
            5 1 2  2 1 1  76 1 96  .  76 3  76 2949
            1 2 2 49 . .  92 3 18  .  92 3  92 3516
            2 1 2 56 . .  92 2 18 11  92 3  92 3516
            3 1 2 14 1 1  92 1  7  .  92 3  92 3516
            4 2 2 11 1 1  92 1  5  .  92 3  92 3516
            1 2 2 30 . .  99 3 14  6  99 2  99 4083
            2 1 2  7 . 1  99 2  0  .  99 2  99 4083
            1 2 2 44 . . 100 2 19  1 100 4 100 3204
            2 1 4 44 . . 100 2 18 11 100 4 100 3204
            3 2 2 13 1 1 100 2  7  . 100 4 100 3204
            4 2 2 11 1 1 100 2  4  . 100 4 100 3204
            5 1 2  5 1 1 100 2  0  . 100 4 100 3204
            1 1 2 46 . . 105 1 18 11 105 3 105 3261
            2 2 2 44 . . 105 1 17  8 105 3 105 3261
            3 1 2 14 1 1 105 2  8  . 105 3 105 3261
            1 2 1 37 . . 119 4 12  . 119 5 119 4111
            2 1 1 41 . . 119 2  6  4 119 5 119 4111
            3 1 1 17 1 1 119 5 11  . 119 5 119 4111
            4 2 1 15 1 1 119 3  9  . 119 5 119 4111
            5 1 1  9 1 1 119 3  3  . 119 5 119 4111
            6 2 1  3 1 1 119 3 96  . 119 5 119 4111
            7 2 1  0 1 1 119 3 96  . 119 5 119 4111
            1 2 1 38 . . 124 3 11  . 124 4 124 3431
            2 1 1 47 . . 124 3 14  7 124 4 124 3431
            3 2 1 18 1 1 124 2 12  3 124 4 124 3431
            4 2 1 17 1 1 124 2 11  . 124 4 124 3431
            5 2 1 12 1 1 124 2  6  . 124 4 124 3431
            1 2 1 54 . . 125 2  4  3 125 5 125 2949
            2 1 1 48 . . 125 1  6  4 125 5 125 2949
            3 2 1 34 . 1 125 1 10  1 125 5 125 2949
            4 2 1 14 . 1 125 1  7  . 125 5 125 2949
            5 2 1  3 . 1 125 1 96  . 125 5 125 2949
            6 2 1  2 . 1 125 1 96  . 125 5 125 2949
            1 2 2 28 . . 127 2 16  8 127 3 127 3516
            2 1 2 31 . . 127 2 14  9 127 3 127 3516
            3 2 2  6 1 1 127 2  0  . 127 3 127 3516
            4 1 2  3 1 1 127 2 96  . 127 3 127 3516
            5 2 2  0 1 1 127 2 96  . 127 3 127 3516
            1 2 2 33 . . 128 2 14  4 128 4 128 3204
            2 1 2 28 . . 128 3 14  6 128 4 128 3204
            3 1 2 14 . 1 128 1  6  . 128 4 128 3204
            4 2 2 12 . 1 128 1  6  . 128 4 128 3204
            1 2 2 50 . . 132 4 18  5 132 3 132 2864
            2 1 2 21 . 1 132 3 15  1 132 3 132 2864
            3 1 2 12 . 1 132 3  7  . 132 3 132 2864
            1 1 1 40 . . 151 3 13  5 151 3 151 4083
            2 2 1 37 . . 151 3 13  2 151 3 151 4083
            3 1 1 16 1 1 151 1  8  . 151 3 151 4083
            end

            Comment


            • #7
              [code]
              use child_data, clear
              clonevar link = HHX_Child
              tempfile holding
              save `holding'

              use household_data, clear
              clonevar link = HHX_household
              merge m:1 link FPX using `holding', keep(match) nogenerate
              [code]
              It's a very minor modification of the earlier code. The change is shown in red.

              Comment


              • #8
                Thank you so much! This is what I've been looking for. I have only 1 small (or big) question left. Do you know if it is possible to also keep the education level of the parents within the data? I have to link the parent's education level, and some other variables like income, with the birth weight of their child. The data dropped the education level of the parents. So for now, I've the data that matches the right child of the family data with the right child of the child data. So i now can already see what their relationship is to their parents (biological, adopted, etc). But my final step will be to also check the level of their parents education. Is this possible in Stata?
                Thank you for responding in advance!

                with kind regards,
                Jord

                Comment


                • #9
                  Yes, it can be done. But before I try to code that, I need you to clarify how to identify the parents in the household data set. From my inspection of your example data in #4, it appears that the parents are the people for whom the variables Relation_dad and Relation_mom both have missing values. But I'm not fully confident I actually understand those two variables. For the most part they are coded as 1/missing value. But then in household HHX_Family 27, I see that Relation_dad is coded as 2. And in HHX_Familyh 76, some of the Relation_dad's are 1, but one of them is 2. What does this all mean? Maybe it's irrelevant to the current question, but I don't want to use these variables until I understand exactly how they work.

                  Also, in many of the households there appear to be two parents. I'm guessing you want separate variables for mom and da for the education and earnings. Fair enough. But how do I tell which is which? There's a sex variable coded 1/2, but there is nothing indicating which is male and which is female.

                  Comment


                  • #10
                    Thank you for answering. Sorry for the confusion. For the sex variable, 1 indicates males and 2 indicates females. Also, the variables Relation_dad and Relation_mom only applies to the children who are living with those parents. So, for example, if it states 1, his or her mother/dad is biological or adoptive, 2 means step, 3 in-law and 9 means other/unknown. It is true that the values misses for parents because it applies only to persons living with mother/dad in household.
                    My main goal is to link the children which are interviewed as in the second datapart to their parents in the familyHHX. So, to drop the children who are not interviewed about birth weight. I have to check the average birthweight for children with parents with highest education, middle education and lowest education. For education there are also different kind of numbers, which ranges from 00 to 21 ( from never attended school to Doctoral degree) and 96 to 99 (not any information). Further I have to come up with an econometric model with birth weight as dependent variable, and the other variables as independent. But for doing that, I have to link the children to the parents in the right way as you stated. Thank you for helping me already.

                    With kind regards,
                    Jord

                    Comment


                    • #11
                      OK, so this should do it:

                      Code:
                      use Child_data, clear
                      clonevar link = HHX_Child
                      tempfile holding
                      save `holding'
                      
                      use Household_data, clear
                      clonevar link = HHX_Family
                      merge m:1 link using `holding', keep(match) nogenerate keepusing(birth_weight HHX_Child)
                      drop if missing(birth_weight)
                      
                      //  NOW REDUCE DATA TO ONE OBS PER CHILD, BRINGING INTO THAT OBS THE PARENTS'
                      //  EDUCATION AND INCOME
                      foreach v of varlist Education_level Total_earnings_year {
                          by HHX_Family, sort: egen mom_`v' = max(cond(Sex == 2 & missing(Relation_dad) & missing(Relation_mom), ///
                              `v', .))
                          by HHX_Family: egen dad_`v' = max(cond(Sex == 1 & missing(Relation_dad) & missing(Relation_mom), ///
                              `v', .))
                      }
                      keep if !missing(Relation_dad) & !missing(Relation_mom)
                      Note: This code will only function properly in household structures that contain at most one mother and at most one father. If the data can contain households with multiple fathers or multiple mothers, then you must clarify how you would want to handle that situation.
                      Last edited by Clyde Schechter; 13 Apr 2021, 11:52.

                      Comment


                      • #12
                        Thank you for replying. Stata keeps saying "invalid file specification" when I am using
                        "merge m:1 link using `holding', keep(match) nogenerate keepusing(birth_weight HHX_Child)" Do you know what might cause this? thank you in advance with kind regards, Jord
                        Last edited by Jord Sluis; 13 Apr 2021, 13:01.

                        Comment


                        • #13
                          Sorry, I did not run my dofile at once. My bad. The Sex that is given now, is it of the child or mother/father? and for education level, is it the education level of mother or father? or is it the highest one in the family ?
                          Thank you in advance.

                          Jord

                          Comment


                          • #14
                            The dataset now looks like as given below. But the birthweight of children that do not have a HHX_Child are also given the same birth weight as the child with HHX_Child in the family. I can now not calculate the average birth weight of children with parents with highest level of education and parents with lower level of education, since for families with more children, have a higher total birthweight than families with less children. So my question now is how I can show just the child with the level of birth weight, instead of that child with his or her siblings.
                            thank you in advance.

                            with kind regards,
                            Jord

                            Comment


                            • #15
                              Sorry. I got the linkage to the children wrong. This fixes it:

                              Code:
                              use child_data, clear
                              clonevar link = HHX_Child
                              tempfile holding
                              save `holding'
                              
                              use household_data, clear
                              clonevar link = HHX_Family
                              merge m:1 link FPX using `holding', keep(match) nogenerate keepusing(birth_weight HHX_Child)
                              drop if missing(birth_weight)
                              
                              //  NOW REDUCE DATA TO ONE OBS PER CHILD, BRINGING INTO THAT OBS THE PARENTS'
                              //  EDUCATION AND INCOME
                              foreach v of varlist Education_level Total_earnings_year {
                                  by HHX_Family, sort: egen mom_`v' = max(cond(Sex == 2 & missing(Relation_dad) & missing(Relation_mom), ///
                                      `v', .))
                                  by HHX_Family: egen dad_`v' = max(cond(Sex == 1 & missing(Relation_dad) & missing(Relation_mom), ///
                                      `v', .))
                              }
                              keep if !missing(Relation_dad) & !missing(Relation_mom)
                              Now the results contain only the child who was interviewed for birthweight. The variables Education_level and Total_earnings year refer to the education and income of the child. The parental education and income variables are at the end of the data set, with prefixes mom_ and dad_.

                              Comment

                              Working...
                              X