  • Matching parents to children

    Hi everyone! I am working with panel data in which both parents and their (adult) children appear in the Id column. I would like to match those children with their parents, using the father and mother identification numbers (variables father and mother). In my example, I would like each line to look like:

    Id (children) year sex birth partner FatherId Fatheryear Fathergender Fbirth Fpartner MotherId .... Mpartner

    where Id (children) covers all the Ids for which I have information about both parents (i.e. father != -5 & mother != -5).

    I think a loop might work, but I don't know how to approach it.

    One major problem is that I don't have the same amount of information for all my Ids: for example, ID=1 has 7 years of information, ID=2 has 5 years and ID=3 has 6 years.

    Is it possible to do this in that case? If not, would it be possible if I had the same number of observations for each Id?


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long(id year) byte gender int birth long(partner father mother)
    1   1994 1 1950       2  -5  -5
    1   1995 1 1950       2  -5  -5
    1   1996 1 1950       2  -5  -5
    1   1997 1 1950       2  -5  -5
    1   1998 1 1950       2  -5  -5
    1   1999 1 1950       2  -5  -5
    1   2000 1 1950       2  -5  -5
    2   1994 2 1951       1  -5  -5
    2   1995 2 1951       1  -5  -5
    2   1996 2 1951       1  -5  -5
    2   1997 2 1951       1  -5  -5
    2   1998 2 1951       1  -5  -5
    3   2001 1 1983      -5   1   2
    3   2002 1 1983      -5   1   2
    3   2003 1 1983      -5   1   2
    3   2004 1 1983      -5   1   2
    3   2005 1 1983      -5   1   2
    3   2006 1 1983      -5   1   2
    end
    Thank you very much







  • #2
    There's something I don't understand about your request. Let's look at id 3, whose father is id 1. We have data on id 3 for years 2001 through 2006, but the data on id 1 are from 1994 through 2000. How do you want to link these up? Is the 1994 observation of id 1 supposed to match the 2001 observation of id 3, the 1995 observation of id 1 the 2002 observation of id 3, and so on, always with a 7-year difference? Is that true for all parent-child pairs: is the difference in matching years always 7 years? If so, how do we handle id 3's mother in year 2006, since id 2's data end in 1998? If not, how do you want to handle this? I assume it is important to match one year of one person with one year of another because partners may change over time, so we need to know which year to match with which year.

    Also, are you using the current version of Stata (16.1)? If not, state which version you are using. There will be a simpler solution available for version 16 than for earlier versions.
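
    To see the observation windows that raise the question above, a quick summary of the first year, last year, and number of observations per person can help. This is only a sketch using the variable names from the posted example:

    ```stata
    * First year, last year, and number of observations for each id
    tabstat year, by(id) statistics(min max n)
    ```

    With the example data this shows id 1 observed in 1994-2000 (7 rows), id 2 in 1994-1998 (5 rows), and id 3 in 2001-2006 (6 rows), so no calendar year is shared between the child and either parent.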



    • #3

      Hi Clyde! Thank you for your answer.

      "How do you want to link these up? Is the 1994 observation of id 1 supposed to match the 2001 observation of id 3, the 1995 observation of id 1 the 2002 observation of id 3, and so on, always with a 7-year difference? Is that true for all parent-child pairs: is the difference in matching years always 7 years?"

      No, unfortunately the difference in matching years is not always 7 years; it varies across individuals. However, all the other variables I have are constant over time, except for income. (There is also an income variable. I left it out of the example because I wanted to understand the logic of merging parents to children and did not think it was important. Sorry.) For the moment, I would like to apply the general rule that the first year in which a father is observed matches the first year in which his (adult) child is observed, the second year matches the second, and so on. Matching stops at the last observation of whichever of the two has fewer observations. The same applies to mothers.

      People don't change partners in my panel. (I removed the few who did, because they would complicate my analysis and there were only a few cases. I'm considering only married couples with children.)

      My version of Stata is Stata 15.0.

      I hope I was clearer than before.





      • #4
        Very clear, thank you.

        Code:
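        //  NUMBER EACH PERSON'S OBSERVATIONS 1, 2, ... IN YEAR ORDER
        by id (year), sort: gen seq_year = _n
        //  TURN THE -5 CODES INTO STATA MISSING VALUES
        mvdecode partner father mother, mv(-5)

        //  SAVE A LOOKUP FILE OF EVERY PERSON, KEYED ON link AND seq_year
        preserve
        keep id year seq_year gender birth partner
        rename id link
        rename (gender birth partner) =1
        isid link seq_year, sort
        tempfile others
        save `others'
        restore

        //  NOTE: year IS NOT RENAMED, SO THE MERGES KEEP THE CHILD'S OWN year

        //  GET FATHER INFO
        clonevar link = father
        merge m:1 link seq_year using `others', keep(master match) nogenerate
        rename *1 F*

        //  GET MOTHER INFO
        replace link = mother
        merge m:1 link seq_year using `others', keep(master match) nogenerate
        rename *1 M*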
        should do it.
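
        The merges above keep every observation, including people with no recorded parents and child-years that run past a parent's last observation. If only fully matched child-years are wanted, as described in #1 and #3, a follow-up step along these lines might work. It is a sketch that assumes gender is never missing for a person who is actually observed, so a missing Fgender or Mgender signals "no parent observation for this sequence number":

        ```stata
        * Keep only children for whom both parent ids are recorded
        keep if !missing(father, mother)

        * Drop child-years with no matching father or mother observation
        * (ASSUMPTION: gender is never missing in the source data)
        drop if missing(Fgender, Mgender)
        ```

        With the example data this should leave five rows for id 3 (2001 through 2005), because id 2 is observed for only five years. A time-varying variable such as the income variable mentioned in #3 could presumably be carried along by adding it to the -keep- and -rename- lists before saving the tempfile; the name income is hypothetical here, since it is not in the posted example.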

        General comment: magic number codes for missing values, like -5, are a really bad idea in Stata. Sooner or later they will get you in deep trouble once you start doing any calculations. They also make many aspects of data management more difficult. So best to replace those with actual Stata missing values. That's what the -mvdecode- command does. I strongly suggest you actually do this with all the numeric variables in the entire data set as your very first step before the rest of your data management.
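
        One way to apply that recoding to every numeric variable at once is sketched below. It assumes -5 is the only missing-value code in the data and is never a legitimate value of any variable (for example, of a variable that can take negative values), which should be checked first:

        ```stata
        * Collect the names of all numeric variables in r(varlist)
        ds, has(type numeric)

        * Recode -5 to Stata system missing (.) in each of them
        * (ASSUMPTION: -5 always means "missing" in this data set)
        mvdecode `r(varlist)', mv(-5)
        ```

        Running this before generating seq_year means the later -mvdecode partner father mother- line simply finds nothing left to change.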



        • #5
          Thank you so much. I have understood the mechanism. I'll try with my whole dataset now.
