  • m:m merge problem

    Hi All!
    I am working with the Panel Study of Income Dynamics (PSID). I need details about the marriage history of the parents. The final goal is to understand whether the parents' marriage behavior affects the child's cognitive skills, as measured by test scores. I have two .dta files. One has the child id (pid_ch) and her parents' ids (pid_m for the mother, pid_d for the father) in the following format (there are many missing values, especially for single mothers):
    pid_ch pid_m pid_d
    1001 2002 4002
    1002 2002 .
    1003 4907 3500
    1004 4909 5473
    1005 5120 5473
    The other file has the marriage history of the mother and father (columns: sex, marriage year, divorce year, separation year, number of marriages, mother id, father id):
    sex mar_year div_year sep_year n_mar pid_m pid_d
    2 1935 1935 . 2 2002 4002
    1 1936 1970 . 2 2002 .
    2 1942 1950 1948 3 4907 3500
    2 1984 . . 1 4909 5473
    1 2008 . 2021 1 5120 5473
    Now I want to merge the two files so that I can link each child to the parents' marriage history. Once that is done, I will use the test score data to see whether the parents' marriage history (divorce, or more than one marriage) is correlated with the child's outcome. The problem is that it seems I need an m:m merge, which is strongly discouraged. I would be very grateful if someone could help me merge the data so that I can link children to their parents' marriage history.
    Many thanks in advance.

  • #2
    I can guarantee you that, whatever the solution turns out to be, it won't be m:m.

    Now, in the examples you show, there is no issue at all. -merge 1:1 pid_m pid_d- gets you the matching you need.
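For the example rows in the original post, a minimal sketch of that 1:1 merge. Both files are typed in with -input- and held as tempfiles; the names mar_year, div_year, sep_year and n_mar are stand-ins for the column headings in the post, not names from the actual PSID files.

```stata
* Child file: the example rows from post #1
clear
input long pid_ch long pid_m long pid_d
1001 2002 4002
1002 2002 .
1003 4907 3500
1004 4909 5473
1005 5120 5473
end
tempfile children
save `children'

* Marriage file: the example rows from post #1 (illustrative variable names)
clear
input byte sex int mar_year int div_year int sep_year byte n_mar long pid_m long pid_d
2 1935 1935 . 2 2002 4002
1 1936 1970 . 2 2002 .
2 1942 1950 1948 3 4907 3500
2 1984 . . 1 4909 5473
1 2008 . 2021 1 5120 5473
end
tempfile marriages
save `marriages'

* Each (pid_m, pid_d) pair occurs at most once in each file, so 1:1 is valid here
use `children', clear
merge 1:1 pid_m pid_d using `marriages'
list pid_ch pid_m pid_d mar_year div_year _merge, noobs
```

Note that mother 2002 appears in two marriage records (one with father 4002, one with a missing father id), so pid_m alone would not be unique in this file; it is the pair that identifies each row.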

    The simplest case is that the child has exactly one set of parents at birth, and the parents have a single marriage record. If every real-life situation were like that, -merge 1:1- would do the job. It gets complicated because the parents at birth may divorce, or die, and new parents may enter the picture. It is even possible that a pair of parents may divorce and then later remarry each other. I don't know what would be the best way to handle these situations for your purposes. Is it the parents at birth who matter for the purpose of understanding these cognitive skills outcomes? Or is it the parents at the time the outcomes are measured? Or are both important? Or perhaps all sets of parents from birth up to and including the time the outcomes are measured? Also, if separation and divorce occur in different years, how should that be handled?

    My point is that you are not facing a technical data management problem here. You are facing a substantive problem in study design because life is complicated. What you first need to do is figure out which set(s) of parents you need to pair up with the child for the purposes of understanding effects on these cognitive skills outcomes. Once you develop a scheme for that, the technical aspects of achieving that pairing in the data will definitely not involve -merge m:m-. They will involve one or more of -merge 1:1-, -merge 1:m-, -joinby-, or -rangejoin-. (-rangejoin- by Robert Picard is available from SSC. Its use requires -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.)



    • #3
      Dear Clyde!
      Thank you so much for your detailed reply. I did start with the 1:1 merge, and received the following error message:

      variables pid_m pid_d do not uniquely identify observations in the master data

      Perhaps the made-up data I shared made it look as if a 1:1 merge could solve the problem. Here are the first 15 observations from the real datasets, sorted by pid_m.
      First dataset, with the child id:
      pid_ch pid_m pid_d
      2254170 0 0
      9308170 0 0
      5402004 0 0
      9308001 0 0
      6216177 0 0
      2301002 0 0
      1277001 0 0
      6495909 0 6495175
      830174 0 0
      5865001 0 0
      2032181 0 0
      6262170 0 0
      7015001 0 0
      5907171 0 0
      9303900 0 9303001
      Second dataset, with the marriage history:
      sex maryear Divyear sepyear #Mar pid_m pid_d
      2 1936 1970 9999 3 2001 2002
      2 1978 1981 1980 3 2170 2002
      2 1935 1935 9999 3 2901 2002
      2 1942 9999 9999 1 4001 4002
      2 1972 1998 1997 1 4172 4004
      2 1970 1980 1980 2 4174 4005
      2 1986 9999 9999 2 4186 4005
      2 1972 1992 1982 1 4170 4006
      2 1980 1983 1983 2 4905 4007
      2 1984 1996 1995 2 4175 4007
      2 1990 9999 2014 1 4195 4031
      2 1990 1994 1994 2 4181 4032
      2 2000 9999 9999 2 4211 4032
      2 2000 2005 2005 3 4193 4036
      2 2006 9999 9999 3 4197 4036
      I am only considering the birth parents, whose ids are pid_m and pid_d; these never change for a child. The missing values just mean that the parent dropped out of the survey and was no longer followed. I also tried merging first on pid_m alone and then on pid_d alone (to link the child to at least one birth parent), but that did not work either. A 1:1, 1:m or m:1 merge does not work, because pid_m does not uniquely identify observations in either the master or the using data. I have not tried joinby or rangejoin yet.
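A minimal sketch of checking the keys before choosing a merge type, using some of the rows listed above. It assumes two coding conventions the post does not state: that 0 in pid_m or pid_d means the parent is not in the data, and that 9999 in the divorce and separation years means no such event. The name nmar stands in for #Mar, which is not a legal Stata variable name.

```stata
* Child file: the first rows shown in post #3
clear
input long pid_ch long pid_m long pid_d
2254170 0 0
9308170 0 0
5402004 0 0
9308001 0 0
6216177 0 0
2301002 0 0
1277001 0 0
6495909 0 6495175
830174 0 0
5865001 0 0
2032181 0 0
6262170 0 0
7015001 0 0
5907171 0 0
9303900 0 9303001
end

* Which (pid_m, pid_d) pairs repeat? Here the 0 0 pair does, which is
* enough for merge 1:1 to report that the keys do not identify observations
duplicates report pid_m pid_d

* Assumption: 0 means "parent not in the data"; turn it into a true missing value
mvdecode pid_m pid_d, mv(0)
count if missing(pid_m) & missing(pid_d)

* Marriage file: some of the rows shown in post #3
clear
input byte sex int maryear int divyear int sepyear byte nmar long pid_m long pid_d
2 1936 1970 9999 3 2001 2002
2 1978 1981 1980 3 2170 2002
2 1935 1935 9999 3 2901 2002
2 1942 9999 9999 1 4001 4002
end

* The pid_d value 2002 appears with three different pid_m values
duplicates report pid_d

* Assumption: 9999 means "no divorce/separation"; make it missing so it
* cannot be mistaken for a calendar year in later comparisons
mvdecode divyear sepyear, mv(9999)
```

Recoding 0 to missing does not make the child-side keys unique: many children still share the same (missing) pair, and siblings share real ones. That is expected, and it is part of why a one-to-one merge is the wrong tool here.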
      Last edited by Zainab Iftikhar; 20 Jul 2023, 11:23.



      • #4
        It looks like what you need is
        Code:
        use child_data, clear
        joinby pid_m pid_d using marriage_history_data
        This will match each child with any marriage record involving both of his/her parents. You might, after that, want to create a separate data set that keeps those children where no match was found and then try to find matches to just one parent using -joinby pid_m- and then using -joinby pid_d-. But I don't know if it's sensible to do that, as it may turn up a marriage of, say, the mother noted in the child's record to some person other than the father noted in the child's record. That, of course, is quite possible in real life, although one might then wonder how to decide which "father" to use for purposes of analysis. (The same could happen with the sexes reversed, of course.)
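A sketch of that two-step approach (both parents first, then the mother alone), not a recommendation that it be used. It keeps the children from post #1 whose two parent ids are both recorded and adds one made-up child, 1006, whose recorded father has no marriage record with the mother. Before the mother-only match, the father id in the marriage file is renamed to the hypothetical spouse_id, so the mother's partner can be compared with the father recorded for the child. The variable names mar_year, div_year, sep_year, n_mar, match_both, matched_on and other_partner are illustrative.

```stata
* Marriage file: the example rows from post #1 (illustrative variable names)
clear
input byte sex int mar_year int div_year int sep_year byte n_mar long pid_m long pid_d
2 1935 1935 . 2 2002 4002
1 1936 1970 . 2 2002 .
2 1942 1950 1948 3 4907 3500
2 1984 . . 1 4909 5473
1 2008 . 2021 1 5120 5473
end
tempfile marriages
save `marriages'

* Same records with the father id renamed, for matching on the mother alone
rename pid_d spouse_id
tempfile by_mother
save `by_mother'

* Children with both parent ids recorded; child 1006 is made up
clear
input long pid_ch long pid_m long pid_d
1001 2002 4002
1003 4907 3500
1004 4909 5473
1005 5120 5473
1006 4907 3600
end

* Step 1: match on both parents, keeping children with no match (match_both == 1)
joinby pid_m pid_d using `marriages', unmatched(master) _merge(match_both)

preserve
keep if match_both == 3
gen byte matched_on = 1
tempfile step1
save `step1'
restore

* Step 2: retry the unmatched children on the mother's id alone
keep if match_both == 1 & !missing(pid_m)
keep pid_ch pid_m pid_d
joinby pid_m using `by_mother'
gen byte matched_on = 2

* Flag marriages of the mother to someone other than the recorded father
* (a missing spouse_id is also flagged)
gen byte other_partner = spouse_id != pid_d

append using `step1'
sort pid_ch
list pid_ch pid_m pid_d spouse_id mar_year matched_on other_partner, noobs
```

In this toy run, child 1006 ends up matched to the mother's marriage to 3500, with other_partner set to 1. Whether such a match belongs in the analysis is exactly the substantive question raised above.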

        Note: -joinby- does what people usually want when tempted to use the evil -merge m:m-.
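To see the difference, a small made-up case: two siblings whose parents married each other twice (one of the situations raised in #2). The ids and years are invented for illustration only.

```stata
* Made-up example: two siblings whose parents married each other twice
clear
input long pid_ch long pid_m long pid_d
1 101 201
2 101 201
end
tempfile kids
save `kids'

clear
input int mar_year int div_year long pid_m long pid_d
1980 1985 101 201
1990 . 101 201
end
tempfile couples
save `couples'

* merge m:m pairs records by their position within the key group: 2 rows
use `kids', clear
merge m:m pid_m pid_d using `couples'
display "merge m:m rows: " _N

* joinby forms every child-by-marriage pair within the key group: 4 rows
use `kids', clear
joinby pid_m pid_d using `couples'
display "joinby rows: " _N
sort pid_ch mar_year
list, noobs
```

-merge m:m- silently drops two of the four child-by-marriage combinations, and which ones survive depends only on record order; -joinby- keeps all of them.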
        Last edited by Clyde Schechter; 20 Jul 2023, 11:15.



        • #5
          Dear Clyde!
          Aren't you wonderful! This works just as desired.
          Thanks for adding humor to a dry topic; the "evil merge" made the thread much more interesting.
          Stay blessed.

