m:m merge problem

Zainab Iftikhar

Join Date: Apr 2021
Posts: 13

20 Jul 2023, 07:52

Hi All!
I am working with the Panel Study of Income Dynamics (PSID). I need to get details about the marriage history of the parents. The final goal is to understand whether the marriage behavior of the parents affects the child's cognitive skills, as measured by some test scores. I have two dta files. One has the child's id (pid_ch) and her parents' ids (pid_m (mom) and pid_d (dad)) in the following format (there are many missing values, especially for single moms):

pid_ch	pid_m	pid_d
1001	2002	4002
1002	2002	.
1003	4907	3500
1004	4909	5473
1005	5120	5473

The other file has the marriage history of mother and father

sex	marriage year	divorce year	separation year	no of marriages	pid_m	pid_d
2	1935	1935	.	2	2002	4002
1	1936	1970	.	2	2002	.
2	1942	1950	1948	3	4907	3500
2	1984	.	.	1	4909	5473
1	2008	.	2021	1	5120	5473

Now I want to merge the two files so that I can link each child to the parents' marriage history. Once I have done that, I will use the test score data to see if there is any correlation between the parents' marriage history (divorce or more than one marriage) and the child's outcomes. The problem is that it seems I need to do an m:m merge, and that is strongly discouraged. I will be very grateful if someone could please help me merge the data so that I can link children to their parents' marriage history.
Many thanks in advance.


Clyde Schechter

Join Date: Apr 2014

Posts: 30165
#2

20 Jul 2023, 09:09

I can guarantee you that, whatever the solution turns out to be, it won't be m:m.

Now, in the examples you show, there is no issue at all. -merge 1:1 pid_m pid_d- gets you the matching you need.

The simplest case is that the child has exactly one set of parents at birth, and the parents have a single marriage record. If every real-life situation were like that, -merge 1:1- would do the job. It gets complicated because the parents at birth may divorce, or die, and new parents may enter the picture. It is even possible that a pair of parents may divorce, and then later remarry each other. I don't know what would be the best way to handle these situations for your purposes. Is it the parents at birth who matter for the purpose of understanding these cognitive skills outcomes? Or is it the parents at the time the outcomes are measured? Or are both important? Or perhaps all sets of parents from birth up to and including the time the outcomes are measured? Also if separation and divorce occur in separate years, how is this to be handled?

My point is that you are not facing a technical data management problem here. You are facing a substantive problem in study design because life is complicated. What you first need to do is figure out which set(s) of parents you need to pair up with the child for the purposes of understanding effects on these cognitive skills outcomes. Once you develop a scheme for that, the technical aspects of achieving that pairing in the data will definitely not involve -merge m:m-. They will involve one or more of -merge 1:1-, -merge 1:m-, -joinby-, or -rangejoin-. (-rangejoin- by Robert Picard is available from SSC. Its use requires -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.)
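Whichever pairing scheme is chosen, the choice among -merge 1:1-, -merge 1:m-/-m:1-, and -joinby- depends on whether the key variables uniquely identify observations in each file. Here is a minimal sketch of that check using -isid-. The four records are a hypothetical toy modeled on the marriage-history layout shown in this thread, not real PSID data, and the local macro names are made up for illustration.

```stata
* Hypothetical toy records in the layout of the marriage-history file
clear
input int(maryear divyear) long(pid_m pid_d)
1970 1980 4174 4005
1986 9999 4186 4005
1980 1983 4905 4007
1984 1996 4175 4007
end

* -isid- exits with an error when the variables are not a unique key;
* -capture- stores the return code in _rc instead of stopping the do-file
capture isid pid_d
local d_ok = (_rc == 0)
display "pid_d alone is a key: " cond(`d_ok', "yes", "no")

capture isid pid_m pid_d
local pair_ok = (_rc == 0)
display "pid_m pid_d together are a key: " cond(`pair_ok', "yes", "no")

* Show how many copies each pid_d value has
duplicates report pid_d
```

If the key is unique in both files, -merge 1:1- applies; if it is unique in only one of them, -merge m:1- or -merge 1:m- applies; if it is unique in neither, -joinby- is the candidate to consider.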

Zainab Iftikhar

Join Date: Apr 2021
Posts: 13

20 Jul 2023, 10:26

Dear Clyde!
Thank you so much for your detailed reply. I indeed started with the 1:1 merge and received the following error message

variables pid_m pid_d do not uniquely identify observations in the master data

Perhaps, the imaginary data I shared makes it look like 1:1 merge can solve the problem. Here are the first 15 observations from the real dataset sorted by pid_m
First dataset, with child id:

pid_ch	pid_m	pid_d
2254170	0	0
9308170	0	0
5402004	0	0
9308001	0	0
6216177	0	0
2301002	0	0
1277001	0	0
6495909	0	6495175
830174	0	0
5865001	0	0
2032181	0	0
6262170	0	0
7015001	0	0
5907171	0	0
9303900	0	9303001

The second data set on marriage history

sex	maryear	Divyear	sepyear	#Mar	pid_m	pid_d
2	1936	1970	9999	3	2001	2002
2	1978	1981	1980	3	2170	2002
2	1935	1935	9999	3	2901	2002
2	1942	9999	9999	1	4001	4002
2	1972	1998	1997	1	4172	4004
2	1970	1980	1980	2	4174	4005
2	1986	9999	9999	2	4186	4005
2	1972	1992	1982	1	4170	4006
2	1980	1983	1983	2	4905	4007
2	1984	1996	1995	2	4175	4007
2	1990	9999	2014	1	4195	4031
2	1990	1994	1994	2	4181	4032
2	2000	9999	9999	2	4211	4032
2	2000	2005	2005	3	4193	4036
2	2006	9999	9999	3	4197	4036

I am only considering the birth parents, whose ids are pid_m and pid_d; these never change for a child. The missing values just mean that the parent dropped out of the survey and was no longer followed. Therefore, I also tried merging first using only pid_m and then only pid_d (trying to link the child to at least one birth parent), but that did not work either. A 1:1, or even a 1:m or m:1, merge does not work because pid_m uniquely identifies observations in neither the master nor the using data. I have not tried joinby or rangejoin yet.

Last edited by Zainab Iftikhar; 20 Jul 2023, 11:23.


Clyde Schechter

Join Date: Apr 2014

Posts: 30165
#4

20 Jul 2023, 11:11

It looks like what you need is

Code:

use child_data, clear
joinby pid_m pid_d using marriage_history_data

This will match each child with any marriage record involving both of his/her parents. You might, after that, want to create a separate data set that keeps those children for whom no match was found, and then try to find matches to just one parent using -joinby pid_m- and then using -joinby pid_d-. But I don't know if it's sensible to do that, as it may turn up a marriage of, say, the mother noted in the child's record to some person other than the father noted in the child's record. That, of course, is quite possible in real life, although one might then wonder how to decide which "father" to use for purposes of analysis. (The same could happen with the sexes reversed, of course.)
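A minimal sketch of that two-pass idea follows, on hypothetical toy data (the ids, years, tempfile names, and the variable pid_spouse are all made up for illustration). It assumes the -unmatched(master)- option of -joinby- is used to retain children with no match, and that pid_d is renamed in the marriage file before the second pass so that the spouse recorded there is kept alongside the father recorded for the child.

```stata
* Hypothetical children: 101 and 102 share parents; 103 has parents (12, 22)
clear
input long(pid_ch pid_m pid_d)
101 11 21
102 11 21
103 12 22
end
tempfile children
save `children'

* Hypothetical marriages: two records for pair (11, 21); mother 12 married 23
clear
input int(maryear divyear) long(pid_m pid_d)
1970 1980 11 21
1985 9999 11 21
1990 9999 12 23
end
tempfile marriages
save `marriages'

* Pass 1: match on both parents, retaining unmatched children
use `children', clear
joinby pid_m pid_d using `marriages', unmatched(master)
tabulate _merge            // 3 = matched, 1 = child with no record for the pair
count if _merge == 3
local n_matched = r(N)

* Set aside the children with no match on the full pair
keep if _merge == 1
keep pid_ch pid_m pid_d
tempfile nomatch
save `nomatch'

* Pass 2: match on mother only; rename the spouse id so it is not confused
* with the father recorded for the child
use `marriages', clear
rename pid_d pid_spouse
tempfile by_mom
save `by_mom'

use `nomatch', clear
joinby pid_m using `by_mom', unmatched(master)
list pid_ch pid_m pid_d pid_spouse maryear
```

In this toy, each of children 101 and 102 is paired with both marriage records of their parents in the first pass, and child 103 is picked up only in the second pass, where pid_spouse differs from pid_d. That is exactly the ambiguity described above. Note also that -joinby- matches on whatever values the key variables hold, so codes such as 0 for an untracked parent would pair up with each other if they appear in both files.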

Note: -joinby- does what people usually want when tempted to use the evil -merge m:m-.

Last edited by Clyde Schechter; 20 Jul 2023, 11:15.
Zainab Iftikhar

Join Date: Apr 2021

Posts: 13
#5

20 Jul 2023, 11:21

Dear Clyde!
Aren't you wonderful! This works just as desired.
Thanks for adding humor to the dry topics, the "evil merge" made the thread much more interesting.
Stay blessed.
