Generating new variables for partner?

Joel Persson

Join Date: Oct 2023

Posts: 4
#1

26 Oct 2023, 03:22

Dear Stata community,

I am fairly new to Stata and am trying to work out how to create new variables from the contents of each person's partner's observation.
I have a wide dataset where each observation is a single individual. Let's say it looks like this:

ID PartnerID Var1 Var2
1 5 "hi" "ho"
2 3 "cat" "flower"
3 2 "bird" "stone"
4 . "Frog" "cycle"
5 1 "Jupiter" "lollipop"

I now want to copy each partner's variables onto the index person's observation, like this:

ID PartnerID Var1 PartnerVar1 Var2 PartnerVar2
1 5 "hi" "Jupiter" "ho" "lollipop"
2 3 "cat" "bird" "flower" "stone"
3 2 "bird" "cat" "stone" "flower"
4 . "Frog" "" "cycle" ""
5 1 "Jupiter" "hi" "lollipop" "ho"

The following syntax worked fine initially:
gen PartnerVar1 = Var1[PartnerID]

However, it depends on ID running as an unbroken sequence 1, 2, 3, ... so that each ID equals its observation number. If the sequence has gaps (e.g. 1 2 3 5 6 7 8 10 15), partners will be mismatched.
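For the example data, the row-number approach can be guarded and extended to many variables roughly as follows. This is a sketch under assumptions: `Var1[PartnerID]` treats PartnerID as an observation number, so the `assert` checks that, after sorting, each ID equals its observation number `_n`, and halts the do-file otherwise. The `foreach` loop runs once per variable (about 70 passes), not once per observation, so it should be cheap even with 1,000,000 observations.

```stata
clear
input ID PartnerID str8(Var1 Var2)
1 5 "hi" "ho"
2 3 "cat" "flower"
3 2 "bird" "stone"
4 . "Frog" "cycle"
5 1 "Jupiter" "lollipop"
end

* Var1[PartnerID] reads observation number PartnerID,
* so it is only safe when, after sorting, ID equals _n
sort ID
assert ID == _n

* one -generate- per variable; each pass covers all observations at once
foreach v of varlist Var1 Var2 {
    generate Partner`v' = `v'[PartnerID]
}

list
```

A missing PartnerID is an out-of-range subscript, so singles such as ID 4 get an empty string.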

Do any of you have suggestions on how to match by the value of PartnerID rather than by row number? Preferably without a foreach/forvalues loop, as there are approximately 70 variables and 1,000,000 observations.

Kind regards,
Joel

Nick Cox

Join Date: Mar 2014

Posts: 35762
#2

26 Oct 2023, 08:49

I would merge the dataset with a copy of itself, after suitable renaming.

Code:

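clear
input ID PartnerID str8 (Var1 Var2)
1 5 "hi" "ho"
2 3 "cat" "flower"
3 2 "bird" "stone"
4 . "Frog" "cycle"
5 1 "Jupiter" "lollipop"
end

* you start here
save thisdata, replace

* each record now carries a person's own values (P*) under the ID of the partner they named
drop ID
rename (Var?) (P=)
rename PartnerID ID
* singles have no partner; several missing keys would make merge 1:1 fail
drop if missing(ID)

merge 1:1 ID using thisdata

sort ID
list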


Joel Persson

Join Date: Oct 2023

Posts: 4
#3

26 Oct 2023, 15:29

Thank you, Nick, for taking the time to answer; I do appreciate it! I have also tried this solution, but divorce causes a problem. To explain, I should probably also introduce ObsYear, the year of the observation. Returning to the earlier example, the data would look like this:

ID PartnerID ObsYear Var1 Var2
1 5 2010 "hi" "ho"
2 3 2014 "cat" "flower"
3 2 2012 "bird" "stone"
4 . 2009 "frog" "cycle"
5 1 2013 "jupiter" "lollipop"

Now say that ID 1 marries ID 5 in 2010. The following year ID 1 divorces, and in 2013 remarries ID 2. Both ID 5 and ID 2 would then have PartnerID == 1 at their points of observation (ObsYear), as illustrated below:

ID PartnerID ObsYear Var1 Var2
1 5 2010 "hi" "ho"
2 1 2014 "cat" "flower"
3 2 2012 "bird" "stone"
4 . 2009 "frog" "cycle"
5 1 2013 "jupiter" "lollipop"

If I merge on PartnerID (renamed to ID), the value 1 appears twice, so the key no longer uniquely identifies observations and the 1:1 merge of the wide dataset fails.
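The failure can be diagnosed before merging. A minimal sketch on the divorce example: `merge 1:1` on the renamed PartnerID needs that variable to identify observations uniquely, and here it does not, because ID 2 and ID 5 both name partner 1. The variable name `shared` is only illustrative.

```stata
clear
input ID PartnerID ObsYear str8(Var1 Var2)
1 5 2010 "hi" "ho"
2 1 2014 "cat" "flower"
3 2 2012 "bird" "stone"
4 . 2009 "frog" "cycle"
5 1 2013 "jupiter" "lollipop"
end

* merge 1:1 on the renamed PartnerID requires that key to be unique
duplicates report PartnerID
capture isid PartnerID, missok
if _rc display as error "PartnerID is not unique, so merge 1:1 on it will fail"

* flag the people who name the same partner (singles are left missing)
duplicates tag PartnerID if !missing(PartnerID), generate(shared)
list ID PartnerID ObsYear if shared > 0 & !missing(shared)
```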
Nick Cox

Join Date: Mar 2014

Posts: 35762
#4

27 Oct 2023, 02:44

I guess you need to try a merge using year as well as identifier.
Joel Persson

Join Date: Oct 2023

Posts: 4
#5

28 Oct 2023, 01:52

This is an excellent suggestion, thank you, Nick! The problem, however, is that the years do not necessarily match. ObsYear records the year of observation, not the year the partnership began. For example, ID 1 and ID 5 are partners in the example dataset above, yet ObsYear is 2010 for ID 1 and 2013 for ID 5, because they were registered/observed at different points in time. So they will not align in a merge on ID and year.
Nick Cox

Join Date: Mar 2014

Posts: 35762
#6

28 Oct 2023, 02:31

That sounds like a limitation of your dataset for what you want to do.
Joel Persson

Join Date: Oct 2023

Posts: 4
#7

28 Oct 2023, 02:37

Could I perhaps create some kind of familyID for each partnership and then merge from that?
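One hypothetical way to build such an identifier is to order each pair, so that both members of a couple get the same key whichever of them is the index person. This is only a sketch, and the names `lowID`, `highID`, `CoupleID` and `mutual` are made up for illustration. On the divorce example it also shows the limitation: only couples in which both members are present and name each other have two observations, so ID 2 (who names 1) and ID 3 (who names 2) end up with no partner values.

```stata
clear
input ID PartnerID ObsYear str8(Var1 Var2)
1 5 2010 "hi" "ho"
2 1 2014 "cat" "flower"
3 2 2012 "bird" "stone"
4 . 2009 "frog" "cycle"
5 1 2013 "jupiter" "lollipop"
end

* hypothetical CoupleID: same value for both members of an ordered pair
* (min() and max() ignore missing, hence the -if- on -egen-)
generate long lowID  = min(ID, PartnerID)
generate long highID = max(ID, PartnerID)
egen long CoupleID = group(lowID highID) if !missing(PartnerID)
drop lowID highID

* a couple is usable only if both members are in the data and name each other
bysort CoupleID: generate byte mutual = (_N == 2) if !missing(CoupleID)

* within a mutual couple the partner is simply the other observation
bysort CoupleID (ID): generate PartnerVar1 = cond(_n == 1, Var1[2], Var1[1]) if mutual == 1
bysort CoupleID (ID): generate PartnerVar2 = cond(_n == 1, Var2[2], Var2[1]) if mutual == 1

sort ID
list ID PartnerID CoupleID mutual PartnerVar1 PartnerVar2
```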
Nick Cox

Join Date: Mar 2014

Posts: 35762
#8

28 Oct 2023, 03:00

I don't know. I'd suggest posting a better example of real or realistic data using dataex, as explained in https://www.statalist.org/forums/help#stata and exemplified in #2. Diminishing returns set in quickly when a discussion depends on word descriptions that grow ever longer in pursuit of set-ups that turn out to be more complicated than first stated.

Alternatively, if your initial statement that each individual occurs just once in the dataset is still true, then you have no information on changing properties, and it's a 1:m or m:1 merge, depending on which way you do it.
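A sketch of the m:1 direction, assuming what each person needs is the values recorded for the person named in their own PartnerID. The lookup file has one row per person, keyed on PartnerID, so several people naming the same partner (here ID 2 and ID 5, both naming 1) is no problem. `partners` is a temporary file name chosen for illustration; with about 70 variables the explicit list `Var1 Var2` would be replaced by the real variable names or a wildcard.

```stata
clear
input ID PartnerID ObsYear str8(Var1 Var2)
1 5 2010 "hi" "ho"
2 1 2014 "cat" "flower"
3 2 2012 "bird" "stone"
4 . 2009 "frog" "cycle"
5 1 2013 "jupiter" "lollipop"
end

* lookup file: one row per person, keyed on the ID that others name as partner
preserve
keep ID Var1 Var2
rename (Var1 Var2) (Partner=)   // -> PartnerVar1 PartnerVar2
rename ID PartnerID
tempfile partners
save `partners'
restore

* m:1 because several people (e.g. 2 and 5) may name the same partner (1);
* keep(master match) keeps singles and drops people nobody names
merge m:1 PartnerID using `partners', keep(master match) nogenerate

sort ID
list
```

The partner's values come from the partner's own observation, whatever its ObsYear, so this does not remove the timing limitation discussed above.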