Dear Statalisters,
I’m seeking help to resolve an annoying problem – it might be nothing major for some of you, but trying to solve this issue, given the dyadic nature of the data, is spinning my head. I have two unbalanced panel datasets:
Dataset 1 – dyadic relationships recorded between two partners for multiple years.
Dataset 2 – relationships of partners in dataset 1 with others (excluding their dyadic relationships) over multiple years.
Both entities in dataset 1 (id and id_partner) may have multiple other relationships with other entities in dataset 2. I’m interested in identifying the following:
Step 1. Check whether the focal entity in dataset 1 (id) has any other relationships with other entities in dataset 2 (id1-id2) for a given year.
*Example generated by -dataex-. (dataset 1)
clear
input float year str10(id id_partner)
2016 "194535257" "10001"
2017 "194535257" "10001"
2012 "10001" "3236"
2013 "10001" "3236"
2014 "10001" "3236"
2015 "10001" "3236"
2016 "10001" "3236"
2017 "10001" "3236"
2018 "10001" "3236"
2019 "10001" "3236"
2020 "10001" "3236"
2014 "10001" "4598"
2015 "10001" "4598"
2016 "10001" "4598"
2017 "10001" "4598"
end
*Example generated by -dataex- (dataset 2)
clear
input float year str10(id1 id2)
2010 "1871295" "1000"
2011 "1871295" "1000"
2012 "1871295" "1000"
2013 "1871295" "1000"
2019 "194528198" "100000012"
2020 "194528198" "100000012"
2021 "194528198" "100000012"
2022 "194528198" "100000012"
2012 "10001" "11122"
2013 "10001" "11122"
2014 "10001" "11122"
2015 "10001" "11122"
2016 "10001" "11122"
2006 "10001" "1508418"
2007 "10001" "1508418"
end
Thanks for your help!
I’m seeking help to resolve an annoying problem – it might be nothing major for some of you, but trying to solve this issue, given the dyadic nature of the data, is spinning my head. I have two unbalanced panel datasets:
Dataset 1 – dyadic relationships recorded between two partners for multiple years.
Dataset 2 – relationships of partners in dataset 1 with others (excluding their dyadic relationships) over multiple years.
Both entities in dataset 1 (id and id_partner) may have multiple other relationships with other entities in dataset 2. I’m interested in identifying the following:
Step 1. Check whether the focal entity in dataset 1 (id) has any other relationships with other entities in dataset 2 (id1-id2) for a given year.
- The focal entity (id) can appear multiple times in both datasets as it can have dyadic relations with multiple other entities, e.g. A related with B, A related with E, and so on. However, this is also the case with the partner entity (id_partner), which can also be related to multiple focal entities. In short, I’m dealing with a m:m case.
- Similarly, Dataset 2 can have multiple records of focal and partner entities over multiple years.
- As my main focus is the focal entity, I can remove duplicate id-year records from dataset 1 and use merge 1:m id year using dataset 2 to find matches for the focal entity-year record, which is fairly straightforward. Although, focal entity (id) from dataset 1 can appear in either id1 or id2 in dataset 2. But that’s not a challenge as I can use merge twice – once looking for id-year in id1-year and then looking for id-year in id2-year.
- This is where it gets tricky because the partner (id_partner) in dataset 1 is also related with multiple focal entities (id) – think of the worst case of infidelity where everyone in the society is having an affair with (almost) everyone else!
- Similar to in step 1, I can use merge to find matches of id_partner from dataset 1 in dataset 2. However, now I’m again back to having two separate datasets – one with focal entity (id in dataset 1) and its matched relationships from dataset 2, and another with focal entity’s partner entity (id_partner in dataset 1) and its matched relationships from dataset 2.
- I tried the vlookup function but my understanding is that it is limited to finding matches in the same file and cannot be used to look up matches in another dataset.
- Eventually, I need to get the data in a format where I have focal entity (from dataset 1) + its identified matches (from dataset 2) + a tag for whether the particular match is also related to the focal entity’s partner entity (id_partner in dataset 1).
- Adam is the focal entity (dataset 1 - id).
- Adam is dating Bella in year 1 and 2, Chelsea in year 2, and Dakota in year 1 and 3 (dataset 1 - id_partner).
- Adam attends college with Brad in year 1 and 3, and with Chandler in year 2. Chandler is also related to Bella in year 2.
Dataset 1 | |||
year | id | id_partner | |
1 | Adam | Bella | |
1 | Adam | Dakota | |
2 | Adam | Bella | |
2 | Chelsea | Adam | |
3 | Adam | Dakota | |
Dataset 2 | |||
year | id1 | id2 | id_partner |
1 | Adam | Brad | No |
2 | Chandler | Adam | Yes |
3 | Adam | Brad | No |
*Example generated by -dataex-. (dataset 1)
clear
input float year str10(id id_partner)
2016 "194535257" "10001"
2017 "194535257" "10001"
2012 "10001" "3236"
2013 "10001" "3236"
2014 "10001" "3236"
2015 "10001" "3236"
2016 "10001" "3236"
2017 "10001" "3236"
2018 "10001" "3236"
2019 "10001" "3236"
2020 "10001" "3236"
2014 "10001" "4598"
2015 "10001" "4598"
2016 "10001" "4598"
2017 "10001" "4598"
end
*Example generated by -dataex- (dataset 2)
clear
input float year str10(id1 id2)
2010 "1871295" "1000"
2011 "1871295" "1000"
2012 "1871295" "1000"
2013 "1871295" "1000"
2019 "194528198" "100000012"
2020 "194528198" "100000012"
2021 "194528198" "100000012"
2022 "194528198" "100000012"
2012 "10001" "11122"
2013 "10001" "11122"
2014 "10001" "11122"
2015 "10001" "11122"
2016 "10001" "11122"
2006 "10001" "1508418"
2007 "10001" "1508418"
end
Thanks for your help!
Comment