Alternative to merge m:m

Paula de Souza Leao Spinola

Join Date: Jun 2015

Posts: 384
#1

Alternative to merge m:m

24 Aug 2022, 13:24

I have data two datasets that have a common variable. I would like to merge the two datasets using this common variable and only keep observations included in the first dataset.

The issue is that in none of the datasets the common variable identifies the rows. As I know that merge m:m is not recommended, I usually reshape one of the datasets in a way that I can use a m:1 or 1:m merge. I show this below.

My issue now is that my data is massive which makes the code below unfeasible. Are there any alternatives to this that would be feasible with large datasets?

Code:

clear input id hospid hospid_peer 1 1000 101 1 1000 102 1 1001 101 2 1002 102 end tempfile data1 save `data1', replace clear input hospid_peer peerid 101 3 101 4 102 3 1000 1 1001 1 1002 2 2000 7 2003 8 end tempfile data2 save `data2', replace use `data1', clear preserve use `data2', clear bys hospid_peer: gen n = _n reshape wide peerid, i(hospid_peer) j(n) tempfile data1_temp save `data1_temp' restore merge m:1 hospid_peer using `data1_temp', keep(master match) nogenerate
Tags: None
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#2

24 Aug 2022, 14:01

https://journals.sagepub.com/doi/ful...6867X211063416
3 likes
Comment

Øyvind Snilsberg

Join Date: Oct 2021
Posts: 591

25 Aug 2022, 03:11

maybe -joinby-

Code:

. *reshape
. use data2, clear

. bys hospid_peer: gen n = _n

. qui reshape wide peerid, i(hospid_peer) j(n)

. tempfile data1_temp

. save `data1_temp'
file C:\Users\oyvinsni\AppData\Local\Temp\ST_3200_000001.tmp saved as .dta format

. use data1, clear

. merge m:1 hospid_peer using `data1_temp', keep(master match) nogenerate

    Result                      Number of obs
    -----------------------------------------
    Not matched                             0
    Matched                                 4  
    -----------------------------------------

. qui reshape long peerid, i(id hospid hospid_peer)

. drop if peerid == .
(2 observations deleted)

. sort id hospid hospid_peer peerid

. list id hospid hospid_peer peerid, sep(10) abb(12) clean

       id   hospid   hospid_peer   peerid  
  1.    1     1000           101        3  
  2.    1     1000           101        4  
  3.    1     1000           102        3  
  4.    1     1001           101        3  
  5.    1     1001           101        4  
  6.    2     1002           102        3  

. *joinby
. use data1, clear

. joinby hospid_peer using data2

. sort id hospid hospid_peer peerid

. list id hospid hospid_peer peerid, sep(10) abb(12) clean

       id   hospid   hospid_peer   peerid  
  1.    1     1000           101        3  
  2.    1     1000           101        4  
  3.    1     1000           102        3  
  4.    1     1001           101        3  
  5.    1     1001           101        4  
  6.    2     1002           102        3

Last edited by Øyvind Snilsberg; 25 Aug 2022, 03:13.

Comment

Paula de Souza Leao Spinola

Join Date: Jun 2015

Posts: 384
#4

25 Aug 2022, 03:47

This is awesome, thank you both!
Comment

Announcement

Alternative to merge m:m

Comment

Comment

Comment