Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Alternative to merge m:m

    I have data two datasets that have a common variable. I would like to merge the two datasets using this common variable and only keep observations included in the first dataset.

    The issue is that in none of the datasets the common variable identifies the rows. As I know that merge m:m is not recommended, I usually reshape one of the datasets in a way that I can use a m:1 or 1:m merge. I show this below.

    My issue now is that my data is massive which makes the code below unfeasible. Are there any alternatives to this that would be feasible with large datasets?

    Code:
    clear
    input id hospid hospid_peer
    1 1000 101
    1 1000 102
    1 1001 101
    2 1002 102
    end
    tempfile data1
    save `data1', replace
    
    clear
    input hospid_peer peerid
    101 3
    101 4
    102 3
    1000 1
    1001 1
    1002 2
    2000 7
    2003 8
    end
    tempfile data2
    save `data2', replace
    
    use `data1', clear
    preserve
        use `data2', clear
        bys hospid_peer: gen n = _n
        reshape wide peerid, i(hospid_peer) j(n)
        tempfile data1_temp
        save `data1_temp'
    restore
    merge m:1 hospid_peer using `data1_temp', keep(master match) nogenerate

  • #2
    https://journals.sagepub.com/doi/ful...6867X211063416

    Comment


    • #3
      maybe -joinby-
      Code:
      . *reshape
      . use data2, clear
      
      . bys hospid_peer: gen n = _n
      
      . qui reshape wide peerid, i(hospid_peer) j(n)
      
      . tempfile data1_temp
      
      . save `data1_temp'
      file C:\Users\oyvinsni\AppData\Local\Temp\ST_3200_000001.tmp saved as .dta format
      
      . use data1, clear
      
      . merge m:1 hospid_peer using `data1_temp', keep(master match) nogenerate
      
          Result                      Number of obs
          -----------------------------------------
          Not matched                             0
          Matched                                 4  
          -----------------------------------------
      
      . qui reshape long peerid, i(id hospid hospid_peer)
      
      . drop if peerid == .
      (2 observations deleted)
      
      . sort id hospid hospid_peer peerid
      
      . list id hospid hospid_peer peerid, sep(10) abb(12) clean
      
             id   hospid   hospid_peer   peerid  
        1.    1     1000           101        3  
        2.    1     1000           101        4  
        3.    1     1000           102        3  
        4.    1     1001           101        3  
        5.    1     1001           101        4  
        6.    2     1002           102        3  
      
      . *joinby
      . use data1, clear
      
      . joinby hospid_peer using data2
      
      . sort id hospid hospid_peer peerid
      
      . list id hospid hospid_peer peerid, sep(10) abb(12) clean
      
             id   hospid   hospid_peer   peerid  
        1.    1     1000           101        3  
        2.    1     1000           101        4  
        3.    1     1000           102        3  
        4.    1     1001           101        3  
        5.    1     1001           101        4  
        6.    2     1002           102        3
      Last edited by Øyvind Snilsberg; 25 Aug 2022, 03:13.

      Comment


      • #4
        This is awesome, thank you both!

        Comment

        Working...
        X