Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging duplicated observations between two datasets

    Hello Statalist users,

    Let me take an example of the problem I encounter. I have two datasets, namely Using and Master. Master is panel data with an old ID for each firm while Using contains the new IDs. I need to merge Using with Master so that firms in the panel dataset will have new IDs.

    The problem is that in Using, firm "1" has two new IDs (e.g., 101 & 102). I expect that after merging, the data of firm "1" in Master should duplicate and turn into two different firms. To do so, I first use -expand- command with Master data, and then -merge m:m-. However, the result is not as I expected.

    Please take a look at the data below and advise me on what to do. Thank you very much in advance.

    An illustration of Using is as below:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long old_id float(new_id treated)
    1 101 0
    1 102 0
    2 200 1
    3 300 1
    end
    label values old_id _id
    label def _id 1 "a", modify
    label def _id 2 "b", modify
    label def _id 3 "c", modify
    An illustration of Master is as below:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(old_id at t)
    1 380 1999
    1 535 2000
    1 234 2001
    2 567 1999
    2 100 2000
    2 233 2001
    3 674 1999
    3 120 2000
    3 546 2001
    end
    I expect the data file after merging as follows:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(old_id at t new_id treated)
    1 380 1999 101 0
    1 535 2000 101 0
    1 234 2001 101 0
    2 567 1999 200 1
    2 100 2000 200 1
    2 233 2001 200 1
    3 674 1999 300 1
    3 120 2000 300 1
    3 546 2001 300 1
    1 380 1999 102 0
    1 535 2000 102 0
    1 234 2001 102 0
    end
    Last edited by TramAnh Nguyen; 06 May 2022, 05:31.

  • #2
    According to the Book of StataCorp, verse 10:
    Because m:m merges are such a bad idea, we are not going to show you an example. If you think that you need an m:m merge, then you probably need to work with your data so that you can use a 1:m or m:1 merge.

    Comment


    • #3
      You need joinby.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long old_id float(new_id treated)
      1 101 0
      1 102 0
      2 200 1
      3 300 1
      end
      label values old_id _id
      label def _id 1 "a", modify
      label def _id 2 "b", modify
      label def _id 3 "c", modify
      
      tempfile two
      save `two'
      
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(old_id at t)
      1 380 1999
      1 535 2000
      1 234 2001
      2 567 1999
      2 100 2000
      2 233 2001
      3 674 1999
      3 120 2000
      3 546 2001
      end
      
      joinby old_id using `two'
      Res.:

      Code:
      . l, sep(0)
      
           +----------------------------------------+
           | old_id    at      t   new_id   treated |
           |----------------------------------------|
        1. |      1   380   1999      102         0 |
        2. |      1   380   1999      101         0 |
        3. |      1   535   2000      102         0 |
        4. |      1   535   2000      101         0 |
        5. |      1   234   2001      102         0 |
        6. |      1   234   2001      101         0 |
        7. |      2   567   1999      200         1 |
        8. |      2   100   2000      200         1 |
        9. |      2   233   2001      200         1 |
       10. |      3   674   1999      300         1 |
       11. |      3   120   2000      300         1 |
       12. |      3   546   2001      300         1 |
           +----------------------------------------+

      Comment


      • #4
        Yep, as Andrew notes, allow me to introduce you to joinby, the real m:m merge

        Comment


        • #5
          -joinby- is exactly what I need. Thank you two very much for the code and the article.

          Comment

          Working...
          X