Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging Datasets with 2 Variables

    Hello All,

    I am having troubles solving this issue:
    1. I have a dataset1 that has the variable hhid(no duplicates) that is a household id and a bunch of other variables
    2. I have another dataset2 where hhid can have duplicates but also have a variable pid that uniquely defines members of the household (pid does not exist in dataset1)
    I want to merge both datasets such as:
    • When hhid is duplicated in dataset2, I make a copy of the existing line in dataset1 for each unique pid that exists for that hhid on dataset2, insert the same pids from dataset2 into those lines then merge on pid.
    • When hhid is not duplicated in dataset2, just merge on hhid

  • #2
    If I understand you correctly you are describing exactly what
    Code:
    use dataset1, clear
    merge 1:m hhid using dataset2
    does.

    If that is not what you meant, then post back showing example data from both data sets. Make sure the examples include some observations that should pair up in the two data sets. And then show what you want the final result to look like.

    To show the data examples, please use the -dataex- command. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Thanks Clyde,

      The first dataset looks like this:
      Code:
      hhid14_9 ksr3type
      "001060000" "B"
      "001060004" "B"
      "001080000" "B"
      The second dataset looks like this:
      Code:
      hhid14_9 pid14
      "001060000"  9
      "001060000" 10
      "001060000" 11
      "001060004"  3
      "001060004"  4
      "001080003"  3
      "001080003"  4
      "001080003"  5
      "001080003"  6
      "001080003"  7
      The merged dataset should look like this:
      Code:
      hhid14_9 ksr3type pid14
      "001060000" "B" 9
      "001060000" "B" 10
      "001060000" "B" 11
      "001060004" "B" 3
      "001060004" "B" 4
      "001080000" "B" .
      I hope this makes more sense.
      Last edited by Ali Bouba; 27 Mar 2024, 17:18.

      Comment


      • #4
        Code:
        . * Example generated by -dataex-. For more info, type help dataex
        . clear
        
        . input str11 hhid14_9 str3 ksr3type
        
                hhid14_9   ksr3type
          1. `""001060000""' `""B""'
          2. `""001060004""' `""B""'
          3. `""001080000""' `""B""'
          4. end
        
        . tempfile dataset1
        
        . save `dataset1'
        file C:\Users\clyde\AppData\Local\Temp\ST_2a54_000001.tmp saved as .dta format
        
        .
        . * Example generated by -dataex-. For more info, type help dataex
        . clear
        
        . input str11 hhid14_9 byte pid14
        
                hhid14_9     pid14
          1. `""001060000""'  9
          2. `""001060000""' 10
          3. `""001060000""' 11
          4. `""001060004""'  3
          5. `""001060004""'  4
          6. `""001080003""'  3
          7. `""001080003""'  4
          8. `""001080003""'  5
          9. `""001080003""'  6
         10. `""001080003""'  7
         11. end
        
        . tempfile dataset2
        
        . save `dataset2'
        file C:\Users\clyde\AppData\Local\Temp\ST_2a54_000002.tmp saved as .dta format
        
        .
        . use `dataset1', clear
        
        . merge 1:m hhid14_9 using `dataset2', keep(master match)
        
            Result                      Number of obs
            -----------------------------------------
            Not matched                             1
                from master                         1  (_merge==1)
                from using                          0  (_merge==2)
        
            Matched                                 5  (_merge==3)
            -----------------------------------------
        
        . sort hhid14_9 pid14
        
        .
        . list, noobs clean
        
               hhid14_9   ksr3type   pid14            _merge  
            "001060000"        "B"       9       Matched (3)  
            "001060000"        "B"      10       Matched (3)  
            "001060000"        "B"      11       Matched (3)  
            "001060004"        "B"       3       Matched (3)  
            "001060004"        "B"       4       Matched (3)  
            "001080000"        "B"       .   Master only (1)
        In the future, when showing data examples, please use the -dataex- command to do so, as I have done here. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

        Comment


        • #5
          Thank you very much sir.

          Comment

          Working...
          X