Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging in Stata version 12 without specifying the merge type like 1:m, what happens then with what consequences?

    Dear Statalists,

    Merging in Stata version 12 without specifying the merge type like 1:m, what happens then with what consequences?
    When I try to merge a 4th dataset to my previously merged three datasets, I cannot find/use an unique key variable for that merge, however, when I do not specify 1:1, 1:m, m:1 or m:m for that merge, the merge seems to go perfectly but I'm afraid that not specifying the merge type means using a m:m merge type which I shouldn't use.


  • #2
    If you don't specify 1:m, m:1, 1:1, (or m:m) you are using the old syntax for -merge-. If you don't know what that syntax does, you should probably not use it.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    Comment


    • #3
      Lydia,

      Merging in Stata is neatly explained here and here. If you want to conducted more sophisticated merging you may have a look at user-written programmes, like xmerge, mmerge and similar. When performing merging it is good to consider uniqueness of your ids and explore newly obtained data, as suggested here. How you decide to merge depends on what you want to achieve with the data.
      Kind regards,
      Konrad
      Version: Stata/IC 13.1

      Comment


      • #4
        Both versions of merge work the same way, it's just that the old version will not check your stated assumptions about identifiers in both the master and the using datasets. Your instincts tell you not to use an m:m merge but you want to merge anyway. Using the old syntax to get around your problem is going to lead to a m:m merge with your data. Since that is not illegal, merge will not complain and you will get results.

        With merge, it always pays to practice with toy examples to make sure you understand what's happening. Please run the sample code below and carefully study each step. In both datasets, id does not uniquely identify observations. The example tries 1:1, 1:m, and m:1 merge and all yield an error because of the lack of unique identifiers. Then I show that the m:m merge results are the same as those produced with the old syntax of merge.

        Since you don't have unique identifiers in both datasets, you have to decide how to combine observations with the same key value. The first with the first, the second with the second. What happens when the master has less observations that the using (or vice versa)? The last two sections of the sample code below show two possible approaches. Usually, the joinby command is what makes sense in these cases.

        Code:
        clear
        input str1 id x
        A 1 1
        A 2 2
        B 2 3
        end
        bysort id: gen obsno = _n
        tempfile fx
        qui save "`fx'"
        list, sepby(id) noobs
        
        clear
        input str1 id y
        A 11
        A 12
        A 13
        A 14
        B 12
        B 13
        B 14
        end
        bysort id: gen obsno = _n
        tempfile fy
        qui save "`fy'"
        list, sepby(id) noobs
        
        * id does not uniquely identify obs in either datasets; use
        * capture and noisily to ignore the error but show the error message
        use "`fx'", clear
        capture noisily merge 1:1 id using "`fy'"
        capture noisily merge 1:m id using "`fy'"
        capture noisily merge m:1 id using "`fy'"
        
        * Stata will let you do a m:m merge on these data
        use "`fx'", clear
        merge m:m id using "`fy'"
        sort id x y
        list, sepby(id) noobs
        
        * the old syntax just did not check for unique identifiers
        * so no errors are reported
        use "`fx'", clear
        merge id using "`fy'"
        
        * the results are the same as a m:m merge using the new syntax
        sort id x y
        list, sepby(id) noobs
        
        * there's only one observation with "B" in the first dataset so for
        * id == "B", this is equivalent to a 1:m merge. So the merged dataset
        * contains all pairwise combinations of x and y.
        list if id == "B", sepby(id) noobs
        
        * the problem with m:m merge is that x and y are paired in a way
        * that can't be explained logically. Why is 1 matched only with 11 and 
        * 2 is matched with 12,13,14?
        list if id == "A", sepby(id x) noobs
        
        * If you can't find unique identifiers in either dataset, it would
        * perhaps make more sense to add obsno to the merge
        use "`fx'", clear
        merge 1:1 id obsno using "`fy'"
        sort id x y
        list, sepby(id) noobs
        
        * But usually, what is desired is that every observation of the master
        * dataset be considered unique for the purpose of the merge and match
        * all observations with the same key value in the using dataset. That's
        * exactly what -joinby- does
        use "`fx'", clear
        joinby id using "`fy'"
        sort id x y
        list, sepby(id x) noobs

        Comment


        • #5
          I don't know if the following example will help you understand what joinby does but it does connect it with merge by showing that you can replicate joinby results using multiple 1:m merge.

          Code:
          clear
          input str1 id x
          A 1
          A 2
          A 3
          B 2
          end
          tempfile fx
          qui save "`fx'"
          list, sepby(id) noobs
          
          clear
          input str1 id y
          A 11
          A 12
          A 13
          A 14
          B 12
          B 13
          B 14
          end
          tempfile fy
          qui save "`fy'"
          list, sepby(id) noobs
          
          * Form all pairwise combinations within id
          use "`fx'", clear
          joinby id using "`fy'"
          sort id x y
          list, sepby(id x) noobs
          
          * Replicate the -joinby- results using repeated 1:m merges.
          * merge the first observation of each id in master with using
          use "`fx'", clear
          bysort id (x): keep if _n == 1
          list, sepby(id) noobs
          merge 1:m id using "`fy'", keep(master match) nogen
          sort id x y
          list, sepby(id) noobs
          tempfile obs1
          qui save "`obs1'"
          
          * merge the second observation of each id in master with using
          use "`fx'", clear
          bysort id (x): keep if _n == 2
          list, sepby(id) noobs
          merge 1:m id using "`fy'", keep(master match) nogen
          sort id x y
          list, sepby(id) noobs
          tempfile obs2
          qui save "`obs2'"
          
          * merge the third observation of each id in master with using
          use "`fx'", clear
          bysort id (x): keep if _n == 3
          list, sepby(id) noobs
          merge 1:m id using "`fy'", keep(master match) nogen
          sort id x y
          list, sepby(id) noobs
          
          * combine all merges to replicate the -joinby- results
          append using "`obs2'"
          append using "`obs1'"
          sort id x y
          list, sepby(id x) noobs

          Comment


          • #6
            Lydia,

            The recommendation is to read (and re-read) help and manual entries for the commands, before going to non-official sources. If you run help merge and help joinby, you get links at the top of the help files that open pdf manuals which cover the commands extensively. Practice examples like that provided by Robert (lots of them) are of course very useful.
            You should:

            1. Read the FAQ carefully.

            2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

            3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

            4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.

            Comment


            • #7
              I would not use risky to describe each task. Using m:m merge is simply reckless. There are many instances where joinby is appropriate. If joinby does not seem appropriate for your particular predicament then you still have to decide what is the correct outcome. We can't see your data so it's hard to make additional suggestions. Perhaps you could post an example where there are multiple observations in both the master and using dataset for a particular value of the key variable and how you would mix these together.

              Comment


              • #8
                Originally posted by LydiaSmit
                I'm very curious about the pdf manuals at the top of the help files. Where can I find them? Using help merge and help joinby do not show them.
                The blue link under Title should take you to the manual entry for merge.

                Click image for larger version

Name:	stata_help_merge.png
Views:	1
Size:	62.5 KB
ID:	63908

                Another way is through the Help > PDF Documentation menu in the main Stata window. But that will not take you to the specific entry; you can search for it though.
                You should:

                1. Read the FAQ carefully.

                2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

                3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

                4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.

                Comment

                Working...
                X