  • Sequential merging error

    If there is a database into which you want to merge two datasets (A and B) one at a time, and both A and B have the same variable names, does Stata not merge in the second?

  • #2
    If you do something like
    Code:
    use some_dataset
    merge 1:1 keyvars using A, nogenerate
    merge 1:1 keyvars using B
    Stata will, indeed, -merge- B, but the variables that B shares with the master data set will not change. When Stata does a -merge-, variables in the master data set that are also found in the using data set are left unchanged. If you want to change them, then you need to specify either the -update- or -update replace- options. If you specify -update-, then any missing values of these variables in the master data set will be replaced by the corresponding values in the using data set. If you specify -update replace-, then the values of those variables in the master data set will be replaced by the corresponding values in the using data set wherever the using value is nonmissing (a nonmissing master value is never replaced by a missing one).
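
To make the three behaviors concrete, here is a minimal sketch on toy data. The variable names (id, x), the values, and the tempfile names are made up for illustration and are not from the original question.

```stata
* Hypothetical toy data: id is the key, x is the variable shared by both files
clear
input id x
1 10
2 .
3 30
end
tempfile master
save `master'

clear
input id x
1 11
2 22
3 .
end
tempfile using_b
save `using_b'

* Default: x in the master is left unchanged
use `master', clear
merge 1:1 id using `using_b', nogenerate
list

* update: only missing master values are filled in (x for id 2 becomes 22)
use `master', clear
merge 1:1 id using `using_b', update nogenerate
list

* update replace: nonmissing using values overwrite the master (x for id 1 becomes 11)
use `master', clear
merge 1:1 id using `using_b', update replace nogenerate
list
```

Dropping -nogenerate- from any of these calls keeps the _merge variable, which is a convenient way to see which observations were updated.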

    But here's the thing. If you do:
    Code:
    use some_dataset
    merge 1:1 keyvars using A, nogenerate
    merge 1:1 keyvars using B, update replace
    then, at the end, it will appear that A was not merged in. In fact, it was merged in, but the values it contributed to the shared variables then got overwritten by B.

    If you want a data set that, in the end, contains both the values from datasets A and B, then you need to rename the variables in one (or both) of the data sets before -merge-ing. That said, doing this will result in a wide data set. It is likely that you would be better off not doing this, and instead -append-ing A and B, and then -merge-ing that result with the first data set. This will give you the data from both A and B in a long layout, which is more generally useful for data management and analysis in Stata.
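
The -append- then -merge- route described above might look like the following sketch. Everything concrete in it is an assumption for illustration: the key id, the shared variable x, the extra variable z in the first data set, and the tempfile names. Because the same id can appear in both A and B, the combined file can repeat the key, so the sketch uses a 1:m merge.

```stata
* Hypothetical toy data: id is the key, x is the variable shared by A and B
clear
input id x
1 10
2 20
end
tempfile A
save `A'

clear
input id x
1 11
3 31
end
tempfile B
save `B'

* Hypothetical first data set with its own variable z
clear
input id z
1 100
2 200
3 300
end
tempfile main
save `main'

* Stack A and B; source is 0 for rows from A and 1 for rows from B
use `A', clear
append using `B', generate(source)
tempfile AB
save `AB'

* The key can now repeat in the using data, so the merge is 1:m
use `main', clear
merge 1:m id using `AB'
sort id source
list
```

The source variable created by -append, generate()- records which file each row came from, so nothing is overwritten and the A and B values for id 1 sit side by side in long layout.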



    • #3
      What if I only use "update"? That would mean it would "truly" merge in the variables from dataset B as well and I'll get the maximum data, correct?



      • #4
        No, the "update" option replaces missing data in the master with non-missing data (if any) in the using dataset; please read
        Code:
        h merge
        other than that, what you actually want is still unclear to me



        • #5
          Re #3: Rich Goldstein is correct. Please re-read the final paragraph of my response in #2 to see how to get a data set that includes both the A and B data.



          • #6
            I see. So the "update" option only replaces existing missing data, but does not add new data. Got it. So with update, what I want would only work in the case that datasets A and B have non-overlapping samples and identical variables. Which is why I guess it worked in my case when I tried it. But I understand the better practice is to append A and B and then merge that product into the main DB. Thanks all!
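
For reference, the non-overlapping case described in this post can be sketched as follows. The data, the key id, the variables x and z, and the tempfile names are all hypothetical. After A is merged in, x is missing for the ids that only B covers, and -update- fills exactly those gaps.

```stata
* Hypothetical main data set covering all ids
clear
input id z
1 100
2 200
3 300
4 400
end
tempfile main
save `main'

* A and B share the variable x but cover different ids
clear
input id x
1 10
2 20
end
tempfile A
save `A'

clear
input id x
3 30
4 40
end
tempfile B
save `B'

use `main', clear
merge 1:1 id using `A', nogenerate
* ids 3 and 4 matched nothing in A, so x is missing there;
* update lets B fill only those missing values
merge 1:1 id using `B', update nogenerate
list
```

If A and B did share an id, -update- would keep A's nonmissing value for it and B's value would be silently left out, which is why the -append- route is the safer general approach.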

