How To Proceed With the Error Variable Merge Already Defined?

Roman Johnson

Join Date: Mar 2018

Posts: 87
#1

05 Apr 2018, 20:34

Can somebody tell me how to get around this? I don't understand my error message given that this is the first time I have used the variable merge. Here is my do file and output.

do file:

use "C:\Users\rjohn123\Documents\hh09dta_b3a\iiia_tb.dta", clear
merge m:1 folio ls using "C:\Users\rjohn123\Documents\hh09dta_b3b\iiib_gh.dta"
merge m:1 folio ls using "C:\Users\rjohn123\Documents\hh09dta_b3a\iiia_ed.dta"
merge m:1 folio ls using "C:\Users\rjohn123\Documents\hh09dta_b3a\iiia_hm1.dta"
merge m:1 folio ls using "C:\Users\rjohn123\Documents\hh09dta_b3a\iiia_iin.dta"
drop _merge
sort folio ls
save "C:\Users\rjohn123\Documents\09to12mflsdataset.dta", replace

output:

. do "C:\Users\rjohn123\AppData\Local\Temp\STD1050_000000.tmp"

. use "C:\Users\rjohn123\Documents\hh09dta_b3a\iiia_tb.dta", clear
(Ennvih-3 Libro 3a_portad)

. merge m:1 folio ls using "C:\Users\rjohn123\Documents\hh09dta_b3b\iiib_gh.dta"

Result                          # of obs.
-----------------------------------------
not matched                            97
    from master                        57  (_merge==1)
    from using                         40  (_merge==2)

matched                            24,887  (_merge==3)
-----------------------------------------

. merge m:1 folio ls using "C:\Users\rjohn123\Documents\hh09dta_b3a\iiia_ed.dta"
variable _merge already defined
r(110);

end of do-file

r(110);

.
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#2

05 Apr 2018, 21:04

Your first -merge- command creates the variable _merge. When you issue your second -merge- command, it attempts to create a variable _merge, but it can't do that because the one that the first -merge- command created is already there. So it complains and breaks. There are a couple of ways around this:

1. You can add the -nogenerate- option to each of your -merge- commands. That will prevent them from creating the _merge variable. The drawback to this is that you will have no way of identifying which observations in the resulting data set came from where. But if that doesn't matter for your purposes, then this may well be the simplest approach.

2. Instead, you can specify the -_merge()- option in each of your -merge- commands. Whatever name you specify there is what Stata will use instead of _merge itself when it creates the variable that shows where each observation came from. So you might want to give them names like _merge_gh, _merge_ed, etc. so that it will be clear when all is said and done what came from where.

3. Another approach would be to insert additional code between the -merge- commands that verifies that the resulting data set contains the number of observations from each source that you expect it to and then drops _merge before proceeding to the next -merge- command.
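
The three approaches can be sketched on toy data as follows. This is only an illustration under stated assumptions, not the poster's actual files: the variables id, x, y, z, w and the tempfiles are hypothetical, and the indicator variable is named with the generate() option, which is the option listed in the current help merge.

```stata
* Toy data only: id, x, y, z, w and the tempfiles are hypothetical.
clear
input id y
1 10
2 20
end
tempfile uy
save `uy'

clear
input id z
1 100
2 200
end
tempfile uz
save `uz'

clear
input id w
1 7
2 8
end
tempfile uw
save `uw'

* Master data: several observations per id, hence m:1
clear
input id x
1 1
1 2
2 3
end

* Approach 1: do not create an indicator variable at all
merge m:1 id using `uy', nogenerate

* Approach 2: give the indicator its own name
merge m:1 id using `uz', generate(m_z)

* Approach 3: keep the default _merge, check it, then drop it
merge m:1 id using `uw'
assert _merge == 3      // every observation is expected to match in this toy example
drop _merge

list, sepby(id)
```

In real data the check in approach 3 would be whatever condition you expect to hold, for example a count of unmatched observations, rather than a blanket assert that everything matched.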
Roman Johnson

Join Date: Mar 2018

Posts: 87
#3

05 Apr 2018, 22:27

Okay, so this code:

use "C:\Users\rjohn123\Documents\hh09dta_b3a\iiia_tb.dta", clear
_merge_gh m:1 folio ls using "C:\Users\rjohn123\Documents\hh09dta_b3b\iiib_gh.dta"
_merge_ed m:1 folio ls using "C:\Users\rjohn123\Documents\hh09dta_b3a\iiia_ed.dta"
_merge_hml m:1 folio ls using "C:\Users\rjohn123\Documents\hh09dta_b3a\iiia_hm1.dta"
_merge_iin m:1 folio ls using "C:\Users\rjohn123\Documents\hh09dta_b3a\iiia_iin.dta"
drop _merge
sort folio ls
save "C:\Users\rjohn123\Documents\09to12mflsdataset.dta", replace

But I got the following:

. do "C:\Users\rjohn123\AppData\Local\Temp\STD1050_000000.tmp"

. use "C:\Users\rjohn123\Documents\hh09dta_b3a\iiia_tb.dta", clear
(Ennvih-3 Libro 3a_portad)

. _merge_gh m:1 folio ls using "C:\Users\rjohn123\Documents\hh09dta_b3b\iiib_gh.dta"
command _merge_gh is unrecognized
r(199);

end of do-file

r(199);
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

06 Apr 2018, 06:31

You misunderstood what Clyde wrote, and Clyde complicated matters by apparently referring to an old syntax that does not appear in the output of help merge.

I believe that what you show in post #3 as

Code:

_merge_gh m:1 folio ls using "C:\Users\rjohn123\Documents\hh09dta_b3b\iiib_gh.dta"

should be

Code:

merge m:1 folio ls using "C:\Users\rjohn123\Documents\hh09dta_b3b\iiib_gh.dta", generate(gh)

if what you intended was to generate the variable gh rather than the variable _merge to contain the indicator of the results of this merge.
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#5

06 Apr 2018, 08:28

William is correct, the old _merge() option has been superseded by the generate() option. My apologies for confusing the issue.
Roman Johnson

Join Date: Mar 2018

Posts: 87
#6

06 Apr 2018, 22:02

Hi, I am now getting an error about my merging variable folio, when previously I hadn't. (I don't need the variable ls, per the user's guide for this dataset.) Can someone please tell me why this might be happening? I wasn't having an issue with my merge variable until I tried the advice given about my previous merge error.

Code:

use "C:\Users\rjohn123\Documents\hh09dta_b3a\iiia_tb.dta", clear
sort folio
merge m:1 folio using "C:\Users\rjohn123\Documents\hh09dta_b3b\iiib_gh.dta", gen (gh)
merge m:1 folio using "C:\Users\rjohn123\Documents\hh09dta_b3a\iiia_ed.dta", gen (ed)
merge m:1 folio using "C:\Users\rjohn123\Documents\hh09dta_b3a\iiia_hm1.dta", gen (hm1)
merge m:1 folio using "C:\Users\rjohn123\Documents\hh09dta_b3a\iiia_iin.dta", gen (iin)
drop merge
save "C:\Users\rjohn123\Documents\09to12mflsdataset.dta", replace

Output:

. merge m:1 folio using "C:\Users\rjohn123\Documents\hh09dta_b3b\iiib_gh.dta", gen (gh)
variable folio does not uniquely identify observations in the using data
r(459);
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#7

06 Apr 2018, 22:23

The issue has nothing to do with the advice you were given about your previous problem. This issue arises because you dropped the variable ls from the merge key and are now keying the merge on folio alone.

The error message is self explanatory: in the file iiib_gh.dta, the variable folio does not uniquely identify observations. Otherwise put, this file contains some value(s) of the variable folio for which more than one observation is present.

You state that you dropped the variable ls because the user's guide for the data set says it isn't needed. It would not be the first time that a user's guide is wrong in this way. Perhaps you really do need the variable ls as part of the merge key. It is also possible that your data sets either came to you with errors or have been corrupted subsequently.

Anyway, the first step is to find the offending observations:

Code:

use iiib_gh, clear
duplicates report folio
duplicates tag folio, gen(flag)
browse if flag

This will show you a count of the values of folio that have any given number of copies in the data set. And the browser will show you the offending observations, so you can try to figure out why they are there and what to do about them.

It may be that some of these observations are duplicates not just with respect to the variable folio but are complete duplicates. Such surplus observations can be dropped leaving just one behind, and the -duplicates drop- command will do that for you. Another innocuous possibility is that there are a few completely empty observations stuck onto the end of the data set (i.e. observations where every variable contains a missing value). I have seen this often.
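
A sketch of those two innocuous cases on toy data, assuming hypothetical variables id and v rather than the actual contents of iiib_gh.dta: all-missing observations are dropped, complete duplicates are removed with -duplicates drop-, and -isid- then confirms that the key is unique.

```stata
* Toy data only: id and v are hypothetical stand-ins for folio and the other variables.
clear
input id v
1 5
1 5
2 6
. .
. .
end

* Drop observations in which every variable is missing
drop if missing(id) & missing(v)

* Drop surplus copies of observations that are identical on all variables
duplicates drop

* isid exits with an error if id still fails to identify observations uniquely
isid id
```

If -isid- still fails after these two steps, the remaining duplicates carry conflicting information and need to be inspected by hand.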

But probably there will be some observations that have the same folio value along with conflicting information on other variables. In that case you will need to figure out how to reconcile those conflicts. It may be possible to identify which one is correct. Or it may be that the results for other variables have to be combined in some way. The solution ultimately relies on an understanding of the data, what it means, how it was gathered, etc. It is not a question of statistics or coding. Colleagues in your own discipline who have worked with these particular data sets before may be able to offer good advice on resolving these problems.

The fact is that even data sets from organizations with an excellent reputation often contain problems like this; in fact I would say that this is the rule rather than the exception. It is a fact of life in data management that most data sets come with errors and most data sets are not exactly as described, nor as you expect them to be.
Roman Johnson

Join Date: Mar 2018

Posts: 87
#8

09 Apr 2018, 12:54

Hi, I figured out how to handle the duplicates, but now Stata thinks my merge command is a variable.

Code:

use "C:\Users\rjohn123\Documents\iiia_tb dup.dta", clear
merge m:1 pid_link using "C:\Users\rjohn123\Documents\iib_gh dup.dta"
merge m:1 pid_link using "C:\Users\rjohn123\Documents\iiia_ed dup.dta"
merge m:1 pid_link using "C:\Users\rjohn123\Documents\iiia_hm1 dup.dta"
merge m:1 pid_link using "C:\Users\rjohn123\Documents\hh09dta_b3a\iiia_iin.dta"
drop merge
sort pid_link
save "C:\Users\rjohn123\Documents\09to12mflsdataset.dta", replace

Output:
. use "C:\Users\rjohn123\Documents\iiia_tb dup.dta", clear
(Ennvih-3 Libro 3a_portad)

. merge m:1 pid_link using "C:\Users\rjohn123\Documents\iib_gh dup.dta"

Result                          # of obs.
-----------------------------------------
not matched                            57
    from master                        57  (_merge==1)
    from using                          0  (_merge==2)

matched                            24,927  (_merge==3)
-----------------------------------------

. merge m:1 pid_link using "C:\Users\rjohn123\Documents\iiia_ed dup.dta"
variable _merge already defined
r(110);

end of do-file

r(110);

Can someone please tell me what I need to do to fix this? Thank you.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#9

09 Apr 2018, 13:10

This is exactly the same problem that you describe in post #1. You need to add the generate() option to each of the merge commands as you did in post #6.
Roman Johnson

Join Date: Mar 2018

Posts: 87
#10

09 Apr 2018, 13:18

Yes, I know, but I dropped them because I have the same problem, William.

Thanks for your help.

code:

use "C:\Users\rjohn123\Documents\iiia_tb dup.dta", clear
merge m:1 pid_link using "C:\Users\rjohn123\Documents\iib_gh dup.dta", gen (gh1)
merge m:1 pid_link using "C:\Users\rjohn123\My Documents\iiia_ed dup.dta", gen (ed1)
merge m:1 pid_link using "C:\Users\rjohn123\Documents\iiia_hm1 dup.dta", gen (hm2)
merge m:1 pid_link using "C:\Users\rjohn123\Documents\hh09dta_b3a\iiia_iin.dta", gen (iin1)
drop merge
sort pid_link
save "C:\Users\rjohn123\Documents\09to12mflsdataset.dta", replace

output:
. use "C:\Users\rjohn123\Documents\iiia_tb dup.dta", clear
(Ennvih-3 Libro 3a_portad)

. merge m:1 pid_link using "C:\Users\rjohn123\Documents\iib_gh dup.dta", gen (gh1)

Result                          # of obs.
-----------------------------------------
not matched                            57
    from master                        57  (gh1==1)
    from using                          0  (gh1==2)

matched                            24,927  (gh1==3)
-----------------------------------------

. merge m:1 pid_link using "C:\Users\rjohn123\My Documents\iiia_ed dup.dta", gen (ed1)

Result                          # of obs.
-----------------------------------------
not matched                            40
    from master                        40  (ed1==1)
    from using                          0  (ed1==2)

matched                            24,944  (ed1==3)
-----------------------------------------

. merge m:1 pid_link using "C:\Users\rjohn123\Documents\iiia_hm1 dup.dta", gen (hm2)

Result                          # of obs.
-----------------------------------------
not matched                           564
    from master                       564  (hm2==1)
    from using                          0  (hm2==2)

matched                            24,420  (hm2==3)
-----------------------------------------

. merge m:1 pid_link using "C:\Users\rjohn123\Documents\hh09dta_b3a\iiia_iin.dta", gen (iin1)

Result                          # of obs.
-----------------------------------------
not matched                            40
    from master                        40  (iin1==1)
    from using                          0  (iin1==2)

matched                            24,944  (iin1==3)
-----------------------------------------

. drop merge
variable merge not found
r(111);

end of do-file

r(111);
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#11

09 Apr 2018, 13:28

You need to try to think about what your commands are doing. And you need to read the error messages you are getting and understand what they mean. The problems you report in #8 and #10 are different. Read the error messages: they are pretty self-explanatory. In #8 Stata is complaining that _merge already exists. In #10, Stata is complaining that merge (note: no _ at the beginning) does not exist. These are completely different situations.

In #10 each -merge- command creates a new variable with the name specified in the -gen()- option. So your new variables are named gh1, ed1, hm2, and iin1. None of them is named merge. So when you ask Stata to -drop merge- it appropriately complains that there is no such variable.

Just skip the -drop merge- command. If you are tight on memory and you need to eliminate gh1 ed1 hm2 and iin1, then -drop- those instead.

In #8, because you do not use the -gen()- option, each -merge- command attempts to create a variable called _merge. But you can't do that in the second or subsequent -merge-s because _merge has already been created by the first -merge- command. So the solution is to either use the -gen()- option to avoid this kind of name clash, or you can skip the -gen()- option, but then you must -drop _merge- (note: the initial _ before merge is mandatory!!), and you must do this following each and every one of the -merge- commands, not just at the end of the whole series of them.
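
The difference between the two errors can be reproduced on toy data. In this sketch the variables id, x, y, z and the tempfiles are hypothetical; -capture- traps each error so that its return code can be checked, and the final lines show the drop-after-every-merge pattern.

```stata
* Toy data only: id, x, y, z and the tempfiles are hypothetical.
clear
input id y
1 10
2 20
end
tempfile uy
save `uy'

clear
input id z
1 100
2 200
end
tempfile uz
save `uz'

clear
input id x
1 1
1 2
2 3
end

merge m:1 id using `uy'          // creates _merge

* A second merge without generate() fails: _merge already exists
capture merge m:1 id using `uz'
assert _rc == 110

* Dropping "merge" (no underscore) fails: there is no such variable
capture drop merge
assert _rc == 111

* Dropping _merge after each merge lets the next one run
drop _merge
merge m:1 id using `uz'
drop _merge
```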
Roman Johnson

Join Date: Mar 2018

Posts: 87
#12

09 Apr 2018, 18:13

Thanks, Clyde! I know now!