merge

sahar shabani

Join Date: Feb 2019

Posts: 14
#1

19 Feb 2019, 06:01

Hi, please help me.
I have two datasets whose common variables for merging are state, index_state and year. But when merging I got an error: variables index_state state year do not uniquely identify observations in the master data or using data.
Do you have any solution?

Last edited by sahar shabani; 19 Feb 2019, 06:32.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

19 Feb 2019, 06:20

Welcome to the Stata Forum / Statalist.

Please read the FAQ and take a look at the recommendations concerning sharing data/command/output.

You didn't present the command, but I suspect it was - merge 1:1 - , hence you should think about using - m:1 - instead.

This is just a tentative approach. Hopefully that helps.
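
As a rough sketch of that suggestion: before choosing between 1:1, m:1 and 1:m, -isid- can show which file, if any, is unique on the key. The file names master.dta and using.dta are hypothetical placeholders, not taken from the original post.

```stata
* Hypothetical file names; replace with your own.
use "master.dta", clear
capture noisily isid state index_state year   // is the key unique in the master file?

use "using.dta", clear
capture noisily isid state index_state year   // is the key unique in the using file?

* If only the using file is unique on the key, m:1 is the candidate:
use "master.dta", clear
merge m:1 state index_state year using "using.dta"
tabulate _merge                               // how many observations matched
```

If -isid- complains for both files, none of 1:1, m:1 or 1:m will run, because each of them requires the key to be unique in at least one of the two files.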

Best regards,

Marcos
sahar shabani

Join Date: Feb 2019

Posts: 14
#3

19 Feb 2019, 07:08

Thank you, @Marcos Almeida.
I have tried all of them (1:1, m:1 and 1:m), but none of them solved the problem.

Last edited by sahar shabani; 19 Feb 2019, 07:22.
Mike Lacy

Join Date: Apr 2014

Posts: 2423
#4

19 Feb 2019, 08:41

I would echo Marcos's suggestions regarding sharing data. Beyond that, I believe there are two possibilities for your problem:

1) There is bad data in one or both of your files, and there should *not* be duplicated key values in one or the other of them. To check that, the built-in -duplicates- command might help.

2) The data is OK, but you really need something different than a -merge-. To determine what you need, we would need more explanation of the conceptual relationship between your two data files. That is: What is the structure of your master file, and what do you want your other ("using") file to add to it? If you could provide example data that illustrates on a small scale the problem you are having, that would be an important way to clarify 2).
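
A minimal sketch of the check in 1), assuming a hypothetical file name master.dta and a hypothetical new variable dup. Since 1:1, m:1 and 1:m all failed, the key is presumably duplicated in both files, so the same steps would be repeated on the using file.

```stata
* Hypothetical file name; repeat the same steps for the using file.
use "master.dta", clear

* Summary of how many observations share each key combination
duplicates report state index_state year

* dup = number of additional observations with the same key (0 = unique)
duplicates tag state index_state year, generate(dup)

* Inspect the offending observations side by side
sort state index_state year
list if dup > 0, sepby(state index_state year)
```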
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#5

19 Feb 2019, 08:42

Please present an abridged example of the variables in each data set. Depending on both, you may need - joinby - instead.
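
In case -joinby- turns out to be what is needed, a minimal sketch follows; the file names are hypothetical. It forms all pairwise combinations of observations that share the same state, index_state and year, so the result can be much larger than either file.

```stata
* Hypothetical file names; replace with your own.
use "master.dta", clear

* unmatched(both) keeps observations without a partner and creates _merge;
* by default -joinby- keeps only matched combinations
joinby state index_state year using "using.dta", unmatched(both)
tabulate _merge
```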

P.S.: Crossed with Mike's reply.

Last edited by Marcos Almeida; 19 Feb 2019, 08:47.

Best regards,

Marcos
Khairul Kamarudin

Join Date: Feb 2019

Posts: 6
#6

19 Feb 2019, 20:04

Dear Sahar

If you are using merge 1:1, then neither the master nor the 'using' file may contain duplicates of the unique identifier.

Use the following command to check for duplicates (x1 and x2 stand for your unique identifiers):
duplicates list x1 x2

Remove the duplicates using the syntax below:

duplicates drop x1 x2, force

Cheers

Khairul
Clyde Schechter

Join Date: Apr 2014

Posts: 30169
#7

19 Feb 2019, 20:26

I agree with Mr. Kamarudin's advice to check for duplicates. I disagree with the advice to use -duplicates drop x1 x2, force-.

If there are observations that agree on x1 and x2 (hence duplicates) but disagree on other variables (hence the need to use -force-), then applying that command will arbitrarily keep one among the disagreeing observations, and the information in the other x1 x2 duplicating observations will be discarded. This is not good data management. If duplicates of x1 and x2 are found that are not completely duplicate observations on all variables, then those observations should be examined to determine why there are these discrepancies. There are several possibilities.

1. Some of the observations are errors, and there is an identifiable unique correct observation for each x1 x2 that should be kept. In that case go about -drop-ing those observations that are wrong, and retain the correct ones only. That cannot be accomplished with -duplicates drop, force-. Usually it requires some sequence of -drop if- or -keep if- commands.

2. There are supposed to be multiple observations for combinations of x1 and x2. This possibility splits into three subcases.
2a. Each of these observations is partial information and the observations need to be combined in some way. For example, perhaps (some of) the other variables that disagree need to be averaged, or the largest value used, or something like that. -collapse- is often useful in this setting.
2b. x1 and x2 are not sufficient to identify the observations for merging, but with some additional variables we can. So you need to figure out which additional variables, combined with x1 and x2, provide a satisfactory unique identification of observations. These additional variables, of course, must also appear in the other data set involved in the merging.
2c. There really aren't any meaningful sets of variables that uniquely identify observations. In this case, merging the data sets is not possible, as there is no way to determine which observation in one data set is to be paired up with which observation in the other. Sometimes in this situation, what is really wanted is to pair each observation in the first data set with every observation in the other data set that has the same values of x1 and x2. In that case, the -merge- command is not appropriate; the correct command for this is -joinby-. See -help joinby-.
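
Cases 1, 2a and 2b can be illustrated on toy data. Everything in this sketch is made up for illustration: the values, the extra variable x3, the outcome y, and the rule used to decide which observation is "wrong". The three cases are alternatives, so each is wrapped in -preserve- / -restore- where it changes the data.

```stata
* Toy data: x1 x2 are duplicated for the first two observations
clear
input x1 x2 x3 y
1 2019 1 10
1 2019 2 14
2 2019 1  7
end

* Case 2b: x1 x2 alone are not a key, but adding x3 makes one
capture noisily isid x1 x2      // reports that x1 x2 are not unique
isid x1 x2 x3                   // silent, so x1 x2 x3 is a valid key

* Case 1: drop the observations known to be wrong (hypothetical rule)
preserve
drop if x3 == 2
isid x1 x2
restore

* Case 2a: combine the partial observations, here by averaging y
preserve
collapse (mean) y, by(x1 x2)
list
restore
```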

Bottom line: -duplicates drop x1 x2, force- is a dangerous command and should only be used if you are absolutely certain that the disagreements among the observations having common values of x1 and x2 are completely irrelevant to what you will be doing with the data. That condition is seldom met in real life.
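
A safer first step, sketched here on made-up toy data, is -duplicates drop- without a variable list and without -force-. It removes only observations that are identical on every variable, and whatever still duplicates x1 x2 afterwards is exactly the set of disagreeing observations that needs examination. The variable names y and dup are hypothetical.

```stata
* Toy data: one exact copy, and one pair that disagrees on y
clear
input x1 x2 y
1 2019 10
1 2019 10
2 2019  7
2 2019  9
end

* Drops only the exact copy; no information is lost
duplicates drop

* Flag what still duplicates the key
duplicates tag x1 x2, generate(dup)

* These observations agree on x1 x2 but disagree on y
list if dup > 0
```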
Khairul Kamarudin

Join Date: Feb 2019

Posts: 6
#8

20 Feb 2019, 03:49

Thank you very much, Clyde.
You make good points, and I agree with you.
One needs to be extremely careful with the 'force' option.