variable date does not uniquely identify observations in the master data stata

Pratheesh Omana-Sudhakaran

Join Date: Jul 2014
Posts: 15

variable date does not uniquely identify observations in the master data stata

21 Jul 2014, 16:07

Hai
I was trying to merge two data sets using "date variable" as common. One of the data sets is a panel data with multiple observations for a day and the another have weather observations for each day and some days there are no observations. When I tried to merge these two datasets, I got the error message "variable date does not uniquely identify observations in the master data stata".

My master dataset looks like:

date	landwk	dealr	quantity	price	spp
1-Jan-07	2007w1	10	40	1	5
1-Jan-07	2007w1	10	583.1	0.840336	4
1-Jan-07	2007w1	10	31.57143	1.12	1
1-Jan-07	2007w1	10	1.565217	0.69	1
2-Jan-07	2007w1	13	20.52174	0.7475	1
2-Jan-07	2007w1	13	35.42857	1.12	1
2-Jan-07	2007w1	10	86.57143	1.12	1
2-Jan-07	2007w1	13	189.7143	1.12	1
2-Jan-07	2007w1	12	0		0

and my second dataset looks like:

date	wtempQuon	atempQuon	wspeedQuon	wgustQuon
1/1/2007	6.411864	7.785593	6.168644	8.516949
1/10/2007	6.682353	NA	3.169608	5.35098
1/11/2007	6.438938	NA	8.125664	11.41416
1/12/2007	6.578302	8.581132	5.318868	7.106604
1/13/2007	NA	NA	NA	NA
1/14/2007	6.366667	NA	99	NA
1/15/2007	NA	NA	NA	NA
1/16/2007	NA	4.532353	6.720588	11.21765
1/17/2007	NA	NA	6.668182	10.91818
1/18/2007	4.647748	-7.18468	5.54955	8.818919
1/19/2007	4.663077	-3.34615	2.7	3.850769

Can you please help me to merge these two datasets?
Thanks in advance

Tags: None

Richard Williams

Join Date: Apr 2014

Posts: 4983
#2

21 Jul 2014, 16:17

What was the command you used? I think it is a many to 1 merge. Something like

use masterdata
merge m:1 date using weatherdata

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Comment
Pratheesh Omana-Sudhakaran

Join Date: Jul 2014

Posts: 15
#3

22 Jul 2014, 06:46

Thanks Richard, it worked. I tried 1:1 merging and 1:m merging, but somehow I did not try m:1 merging.
Comment
Stefano Grillini

Join Date: Jun 2015

Posts: 85
#4

23 May 2016, 09:57

Hi, I have a similar problem. I'm trying to merge two datasets with daily observations from 01/01/1990 to 31/12/2015. I checked the length of the datasets and it's exactly the same. Of course the merging variable (date) is also equal for both. I get the error code r(459) "variable date does not uniquely identify observations in the using data". I already have 1447 variables and I want to merge other 625 variables, all downloaded from Datastream, so I have not manipulated them. Before you ask I'm using Stata MP, so if I'm not wrong I should be allowed to work with more than 32000 variables.

I'm struggling because I have not idea what could be the problem.

Thanks for any help

Stefano
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35656
#5

23 May 2016, 10:01

If either dataset is panel data, then you need to merge on an identifier too. Otherwise, use duplicates to find out about duplicates and/or see e.g.

http://www.stata.com/support/faqs/da...ues/index.html
Comment
Stefano Grillini

Join Date: Jun 2015

Posts: 85
#6

29 May 2016, 09:10

Thanks Nick,
I tried a "dangerous" solution. I created a new variable in both datasets that allowed me to merge. However, 2 items are not matched, even though in the Excel file the length of the columns is equal. Then I deleted the last two rows, since "apparently" the dataset is the same.

Thanks

Stefano
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17704
#7

29 May 2016, 11:01

Stefano:
the 2 non-matched observations may alternatively belong to:file in memory (master; _merge==1) ;file on disk (using file; _merge==2); eventually, 1 for each file.
It would be interesting to check whether what you got from your "dangerous" solution mirrors the goal of your -merge- procedure (I've some bad surprises in the past).

Kind regards,
Carlo
(Stata 19.0)
1 like
Comment
Stef Anie

Join Date: Dec 2016

Posts: 25
#8

10 Dec 2016, 15:34
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30069
#9

10 Dec 2016, 16:19

Stefano, there's a reason solutions like that are called "dangerous." If the two data sets are supposed to match up exactly and Stata says they don't (I've never known Stata to be wrong when it says this) then either something is wrong with the data or something is wrong with your expectations about the data. In either case, sweeping the problem under the rug is likely to lead to trouble later in your work with the data. If you are lucky, at some point you will get some obviously ridiculous result which will force you to go back and solve the data problem and then rerun everything. You will have wasted a lot of time doing rework, but you will be fine in the end. On the other hand, if you are unlucky, you will go along blithely unaware of the danger lurking. And then it will strike at the worst possible time. A client will rely on your analysis and will discover through applyling it that it's dead wrong. Or during a presentation, somebody in the audience will observe something strange about your results that you didn't notice and will call you out on it.

You should never do this kind of thing. It is likely, as Carlo points out, that the two unmatched items are in fact intended to match each other but one of them has a wrong date, perhaps due to a typo. With only two mismatched items like that, it is likely that the fix is fairly simple. Or it may be that if you look into the specifications of these two files from their source, you will find out that they are not intended to match exactly. Perhaps one covers a period from 1jan2005 through 31dec2005 and the other covers 31dec2004 through 30dec2005. Such data would properly have two unmatched observations when merged, and deleting them as you have done, would be appropriate. But unless you are sure that the two unmatched observations are spurious, deleting them is a really bad idea. You should either find out for sure that they don't belong, or you should find out why one (or both) of them is a data error and get the error corrected.
1 like
Comment
Devender Pratap

Join Date: Apr 2017

Posts: 1
#10

08 Apr 2017, 01:58

Hi I encountered a problem while reading the NSSO unit level for India with the help of dictionary file. I specified a variable in my dictionary using "D:\NSSO_70th\Data\Visit-1\AH0633V1.txt" {
_column(1) ccode %3s " "
_column(4) fsu %5s ""
_column(9) rnd %2s "". I ran a command "infile using Lev_06.dct, clear"", it read a variables say fsu rnd (and all other variables as well) fsu, as "float %9.0g" , whicseems quite strange to me. I shall be grateful if someone could address my issues
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30069
#11

08 Apr 2017, 09:13

Devender Pratap Your question is not related to the topic of this thread. It is important that threads stay on topic so that people who are searching for help similar to what you are asking will be able to find whatever solutions come up. Please re-post your question in a new thread.

I would also recommend you make your question clearer. First there is formatting: your code should be posted between code delimiters so it is easily readable. (See FAQ #12 for details on how to use code delimiters). Next, this is an international, interdisciplinary forum. You should not assume that people will automatically understand what NSSO is. Spell it out, and if it refers to data that is available online for others to see, do provide a link. It would probably also be a good idea to show something of what the original data in AH0633V1.txt look like, so that somebody can see whether your dictionary file is an appropriate match to the data. With the limited information you have given, it is anybody's guess what the problem might be.
Comment
Stephen Long

Join Date: May 2017

Posts: 4
#12

08 May 2017, 06:53

I'm having a similar issue, but the error message about variables not uniquely identifying observations in the master data persists when I run the merge command, even after specifying that it is a m:1 merge. Isn't that the point of specifying the m:1 format? Stata 14, so it will not go forward with the merge (Stata <11 would give the error message but go forward with the merge under the old syntax, I think).
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30069
#13

08 May 2017, 09:38

Please use code delimiters (see FAQ #12 if you are not familiar with these) and post the exact code you ran and the exact output you got from Stata.
Comment
Stephen Long

Join Date: May 2017

Posts: 4
#14

08 May 2017, 12:06

I loaded the master set, sorted by the identifying variables (a country code and the year) in the master and the using set, then tried to do the m:1 merge.

Code:

merge m:1 ccode1 year using "SWIID.dta"

where the master set has multiple instances of ccode1-year combinations, the using set has (or should have) only one instance of each ccode1-year combination, and I am trying to insert the using set's data into all instances in the master set.

Stata then stops the merge and gives me

Code:

variables ccode1 year do not uniquely identify observations in the using data r(459);

This seems odd, as the duplicates command in the using data set yield no reported duplicates.
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4459
#15

08 May 2017, 13:16

what, again, exact, duplicates command did you use? in particular, did you limit it to searching for duplicates on only those 2 variables? please show your exact code (copy-and-paste) if this is not clear
Comment

Announcement

variable date does not uniquely identify observations in the master data stata

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment