Merging problem for two data-sets with a common string" id"

Hossein Kavand

Join Date: Jan 2015

Posts: 26
#1

Merging problem for two data-sets with a common string" id"

23 Sep 2018, 14:18

Hello,
I want to merge two datasets using a common "id" variable that is in both of them. However, the "id" variable is"string" and is a mixture of "Q" and some numbers such as Q2515789.

I tried to use this "id" variable with the command

Code:

merge 1:1 id

, but it did not merge them and only put the second database below the first one like what the command

append

does.

Should I convert the "id" first to something else? How?
Please let me know how the command

merge

works this kind of variables.

Thanks for your help.

Hossein
Tags: None
Rebecca Water

Join Date: Sep 2018

Posts: 44
#2

23 Sep 2018, 15:34

Did you use the command correctly? Usually it is no problem if the key variable is string ! I merged today with a string variable and it worked (Stata 15.0)
What is your error message?
Did you check if your specification 1:1/1:m/m:1 is the right one? Is the key variable named identically in both the master and using dataset?
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

23 Sep 2018, 17:50

Since you don't share samples of your data, and merge apparently did run, I'd have to guess that there were no matches between the two datasets. Since you didn't share what merge reported when it ran, I'd have to guess that it reported that none of your observations matched, and you overlooked or misunderstood that..

Usually when there are no matching observations it's obvious from looking at the two datasets. But one subtle, easy-to-overlook possibility is that in one (or both!) of your datasets, there are extra blank characters at the end of id, which are not there in the other dataset. This can be resolved by

Code:

replace id = trim(id)

in whichever dataset has this problem.

If none of this helps, or even if it does, please take the time to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

The more you help others understand your problem, the more likely others are to be able to help you solve your problem.
1 like
Comment
Hossein Kavand

Join Date: Jan 2015

Posts: 26
#4

24 Sep 2018, 16:45

Thanks for your comments. Unfortunately, the data that I am working with is confidential and I cannot share it since I do not have access to it out of the institute.

I am glad that the merge command accepts string variables as the key variable to merge two datasets.

Maybe as Rebecca mentioned, I am using a wrong "merge x:x" command.

As an example, there is a firm with ID= Q2515789 and it has 3 different types of capital assets such as a,b, and c.
Dataset 1 contains the expected life of each asset and the second one has data for the actual discard date. As you can see below, asset c in the second data set does not appear and asset "a" may be missing.

What type of merge command do I need to use?

Code:

merge 1:1 ID Asset using …

Or

Code:

merge m:m ID Asset using …

I used the first command above and the value for the variable "_merge==3" was zero.

Thank you so much for your help.

Code:

Dataset 1 Dataset 2

ID Asset expected life ID Asset discard date

Q2515789 a 10 Q2515789 a .

Q2515789 b 8 Q2515789 b 9

Q2515789 c 9
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

24 Sep 2018, 18:42

Maybe ... I am using a wrong "merge x:x" command.

Since your

Code:

merge 1:1 ID Asset using

completed without an error message, we know that ID and Asset provide unique identifiers in each dataset - if one of them had duplicated values of ID and Asset, the merge command would have given an error message telling you so, and not proceeded to create a merged dataset. So, no, neither merge 1:m nor merge m:1 nor merge m:m will do what you need. If this isn't clear, do review the documentation on the merge command in the Stata Data Management Reference Manual PDF including in your Stata installation and accessible through Stata's Help menu. In particular the documentation will warn you that merge m:m is never the right thing to do, so I suspect you have yet to review the documentation.

the value for the variable "_merge==3" was zero

That confirms my assertion that none of the ID/Asset pairs in your master dataset matched any of the ID/Asset pairs in your using dataset.

Post #3 gave you guidance on what the problem could be, and a solution. You do not report results of exploring that possibility. Let me show you an approach to test my assertion that one or both of your datasets has blanks in ID or Asset or both. I created a dataset with 100 IDs from Q101 to Q200.

Code:

codebook ID ------------------------------------------------------------ ID (unlabeled) ------------------------------------------------------------ type: string (str4) unique values: 100 missing "" > : 0/100 examples: "Q120" "Q140" "Q160" "Q180"

Next I modified the data so that one of the values had a trailing space.

Code:

. codebook ID ------------------------------------------------------------ ID (unlabeled) ------------------------------------------------------------ type: string (str5) unique values: 100 missing "" > : 0/100 examples: "Q120 " "Q140" "Q160" "Q180" warning: variable has trailing blanks

So in both your master dataset and your using dataset, run

Code:

codebook ID Asset

and see what it reports about blanks that you do not expect to have in either variable.
Comment

Dataset 1			Dataset 2
ID	Asset	expected life	ID	Asset	discard date
Q2515789	a	10	Q2515789	a	.
Q2515789	b	8	Q2515789	b	9
Q2515789	c	9

Announcement

Merging problem for two data-sets with a common string" id"

Comment

Comment

Comment

Comment