combine?merge?

Jiawei Tan

Join Date: Jun 2016

Posts: 12
#1

combine?merge?

24 Jun 2016, 14:42

Hi, I have been struggling with this so please help. Dataset1 has new variables that I want to add into Dataset 2. To make sure the new variables match with the Dataset 2, I included bostonid (patient's ID number) in Dataset 1.

The outcome that I want is Dataset 2 has new variables from dataset1, and it corresponds to the bostonid.
For example, bostonid dh-bos-1066-001, with the patient's age, sex, and New Variable's observation.

I tried to combine, form pairwise combination within groups, BUT I ended up with MORE bostonid, and more observations in age,sex,etc. Dataset1 does have more observations of bostonid than dataset2. But I do not want "new" bostonid added into my dataset2. How do I go about this?

I tried merge m:1 as well, but it says my new variables are not found. So I generated the new variables in dataset2 using "gen rs928167= ." Then I merge m:1 again. It then says rs928167 "is str3 in master but float in using data
You could specify merge's force option to ignore this string/numeric mismatch. The using variable would then be
treated as if it contained "".

I tried to combine database with merge 1:1 it says the new variables do not uniquely identify observations in the master data

I got response from STATA technician saying that
"You have a "rs928167" in both datasets, you will need to either -rename- one of
the variables so that there is not a conflict or if you are trying to
merge the variables in both dataset you will need to change the storage
type with a command like -tostring-."

I renamed it, then when I try to merge it says the variable does not exist.
I destring the rs928167 as well, and created a new variable for it, and merge again, still it says the destrung new variable does not exist.

What should I do now?

Attached Files
Tags: None
Joe Canner

Join Date: Mar 2014

Posts: 580
#2

24 Jun 2016, 15:25

Jiawei,

It seems that there are a number of problems here, not the least of which is that it is not clear what commands you are running and what the results are. Please copy and paste the exact command and the exact result. Moreover, you mention a variable "rs928167" but that variable does not exist in the Dataset1 you have in the picture.

Here are some general answers that may help in the meantime:
1. I'm not sure what you mean by "combine, form pairwise combination within groups". What command are you talking about? In any case, merge is probably the command that you want, specifically something like:

Code:

use Dataset2 merge 1:1 bostonid using Dataset1

2. If you don't want new observations from Dataset1 to be added to Dataset2, you can use the variable _merge to get rid of them. If Dataset2 is your master and Dataset1 is your using, drop cases with _merge==2 after the merge is complete. Alternatively, use the keep(master match) option.

3. The decision to use merge 1:1 or merge m:1 should be based on the structure of your data, not just on whether you get an error. It sounds like your data should be 1:1 but since you get an error, perhaps you need to examine your data more closely.

Regards,
Joe
2 likes
Comment
Jiawei Tan

Join Date: Jun 2016

Posts: 12
#3

24 Jun 2016, 18:01

Hi, Thanks for helping! Sorry it was rs926198*

I did not type command, I clicked on "Data----> combine dataset-----> form pairwise combination within group"

But seems like merge is the right thing to do, so I did:

. sort bostonid

. quietly by bostonid: gen dup = cond(_N==1,0,_n)

. drop if dup>1
(279 observations deleted)

. merge 1:1 bostonid using "C:\Users\jwt23\Desktop\Statin merged with Outliers removed,Averaged,prediabetic data,urine
> K.dta"
variable bostonid does not uniquely identify observations in the using data
r(459);

. merge m:1 bostonid using "C:\Users\jwt23\Desktop\Statin merged with Outliers removed,Averaged,prediabetic data,urine
> K.dta"
variable bostonid does not uniquely identify observations in the using data
r(459);

Why does this happen even after I removed the duplicated observation in dataset1?
("C:\Users\jwt23\Desktop\Statin merged with Outliers removed,Averaged,prediabetic data,urine
> K.dta" is my dataset2)

I cannot remove duplication from dataset2, it must remain unchanged.

Can you please show the command for
1)" variable _merge to get rid of them"
2)"drop cases with _merge==2 after the merge is complete."
3)" Alternatively, use the keep(master match) option."

thank you!
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

24 Jun 2016, 19:22

I want to start by clarifying the following. You wrote

I did not type command, I clicked on "Data----> combine dataset-----> form pairwise combination within group"

But after using a command from the menu, if you look in Stata's Results window, you will see the command that Stata "typed" for you, followed by Stata's results, both of which are what Joe wanted you to copy and paste. However, given your description, Stata created a joinby command, which is probably not what you needed.
If this is new to you, then perhaps you have not made much use of the Stata documentation. When I began using Stata in a serious way, I started by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. All of these manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through Stata's Help menu. The objective in doing this was not so much to master Stata as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax.
Now, after running your merge commands, Stata is telling you that you have duplicate observations in your using dataset, which is your dataset2. You told us you removed the duplicate observations in dataset1. That has no bearing on the error message. And it is not necessary that you remove duplicates in dataset2 as long as you have removed those in dataset1.

Your merge command need to be something like

Code:

merge 1:m bostonid using dataset2, keep(using match)

I strongly encorage you to read the output of help merge which is very clear about about merge 1:m, merge m:1, and merge 1:1, as well as the _merge variable and the keep option.

Also, please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. See especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using CODE delimiters, as described in section 12 of the FAQ.

Last edited by William Lisowski; 24 Jun 2016, 19:24.
2 likes
Comment
Jiawei Tan

Join Date: Jun 2016

Posts: 12
#5

25 Jun 2016, 08:54

Thank you very much for the detailed explanation and advice!
Comment

Announcement

Comment

Comment

Comment

Comment