merging two datasets

jaskaran kaur

Join Date: Jul 2017

Posts: 12
#1

merging two datasets

25 Jul 2017, 03:13

Hello everyone, I need to merge two datasets using ISIN as the main identifier.

Dataset 1(Master file)

Year mnemonic ISIN CompanyName
2011 ULVR GB00B10RZP78 UNILEVER (UK)
2005 ULVR GB00B10RZP78 UNILEVER (UK)
2009 ULVR GB00B10RZP78 UNILEVER (UK)
2010 ULVR GB00B10RZP78 UNILEVER (UK)
2013 ULVR GB00B10RZP78 UNILEVER (UK)
2016 ULVR GB00B10RZP78 UNILEVER (UK)
2004 ULVR GB00B10RZP78 UNILEVER (UK)
2006 ULVR GB00B10RZP78 UNILEVER (UK)
2008 ULVR GB00B10RZP78 UNILEVER (UK)
2003 ULVR GB00B10RZP78 UNILEVER (UK)
2012 ULVR GB00B10RZP78 UNILEVER (UK)
2007 ULVR GB00B10RZP78 UNILEVER (UK)
2017 ULVR GB00B10RZP78 UNILEVER (UK)
2015 ULVR GB00B10RZP78 UNILEVER (UK)
2002 ULVR GB00B10RZP78 UNILEVER (UK)
2014 ULVR GB00B10RZP78 UNILEVER (UK)

Dataset 2 (Using file)
ISIN G H I J K L M
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735
NL0000388601 NL0000009355 GB00B10RZP78 US9047847093 NL0000388619 NL0000009348 US9047677045 GB0005748735

I need to merge using ISIN but in using file there are so many duplicates, which I separated in different columns.So in above example, ISIN from 1 dataset is matching to H in 2 dataset.
is there anyway I can merge so stata pick up all the duplicate ISIN from Using dataset (ISIN, G,H,I,J,K,L,M).

Right now I am only merging using ISIN.

Anyone can help?

Regards
Jas
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 28551
#2

25 Jul 2017, 08:11

There are two obstacles to getting a solution to your problem.

1. The way you posted your data examples they are very time-consuming to import to Stata. Most of those who want to help do, after all, have other things to do as well, and can't waste a lot of time wrestling with poorly-displayed data. As per FAQ #12, the helpful way to show example data is with the -dataex- command. Run -ssc install dataex- to install it in your Stata setup, then run -help dataex- to read the simple instructions for using it. Post back using -dataex- for both of these example data sets and people will be able to work with your data to test out solutions.

2. It isn't clear what you want the end result to look like. So, work out by hand what you are trying to get using this small example and, again, use -dataex- to post that.

It is likely that given workable data (-dataex- examples) and clear understanding of the desired endpoint, someone will be able to help you.
1 like
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

25 Jul 2017, 10:10

Following up on Clyde's advice, I was confused by your sample from your using dataset. It has 24 identical observations, and includes only the ISINs that you hope to match. Is that how your data really looks - duplicated observations and no additional variables? It's important that your examples be realistic, if we were to merge this dataset to your master, what would you expect: each year in your master dataset would match all 24 observation in your using dataset.

Clyde's recommendation in his second point is crucial: we cannot imagine exactly what you want your result to look like, and for that matter, cannot even be sure your inputs are realistically represented in your post.

The more you help others understand your problem, the more likely others are to be able to help you solve your problem.
Comment

Announcement

merging two datasets

Comment

Comment