Combining two data sets without common variable

Lucy Kraftman

Join Date: Aug 2017

Posts: 8
#1

Combining two data sets without common variable

04 Aug 2017, 04:28

Apologies in advance if I do not ask this question correctly - this is my first post.

I have two data sets. One is a survey of 5000 people with numerous variables about the individuals including their 1st, 2nd and 3rd choices of university. I have another data set which includes a number of characteristics about the universities including their ranking. The University numbers in the 1st, 2nd and 3rd choices of the first data set match with the numbers given in the second data set. What would be the simplest way to create variables which would for example give ranking of 1st choice, ranking of 2nd choice ranking of 3rd choice ?

E.g. of data sets

ID | 1ST CHOICE | 2ND CHOICE | 3RD CHOICE |
1 | 4005 | 3904 | 5001 |
2 | 6789 | 4678 | 9803 |

and so on. The schools data set then looks like this:

School | Ranking | % females |....
4005 | 45 | 51 |
3904 | 36 | 53 |
5001 | 2 | 53 |
6789 | 17 | 48 |

and so on.

I cannot think of any merge or append strategy that would work in this case.

Thanks in advance for any help.

Last edited by Lucy Kraftman; 04 Aug 2017, 04:31.
Tags: None
Soumyadeb Chatterjee

Join Date: Aug 2017

Posts: 17
#2

04 Aug 2017, 05:45

You can change the first dataset from wide to long, using the reshape command, and then use "School" to match the characteristics of each school.

Code:

Wide ID | 1ST CHOICE | 2ND CHOICE | 3RD CHOICE | 1 | 4005 | 3904 | 5001 | 2 | 6789 | 4678 | 9803 | Long ID | SCHOOL | CHOICE 1 | 4005 | 1 1 | 3904 | 2 1 | 5001 | 3 2 | 6789 | 1

and so on. Now we can clearly see how to match it with the second dataset.
1 like
Comment
Mina Sami

Join Date: Jun 2014

Posts: 50
#3

04 Aug 2017, 05:58

First, I am proposing to reshape your data.
In order to Reshape, I propose to change the variables name (to a unique commun identification with number) in your first data:
rename 1stCHOICE choice_1
rename 2ndCHOICE choice_2
rename 3rdCHOICE choice_3

Then proceed to reshape command.

reshape long choice_, i(id) j(date)

Now you have your data reshaped.
save it then merge with your main data.
1 like
Comment

Christoph Thewes

Join Date: Jun 2014
Posts: 33

04 Aug 2017, 06:38

and without reshape:

Code:

ren 1st_choice school
merge m:1 school using "university.dta"
ren rank rank1st
ren females females1st
ren school 1st_choice

ren 2nd_choice school
merge m:1 school using "university.dta"
ren rank rank2nd
ren females females2nd
ren school 2nd_choice

ren 3rd_choice school
merge m:1 school using "university.dta"
ren rank rank3rd
ren females females3rd
ren school 3rd_choice

you can shorten this with a loop

Comment

Lucy Kraftman

Join Date: Aug 2017

Posts: 8
#5

07 Aug 2017, 05:36

Thanks for all the help, worked perfectly
Comment
Gianno van Well

Join Date: May 2020

Posts: 2
#6

04 May 2020, 06:25

Hello,

For a research I want to combine 3 datasets. First I want to combine the AuditAnalytics dataset with Compustat;North-America Daily;fundamentals annual (both from WRDS-Wharton). I took the years 2006 till 2009 and selected "CIK" for company codes. After selecting all the available variables, I wanted to merge these 2 datasets into 1 by using "CIK" as the main variable to base the merging on. However, when I try to merge, I get the message: "variable cik does not uniquely identify observations in the master data. How can I solve this?

Thanks in advance for any help!
Comment

Announcement

Combining two data sets without common variable

Comment

Comment

Comment

Comment

Comment