Create new variable with new data in current dataset

Madelon Noten

Join Date: Jul 2020

Posts: 4
#1

03 Jul 2020, 08:09

Hi, I have seen many posts with similar questions, but none that I think completely fits my problem.

I am working with the European Values Survey and would like to add a variable, taken from an external source, that gives the percentage of the population that is immigrant for each country included in the survey.
My dataset consists of panel data on many individuals in European countries, so each percentage should be attached to every individual and should match the country they live in. I do not know how to add these data, let alone how to link them to the individuals. I really hope someone can help me with this. Thank you so much in advance.
Wouter Wakker

Join Date: Nov 2018

Posts: 621
#2

03 Jul 2020, 08:25

It looks like you want to merge on country. You will have to create a common country variable in both datasets, with identical country names (for example, not "UK" in one dataset and "United Kingdom" in the other). This could also be a country code variable, as long as the codes are the same in both. Then you would probably use merge m:1 country or merge 1:m country, depending on which dataset is the master dataset.

See also help merge for more information.

Edit: If your panel data is at the country-year level and the immigrant data is also at the country-year level, you would merge 1:1 country year.
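As a rough illustration of the harmonizing step described above, here is a minimal sketch. It assumes country is stored as a string; the variable name pct_immigrant and all values are made up for illustration, and "United Kingdom" is only assumed to be the spelling used in the main dataset.

```stata
* Toy country-level data; names and values are made up for illustration
clear
input str20 country pct_immigrant
"UK"      10
"Germany" 20
"France"  30
end

* Recode to the spelling assumed to be used in the main dataset
replace country = "United Kingdom" if country == "UK"

* Stops with an error unless country uniquely identifies the rows,
* which an m:1 merge on country requires of the using dataset
isid country

list, clean
```

Running isid before merging is a cheap way to catch duplicate country rows early, rather than finding out from an error message in merge.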

Last edited by Wouter Wakker; 03 Jul 2020, 08:27.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

03 Jul 2020, 08:31

Welcome to Statalist.

You want to use the merge command to add the variable from the second dataset to your existing dataset. The output of

Code:

help merge

gives the syntax, but because you apparently are a new user of Stata, you should click the link at the top of this output to view the complete PDF manual explanation, which has a lot more content geared to the new user of the merge command.

If your existing data is in main.dta and your additional variable is in immigrant.dta, and the variable country is in both datasets and is coded the same, the commands you will want will be something like

Code:

use main.dta
merge m:1 country using immigrant.dta, keep(match master)

But don't take my word for it! Read the documentation and understand what I am recommending. This is one of Stata's most important commands for data preparation; the time spent understanding how merge works will be amply repaid when you need it again later.

Added in edit: Wouter gives good additional advice about what to do if your immigrant data, like your main data, has different observations for different years. That is all the more reason to read and understand the documentation, and to treat my advice and his as no more than guidance to be evaluated against your actual datasets, not as authority.
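To make the multi-year case concrete, here is a hedged sketch using toy data. It assumes the main data has one row per respondent while the immigrant data has one row per country and year, in which case the merge would be m:1 on both keys. All variable names other than country, and all values, are invented for illustration. Run it as one block, because tempfile names are local macros.

```stata
* Toy country-year data (made-up values), saved to a temporary file
clear
input str20 country int year pct_immigrant
"Germany" 2008 10
"Germany" 2017 12
"France"  2008 20
"France"  2017 22
end
isid country year
tempfile immig
save `immig'

* Toy individual-level data: several respondents per country and year
clear
input int id str20 country int year
1 "Germany" 2008
2 "Germany" 2008
3 "Germany" 2017
4 "France"  2017
5 "Italy"   2017
end

* Many individuals match one country-year row
merge m:1 country year using `immig', keep(match master)

* _merge == 1 marks respondents whose country-year found no match
tabulate _merge
list id country year pct_immigrant if _merge == 1, clean
```

Listing the unmatched rows is usually the quickest way to spot a spelling or coding mismatch in the key variables.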

Last edited by William Lisowski; 03 Jul 2020, 08:34.
Madelon Noten

Join Date: Jul 2020

Posts: 4
#4

06 Jul 2020, 13:25

Thank you both for your help. What I do not yet understand is how to link the values of my new variable to the specific countries. I have a list of percentages, but how do I link these to the right countries in Stata, so that when I merge the two datasets the percentage values end up with the right countries?

Added in edit: and, in the same way, with the individuals in each country.
Mike Lacy

Join Date: Apr 2014

Posts: 2416
#5

06 Jul 2020, 14:30

A variable indicating country must be present in both files, and it must be coded the same way in both. You will want an m:1 merge, in which many (m) individuals are linked to the one row for their country in the country file. This is very common in data processing, and the shared variable is known as the "key" variable in a merge or join.
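For what it is worth, the linking can be sketched as follows: put each percentage on the same row as its country in a small country-level file, and let merge carry it to the individuals through the key. This is only a toy sketch; the variable names id and pct_immigrant and all values are made up, and typing the data with input is just one way to build the country file. Run it as one block.

```stata
* Country-level file: one row per country (percentages are made-up placeholders)
clear
input str20 country pct_immigrant
"Germany" 10
"France"  20
end
tempfile immig
save `immig'

* Individual-level file: many rows per country
clear
input int id str20 country
1 "Germany"
2 "Germany"
3 "France"
end

* country is the key: each individual receives the value from its country's row
merge m:1 country using `immig'
list, clean
```

After the merge, both respondents from the same country carry the same pct_immigrant value, which is the link between percentage, country, and individual that the question asks about.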