Merging two sets of panel data

Ali Atia

Join Date: May 2020

Posts: 737
#1

Merging two sets of panel data

19 Jun 2020, 09:03

I have two datasets, both in the following format:

ID Time Var_1 Var_2 Var_3 Var_k
ID Time Var_1 Var_2 Var_3 Var_k

I'm trying to add some variables from dataset #2 to dataset #1 so that the latter looks like this:
ID Time Var_1 Var_2 Var_3 Var_from_dataset#2 Var_k

The problem is that the datasets are not perfect matches in terms of how often each ID appears for each time unit. In datatset#1, each ID has an observation for each time unit. In dataset#2, some IDs don't have observations for some time units, so they're omitted (not coded as missing). Because of this, dataset#2 has fewer observations total, and simply pasting in a variable to dataset#1 wouldn't work. Also, not all of the IDs which exist in dataset#2 exist in dataset#1.

Is there a way to move the data I need from dataset#2 to dataset#1 while making sure each observation matches the right ID and time? Or to somehow automatically create a bunch of missing values for observations which don't exist in dataset#2?

Thanks in advance.
Tags: merging datasets, panel data
Ali Atia

Join Date: May 2020

Posts: 737
#2

01 Jul 2020, 14:26

The solution was very simple. I simply standardized the IDs (country names) using the kountry command, converting both datasets to the same 3 digit format, and then merged using the command 'merge m:m id time using dataset#2.'
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

01 Jul 2020, 19:51

I hate to be the bearer of bad news, but it is quite possible that your merge m:m has not given you what you need.

The following is copied word-for-word from the documentation of the merge command in the Stata Data Management Reference Manual PDF included in the Stata installation and accessible from Stata's Help menu.

m:m merges

m:m specifies a many-to-many merge and is a bad idea. In an m:m merge, observations are matched within equal values of the key variable(s), with the first observation being matched to the first; the second, to the second; and so on. If the master and using have an unequal number of observations within the group, then the last observation of the shorter group is used repeatedly to match with subsequent observations of the longer group. Thus m:m merges are dependent on the current sort order—something which should never happen.

Because m:m merges are such a bad idea, we are not going to show you an example. If you think that you need an m:m merge, then you probably need to work with your data so that you can use a 1:m or m:1 merge. Tips for this are given in Troubleshooting m:m merges below.

If you are thinking about using merge m:m, it is a near certainty that at least one of the following is true:

1. Your merge key actually does uniquely identify the observations in one of your data sets, so you can use merge 1:m or merge m:1 or maybe even merge 1:1.

2. You are failing to take account of one or more additional variables in your data set that, combined with the variables you are trying to use as the merge key, uniquely identify the observations in one or both of your data sets, so you can use merge 1:m or merge m:1 or merge 1:1 with the expanded merge key.

3. You are really trying to accomplish what joinby, a different command, does: create in the output dataset every possible combination of an observation from the first dataset and an observation from the second dataset, both having the same key. (SQL users take note! I fell for this myself: to Stata m-to-m does not mean m-by-m.)

4. The data sets you are trying to merge are incorrectly configured or contain data errors that need to be fixed.
Comment

Announcement

Merging two sets of panel data

Comment

Comment