Merging datasets

Zariab Hossain

Join Date: Oct 2020

Posts: 54
#1

13 Jun 2023, 09:06

Hi All,

I am trying to merge two large datasets covering 1996-2018, namely lfs_append_all.dta and rams_append.dta. Both datasets contain person_id, firm_id, plant_id and year, and I want to merge them using these four keys. Each dataset has only one observation for each person_id, firm_id, plant_id and year. Both datasets contain multiple observations per firm_id, plant_id and year. The command I am using is below:

clear all
set more off
set seed 12345

use "P:\2021\15\Zariab\lfs_append_all"
merge 1:1 person_id firm_id plant_id year using "P:\2021\15\Zariab\rams_append", nogen

However, I am getting the following result:

variables person_id firm_id plant_id year do not uniquely identify observations in the master data
r(459);

I sorted the data by person_id, firm_id, plant_id and year before merging, and I do not understand what is going wrong. Any help would be highly appreciated.

Thanks in advance!

Zariab Hossain
Uppsala University

Last edited by Zariab Hossain; 13 Jun 2023, 09:22.
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1478
#2

13 Jun 2023, 09:13

If you want to merge the data using the four key variables you listed, then you need to put those into the merge command, perhaps like so:

Code:

merge 1:1 person_id firm_id plant_id year using "P:\2021\15\Zariab\rams_append", nogen

If you are trying to get the variables dnr202115 etc (and not any others) from the rams_append dataset, then you might want to do

Code:

merge 1:1 person_id firm_id plant_id year using "P:\2021\15\Zariab\rams_append", nogen keepusing(dnr202115 dnr2019129_peorgnr dnr2019129_cfar)

You might want to spend some time carefully reading through the documentation at

Code:

help merge
Comment
Zariab Hossain

Join Date: Oct 2020

Posts: 54
#3

13 Jun 2023, 09:25

Hi Hemanshu

Sorry for giving the wrong variable names. I renamed the variables to make them easier to interpret and forgot to update the code in my post, which I have now corrected. I had already run the command you suggested, and it produced the error shown in #1.
Comment
Ken Chui

Join Date: Aug 2014

Posts: 1060
#4

13 Jun 2023, 09:37

Check the master file for duplicates on the key variables. For example:

Code:

duplicates report person_id firm_id plant_id year

To learn more, check out: https://www.stata.com/manuals/dduplicates.pdf
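
A related check, offered as a minimal sketch rather than a definitive recipe: isid exits with r(459) when the listed variables do not uniquely identify the observations, so running it on each file shows which side breaks the 1:1 assumption. The file paths and key names are assumed from #1. The missok option is included so that missing key values alone do not trigger the error.

Code:

* Sketch only: paths and key variables are assumed from #1
use "P:\2021\15\Zariab\lfs_append_all", clear
capture noisily isid person_id firm_id plant_id year, missok
if _rc {
    display as error "master file: keys do not uniquely identify observations"
}

use "P:\2021\15\Zariab\rams_append", clear
capture noisily isid person_id firm_id plant_id year, missok
if _rc {
    display as error "using file: keys do not uniquely identify observations"
}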
Comment
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1478
#5

13 Jun 2023, 09:40

Zariab Hossain, you say in #1 that "each dataset has only one observation for each person_id, firm_id, plant_id and year", and yet Stata is telling you this is not the case, at least in the "master" dataset lfs_append_all. You need to investigate why the dataset does not meet your assertion. You can do something like

Code:

duplicates tag person_id firm_id plant_id year, gen(dups)
br if dups != 0
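
One way to take the inspection a step further is to separate observations that are exact copies on every variable from those that share the four keys but differ elsewhere. This is only a sketch: dup_all and dup_key are hypothetical variable names, and it assumes the master dataset is already in memory.

Code:

* Exact copies across all variables (no varlist means every variable is compared)
duplicates report
duplicates tag, generate(dup_all)

* Surplus copies on the four merge keys only
duplicates tag person_id firm_id plant_id year, generate(dup_key)

* Same keys, but not an exact copy of any other observation
count if dup_key > 0 & dup_all == 0
sort person_id firm_id plant_id year
list person_id firm_id plant_id year if dup_key > 0 & dup_all == 0, sepby(person_id firm_id plant_id year)

If the count is zero, every repeated key belongs to a fully identical observation. Otherwise the other variables in the listed groups are worth browsing to see what differs.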
Comment
Zariab Hossain

Join Date: Oct 2020

Posts: 54
#6

13 Jun 2023, 09:50

Hi Hemanshu and Ken,

Yes, I did find duplicates in the master dataset. Should I use the duplicates drop command and then try the merge again? Thanks a lot for your quick help.
Comment
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1478
#7

13 Jun 2023, 09:53

I don't think we can answer that for you. Are the duplicates definitely "mistakes"? If so, then you can probably drop them. Or is it that you have misunderstood the structure of the dataset? In that case, likely not. If the multiple observations are meaningful, you may want an m:1 merge instead of a 1:1 merge.
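
The two cases could look roughly like this. It is a sketch under assumptions, not a recommendation: the paths and keys are taken from #1, and the second case assumes the using file rams_append is itself unique on the four keys.

Code:

* Case 1: the repeats are exact copies on every variable
use "P:\2021\15\Zariab\lfs_append_all", clear
* Without a varlist, duplicates drop removes only fully identical observations
duplicates drop
* Stop here with an error if the keys are still not unique
isid person_id firm_id plant_id year, missok
merge 1:1 person_id firm_id plant_id year using "P:\2021\15\Zariab\rams_append"
tabulate _merge

* Case 2: the repeated keys in the master file are meaningful
use "P:\2021\15\Zariab\lfs_append_all", clear
merge m:1 person_id firm_id plant_id year using "P:\2021\15\Zariab\rams_append"
tabulate _merge

Leaving out nogen keeps the _merge variable, so the tabulation shows how many observations matched and how many came from only one file.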
Comment
Zariab Hossain

Join Date: Oct 2020

Posts: 54
#8

13 Jun 2023, 09:58

Thanks a lot for your great suggestions. I solved the problem.

Last edited by Zariab Hossain; 13 Jun 2023, 10:08.
Comment