Merging Variables to create new ID

Joe Larwood

Join Date: Apr 2021

Posts: 2
#1

Merging Variables to create new ID

26 Apr 2021, 06:58

Hello!
So I've recently been working with household data sets that describes observations in both a serial number for the household, and a number (usually 1-3) describing the individual within that household. I was hoping to merge two of the datasets however STATA comes up with the message that the serial number and the individual number do not uniquely identify the observations, and I'm assuming this is because the individual number is frequently 'repeated' despite referencing different individuals. I wondered if it was possible to create a new variable through merging that would identify both the household and individual number in an easier way.
For example, if the serial number was 1107 and there were two different individuals in the household, rather than having observations both with serial numbers 1107 and then individual numbers 1 and 2, I could create a variable through merging the two that looked like 1107/1 and 1107/2, or something like that.
Thanks in advance!
Tags: None
Ken Chui

Join Date: Aug 2014

Posts: 1058
#2

26 Apr 2021, 07:02

Hello and welcome!

Probably no need to create a combined ID, in -merge- you can provide a list of variables that together define a unique case. For example:

Code:

use DATA_A, clear merge 1:1 HHID PERSONID using DATA_B

Last edited by Ken Chui; 26 Apr 2021, 07:08.
Comment
Joe Larwood

Join Date: Apr 2021

Posts: 2
#3

26 Apr 2021, 08:03

Hi Ken, and thank you!
Sorry for not making it clear, but this is what I was trying to do before I made the post! This command results in the error "variables SERNUM PERSON do not uniquely identify observations in the using data" despite each combination representing a unique observation.
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4450
#4

26 Apr 2021, 08:09

Stata is correct and, when you say, "despite each combination representing a unique observation" you are not correct; use the duplicates command to find these problems which are in the "using" data set but, apparently, not in the master data set; see

Code:

help duplicates

note that, as the FAQ advises, we can more easily help if you provide data examples of each data set using -dataex-; see

Code:

help dataex
2 likes
Comment
Ken Chui

Join Date: Aug 2014

Posts: 1058
#5

26 Apr 2021, 12:22

Originally posted by Joe Larwood View Post

Hi Ken, and thank you!
Sorry for not making it clear, but this is what I was trying to do before I made the post! This command results in the error "variables SERNUM PERSON do not uniquely identify observations in the using data" despite each combination representing a unique observation.

In that case, creating a combined ID will not help either. The second data set has some duplicated hhid-personid cases in it. There are a few possibilities: data entry error; the second data could be longitudinal by nature (in that case -merge 1:m- would be more appropriate); there could also be multiple empty rows in the data set (in Stata, two missings in ID count as duplicates as well.) Please follow the advice in #4 to diagnose what happened with the second data set.

Last edited by Ken Chui; 26 Apr 2021, 12:34.
1 like
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35563
#6

26 Apr 2021, 12:31

The problem discussed in https://www.stata.com/support/faqs/d...d-time-values/ is similar (not identical, but that's not material).

It's common that people are in denial about such reports, partly because of what should not be true. As in the rest of life, should not and is not are different spheres. Exceptions arise for various reasons and here are some:

Trivial errors that Stata takes literally, including misspellings and differences in punctuation (such as spaces) that accidentally create duplicates.

Occasional duplicates that you have overlooked and Stata is smarter at spotting.

A chunk of observations that are declared duplicates because they are all missing on key variables. If data were in a spreadsheet, this kind of record you would just ignore, but if imported into Stata they can hard to spot until you look for them.
Comment

Announcement

Merging Variables to create new ID

Comment

Comment

Comment

Comment

Comment