Generating a Unique ID for a wide dataset

Aditi Rao

Join Date: Feb 2023

Posts: 4
#1

21 Feb 2023, 13:17

Hello,

I am currently working with a wide dataset, and I would like to assign a unique ID to each individual, not to each observation. Example below:

Beneficiary ID	Claim ID
EEEAVH9E	abcde
EEEAVH9E	fghij
EEEAVH9E	klmno
EEEAVG7F	pqrst

Beneficiary IDs are not unique, but Claim IDs are. I need to generate an ID that identifies all rows with the same Beneficiary ID as one individual. Is there a way to generate a unique ID for each Beneficiary ID?

Any advice is appreciated!
George Ford

Join Date: Aug 2014

Posts: 3337
#2

21 Feb 2023, 14:20

egen id = group(BeneficiaryID)

If you need an ID for each combination of beneficiary and claim:

egen id = group(BeneficiaryID ClaimID)
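A minimal sketch of the answer above, run on the example data from the question. It assumes the two columns are stored as string variables named BeneficiaryID and ClaimID (Stata variable names cannot contain spaces). group() numbers the groups 1, 2, ... in the sort order of the ID values, so the numbers only identify beneficiaries and carry no other meaning.

```stata
* Rebuild the example from the question (assumed variable names)
clear
input str8 BeneficiaryID str5 ClaimID
"EEEAVH9E" "abcde"
"EEEAVH9E" "fghij"
"EEEAVH9E" "klmno"
"EEEAVG7F" "pqrst"
end

* One ID per beneficiary: rows sharing a BeneficiaryID get the same number
egen id = group(BeneficiaryID)

* One ID per beneficiary-claim pair; since ClaimID is unique,
* this ends up distinct for every row
egen id_claim = group(BeneficiaryID ClaimID)

list, sepby(id)
```

Because ClaimID is already unique, the second ID is mainly useful if the same claim number could appear under different beneficiaries.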
Aditi Rao

Join Date: Feb 2023

Posts: 4
#3

22 Feb 2023, 14:14

Thank you so much!

I have a follow-up question: in a wide dataset where there are multiple observations per individual, how would you make sure only one of the observations is used when calculating things like mean age? For example, I only want to use the value from the first observation for each individual. Age varies by row because these values were recorded over several years. How would I consider only the first observation for each individual?
George Ford

Join Date: Aug 2014

Posts: 3337
#4

24 Feb 2023, 10:48

If it's wide, then just use the variable that holds age at the first incidence.
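For the wide case, a minimal sketch assuming hypothetical variables id, age1, age2, and age3, where age1 holds the age at the first incidence. Because each individual is a single row, summarizing age1 averages exactly one value per person.

```stata
* Hypothetical wide layout: one row per beneficiary; age1 is the age at the
* first incidence, age2 and age3 are later values (missing if not observed)
clear
input id age1 age2 age3
1 64 65 66
2 70 71 .
3 58 . .
end

* Each individual is one row, so this mean uses one age per person
summarize age1
```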

If it's long, which it likely should be, you need to create a marker for the first observation. If there's a year variable, you could use it in an "if" condition. If not, then create an index within each id (bys id: g index = _n) and restrict the sample to index==1.
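Both routes from the reply above, sketched on hypothetical long data with one row per person-year (the variables id, year, and age are assumptions). One caution: bysort id: on its own does not fix the order of rows within an id, because Stata's sort is not stable by default, so the sketch sorts on year in the first route and uses sort, stable in the second to keep the existing row order.

```stata
* Hypothetical long layout: one row per person-year
clear
input id year age
1 2019 64
1 2020 65
1 2021 66
2 2020 70
2 2021 71
3 2021 58
end

* Route 1: with a year variable, find each person's earliest year and "if" on it
bysort id (year): egen firstyear = min(year)
summarize age if year == firstyear

* Route 2: without a usable year variable, number the rows within id.
* sort, stable keeps the current row order within each id, so _n == 1
* is the row that already came first for that person
sort id, stable
by id: gen index = _n
summarize age if index == 1
```

Route 1 is safer when a year (or date) variable exists, since it does not depend on how the rows happen to be ordered.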