Dear Statalisters,
My dataset comprises approximately 21 million individuals, and each individual is represented by one row in the dataset. I would like to create a unique identifier for each individual, numbered from 1 to n.
I used the following code:
gen id = _n
This created a variable with only approximately 19 million unique values; the remaining approximately 2 million values were duplicates. Having examined the duplicates closely, I cannot see what the affected observations have in common or why Stata gave them identical values.
Can anyone explain why this is happening and how I can otherwise create a unique identifier for each individual?
Thank you.
Omar
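A likely explanation, offered as a sketch rather than a confirmed diagnosis: gen without a storage type creates a float, and a float stores integers exactly only up to 16,777,216 (2^24). Above that, only every second integer is representable, so neighbouring values of _n are rounded to the same number. With about 21 million rows, that would leave roughly 19 million distinct values, which matches the counts described above. Assuming the variable is still called id and nothing else is wrong with the data, requesting a long avoids the rounding:

```stata
* Remove the float version of id if it already exists
capture drop id

* Store the row number as a long, which holds integers exactly up to about 2.1 billion
gen long id = _n

* Check that id uniquely identifies the observations; isid exits with an error if it does not
isid id
```

gen double id = _n would also work; long is enough here and takes less memory.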