Count number of unique ids by wealth percentile

Claire Lyng

Join Date: Apr 2016

Posts: 12
#1


13 Feb 2018, 05:06

Dear Statalist,

I want to count the number of unique individuals who are enrolled in multiple institutions in a given year, by wealth percentile. A concrete example:
My data run from 2009 to 2015. The unique identifier of my panel is (personal id, institution id, year). Some people are enrolled in multiple institutions in a particular year, so there are duplicates by (personal id, year). I want to plot the number of unique ids enrolled in multiple institutions against the wealth distribution in 2009.

How should I think about this problem? I am thinking of solving it in the following way:

1) Generate wealth percentiles for 2009. The same individual has the same wealth within a year, but if he is enrolled in multiple institutions he has several observations. In that case he is counted several times and the wealth distribution will not be calculated correctly.
2) Generate a variable that tells me which wealth percentile each person is in for 2009. Then I can calculate the number of unique ids within each wealth percentile.

Will you please give me some suggestions on how to solve this?

Thank you for your time!
Best,
claire

Edmund Schuster

Join Date: Jan 2018
Posts: 4

13 Feb 2018, 05:17

I am sure there are much more elegant solutions, but I would do

Code:

preserve
keep person_ID wealth year
duplicates drop
bys year: egen wealth_percentile = pctile(wealth) //etc - don't know how you specify this
keep person_ID year wealth_percentile
save "wealth_pctile.dta", replace
restore

merge m:1 person_ID year using "wealth_pctile.dta"

This, I think, should solve problems 1 and 2, or am I missing something?
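
On the percentile step flagged in the code comment: egen's pctile() returns the value of a percentile (the median by default) repeated on every observation, whereas xtile assigns each observation to a quantile bin. A minimal sketch on made-up data, assuming bins are what is wanted here; the toy values and the choice of four bins are illustrative only.

```stata
clear
input person_ID wealth
1 10
2 20
3 30
4 40
end

* egen pctile(): one number (here the median of wealth), the same on every row
egen p50 = pctile(wealth), p(50)

* xtile: a bin number for each person (quartiles here; nquantiles(100) for percentiles)
xtile quartile = wealth, nquantiles(4)

list
```

xtile cannot be combined with the by prefix, so for a single year it may be simplest to restrict the data first, for example with keep if year == 2009 or an if qualifier.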


Nick Cox

Join Date: Mar 2014

Posts: 35734
#3

13 Feb 2018, 06:32

I'd rather see a concrete example based on using dataex (FAQ Advice #12), but starting with the idea that you have variables

person_ID

year

percentile, i.e. which bin

then this may help:

Code:

egen tag = tag(person_ID percentile year)
egen ndistinct = total(tag), by(percentile year)
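
A small check of this tagging approach on invented data may make the logic concrete. The values and the institution_ID variable are assumptions for illustration: tag() marks exactly one observation per distinct person-percentile-year combination, and total() then adds the tags up within each percentile and year.

```stata
clear
input person_ID institution_ID year percentile
1 10 2009 1
1 11 2009 1
2 10 2009 1
3 12 2009 2
3 13 2009 2
3 14 2009 2
end

* tag is 1 on one row per (person, percentile, year) and 0 on the others
egen tag = tag(person_ID percentile year)

* summing the tags counts distinct persons in each percentile-year
egen ndistinct = total(tag), by(percentile year)

* percentile 1 holds persons 1 and 2, so ndistinct is 2 there;
* percentile 2 holds only person 3, so ndistinct is 1
list, sepby(percentile)
```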

I advise strongly against the word "unique". Unique values occur just once, but the problem is that that is often not the case.

This paper from 2008

SJ-8-4 dm0042 . . . . . . . . . . . . Speaking Stata: Distinct observations
(help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
Q4/08 SJ 8(4):557--568
shows how to answer questions about distinct observations
from first principles; provides a convenience command

is a discussion of technique in this territory. It is accessible at http://www.stata-journal.com/sjpdf.h...iclenum=dm0042
Claire Lyng

Join Date: Apr 2016

Posts: 12
#4

13 Feb 2018, 07:07

Thank you, Nick! Right: distinct ids, not unique ids. Thank you for the correction!
Using egen's tag() is a better way than (bysort person_ID percentile year: gen first = _n==1) followed by summing first over each percentile and year.
As for creating wealth percentiles when there are duplicates by person_ID and year, I will collapse the data to one observation per person-year.
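
A rough sketch of how the collapse route could look for 2009, under several assumptions: the variables are named person_ID, institution_ID, year and wealth, (person_ID, institution_ID, year) identifies observations as described in the first post, and 100 bins are wanted. collapse replaces the data in memory, so it may be worth wrapping this in preserve and restore.

```stata
* Assumed variable names: person_ID, institution_ID, year, wealth
keep if year == 2009

* One row per person-year; given the stated key, the number of rows
* per person-year equals the number of institutions attended
collapse (first) wealth (count) n_inst = institution_ID, by(person_ID year)

* Percentile bin for each person, computed on one row per person
xtile wealth_percentile = wealth, nquantiles(100)

* Number of distinct persons enrolled in more than one institution, per bin
gen byte multi = n_inst > 1
collapse (sum) n_multi = multi, by(wealth_percentile)

twoway bar n_multi wealth_percentile
```

Because each person contributes a single row when xtile is run, people enrolled in several institutions are not counted more than once in the wealth distribution.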

Best, Claire
