
  • Count number of unique ids by wealth percentile

    Dear Statalist,

    I want to count, by wealth percentile, the number of unique individuals who are enrolled in multiple institutions in a given year. Concretely:
    My data run from 2009 to 2015. The panel is uniquely identified by (personal id, institution id, year). Some people are enrolled in multiple institutions in a given year, so there are duplicates by (personal id, year). I want to plot the number of unique ids enrolled in multiple institutions against the 2009 wealth distribution.

    How should I think about this problem? I am thinking of solving it as follows:

    1) Generate wealth percentiles for 2009. The same individual has the same wealth in a given year, but if he is enrolled in multiple institutions, he has several observations and is therefore counted several times, so the wealth distribution will not be calculated correctly.
    2) Generate a variable that tells me which percentile of the 2009 wealth distribution each person is in. Then I can count the number of unique ids in each wealth percentile.
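Steps 1 and 2 can be sketched as follows. This is a minimal sketch under assumptions: the variable names person_ID, year and wealth follow reply #2, inst_ID is a hypothetical institution identifier, and the toy data are invented for illustration. Each person is ranked once in the 2009 distribution, and that bin is then copied to all of the person's observations. With only four people the toy uses quartiles; nquantiles(100) would give percentiles.

```stata
* Hypothetical toy panel: one row per (person_ID, inst_ID, year)
clear
input person_ID inst_ID year wealth
1 10 2009 100
1 11 2009 100
1 10 2010 120
2 10 2009 250
2 10 2010 260
3 12 2009 400
3 13 2009 400
4 11 2009 900
end

* Step 1: tag one 2009 observation per person so nobody is counted twice
egen first2009 = tag(person_ID) if year == 2009
* Bins of the 2009 distribution (quartiles here; use nquantiles(100) for percentiles)
xtile bin = wealth if year == 2009 & first2009 == 1, nquantiles(4)

* Step 2: copy each person's 2009 bin to all of his or her observations
egen bin2009 = max(bin), by(person_ID)
```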

    Could you please give me some suggestions on how to solve this?

    Thank you for your time!
    Best,
    Claire

  • #2
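    I am sure there are much more elegant solutions, but I would do something like this:
    Code:
    preserve
    keep person_ID year wealth
    duplicates drop                  // one observation per person-year, so nobody is counted twice
    generate wealth_pctile = .
    levelsof year, local(years)
    foreach y of local years {
        xtile tmp = wealth if year == `y', nquantiles(100)   // percentile bin (1-100) within this year
        replace wealth_pctile = tmp if year == `y'
        drop tmp
    }
    keep person_ID year wealth_pctile
    save "wealth_pctile.dta", replace
    restore
    merge m:1 person_ID year using "wealth_pctile.dta"
    This, I think, should solve problems 1 and 2, or am I missing something?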



    • #3
      I'd rather see a concrete example using dataex (FAQ Advice #12), but starting from the idea that you have the variables

      person_ID

      year

      percentile, i.e. which bin

      then this may help:

      Code:
      egen tag = tag(person_ID percentile year)
      egen ndistinct = total(tag), by(percentile year)
      I strongly advise against the word "unique". Unique values occur just once, but here the values of interest often occur more than once; "distinct" is the better word.

      This paper from 2008 is a discussion of technique in this territory:

      SJ-8-4 dm0042 . . . . . . . . . . . . Speaking Stata: Distinct observations
      (help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
      Q4/08 SJ 8(4):557--568
      shows how to answer questions about distinct observations
      from first principles; provides a convenience command

      It is accessible at http://www.stata-journal.com/sjpdf.h...iclenum=dm0042
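The tag() approach can be restricted to people enrolled in more than one institution, which is the count asked for in the original question. A minimal sketch, assuming a percentile bin variable already exists; inst_ID is a hypothetical institution identifier and the toy data are invented. The multiple-enrollment flag is simply whether a person-year has more than one observation.

```stata
* Hypothetical toy data with a precomputed bin variable called percentile
clear
input person_ID inst_ID year percentile
1 10 2009 1
1 11 2009 1
2 10 2009 1
3 12 2009 2
3 13 2009 2
3 14 2009 2
4 11 2009 2
5 15 2009 2
5 16 2009 2
end

* 1 if the person is enrolled in more than one institution that year
bysort person_ID year: generate byte multi = _N > 1

* Tag one observation per person, bin and year, among the multiply enrolled only
egen tag_multi = tag(person_ID percentile year) if multi
* Number of distinct multiply enrolled people in each bin and year
egen n_multi = total(tag_multi), by(percentile year)

* One row per bin-year is enough for a plot, e.g.
egen bin_tag = tag(percentile year)
* twoway bar n_multi percentile if bin_tag & year == 2009
```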




      • #4
        Thank you, Nick! Right: distinct ids, not unique ids. Thanks for the correction!
        (egen tag) is a better way than (bysort percentile year person_ID: gen first = _n==1) followed by summing first over each percentile and year.
        As for creating wealth percentiles when there are duplicates by person_ID and year, I will collapse the data to one observation per person-year.
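For comparison, here is a sketch showing that egen's tag() and a by-based first-observation flag give the same distinct count. The toy data are invented. Note the assumption built into the by route: person_ID must be part of the by-list, because with only percentile and year, _n == 1 flags one observation per bin-year rather than one per person.

```stata
* Hypothetical toy data: person-level duplicates within bin-years
clear
input person_ID year percentile
1 2009 1
1 2009 1
2 2009 1
3 2009 2
3 2009 2
3 2009 2
4 2009 2
end

* egen route
egen tag = tag(person_ID percentile year)
egen ndistinct = total(tag), by(percentile year)

* by route: first observation of each person within each bin-year
bysort percentile year person_ID: generate byte first = _n == 1
egen ndistinct_by = total(first), by(percentile year)
```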
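Collapsing first can be sketched like this. It is a minimal sketch, assuming wealth is constant within person-year (as stated in the question), with inst_ID a hypothetical institution identifier and invented toy data. After the collapse each person appears once per year, so the percentiles are not distorted, and summing the multiple-enrollment flag within a bin counts distinct people directly without any tagging.

```stata
* Hypothetical toy panel: one row per (person_ID, inst_ID, year)
clear
input person_ID inst_ID year wealth
1 10 2009 100
1 11 2009 100
2 10 2009 250
3 12 2009 400
3 13 2009 400
3 14 2009 400
4 11 2009 900
end

* One observation per person-year: (firstnm) carries wealth over,
* (count) records the number of institutions
collapse (firstnm) wealth (count) n_inst = inst_ID, by(person_ID year)
generate byte multi = n_inst > 1

* Bins of the 2009 distribution (quartiles for 4 people; nquantiles(100) for percentiles)
xtile wealth_bin = wealth if year == 2009, nquantiles(4)

* Each row is a distinct person, so a total is a distinct count
egen n_multi = total(multi) if year == 2009, by(wealth_bin)
```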

        Best, Claire

