  • Summarize data so that the max (min) of each variable is based on an average of the top (bottom) 5 individuals?

    Hi,

    I use administrative data in which I cannot report the minimum or maximum of income (or of other variables), because the data are confidential. Still, I want to report an alternative "maximum" and "minimum".

    How can I summarize my data (as with the summarize command in Stata) so that the "maximum" ("minimum") of each variable is the average over the top (bottom) 5 individuals?

    Normally, the maximum is simply the one maximum value, but I need this as an average of the top 5 individuals.

    Furthermore, my data are panel data in which I observe each individual over a 10-year window. The top 5 must therefore be 5 distinct individuals, not just the 5 largest rows/observations.

    Example: What is the mean income of the top 5 persons with the highest income?

    Many thanks.

  • #2
    You can write a loop along the following lines (shown here for a single variable, income):

    Code:
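    frame put personid income, into(income)
    frame income {
         * one row per person: mean income over the person's observed years
         collapse (mean) income, by(personid)
         * missing values sort last, so drop them before taking the top 5
         drop if missing(income)
         sort income
         * MIN: mean of the 5 lowest person-level incomes
         summarize income in 1/5
         * MAX: mean of the 5 highest person-level incomes
         summarize income in -5/l
    }
    frame drop income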
    However, note that with panel data you are then calculating the maximum of person-level means. Those means can be strongly influenced by missing data (or by sample periods that vary across individuals). If you just want to consider the maximum observed income per individual, replace the collapse line with

    Code:
    collapse (max) income, by(personid)
    but the maximum income may not be representative of an individual's income over the sample period.
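
A minimal sketch of the loop mentioned in #2, not a definitive implementation. The toy data, the second variable wealth, the frame name work, and the scalar names min5_* and max5_* are all made up for illustration. It assumes Stata 16 or later (for frames) and at least 5 persons with nonmissing values; with fewer, the `in 1/5` range fails.

```stata
* Toy panel: 8 persons observed in 3 years each (hypothetical data)
clear
set obs 24
generate personid = ceil(_n/3)
generate year = 2000 + mod(_n - 1, 3)
generate income = 10*personid + year - 2000
generate wealth = 100*personid

foreach v of varlist income wealth {
    * copy only what is needed into a scratch frame (hypothetical name: work)
    frame put personid `v', into(work)
    frame work {
        * one row per person: mean of the variable over the observed years
        collapse (mean) `v', by(personid)
        * missing values sort last, so remove them before picking the top 5
        drop if missing(`v')
        sort `v'
        quietly summarize `v' in 1/5
        scalar min5_`v' = r(mean)
        quietly summarize `v' in -5/l
        scalar max5_`v' = r(mean)
    }
    frame drop work
    display "`v': bottom-5 mean = " min5_`v' "   top-5 mean = " max5_`v'
}
```

Scalars are used to hold the results because they survive after the scratch frame is dropped. Swapping `(mean)` for `(max)` in the collapse line gives the variant based on each person's maximum observed value.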

    • #3
      Also Google something like

      Stata Journal largest five

      to find a paper dedicated to this topic.

      • #4
        https://journals.sagepub.com/doi/pdf...6867X221106436 is a link. Despite 2022 publication, the paper is open access.

        • #5
          Turning to the specific problem, I think it is easier than might be feared.

          Code:
          egen rank = rank(-income), unique by(year) 
          
          egen wanted = mean(cond(rank <= 5, income, .)), by(year)
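          Note the minus sign! Ranking on -income gives rank 1 to the highest income, so rank <= 5 picks the five largest values within each year.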
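
The code in #5 ranks within each year. If the ranking should instead be over persons across the whole panel, as the original question asks, one possible variant is sketched below. It is only a sketch under assumptions: the toy data and the names pmean, first, rank_hi, rank_lo, top5, and bottom5 are hypothetical, and each person is summarized by the mean of income over the years observed.

```stata
* Toy panel: 8 persons observed in 3 years each (hypothetical data)
clear
set obs 24
generate personid = ceil(_n/3)
generate year = 2000 + mod(_n - 1, 3)
generate income = 10*personid + year - 2000

* person-level mean, repeated on every row of that person
egen double pmean = mean(income), by(personid)

* flag exactly one row per person so each person is ranked once
egen byte first = tag(personid)

* rank persons from the top (note the minus sign) and from the bottom
egen rank_hi = rank(-pmean) if first, unique
egen rank_lo = rank(pmean) if first, unique

* mean over the 5 highest and 5 lowest persons; untagged rows have
* missing ranks, fail the rank <= 5 test, and so contribute nothing
egen double top5 = mean(cond(rank_hi <= 5, pmean, .))
egen double bottom5 = mean(cond(rank_lo <= 5, pmean, .))

display "top-5 mean = " top5[1] "   bottom-5 mean = " bottom5[1]
```

This keeps the full panel in memory, with the result stored as a constant in top5 and bottom5, at the cost of a few extra variables. Persons whose income is missing in every year get a missing rank and are left out.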

          • #6
            Thanks a lot.
            Last edited by Natasha Drud Bendsen; 20 Oct 2022, 08:03.
