Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to calculate the Top 10% with multiple depending variables?

    Dear members,

    I have a dataset with the variables mgrno, rdate, cusip,value & sum_value for different investments.

    In the data excerpt below you can see a mgrno with different cusips for the date 31dec1998. Each cusip has a different value. sum_value is the sum of ALL values with the same mgrno and rdate.

    Code:
    mgrno rdate cusip     value sum_value
    110 14244 "84454480"     .58 155126096
    110 14244 "00950F20"    1.26 155126096
    110 14244 "04216660"   10.64 155126096
    110 14244 "28367785"   43.75 155126096
    110 14244 "75120310"     112 155126096
    110 14244 "15116710"     170 155126096
    110 14244 "04362810"  280.44 155126096
    110 14244 "45766C10"     550 155126096
    110 14244 "04516H10"     553 155126096
    110 14244 "57629210"  839.16 155126096
    110 14244 "35671310" 1134.63 155126096
    110 14244 "21254010"    1190 155126096
    110 14244 "92904N10"    1209 155126096
    110 14244 "42786610" 1492.56 155126096
    110 14244 "37733W10"    2085 155126096
    110 14244 "84474110"    2269 155126096
    110 14244 "90483210"    2440 155126096
    110 14244 "23282010" 3888.75 155126096
    110 14244 "G2706W10"    4165 155126096
    110 14244 "58503X10" 4278.75 155126096
    end
    format %td rdate
    My goal is now to determine all mgrno whose value in the cusip is in the top 10% of the sum_value.

    I hope some of you could assist me with this issue! Thanks in advance!

    Best regards
    Bam
    Last edited by Bam Bovitzki; 24 Jan 2023, 08:49.

  • #2
    Code:
    help egen
    to learn about
    Code:
     pctile(90)

    Comment


    • #3
      I actually tried it with this and used the code:
      Code:
      bysort mgrno rdate value : egen pc90_value = pctile(sum_value) ,p(90)
      But it does not work the way I would like it to.. the value is equal to sum_value

      Comment


      • #4
        Too many variables in by:.

        Comment


        • #5
          So I assume that the following code is correct?

          Code:
          bysort rdate sum_value : egen pc90_value = pctile(value) ,p(90)
          
          gen top10=(value>pc90_value)
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input double mgrno long rdate str12 cusip float(value sum_value pc90_value top10)
          110 14244 "84454480"     .58 155126096 1929096 0
          110 14244 "00950F20"    1.26 155126096 1929096 0
          110 14244 "04216660"   10.64 155126096 1929096 0
          110 14244 "28367785"   43.75 155126096 1929096 0
          110 14244 "75120310"     112 155126096 1929096 0
          110 14244 "15116710"     170 155126096 1929096 0
          110 14244 "04362810"  280.44 155126096 1929096 0
          110 14244 "45766C10"     550 155126096 1929096 0
          110 14244 "04516H10"     553 155126096 1929096 0
          110 14244 "57629210"  839.16 155126096 1929096 0
          110 14244 "35671310" 1134.63 155126096 1929096 0
          110 14244 "21254010"    1190 155126096 1929096 0
          110 14244 "92904N10"    1209 155126096 1929096 0
          110 14244 "42786610" 1492.56 155126096 1929096 0
          110 14244 "37733W10"    2085 155126096 1929096 0
          110 14244 "84474110"    2269 155126096 1929096 0
          110 14244 "90483210"    2440 155126096 1929096 0
          110 14244 "23282010" 3888.75 155126096 1929096 0
          110 14244 "G2706W10"    4165 155126096 1929096 0
          110 14244 "58503X10" 4278.75 155126096 1929096 0
          end
          format %td rdate

          Comment


          • #6
            The code in #9 calculates the 90th percentile for groups jointly defined by rdate and sum_value.

            Note that missing counts as arbitrarily high, so missing values would get recorded as being in the top 10%.

            Code:
             
             gen top10=(value>pc90_value) if value < .   
            would map missings to missings.

            Comment

            Working...
            X