Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to keep 5 observations per id?

    Hi all,

    I've got a problem with reformatting my dataset. I've got multiple observations of the percentage of shares held by an institution (minimum 20) per id (Cusip in my case).

    I would like to keep the top 5 of those observations (percentages held by institution) per Cusip and delete all the other observations for that cusip. But I can't find anywhere the command for this.

    Is there someone who could help me with this matter?

    Thanks in advance,

    Geert

  • #2
    Hi Geert,

    please have a look at the FAQ, especially section 12.2 regarding data examples, in order to get quick answers to your questions.
    Without seeing your data (or a sample from it), it may be very hard to understand some data questions, and find appropriate answers to it.

    That said, I suspect something like this could help:
    Code:
    * input example data
    clear
    input cusip share
    1 0.1
    1 0.2
    1 0.3
    1 0.4
    1 0.5
    1 0.6
    1 0.7
    1 0.8
    1 0.9
    1 1.0
    2 0.1
    2 0.2
    2 0.3
    2 0.4
    2 0.5
    2 0.6
    2 0.7
    2 0.8
    2 0.9
    2 1.0
    end
    * only keep highest 5 observations per cusip
    bysort cusip (share) : drop if _n<=_N-5
    Regards
    Bela

    Comment


    • #3
      Geert.
      I do share Bela's advice about posting an example/excerpt of your data via -dataex- (see -search dataex- to install).
      From you post, it is not clear whether you consider keeping missing values, too (as Stata ranks missing values as the highest one; hence they were automatically included among the top five percentages per id).
      If you would get rid of missing values, you can tweak Bela's code this way:
      Code:
      bysort cusip (share) : drop if _n<=_N-5 | share==.
      Kind regards,
      Carlo
      (Stata 18.0 SE)

      Comment


      • #4
        I agree with Carlo's concern about missing values, but I fear that his approach will include the missing values in the count of the top 5 shares. The following would avoid that by dropping all observations with share missing first, then choosing the top 5 of the remaining observations for each cusip.
        Code:
        drop if share==.
        bysort cusip (share) : drop if _n<=_N-5

        Comment


        • #5
          Thanks everyone for the fast response! Sorry I am new here, so my next post will be with an example from -dataex-.

          Already dropped all the missing variables before the calculation, therefore it isnt neccessary.

          Comment


          • #6
            Quick follow-up question.
            This is an example from dataex of my sample.

            I would like to add up all the perc2012 for the same cusip. As I looked on other topics, they used the collapse command. Which would be collapse (sum) perc2012, by (cusip) in this case. However, as I do that I'm losing my variable indcode which I still need. Is there a solution in Stata for this, or do I have to download the data again and merge it with the ''new'' file?
            Attached Files

            Comment


            • #7
              To post dataex output, follow the instructions to select and copy from your Stata Results window the lines starting with the line after "copy starting from the next line" and ending with the line before "copy up to and including the previous line", thus

              [CODE]
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str18 make int price
              "AMC Concord" 4099
              "AMC Pacer" 4749
              "AMC Spirit" 3799
              "Buick Century" 4816
              "Buick Electra" 7827
              "Buick LeSabre" 5788
              "Buick Opel" 4453
              "Buick Regal" 5189
              "Buick Riviera" 10372
              "Buick Skylark" 4082
              end
              [/CODE]

              and then paste these lines into your post, where they will appear as
              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str18 make int price
              "AMC Concord"    4099
              "AMC Pacer"      4749
              "AMC Spirit"     3799
              "Buick Century"  4816
              "Buick Electra"  7827
              "Buick LeSabre"  5788
              "Buick Opel"     4453
              "Buick Regal"    5189
              "Buick Riviera" 10372
              "Buick Skylark"  4082
              end
              from which they can be copied and pasted into Stata to develop and test an answer to your question.

              Please try again.

              Comment


              • #8
                Without data to test on, I cannot be sure this will work for you, but I think
                Code:
                collapse (max) indcode (sum) perc2012, by (cusip)
                will work for you, as long as you want the result to be just one line per cusip, and assuming the indcode is the same in all observations with the same cusip. If you would like to preserve your data as it is and just add a new variable with the total, something like
                Code:
                by cusip, sort: egen sum2012 = total(perc2012)
                might work.
                Last edited by William Lisowski; 29 Apr 2017, 14:11.

                Comment


                • #9
                  That worked! You're a genius. Thank you!

                  Comment

                  Working...
                  X