Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Repetitive, non-overlapping ranking with a unique identifier

    I have the data that you see on the attachment. They are ranked as: sort firm day

    I want to generate a new variable that will rank the variable SM by firm, every 21 days. Then it will go to the next 21 days. So the new variable should take values from 1 to 21.

    As a first step I tried the command: egen rank21 = rank(SM), by(firm)
    which ranks SM by firm for all days, so that goes from 1 to 7045 in my case.

    I probably need something with a forvalues loop. I also need to rank them in descending order, but when I enter rank(-SM) stata doesn't recognize the rank() function.

    Thank you in advance,
    Dimitrios Chlorokostas
    Attached Files

  • #2
    Welcome to Statalist.

    Unfortunately, your picture of your data makes it impossible for the Statalist readers to develop and test recommended code.

    Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how to best pose your question. The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

    It would be particularly helpful to post a small hand-made example, perhaps with just a few variables and observations, showing the data before the process and how you expect it to look after the process. In particular, please read FAQ #12 and use dataex and CODE delimiters when posting to Statalist.

    When you say you wan to rank each firm every 21 days, you mean for example (looking at your picture) for days 3855 through 3875, then 3876 through 3896, etc? Are you assured that no days are omitted? Or if days are omitted, do you mean to rank with fewer days, or to extend the data so that exactly 21 days are used in every case (except perhaps the last)?

    Comment


    • #3
      Dear William,

      with regards to your question, the answer is yes. I am sure that no days are omitted.

      So, I simplify my example and I provide also a code example in dataex.

      Let's say that I have 10 firms. And that I have the variable "day" that goes from 1 to 7045 for each firm. No days are omitted. in the third column I have a variable called "SM".

      Now, what I want to do is: for each firm separately, every 5 discrete days (note that i simplified my example as I said before), to order the variable SM for these 5 days. Then go to the next 5 days and do the same process. So, the new variable "rank" that will be generated for the ranking should take values from 1 to 5. The ranking should be also in descending order.
      Below I have the variable "rank" to show I want it to look like but it is not on my original dataset. I also want to note that the variable "SM" could be missing. Then I would like it to be in the end or the ranking after zeros.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str16 firm float(day SM rank)
      "CY00005402171D_w" 3721  .8644508 3
      "CY00005402171D_w" 3722  .8903404 2
      "CY00005402171D_w" 3723  .9000608 1
      "CY00005402171D_w" 3724   .848883 4
      "CY00005402171D_w" 3725         0 5
      "CY00005402171D_w" 3726         0 4
      "CY00005402171D_w" 3727         0 5
      "CY00005402171D_w" 3728  .9370794 1
      "CY00005402171D_w" 3729   .887111 2
      "CY00005402171D_w" 3730  .7468633 3
      "CY00005402171D_w" 3731  .9019691 2
      "CY00005402171D_w" 3732  .6755079 5
      "CY00005402171D_w" 3733  .9137996 1
      "CY00005402171D_w" 3734  .7576907 4
      "CY00005402171D_w" 3735  .8041553 3
      "CY00005402171D_w" 3736  .7088608 2
      "CY00005402171D_w" 3737  .5023799 3
      "CY00005402171D_w" 3738  .9476869 1
      "CY00005402171D_w" 3739  .3625277 5
      "CY00005402171D_w" 3740 .41831055 4
      end
      Thank you for you valuable help,
      Dimitris Chlorokostas

      Comment


      • #4
        Thank you for the simplified data, including your expected results, presented with dataex. This makes the objective clear and the task simple. I added to your sample one more set of 5, but with one observation of SM missing, to work out the logic to handle that. In presenting the results below, you will notice that in the set with the tie, the tie was broken in a different way than you broke the tie. I do not see a straightforward way of producing a tiebreak that is other than random. And if the order matters (for instance, if you wanted the earlier day to determine the order of the tied observations) then I would suggest removing the "unique" option, so that all tied observations will receive the same rank, and then sort by rank and day, and then adjusting the rank on the two observations. But that is a detail, perhaps what I present below will be enough to set you on the path to meet your needs.
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str16 firm float(day SM rank)
        "CY00005402171D_w" 3721  .8644508 3
        "CY00005402171D_w" 3722  .8903404 2
        "CY00005402171D_w" 3723  .9000608 1
        "CY00005402171D_w" 3724   .848883 4
        "CY00005402171D_w" 3725         0 5
        "CY00005402171D_w" 3726         0 4
        "CY00005402171D_w" 3727         0 5
        "CY00005402171D_w" 3728  .9370794 1
        "CY00005402171D_w" 3729   .887111 2
        "CY00005402171D_w" 3730  .7468633 3
        "CY00005402171D_w" 3731  .9019691 2
        "CY00005402171D_w" 3732  .6755079 5
        "CY00005402171D_w" 3733  .9137996 1
        "CY00005402171D_w" 3734  .7576907 4
        "CY00005402171D_w" 3735  .8041553 3
        "CY00005402171D_w" 3736  .7088608 2
        "CY00005402171D_w" 3737  .5023799 3
        "CY00005402171D_w" 3738  .9476869 1
        "CY00005402171D_w" 3739  .3625277 5
        "CY00005402171D_w" 3740 .41831055 4
        "missingval" 3736  .7088608 2
        "missingval" 3737         . .
        "missingval" 3738  .9476869 1
        "missingval" 3739  .3625277 4
        "missingval" 3740 .41831055 3
        end
        generate day5 = floor((day-1)/5)
        bysort firm day5: egen rank2 = rank(cond(missing(SM),99,-SM)), unique
        Code:
        . list, noobs sepby(firm day5)
        
          +----------------------------------------------------------+
          |             firm    day         SM   rank   day5   rank2 |
          |----------------------------------------------------------|
          | CY00005402171D_w   3721   .8644508      3    744       3 |
          | CY00005402171D_w   3722   .8903404      2    744       2 |
          | CY00005402171D_w   3723   .9000608      1    744       1 |
          | CY00005402171D_w   3724    .848883      4    744       4 |
          | CY00005402171D_w   3725          0      5    744       5 |
          |----------------------------------------------------------|
          | CY00005402171D_w   3726          0      4    745       5 |
          | CY00005402171D_w   3727          0      5    745       4 |
          | CY00005402171D_w   3728   .9370794      1    745       1 |
          | CY00005402171D_w   3729    .887111      2    745       2 |
          | CY00005402171D_w   3730   .7468633      3    745       3 |
          |----------------------------------------------------------|
          | CY00005402171D_w   3731   .9019691      2    746       2 |
          | CY00005402171D_w   3732   .6755079      5    746       5 |
          | CY00005402171D_w   3733   .9137996      1    746       1 |
          | CY00005402171D_w   3734   .7576907      4    746       4 |
          | CY00005402171D_w   3735   .8041553      3    746       3 |
          |----------------------------------------------------------|
          | CY00005402171D_w   3736   .7088608      2    747       2 |
          | CY00005402171D_w   3737   .5023799      3    747       3 |
          | CY00005402171D_w   3738   .9476869      1    747       1 |
          | CY00005402171D_w   3739   .3625277      5    747       5 |
          | CY00005402171D_w   3740   .4183106      4    747       4 |
          |----------------------------------------------------------|
          |       missingval   3736   .7088608      2    747       2 |
          |       missingval   3737          .      .    747       5 |
          |       missingval   3738   .9476869      1    747       1 |
          |       missingval   3739   .3625277      4    747       4 |
          |       missingval   3740   .4183106      3    747       3 |
          +----------------------------------------------------------+

        Comment


        • #5
          Thank you very much for your help William. These 2 commands solved my problem perfectly.

          Comment

          Working...
          X