
  • Create a rank variable

    mpg rank
    12 1
    12 1
    14 2
    14 2
    14 2
    15 3
    15 3
    16 4
    17 5
    I want to create a rank like the one in the table above. I tried using the default rank command, but none of its variants gives me the above ranking.

  • #2
    I guess you're referring to the rank() function of egen. You're correct that egen doesn't regard this numbering as a kind of ranking. The essence of ranking is to count how many observations have higher or lower values, and this numbering fails to show that except in limiting cases (for example, when there are no ties).

    That said, you should consider

    Code:
    egen rank = group(mpg)
    as a way to get what you want.
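A small sketch comparing group() with two of egen's rank() variants on the mpg values from the first post (the variable names rank_dense, rank_avg and rank_track are just illustrative). group() numbers the distinct values 1, 2, 3, ... in sorted order, which is what the table asks for; rank() instead counts how many values are lower, so ties are treated differently.

```stata
clear
input mpg
12
12
14
14
14
15
15
16
17
end

* dense numbering: distinct values numbered 1, 2, 3, ... in sorted order (ties share a number)
egen rank_dense = group(mpg)

* egen's rank() by default gives tied values the average of the ranks they span
egen rank_avg = rank(mpg)

* the track option gives tied values the same rank: 1 + number of lower values
egen rank_track = rank(mpg), track

list, sep(0)
```

For mpg == 14, rank_dense is 2, while rank_avg is 4 (the average of ranks 3, 4 and 5) and rank_track is 3.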



    • #3
      This is exactly what I need! Thanks a lot!



      • #4
        Dear all,

        I need to create a rank variable using data currently in long format:

        Pid; rank; country
        1 1 NZ
        1 2 AU
        1 3 CA
        1 4 US
        1 5 TH
        2 1 US
        2 2 GB
        2 3 IT
        2 4 DE
        2 5 GR
        3 1 DO
        3 2 AW
        3 3 AU
        3 4 ES
        3 5 CH
        4 1 ES
        4 2 GB
        4 3 US
        4 4 TH
        4 5 CK
        5 1 CA
        5 2 JP
        5 3 AU
        5 4 GB
        5 5 DK
        6 1 AL
        6 2 DE

        For every individual, I would like to obtain an ordered variable of the destinations according to the ranking (from most preferred, value 1 of the rank variable, to least preferred, value 5). Before reshaping, I tried several alternatives, including commands introduced by Nick, but none works for this specific case. I can't figure out how to do this and the clock is ticking. Can anyone help?

        Thank you in advance!



        • #5
          Sorry, but I don't understand what you're asking in #4. Your data example seems to show a variable that already is the rank. So what is to be calculated?

          You wrote: "I've tried several alternatives. Including commands introduced by Nick. But none is working for this specific case I can't figure out how to do this and the clock is ticking"
          There is nothing there to comment on. This is just "My code is not working". Sympathy is natural, but a reply with substance is impossible.



          • #6
            Thank you for this, Nick. I wanted to be concise rather than vague, but I did not succeed!


            Each individual in my sample answered a question about their preferred destinations among all the countries in the world.
            Each Pid ranked 5 countries among all possible destinations:
            1st most preferred, 2nd most preferred, 3rd most preferred, and so on.
            Now I have a dataset that holds this information for each individual in this form:

            [Screenshot: Screenshot 2021-10-18 at 21.10.46.png]


            I need a variable that orders each individual's destination preferences by rank, that is, an ordered variable Xi that takes 5 values. For instance, for Pid = 1 it should be
            X1 = (1 = UK, 2 = USA, 3 = NZ, 4 = IT, 5 = AU)

            In the post above I reported my data in long format. In fact, being unable to find a solution, I did the following:

            keep Pid first_prefered_world second_prefered_world third_prefered_world fourth_prefered_world fifth_prefered_world
            rename first_prefered_world q0_1
            rename second_prefered_world q0_2
            rename third_prefered_world q0_3
            rename fourth_prefered_world q0_4
            rename fifth_prefered_world q0_5

            sort Pid
            reshape long q0_, i(Pid) j(rank)

            rename q0_ country


            [Screenshot: Screenshot 2021-10-18 at 21.10.59.png]


            However, I'm still unable to associate the values of country with their respective rank positions in each individual's ordered preferences. I still don't know how to obtain, e.g., for Pid = 1, X1 = (1 = UK, 2 = USA, 3 = NZ, 4 = IT, 5 = AU).

            I hope this is clear and some of you can help me!



            • #7
              To follow up on the post above (#6): I need a variable that for Pid = 1 takes the values X1 = (1 = NZ, 2 = AU, 3 = CA, 4 = US, 5 = TH), for Pid = 2 X2 = (1 = US, 2 = GB, 3 = IT, 4 = DE, 5 = GR), for Pid = 3 X3 = (1 = DO, 2 = AW, 3 = AU, 4 = ES, 5 = CH), and so on.
              I have tried different methods but can't figure it out. I hope the answer isn't right in front of me and I just couldn't see it.
              Thank you!

              Matilde



              • #8
                This is still very obscure, and without a clear explanation using words (not code), you are unlikely to get any substantive help.

                You appear to already have data in long format which contains each person's 5 most preferred destinations which are already ranked 1 to 5. Then you talk about a reshape to wide format, which itself is straightforward, and also preserves ranks. So, please provide a clear description of what you have (if different) and what you want.
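To illustrate the point about reshape, here is a minimal sketch using the first two persons from the listing in #4, typed in by hand (assuming the variables are named Pid, rank and country, as in that listing). Reshaping to wide puts the countries ranked 1 to 5 into country1 to country5; reshaping back to long recovers the same rank variable.

```stata
clear
input byte(Pid rank) str2 country
1 1 "NZ"
1 2 "AU"
1 3 "CA"
1 4 "US"
1 5 "TH"
2 1 "US"
2 2 "GB"
2 3 "IT"
2 4 "DE"
2 5 "GR"
end

* long -> wide: one observation per person; country1-country5 hold the countries ranked 1-5
reshape wide country, i(Pid) j(rank)
list, noobs

* wide -> long: the rank variable comes back unchanged
reshape long country, i(Pid) j(rank)
list, sepby(Pid) noobs
```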



                • #9
                  I don't understand how what you want differs from what you have in the variable -rank- you created in #6. It seems to me that variable, which you created yourself, is exactly what you are asking for.

                  Added: Crossed with #8, which expresses the same puzzlement.



                  • #10
                    Thank you, Clyde and Leonardo, for trying to help me.

                    I need a variable that, for each individual, takes 5 values, corresponding to the 5 countries that are their preferred destinations.
                    So it cannot be just the 'rank' variable, as in the long format in #6.

                    I need to associate the rank with the value of the string variable 'country'. So if individual i has replied that the US is her first preferred destination, GB the second, ..., and GR (Greece) her fifth,
                    I need a variable in which Xi = (US = 1, GB = 2, IT = 3, DE = 4, GR = 5), and not only X = (1, 2, 3, 4, 5). I need the variable 'country' to be the one ranked for each individual.

                    I hope this is clearer



                    • #11
                      But that's what you already have in the starting variables 1stMostPreferred through 5thMost Preferred, right? What am I missing?



                      • #12
                        I'm still puzzled. Perhaps you want this instead? There is one observation per person, and one variable for each country. The values indicate the person's rank preference for that country (or missing if not ranked).

                        Code:
                        . reshape wide rank , i(pid) j(country) string
                        . list pid rankUS rankGB rankIT rankDE rankGR, sep(0)
                        
                             +--------------------------------------------------+
                             | pid   rankUS   rankGB   rankIT   rankDE   rankGR |
                             |--------------------------------------------------|
                          1. |   1        4        .        .        .        . |
                          2. |   2        1        2        3        4        5 |
                          3. |   3        .        .        .        .        . |
                          4. |   4        3        2        .        .        . |
                          5. |   5        .        4        .        .        . |
                          6. |   6        .        .        .        2        . |
                             +--------------------------------------------------+
                        This is one approach, but it is hardly useful for subsequent analysis compared with the long-format dataset you have, or with an alternative reshape, as below.

                        Code:
                        . reshape wide country, i(pid) j(rank)
                        . list pid country1-country5, sep(0)
                        
                             +------------------------------------------------------------+
                             | pid   country1   country2   country3   country4   country5 |
                             |------------------------------------------------------------|
                          1. |   1         NZ         AU         CA         US         TH |
                          2. |   2         US         GB         IT         DE         GR |
                          3. |   3         DO         AW         AU         ES         CH |
                          4. |   4         ES         GB         US         TH         CK |
                          5. |   5         CA         JP         AU         GB         DK |
                          6. |   6         AL         DE                                  |
                             +------------------------------------------------------------+



                        • #13
                          Hi guys,

                          I also need some help with a rank variable. I am trying to create two variables (rank and percentile rank), and I'd like to compute them by month_year. Below is an example I created to test my variables.

                          I managed to create the rank variable with egen rank = rank(data), by(month_year), but I was wondering:
                          1) whether there is a way to calculate it over a moving window of the previous 30 days;
                          2) how to create a percentile rank.

                          Thank you,

                          Cristiano

                          ----------------------- copy starting from the next line -----------------------
                          Code:
                          * Example generated by -dataex-. For more info, type help dataex
                          clear
                          input float(ID data) str11 date float Date str6 month_year float rank
                          1 50 "1 Jan 2020"  21915 "1_2020" 1
                          2 40 "4 Feb 2020"  21949 "2_2020" 1
                          3 70 "7 Jan 2020"  21921 "1_2020" 2
                          4 55 "15 Feb 2020" 21960 "2_2020" 2
                          5 65 "17 Feb 2020" 21962 "2_2020" 3
                          6 80 "18 Jan 2020" 21932 "1_2020" 3
                          end
                          format %td Date
                          ------------------ copy up to and including the previous line ------------------
                          Last edited by Cristiano Bellavitis; 12 Dec 2022, 07:33.



                          • #14
                            #13 asks two questions I don't understand.

                            1) Each observation might have several percentiles depending on which window you are talking about. Consider a simplified example in which observation t has value t at time t, for t = 1, 2, 3, and so on. Then observation 7, for example, holds a value with rank 7 in a window of length 7 or more starting at time 1, rank 6 in such a window starting at time 2, rank 5 in such a window starting at time 3, and so on. So how do you want to hold the results?

                            2) seems to raise the same query.

                            Disjoint windows are fine, but you already know how to deal with those.
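For disjoint (calendar-month) windows, a minimal sketch using the example data from #13, with its rank column left out so that it can be recreated. The percentile rank here is 100 * rank / n, where n is the number of non-missing values in the month; that is only one of several conventions in use, so treat the definition as an assumption. The names n and pctrank are hypothetical.

```stata
clear
input float(ID data) str11 date float Date str6 month_year
1 50 "1 Jan 2020"  21915 "1_2020"
2 40 "4 Feb 2020"  21949 "2_2020"
3 70 "7 Jan 2020"  21921 "1_2020"
4 55 "15 Feb 2020" 21960 "2_2020"
5 65 "17 Feb 2020" 21962 "2_2020"
6 80 "18 Jan 2020" 21932 "1_2020"
end
format %td Date

* rank within each month (1 = lowest value; ties get the average rank)
egen rank = rank(data), by(month_year)

* number of non-missing values of data in each month
egen n = count(data), by(month_year)

* percentile rank within month, using the convention 100 * rank / n
gen pctrank = 100 * rank / n

sort month_year rank
list ID data month_year rank n pctrank, sepby(month_year) noobs
```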



                            • #15
                              Hi Nick,
                              Thank you for your prompt reply. My ideal rank is based on the date.

                              For example, the first observation is dated 1 January 2020. I'd like to rank that observation against all the other observations occurring in the prior 30 days, which would be December (no observations there). Observation 6 would be ranked against observations 3 and 1, since they occur in the prior 30 days, and because its value is 80, it would be at the top. It is a moving window.

                              I hope this clarifies. Below I copy a simplified version.

                              ----------------------- copy starting from the next line -----------------------
                              Code:
                              * Example generated by -dataex-. For more info, type help dataex
                              clear
                              input float(ID data) str11 date
                              1 50 "1 Jan 2020"
                              2 40 "4 Feb 2020"
                              3 70 "7 Jan 2020"
                              4 55 "15 Feb 2020"
                              5 65 "17 Feb 2020"
                              6 80 "18 Jan 2020"
                              end
                              ------------------ copy up to and including the previous line ------------------
                              Last edited by Cristiano Bellavitis; 12 Dec 2022, 10:44.
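One way to get the moving-window rank described above, as a sketch only: a plain loop over observations, using the simplified data. It assumes that the window runs from 30 days before each observation's date up to and including that date, that rank 1 is the highest value (as in the example of observation 6), that tied values share the better rank, and that data has no missing values. The names Date, rank30, n30 and pct30 are hypothetical. The loop does work proportional to N squared, so it will be slow on large datasets. The percentile here is the share of observations in the window with a value at or below the current one.

```stata
clear
input float(ID data) str11 date
1 50 "1 Jan 2020"
2 40 "4 Feb 2020"
3 70 "7 Jan 2020"
4 55 "15 Feb 2020"
5 65 "17 Feb 2020"
6 80 "18 Jan 2020"
end

* convert the string date to a Stata daily date
gen Date = daily(date, "DMY")
format Date %td

* rank30: 1 + number of observations in the window with a higher value (1 = top)
* n30:    number of observations in the window, including the current one
* window: from 30 days before the current date up to and including that date
gen rank30 = .
gen n30 = .
forvalues i = 1/`=_N' {
    quietly count if inrange(Date, Date[`i'] - 30, Date[`i']) & !missing(data) & data > data[`i']
    quietly replace rank30 = r(N) + 1 in `i'
    quietly count if inrange(Date, Date[`i'] - 30, Date[`i']) & !missing(data)
    quietly replace n30 = r(N) in `i'
}

* percentile rank: n30 - rank30 + 1 is the number of window values at or below this one
gen pct30 = 100 * (n30 - rank30 + 1) / n30

list ID data Date rank30 n30 pct30, noobs
```

With this window, observation 6 (18 Jan 2020, value 80) is compared with observations 1 and 3 and gets rank 1 of 3, as described in the post.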
