  • Finding second highest value across variables

    Hello,


    Here's hoping this is not super basic, but I'm severely stuck.

    I want to create a dummy that indicates whether the variable v1 has the second-highest (or, more generally, the n-th highest) value among the variables v1, v2, v3 and v4 within the same observation. There are possibly ties between v1, v2 and v3.


    Thus far I have tried the rank() function of -egen- and -rowsort-, but they do not seem to do what I want. Where do I go from here?


    Thank you so much for your time!
    Last edited by He Krau; 02 Jul 2023, 11:58.

  • #2
    How do you want to handle ties? If v1 is tied with one of the others for first place, should we consider v1 to be first place or second? What if v1 is tied with one of the others for second place?



    • #3
      If v1 is tied for first place, the dummy should be 0. If v1 is tied for second place, the dummy should be 1.

      Thank you very much for your reply!



      • #4
        OK. As you did not provide example data, I have made a toy data set to demonstrate the approach.
        Code:
        //    CREATE A DEMONSTRATION DATA SET
        clear*
        set obs 20
        set seed 314159
        forvalues i = 1/4 {
            gen v`i' = runiformint(1, 5)
        }
        
        //    DEMONSTRATE THE APPROACH
        gen `c(obs_t)' obs_no = _n
        reshape long v, i(obs_no) j(index)
        gsort obs_no v -index
        by obs_no, sort: gen byte wanted = (index[3] == 1) & (v[3] != v[4])
        reshape wide

        The code assumes, but does not verify, that the variables v1 through v4 never contain missing values.
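
        That assumption can be checked before reshaping. A minimal sketch, reusing the toy data from above (the names v1 through v4 are those of the toy data, not necessarily yours): -assert- stops with an error if any of the four variables is missing in any observation. This matters because Stata treats a numeric missing value as larger than any number, so it would sort as the highest value.

        ```stata
        // Rebuild the toy data so the sketch runs on its own
        clear
        set obs 20
        set seed 314159
        forvalues i = 1/4 {
            gen v`i' = runiformint(1, 5)
        }

        // Halt with an error if any of v1-v4 is missing in any observation
        assert !missing(v1, v2, v3, v4)
        ```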

        In the future, when asking for help with code, please use the -dataex- command and show example data. Although sometimes, as here, it is possible to give an answer that has a reasonable probability of being correct, this is usually not the case. Moreover, such answers are necessarily based on experience-based guesses or intuitions about the nature of your data. When those guesses are wrong, both you and the person trying to help you have wasted their time as you end up with useless code. To avoid this, a -dataex- based example provides all of the information needed to develop and test a solution.

        If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

        Last edited by Clyde Schechter; 02 Jul 2023, 12:41.



        • #5
          You can count how many of v2 v3 v4 are higher than v1. The answer is 1 if and only if v1 ranks 2nd, regardless of whether v1 ties with any other variable.

          Code:
          gen higher = 0
          
          foreach v in v2 v3 v4 {
                replace higher = higher + (`v' > v1)
          }
          
          gen is_second = higher == 1
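
          To see how the count behaves under the tie rules from #3, here is a small hand-made example (the five observations are invented for illustration). The last two commands sketch one possible extension to the n-th highest value, assuming that rank means one plus the number of strictly higher values; the local macro n is hypothetical and not part of the original question's data.

          ```stata
          clear
          input v1 v2 v3 v4
          3 5 1 2
          5 5 1 2
          4 5 4 2
          4 5 5 2
          1 2 3 4
          end

          // Count how many of the other variables are strictly higher than v1
          gen higher = 0
          foreach v in v2 v3 v4 {
              replace higher = higher + (`v' > v1)
          }
          gen is_second = higher == 1

          // Assumed generalization: v1 is n-th if exactly n-1 values are strictly higher
          local n 2
          gen is_nth = higher == `n' - 1

          list
          ```

          With these data, is_second comes out as 1, 0, 1, 0, 0: v1 alone in second place gives 1, v1 tied for first gives 0, v1 tied for second gives 1, and v1 below two higher values (or last) gives 0.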
          Last edited by Nick Cox; 02 Jul 2023, 14:22.



          • #6
            Thank you so much, your intuitions are quite on point!
            -dataex- is noted for next time, should it come to that.
