  • Create 2 new variables taking the "maximum" value and tag the variable with maximum value

    Dear All

    Please help me create:

    1. A variable that takes the maximum of three variables (sum_rough, sum_inst, sum_public) for each record. The data are longitudinal, and the "sum..." values are repeated across records within the same time period, so the new variable will repeat in the same way, as in the example below.

    2. Another variable that "tags" each record with the name of the variable holding the maximum value.

    Example data shown below:

    clear
    input byte id float time float sum_rough float sum_inst float sum_public
    1 1 3 2 2
    1 1 3 2 2
    1 2 4 6 8
    1 2 4 6 8
    2 1 0 2 3
    2 2 3 6 8
    2 3 4 5 1
    end



  • #2
    Code:
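    * Row-wise maximum across the sum_ variables
    egen maxvar = rowmax(sum_*)
    * Name of the variable holding the maximum; with ties, the last match in the loop overwrites earlier ones
    gen whichvar = ""
    foreach var of varlist sum_* {
        replace whichvar = "`var'" if maxvar == `var'
    }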



    • #3
      Both the question and @Ali Atia's helpful reply leave open what to do if two or more values tie for maximum.

      The maximum is the maximum, regardless, but which variable holds the maximum is more complicated.

      In practice, that may not bite. For discussion of what to do if it does, see https://www.stata-journal.com/articl...ticle=pr0046_1
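
      To check whether ties actually occur, one could count how many variables equal the row maximum and record all of their names. This is only a minimal sketch using the variable names from #1; rowtop, ntied and allmax are names made up here for illustration. In the example data from #1 no row has a tie for the maximum, so the final list should show nothing.

      Code:
      * Row-wise maximum (rowtop is a hypothetical name, chosen to avoid clashing with maxvar)
      egen rowtop = rowmax(sum_rough sum_inst sum_public)
      * ntied counts the variables equal to the maximum; allmax collects their names
      gen ntied = 0
      gen allmax = ""
      foreach var of varlist sum_rough sum_inst sum_public {
          replace ntied = ntied + 1 if `var' == rowtop & !missing(rowtop)
          replace allmax = allmax + " `var'" if `var' == rowtop & !missing(rowtop)
      }
      replace allmax = trim(allmax)
      * Show only the records where two or more variables tie for the maximum
      list id time sum_rough sum_inst sum_public rowtop ntied allmax if ntied > 1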



      • #4
        Thanks for the code, very useful.

        Is there an easier way or syntax to handle cases where two or more values tie for the maximum?



        • #5
          My second question, which may help with ties: is there syntax that lets "whichvar" pick the variable that ranks highest, where the numbers in the variable names give the ranking order?

          For example, with ties among these variables:
          sum_1rough
          sum_2inst
          sum_3public

          If all three tie for the maximum, I would select "sum_1rough", which is highest in the ranking order.



          • #6
            This is covered by the paper cited in #3. What suffices here is to select the first maximum you see and ignore the others. Tweaking the nice code of @Ali Atia:

            Code:
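            egen maxvar = rowmax(sum_*)

            gen whichvar = ""

            * Loop in ranking order; the whichvar == "" condition keeps the first maximum found
            * Note: this stores the suffix (e.g. "1rough"), not the full variable name
            foreach sffx in 1rough 2inst 3public {
                replace whichvar = "`sffx'" if maxvar == sum_`sffx' & whichvar == ""
            }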
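
            If the variables keep the names shown in #1 (sum_rough, sum_inst, sum_public, with no digits in the names), the same idea can be sketched by listing the full variable names in the preferred order. The names topval and whichfirst are made up for this illustration, and the extra !missing() condition is an assumption about how records with all values missing should be treated.

            Code:
            * Row-wise maximum (topval is a hypothetical name)
            egen topval = rowmax(sum_rough sum_inst sum_public)
            gen whichfirst = ""
            * The order of this list is the tie-breaking order: earlier names win
            foreach var in sum_rough sum_inst sum_public {
                replace whichfirst = "`var'" if `var' == topval & whichfirst == "" & !missing(topval)
            }

            Here whichfirst holds the full variable name, and records with no non-missing values are left with an empty string.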



            • #7
              Thank you for this
