  • max/min & three unq_id

    Hi Statalist community,

    I need to generate two variables for my research.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(x_id y_id z_id) int(x y z)
    1 2 3 130 120 100
    1 2 3 100 120 120
    1 2 3 300 350 350
    1 2 3 300 300 310
    1 2 3 360 360  90
    1 2 3 250  60  30
    1 2 3 250  15  30
    1 2 3  35  25  35
    1 2 3 300 270 100
    1 2 3  41 225 225
    1 2 3 100  70 140
    1 2 3 300 300 330
    1 2 3 300  60  30
    1 2 3 330 300 350
    1 2 3 345 200 300
    1 2 3 200 200 300
    1 2 3 300  80 300
    1 2 3 100 300 100
    1 2 3  20 280 230
    1 2 3 340 200  50
    1 2 3  60 180 180
    1 2 3 145 130 195
    end
    1st variable needed

    ID_max=x_id if x is greater than y and z
    ID_max=y_id if y is greater than x and z
    ID_max=z_id if z is greater than x and y


    2nd variable needed

    ID_min=x_id if x is less than y and z
    ID_min=y_id if y is less than x and z
    ID_min=z_id if z is less than x and y


    Could anyone please help?

    regards,
    ajay
    Last edited by ajay pasi; 22 Dec 2022, 12:36.

  • #2
    Code:
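    * Rename x, y, z to xvalue, yvalue, zvalue so that they share a stub for -reshape-
    rename (x y z) =value
    gen long obs_no = _n
    * Long layout: one row per original observation and variable (_j is "x", "y" or "z")
    reshape long @value @_id, i(obs_no) j(_j) string

    * Sorted by value within obs_no, the last row holds the max and the first row the min;
    * the -if- conditions leave the result missing unless that extreme is strictly unique
    by obs_no (value), sort: gen id_max = _id[_N] if value[_N] > value[_N-1]
    by obs_no (value): gen id_min = _id[1] if value[1] < value[2]
    * Return to the wide layout and the original variable names
    reshape wide
    rename *value *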
    I have given you what you have asked for. I'm not sure it's what you want. In many observations, there is a tie for the min or max value among x, y, and z. If you have a tie for, say, the minimum value among them, then none of them is strictly less than both of the others. So there is no *_id that satisfies your criteria, and this code returns missing values in those instances.

    If you want to have some value for id_min and id_max even in the face of ties, then you need to specify how you want to break those ties.
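
    To gauge how often ties occur, a check along the following lines could be run on the wide data. This is only a minimal sketch on four rows taken from the example in #1; the names rowmax, rowmin, n_at_max and n_at_min are illustrative, and it assumes x, y and z have no missing values.

    ```stata
    clear
    input float(x_id y_id z_id) int(x y z)
    1 2 3 130 120 100
    1 2 3 100 120 120
    1 2 3  35  25  35
    1 2 3 100 300 100
    end

    * Row-wise extremes of the three variables
    gen rowmax = max(x, y, z)
    gen rowmin = min(x, y, z)

    * How many of x, y, z sit at the extreme; more than one means a tie
    gen byte n_at_max = (x == rowmax) + (y == rowmax) + (z == rowmax)
    gen byte n_at_min = (x == rowmin) + (y == rowmin) + (z == rowmin)

    count if n_at_max > 1
    count if n_at_min > 1
    ```

    Observations counted here are the ones for which the code above returns a missing id_max or id_min.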
    Last edited by Clyde Schechter; 22 Dec 2022, 12:59.


    • #3
      Sir, thanks for the help.

      I get the point.

      I also have x', y' and z', corresponding to the three variables x, y and z.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input float(x_id y_id z_id) int(x y z x'  y'  z')
      1 2 3 130 120 100  4  5  3
      1 2 3 100 120 120 10  8  9
      1 2 3 300 350 350  3  3  2
      1 2 3 300 300 310  2  2  2
      1 2 3 360 360  90  8  8  4
      1 2 3 250  60  30  5 10 12
      1 2 3 250  15  30  5  2 12
      1 2 3  35  25  35  2  2  1
      1 2 3 300 270 100  8  4  5
      1 2 3  41 225 225  2  9  9
      1 2 3 100  70 140  4  2  3
      1 2 3 300 300 330  9  6  6
      1 2 3 300  60  30  6  4  4
      1 2 3 330 300 350  4  6  3
      1 2 3 345 200 300  9  3  2
      1 2 3 200 200 300  2  1  6
      1 2 3 300  80 300  4  2  4
      1 2 3 100 300 100  2  9  1
      1 2 3  20 280 230  2  2  4
      1 2 3 340 200  50  9  6  6
      1 2 3  60 180 180 10  8  7
      1 2 3 145 130 195  6  4  6
      end

      If there is a tie when generating ID_min, then the minimum of x * x', y * y' and z * z' (where * denotes multiplication) should decide which value ID_min takes (from x_id, y_id, and z_id).

      Similarly, if there is a tie when generating ID_max, then the maximum of x * x', y * y' and z * z' should decide which value ID_max takes.

      If the tie still remains (which is likely), then ID_max should take the value 1 (i.e., x_id).

      regards,
      ajay
      Last edited by ajay pasi; 22 Dec 2022, 13:11.


      • #4
        You definitely do not have variables x', y', and z', as those are not legal Stata variable names. Consequently, your -dataex- example does not run. NEVER edit -dataex- output to post here. If you want to use different variable names in your post from what you have in your real data set, that's fine. But then -rename- them in your data set before running -dataex-.
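
        That workflow might look like the sketch below. The names x_wt, y_wt and z_wt are hypothetical stand-ins for whatever the ancillary variables are called in the real data set, and the two input rows are copied from the example only to make the snippet runnable. It assumes -dataex- is available in the installation.

        ```stata
        clear
        input float(x_id y_id z_id) int(x y z x_wt y_wt z_wt)
        1 2 3 130 120 100  4  5  3
        1 2 3 100 120 120 10  8  9
        end

        * Rename in the data set first, so the posted example uses legal names
        rename (x_wt y_wt z_wt) (x2 y2 z2)

        * Then produce the example and paste its output unedited
        dataex
        ```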

        I have changed your ancillary variable names to x2, y2, and z2 for the purposes of this thread:
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear*
        input float(x_id y_id z_id) int(x y z x2  y2  z2)
        1 2 3 130 120 100  4  5  3
        1 2 3 100 120 120 10  8  9
        1 2 3 300 350 350  3  3  2
        1 2 3 300 300 310  2  2  2
        1 2 3 360 360  90  8  8  4
        1 2 3 250  60  30  5 10 12
        1 2 3 250  15  30  5  2 12
        1 2 3  35  25  35  2  2  1
        1 2 3 300 270 100  8  4  5
        1 2 3  41 225 225  2  9  9
        1 2 3 100  70 140  4  2  3
        1 2 3 300 300 330  9  6  6
        1 2 3 300  60  30  6  4  4
        1 2 3 330 300 350  4  6  3
        1 2 3 345 200 300  9  3  2
        1 2 3 200 200 300  2  1  6
        1 2 3 300  80 300  4  2  4
        1 2 3 100 300 100  2  9  1
        1 2 3  20 280 230  2  2  4
        1 2 3 340 200  50  9  6  6
        1 2 3  60 180 180 10  8  7
        1 2 3 145 130 195  6  4  6
        end
        
        rename (x y z) =value
        rename (x2 y2 z2) (x y z)
        rename (x y z) =value2
        gen long obs_no = _n
        reshape long @value @value2 @_id, i(obs_no) j(_j) string
        
        by obs_no (value), sort: gen id_max = _id[_N] if value[_N] > value[_N-1]
        by obs_no (value): gen id_min = _id[1] if value[1] < value[2]
        
        frame put _all if missing(id_max, id_min), into(ties)
        frame change ties
        gen tie_breaker = value*value2
        by obs_no (tie_breaker), sort: replace id_max = _id[_N] ///
            if tie_breaker[_N] > tie_breaker[_N-1] & missing(id_max)
        by obs_no (tie_breaker): replace id_min = _id[1] ///
            if tie_breaker[1] < tie_breaker[2] & missing(id_min)
        replace id_max = 1 if missing(id_max)
        
        frame change default
        frlink 1:1 obs_no _j, frame(ties)
        replace id_max = frval(ties, id_max) if !missing(ties)
        replace id_min = frval(ties, id_min) if !missing(ties)
        
        drop ties
        frame drop ties
        
        reshape wide
        rename *value *
        Note: You did not say what to do if there is still a tie for ID_min when taking into account the product of x and x2, so I have still left those missing, but clearly you just need to make a decision about that and add one more line to the code.
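
        As an illustration only: if the same fallback as for ID_max were wanted (an assumption, since #3 does not say), the extra line would sit next to the existing `replace id_max = 1 if missing(id_max)` inside the ties frame. The toy data below are a stand-in for that frame, not output from the real example.

        ```stata
        clear
        input long obs_no float(id_max id_min)
        1 1 .
        1 1 .
        1 1 .
        2 3 2
        2 3 2
        2 3 2
        end

        * Assumed rule, not stated in #3: an unresolved tie for the minimum
        * also defaults to x_id, which is 1 in this data set
        replace id_min = 1 if missing(id_min)
        ```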


        • #5
          https://journals.sagepub.com/doi/pdf...36867X20931007 is a more detailed discussion of the initial problem, but the nub of the matter is as explained by @Clyde Schechter: whenever there are ties the identifiers for which value is maximum or minimum are utterly moot. I doubt that they have any diagnostic value.

          Your code in #3 purports to be dataex output, but it won't run because x', y' and z' are not legal variable names (although the error message is different).

          Please don't mess with dataex output until and unless you understand Stata better. (This is personal; I am one of the authors of dataex and need to object if I see people posting abuses in its name.)

          I haven't tried to answer #3 directly. You are most of the way to code with your definitions. Perhaps your rules make sense in some concrete context, but otherwise it seems that you are trying to save a dubious idea.
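
          To show how close the definitions already are to code, here is a rough sketch of the strict rules from #1 written with nested cond() calls in the wide layout. It is not a solution to #3: ties are simply left missing, and it assumes no missing values in x, y or z (Stata treats missing as larger than any number). The four input rows are taken from the example; the /// continuations require running it from a do-file.

          ```stata
          clear
          input float(x_id y_id z_id) int(x y z)
          1 2 3 130 120 100
          1 2 3 100 120 120
          1 2 3  35  25  35
          1 2 3 100 300 100
          end

          * Strict comparisons only: a tie for the extreme leaves the result missing
          gen id_max = cond(x > y & x > z, x_id, ///
                       cond(y > x & y > z, y_id, ///
                       cond(z > x & z > y, z_id, .)))
          gen id_min = cond(x < y & x < z, x_id, ///
                       cond(y < x & y < z, y_id, ///
                       cond(z < x & z < y, z_id, .)))

          list, noobs
          ```

          Any tie-breaking rule would then be a further -replace- restricted to the observations where id_max or id_min is still missing.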


          • #6
            Sir, I understand the point. Thanks for the help, Clyde Schechter and Nick Cox.
