Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help with finding an average based on condition

    Dear All

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(var1 var2 var3) float reqd_answer
     5  6  9 7.5
     8 -4 34  21
    -3  5  0 2.5
     .  .  .   .
     .  6  .   3
     3  4  1 3.5
    end
    In the above toy example, I need to find out the average of highest two values and divide it by 2 (since best two is considered, if 'n', then divide it by 'n'). I have given the intended answer as variable "reqd_answer". For instance in first row, higher values are 9 and 6 and hence their average is 7.5. In case of 6, others are blank , so assuming next best value is 0, the average (6+0)/2=3. I tried to find using conditional argument but in vain. How to proceed with this?

  • #2
    This starts as an easy problem, but gets more puzzling. To get the mean of the two highest values, take the sum, subtract the minimum, and divide by 2. But then your rules seem to say treat one missing value as 0, but all missing values as missing.

    You can have that, but unless it is imposed on you from on high, you may need to explain it later.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(var1 var2 var3) float reqd_answer
     5  6  9 7.5
     8 -4 34  21
    -3  5  0 2.5
     .  .  .   .
     .  6  .   3
     3  4  1 3.5
    end
    
    forval j = 1/3 { 
        gen VAR`j' = cond(var`j' == ., 0, var`j')
    }
    
    gen min = min(VAR1, VAR2, VAR3) 
    gen sum = VAR1 + VAR2 + VAR3 
    
    gen wanted = (sum - min) / 2 if (mi(var1) + mi(var2) + mi(var3)) < 3 
    
    list, sep(0)
    
         +-------------------------------------------------------------------------+
         | var1   var2   var3   reqd_a~r   VAR1   VAR2   VAR3   min   sum   wanted |
         |-------------------------------------------------------------------------|
      1. |    5      6      9        7.5      5      6      9     5    20      7.5 |
      2. |    8     -4     34         21      8     -4     34    -4    38       21 |
      3. |   -3      5      0        2.5     -3      5      0    -3     2      2.5 |
      4. |    .      .      .          .      0      0      0     0     0        . |
      5. |    .      6      .          3      0      6      0     0     6        3 |
      6. |    3      4      1        3.5      3      4      1     1     8      3.5 |
         +-------------------------------------------------------------------------+
    
    .

    Comment


    • #3
      Dear Nick Cox
      Thanks for the help and only after reading your post, I realized, yes the way I defined makes it complex. Reported values are, say income and I assumed that missing can be treated as 0. Yes in that case, if all are missing, average should be 0. But the codes you suggested can help me to intelligently. To be brutally honest, I didn't have an iota of the brain to think
      To get the mean of the two highest values, take the sum, subtract the minimum, and divide by 2
      . That itself is a revelation and special thanks for it.

      Comment


      • #4
        Even more delightful is the rule that the median of three numbers is their sum MINUS their minimum MINUS their maximum. Until I found this in Stata code, it never occurred to me that calculating the median had anything to do with addition and subtraction, but in that case it is true. (Missing values still need care.)

        If missings really mean 0, at least for this purpose, then this is an option:


        Code:
        clear
        input byte(var1 var2 var3) float reqd_answer
         5  6  9 7.5
         8 -4 34  21
        -3  5  0 2.5
         .  .  .   .
         .  6  .   3
         3  4  1 3.5
        end
        
        mvencode var*, mv(0) override 
        
        gen wanted = ((var1 + var2 + var3) - min(var1, var2, var3)) / 2

        Comment


        • #5
          the rule that the median of three numbers is their sum MINUS their minimum MINUS their maximum.
          It seems to me that I am light-years far away from the basics, but whether Stata's user manual has examples with such tips or not, but ya not sure whether will remember this tricks when required.

          Comment


          • #6
            That's not in [U]. It was in code distributed with Larry Hamilton's Stata book first published in 1990 that never got into official Stata so far as I can recall.

            But yes, there is stuff that was routine to me at age 16 that has disappeared from my repertoire on the principle of "use it or lose it". It would be less mysterious if and when I found it again.

            Comment

            Working...
            X