Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Sum total of the best 4 scores across 6 different columns (variable)

    Hello STATA enthusiasts,
    I have a dataset comprising numbers (observations) from 8 distinct variables. I aim to accomplish two objectives: first, to compute two scores, ST1 and ST2. ST1 represents the sum total of the best 4 scores from v1 to v6, while ST2 is the sum total of ST1, SP1, and SP2. Second, I intend to verify if the highest 4 scores used in ST1 are all above 95. The challenge lies in selectively summing the best 4 marks across the 6 variables and checking whether they are all above 95. How can I address this challenge?

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str8 Centre_Code int(v1 v2 v3 v4 v5 v6 SP1 SP2)
    "100001" 102 103 107 103 107 101 109 106
    "100001"  98  98  97 105  98 100  95  87
    "100001"  96  97  98 101 101  94  98  93
    "100001"  96  89 107  98  92 101 104 104
    "100001" 100  98 107 102  94 103 100 101
    "100001"  93 101  94  97  94  88  93 106
    "100001" 122 116 123 121 130 112 115 111
    "100001"  98 101 101 105 111  98 106 122
    "100001" 106  98 101 106 108 105 109 111
    "100001" 107 102  95 111 112 105 100  94
    "100001"  99 100 102  88  97  89  93  94
    "100001" 100   .  90 103  99  97  87 102
    "100001"  97 101  96  95  81  99  89 101
    "100001" 102  98 109 102 105 101  98 101
    "100001" 112 106 108 112 108 106 101  99
    "100001" 101  99 100  93  95  96 100 111
    "100001" 121 113 115 120 103 106 132 114
    "100001" 106  99 109 109 116 100 115 114
    "100001" 109 105  95 101 101  99 107  93
    "100001"  99 101 106 102 107  99 106 128
    "100001" 109 106 107 109 106 108 107 111
    "100003"  87 102  91  88  98  93  91 101
    "100003" 110 107 110 110 117 106 116 104
    "100003" 100 104  94  91 104  97  87  94
    end

  • #2
    Probably there is a more efficient way to do it, but, here is my solution to your problem

    Code:
    // reshape the data to handle the problem more easily
    
    gen id = _n
    reshape long v, i(id) j(number)
    
    bysort id (v): gen order = _n
    gen keep = order >=3
    
    // Create a variable missing when you want to exclude that value
    g v_new = .
    replace v_new = v if keep ==1
    drop keep order
    
    reshape wide v v_new, i(id) j(number) // rehsape back the data
    
    // generate your variables using egen's function rowtotal()
    
    egen ST1 = rowtotal(v_new*)
    egen ST2 = rowtotal(ST1 SP1 SP2)
    
    
    // Check that none of the selected values is smaller than 95, generating a tag variable = 1 if the variable v_new is smaller than 95
    
    forvalues i = 1(1)6{
        gen tag`i' = v_new`i' <= 95 | mi(v_new`i')
    }

    Comment


    • #3
      Here's another approach, using the user-written module -rowsort-.. That module will create a new list of variables whose values are those of an original set sorted in ascending or descending order. I presume that by "best 4 scores" Simwinga means "largest ... ," and so here's what I'd suggest.
      Code:
      cap ssc install rowsort  // Another of Nick Cox's creations.
      // v1-v6 go into temp1-temp6
      rowsort v1-v6, gen(temp1-temp6) descending
      // temp1-temp4 have the largest 4 values
      gen ST1 = temp1 + temp2 + temp3 + temp4
      gen ST2 = ST1 + SP1 + SP2
      // If 4th score is above 95 all the others must be, too.
      gen all_above_95 = (temp4 > 95)
      drop temp1-temp6
      (I allowed missing values to be treated as "largest," which might not be what is desired.)

      Comment


      • #4
        Thanks for the mention of rowsort. For the record, the version on SSC has been superseded by that published through the Stata Journal. The later version has handles for controlling where missing values are placed.

        Code:
        . search rowsort, sj
        
        Search of official help files, FAQs, Examples, and Stata Journals
        
        SJ-20-2 pr0046_1  . . . . . . . . . . .  Speaking Stata: More ways for rowwise
                . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
                Q2/20   SJ 20(2):481--488                                (no commands)
                focuses on returning which variable or variables are equal
                to the maximum or minimum in a row
        
        SJ-9-1  pr0046  . . . . . . . . . . . . . . . . . . .  Speaking Stata: Rowwise
                (help rowsort, rowranks if installed) . . . . . . . . . . .  N. J. Cox
                Q1/09   SJ 9(1):137--157
                shows how to exploit functions, egen functions, and Mata
                for working rowwise; rowsort and rowranks are introduced

        Comment


        • #5
          Thank you everyone, I managed to go around my problem with your help.

          Comment

          Working...
          X