  • Creating New Variable to Categorize Based on Maximum of Several Variables


    I am analyzing survey data for 1000 respondents, and want to categorize them based on where they scored the highest across 7 metrics.

    I have 7 continuous variables of scores for class subjects:

    1. score_math
    2. score_science
    3. score_english
    4. score_history
    5. score_spanish
    6. score_reading
    7. score_writing

    I want to create a new variable (student_segments) that returns discrete values 1 through 7 depending on which of the above 7 variables holds the maximum score. For example, if a respondent's math score is the highest of their 7 scores, student_segments would be 1; it would be 2 for science, 3 for english, and so on.

    Any advice on the best way to do this is very much appreciated!

  • #2
    What about ties?

    Code:
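    * Row-wise maximum across the seven score variables
    egen max_score = rowmax(score_*) 
    
    gen which_max = . 
    
    * tokenize stores the variable names in local macros `1' to `7'
    tokenize "score_math score_science score_english score_history score_spanish score_reading score_writing" 
    
    * ``j'' expands to the j-th variable name; later matches overwrite earlier ones
    quietly forval j = 1/7 { 
          replace which_max = `j' if max_score == ``j'' 
    }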
    This alternative does the same job and is likely to seem more transparent:

    Code:
    egen max_score = rowmax(score_*) 
    
    gen which_max = . 
    local j = 1 
    
    foreach subj in math science english history spanish reading writing { 
          replace which_max = `j' if max_score == score_`subj'
          local j = `j' + 1  
    }
    In each case the code reacts to ties by using the last-named subject with the highest score.
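
    If ties matter, one possible follow-up is to count how many subjects share each respondent's maximum, and to let the first-named subject win instead of the last. This is only a sketch: it assumes max_score already exists as created above, and n_at_max, tied and first_max are illustrative names that are not in the original post. The !missing(max_score) guard is also an addition: Stata treats two missing values as equal, so a respondent with all seven scores missing would otherwise be matched.

    ```stata
    * Count how many subjects equal the row maximum
    gen n_at_max = 0
    * first_max will hold the FIRST-named subject at the maximum
    gen first_max = .
    local j = 1

    foreach subj in math science english history spanish reading writing {
          * A true comparison evaluates to 1, a false one to 0
          replace n_at_max = n_at_max + (score_`subj' == max_score) if !missing(max_score)
          * Assign only while first_max is still unset, so earlier subjects win
          replace first_max = `j' if missing(first_max) & score_`subj' == max_score & !missing(max_score)
          local j = `j' + 1
    }

    * Flag respondents whose maximum is shared by two or more subjects
    gen byte tied = n_at_max > 1
    ```

    Running tabulate tied then shows how many respondents are affected, which helps decide whether the tie-breaking rule matters for this sample.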

