Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Count total number of observations bigger than each value in other variable

    I have a cross-section of data on school applications. Each student applies at multiple schools and needs a score higher than the school minimum to get in. I would like to know for each school the total number of students that have a score above it's minimum, irrespective of whether the student applied to this school or not. The final dataset is very large and I tried some foreach approaches but they did not work.


    Code:
    clear
    input id score school_code school_minimum
    1 50 5678 54
    1 50 3113 60
    1 50 1001 45
    2 55 4143 52
    2 55 5678 54
    2 55 3113 60
    3 51 3113 60
    3 51 3098 50
    3 51 1001 45
    4 62 3098 50
    4 62 3113 60
    4 62 8877 62
    end

    Code:
    levelsof school_minimum, local(cutoffs) 
    levelsof id, local(students) 
    
    gen num_higher_score = .
    
    foreach c in `cutoffs' {
    local count_`c' = 0
    
    foreach id in `students' {
    
    if score > `c' local ++count_`c'
    
    bysort school_code: replace num_higher_score = `count_`c'' if cutoff == `c'
    
    }
    }

  • #2

    Code:
    egen wanted = total(score > school_minimum), by(school_code)

    with a need to adjust if you have missing scores.

    The loop is not needed, but also is a long way from what would work. I guess you're guessing by analogy with some quite different software. The statements within the loop won't work differently each time around the inner loop..
    Last edited by Nick Cox; 09 Dec 2020, 10:02.

    Comment


    • #3
      Dear Nick thanks a lot for your reply! In the end I came up with a working (albeit still slow) solution. However, my result is different from yours so I suppose my question was not clearly phrased!

      Code:
      clear
      input id score school_code school_minimum
      1 50 5678 54
      1 50 3113 60
      1 50 1001 45
      2 55 4143 52
      2 55 5678 54
      2 55 3113 60
      3 51 3113 60
      3 51 3098 50
      3 51 1001 45
      4 62 3098 50
      4 62 3113 60
      4 62 8877 62
      end
      
      * For each school count the number of unique ids in the data with a score higher than the school minimum
      
      levelsof school_minimum, local(cutoffs)
      
      gen num_higher_score = .
      egen flag = tag(id)
      
      foreach c in `cutoffs' {
          count if score > float(`c') & flag == 1
          ass num_higher_score ==. if school_minimum == float(`c')
          replace num_higher_score = r(N) if school_minimum == float(`c')
          }

      Comment

      Working...
      X