Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to count observations in the same group that share other characteristics?

    Hi All.

    I have data that contains school-level observations. I'd like to count the number of schools that are in the same district_id and serve at least one of the same grades.

    For instance, if I had the following observations with school_id, district_id lowest_grade and highest_grade, then I'd like to create the last variable "count_same"
    0 0 0 5 1
    1 0 2 5 1
    2 0 6 8 0
    3 1 1 6 0
    4 1 7 8 0
    5 2 0 4 1
    6 2 3 5 2
    7 2 5 8 1
    8 3 0 3 0
    9 3 4 8 0
    Here you can see observations 0 and 1 both get a "1" because they each have a school in the same district that overlaps with the grades it serves. Observation 2 is in the same district but serves a unique set of grades in that district, so it gets a "0". Observations 3 and 4 are in the same district but don't overlap in grades, so they both get a "0". Observation 5 overlaps with observation 6, so it gets a "1" and observation 6 overlaps with 5 and 7, so it gets a "2" and so on. I hope that's clear enough?

    My feeling is that perhaps some form of an egen command might work. Or maybe some clever collapse and merge? Any guidance would be much appreciated!

  • #2
    I am not aware of any -egen- command that does what you ask for. The following code will:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(school_id district_id lowest_grade highest_grade)
    0 0 0 5
    1 0 2 5
    2 0 6 8
    3 1 1 6
    4 1 7 8
    5 2 0 4
    6 2 3 5
    7 2 5 8
    8 3 0 3
    9 3 4 8
    end
    
    //  PAIR UP EACH SCHOOL WITH ALL OTHER SCHOOLS IN SAME DISTRICT
    preserve
    tempfile copy
    rename (school_id *_grade) =_2
    save `copy'
    restore
    rename (school_id *_grade) =_1
    joinby district_id using `copy'
    
    //  ELIMINATE SELF-PAIRING
    drop if school_id_1 == school_id_2
    
    //  DETERMINE PAIRS THAT OVERLAP
    gen byte overlap = min(highest_grade_1, highest_grade_2) >= ///
        max(lowest_grade_1, lowest_grade_2)
        
    //  TOTAL UP THE OVERLAPS FOR EACH SCHOOL
    collapse (first) *_grade_1 (sum) overlap, by(district_id school_id_1)
    rename *_1 *
    In the future, when showing data examples, please use the -dataex- command to do so, as I have done here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      This works perfectly, thanks! My first time using joinby -- I'm going to have to read more about this command for future use; and my apologies about the formatting of my question, I'll be sure to do that in the future as well.

      Cheers!

      Comment


      • #4
        Cross-posted at https://www.reddit.com/r/stata/comme...me_group_that/

        That part of Reddit evidently has a rule

        Notify others of cross-posts (x-posts) and updates.
        and it is a request here too

        https://www.statalist.org/forums/help#crossposting

        Comment

        Working...
        X