How to count observations in the same group that share other characteristics?

Mica Kurtz

Join Date: Dec 2018

Posts: 3
#1

How to count observations in the same group that share other characteristics?

01 Aug 2022, 13:58

Hi All.

I have data that contains school-level observations. I'd like to count the number of schools that are in the same district_id and serve at least one of the same grades.

For instance, if I had the following observations with school_id, district_id lowest_grade and highest_grade, then I'd like to create the last variable "count_same"

0 0 0 5 1

1 0 2 5 1

2 0 6 8 0

3 1 1 6 0

4 1 7 8 0

5 2 0 4 1

6 2 3 5 2

7 2 5 8 1

8 3 0 3 0

9 3 4 8 0

Here you can see observations 0 and 1 both get a "1" because they each have a school in the same district that overlaps with the grades it serves. Observation 2 is in the same district but serves a unique set of grades in that district, so it gets a "0". Observations 3 and 4 are in the same district but don't overlap in grades, so they both get a "0". Observation 5 overlaps with observation 6, so it gets a "1" and observation 6 overlaps with 5 and 7, so it gets a "2" and so on. I hope that's clear enough?

My feeling is that perhaps some form of an egen command might work. Or maybe some clever collapse and merge? Any guidance would be much appreciated!
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#2

01 Aug 2022, 15:23

I am not aware of any -egen- command that does what you ask for. The following code will:

Code:

* Example generated by -dataex-. For more info, type help dataex clear input byte(school_id district_id lowest_grade highest_grade) 0 0 0 5 1 0 2 5 2 0 6 8 3 1 1 6 4 1 7 8 5 2 0 4 6 2 3 5 7 2 5 8 8 3 0 3 9 3 4 8 end // PAIR UP EACH SCHOOL WITH ALL OTHER SCHOOLS IN SAME DISTRICT preserve tempfile copy rename (school_id *_grade) =_2 save `copy' restore rename (school_id *_grade) =_1 joinby district_id using `copy' // ELIMINATE SELF-PAIRING drop if school_id_1 == school_id_2 // DETERMINE PAIRS THAT OVERLAP gen byte overlap = min(highest_grade_1, highest_grade_2) >= /// max(lowest_grade_1, lowest_grade_2) // TOTAL UP THE OVERLAPS FOR EACH SCHOOL collapse (first) *_grade_1 (sum) overlap, by(district_id school_id_1) rename *_1 *

In the future, when showing data examples, please use the -dataex- command to do so, as I have done here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
Comment
Mica Kurtz

Join Date: Dec 2018

Posts: 3
#3

01 Aug 2022, 22:12

This works perfectly, thanks! My first time using joinby -- I'm going to have to read more about this command for future use; and my apologies about the formatting of my question, I'll be sure to do that in the future as well.

Cheers!
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35726
#4

02 Aug 2022, 01:54

Cross-posted at https://www.reddit.com/r/stata/comme...me_group_that/

That part of Reddit evidently has a rule

Notify others of cross-posts (x-posts) and updates.

and it is a request here too

https://www.statalist.org/forums/help#crossposting
Comment

Announcement

How to count observations in the same group that share other characteristics?

Comment

Comment

Comment