If the location of each school is given by geographic coordinates (latitude and longitude), then geonear (from SSC) can quickly produce, in long form, the list of schools within a specific radius. Schools do not physically move, so I do not see why the list of nearby schools would have to be recalculated yearly. Once you have that list, all you need to do is merge in the annual enrollment for each nearby school and total it directly, no loops required.
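A minimal sketch of that merge-and-total step, assuming the list of neighbor pairs already exists in long form. The pair list below is typed in by hand and is hypothetical (it mirrors the distances in the thread's example, where schools 1-2 and 2-3 are 20 km apart); the variable names school_near, kids_near and wanted are my own choices, not output from any particular command.

```stata
* hypothetical long-form list of neighbor pairs within the radius,
* self pairs excluded; in practice this would come from the neighbor search
clear
input school_code school_near
1 2
2 1
2 3
3 2
end
tempfile pairs
save "`pairs'"

* annual enrollment, keyed by the neighbor's id (values from the thread's example)
clear
input school_near year kids_near
1 1 34
1 2 42
1 3 21
2 1 11
2 2 23
2 3 31
3 1 17
3 2 19
3 3 36
end
tempfile kids
save "`kids'"

* attach every year of enrollment to each neighbor pair, then total by school-year
use "`pairs'", clear
joinby school_near using "`kids'"
collapse (sum) wanted = kids_near, by(school_code year)
list, sepby(school_code)
```

Note that a school with no neighbor within the radius never appears in the pair list, so it would be absent from the result; merge the totals back onto the full school-year data if you need a zero for those schools.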
Another approach to a nearest-neighbor problem is brute force: form all pairwise combinations of schools (per year), compute the distances, and then compute the desired statistics by group. For a single year, this can be done using cross; if the problem is not too large, you can use joinby to do it for all years at once.
The post referenced in #10 includes a data example where the distances have already been computed. The problem is that the distances are recorded in wide form and I assume that there are as many distance variables as there are schools. The example can be morphed into the brute force approach explained in the previous paragraph by reshaping the data to long form. Unfortunately, the reshape command can be quite slow, particularly when reshaping to long.
The following solution uses a more efficient technique to reshape to long. Once you have all pairwise combinations of schools per year in long form, you merge enrollment data for nearby schools and then add up enrollment for schools within 25 km. A running sum() is more efficient than egen total(). Each school is at distance 0 from itself and is therefore counted in its own total, so the enrollment of the current school is removed at the end.
Code:
* example from http://www.statalist.org/forums/forum/general-stata-discussion/general/1308490-complex-matching-on-rows-and-collumms
clear
input float(school_code year kids dist_to_1 dist_to_2 dist_to_3 kids_in_nearby_schools)
1 1 34 0 20 40 .
1 2 42 0 20 40 .
1 3 21 0 20 40 .
2 1 11 20 0 20 .
2 2 23 20 0 20 .
2 3 31 20 0 20 .
3 1 17 40 20 0 .
3 2 19 40 20 0 .
3 3 36 40 20 0 .
end

isid year school_code, sort
list, sepby(year)
tempfile master
save "`master'"

* variables to reshape to long
unab vlong: dist_to_*

* data on number of kids per school for merging later on
keep school_code year kids
rename (school_code kids) (school_near kids_near)
tempfile kids
save "`kids'"

* reshape to long; use a more efficient method than -reshape-
local n 0
foreach v of local vlong {
    use school_code year kids `v' using "`master'", clear
    rename `v' dist_near
    local id = subinstr("`v'", "dist_to_","", 1)
    gen school_near = `id'
    local n = `n' + 1
    tempfile hold`n'
    save "`hold`n''"
}
clear
forvalues i = 1 / `n' {
    dis "hold`i'"
    append using "`hold`i''"
}

* merge kid counts for nearby schools
merge m:1 year school_near using "`kids'", assert(match) nogen

* add up enrollment for neighbor schools using a running sum
* and keep the last observation
* (dividing by a false condition, i.e. 0, yields missing, which sum() treats as 0)
isid year school_code school_near, sort
by year school_code: gen wanted = sum(kids_near / (dist_near <= 25))
by year school_code: keep if _n == _N
replace wanted = wanted - kids

isid school_code year, sort
list, sepby(school_code)
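If the distances are not already in the data, the brute-force joinby route mentioned earlier could look roughly like the sketch below. It is only an illustration: the coordinates are made up (three points on the same meridian, spaced so that neighboring schools are about 20 km apart), the names lat, lon, lat_near, lon_near and wanted are my own, and it assumes geodist has been installed from SSC (ssc install geodist).

```stata
* made-up coordinates; enrollment for years 1 and 2 from the thread's example
clear
input school_code year kids lat lon
1 1 34 42.00 -83
1 2 42 42.00 -83
2 1 11 42.18 -83
2 2 23 42.18 -83
3 1 17 42.36 -83
3 2 19 42.36 -83
end
tempfile base
save "`base'"

* a second copy of the data, renamed to play the role of the neighbor
rename (school_code kids lat lon) (school_near kids_near lat_near lon_near)
tempfile near
save "`near'"

* all pairwise combinations of schools within each year
use "`base'", clear
joinby year using "`near'"
drop if school_code == school_near

* distance in km for each pair, then total enrollment within 25 km
geodist lat lon lat_near lon_near, gen(dist_near)
egen wanted = total(kids_near * (dist_near <= 25)), by(school_code year)

* reduce to one observation per school and year
by school_code year, sort: keep if _n == 1
keep school_code year kids wanted
list, sepby(school_code)
```

Because self pairs are dropped before totaling, there is no need to subtract the current school's enrollment afterwards. The number of pairs grows with the square of the number of schools per year, so this is only practical when the problem is not too large.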