Finding the mode value in aggregated spatial units

Philipp Schrauth

Join Date: Feb 2016
Posts: 31

Finding the mode value in aggregated spatial units

06 Mar 2019, 05:51

Hello,

I have a dataset with two geographical units of observations, districts and labor market regions. Labor market regions are just assembled districts, where each district is only part in one single labor market region. For districts, there exist shapefiles (Polygon files, which can be imported into a GIS program) and for labor market regions not. I also have satellite raster-data where each raster contains one of about 20 possible specific soil-types (so the dominant soil type within a raster). Using QGIS, I calculated zonal statistics and therefore could calculate the number of each soil-type within one district (e.g., there are 40 rasters with soiltype 6111 and 52 rasters with soil-type 6210 in a district that contains 92 rasters). This data I transferred into Stata where I want to aggregate the data and find the mode value of the soil variable within a labor market region. The file looks something like this:

Code:

 District
Labor Market Region
soil_6111
soil_6210
soil_5210
rasters

1
1
40
52
0
92

2
1
560
890
340
1790

3
1
30
600
1000
1630

4
2
800
500
100
140

5
2
400
300
200
900

6
2
100
50
20
170

7
3
340
300
660
1300

8
3
200
100
200
500

Thus, in district 1 and district 2 soil type 6210 is dominant, in district 3 soil type 5210 is dominant and in labor market region 1 soil type 6210 is dominant because it is the one mostly occurring there. Due to the variation of district size, there may be two districts with dominant soil type being e.g. 6111 but as the third region is much larger than the two and has another dominant soil type, this other soil is the dominant one in the labor market region.
Now the question is how to find a way to detect the dominant soil type within labor market regions. I simplyfied the example, actually there are about 20 soil types, but I don't think this should make a difference for the sollution. Any help would be much appreciated.

Tags: None

Nick Cox

Join Date: Mar 2014
Posts: 35784

06 Mar 2019, 06:44

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(district labormarketregion) int(soil_6111 soil_6210 soil_5210 rasters)
1 1  40  52    0   92
2 1 560 890  340 1790
3 1  30 600 1000 1630
4 2 800 500  100  140
5 2 400 300  200  900
6 2 100  50   20  170
7 3 340 300  660 1300
8 3 200 100  200  500
end

foreach v of var soil* {
    egen `v'_lmr = total(`v'), by(labormarketregion)
}
gen dominant = ""
gen amount = 0 

quietly foreach v of var soil_*_lmr  {
    replace dominant = substr("`v'", 6, 4) if `v' > amount
    replace amount = max(amount, `v')
}

list, sepby(labormarketregion) 

     +---------------------------------------------------------------------------------------------------------------------+
     | district   laborm~n   soi~6111   soi~6210   soi~5210   rasters   so~1_lmr   soil_6..   soil_5~r   dominant   amount |
     |---------------------------------------------------------------------------------------------------------------------|
  1. |        1          1         40         52          0        92        630       1542       1340       6210     1542 |
  2. |        2          1        560        890        340      1790        630       1542       1340       6210     1542 |
  3. |        3          1         30        600       1000      1630        630       1542       1340       6210     1542 |
     |---------------------------------------------------------------------------------------------------------------------|
  4. |        4          2        800        500        100       140       1300        850        320       6111     1300 |
  5. |        5          2        400        300        200       900       1300        850        320       6111     1300 |
  6. |        6          2        100         50         20       170       1300        850        320       6111     1300 |
     |---------------------------------------------------------------------------------------------------------------------|
  7. |        7          3        340        300        660      1300        540        400        860       5210      860 |
  8. |        8          3        200        100        200       500        540        400        860       5210      860 |
     +---------------------------------------------------------------------------------------------------------------------+

See also https://www.stata-journal.com/sjpdf....iclenum=pr0046

Comment

Philipp Schrauth

Join Date: Feb 2016

Posts: 31
#3

07 Mar 2019, 03:06

Neat solution, thank you very much and thank you also for the reference to the stata-journal article!
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35784
#4

07 Mar 2019, 06:37

Thanks. Just to add that if there are ties for dominant, this method will not flag them.

Last edited by Nick Cox; 07 Mar 2019, 06:46.
Comment

Announcement

Finding the mode value in aggregated spatial units

Comment

Comment

Comment