Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Finding the mode value in aggregated spatial units

    Hello,

    I have a dataset with two geographical units of observations, districts and labor market regions. Labor market regions are just assembled districts, where each district is only part in one single labor market region. For districts, there exist shapefiles (Polygon files, which can be imported into a GIS program) and for labor market regions not. I also have satellite raster-data where each raster contains one of about 20 possible specific soil-types (so the dominant soil type within a raster). Using QGIS, I calculated zonal statistics and therefore could calculate the number of each soil-type within one district (e.g., there are 40 rasters with soiltype 6111 and 52 rasters with soil-type 6210 in a district that contains 92 rasters). This data I transferred into Stata where I want to aggregate the data and find the mode value of the soil variable within a labor market region. The file looks something like this:

    Code:
     
    District Labor Market Region soil_6111 soil_6210 soil_5210 rasters
    1 1 40 52 0 92
    2 1 560 890 340 1790
    3 1 30 600 1000 1630
    4 2 800 500 100 140
    5 2 400 300 200 900
    6 2 100 50 20 170
    7 3 340 300 660 1300
    8 3 200 100 200 500
    Thus, in district 1 and district 2 soil type 6210 is dominant, in district 3 soil type 5210 is dominant and in labor market region 1 soil type 6210 is dominant because it is the one mostly occurring there. Due to the variation of district size, there may be two districts with dominant soil type being e.g. 6111 but as the third region is much larger than the two and has another dominant soil type, this other soil is the dominant one in the labor market region.
    Now the question is how to find a way to detect the dominant soil type within labor market regions. I simplyfied the example, actually there are about 20 soil types, but I don't think this should make a difference for the sollution. Any help would be much appreciated.

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(district labormarketregion) int(soil_6111 soil_6210 soil_5210 rasters)
    1 1  40  52    0   92
    2 1 560 890  340 1790
    3 1  30 600 1000 1630
    4 2 800 500  100  140
    5 2 400 300  200  900
    6 2 100  50   20  170
    7 3 340 300  660 1300
    8 3 200 100  200  500
    end
    
    foreach v of var soil* {
        egen `v'_lmr = total(`v'), by(labormarketregion)
    }
    gen dominant = ""
    gen amount = 0 
    
    quietly foreach v of var soil_*_lmr  {
        replace dominant = substr("`v'", 6, 4) if `v' > amount
        replace amount = max(amount, `v')
    }
    
    list, sepby(labormarketregion) 
    
         +---------------------------------------------------------------------------------------------------------------------+
         | district   laborm~n   soi~6111   soi~6210   soi~5210   rasters   so~1_lmr   soil_6..   soil_5~r   dominant   amount |
         |---------------------------------------------------------------------------------------------------------------------|
      1. |        1          1         40         52          0        92        630       1542       1340       6210     1542 |
      2. |        2          1        560        890        340      1790        630       1542       1340       6210     1542 |
      3. |        3          1         30        600       1000      1630        630       1542       1340       6210     1542 |
         |---------------------------------------------------------------------------------------------------------------------|
      4. |        4          2        800        500        100       140       1300        850        320       6111     1300 |
      5. |        5          2        400        300        200       900       1300        850        320       6111     1300 |
      6. |        6          2        100         50         20       170       1300        850        320       6111     1300 |
         |---------------------------------------------------------------------------------------------------------------------|
      7. |        7          3        340        300        660      1300        540        400        860       5210      860 |
      8. |        8          3        200        100        200       500        540        400        860       5210      860 |
         +---------------------------------------------------------------------------------------------------------------------+
    See also https://www.stata-journal.com/sjpdf....iclenum=pr0046

    Comment


    • #3
      Neat solution, thank you very much and thank you also for the reference to the stata-journal article!

      Comment


      • #4
        Thanks. Just to add that if there are ties for dominant, this method will not flag them.
        Last edited by Nick Cox; 07 Mar 2019, 06:46.

        Comment

        Working...
        X