  • spmap location of geographic labels

    Hi,

    I am using a shapefile for the UK regions (NUTS1) from https://data.gov.uk/dataset/2aa6727d...united-kingdom

    I have managed to convert the shapefile and produce a map, but I want to label the map with the region names. Unfortunately, the centroids in the data seem incorrect, so I used Google to look them up and replaced them in the dataset.

    When I run my code (see below), all the labels appear at the correct latitude, but the longitude (X coordinate) is off.

    Can anyone help?

    Thanks,
    Jane.

    CODE:
    //
    // file: C:\Users\Jane\Google Drive\PhD\03_Data and modelling\maps\maps.do
    //
    // data source: https://data.gov.uk/dataset/2aa6727d...united-kingdom
    *ssc install geo2xy
    cap global working "C:\Users\Jane\Google Drive\PhD\03_Data and modelling\maps"
    global proj mercator //albers
    *cap global working "C:\Users\leva.sriubaite\Desktop\Jane"
    *cap global working "C:\Users\isri0001\Dropbox\Jane"

    shp2dta using "$working\regions.shp", ///
    database("$working\nuts1_data") coordinates("$working\nuts1_coord") replace genid(nuts1)


    *COORD
    use "$working\nuts1_coord.dta", clear

    /*
    london
    lon lat
    -.30864 51.492271
    2.10864 51.492271
    */
    /*
    set obs 2839094
    replace _ID = 13 in 2839092
    replace _ID = 13 in 2839093
    replace _X = -0.30864 in 2839093
    replace _Y = 51.492271 in 2839093
    replace _ID = 13 in 2839094
    replace _X = 2.10864 in 2839094
    replace _Y = 51.492271 in 2839094
    */
    *drop if _ID==13
    *north
    geo2xy _Y _X, proj($proj) replace
    save "$working\nuts1_coord-albers.dta", replace


    *DATA
    use "$working\nuts1_data.dta", clear
    cap rename var1 lon

    gen lon2=lon
    gen lat2=lat
    // adjust where the region labels are placed

    replace lat=55.1262743 if nuts1==1 //NE
    replace lon=-2.8600543 if nuts1==1
    replace lat=54.0472108 if nuts1==2 //NW
    replace lon=-5.0176158 if nuts1==2
    replace lat=53.9266255 if nuts1==3 //Y&H
    replace lon=-2.328694 if nuts1==3
    replace lat=52.8695323 if nuts1==4 //EMid
    replace lon=-1.582406 if nuts1==4
    replace lat=52.5048901 if nuts1==5 //WMid
    replace lon=-2.0955889 if nuts1==5
    replace lat=52.1988214 if nuts1==6 //East
    replace lon=0.0499475 if nuts1==6
    replace lat=51.5285582 if nuts1==7 //London
    replace lon=-0.2416815 if nuts1==7
    replace lat=51.3800976 if nuts1==8 //SE
    replace lon=-1.373755 if nuts1==8
    replace lat=50.9832435 if nuts1==9 //SW
    replace lon=-5.0733556 if nuts1==9
    replace lat=52.4000296 if nuts1==10 //Wales
    replace lon=-5.2809503 if nuts1==10
    replace lat=57.6676871 if nuts1==11 //Scotland
    replace lon=-8.1709656 if nuts1==11 //-9.1709656


    /*replace lon=-0.72 if nuts1==1 //lon==-1.7289 (East-west) NE
    replace lon=-1.07 if nuts1==2 //lon==-2.7723701 (East-west) NW
    replace lon=-0.18 if nuts1==3 //lon==-1.28712 Y&H
    replace lon=0.45 if nuts1==4 //lon==-0.84966999 EMid
    replace lon=-1.0 if nuts1==5 //lon==-2.20358 WMid
    replace lon=1.8 if nuts1==6 //lon==0.504146 East
    replace lon=3.8 if nuts1==7 //lon==-0.30864 London
    replace lon=0.45 if nuts1==8 //lon==-0.99311 SE
    replace lon=-1.07 if nuts1==9 //lon==-3.63343 SW
    replace lon=-1.07 if nuts1==10 //lon==-3.99416 Wales
    replace lon=-1.07 if nuts1==11 //lon==-3.97091 Scotland

    replace lat=55.09703 if nuts1==1 //lat==55.29703 (North-south)
    replace lat=54.02264 if nuts1==3 //lat==53.93264
    replace lat=50.911 if nuts1==9 //lon==50.811 SW
    replace lat=57.007429 if nuts1==11 //lat==56.177429
    */
    drop if objectid == 12

    cap gen add = nuts118nm
    *Manual adjustment of region name!!!!
    replace add="NE" if add=="North East (England)"
    replace add="NW" if add=="North West (England)"
    replace add="Y&H" if add=="Yorkshire and The Humber"
    replace add="E Mid" if add=="East Midlands (England)"
    replace add="W Mid" if add=="West Midlands (England)"
    replace add="East" if add=="East of England"
    replace add="SE" if add=="South East (England)"
    replace add="SW" if add=="South West (England)"

    /*
    ssc install insheetjson
    ssc install gcode
    ssc install opencagegeo

    gen add2 = nuts118nm
    gen step = subinstr(add2, " ", "+",.)
    gen fulladdress = step + ",+UK"

    gcode fulladdress

    geocode , fulladdr(fulladdress)
    */

    geo2xy lat lon, proj($proj) replace

    save "$working\nuts1_data-albers.dta", replace



    spmap nuts1 if objectid <12 using "$working\nuts1_coord-albers.dta", id(objectid) ///
    label(data("$working\nuts1_data-albers.dta") xcoor(lon) ycoor(lat) label(add) pos(3 0) color(black) size(vsmall)) ///
    title("Per capita accidents in British regions, 2015", size(*1.0)) fcolor(Reds)





  • #2
    Your spmap command indicates that you want the labels to the right of the centroid (at 3 o'clock), so that's perhaps what is throwing you off. In any case, the position of labels usually has to be manually fine-tuned. When the shapefile contains geographic coordinates (lat/lon), as yours does, I think it is best to fine-tune the label locations using the projected coordinates. Fine-tuning the label locations requires redrawing the map several times to observe the effect of moving the label coordinates. One problem here is that your shapefile is really large; it contains more than 2.8 million points, and drawing such a detailed map is very slow.
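
    For example, pos() in spmap's label() option takes clock positions: pos(0) centers the label on the supplied coordinates, while pos(3) prints it at 3 o'clock, i.e. to the right of the point. A minimal sketch using your own files and variables, changing only the position (each redraw of the full-detail map will be slow, as noted above):
    Code:
    * same as your command, but with pos(0) so each label is centered on its point
    spmap nuts1 if objectid <12 using "$working\nuts1_coord-albers.dta", id(objectid) ///
        label(data("$working\nuts1_data-albers.dta") xcoor(lon) ycoor(lat) label(add) ///
        pos(0) color(black) size(vsmall))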

    The shapefile you refer to (which can be downloaded by clicking the shapefile link) has an unwieldy name, and its database includes a variable called long, which is a reserved word and therefore not a valid variable name in Stata. Here's the code I used to import the shapefile into Stata:

    Code:
    clear all
    local nuts NUTS_Level_1_January_2018_Full_Clipped_Boundaries_in_the_United_Kingdom
    
    shp2dta using "`nuts'/`nuts'.shp", ///
        database("nuts1_data") coordinates("nuts1_coord") replace genid(nuts1)
        
        
    * the shapefile database includes a variable name that is illegal in Stata, i.e. -long-
    use "nuts1_data.dta", clear
    rename var1 lon 
    save "nuts1_data.dta", replace
    The following code shows that this is a very detailed shapefile, with the shortest line segment being around 25 cm (geodist is from SSC and reports distances in kilometers):
    Code:
    use "nuts1_coord.dta", clear
    gen double lon = _X[_n-1]
    gen double lat = _Y[_n-1]
    geodist lat lon _Y _X, gen(d) sphere
    sum d
    and the summarize results:
    Code:
    . sum d
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
               d |  2,829,577    .0136627    .0180984   .0002456   1.447539
    
    .
    The distance from the bottom to the top of the map spans over 1,200 km. Assume that an acceptable map has a height of 2,000 pixels; a single pixel then represents about 600 m by 600 m (a quick numerical check of this appears after the code below). The shapefile contains many line segments that will not be distinguishable on a map printed at that size. The following code shows how to significantly reduce the size of the shapefile's coordinates dataset by skipping points that form line segments below a specified distance threshold. Since I wrote this, I have thought of another algorithm that would probably be superior, but this one will do for this post:
    Code:
    clear all
    
    *! version 1.0.0, 09mar2019, Robert Picard, [email protected]
    program geoshrink
    
        version 10
        
        args lat lon threshold
        
        local obs0 = _N
        
        // note the order of observations
        tempvar obsid
        gen long `obsid' = _n
        
        // create identifier for each polygon ring
        tempvar ring 
        gen long `ring' = sum(mi(_X))
        
        // to avoid creating an invalid polygon, lock some points at both ends
        tempvar locked
        bysort `ring' (`obsid'): gen `locked' = inrange(_n,1,3) | inrange(_n,_N-1,_N)
        
        tempvar lat0 lon0 d target runs
        
        local more 1
        qui while `more' {
        
            // distance from the previous point
            gen double `lat0' = `lat'[_n-1]
            gen double `lon0' = `lon'[_n-1]
            geodist `lat' `lon' `lat0' `lon0', gen(`d') sphere
            
            // note initial distances
            if `obs0' == _N {
                noi dis as txt "Initial obs = " as res %12.0fc `obs0'
                sum `d', meanonly
                noi dis as txt "Initial distances between points, min = " ///
                    as res r(min) as txt "  max = " as res r(max)
            }
            
            // target distances below the specified threshold
            gen `target' = `d' < `threshold' & !`locked'
            
            // create runs of target/!target
            gen `runs' = sum(`target' != `target'[_n-1])
            
            // within a run of targets, drop odd observations
            local obs = _N        
            bysort `runs' (`obsid'): drop if `target' & mod(_n,2)
            
            
            // we can stop if no observation was dropped
            local more = `obs' != _N
            
            if `more' drop `lat0' `lon0' `d' `runs' `target'
            
        }
        
        sum `d', meanonly
        noi dis as txt "Final distances between points,   min = " ///
            as res r(min) as txt "  max = " as res r(max)
        dis as txt "Final obs   = " as res %12.0fc `obs'
        
        sort _ID `obsid'
    end
    
    use "nuts1_coord.dta", clear
    
    geoshrink _Y _X .5
    
    geo2xy _Y _X, replace
    
    save "nuts1_coor2use", replace
    With a 500m threshold, the dataset of coordinates is reduced from 2,839,091 points to a very manageable 56,738 points. With this reduced set of projected coordinates, you can create the map you want. Note that I use the default projection in geo2xy, which is the same projection used by Google Maps. The projected units have to do with map tiles, and the map origin (0,0) is at the top left, which explains the negative _Y and ylab coordinates.

    There are several ways to approach fine-tuning labels and their positions. In the following, I used dataex to generate code that creates the variables you want to use with spmap, and then copied and pasted the dataex output so that manual changes could be made. I do not have a variable to use as an attribute, so I used one that came with the shapefile for illustration purposes.

    Code:
    use "nuts1_data.dta", clear
    
    geo2xy lat lon, gen(ylab xlab)
    gen lab = nuts118nm
    
    dataex nuts1 bng_e nuts118nm xlab ylab lab
    
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte nuts1 long bng_e str24 nuts118nm double(xlab ylab) str24 lab
     1 417313 "North East (England)"     126.5 -80.7 "NE"    
     2 350015 "North West (England)"     126.0 -81.5 "NW"    
     3 446903 "Yorkshire and The Humber" 127.0 -82.1 "Y&H"
     4 477660 "East Midlands (England)"  127.4 -83.4 "E Mid" 
     5 386294 "West Midlands (England)"  126.4 -83.9 "W Mid"
     6 571074 "East of England"          128.3 -84.3 "East"        
     7 517516 "London"                   127.9 -85.0 "Lon"                  
     8 470062 "South East (England)"     127.3 -85.2 "SE"   
     9 285015 "South West (England)"     125.4 -85.8 "SW"   
    10 263406 "Wales"                    125.3 -84.5 "Wales"                   
    11 277746 "Scotland"                 125.2 -78.5 "Scotland"                
    12  86601 "Northern Ireland"         123.2 -81.4 "N Ireland"        
    end
    
    save "nuts1_data2use.dta", replace
    
    use "nuts1_data2use.dta", clear
    spmap bng_e using "nuts1_coor2use.dta", id(nuts1) ///
        label(data("nuts1_data2use.dta") xcoor(xlab) ycoor(ylab) label(lab) ///
        pos(0) color(black) size(small)) ///
        title("Per capita accidents in British regions, 2015", size(*1.0)) fcolor(Reds) 
    
    
    graph export nuts1.png, width(500) replace
    and the resulting map:

    [Attached image: nuts1.png]
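
    If a label still sits awkwardly, the fine-tuning mentioned earlier is simply an edit-and-redraw cycle: nudge the label's projected coordinates in nuts1_data2use.dta and rerun the spmap command above. A minimal sketch (the offsets are arbitrary illustration values):
    Code:
    use "nuts1_data2use.dta", clear
    * move the London label slightly east (larger xlab) and north (larger ylab);
    * the .2 and .1 offsets are arbitrary and chosen only for illustration
    replace xlab = xlab + .2 if lab == "Lon"
    replace ylab = ylab + .1 if lab == "Lon"
    save "nuts1_data2use.dta", replace
    * then rerun the spmap command above to see the effect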

    • #3
      Hi Robert,

      Thanks so much for this. It is a major improvement for me, and I'm very grateful. I have also incorporated my per-capita accident measure, so it is now perfect. I shall acknowledge your help in my PhD.

      Kind regards,
      Jane.
