spmap - create rectangles to show labels and colors for small jurisdictions in choropleth maps

David Radwin

Join Date: Mar 2014

Posts: 368
#1

spmap - create rectangles to show labels and colors for small jurisdictions in choropleth maps

05 Jun 2018, 18:46

I am making several choropleth (data) maps of the United States using spmap (M. Pisati, available from SSC). Some of the states in the northeast are too small to label or even to distinguish the color. I would like to add rectangles, adjacent to the small jurisdictions, to show the labels and colors legibly, like this example PNG image from this web page.

I have two questions.

1. What is the correct term for these rectangles?
2. Can anyone please share code to create them?

I imagine #2 is a matter of entering the coordinates and corresponding centroids in the shapefile, but I would really appreciate any help you may offer so as to not reinvent the wheel.

David

David Radwin
Senior Researcher, California Competes
californiacompetes.org
Pronouns: He/Him
Tags: None

Robert Picard

Join Date: Mar 2014
Posts: 1536

07 Jun 2018, 10:02

I think this is an interesting problem that deserves attention. I don't know if there's a name for these "rectangles" but it's hard to see how you can make a compelling choropleth map of US states without using a device similar to these outside proxy squares for some of the smaller states. This is particularly true if you want to overlay labels for each state. The short story is that you need to create polygons in the shape you want and add these, as if they were islands, to the polygon(s) of the desired state(s).

The example map in #1 is the product of a series of large and small decisions and I don't think this can easily be automated. Just creating a basic composite map of the 50 states requires a bit of effort. I've done some ground work on this in the Fun with Maps section of the geo2xy (from SSC) help files. Because the placement of labels will use centroids as starting locations, I'll start from scratch here. The first step is to get a shapefile of the 50 US states, DC, and Puerto Rico. At the same time, I'll create a dataset with an attribute variable to use for the map.

Code:

* http://www2.census.gov/geo/tiger/GENZ2017/shp/cb_2017_us_state_20m.zip
shp2dta using "cb_2017_us_state_20m/cb_2017_us_state_20m.shp", ///
    data("dbase.dta") coor("coor.dta") genid(id) gencentroids(c) replace

use "dbase.dta", clear
rename (id x_c y_c) (_ID _X _Y)
save "dbase.dta", replace

* make an attribute variable
set seed 312356
use "dbase.dta", clear
keep STUSPS
gen pcchange = runiformint(-30,30)
gen spcchange = string(pcchange)
save "attribute.dta", replace

The second step is to convert the lat/lon coordinates of the downloaded shapefile to planar coordinates. To create the desired composite map, you will need to scale and offset the coordinates for Alaska, Hawaii, and Puerto Rico. Because the Census keeps changing their shapefiles (and the _ID codes used for each state), I'm creating local macros for the state identifiers (_ID) that need to be scaled and moved. The state's centroid coordinates are located in the database part of the shapefile (one per state) and they also need to be converted to planar using the exact same projection. To make things simpler, I'm combining the two sets of coordinates, applying the projections, and splitting them back at the end. I use USGS recommended projection parameters except for Puerto Rico where I simply use default parameters. This appears complicated but it only needs to be done once. The resulting database and coordinates form a shapefile of the composite US states in planar coordinates.

Code:

* create macros that contain the _ID of these 3 states
use "dbase.dta", clear
foreach state in AK HI PR {
    sum _ID if STUSPS == "`state'", meanonly
    local `state' = r(min)
    dis "`state' = " ``state''
}

* combine centroids with polygon coordinates to apply the same projections
use _ID _X _Y STUSPS using "dbase.dta", clear
append using "coor.dta"

// flip longitudes to reconnect Hawaii and Alaska
replace _X = cond(_X > 0, _X - 180, _X + 180) if inlist(_ID, `AK', `HI')

// Alaska - USGS recommends standard parallels of 55 and 65 north
sum _X if _ID == `AK'
local midlon = (r(min) + r(max)) / 2
geo2xy _Y _X if _ID == `AK', replace ///
        proj(albers, 6378137 298.257223563 55 65 0 `midlon')
replace _Y = _Y / 3 + 800000 if _ID == `AK'
replace _X = _X / 3 - 1700000 if _ID == `AK'

// Hawaii - USGS recommends standard parallels of 8 and 18 north
sum _X if _ID == `HI'
local midlon = (r(min) + r(max)) / 2
geo2xy _Y _X if _ID == `HI', replace ///
        proj(albers, 6378137 298.257223563 8 18 0 `midlon')
replace _Y = _Y / 1.2 + 850000 if _ID == `HI'
replace _X = _X / 1.2 - 800000 if _ID == `HI'

// Puerto Rico
geo2xy _Y _X if _ID == `PR', replace proj(albers)
replace _Y = _Y + 500000 if _ID == `PR'
replace _X = _X + 2200000 if _ID == `PR'

// 48 states - USGS recommends standard parallels of 29.5 and 45.5 north
sum _X if !inlist(_ID, `AK', `HI', `PR')
local midlon = (r(min) + r(max)) / 2
geo2xy _Y _X if !inlist(_ID, `AK', `HI', `PR'), replace ///
        proj(albers, 6378137 298.257223563 29.5 45.5 0 `midlon')
        
* convert to km and save the database and coordinates
replace _Y = _Y / 1000
replace _X = _X / 1000

preserve
keep if !mi(STUSPS)
rename (_Y _X) (y_c x_c)
merge 1:1 _ID using "dbase.dta", assert(match) nogen
sort _ID
save "xy_dbase.dta", replace
restore
drop if !mi(STUSPS)
drop STUSPS
sort _ID, stable
save "xy_coor.dta", replace

* show the basic map made from projected coordinates
use "xy_dbase.dta", clear
spmap using "xy_coor.dta", id(_ID)

graph export "basic.png", width(1000) replace

And here's the map generate:

Click image for larger version

Name: basic.png
Views: 1
Size: 335.3 KB
ID: 1448048

In the next post, I'll show how to derive anchor and label locations for each state.

Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

07 Jun 2018, 10:24

With the composite shapefile generated in #2, I now address the issue of label placement. Centroids are a good starting point but they are not well positioned in certain states like Michigan (because the Upper Peninsula pulls it to the north west) and Florida. Note that I'm not an expert on spmap so all the credit goes to Scott Merryman for showing how to create multiline labels in this post. The final goal here is to finalize, for each state, the anchor coordinates (the location that is pointed to by the line that connects the state to its proxy square ) and the label coordinates (where the label goes). For all states where there is no need for a proxy square, the anchor and label coordinates are the same.

Because this requires manual tweaks and that the nature of the label can affect placement, I implement a strategy where coordinates can easily be changed from within the code itself. The code tries to follow the map example in #1 where labels from proxy squares are on a single line. In terms of label placement, I also respect the model so the line connector can connect either to the center-right or center-left of the proxy square. The anchor locations are marked by a red x. In choosing the location, I leave enough space for the proxy square. You can always come back to this step if the final result does not look right.

Code:

* derive from centroids anchor and label points
use "xy_dbase.dta", clear

gen x_anchor = int(x_c)
gen y_anchor = int(y_c)
gen x_label = x_anchor
gen y_label = y_anchor
sort STUSPS
dataex _ID STUSPS x_anchor y_anchor x_label y_label

* Edit the following to taste
clear
input byte _ID str2 STUSPS float(x_anchor y_anchor x_label y_label)
 1 "AK" -1591 3070 -1591 3070
19 "AL"   837 3478   837 3478
21 "AR"   308 3680   308 3680
20 "AZ" -1439 3726 -1439 3726
 2 "CA" -2067 4196 -2067 4196
 3 "CO"  -831 4176  -831 4176
25 "CT"  1892 4658  2090 4510
 4 "DC"  1606 4282  2040 4100
26 "DE"  1745 4290  2040 4300
45 "FL"  1390 3064  1390 3064
27 "GA"  1152 3498  1152 3498
28 "HI"  -750 2650  -750 2650
 7 "IA"   192 4480   192 4480
 5 "ID" -1485 4884 -1485 4884
 6 "IL"   560 4272   560 4272
22 "IN"   808 4276   808 4276
23 "KS"  -219 4079  -219 4079
 8 "KY"   950 4050   950 4050
 9 "LA"   310 3300   310 3300
31 "MA"  1953 4755  2200 4860
10 "MD"  1745 4230  2040 4200
24 "ME"  2058 5133  2058 5133
32 "MI"   900 4620   900 4620
11 "MN"   118 4952   118 4952
12 "MO"   290 4068   290 4068
33 "MS"   574 3453   574 3453
46 "MT" -1050 5107 -1050 5107
39 "NC"  1489 3900  1489 3900
40 "ND"  -351 5084  -351 5084
34 "NE"  -328 4423  -328 4423
36 "NH"  1926 4905  1690 5200
37 "NJ"  1790 4465  2040 4400
38 "NM"  -934 3671  -934 3671
35 "NV" -1763 4367 -1763 4367
13 "NY"  1638 4750  1638 4750
42 "OH"  1096 4354  1096 4354
43 "OK"  -150 3752  -150 3752
14 "OR" -1955 4942 -1955 4942
44 "PA"  1498 4486  1498 4486
52 "PR"  2213 2600  2213 2620
41 "RI"  1987 4687  2200 4600
29 "SC"  1366 3671  1366 3671
30 "SD"  -348 4750  -348 4750
15 "TN"   848 3850   848 3850
16 "TX"  -330 3297  -330 3297
47 "UT" -1346 4282 -1346 4282
17 "VA"  1484 4100  1484 4100
48 "VT"  1831 4926  1600 5100
49 "WA" -1851 5311 -1851 5311
18 "WI"   461 4778   461 4778
50 "WV"  1307 4199  1307 4199
51 "WY"  -947 4640  -947 4640
end

* identify the list of special cases
gen rconnect = 1 if inlist(STUSPS,"NH","VT")
replace rconnect = 0 if inlist(STUSPS,"MA","RI","CT","NJ","DE","MD","DC")

* create 2 labels for each state
merge 1:1 STUSPS using "attribute.dta", assert(match) nogen
expand 2
bysort STUSPS: gen lgroup = _n
gen s = cond(mi(rconnect), STUSPS, STUSPS + ", " + spcchange) if lgroup == 1
replace s = spcchange if mi(rconnect) & lgroup == 2
save "xy_dbase_locations.dta", replace

use "xy_dbase.dta", clear
spmap using "xy_coor.dta", id(_ID) /// 
    label(data("xy_dbase_locations.dta") xcoord(x_label) ycoord(y_label) ///
    by(lgroup) label(s) color(blue) size(*0.6 ..) pos(0 6)) ///
    point(data("xy_dbase_locations.dta") xcoord(x_anchor) ycoord(y_anchor) ///
    shape(x) ocolor(red) )

graph export "locations.png", width(1000) replace

and here's the resulting map:

Click image for larger version

Name: locations.png
Views: 1
Size: 425.6 KB
ID: 1448052

I the next post, I'll create the squares and connectors.

Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

07 Jun 2018, 10:53

In order for spmap to assign the state's shading and color to the proxy square, you need to create a polygon for the proxy square and append it to the shapefile's coordinates. The proxy square is essentially like an extra island for the state (with the same _ID value as the original state polygon(s)). I create two types of squares, one that connects center-right and the other center-left. The first observation of a polygon has missing coordinates. The next observation is the starting point. A polygon always ends where it started. To keep it simple, I create two unit polygons, one for each type. I then use joinby to form all pairwise combinations of states with proxy squares with these unit squares. There's room to play with sizes here (I create squares of 60km * 60km).

Code:

* polygon for squares that connect to the right or left
clear
input float(obs x y rconnect)
1  .  . 0
2 -1  0 0
3 -1  1 0
4  1  1 0
5  1 -1 0
6 -1 -1 0
7 -1  0 0
1  .  . 1
2  1  0 1
3  1 -1 1
4 -1 -1 1
5 -1  1 1
6  1  1 1
7  1  0 1
end
tempfile poly
save "`poly'"

* group label by whether they connect to the middle-right or not
use "xy_dbase_locations.dta", clear
keep if !mi(rconnect)
keep if lgroup == 1
sort rconnect STUSPS
keep _ID STUSPS x_label y_label rconnect
list , sepby(rconnect)

* form all pairwise combinations with the polygon points
joinby rconnect using "`poly'"
isid rconnect STUSPS obs, sort

* scale using 30 km for each unit segment
gen x_sq = x_label + cond(rconnect,130,-130) + 30 * x
gen y_sq = y_label + 30 * y
save "squares.dta", replace

* combine with the composite map's coordinates
keep _ID x_sq y_sq
rename (x_sq y_sq) (_X _Y)
keep _ID _X _Y
append using "xy_coor.dta"
sort _ID, stable
save "xy_coor_with_squares.dta", replace

To be able to connect the proxy square to the state, you now need to create polylines that spmap can use. These follow the same structure as the coordinates used to create the shapes of the state except that they do not end where they started. A single line connector will have 3 observations. The first has missing coordinates. The second is the coordinates of the proxy square's initial point. The third is the state's anchor location. In order to allow for tweaks, I use the same dataex technique that I used in #3. Specifically, I added an elbow to the connector for DC.

Code:

* get the state's anchor location
use "xy_dbase_locations.dta", clear
keep if !mi(rconnect)
keep if lgroup == 1
sort STUSPS
keep _ID STUSPS x_anchor y_anchor rconnect
list , sepby(rconnect)
tempfile anchors
save "`anchors'"

* each polyline contains at least 3 obs, starting with a missing value.
* the second obs constains the location of the square connect point
use _ID STUSPS x_sq y_sq obs if obs <= 3 using "squares.dta", clear
merge m:1 STUSPS using "`anchors'", assert(match) nogen
isid STUSPS obs, sort

* move the label location as the final point in the polyline
gen _X = cond(obs == 3, x_anchor, x_sq)
gen _Y = cond(obs == 3, y_anchor, y_sq)

* print out initial polyline data, copy output here and adjust as needed
dataex _ID STUSPS _X _Y

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte _ID str2 STUSPS float(_X _Y)
25 "CT"    .    .
25 "CT" 1930 4510
25 "CT" 1892 4658
 4 "DC"    .    .
 4 "DC" 1880 4100
 4 "DC" 1800 4100
 4 "DC" 1606 4282
26 "DE"    .    .
26 "DE" 1880 4300
26 "DE" 1745 4290
31 "MA"    .    .
31 "MA" 2040 4860
31 "MA" 1953 4755
10 "MD"    .    .
10 "MD" 1880 4200
10 "MD" 1745 4230
36 "NH"    .    .
36 "NH" 1850 5200
36 "NH" 1926 4905
37 "NJ"    .    .
37 "NJ" 1880 4400
37 "NJ" 1790 4465
41 "RI"    .    .
41 "RI" 2040 4600
41 "RI" 1987 4687
48 "VT"    .    .
48 "VT" 1760 5100
48 "VT" 1831 4926
end
drop STUSPS

save "connectors.dta", replace

Finally, it's time to put all of this together.

Code:

use "xy_dbase.dta", clear
merge 1:1 STUSPS using "attribute.dta", assert(match) nogen

spmap pcchange using "xy_coor_with_squares.dta", id(_ID) /// 
    fcolor(Greens) ocolor(gs10 gs9 gs8 gs7)  ///
    label( data("xy_dbase_locations.dta") xcoord(x_label) ycoord(y_label) ///
    by(lgroup) label(s)size(*0.6 ..) pos(0 6) ) ///
    line( data("connectors.dta") ) legend(ring(0) bplace(s) bmargin(25 0 3 0)) 

graph export "map_all.png", width(1000) replace

and here's the resulting map:

Click image for larger version

Name: map_all.png
Views: 1
Size: 469.0 KB
ID: 1448057

In terms of file organization, I broke down each step into separate do-files and have a master do-file run to the whole thing:

Code:

do get_data
do xy_data
do locations
do squares
do connectors
do map_all

Comment

David Radwin

Join Date: Mar 2014

Posts: 368
#5

08 Jun 2018, 07:00

Wow, this is exactly what I need! Thank you very much, Robert Picard! I appreciate it very much.

David

David Radwin
Senior Researcher, California Competes
californiacompetes.org
Pronouns: He/Him
Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

08 Jun 2018, 10:01

In light of the sad new this morning about Anthony Bourdain, I thought I could show how to reuse the code to replicate the recent CDC figure on trends in suicide rates.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str2 STUSPS str13 raw_both str5 spcchange str2 rank float pcchange
"AL" "+ 21.9 % (33)" "+21.9" "33" 21.9
"AK" "+ 37.4 % (13)" "+37.4" "13" 37.4
"AZ" "+ 17.3 % (42)" "+17.3" "42" 17.3
"AR" "+ 36.8 % (15)" "+36.8" "15" 36.8
"CA" "+ 14.8 % (46)" "+14.8" "46" 14.8
"CO" "+ 34.1 % (22)" "+34.1" "22" 34.1
"CT" "+ 19.2 % (34)" "+19.2" "34" 19.2
"DE" "+ 5.9 % (50)"  "+5.9"  "50"  5.9
"DC" "+ 16.1 % (45)" "+16.1" "45" 16.1
"FL" "+ 10.6 % (48)" "+10.6" "48" 10.6
"GA" "+ 16.2 % (44)" "+16.2" "44" 16.2
"HI" "+ 18.3 % (38)" "+18.3" "38" 18.3
"ID" "+ 43.2 %  (7)" "+43.2" "7"  43.2
"IL" "+ 22.8 % (32)" "+22.8" "32" 22.8
"IN" "+ 31.9 % (25)" "+31.9" "25" 31.9
"IA" "+ 36.2 % (18)" "+36.2" "18" 36.2
"KS" "+ 45.0 % (5)"  "+45.0" "5"    45
"KY" "+ 36.6 % (16)" "+36.6" "16" 36.6
"LA" "+ 29.3 % (26)" "+29.3" "26" 29.3
"ME" "+ 27.4 % (29)" "+27.4" "29" 27.4
"MD" "+ 8.5 % (49)"  "+8.5"  "49"  8.5
"MA" "+ 35.3 % (20)" "+35.3" "20" 35.3
"MI" "+ 32.9 % (24)" "+32.9" "24" 32.9
"MN" "+ 40.6 % (8)"  "+40.6" "8"  40.6
"MS" "+ 17.8 % (40)" "+17.8" "40" 17.8
"MO" "+ 36.4 % (17)" "+36.4" "17" 36.4
"MT" "+ 38.0 % (11)" "+38.0" "11"   38
"NE" "+ 16.2 % (43)" "+16.2" "43" 16.2
"NV" "- 1.0 % (51)"  "-1.0"  "51"   -1
"NH" "+ 48.3 % (3)"  "+48.3" "3"  48.3
"NJ" "+ 19.2 % (35)" "+19.2" "35" 19.2
"NM" "+ 18.3 % (39)" "+18.3" "39" 18.3
"NY" "+ 28.8 % (27)" "+28.8" "27" 28.8
"NC" "+ 12.7 % (47)" "+12.7" "47" 12.7
"ND" "+ 57.6 % (1)"  "+57.6" "1"  57.6
"OH" "+ 36.0 % (19)" "+36.0" "19"   36
"OK" "+ 37.6 % (12)" "+37.6" "12" 37.6
"OR" "+ 28.2 % (28)" "+28.2" "28" 28.2
"PA" "+ 34.3 % (21)" "+34.3" "21" 34.3
"RI" "+ 34.1 % (23)" "+34.1" "23" 34.1
"SC" "+ 38.3 % (10)" "+38.3" "10" 38.3
"SD" "+ 44.5 % (6)"  "+44.5" "6"  44.5
"TN" "+ 24.2 % (31)" "+24.2" "31" 24.2
"TX" "+ 18.9 % (36)" "+18.9" "36" 18.9
"UT" "+ 46.5 % (4)"  "+46.5" "4"  46.5
"VT" "+ 48.6 % (2)"  "+48.6" "2"  48.6
"VA" "+ 17.4 % (41)" "+17.4" "41" 17.4
"WA" "+ 18.8 % (37)" "+18.8" "37" 18.8
"WV" "+ 37.1 % (14)" "+37.1" "14" 37.1
"WI" "+ 25.8 % (30)" "+25.8" "30" 25.8
"WY" "+ 39.0 % (9)"  "+39.0" "9"    39
end
replace pcchange = round(pcchange, 1)
replace spcchange = string(pcchange)
tempfile suicide_rates
save "`suicide_rates'"

* update labels
use "xy_dbase_locations.dta", clear
drop pcchange spcchange s
merge m:1 STUSPS using "`suicide_rates'", nogen

gen s = cond(mi(rconnect), STUSPS, STUSPS + ", " + spcchange) if lgroup == 1
replace s = spcchange if mi(rconnect) & lgroup == 2
drop if STUSPS == "PR"
save "xy_dbase_labels.dta", replace

use "xy_dbase.dta", clear
merge 1:1 STUSPS using "`suicide_rates'", nogen
drop if STUSPS == "PR"

spmap pcchange using "xy_coor_with_squares.dta", id(_ID) /// 
    title("Percent change in annual suicide rate (1999-2016)", size(*0.7) color(gs4)) ///
    caption("source: https://stacks.cdc.gov/view/cdc/53785", size(*0.5) color(gs10)) ///
    clbreaks(-1 0 18 30 37 58) clmethod(custom) ///
    fcolor(ltblue "230 200 200" "230 150 150" "230 100 100" "230 50 50") ocolor(gs10 gs9 gs8 gs7)  ///
    label( data("xy_dbase_labels.dta") xcoord(x_label) ycoord(y_label) ///
    by(lgroup) label(s) size(*0.6 ..) pos(0 6) ) ///
    line( data("connectors.dta") ) legend(ring(0) bplace(s) bmargin(25 0 0 0)) 

graph export "map_suicides.png", width(1000) replace

and the resulting map:

Click image for larger version

Name: map_suicides.png
Views: 1
Size: 467.7 KB
ID: 1448210

Comment

David Radwin

Join Date: Mar 2014

Posts: 368
#7

30 Aug 2018, 11:40

Many thanks to Robert Picard and Michael Stepner for their helpful guidance on maps. Today the National Center for Education Statistics in the US Department of Education released a publication that (among other things) uses maps to show regional variation in financial aid applications. Naturally, their contributions are also acknowledged in the references.

https://nces.ed.gov/pubsearch/pubsin...?pubid=2018418

David Radwin
Senior Researcher, California Competes
californiacompetes.org
Pronouns: He/Him
1 like
Comment
Ann Obadan

Join Date: Sep 2019

Posts: 1
#8

23 Sep 2019, 18:05

Thank you Robert Picard. I am trying to apply this your codes in step 3 to developing spmap for California county. I have some counties which are really small and i want to use the rectangle style in this post but i cannot seem to get the right coordinates that would place the points outside the map. Please can you clarify how i can derive this estimates. Thank you.
Comment
Mustabsar Awais

Join Date: Jan 2017

Posts: 21
#9

25 Jan 2020, 18:24

Dear Robert Picard

Thanks so much for detailed steps in this post. It is very helpful indeed. I have been able to replicate your code. However, in the final map, the connecting lines for the small states are inconsistent. I would really appreciate if you guide me on this issue. The output map is here:

Thanks

Attached Files
1 like
Comment
Robert Picard

Join Date: Mar 2014

Posts: 1536
#10

26 Jan 2020, 09:32

I downloaded the shapefile anew (link in the first code block of #2) and copied the code from #2, #3, and #4 into separate do-files and ran a master do-file with the following content:

Code:

version 15 do get_data.do // #2 code block 1 do xy_data.do // #2 code block 2 do locations.do // #3 code block 1 do squares.do // #4 code block 1 do connectors.do // #4 code block 2 do map_all.do // #4 code block 3

The final map looks identical to the one in #4. I can't tell where things went wrong for you but check that you copied all the parts correctly and run the do-files as a whole.
1 like
Comment

Fahad Mirza

Join Date: Sep 2018
Posts: 241

#11

25 Sep 2020, 14:08

Hello Robert,

Thank you so much for this detailed code of how you got to make the map above and I think it looks fantastic. I wanted to apply your technique and in a way replicate the map you have on my own dataset that is of Pakistan. The problem is that I was unable to get the final step working where I am able to connect labels outside the map with lines and squares.

I was wondering if you could kindly have a look at my code and let me know where I am going wrong. It will be great if you can help me out with this.

I have also attached an image that shows at what point I reached. Also, I used the grmap option in Stata 16 to get this done.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int _ID float district_new double(_CX _CY) float(Shape_Leng Shape_Area) str19 ADM2_EN float tot_complaints_dist
 96   1 71.57102399300932  33.96826304010452  1.522665 .12526974 "Peshawar"         39520
 76   2 72.09787154023972  34.31495116535756    1.8832 .16054375 "Mardan"           14455
 15   3 71.71738341794575  34.24465505806053 1.4824945 .09679574 "Charsadda"         7496
  8   4 70.65651337271892  32.91955405915595  1.778845 .11399214 "Bannu"             8962
 21   5 70.69899065440383 31.869962607712854  4.436749  .7034219 "Dera Ismail Khan" 10406
127   6 72.03566989875996  35.28192972957805 3.3593645 .37131655 "Upper Dir"         3312
 72   7 71.84545004152172  34.84476356450102 2.0598717  .1572234 "Lower Dir"         4644
  1   8 73.26998212643791 34.108332575252426 2.1874955  .1715639 "Abbottabad"       13679
 10   9 73.12267415270053  34.79141270761624 2.0509117 .13352133 "Batagram"          1629
 12  10 72.51782763039995  34.46521312932733  2.162965 .17023906 "Buner"             3019
 17  11 72.22485265048144  36.22554788585233   8.20395 1.4665498 "Chitral"           1524
 34  12 70.84737720912672 33.430656237237805 2.1548097 .13382126 "Hangu"             2564
 35  13 72.89508128480321  34.00501008422797  2.903675  .1875543 "Haripur"           7846
 48  14 71.10924739977347 33.137505729535434 2.6047754  .2563766 "Karak"             5140
 60  15  71.5394672122824  33.45592088037413  3.987598  .2885411 "Kohat"             7459
 61  16 73.22681996339976 35.326556717856235 4.7075524  .7589301 "Kohistan"          1050
 65  17 70.83372375416293  32.59846602201369  2.557964   .305942 "Lakki Marwat"      7403
 73  18 71.89549074696235  34.53961904383832  1.657575  .0926152 "Malakand PA"       5630
 75  19 73.44297497648662  34.66648399310128  4.276061  .4221201 "Mansehra"          9673
 90  20 71.98608337906539  33.92841726337082 2.1039987 .16916557 "Nowshera"          9537
107  21 72.73041969747807 34.847820027112284  2.130305 .13688222 "Shangla"           2418
117  22 72.45922806355912  34.13610181558486 2.2540567 .14477351 "Swabi"             9409
118  23 72.46511237961398 35.250680745507815  4.142781 .53154844 "Swat"              7632
121  24 70.39242698179864  32.24356165350789  1.905292 .16023873 "Tank"              1853
125  25 72.83423351038117  34.55477365282171 1.3609056  .0458548 "Tor Ghar"           370
  7 138  71.5046286351669  34.75315305669825 1.9832087 .13402374 "Bajaur"            2127
 57 140 71.06682179824192  33.95970556478228 3.4295936  .2675712 "Khyber"            4603
 63 141 70.32403406672428 33.708348327156706 3.0417414  .3305859 "Kurram"            1610
 81 142 71.33862847620053 34.465521288680584 2.4713986 .22176823 "Mohmand"            817
 89 143 69.98518851023353  32.95383641401969 3.8699205  .4752517 "North Waziristan"  1150
 93 144 70.98913307258228  33.70000231484342  2.517406 .13378064 "Orakzai"            605
114 145  69.7494046231574  32.31388890741291  4.787823  .5979149 "South Waziristan"  1024
end

The Code i use is as below:

Code:

* derive from centroids anchor and label points to mark on map later for those
* district names that are difficult to read due to small area            
preserve
generate double x_anchor = _CX
generate double y_anchor = _CY
generate double x_label = x_anchor
generate double y_label = y_anchor
generate stot_complaints_dist = string(tot_complaints_dist)

keep _ID district_new ADM2_EN x_anchor y_anchor x_label y_label stot_complaints_dist
replace x_label = 72.08102399300932 in 1
replace y_label = 33.26826304010452 in 1

replace x_label = 70.71738341794575 in 3
replace y_label = 34.44465505806053 in 3

replace x_label = 71.04545004152172 in 7
replace y_label = 35.14476356450102 in 7

replace x_label = 70.51738341794575 in 18
replace y_label = 34.83961904383832 in 18

replace x_label = 72.45922806355912 in 22
replace y_label = 33.83610181558486 in 22

replace x_label = 73.83423351038117 in 25
replace y_label = 34.1477365282171 in 25


* dataex _ID district_new ADM2_EN x_anchor y_anchor x_label y_label

    
* Example generated by -dataex-. To install: ssc install dataex
*clear
*input int _ID float district_new str19 ADM2_EN double(x_anchor y_anchor x_label y_label)
* 96   1 "Peshawar"         71.57102399300932  33.96826304010452 72.08102399300932  33.26826304010452
* 76   2 "Mardan"           72.09787154023972  34.31495116535756 72.09787154023972  34.31495116535756
* 15   3 "Charsadda"        71.71738341794575  34.24465505806053 70.71738341794575  34.44465505806053
*  8   4 "Bannu"            70.65651337271892  32.91955405915595 70.65651337271892  32.91955405915595
* 21   5 "Dera Ismail Khan" 70.69899065440383 31.869962607712854 70.69899065440383 31.869962607712854
*127   6 "Upper Dir"        72.03566989875996  35.28192972957805 72.03566989875996  35.28192972957805
* 72   7 "Lower Dir"        71.84545004152172  34.84476356450102 71.04545004152172  35.14476356450102
*  1   8 "Abbottabad"       73.26998212643791 34.108332575252426 73.26998212643791 34.108332575252426
* 10   9 "Batagram"         73.12267415270053  34.79141270761624 73.12267415270053  34.79141270761624
* 12  10 "Buner"            72.51782763039995  34.46521312932733 72.51782763039995  34.46521312932733
* 17  11 "Chitral"          72.22485265048144  36.22554788585233 72.22485265048144  36.22554788585233
* 34  12 "Hangu"            70.84737720912672 33.430656237237805 70.84737720912672 33.430656237237805
* 35  13 "Haripur"          72.89508128480321  34.00501008422797 72.89508128480321  34.00501008422797
* 48  14 "Karak"            71.10924739977347 33.137505729535434 71.10924739977347 33.137505729535434
* 60  15 "Kohat"             71.5394672122824  33.45592088037413  71.5394672122824  33.45592088037413
* 61  16 "Kohistan"         73.22681996339976 35.326556717856235 73.22681996339976 35.326556717856235
* 65  17 "Lakki Marwat"     70.83372375416293  32.59846602201369 70.83372375416293  32.59846602201369
* 73  18 "Malakand PA"      71.89549074696235  34.53961904383832 70.51738341794575  34.83961904383832
* 75  19 "Mansehra"         73.44297497648662  34.66648399310128 73.44297497648662  34.66648399310128
* 90  20 "Nowshera"         71.98608337906539  33.92841726337082 71.98608337906539  33.92841726337082
*107  21 "Shangla"          72.73041969747807 34.847820027112284 72.73041969747807 34.847820027112284
*117  22 "Swabi"            72.45922806355912  34.13610181558486 72.45922806355912  33.83610181558486
*118  23 "Swat"             72.46511237961398 35.250680745507815 72.46511237961398 35.250680745507815
*121  24 "Tank"             70.39242698179864  32.24356165350789 70.39242698179864  32.24356165350789
*125  25 "Tor Ghar"         72.83423351038117  34.55477365282171 73.83423351038117  34.1477365282171
*  7 138 "Bajaur"            71.5046286351669  34.75315305669825  71.5046286351669  34.75315305669825
* 57 140 "Khyber"           71.06682179824192  33.95970556478228 71.06682179824192  33.95970556478228
* 63 141 "Kurram"           70.32403406672428 33.708348327156706 70.32403406672428 33.708348327156706
* 81 142 "Mohmand"          71.33862847620053 34.465521288680584 71.33862847620053 34.465521288680584
* 89 143 "North Waziristan" 69.98518851023353  32.95383641401969 69.98518851023353  32.95383641401969
* 93 144 "Orakzai"          70.98913307258228  33.70000231484342 70.98913307258228  33.70000231484342
*114 145 "South Waziristan"  69.7494046231574  32.31388890741291  69.7494046231574  32.31388890741291
*end


* identify the list of special cases that need to be moved outside the map
gen rconnect = 1 if inlist(ADM2_EN,"Peshawar","Charsadda","Malakand PA","Lower Dir")
replace rconnect = 0 if inlist(ADM2_EN,"Swabi","Tor Ghar")

* create/make 2 labels for each district
expand 2
bysort district_new: gen lgroup = _n
generate s = cond(mi(rconnect),ADM2_EN,ADM2_EN + ", " + stot_complaints_dist) if lgroup==1
replace s = stot_complaints_dist if mi(rconnect) & lgroup==2     
save "xy_dbase_locations.dta", replace
restore


* NOW TO MAKE SQUARE MARKINGS FOR LOCATIONS
preserve
clear
input float(obs x y rconnect)
1  .  . 0
2 -1  0 0
3 -1  1 0
4  1  1 0
5  1 -1 0
6 -1 -1 0
7 -1  0 0
1  .  . 1
2  1  0 1
3  1 -1 1
4 -1 -1 1
5 -1  1 1
6  1  1 1
7  1  0 1
end
tempfile poly
save "`poly'"

* group label by whether they connect to the middle-right or not
use "xy_dbase_locations.dta", clear
keep if !mi(rconnect)
keep if lgroup == 1
sort rconnect ADM2_EN
keep _ID ADM2_EN x_label y_label rconnect
list , sepby(rconnect)

* form all pairwise combinations with the polygon points
joinby rconnect using "`poly'"
isid rconnect ADM2_EN obs, sort

* scale using 30 km for each unit segment
gen x_sq = x_label + cond(rconnect,130,-130) + 30 * x
gen y_sq = y_label + 30 * y
save "squares.dta", replace

* combine with the composite map's coordinates
keep _ID x_sq y_sq
rename (x_sq y_sq) (_CX _CY)
keep _ID _CX _CY
append using "./master/merged_v1"
sort _ID, stable
save "xy_coor_with_squares.dta", replace

restore

* get the state's anchor location
use "xy_dbase_locations.dta", clear
keep if !mi(rconnect)
keep if lgroup == 1
sort ADM2_EN
keep _ID ADM2_EN x_anchor y_anchor rconnect
list , sepby(rconnect)
tempfile anchors
save "`anchors'"

* each polyline contains at least 3 obs, starting with a missing value.
* the second obs constains the location of the square connect point
use _ID ADM2_EN x_sq y_sq obs if obs <= 3 using "squares.dta", clear
merge m:1 ADM2_EN using `"`anchors'"', assert(match) nogen
isid ADM2_EN obs, sort

* move the label location as the final point in the polyline
gen _X = cond(obs == 3, x_anchor, x_sq)
gen _Y = cond(obs == 3, y_anchor, y_sq)

drop ADM2_EN

save "connectors.dta", replace


use "merged_v1", clear


    
* Test RUN

    #delimit ;
    
    grmap tot_complaints_dist,  
    title("Total Complaints Per District", size(*1) color(gs4))
    caption("source: KPK Administrative Data", size(*0.5) color(gs10))
    fcolor("230 250 250" "230 200 200" "230 150 150" "230 100 100" "230 50 50" "230 25 25" gs8)
    ocolor(white ..)
    osize(medthin ..)
    legenda(on)
    legstyle(3)
    legend(ring(0) position(4))
    plotregion(icolor(white))
    graphregion(icolor(white))
    legtitle("Complaint Totals")
    clbreaks(0 1001 2001 3001 5001 10001 15001 40000)
    clmethod(custom)
    
    legend(label(2 "0 to 1,000") label(3 "1,001 to 2,000")  label(4 "2,001 to 3,000")  
    label(5 "3,001 to 5,000") label(6 "5,001 to 10,000" ) label(7 "10,001 to 15,000") label(8 "35000+"))
    
    label(data("xy_dbase_locations.dta") xcoord(x_label) ycoord(y_label) by(lgroup) label(s)
    color(gs3) size(*0.55 ..) pos(0 6))
    line( data("./connectors.dta") )
    
    ;
    
    #delimit cr

Click image for larger version

Name: Sample.png
Views: 1
Size: 78.9 KB
ID: 1574321

Comment

Fahad Mirza

Join Date: Sep 2018

Posts: 241
#12

03 Oct 2020, 13:29

Anyone that may be able to help?
Comment
Nick Barton

Join Date: Apr 2015

Posts: 19
#13

23 Sep 2024, 13:40

While it is much too late for you Fahad, maybe it will help someone else trying to use this thread to help them later. I think your biggest problem is in the code:

Code:

* scale using 30 km for each unit segment gen x_sq = x_label + cond(rconnect,130,-130) + 30 * x gen y_sq = y_label + 30 * y save "squares.dta", replace

The original example switched to planar coordinates allowing for everything to be measured in km. You are using a coordinate system measured in degrees.
In my case when working with degrees for a map of Sindh, I found that multiplying x and y by 0.1 each worked nicely to get a 0.2 x 0.2 degree square
Comment

Announcement