Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • New geoplot command to draw maps available from SSC

    Over the last month or so I developed a new command to draw maps. The command relies on frames and thus requires Stata 17 or newer. To install the command, type

    Code:
    ssc install geoplot, replace
    ssc install palettes, replace  // if not already installed
    ssc install colrspace, replace // if not already installed
    ssc install moremata, replace  // if not already installed
    Basic functionality of geoplot is similar to the one offered by the well-known and much appreciated spmap command by Maurizio Pisati, but geoplot has more streamlined syntax (i.e. more similar to usual graph syntax in Stata), is more flexible in terms of how layers of objects can be combined/arranged, and should also be easier to expand.

    The geoplot package contains two commands, geoframe and geoplot. The general procedure is to first use geoframe to prepare one or several frames containing the source data (e.g. shape files translated by official Stata's spshape2dta command) and then use these frames in geoplot to draw a map. Multiple layers of elements such as regions, borders, lakes, roads, labels, and symbols can be freely combined and the look of the elements (e.g. their color) can be varied depending on the values of variables. See https://github.com/benjann/geoplot/ for a few examples.

    I thank Asjad Naqvi for many valuable comments during the development of this command.

    ben

  • #2
    An update to geoplot is available from SSC. To install the update, type

    Code:
    . ssc install geoplot, replace
    or use the adoupdate command.

    Main additions, apart from minor fixes and refinements, are:
    • geoplot has a new zoom() option to increase the size of selected objects; the option can be repeated to create multiple zooms (see examples in this post)
    • new command geoframe bbox can be used to compute (minimum-area) bounding boxes, convex hulls, and minimum enclosing circles
    • various Mata functions used by geoplot and geoframe are now exposed in Mata library lgeoplot.mlib so that they can be used directly by programmers; see help lgeoplot_source after updating
    Asjad Naqvi compiled an awesome post on geoplot at https://medium.com/the-stata-guide/m...t-a764cf42688a. Check it out.

    ben

    Comment


    • #3
      Dear Ben,

      Thank you very much for this new and very useful 'geoplot' command. I also went through the great post by Asjad Naqvi.

      Yet, I am still wondering whether and how it is possible to add lines or tiles on specific administrative areas on a map.
      In particular, the subnational areas on my map (administrative level 2) have different characteristics and this is highlighted by different color codes. In addition to the colors, I want to distinguish some areas (the same admin level 2) based on a different set of characteristics. I can do that with points, by adding circles or diamonds on the map, but it becomes quite messy (as administrative areas are small). I am attaching an example of the desired map. Any chance that it is possible to add lines, tiles or other distinctive geometric features (in addition to the colors) with geoplot?

      Thank you very much!
      Ersilia
      Attached Files
      Over the last month or so I developed a new command to draw maps. The command relies on frames and thus requires Stata 17 or newer. To install the command,

      Comment


      • #4
        Ben Jann first of all, thank you so much for a brilliant command!

        I am wondering if there is a bug in geoplot.

        Specifically, when I have missing data, and I use the clegend() option, the legend for the missing data disappears.

        To reproduce, using your example here, if I modify as follows:


        Code:
        frame change regions
        gen fortell_miss = fortell in 6/l
        and then run

        Code:
        geoplot (area regions fortell_miss) (line regions), ///
            clegend(pos(3)) zlabel(#3)
        There is no legend for missing data. However, if I simply run
        Code:
        geoplot (area regions fortell_miss) (line regions)
        it shows up fine. Am I missing something?


        The errant plot is below:
        Click image for larger version

Name:	Screenshot 2023-07-27 at 1.15.13 PM.png
Views:	1
Size:	382.1 KB
ID:	1721982
        Last edited by Hemanshu Kumar; 27 Jul 2023, 01:46.

        Comment


        • #5
          Ah, I figured out the mistake I was making in #4 -- at least partly. As you mention on your site, as of May 21, 2023 the following behaviour is enforced:
          Code:
          - clegend() no longer include missing by default; specify suboption -missing-
          to include missing
          By including the missing sub option in clegend, I can get the colour for the missing value to show up in the legend. I still have not been able to figure out how to label it, though. The help file suggests using the missing() option for this, but I cannot get that to work, as of now.

          Comment


          • #6
            Hi Hemanshu, clegend() uses a continuous axis to create the legend which is somewhat at odds with displaying a category for missing. geoplot solves this by adding an extra bin at the bottom of the scale and labels it as "no data". Here is how this looks like by default:

            Code:
            geoplot (area regions fortell_miss) (line regions), clegend(missing)
            Click image for larger version

Name:	Graph.png
Views:	1
Size:	65.6 KB
ID:	1722706

            You can change the default label as follows:

            Code:
            geoplot (area regions fortell_miss, missing(label("N.A."))) (line regions), ///
                clegend(missing)
            Click image for larger version

Name:	Graph.png
Views:	1
Size:	65.3 KB
ID:	1722707

            If you use option zlabel() to modify the labels, you have to make sure to also set the label for missing. Example:

            Code:
            geoplot (area regions fortell_miss) (line regions), ///
                clegend(missing) zlabel(4 "no data" 8 17 26)
            Click image for larger version

Name:	Graph.png
Views:	1
Size:	61.1 KB
ID:	1722708

            However, a better approach to get nice labels probably is to set the color cuts to nice values in the first place:

            Code:
            geoplot (area regions fortell_miss, cuts(7(4)27)) ///
                (line regions),clegend(missing)
            Click image for larger version

Name:	Graph.png
Views:	1
Size:	61.8 KB
ID:	1722709

            ben

            Comment


            • #7
              Hi Ersilia, you can add titles using for the regions using layertype label. There is an example at https://github.com/benjann/geoplot/.

              Adding patterns to areas as in the picture you posted is currently not supported. You would have to compute the coordinates of the lines (which is complicated) and then add them to the graph using layertype line.

              Adding a pattern is easier (but still somewhat involved) if the pattern is composed of points rather than lines. The idea is to first generate data containing a regular grid covering the whole map and then do a spacial join to merge grid points and regions. Here is an example:

              Code:
              // get data
              local url shp/it/
              geoframe create regions  `url'Italy-RegionsData.dta, id(id) coord(xcoord ycoord) ///
                  shpfile(`url'Italy-RegionsCoordinates.dta)
              
              // create a frame containing grid points
              frame regions_shp: su _X _Y
              frame change default
              gen _X = .
              gen _Y = .
              mata:
                  step = 10000
                  xmin = 310000
                  xmax = 1320000
                  ymin = 4070000
                  ymax = 5230000
                  X = range(xmin,xmax,step)
                  Y = range(ymin,ymax,step)
                  XY = J(0,2,.)
                  for (i=1;i<=rows(X);i++) XY = XY \ (J(rows(Y),1,X[i]), Y)
                  st_addobs(rows(XY))
                  st_store(., ("_X","_Y"), XY)
              end
              
              // merge grid points and regions (i.e. add the relevant region ID to each
              // grid point; the ID will be missing for grid points that are not inside
              // of any of the regions)
              geoframe spjoin regions
              geoplot (area regions fortell) (point default, coord(_X _Y) ms(p))
              Click image for larger version

Name:	g1.png
Views:	1
Size:	203.5 KB
ID:	1722713


              Code:
              geoplot (area regions fortell) (point default if _ID<., coord(_X _Y) ms(p))
              Click image for larger version

Name:	g2.png
Views:	1
Size:	108.2 KB
ID:	1722714


              Code:
              // add more information to the grid dataset and plot grid depending on this
              // information
              geoframe copy regions zone
              geoplot (area regions fortell) ///
                  (point default if zone==1, coord(_X _Y) ms(p) color(red)) ///
                  (point default if zone==3, coord(_X _Y) ms(p) color(blue))
              Click image for larger version

Name:	g3.png
Views:	1
Size:	100.4 KB
ID:	1722715


              ben

              Comment


              • #8
                An update to geoplot is now available from SSC (various smaller and larger changes; see changes since 02jul2023 at https://github.com/benjann/geoplot/). Type
                Code:
                . ssc install geoplot, replace
                to install the update.

                Slides from my UK Stata Conference presentation on geoplot are available from https://ideas.repec.org/p/boc/lsug23/23.html.

                ben

                Comment


                • #9
                  A change that might be relevant for some is that Stata 16 is now also supported, not only Stata 17 and Stata 18.

                  Comment


                  • #10
                    Hi everyone,

                    I obtain the following error in doing geoplot:

                    Code:
                    . geoplot ///
                    >         (area Spain_provinc obs_by_zipcode_solar, color(sfso violet, reverse) cuts(0 50 100 200 300 400 500 1500 2500 4500)) ///
                    >         (line Spain_auton, lwidth(0.4)) ///
                    >         (line Spain_provinc, lwidth(0.1)) ///
                    >         (label Spain_provinc NAMEUNIT, size(tiny) color(black)), tight legend(pos(9))
                    coordinates not found; declare coordinates using geoframe
                    (error in layer 1: area ...)
                    r(111);
                    
                    end of do-file
                    
                    r(111);
                    I don't find my error. Could anyone help me please?
                    Here is the all bunch of code trying to produce a geoplot:

                    Code:
                    // global path***
                    global path_geoloc "C:/Users/miduarte/Desktop/test_HolaLuz_Data/stata/0_rawdata/stata_raw_data/dofiles/02112023_EEL_ERG_Meeting/lineas_limite/SHP_ETRS89"
                    
                    global export_contratos_data "C:/Users/miduarte/Desktop/test_HolaLuz_Data/stata/0_rawdata/stata_raw_data"
                    
                    global EEL_ERG_02112023 "C:/Users/miduarte/Desktop/test_HolaLuz_Data/stata/0_rawdata/stata_raw_data/dofiles/02112023_EEL_ERG_Meeting"
                    
                    ***
                    //relative path
                    cd ${export_contratos_data}
                    
                    use export_contratos.dta, clear
                    
                    preserve
                    
                    gen solar_dummy = !missing(fecha_operativa_pv)
                    keep if solar_dummy == 1
                    drop if tariff_ekon_id == "30" | tariff_ekon_id == "30TD"
                    
                    tempfile solar_panel_info
                    save `solar_panel_info', replace
                    
                    restore
                    
                    
                    clear all
                    use `solar_panel_info', clear
                    tostring sp_zipcode, gen(my_newzip) format(%07.0f)
                    gen sp_zipcode_twodigits = substr(my_newzip,3, 2)
                    replace sp_zipcode_twodigits = substr(my_newzip,3, 2)
                    destring sp_zipcode_twodigits, replace
                    
                    *egen #observations, by zipcode
                    egen tag_zip = tag(id sp_zipcode_twodigits)
                    egen obs_by_zipcode_solar = total(tag_zip), missing by(sp_zipcode_twodigits)
                    drop tag_zip
                    
                    
                    collapse obs_by_zipcode_solar, by(sp_zipcode_twodigits)
                    sort sp_zipcode_twodigits
                    
                    
                    rename sp_zipcode_twodigits _ID
                    
                    
                    
                    // Be careful with _ID, as they can diverge between datasets! Check at it if necessary!
                    export excel using ${EEL_ERG_02112023}/data/solar_panel_info_collapse.xlsx, firstrow(variables) replace
                    
                    
                    
                    
                    ***
                    cd ${path_geoloc}
                    
                    
                    spshape2dta "./recintos_autonomicas_inspire_peninbal_etrs89/recintos_autonomicas_inspire_peninbal_etrs89.shp", replace saving(Spain_auton)
                    spshape2dta "./recintos_provinciales_inspire_peninbal_etrs89/recintos_provinciales_inspire_peninbal_etrs89.shp", replace saving(Spain_provinc)
                    
                    
                    *1)Creation of Provincias limits
                    geoframe create Spain_provinc, replace
                    geoframe describe Spain_provinc    
                    
                    *2) Merge data to Provincias, to be filled (do that only if data_to_plot does not exist beforehand. Otherwise, go from 1) to 3).
                    cd ${EEL_ERG_02112023}/data
                    import excel using solar_panel_info_collapse, firstrow clear
                    save ${path_geoloc}/solar_panel_info_collapse.dta, replace
                    
                    
                    *3)Some data cleaning, if needed
                    geoframe describe Spain_provinc
                    merge 1:1 _ID using solar_panel_info_collapse.dta, nogenerate
                    drop in -1/l // erase "Territorio..."
                    save Spain_provinc, replace
                    
                    cd ${path_geoloc}
                    *4)Creation of CCAA limits
                    geoframe create Spain_auton
                    geoframe describe Spain_auton_shp
                    
                    *5)Creation of solar_panel_info_collapse connexion
                    geoframe create solar_panel_info_collapse, replace
                    geoframe describe solar_panel_info_collapse
                    
                    
                    geoplot ///
                        (area Spain_provinc obs_by_zipcode_solar, color(sfso violet, reverse) cuts(0 50 100 200 300 400 500 1500 2500 4500)) ///
                        (line Spain_auton, lwidth(0.4)) ///
                        (line Spain_provinc, lwidth(0.1)) ///
                        (label Spain_provinc NAMEUNIT, size(tiny) color(black)), tight legend(pos(9))
                    
                    
                    cd ${EEL_ERG_02112023}/figures
                    
                    graph export map_obs_by_zipcodes_solarp_geoplot.png, replace
                    graph export map_obs_by_zipcodes_solarp_geoplot.pdf, replace
                    Thank you.

                    All the best,
                    Michael

                    Comment


                    • #11
                      Dear Prof. Ben Jann and others,

                      I want to replicate Fig. 2 in the below paper, where the paper's authors used QGIS to produce it. They provided data to replicate the figure in supporting information Section (S1 data and S2 data). I would greatly appreciate if any could help me out.

                      Paper: https://journals.plos.org/globalpubl...0000463#sec017

                      Thank you.

                      Comment


                      • #12
                        Hi Michael, from the error message it seems that geoplot cannot find the coordinate variables in frame Spain_provinc. This is surprising because it seems to you created Spain_provinc using spshape2dta. Without seeing the data I cannot say what is going wrong. Would it be possible for you to share the data? ben

                        Comment


                        • #13
                          Originally posted by Ben Jann View Post
                          Hi Michael, from the error message it seems that geoplot cannot find the coordinate variables in frame Spain_provinc. This is surprising because it seems to you created Spain_provinc using spshape2dta. Without seeing the data I cannot say what is going wrong. Would it be possible for you to share the data? ben
                          Dear Prof. Ben Jann,

                          Could you please take a look at my request in #11? I would appreciate your solutions if any. Thank you.

                          Comment


                          • #14
                            What is going wrong in #10, I believe, is that you accidentally load other data into frame Spain_provinc. Here is a copy of the relevant section of your code:

                            Code:
                            ...
                            *1)Creation of Provincias limits
                            geoframe create Spain_provinc, replace
                            geoframe describe Spain_provinc    
                            
                            *2) Merge data to Provincias, to be filled (do that only if data_to_plot does not exist beforehand. Otherwise, go from 1) to 3).
                            cd ${EEL_ERG_02112023}/data
                            import excel using solar_panel_info_collapse, firstrow clear
                            save ${path_geoloc}/solar_panel_info_collapse.dta, replace
                            ...
                            You first load the province data into frame Spain_provinc using geoframe create, but later on command import excel overwrites the data, which creates a mess. This is because geoframe create switches focus to the created frame by default.

                            You should switch back to the default frame before loading new data, e.g. type

                            Code:
                            frame change default
                            before running import excel (or run inport excel before running geoframe create). An alternative is to add option nocurrent to geoframe create. This will prevent that geoframe create changes the current frame to the created frame.
                            ben

                            Comment


                            • #15
                              #11: The data you need is in the S2 Data excel file available from https://doi.org/10.1371/journal.pgph.0000463. One approach would be to get world map shape data from some alternative source and then merge the relevant variables from S2 Data into that data. The only challenge here is that you need to make sure that matching country identifiers are available in both sources.

                              However, S2 Data does contain a column providing the coordinates of the polygons for each country, so you could use this information rather than obtaining shape files from somewhere else. The column is called wkt_geom and contains data written by some QGIS export filter, I assume. If you look at the data you see that each cell starts with "MultiPolygon" and then continues with lists of coordinates, grouped by nested parentheses. I quickly wrote a parser to read this data and store it as a shape file in Stata format. The command is called qgis2dta and looks as follows (I did not invest time into making the command robust and efficient, but at least with the S2 Data it works without problem):

                              Code:
                              *! version 1.0.0  22nov2023  Ben Jann
                              
                              /*
                                  Syntax
                              
                                  qgis2dta name shpvar [, replace]
                              
                                  name:    name of the output file; two files will be stored, name.dta and name_shp.dta
                                  shpvar:  name of the variable containing the QGIS shapes
                                  replace: allow overwriting existing files
                              */
                              
                              program qgis2dta
                                  version 16
                                  // syntax
                                  gettoken fn 0 : 0, parse(" ,")
                                  syntax varname(string) [, replace ]
                                  local shpvar `varlist'
                                  // loop over obs and write shapes into temporary frame
                                  tempname frame
                                  frame create `frame' double _ID _X _Y
                                  forv i = 1/`=_N' {
                                      mata: qgis2dta("`frame'", `i', st_sdata(`i', "`shpvar'"))
                                  }
                                  // save attribute file
                                  preserve
                                  drop `shpvar'
                                  capt confirm new var _ID
                                  if _rc drop _ID
                                  qui gen byte _ID = .
                                  qui replace _ID = _n
                                  order _ID
                                  save `fn', `replace'
                                  // save shape file
                                  frame `frame' {
                                      qui compress _ID
                                      save `fn'_shp, `replace'
                                  }
                              end
                              
                              version 16
                              mata:
                              mata set matastrict on
                              
                              // main function to parse MultiPolygon and append to temporary frame
                              void qgis2dta(string scalar frame, real scalar id, string scalar s)
                              {
                                  real scalar   n0, r
                                  string scalar cframe
                                  real matrix   XY
                                  
                                  XY = _qgis2dta(s)
                                  r  = rows(XY)
                                  cframe = st_framecurrent()
                                  st_framecurrent(frame)
                                  n0 = st_nobs()
                                  st_addobs(r)
                                  st_store((n0+1,n0+r), tokens("_ID _X _Y"), (J(r,1,id), XY))
                                  st_framecurrent(cframe)
                              }
                              
                              // check whether is a MultiPolygon and launch recursive parser
                              real matrix _qgis2dta(string scalar s)
                              {
                                  real matrix  XY
                                  
                                  if (substr(s,1,13)!="MultiPolygon ") return(J(1,2,.)) // empty
                                  XY = J(0,2,.)
                                  _qgis2dta_multipoly(XY, substr(s,14,.))
                                  return(XY)
                              }
                              
                              // recursive parser for MultiPolygon
                              void _qgis2dta_multipoly(real matrix XY, string scalar s)
                              {
                                  transmorphic  t
                                  string scalar tok
                                  
                                  if (substr(s,1,1)!="(") {
                                      // reached lowest nesting level; expand the polygon and return
                                      XY = XY \ _qgis2dta_poly(s)
                                      return
                                  }
                                  // recursively collect polygons
                                  t = tokeninit(" ", ",", "()")
                                  tokenset(t, s)
                                  while ((tok = tokenget(t))!="") {
                                      if (tok==",") continue
                                      _qgis2dta_multipoly(XY, substr(tok, 2, strlen(tok)-2))
                                  }
                              }
                              
                              // expand a single polygon
                              real matrix _qgis2dta_poly(string scalar s)
                              {
                                  real scalar      i, n, j
                                  string rowvector S
                                  real rowvector   xy
                                  real matrix      XY
                                  
                                  S = tokens(s, ",")
                                  S = select(S, S:!=",")
                                  n = length(S)
                                  if (!n) return(J(0,2,.)) // empty polygon
                                  XY = J(n, 2, .)
                                  for (i=1;i<=n;i++) {
                                      xy = strtoreal(tokens(S[i]))
                                      j = length(xy)
                                      if (!j) continue
                                      if (j==2)      XY[i,]  = xy
                                      else if (j==1) XY[i,1] = xy        // too few elements
                                      else           XY[i,]  = xy[(1,2)] // too many elements
                                  }
                                  return((.,.) \ XY) // add leading row containing missing
                              }
                              
                              end
                              I also attached a corresponding ado-file to this message. Download the ado-file and store in your working directory. You can then convert the S2 Data to Stata format about as follows:

                              Code:
                              import excel journal.pgph.0000463.s003.xlsx, firstrow
                              keep wkt_geom SOVEREIGNT SOV_A3 PLHIVreceivingART PregnantwomenwhoreceivedARV HIVprevalence
                              qgis2dta HIV wkt_geom
                              This will create two files in the working directory, attribute file HIV.dta and shape file HIV_shp.dta. Note that I removed most of the data from S2 Data and only kept the columns relevant for our replication exercise (I also kept the country names and country codes for sake of completeness, but they will not be used below).

                              You are now all set to start working with the data and reproduce the maps. That is, load the data using geoframe create and then draw maps using geoplot. About as follows:

                              Code:
                              geoframe create HIV
                              geoplot ///
                                  (area HIV PLHIV, lcolor(black) ///
                                      cuts(0 6000 44300 4788139) color(spmap greens) ///
                                      label(@lb-@ub, reverse) ///
                                      missing(nolabel fcolor(none))) ///
                                  , tight legend(pos(sw) bmargin(b=35) colgap(0) layout(1 ///
                                      | - "(33%)" - "(66%)" - "(100%)" | - "1st" - "2nd" - "3rd"))
                              Click image for larger version

Name:	g1.png
Views:	1
Size:	287.2 KB
ID:	1734677


                              Code:
                              geoplot ///
                                  (area HIV Pregnant, lcolor(black) ///
                                      cuts(0 216 1904 247690.3) color(spmap blues) ///
                                      label(@lb-@ub, format(%9.0f) reverse) ///
                                      missing(nolabel fcolor(none))) ///
                                  , tight legend(pos(sw) bmargin(b=35) colgap(0) layout(1 ///
                                      | - "(33%)" - "(66%)" - "(100%)" | - "1st" - "2nd" - "3rd"))
                              Click image for larger version

Name:	g2.png
Views:	1
Size:	291.3 KB
ID:	1734678


                              Code:
                              geoplot ///
                                  (area HIV HIVprev, lcolor(black) ///
                                      cuts(0 0.3 0.8 27.3) color(spmap reds) ///
                                      label(@lb-@ub, format(%9.1f) reverse) ///
                                      missing(nolabel fcolor(none))) ///
                                  , tight legend(pos(sw) bmargin(b=35) colgap(0) layout(1 ///
                                      | - "(33%)" - "(66%)" - "(100%)" | - "1st" - "2nd" - "3rd"))
                              Click image for larger version

Name:	g3.png
Views:	1
Size:	288.7 KB
ID:	1734679


                              The colors are not exactly the same, I just picked some schemes that are somewhat similar. If needed, this could be improved, of course.
                              ben
                              Attached Files

                              Comment

                              Working...
                              X