Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • New on SSC: Match geographic locations to shapefile polygons using -geoinpoly-

    Thanks to Kit Baum, geoinpoly is now available from SSC.

    geoinpoly finds, for geographic locations in memory, polygons from a shapefile that spatially overlay the points. This is a point-in-polygon operation, also known as a spatial join in GIS (geographic information system) terminology.

    geoinpoly uses a ray casting algorithm to determine if a point falls inside a polygon. The point's longitude determines the ray used (i.e. meridian or line of longitude that passes by the point's location). geoinpoly assumes that polygon segments represent lines of constant bearing (also referred to as rhumb lines or loxodromes). To precisely calculate the latitude at which a polygon segment intersects the meridian, geoinpoly transforms all geographic coordinates to (x-coor, y-coor) using an ellipsoid Mercator projection. This projection has the property of representing lines of constant bearing as straight lines on the projected plane.

    geoinpoly results should match exactly those produced by GIS software. A test dataset of 573,100 points spread over the state of Texas and its neighbors was matched to U.S. states using a U.S. Census Cartographic Boundary shapefile of all U.S. states using both geoinpoly and ArcGIS. The results were identical.

    geoinpoly incorporates the same divide and conquer strategy used in geonear (from SSC) to significantly reduce the number of polygon segments that need to be counted. On my computer, geoinpoly takes about 30 seconds to process the 573K test dataset; using ArcGIS to perform the equivalent spatial join took 3 1/2 minutes (not really a fair performance test because I ran the ArcGIS test on a remote desktop server).

    To install geoinpoly, type in Stata's command window

    Code:
    ssc install geoinpoly

  • #2
    Have you been able to test this against performance of PostGIS/Postgres?

    Comment


    • #3
      Sorry, I do not use PostGIS/Postgres.

      Comment


      • #4
        Thanks so much, Robert. Is there a .dta file that allows lat/lon to be matched with census tracts?

        Comment


        • #5
          The main page for the U.S. Census TIGER/Line Shapefiles is at

          https://www.census.gov/geo/maps-data...iger-line.html

          For example, you can get the Michigan Census Tract Shapefile for 2010 by
          • select the 2010 tab;
          • click on the Download section;
          • select the Web Interface;
          • select "Census Tracts" as layer type;
          • select Michigan in the Census Tract (2010) pull-down menu;
          • select "All counties in one state-base file";
          • click download
          After you unzip the downloaded file, you use shp2dta (from SSC) to convert the shapefile to Stata datasets. Here's a repeat of the first example of geoinpoly's help file using census tracts. Note that since it takes far more segments to describe all the polygons for all tracts, it will take more time for geoinpoly to perform the statial join.

          Code:
          * Convert the shapefile to Stata datasets
          shp2dta using "tl_2010_26_tract10/tl_2010_26_tract10.shp", ///
              genid(_ID) data("MI_data.dta") coor("MI_coor.dta") replace
          
          * Generate some points at random locations over the Midwest
          set seed 42134123
          clear
          set obs 1000
          gen double _Y = 40 + 8.5 * runiform()
          gen double _X = -90 + 9 * runiform()
          
          * Match the points to census tract polygons
          geoinpoly _Y _X using "MI_coor.dta"
          
          * merge the matched polygons with the database and get attributes
          merge m:1 _ID using "MI_data.dta", keep(master match) nogen
          
          
          * Make a map
          
          rename _ID _ID0
          append using  "MI_coor.dta"
              
          // use the standard projection
          geo2xy _Y _X, gen(lat lon) proj(albers)    
                  
          scatter lat lon if mi(_ID0) & mi(_ID), msymbol(point) mcolor(gs14) || ///
          scatter lat lon if !mi(_ID0), msymbol(point) mcolor(blue) || ///
          line lat lon if !mi(_ID) , lwidth(vthin) lcolor(gray) cmissing(n)  ///
              ylabel(minmax) yscale(off) ///
              xlabel(minmax) xscale(off) ///
              aspectratio(`=r(aspect)') legend(off)

          Comment


          • #6
            Almost a year later, I got to this and your instructions were absolutely spot-on, Robert. Thanks a bunch!

            Comment


            • #7
              Also, installed a couple of minutes ago. Great achievement. Thank you, Robert.
              Best regards,

              Marcos

              Comment


              • #8
                Hi all,
                I successfully used geoinpoly once, and now that I'm using it with some different data, I'm getting this error message: "Results differ when starting from opposite poles. Problem occurs with _ID == 2681, 2688." Any thoughts? Is the problem with the coordinates file or with my data? Is there a way to force the program to create all the _ID values except for 2681 and 2688?
                Thanks so much,
                Brenden

                Comment


                • #9
                  I guess this could happen if the shapefile includes in polygons that are not closed (e.g. the shapefile includes lines or polylines) or if a polygon circles the poles. It might be something else that I overlooked but that's hard to tell without seeing the shapefile.

                  Can you send me by email ([email protected]) the shapefile and a small set of points that trigger the error?

                  In the mean time, you can create a coordinates file that omits the offending polygons using something like

                  Code:
                  use "mycoor.dta"
                  drop if inlist(_ID,2681,2688)
                  save "mycoor2.dta"
                  and try again with the reduced coordinates dataset. Of course if the problem is with the shapefile, it's possible that the problem will reoccur, even with the two _IDs removed.

                  Comment


                  • #10
                    The error was fixed by updating my shp2dta and geoinpoly programs using the command: adoupdate, update

                    Comment

                    Working...
                    X