Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Converting geo coordinates to FIPS codes

    Hello!

    My goal is to convert a (large) number of latitudes and longitudes to Federal Information Processing Standard (FIPS) codes. A convenient way to do it is to use the FCC API (https://www.fcc.gov/general/census-b...onversions-api), however its processing rate is relatively slow (e.g., it takes about 10 minutes to process 1000 requests). Considering I have 20 million rows of coordinates to be converted, it will take forever with an API. Therefore, I am elaborating on another approach for which I am seeking your help, dear forum members.

    Firstly, I have a data set #1 (N[1] = about 20 million rows) that contains pairs of geo coordinates to be converted to FIPS codes. The data looks like this:

    Code:
    latitude          longitude
    -37.810138702    144.9573822
    6.0547738075    116.14962006
    50.792110443    -1.0856829882
    40.692913055    29.619443893
    -21.339885712    25.356695175
    33.99703598    -84.705795288
    51.279109955    1.0835139751
    51.55103302    -.07526999712
    53.502655029    -1.1799679995
    6.7345790863    3.4259450436
    ...
    And secondly, I have a data set #2 (N[2] = 3221 rows) that contains FIPS codes with respective geo coordinates. The data looks like this:

    Code:
    FIPS    latitude           longitude
    01001    32.5003890991210938    -86.4941635131835938
    01003    30.5489234924316406    -87.7623825073242188
    01005    31.8440361022949219    -85.3100357055664063
    01007    33.0309219360351563    -87.1276626586914063
    01009    33.9552421569824219    -86.59149169921875
    01011    32.1163253784179688    -85.7011947631835938
    01013    31.7735385894775391    -86.6535491943359375
    01015    33.7254600524902344    -85.8194427490234375
    01017    32.8604393005371094    -85.2664718627929688
    01019    34.1793327331542969    -85.6291961669921875
    ...
    To convert the coordinates, I assume it is possible for the code to (1) grab a pair of coordinates from data set #1, (2) compare it with a list of coordinates in data set #2, and (3) if there is a match (say, to the 5th digit after decimal point), record the corresponding FIPS value to data set #1 (or a missing value if there is no match).

    Your help with this task would be greatly appreciated, as my programming skill set is limited.

  • #2
    Anton,

    I'm not sure that there is a foolproof way to do this with what you have. I would not want to assume that rounding to the 5th decimal point is always going to give you the right answer. There is significant variability in the size of US counties that would call that method into question.

    You could also calculate the distance between the coordinates in question and each set of coordinates in the lookup database using the geodist command from SSC. However, that might misclassify locations that are near the borders of the county and which happen to be closer to the center of a different county.

    If you had a shape database for each FIPS code (i.e., the coordinates of each vertex for each county) you could probably be more precise using methods such as those described by Tim Brophy at the 2013 Stata Conference (http://www.stata.com/meeting/new-orleans13/abstracts/). He described a command called gpsmap, but I don't know if it is available. However, the PowerPoint might be helpful for developing a method or perhaps he can provide more information.

    Sorry I can't be more helpful.

    Regards,
    Joe

    Comment


    • #3
      You can use geoinpoly (from SSC) to precisely match each point to a polygon from a shape file.

      Comment


      • #4
        I was boarding a plane when I pointed you to geoinpoly. Now that I have some time, I think it would be nice to show how to use geoinpoly to match a set of points (lat/lon) to state and county FIPS codes. First, you need to get a shapefile for the US Counties. The following uses the shapefile found on this Census page. The direct link to the shapefile is:
        Code:
        http://www2.census.gov/geo/tiger/GENZ2016/shp/cb_2016_us_county_500k.zip
        You unzip the archive and use shp2dta (from SSC) to convert the shapefile to Stata datasets:
        Code:
        shp2dta using "cb_2016_us_county_500k/cb_2016_us_county_500k.shp", ///
               data("cb_data.dta") coor("cb_coor.dta") genid(cid) gencentroids(cent) replace
        The following generates 15,000 points over the midwest and matches them to US counties. Some of the points will fall into Canada or the Great Lakes.
        Code:
        * generate points over the midwest states
        set seed 42134123
        clear
        set obs 15000
        gen long pointid = _n
        gen double _Y = 40 + 8.5 * runiform()
        gen double _X = -89 + 9 * runiform()
        
        * match each point to county polygons and save the results
        geoinpoly _Y _X using "cb_coor.dta"
        
        * get the matching FIPS code for state and county and save results
        rename _ID cid
        merge m:1 cid using "cb_data.dta", keep(master match) keepusing(STATEFP COUNTYFP) nogen
        save "points2fips.dta", replace
        geoinpoly is very very fast. On my computer, if you redo the example above with 1 million points, geoinpoly finds the matching counties in about 20 seconds.

        To make things more interesting, I now do some dataset gymnastics to make a list of all the states with points and create a map of all the counties in those states. Points that fall in the water or in Canada are colored red. Points with odd county identifiers are colored blue and those from even numbered counties are colored green. I use geo2xy (from SSC) to convert all coordinates to planar using a Google Maps projection.
        Code:
        * get the coordinates of the counties to map
        use "cb_coor.dta", clear
        gen long obs = _n
        rename _ID cid
        merge m:1 cid using "counties_to_map.dta", keep(match) keepusing(cid) nogen
        sort obs
        count
        
        * append the location of the points to map
        append using "points2fips.dta"
        
        * convert lat/lon to planar coordinates; to install, type: ssc install geo2xy
        geo2xy _Y _X, gen(ycoor xcoor)
        
        * the projection details, calculate graph height
        return list
        local yheight = 6 * `r(aspect)'
        
        * make a pretty map, color points from odd numbered polygons differently
        gen odd = mod(cid,2)
        scatter ycoor xcoor if !mi(pointid) & mi(cid), msymbol(point) mcolor(red) || ///
            scatter ycoor xcoor if !mi(pointid) & odd==1, msymbol(point) mcolor(blue) || ///
            scatter ycoor xcoor if !mi(pointid) & odd==0, msymbol(point) mcolor(green) || ///
            line ycoor xcoor if mi(pointid) , lwidth(thin) lcolor(gray) cmissing(n)  ///
                xsize(6) ysize(`yheight') ///
                ylabel(minmax, nogrid) yscale(off) ///
                xlabel(minmax, nogrid) xscale(off) ///
                plotregion(margin(small)) graphregion(margin(small)) ///
                legend(off)
        
        graph export "points2fips.png", width(1200) replace
        and the map produced:
        Click image for larger version

Name:	points2fips.png
Views:	1
Size:	1.53 MB
ID:	1399847

        Comment


        • #5
          Dear Robert,

          I ma very thankful for your comprehensive reply, sir. The task has been successfully accomplished It took about 6-7 minutes to process all 20 million pairs of lats/lons and obtain the FIPS codes. I will provide another updated once do the map plotting.

          Sincerely,
          Anton

          Comment


          • #6
            Hello,

            The above case describes geo points to fips matching in midwester states. I am interested in mapping of geo points to tract for the US.

            The code above creates _Y and _X as following:

            generate points over the midwest states

            set seed 42134123
            clear
            set obs 15000
            gen long pointid = _n
            gen double _Y = 40 + 8.5 * runiform()
            gen double _X = -89 + 9 * runiform()

            My question is whether the above code will also change?

            Best wishes,
            Mg

            Comment


            • #7
              Sorted.

              Comment


              • #8
                Robert Picard
                Robert Picard

                This is a great post. Thank you. By the way, it appears the census data
                (http://www2.census.gov/geo/tiger/GEN...ounty_500k.zip) is based on 2016 county definitions. But there have been several changes in county borders or renaming.

                Do you know where I can obtain analogous file at 1990 county border?

                At this link (https://www2.census.gov/geo/tiger/), I could find "GENZ" goes back to 2010, but not before 2010.

                This (https://www2.census.gov/geo/maps/blk1990/) and this (https://www2.census.gov/geo/maps/trt1990/) have something about 1990, but the structure of data seem to be different?

                Comment

                Working...
                X