Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating distances between two points

    Good Morning I am dealing with the following problem:

    I have latitude and longitude for a set of drowning events occurred in the Mediterranean Sea, I also have the data, as a shape file, of the coastal border of Libya Tunisia and Egypt, what I would like to do is: compute all the possible distances between every single drownig event and the coastal border of the countries stated above.
    For example I have that a boat (identified as event 4) capsized at coordinates: 32.6853649 (d_latitude) 25.9012421(d_longitude) I want then to compute all the pairwise distances from that point to each country border and then take the smllest of such distance.
    I need that because after that I am gonna say that the boat departed from Libya (if libya is the closest country), and I will build a variable called number of events by departure point.
    the dataset looks as follows:

    _ID longitude latitude d_latitude d_longitude Year event_id
    3 23.096664 32.575554 33.5784085 11.9031222 2016 1
    3 19.919441 31.757496 32.6853649 25.9012421 2016 2
    3 24.771942 30.303886 32.8708471 12.0631696 2016 3
    3 9.398333 26.153332 37.2912498 13.2358149 2016 4
    3 23.726387 32.174721 35.3626838 15.9465353 2016 5
    3 19.92083 31.727215 33.5259117 8.0491191 2016 6
    3 10.306389 31.713055 33.2286864 12.033724 2016 7
    3 24.646664 32.021385 33.1328706 12.2549685 2016 8


    where:
    • _ID identifies the Country LIbya=3 Tunisia=5 Egypt=2
    • latitude and longitude are instead the coordinates that define the Lybian border (have them for all the countries)
    • d_latitude d_longitude defines the coordinates of where the drowning event happened
    • year goes from 2016 to 2000
    • event_id: is the id of the drowning event
    Also please notice that I have 343 drowning events and many more data for latitude and longitude.
    Does anyone knows a way to do this ??
    I am using STATA 14


    Kind regards,
    Luca

  • #2
    Hello Lucas,

    Welcome to the Stata Forum.

    I gather this text (http://www.stata.com/statalist/archi.../msg00096.html) may interest you.

    Best,

    Marcos
    Best regards,

    Marcos

    Comment


    • #3
      For the benefit of others who would like to find the nearest border point, here's a short example using the sample points above. The first step is to locate a shapefile that describes the borders of the target countries. The following example uses the "World Borders Dataset" from http://thematicmapping.org. The download page and hard link to the shapefile are:

      Code:
      http://thematicmapping.org/downloads/world_borders.php
      http://thematicmapping.org/downloads/TM_WORLD_BORDERS-0.3.zip
      Once you have downloaded and unzipped the shapefile, you need to input the database and coordinates into Stata datasets using shp2dta (from SSC):
      Code:
      shp2dta using "TM_WORLD_BORDERS-0.3/TM_WORLD_BORDERS-0.3.shp", ///
          data("world_data.dta") coor("world_coor.dta") replace
      Since this is the Mediterranean Sea and you want to find the suspected country of origin and not necessarily the closest border from the drownings (e.g. Italy), you need to create a dataset of coordinates of countries of interest. In this case, you look up the database and find the _ID of these countries:

      Code:
      . use "world_data.dta", clear
      
      . list if inlist(_ID, 107, 201, 50)
      
           +-------------------------------------------------------------------------------------------------------------------+
           | FIPS   ISO2   ISO3    UN                     NAME     AREA    POP2005   REGION   SUBREG~N      LON      LAT   _ID |
           |-------------------------------------------------------------------------------------------------------------------|
       50. |   EG     EG    EGY   818                    Egypt    99545   72849793        2         15   29.872   26.494    50 |
      107. |   LY     LY    LBY   434   Libyan Arab Jamahiriya   175954    5918217        2         15   18.023   27.044   107 |
      201. |   TS     TN    TUN   788                  Tunisia    15536   10104685        2         15    9.596   35.383   201 |
           +-------------------------------------------------------------------------------------------------------------------+
      You then create coordinates for the borders of these countries using something like:
      Code:
      * Restrict to borders of the 3 specified countries
      use "world_coor.dta", clear
      keep if inlist(_ID, 107, 201, 50)
      
      * add an observation identifier
      gen long coor_id = _n
      save "borders.dta", replace
      With these coordinates in hand, to find the nearest border point:
      Code:
      clear 
      input _ID double(longitude latitude d_latitude d_longitude) Year event_id
      3 23.096664 32.575554 33.5784085 11.9031222 2016 1
      3 19.919441 31.757496 32.6853649 25.9012421 2016 2
      3 24.771942 30.303886 32.8708471 12.0631696 2016 3 
      3 9.398333 26.153332 37.2912498 13.2358149 2016 4
      3 23.726387 32.174721 35.3626838 15.9465353 2016 5
      3 19.92083 31.727215 33.5259117 8.0491191 2016 6
      3 10.306389 31.713055 33.2286864 12.033724 2016 7
      3 24.646664 32.021385 33.1328706 12.2549685 2016 8
      end
      
      geonear event_id d_latitude d_longitude using "borders.dta", n(coor_id _Y _X)
      
      * get the polygon of the nearest matching border point and its coordinates
      rename nid coor_id
      rename _ID someID
      merge m:1 coor_id using "borders.dta", keep(master match) nogen
      and the results:
      Code:
      . list Year event_id d_latitude d_longitude coor_id km_to_nid _ID _X _X
      
           +---------------------------------------------------------------------------------------------+
           | Year   event_id   d_latit~e   d_longi~e   coor_id   km_to_nid   _ID          _X          _X |
           |---------------------------------------------------------------------------------------------|
        1. | 2016          1   33.578409   11.903122      1196   55.405744   107   11.797222   11.797222 |
        2. | 2016          7   33.228686   12.033724      1200   23.031751   107   11.908054   11.908054 |
        3. | 2016          8   33.132871   12.254969      1203   26.053769   107   12.077776   12.077776 |
        4. | 2016          3   32.870847    12.06317      1203   9.0888422   107   12.077776   12.077776 |
        5. | 2016          5   35.362684   15.946535      1258   336.58259   107   14.200277   14.200277 |
           |---------------------------------------------------------------------------------------------|
        6. | 2016          2   32.685365   25.901242      1557   117.15758   107   25.029442   25.029442 |
        7. | 2016          4    37.29125   13.235815      2007   193.93444   201   11.067778   11.067778 |
        8. | 2016          6   33.525912   8.0491191      2364   32.309413   201    7.725027    7.725027 |
           +---------------------------------------------------------------------------------------------+

      Comment


      • #4
        Here's some more code that shows the location of the drowning points and the borders of the 3 target countries. The "results.dta" dataset contains the results from the code in #3. Note that geo2xy is from SSC.

        Code:
        * Append the results with the coordinates from the shapefile
        use "results.dta", clear
        append using "borders.dta"
        
        * convert coordinates using Google Maps projection
        gen double xlon = d_longitude
        replace xlon = _X if mi(d_latitude)
        gen double ylat = d_latitude
        replace ylat = _Y if mi(d_latitude)
        geo2xy ylat xlon, replace
        
        * determine the size of the graph using the aspect ratio that
        * is calculated by -geo2xy-
        ret list
        local xw = 6
        local yw = min(`xw' * `r(aspect)',20)
        
        * build options for a making a map, start by removing the legend
        local options legend(off)
        
        * spread the points along the full width of the plot
        local options `options' ylabel(minmax, nogrid) xlabel(minmax, nogrid)
        
        * do not display the axes
        local options `options' yscale(off) xscale(off)
        
        * add a small margin to the plot region so that points are not up to the side
        local options `options' plotregion(margin(small))
        
        * a small margin to the graph region to make a gray border
        local options `options' graphregion(margin(small) fcolor(gs15))
        
        * make a scatter plot of all drownings, use separate colors by nearest country
        scatter ylat xlon if !mi(d_latitude) & _ID == 50,  msymbol(x) mcolor(blue) || ///
        scatter ylat xlon if !mi(d_latitude) & _ID == 107,  msymbol(x) mcolor(red) || ///
        scatter ylat xlon if !mi(d_latitude) & _ID == 201,  msymbol(x) mcolor(green) || ///
        line ylat xlon if mi(lat) , lwidth(vthin) lcolor(gs10) cmissing(n) xsize(`xw') ysize(`yw') `options'
        
        graph export "drowning_map.png", replace width(1000)
        and the resulting map:
        Click image for larger version

Name:	drowning_map2.png
Views:	1
Size:	148.7 KB
ID:	1347066

        Last edited by Robert Picard; 27 Jun 2016, 09:17.

        Comment


        • #5
          Dear Robert and Marcos many thanks for the suggestions, I'll give it a try.

          Regards,

          Luca

          Comment


          • #6
            Luca, note, however, that following your proximity approach you formally need not the distance between points, but distance from a point to a polygon, which is not the same (see illustration):
            a is distance to the closest point of the Red country, b is distance to the closest point of the Blue country, but c is distance to the Red country.

            With the coastline it is probably not a problem, as you will have fine granulation of points along the borders, but in other cases it may matter.

            Best, Sergiy.



            Click image for larger version

Name:	distances.png
Views:	1
Size:	8.9 KB
ID:	1347075

            Comment


            • #7
              Sergiy raises a good point but most shapefiles keep the distance between the vertices quite small to avoid issues related to map projections. I think that my approach is a reasonable approximation of a distance to polygon problem. If you are worried about the distance between polygon vertices, I wrote a program to interpolate points with a maximum distance between each vertices. For example, the following forces each vertex to be at most 100 meters from the next vertex.

              Code:
              use "borders.dta", clear
              geopolyfill, max(.1)
              gen double nextX = _X[_n+1]
              gen double nextY = _Y[_n+1]
              geodist _Y _X nextY nextX, gen(d)
              sum d
              
              replace coor_id = _n
              save "borders_100m.dta", replace
              To sum results are:
              Code:
              . sum d
              
                  Variable |        Obs        Mean    Std. Dev.       Min        Max
              -------------+---------------------------------------------------------
                         d |    208,736    .0689984    .0140087   .0498653   .1001657
              Note that this is still a work in progress and no help file yet. Just copy the code below and save it in a file called "geopolyfill.ado", along the ado path (or in the current directory).
              Code:
              *! version 1.0.0  25nov2015 Robert Picard, [email protected]
              program define geopolyfill
              
                  version 9
              
                  syntax [varlist(numeric default=none)], ///
                      [                    ///
                      max(real 1)            /// in kilometers
                      Radius(real 6371)    /// in kilometers
                      ]
                  
                  // if no variables are provided, assume those generated by -shp2dta-
                  if "`varlist'" == "" local varlist _ID _Y _X
              
                  confirm numeric var `varlist'
                  
                  local n : word count `varlist'
                  if `n' != 3 {
                      dis as err "You must specify 3 variables that hold, in order:"
                      dis as err " 1. polygon identifier"
                      dis as err " 2. latitude "
                      dis as err " 3. longitude "
                      exit 198
                  }
                  
                  tokenize `varlist'
                  local poly `1'
                  local lat1 `2'
                  local lon1 `3'
                  
                  cap assert inrange(`lat1',-90, 90) if !mi(`lat1')
                  if _rc {
                      dis as err "The latitude variable should range from -90 to 90"
                      exit 9
                  }
                  cap assert inrange(`lon1',-180, 180) if !mi(`lon1')
                  if _rc {
                      dis as err "The latitude variable should range from -180 to 180"
                      exit 9
                  }
                  
                  // group polygon rings
                  tempvar ring
                  gen long `ring' = sum(mi(`lat1', `lon1'))
                  
                  // make the sort stable
                  tempvar order
                  gen long `order' = _n
                  sort `ring' `order'
                  
                  // ESRI format: minimum of 4 points (plus missing coordinates obs)
                  cap by `ring': assert _N >= 5
                  if _rc {
                      dis as err "Polygon rings must contain at least 4 points to form a closed loop"
                      exit 9
                  }
                  
                  // ESRI format: points must form a closed loop.
                  cap by `ring': assert `lon1'[2] == `lon1'[_N]
                  local rc = _rc
                  cap by `ring': assert `lat1'[2] == `lat1'[_N]
                  if `rc' | _rc {
                      dis as err "Polygon rings must form a closed loop; first and last vertex do not match"
                      exit 9
                  }
                  
                  tempname d2r
                  scalar `d2r' = c(pi) / 180
                  
                   tempvar lat2 lon2 d midpoint mlat mlon
                 
                  local more 1
                  qui while `more' {
                  
                      gen double `lat2' = `lat1'[_n+1]
                      gen double `lon2' = `lon1'[_n+1]
                      
                      // the following is equivalent of using -geodist- (from SSC)
                      // geodist `lat1' `lon1' `lat2' `lon2', gen(`d') sphere
                      qui gen double `d' =  2 * asin(min(1,sqrt( ///
                          sin((`lat2' - `lat1') * `d2r' / 2)^2 + ///
                          cos(`lat1' * `d2r') * cos(`lat2' * `d2r') * ///
                          sin((`lon2' - `lon1') * `d2r' / 2)^2))) ///
                          * `radius' if !mi(`lat1',`lon1',`lat2',`lon2')
                  
                      expand 2 if `d' > `max' & !mi(`d')
                      bysort `ring' `order': gen `midpoint' = _n == 2
                      
                      count if `midpoint'
                      local more = r(N)
                      if r(N) {
                      
                          midpoint `lat1' `lon1' `lat2' `lon2' `mlat' `mlon' `midpoint'
                          replace `lat1' = `mlat' if `midpoint'
                          replace `lon1' = `mlon' if `midpoint'
                          replace `order' = _n
                          drop `mlat' `mlon'  `midpoint' `d' `lat2' `lon2'
                          
                      }
                      
                  }
                  
                  sum `d', meanonly
                  dis as txt "longest segment = " as res r(max) " km"
                  
              end
              
              
              program midpoint
              
              /*
              Source: http://www.movable-type.co.uk/scripts/latlong.html
              Formula: loxodromic midpoint
              
              if (Math.abs(λ2-λ1) > Math.PI) λ1 += 2*Math.PI; // crossing anti-meridian
              
              var φ3 = (φ1+φ2)/2;
              var f1 = Math.tan(Math.PI/4 + φ1/2);
              var f2 = Math.tan(Math.PI/4 + φ2/2);
              var f3 = Math.tan(Math.PI/4 + φ3/2);
              var λ3 = ( (λ2-λ1)*Math.log(f3) + λ1*Math.log(f2) - λ2*Math.log(f1) ) / Math.log(f2/f1);
              
              if (!isFinite(λ3)) λ3 = (λ1+λ2)/2; // parallel of latitude
              
              λ3 = (λ3+3*Math.PI) % (2*Math.PI) - Math.PI;  // normalise to -180..+180°
              */
              
                  args lat1 lon1 lat2 lon2 mlat mlon touse
              
                  tempname d2r r2d
                  scalar `d2r' = _pi / 180
                  scalar `r2d' = 180 / _pi
              
                  // convert longitudes to radians
                  tempvar lon1r lon2r
                  gen double `lon1r' = `lon1' * `d2r' if `touse'
                  gen double `lon2r' = `lon2' * `d2r' if `touse'
                  
                  // if (Math.abs(λ2-λ1) > Math.PI) λ1 += 2*Math.PI;
                  replace `lon1r' = `lon1r' + 2 * _pi if abs(`lon2r' - `lon1r') > _pi & `touse'
              
                  // var φ3 = (φ1+φ2)/2;
                  gen double `mlat' = (`lat1' + `lat2') / 2 if `touse'
                  
                  tempvar f1 f2 f3
                  
                  // var f1 = Math.tan(Math.PI/4 + φ1/2);
                  gen double `f1' = tan(_pi/4 + `lat1' * `d2r' / 2) if `touse'
                  
                  // var f2 = Math.tan(Math.PI/4 + φ2/2);
                  gen double `f2' = tan(_pi/4 + `lat2' * `d2r' / 2) if `touse'
                  
                  // var f3 = Math.tan(Math.PI/4 + φ3/2);
                  gen double `f3' = tan(_pi/4 + `mlat' * `d2r' / 2) if `touse'
              
                  // var λ3 = ( (λ2-λ1)*Math.log(f3) + λ1*Math.log(f2) - λ2*Math.log(f1) ) / Math.log(f2/f1);
                  gen double `mlon' = ( (`lon2r' - `lon1r') * log(`f3') + ///
                      `lon1r' * log(`f2') - `lon2r' * log(`f1') ) / log(`f2' / `f1') if `touse'
                  
                  // if (!isFinite(λ3)) λ3 = (λ1+λ2)/2;
                  replace `mlon' = (`lon1r' + `lon2r') / 2 if mi(`mlon') & `touse'
              
                  // λ3 = (λ3+3*Math.PI) % (2*Math.PI) - Math.PI;
                  replace `mlon' = (mod(`mlon' + 3 * _pi, 2 * _pi) - _pi) * `r2d' if `touse'
              
              end
              If you redo the example in #4 using these new coordinates:
              Code:
              clear 
              input _ID double(longitude latitude d_latitude d_longitude) Year event_id
              3 23.096664 32.575554 33.5784085 11.9031222 2016 1
              3 19.919441 31.757496 32.6853649 25.9012421 2016 2
              3 24.771942 30.303886 32.8708471 12.0631696 2016 3 
              3 9.398333 26.153332 37.2912498 13.2358149 2016 4
              3 23.726387 32.174721 35.3626838 15.9465353 2016 5
              3 19.92083 31.727215 33.5259117 8.0491191 2016 6
              3 10.306389 31.713055 33.2286864 12.033724 2016 7
              3 24.646664 32.021385 33.1328706 12.2549685 2016 8
              end
              
              geonear event_id d_latitude d_longitude using "borders_100m.dta", n(coor_id _Y _X)
              
              * get the polygon of the nearest matching border point and its coordinates
              rename nid coor_id
              rename _ID someID
              merge m:1 coor_id using "borders_100m.dta", keep(master match) nogen
              
              list Year event_id d_latitude d_longitude coor_id km_to_nid _ID _X _X
              
              save "results_100m.dta", replace
              the results are:
              Code:
              . list Year event_id d_latitude d_longitude coor_id km_to_nid _ID _X _X
              
                   +---------------------------------------------------------------------------------------------+
                   | Year   event_id   d_latit~e   d_longi~e   coor_id   km_to_nid   _ID          _X          _X |
                   |---------------------------------------------------------------------------------------------|
                1. | 2016          1   33.578409   11.903122     94246   55.405744   107   11.797222   11.797222 |
                2. | 2016          7   33.228686   12.033724     94417   23.016115   107   11.900268   11.900268 |
                3. | 2016          8   33.132871   12.254969     94724   25.862621   107   12.106315   12.106315 |
                4. | 2016          3   32.870847    12.06317     94730   8.3213805   107    12.11082    12.11082 |
                5. | 2016          5   35.362684   15.946535     97870   336.58259   107   14.200277   14.200277 |
                   |---------------------------------------------------------------------------------------------|
                6. | 2016          2   32.685365   25.901242    118886   117.15758   107   25.029442   25.029442 |
                7. | 2016          4    37.29125   13.235815    177243    193.8101   201   11.082182   11.082182 |
                8. | 2016          6   33.525912   8.0491191    199512   32.309413   201    7.725027    7.725027 |
                   +---------------------------------------------------------------------------------------------+
              And to be sure, this is the map of the points and the 100m coordinates:
              Code:
              * Append the results with the coordinates from the shapefile
              use "results_100m.dta", clear
              append using "borders_100m.dta"
              
              * convert coordinates using Google Maps projection
              gen double xlon = d_longitude
              replace xlon = _X if mi(d_latitude)
              gen double ylat = d_latitude
              replace ylat = _Y if mi(d_latitude)
              geo2xy ylat xlon, replace
              
              * determine the size of the graph using the aspect ratio that
              * is calculated by -geo2xy-
              ret list
              local xw = 6
              local yw = min(`xw' * `r(aspect)',20)
              
              * build options for a making a map, start by removing the legend
              local options legend(off) 
              
              * spread the points along the full width of the plot
              local options `options' ylabel(minmax, nogrid) xlabel(minmax, nogrid)
              
              * do not display the axes
              local options `options' yscale(off) xscale(off)
              
              * add a small margin to the plot region so that points are not up to the side
              local options `options' plotregion(margin(small))
              
              * a small margin to the graph region to make a gray border
              local options `options' graphregion(margin(small) fcolor(gs15))
              
              * make a scatter plot of all drownings, use separate colors by nearest country
              scatter ylat xlon if !mi(d_latitude) & _ID == 50,  msymbol(x) mcolor(blue) || ///
              scatter ylat xlon if !mi(d_latitude) & _ID == 107,  msymbol(x) mcolor(red) || ///
              scatter ylat xlon if !mi(d_latitude) & _ID == 201,  msymbol(x) mcolor(green) || ///
              line ylat xlon if mi(lat) , lwidth(vthin) lcolor(gs10) cmissing(n) xsize(`xw') ysize(`yw') `options'
              
              graph export "drowning_map_100m.png", replace width(1000)
              Click image for larger version

Name:	drowning_map_100m.png
Views:	1
Size:	108.4 KB
ID:	1347086

              Comment


              • #8
                Thank you very much for the many suggestions.

                Regards Luca

                Comment


                • #9
                  Hello,
                  ​Which shapefile can I use that describes the distances to nearest main roads or shopping centres in a particular country (Kenya) the same way "World Borders Dataset" was used here.

                  I am using GIS data for Kenya and I would like to construct a variable that shows distances to the nearest main roads or shopping centres.

                  Comment

                  Working...
                  X