  • seeking help to calculate euclidean distances in a large dataset

    Hello! I have a large dataset derived from a questionnaire in which we asked several thousand people to name up to 5 places that they go to and we subsequently assigned lat/long coordinates to these places. Some participants only named 1 or 2 places and some named more. What I'd like to do is calculate each participant's geographic "range". This basically means, for each participant, calculating the distances of every combination of places they named and then being able to identify the largest of those distances for every person. (For some the distance will be 0, if they only named 1 place.) I can imagine that there are several different ways to do this and I've made a few attempts and learned a few things over the last few hours, but I haven't really found the solution yet. Any tips or example code would be very much appreciated!
    -Hilary

  • #2
    The Earth is not flat, despite the claims of some, so I think what you want are geodetic distances rather than Euclidean ones. Have a look at Robert Picard's geodist command (available from SSC).
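    To illustrate what a distance on a curved surface involves, here is a minimal sketch of the haversine (great-circle) formula written out by hand for one pair of points, using the coordinates of Stavanger and Trondheim. It assumes a spherical Earth with a mean radius of 6371 km, which is an assumption of this sketch; by default geodist works on a reference ellipsoid, so its result will differ slightly. All variable and scalar names here are made up for the illustration.

    ```stata
    clear
    input double(lat1 lon1 lat2 lon2)
    58.97 5.71 63.44 10.40
    end

    * Degrees-to-radians factor and an ASSUMED mean Earth radius in km
    scalar d2r = _pi/180
    scalar R   = 6371

    * Differences in latitude and longitude, in radians
    gen double dlat = (lat2 - lat1)*scalar(d2r)
    gen double dlon = (lon2 - lon1)*scalar(d2r)

    * Haversine formula for the great-circle distance on a sphere
    gen double a = sin(dlat/2)^2 + cos(lat1*scalar(d2r))*cos(lat2*scalar(d2r))*sin(dlon/2)^2
    gen double d_sphere = 2*scalar(R)*asin(sqrt(a))

    list lat1 lon1 lat2 lon2 d_sphere
    ```

    In practice there is no need to code this yourself, since geodist does the work and handles the ellipsoid as well.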

    Having data in the form below, you can proceed as follows:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id str20 City float(Latitude Longitude)
    1 "Oslo"                 59.91 10.75
    1 "Stavanger"            58.97  5.71
    1 "Trondheim"            63.44  10.4
    1 "Fredrikstad-sarpsbor" 59.24 10.94
    1 "Drammen"              59.75  10.2
    2 "Porsgrunn-skien"      59.15  9.66
    2 "Kristiansand"         58.15  7.99
    2 "Oslo"                 59.91 10.75
    2 "Stavanger"            58.97  5.71
    2 "Trondheim"            63.44  10.4
    2 "Fredrikstad-sarpsbor" 59.24 10.94
    2 "Trondheim"            63.44  10.4
    2 "Fredrikstad-sarpsbor" 59.24 10.94
    3 "Drammen"              59.75  10.2
    3 "Porsgrunn-skien"      59.15  9.66
    3 "Kristiansand"         58.15  7.99
    end
    Code:
    *CREATE PAIRWISE COMBINATIONS BY ID
    preserve
    rename (City Latitude Longitude) (City2 Latitude2 Longitude2)
    tempfile City2
    save `City2'
    restore
    joinby id using `City2'
    drop if City==City2
    
    *INSTALL COMMANDS
    ssc install geodist
    
    *CALCULATE DISTANCES
    geodist Latitude Longitude Latitude2 Longitude2, gen(distance)
    
    
    *LIST MAX DISTANCES BY ID
    sort id distance
    by id: list City City2 distance if _n==_N
    Code:
    . by id: list City City2 distance if _n==_N
    
    -----------------------------------------------------------
    -> id = 1
    
         +-----------------------------------+
         |      City       City2    distance |
         |-----------------------------------|
     20. | Stavanger   Trondheim   557.93652 |
         +-----------------------------------+
    
    -------------------------------------------------------------
    -> id = 2
    
         +--------------------------------------+
         |      City          City2    distance |
         |--------------------------------------|
     52. | Trondheim   Kristiansand   603.76359 |
         +--------------------------------------+
    
    ---------------------------------------------------------------
    -> id = 3
    
         +------------------------------------+
         |         City     City2    distance |
         |------------------------------------|
      6. | Kristiansand   Drammen   218.94267 |
         +------------------------------------+



    • #3
      Thank you very much! You've helped immensely. For the benefit of others, here is my complete code to do this. The last 3 lines result in a dataset with only one record per person, with the maximum of all the calculated distances for each person retained.

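      * Load the data: one record per person (sc) and named place
      use "example.dta"
      * Save a copy with renamed variables so each place can be paired with the others
      preserve
      rename place place2
      rename lat lat2
      rename lon lon2
      tempfile place2
      save `place2'
      restore
      * Form all within-person pairs of places
      joinby sc using `place2'
      * Drop pairs of a place with itself
      drop if place==place2
      * Geodetic distance for each pair (requires geodist from SSC)
      geodist lat lon lat2 lon2, gen(distance)
      * Keep only the record with the largest distance for each person
      sort sc distance
      gen maxdistance=.
      by sc: replace maxdistance=distance if _n==_N
      keep if maxdistance!=.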
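
      A possible shortcut for the last steps, offered as a sketch rather than a tested replacement: collapse with the (max) statistic reduces the paired data to one record per person directly. The toy data below are made up for illustration and reuse the variable names from the code above (sc, place, lat, lon). Unlike the sort-and-keep approach, collapse does not retain place and place2, so it only suits cases where the maximum distance itself is all that is needed.

      ```stata
      clear
      input float sc str10 place float(lat lon)
      1 "Oslo"      59.91 10.75
      1 "Stavanger" 58.97  5.71
      1 "Trondheim" 63.44 10.40
      2 "Drammen"   59.75 10.20
      2 "Oslo"      59.91 10.75
      end

      * Pair every place with the other places named by the same person
      preserve
      rename (place lat lon) (place2 lat2 lon2)
      tempfile place2
      save `place2'
      restore
      joinby sc using `place2'
      drop if place==place2

      * Requires geodist from SSC: ssc install geodist
      geodist lat lon lat2 lon2, gen(distance)

      * One record per person holding the largest pairwise distance (km)
      collapse (max) maxdistance = distance, by(sc)
      list
      ```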



      • #4
        If possible, could you please confirm that the unit is "kilometers"?
        Last edited by Hilary Walsh; 19 Apr 2018, 12:00.



        • #5
          Yes, the default is kilometers. If you want distance in miles, specify this as an option, e.g.,

          Code:
           geodist Latitude Longitude Latitude2 Longitude2, miles gen(distance)
          All this information and more is available at

          Code:
          help geodist
          Added note: You can remove this line from your code

          drop if place==place2
          so that participants who named only one location are retained (their distance is 0). Removing it does not affect the maximum distance for those who named multiple locations.
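
          As a minimal sketch of that point, using made-up toy data and the variable names from #3: without the drop line, each place stays paired with itself, so a participant who named a single place keeps one record with a distance of 0. The bysort line is one assumed way of keeping the largest distance per person; it is not the only option.

          ```stata
          clear
          input float sc str10 place float(lat lon)
          1 "Oslo"      59.91 10.75
          1 "Stavanger" 58.97  5.71
          2 "Drammen"   59.75 10.20
          end

          * Pair every place with every place named by the same person,
          * including itself (no -drop if place==place2- here)
          preserve
          rename (place lat lon) (place2 lat2 lon2)
          tempfile place2
          save `place2'
          restore
          joinby sc using `place2'

          * Requires geodist from SSC: ssc install geodist
          geodist lat lon lat2 lon2, gen(distance)

          * Keep the record with the largest distance for each person
          bysort sc (distance): keep if _n == _N
          list sc place place2 distance
          ```

          Here participant 2 named only Drammen and remains in the data, paired with itself.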
          Last edited by Andrew Musau; 19 Apr 2018, 12:50. Reason: Added note
