  • Calculate the average distance between location A and other locations

    Hi everyone,

    I have a question about how to calculate the average distance from one location to all the other locations, and then repeat this for each location in my data.

    My data looks like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str36 estate_name_en float(longtitude latitude) double d1 float avg_d
    "A" 114.1539 22.2488 .00003425893386243101 47.4626
    "B"  113.975 22.4032    25.140266644104845       .
    "C" 114.2438 22.4287    21.968478773587996       .
    "D" 114.2286 22.2811      8.48942502095412       .
    "E" 114.1036 22.3783     15.24788118107749       .
    "F" 114.2171 22.3251     10.66804262866889       .
    "G" 114.1562 22.2414     .8529668526375733       .
    "H" 114.1472  22.334      9.45973376143329       .
    "I" 114.2623 22.3064    12.864155508000797       .
    "J" 114.0614 22.3674    16.226922577966796       .
    end
    And the code that I used is below:
    Code:
    geodist 22.2488 114.1539 latitude longtitude , generate(d1)
    egen avg_d=mean(d1)
    replace avg_d=. if _n>1
    The code helps me generate the distance between location A and other locations and calculate the average of these distances, shown in the first value of avg_d.

    Now I would like to repeat this process for locations B, C, ..., J and store each average in the second, third, ... observation of avg_d.

    In the end, avg_d would be a variable that indicates the average distance of location _n to all the other locations.
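
    Stated as a formula, this is roughly the quantity being asked for. The notation is an assumption introduced here for illustration: d_ij is the distance between locations i and j, and N is the number of locations.

    ```latex
    \bar{d}_i = \frac{1}{N-1} \sum_{j \ne i} d_{ij}, \qquad i = 1, \dots, N
    ```

    Dividing by N-1 rather than N leaves the zero self-distance out of the average.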

    I'm new to Stata and wasn't able to write a loop that automates this process. It would be great if anyone here could help me out.

    Thank you so much!


  • #2
    To do this with a loop, you have to adapt to what the user-written command -geodist- (available at SSC) requires as arguments:

    Code:
    drop d1 avg_d
    rename longtitude longitude  // fix the misspelled variable name in your example
    forval i = 1/`=_N' {
      local olat = latitude[`i']
      local olong = longitude[`i']
      geodist latitude longitude `olat' `olong', gen(dist`i')  // distance from every location to location i
    }
    egen avgdist = rowtotal(dist*)
    replace avgdist = avgdist/(_N-1)  // no self distance
    If your number of locations is not too large relative to your -set maxvar- setting, you also could do this without writing a loop by installing and using a related user-written command from SSC, -geonear-.
    Code:
    drop d1 avg_d
    rename longtitude longitude  // fix the misspelled variable name in the example data
    // -encode- large ID variable to save space
    encode estate_name_en, gen(selfID)
    //
    tempfile neighbors
    rename selfID nayID
    save `neighbors'
    //
    rename nayID selfID
    local N = _N - 1 // all except self
    geonear selfID latitude longitude using `neighbors', wide nearcount(`N') ///
          neighbors(nayID latitude longitude) ignoreself
    drop nid*
    egen avgdist = rowmean(km*)
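
    If the number of variables is a concern, the loop can also keep a running total instead of storing one distance variable per location. The following is only a sketch under stated assumptions: -geodist- is installed from SSC, the coordinate variables are named latitude and longitude, and avgdist_alt is a hypothetical variable name chosen for illustration.

    ```stata
    * Sketch: accumulate distances in one variable rather than creating dist1, dist2, ...
    * Assumes -geodist- (SSC) is installed and the variables are named latitude and longitude.
    gen double avgdist_alt = 0                    // hypothetical name for the running total
    forvalues i = 1/`=_N' {
        local olat  = latitude[`i']
        local olong = longitude[`i']
        tempvar d
        geodist latitude longitude `olat' `olong', gen(`d')   // distance from every location to location i
        replace avgdist_alt = avgdist_alt + `d'
        drop `d'
    }
    replace avgdist_alt = avgdist_alt / (_N - 1)  // self distance is zero, so divide by N-1
    ```

    This should give the same averages as the rowtotal() version while adding only one permanent variable to the dataset.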



    • #3
      In reply to Mike Lacy (#2):

      Hi Mike,

      I just tried the code that you wrote, and both versions work perfectly. That's exactly what I wanted! Thank you so much for your help; it really means a lot to me.
