  • Calculating distances between geographical coordinates for a large sample


    Hello,

    I have geographic coordinates for around 20,000 houses, each identified by a unique "ID".
    I am looking at peer effects for certain outcomes (for example, electricity consumption), so I need to identify neighbours using the distance between houses: the shorter the distance between two houses, the closer neighbours they are. I want to look first at the closest neighbours, then at houses within a certain distance, and so on.

    Here is my dataset:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long ID double(Lat Long)
    500001 50.0388155 -110.68781
    500002 50.0387258 -110.6882361
    500003 50.038908 -110.6885515
    500004 50.0389398 -110.6888312
    500005 50.0388173 -110.6891523
    500006 50.0391154 -110.6891256
    500007 50.0388441 -110.6896837
    500008 50.0388129 -110.6898514
    500009 50.0387675 -110.6900962
    500010 50.0388228 -110.6904177
    500011 50.038831 -110.6907113
    500012 50.0387884 -110.6911096
    500013 50.0387835 -110.6913613
    500014 50.038789 -110.6915571
    500015 50.0387529 -110.6919764
    500017 50.0383062 -110.6874445
    500019 50.038298 -110.6881155
    500020 50.03828 -110.6883251
    500022 50.0382674 -110.6887305
    500023 50.0382363 -110.6888982
    500024 50.0382314 -110.6891498
    500025 50.0382506 -110.689835
    500027 50.0383217 -110.6902408
    500028 50.03832 -110.6903
    500029 50.0383064 -110.6905483
    500030 50.0383339 -110.690878
    500031 50.0382966 -110.6910516
    500032 50.0382211 -110.6913449
    500033 50.038305 -110.6915762
    500034 50.038277 -110.6920582
    500035 50.0383087 -110.692338
    500036 50.0382934 -110.6926455
    500037 50.0383355 -110.6928694
    500038 50.038341 -110.6930652
    500039 50.0382261 -110.6932325
    500051 50.03767 -110.6878891
    500052 50.0376755 -110.6880849
    500053 50.0377073 -110.6883647
    500054 50.0376761 -110.6885323
    500055 50.0376816 -110.6887281
    500062 50.0379067 -110.6892188
    500063 50.0377194 -110.689651
    500064 50.0377145 -110.6899026
    500065 50.03772 -110.6900984
    500066 50.0377255 -110.6902942
    500067 50.037726 -110.6907416
    500068 50.0377315 -110.6909373
    500069 50.0376664 -110.6911747
    500070 50.0377009 -110.6915524
    500071 50.0377381 -110.6920279
    500072 50.0377187 -110.6921886
    500073 50.0377228 -110.6923355
    500074 50.0376577 -110.6925729
    500075 50.0377026 -110.6928946
    500076 50.0377212 -110.6931324
    500077 50.0377398 -110.6933702
    500078 50.0378082 -110.6936781
    500079 50.0378033 -110.6939297
    500081 50.0378301 -110.6944612
    500082 50.0378619 -110.694741
    500083 50.0378777 -110.6948809
    500084 50.0379461 -110.6951888
    500085 50.0379308 -110.6954963
    500086 50.0380604 -110.6967133
    500087 50.037671 -110.6968934
    500088 50.0374434 -110.6968784
    500089 50.0372821 -110.6872723
    500090 50.0372853 -110.6876172
    500091 50.0372882 -110.6879155
    500092 50.0372505 -110.6880621
    500093 50.0372743 -110.688272
    500094 50.0372888 -110.6883629
    500095 50.037268 -110.6884747
    500096 50.0372472 -110.6885864
    500097 50.0372527 -110.6887822
    500098 50.0372478 -110.6890338
    500099 50.0372539 -110.689677
    500100 50.0372636 -110.6898716
    500101 50.0372626 -110.6901689
    500102 50.0372184 -110.6905437
    500103 50.0372709 -110.6907117
    500104 50.0372732 -110.6908722
    500105 50.0372458 -110.6915225
    500106 50.0372073 -110.6921294
    500107 50.0371687 -110.6926127
    500108 50.0372163 -110.6930324
    500109 50.0372619 -110.6931108
    500110 50.0372639 -110.693452
    500111 50.0372016 -110.6937873
    500112 50.0374116 -110.6944594
    500113 50.0374171 -110.6946551
    500114 50.0373756 -110.6948787
    500116 50.0374521 -110.6954802
    500117 50.037497 -110.695802
    500118 50.0375079 -110.6961935
    500119 50.0371923 -110.6866288
    500120 50.037429 -110.6861003
    500121 50.03749 -110.68597
    500123 50.0375522 -110.6858195
    500124 50.0375937 -110.6855959
    end

    Thanks in advance!

  • #2
    Search for "geocoding" on this site. There are a number of different user-written commands; the one discussed most recently was -georoute-.



    • #3
      Perhaps this, using -geodist- from SSC:

      Code:
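      * Requires: ssc install geodist
      gen `c(obs_t)' id = _n          // sequential id, used to keep each pair once
      rename * *_2
      tempfile points
      save `points'

      rename *_2 *_1
      cross using `points'            // every pairwise combination (N^2 observations)

      drop if id_2 <= id_1            // drop self-pairs and mirrored duplicates
      geodist Lat_1 Long_1 Lat_2 Long_2, gen(dist) miles   // omit -miles- for km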



      • #4
        See -ssc describe geonear-. It's easy to use, well documented, and so fast you'll think it cheats. I have been singing its praises here for several years. The experience that convinced me: it once finished a job on distances among 3,000 U.S. county locations in about 15 minutes, whereas another Stata package would have taken (by experimental estimate) a few months. It is by the same author as -geodist- and presumably uses the same algorithm, but it does some of the background work for you.
        Code:
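        //  Use the data file from post #1. Requires: ssc install geonear
        tempfile nay
        save `nay'
        // Compare all possible neighbor pairs in this file and return the nearest 5 neighbors
        // in wide format; distances are in km by default.
        // ignoreself keeps a house from being matched to itself (distance 0).
        geonear ID Lat Long using `nay', neighbors(ID Lat Long) ignoreself wide nearcount(5)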
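
        For the "houses within a certain distance" part of the question, something along these lines might work. Treat it as a sketch: it assumes -geonear- accepts -within()-, -long- and -ignoreself- as described in its help file, the *_nbr variable names are made up for illustration, and the 50 m radius is arbitrary. Check -help geonear- for how -within()- interacts with -nearcount()- before relying on it.

```stata
* Sketch only: assumes -geonear- is installed (ssc install geonear) and that
* -within()-, -long- and -ignoreself- behave as documented in -help geonear-.
clear
input long ID double(Lat Long)
500001 50.0388155 -110.68781
500002 50.0387258 -110.6882361
500003 50.038908 -110.6885515
500004 50.0389398 -110.6888312
500005 50.0388173 -110.6891523
end

* Save a copy with renamed variables (hypothetical *_nbr names) as the neighbor file,
* so base and neighbor identifiers do not clash in long format.
rename (ID Lat Long) (ID_nbr Lat_nbr Long_nbr)
tempfile nbrs
save `nbrs'
rename (ID_nbr Lat_nbr Long_nbr) (ID Lat Long)

* One row per (house, neighbor) pair; 0.05 km (50 m) is an illustrative radius.
geonear ID Lat Long using `nbrs', neighbors(ID_nbr Lat_nbr Long_nbr) ///
    ignoreself long within(0.05)
list, sepby(ID)
```

        The long layout gives one observation per pair, which is usually easier to merge back onto outcome data than the wide layout.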



        • #5
          That's awesome! Thank you so much, Mike Lacy.

          I am impressed by how fast it works. I was curious about the accuracy, so I used Google Maps to check a few addresses. It seems almost too good to be true.
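
          For anyone who wants to spot-check a distance without Google Maps, here is a minimal sketch that applies the haversine formula by hand to the first two houses from post #1. It assumes a spherical Earth with a mean radius of 6371 km (my assumption, not something taken from -geonear-), so the result may differ slightly from what -geonear- or -geodist- report.

```stata
* Hand check of one pair with the haversine formula (spherical Earth assumed).
clear
input long ID double(Lat Long)
500001 50.0388155 -110.68781
500002 50.0387258 -110.6882361
end

* Convert degrees to radians
scalar lat1 = Lat[1] * _pi/180
scalar lat2 = Lat[2] * _pi/180
scalar dlat = lat2 - lat1
scalar dlon = (Long[2] - Long[1]) * _pi/180

* Haversine: a = sin^2(dlat/2) + cos(lat1)*cos(lat2)*sin^2(dlon/2)
scalar a = sin(dlat/2)^2 + cos(lat1)*cos(lat2)*sin(dlon/2)^2

* Great-circle distance in km; 6371 is an assumed mean Earth radius
scalar d_km = 2 * 6371 * asin(sqrt(a))
display "Haversine distance (km): " d_km
```

          If the hand calculation and the -geonear- distance for the same pair agree to within a metre or so, that is a reasonable sign the coordinates were read in the right order (latitude first, then longitude).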
