Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Matching

    Hi I am working on two datasets involving the lat and long information.

    The first dataset is the household data. Which includes the v001( cluster number) , LATNUM, LONGNUM and b1 (month of birth ) and year of birth.


    First dataset


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int v001 double(LATNUM LONGNUM) byte b1 int b2
      1 36.4499182254 72.5715578674  7 2014
      2 35.8919142536 71.7268731073 10 2016
      3 35.1695664132 71.8344582034  1 2017
      4 35.4247289555 72.1639311741  2 2011
      5 35.0056955272 71.7764776856  1 2017
      6  34.944432422 71.6551094148 11 2010
      7 34.7355764204 71.8375579917  1 2016
      8  34.689715398 72.1238682541 12 2014
      9 34.8138638781 72.3664785607  4 2014
     10 34.7486443502 72.4169229558  5 2012
     11 34.7435888145 72.4244339897 11 2011
     12 34.9987162105  72.540403036 10 2016
     13 34.8614502539 72.2755876121  1 2016
     14 34.8487858784 72.8823515247  3 2010
     15 34.4607255601 72.4609016637  6 2011
     16 34.4672412698 71.8554040871  9 2017
     17 34.5657254918 71.9057528327  8 2014
     22 34.4484616298 73.3836077956 12 2009
     23 34.5177441604  73.265701841 10 2005
     24 34.4309865058 73.2175568765 12 2017
     25 34.6745435574 73.0526476984  7 2014
     26  34.089194344  73.383530112  4 2017
     27 34.0123092115 73.2395696716  7 2016
     28  34.138836295 73.2200455137 10 2009
     29 33.9656834462 73.2177445651  3 2010
     30  34.134098912 72.9400355457  9 2008
     31  33.809855724 73.1174442725  2 2016
     32 34.0005178374 72.9282441939 11 2012
     33 34.5448600486 72.8395490711  8 2017
     34 34.4866438792 72.0836053773 10 2016
     35 34.1774184331 72.1705950662  2 2009
     36 34.4187948321 72.3278790408  6 2016
     37 34.2801182307 72.1756213181  5 2010
     38  34.272924996 72.1737534586 11 2016
     39 34.2766122933 71.9418244891  6 2014
     40 34.2839028716 72.3767401211  2 2017
     41 34.1860364687 72.3979275223  4 2016
     42 34.1277885467  72.475738516  3 2015
     43 34.1172697963 72.5035644427  8 2017
     44 34.1342646468 71.6908071196 10 2015
     45 34.3032010258 71.6118536887  7 2015
     46 34.1654929177 71.7598850094  2 2007
     47 34.1927121501 71.7638071487  1 2011
     48 34.1033784498 71.6019087575 10 2011
     49 33.9583254271 71.6146843403 11 2012
     50  33.945303642 71.6865737739  3 2016
     51 33.9903730476 71.5375839682  2 2014
     52 34.0113739915 71.5212273276  6 2016
     53 34.0086142429 71.5288680618  7 2017
     54 34.0100262472 71.5353768295  8 2014
     55 34.0007967437 71.5339475817  6 2016
     56 34.0004557253 71.5777313753  2 2018
     57 33.9872542054 71.5971479859  6 2015
     58 33.9979765861  71.552242676  8 2010
     59 33.9931561414 71.5269282239  1 2017
     60 33.9972634656  71.538230657 10 2009
     61 34.1566380784 71.4504940907  4 2013
     62 34.0089296333 71.5420730915  6 2010
     63 33.9959747569 71.5266948364  7 2005
     64   33.91341027 72.0176179786  5 1998
     65  34.005454073  71.779015226  4 2016
     66 34.1227926199  71.960556304  2 2012
     67 33.9249335029 72.0390322529  9 2016
     68  33.938676527 71.7990767178  2 2010
     69 33.4840898942 71.3995560447  5 2013
     70 33.5800871476 71.4282835193  3 2015
     71 33.5742304548 71.4584730272  3 2016
     72 33.3626754137 70.5539741567  2 2017
     73 33.1748642707 71.1865990396  1 2011
     74 32.9790426464 70.6790446519  1 2015
     75 32.9899955976 70.6147092986  9 2016
     76 32.5176163005  70.921225542  8 2016
     77 32.6152064682 70.9164198863  4 2015
     78 31.7568504356 70.7176522238 11 2015
     79 31.9540009492  70.347534046  9 2010
     80 31.8439080198 70.8969695143  8 2012
     81 31.7958158111 70.8967685943  4 2014
     82 31.5949781671 70.8234721928 11 2016
     83 32.1718730258 70.2509650811  6 2016
     84 32.2237092477 70.3756924373 11 2013
     85 33.8684398977 72.7646453356  5 2017
     86 33.1642327843 72.1126681286 12 2017
     87 33.4969445401 71.9532672364  8 2016
     88 33.3056905393 73.2131831488  1 2014
     89 33.4829668086  73.477666433 11 2014
     90 33.8861385011 73.4286611655 12 2017
     91 33.6533953758  73.410579747  8 2011
     92 33.7355745855 72.7908112076 10 2014
     93 33.4804065011  73.419872597 12 2014
     94 33.9046198718 73.3889983098  6 2017
     95 32.9495356629 73.5862306661  3 2016
     96 32.6370501442 73.1454932241  6 2015
     97 32.9951480988 72.8431774034  9 2014
     98 32.7557417861 71.9793290391 11 2013
     99 32.9278043706   72.87326944  8 2013
    100  32.866188617 72.7629057467  8 2015
    101 32.3227490272 72.9209185559  9 2012
    102 32.1487643466  72.911008241  4 2005
    103 32.2318100144 72.4440213163  3 2017
    104  32.035856394 72.7546317476  7 2008
    end




    The second dataset have the lat , longitude year month and temperature variable. Now I have to assign the temperature value from the second data set to the first data set based on the nearest lat , long and year and month variables.


    Second data set

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(lat longitude) int year byte month float date long id float temp
    -179.75 71.25 2009  4 591 1 -18.3
    -179.75 71.25 2009 11 598 1 -13.3
    -179.75 71.25 2009  2 589 1 -25.9
    -179.75 71.25 2009  5 592 1  -3.7
    -179.75 71.25 2010  1 600 1 -23.5
    -179.75 71.25 2010  4 603 1 -16.9
    -179.75 71.25 2010  9 608 1   3.1
    -179.75 71.25 2010  8 607 1   2.4
    -179.75 71.25 2010  7 606 1   2.4
    -179.75 71.25 2011 10 621 1  -3.3
    -179.75 71.25 2009  1 588 1 -20.9
    -179.75 71.25 2010  2 601 1 -22.7
    -179.75 71.25 2011  4 615 1 -15.7
    -179.75 71.25 2010 11 610 1 -11.3
    -179.75 71.25 2010 10 609 1  -3.1
    -179.75 71.25 2011 11 622 1 -13.1
    -179.75 71.25 2009  3 590 1   -24
    -179.75 71.25 2011  1 612 1 -19.1
    -179.75 71.25 2011  6 617 1   1.2
    -179.75 71.25 2011  2 613 1 -21.3
    -179.75 71.25 2009  9 596 1    .4
    -179.75 71.25 2009 12 599 1   -16
    -179.75 71.25 2009  6 593 1    .5
    -179.75 71.25 2009  7 594 1   4.2
    -179.75 71.25 2009 10 597 1  -4.1
    -179.75 71.25 2009  8 595 1   4.2
    -179.75 71.25 2011  9 620 1   1.2
    -179.75 71.25 2011  8 619 1   2.7
    -179.75 71.25 2010 12 611 1 -17.1
    -179.75 71.25 2011  3 614 1 -17.7
    -179.75 71.25 2011  7 618 1   4.3
    -179.75 71.25 2011  5 616 1    -4
    -179.75 71.25 2010  6 605 1    .9
    -179.75 71.25 2011 12 623 1 -19.9
    -179.75 71.25 2010  3 602 1 -22.4
    -179.75 71.25 2010  5 604 1  -5.5
    -179.75 68.75 2009  3 590 2 -26.6
    -179.75 68.75 2009  2 589 2 -28.5
    -179.75 68.75 2011  8 619 2   2.3
    -179.75 68.75 2009  8 595 2     5
    -179.75 68.75 2011 10 621 2  -4.6
    -179.75 68.75 2010  3 602 2   -25
    -179.75 68.75 2009 12 599 2 -18.4
    -179.75 68.75 2011  7 618 2     3
    -179.75 68.75 2011  9 620 2   1.4
    -179.75 68.75 2009  5 592 2  -1.8
    -179.75 68.75 2009 10 597 2  -5.7
    -179.75 68.75 2010  6 605 2    .6
    -179.75 68.75 2010 10 609 2  -4.6
    -179.75 68.75 2009 11 598 2 -16.2
    -179.75 68.75 2009  9 596 2    .4
    -179.75 68.75 2011 11 622 2 -16.1
    -179.75 68.75 2010  8 607 2   3.3
    -179.75 68.75 2010 12 611 2 -18.5
    -179.75 68.75 2009  4 591 2   -19
    -179.75 68.75 2010  2 601 2 -25.1
    -179.75 68.75 2010  5 604 2  -5.9
    -179.75 68.75 2010 11 610 2 -13.3
    -179.75 68.75 2009  6 593 2   -.3
    -179.75 68.75 2009  7 594 2   4.4
    -179.75 68.75 2011  1 612 2 -23.8
    -179.75 68.75 2011  2 613 2 -25.4
    -179.75 68.75 2011  4 615 2 -17.1
    -179.75 68.75 2011  5 616 2  -5.9
    -179.75 68.75 2010  4 603 2 -19.1
    -179.75 68.75 2011  6 617 2     1
    -179.75 68.75 2011 12 623 2 -25.4
    -179.75 68.75 2010  1 600 2 -26.1
    -179.75 68.75 2009  1 588 2 -25.4
    -179.75 68.75 2010  7 606 2   3.2
    -179.75 68.75 2010  9 608 2   2.5
    -179.75 68.75 2011  3 614 2 -20.1
    -179.75 68.25 2010  9 608 3   1.9
    -179.75 68.25 2011  7 618 3   3.2
    -179.75 68.25 2011  4 615 3 -18.1
    -179.75 68.25 2009 10 597 3  -6.5
    -179.75 68.25 2009 11 598 3 -17.4
    -179.75 68.25 2009  2 589 3 -29.6
    -179.75 68.25 2010  7 606 3   3.5
    -179.75 68.25 2010  1 600 3 -27.1
    -179.75 68.25 2011  5 616 3  -6.5
    -179.75 68.25 2010  2 601 3 -26.2
    -179.75 68.25 2011  8 619 3   2.1
    -179.75 68.25 2010 11 610 3 -14.5
    -179.75 68.25 2010  8 607 3   3.1
    -179.75 68.25 2011 10 621 3  -5.9
    -179.75 68.25 2011 11 622 3 -17.3
    -179.75 68.25 2011 12 623 3 -26.1
    -179.75 68.25 2009  9 596 3   -.2
    -179.75 68.25 2009  8 595 3   4.8
    -179.75 68.25 2010  5 604 3  -6.6
    -179.75 68.25 2010 10 609 3  -5.7
    -179.75 68.25 2010 12 611 3 -19.7
    -179.75 68.25 2011  1 612 3 -24.4
    -179.75 68.25 2011  9 620 3    .7
    -179.75 68.25 2009  4 591 3 -19.7
    -179.75 68.25 2011  3 614 3 -21.1
    -179.75 68.25 2011  2 613 3 -26.5
    -179.75 68.25 2011  6 617 3   1.2
    -179.75 68.25 2009 12 599 3 -19.5
    end
    format %tm date

    Thanks

  • #2
    With such a small dataset, brute force is a reasonble approach. Combine -joinby- (https://www.stata.com/manuals/djoinby.pdf) to get all possible combinations with the appropriate formula for the distance between two points ( https://www.movable-type.co.uk/scripts/latlong.html) and then -collapse- to get the minimum for each point.

    Comment


    • #3
      Daniel Feenberg Thanks my data is very large I have household data and temperature points please and have million of observations

      Thanks

      Comment


      • #4
        Then you can't use joinby - the cost runs as the square of the number of observations. Have you looked at -geonear- (https://thedatahall.com/how-to-find-...using-geonear/)? I have no experience with and it makes no speed claims, so it might be totally impractical. There are several nearest neighbor programs available in languages other than Stata that use specialized data structures to drastically reduce the number of comparisons. Sometimes it is better to use a non-Stata solution.

        Comment


        • #5
          Thanks Daniel Feenberg

          Yes I am trying to use the geonear but getting the error message

          and I am confused how to incorporate the month of birth and year of birth of child while matching the datasets. geonear gives the nearest distance only from the household location to the temperature location.

          geonear child_id LATNUM_hh LONGNUM_hh using "air_temp_all.dta", ignoreself n(ID lat longitude) miles genstub(ignore)
          nbor latitude var lat must be between -90 and 90


          can you please point out some of these programs please

          Thanks



          Comment


          • #6
            I don't use -geonear- myself, so I'm reluctant to advise you about this. But I know that it was written by Robert Picard, and he is one of the best Stata programmers around. So I'm going to assume that the error message means exactly what it says. In one of your data sets, (most likely the air_temp_all data set, but check the one in memory, too) there is some observation with a value of lat that is not between -90 and 90. Since latitudes are necessarily within that range, that means there is erroneous latitude data in that data set. Check the data set for values of lat that are outside the -90 to 90 range, and then either delete those observations or, better if you can, correct the value of lat.

            If I am right that the problem is with your data, finding another application to handle this is not going to solve your problem anyway.

            Comment

            Working...
            X