Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Netherlands coordinates not matching when reverse geomatching

    Hello,

    I am using the 'geoinpoly' package to extract countries from coordinates. The data I am using is the 'pdsi.mon.mean.selfcalibrated.nc' file from the NOAA website. The file can be downloaded here as reference: https://downloads.psl.noaa.gov/Datasets/dai_pdsi/.

    I am using Panoply.exe to convert the .nc file to a .txt file, which is then imported into Stata. I am not sure if there is a better way to convert .nc files to .csv or .txt.

    For reference the initial converted .txt data looks like this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long _ID str35 geounit float pdsi double date
    9 "Australia" -.24387507 4.733856e+11
    9 "Australia"  1.0542171 4.733856e+11
    9 "Australia"   2.530027 4.733856e+11
    9 "Australia"  -.2209172 4.733856e+11
    9 "Australia"   2.621126 4.733856e+11
    9 "Australia"   .6343879 4.733856e+11
    9 "Australia"  -.1671666 4.733856e+11
    9 "Australia"   2.627944 4.733856e+11
    9 "Australia"   2.666513 4.733856e+11
    9 "Australia"   2.709112 4.733856e+11
    9 "Australia"   1.713318 4.733856e+11
    9 "Australia"  -.2362924 4.733856e+11
    9 "Australia" -1.7012633 4.733856e+11
    9 "Australia"  2.6904535 4.733856e+11
    9 "Australia"   2.790685 4.733856e+11
    9 "Australia" -.18960217 4.733856e+11
    9 "Australia"  2.2980306 4.733856e+11
    9 "Australia"   2.200707 4.733856e+11
    9 "Australia"   2.745321 4.733856e+11
    9 "Australia"   1.793095 4.733856e+11
    9 "Australia"   3.218915 4.733856e+11
    9 "Australia"   1.263941 4.733856e+11
    9 "Australia"   2.517584 4.733856e+11
    9 "Australia"   2.487008 4.733856e+11
    9 "Australia"   3.189972 4.733856e+11
    9 "Australia"   2.065748 4.733856e+11
    9 "Australia"   .1883537 4.733856e+11
    9 "Australia"   2.921682 4.733856e+11
    9 "Australia" -.24707386 4.733856e+11
    9 "Australia" -.22103044 4.733856e+11
    9 "Australia" -.11581708 4.733856e+11
    9 "Australia"  -.1878037 4.733856e+11
    9 "Australia"  2.3756897 4.733856e+11
    9 "Australia"  2.4398546 4.733856e+11
    9 "Australia"  2.7599454 4.733856e+11
    9 "Australia"   2.970175 4.733856e+11
    9 "Australia"  2.9416146 4.733856e+11
    9 "Australia"   2.185491 4.733856e+11
    9 "Australia"  -2.767646 4.733856e+11
    9 "Australia"  2.2577999 4.733856e+11
    9 "Australia"   3.394545 4.733856e+11
    9 "Australia" -.24933997 4.733856e+11
    9 "Australia"  1.5068643 4.733856e+11
    9 "Australia"   3.061357 4.733856e+11
    9 "Australia"  2.0272496 4.733856e+11
    9 "Australia"  1.1702079 4.733856e+11
    9 "Australia"  2.7880814 4.733856e+11
    9 "Australia"  -.3799795 4.733856e+11
    9 "Australia"   1.713367 4.733856e+11
    9 "Australia"  1.6033676 4.733856e+11
    9 "Australia"   2.414652 4.733856e+11
    9 "Australia"  -.4349431 4.733856e+11
    9 "Australia" -1.6566436 4.733856e+11
    9 "Australia"   2.748014 4.733856e+11
    9 "Australia"   2.661808 4.733856e+11
    9 "Australia"   1.753073 4.733856e+11
    9 "Australia"    1.23942 4.733856e+11
    9 "Australia"   3.158964 4.733856e+11
    9 "Australia"  2.3036907 4.733856e+11
    9 "Australia"   1.929237 4.733856e+11
    9 "Australia"   2.133717 4.733856e+11
    9 "Australia"   2.860601 4.733856e+11
    9 "Australia"  -.2832702 4.733856e+11
    9 "Australia"   2.627661 4.733856e+11
    9 "Australia"   .7561895 4.733856e+11
    9 "Australia"  -.4891445 4.733856e+11
    9 "Australia"  1.6510674 4.733856e+11
    9 "Australia" -1.5471634 4.733856e+11
    9 "Australia"   2.398921 4.733856e+11
    9 "Australia"  -.9469776 4.733856e+11
    9 "Australia"  -.4469681 4.733856e+11
    9 "Australia"  2.3114488 4.733856e+11
    9 "Australia" -1.2622178 4.733856e+11
    9 "Australia"  -.5057986 4.733856e+11
    9 "Australia"  2.3764389 4.733856e+11
    9 "Australia"   3.059789 4.733856e+11
    9 "Australia"  -.8078451 4.733856e+11
    9 "Australia"   3.368574 4.733856e+11
    9 "Australia"   .1191175 4.733856e+11
    9 "Australia"  3.2319875 4.733856e+11
    9 "Australia"  -1.699207 4.733856e+11
    9 "Australia"  -.3062389 4.733856e+11
    9 "Australia"   3.294898 4.733856e+11
    9 "Australia"   2.876773 4.733856e+11
    9 "Australia"   2.523922 4.733856e+11
    9 "Australia"   3.287688 4.733856e+11
    9 "Australia"   2.736962 4.733856e+11
    9 "Australia"  -.9775672 4.733856e+11
    9 "Australia"   3.374736 4.733856e+11
    9 "Australia"  -1.302614 4.733856e+11
    9 "Australia"   2.965404 4.733856e+11
    9 "Australia"  .09760648 4.733856e+11
    9 "Australia"  1.0634354 4.733856e+11
    9 "Australia"    2.78752 4.733856e+11
    9 "Australia"  1.4266902 4.733856e+11
    9 "Australia"  -.5309889 4.733856e+11
    9 "Australia"   .3293644 4.733856e+11
    9 "Australia"   2.575687 4.733856e+11
    9 "Australia"   2.644309 4.733856e+11
    9 "Australia"  3.2080455 4.733856e+11
    end
    format %tc date
    Furthermore, I make the following changes to the data to format the date variable, keep the needed sample period, and fill missing variables with 0.

    Code:
    *Format Date Variable Change from hours to ms 
    local hours_to_ms = 60 * 60 * 1000
    gen double date = clock("1jan1800", "DMY") + time * `hours_to_ms'
    format %tc date
    
    *Keep sample Period 1975-2014
    drop if time < 1534008
    
    //Drop or replace missing values with 0?
    replace pdsi = 0 if missing(pdsi)
    
    save "International_PDSI.dta", replace
    I then use the following code to extract countries from coordinates and merge files to add country names according to matched ID's:

    ​​​​​​​
    Code:
    keep lat lon
    duplicates drop
    egen countryid = group(lat lon)
    order countryid lat lon
    
    geoinpoly lat lon using "geo2xy_world_coor.dta"
    
    keep if !mi(_ID) //drop observations with no matching ID
    
    merge m:1 _ID using "geo2xy_world_data.dta", keep(master match) keepusing(geounit) nogen 
        
    keep geounit _ID lat lon
    
    merge m:m lat lon using "International_PDSI.dta", nogen
    After running the code, if you look at the matched countries Netherlands is not on the list. However it appears to be highlighted on the map generated by panoply.exe and it should be in the sample as well. I am not sure what the problem is or if there is a better way to code this? I hope my explanations are clear as well.

    Any help on this would be greatly appreciated!

  • #2
    Okay.... I literally have done geoinpoly about 8 times over for my current project, so I know a thing or two about it.

    I get most of what you're doing in the second code block. But, what to me seems problematic (potentially) is your use of the m:m merge. If the second block of code is meant to do what you want, I have to start by asking why a m:m merge is necessary. The Stata manual explicitly cautions against it, such that they refuse to even give an example of one, saying that "if you think you need a m:m merge, you're likely wrong." So to start, could you please explain your thinking for that line of code?

    Comment


    • #3
      Thank you for the tip, but I tried using merging 1:m and it doesn't change the results. What is strange is that when using a different shapefile I get different matches. For example, when using this method the number of matches are different, and Netherlands is still missing...

      Code:
      ***********************************************
      //METHOD 2/2 with shapefile from: http://thematicmapping.org/downloads/world_borders.php
      ***********************************************
      * the shapefile for the world was downloaded from
      * http://thematicmapping.org/downloads/world_borders.php
      * and clicking on "TM_WORLD_BORDERS-0.3.zip"
      shp2dta using "TM_WORLD_BORDERS-0.3\TM_WORLD_BORDERS-0.3.shp", data("world_data.dta") coor("world_coor.dta") replace
      
      use "International_PDSI.dta", clear
      
      keep lat lon
      duplicates drop
      egen countryid = group(lat lon)
      order countryid lat lon
      
      geoinpoly lat lon using "world_coor.dta", unique
      
      keep if !mi(_ID) //drop observations with no matching ID
      
      merge m:1 _ID using "world_data.dta", keep(master match) nogen 
      
      rename NAME geounit
      keep geounit _ID lat lon
      
      merge 1:m lat lon using "International_PDSI.dta", nogen
      
      /*Drop missing _ID variables that were not matched to a country ID*/
      drop if missing(_ID)
      duplicates drop
      
      keep pdsi geounit date _ID
      sort _ID date
      I posted a picture of the map output by Panoply, which clearly shows the Netherlands being in the colored regions. If you could recommend a different way to convert .nc files to .csv or .txt file, this could help as well.

      Click image for larger version

Name:	Panoply - map.PNG
Views:	1
Size:	212.2 KB
ID:	1631745

      Comment

      Working...
      X