Netherlands coordinates not matching when reverse geomatching

Bob Honda

Join Date: Jan 2021
Posts: 16

Netherlands coordinates not matching when reverse geomatching

13 Oct 2021, 14:49

Hello,

I am using the 'geoinpoly' package to extract countries from coordinates. The data I am using is the 'pdsi.mon.mean.selfcalibrated.nc' file from the NOAA website. The file can be downloaded here as reference: https://downloads.psl.noaa.gov/Datasets/dai_pdsi/.

I am using Panoply.exe to convert the .nc file to a .txt file, which is then imported into Stata. I am not sure if there is a better way to convert .nc files to .csv or .txt.

For reference the initial converted .txt data looks like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long _ID str35 geounit float pdsi double date
9 "Australia" -.24387507 4.733856e+11
9 "Australia"  1.0542171 4.733856e+11
9 "Australia"   2.530027 4.733856e+11
9 "Australia"  -.2209172 4.733856e+11
9 "Australia"   2.621126 4.733856e+11
9 "Australia"   .6343879 4.733856e+11
9 "Australia"  -.1671666 4.733856e+11
9 "Australia"   2.627944 4.733856e+11
9 "Australia"   2.666513 4.733856e+11
9 "Australia"   2.709112 4.733856e+11
9 "Australia"   1.713318 4.733856e+11
9 "Australia"  -.2362924 4.733856e+11
9 "Australia" -1.7012633 4.733856e+11
9 "Australia"  2.6904535 4.733856e+11
9 "Australia"   2.790685 4.733856e+11
9 "Australia" -.18960217 4.733856e+11
9 "Australia"  2.2980306 4.733856e+11
9 "Australia"   2.200707 4.733856e+11
9 "Australia"   2.745321 4.733856e+11
9 "Australia"   1.793095 4.733856e+11
9 "Australia"   3.218915 4.733856e+11
9 "Australia"   1.263941 4.733856e+11
9 "Australia"   2.517584 4.733856e+11
9 "Australia"   2.487008 4.733856e+11
9 "Australia"   3.189972 4.733856e+11
9 "Australia"   2.065748 4.733856e+11
9 "Australia"   .1883537 4.733856e+11
9 "Australia"   2.921682 4.733856e+11
9 "Australia" -.24707386 4.733856e+11
9 "Australia" -.22103044 4.733856e+11
9 "Australia" -.11581708 4.733856e+11
9 "Australia"  -.1878037 4.733856e+11
9 "Australia"  2.3756897 4.733856e+11
9 "Australia"  2.4398546 4.733856e+11
9 "Australia"  2.7599454 4.733856e+11
9 "Australia"   2.970175 4.733856e+11
9 "Australia"  2.9416146 4.733856e+11
9 "Australia"   2.185491 4.733856e+11
9 "Australia"  -2.767646 4.733856e+11
9 "Australia"  2.2577999 4.733856e+11
9 "Australia"   3.394545 4.733856e+11
9 "Australia" -.24933997 4.733856e+11
9 "Australia"  1.5068643 4.733856e+11
9 "Australia"   3.061357 4.733856e+11
9 "Australia"  2.0272496 4.733856e+11
9 "Australia"  1.1702079 4.733856e+11
9 "Australia"  2.7880814 4.733856e+11
9 "Australia"  -.3799795 4.733856e+11
9 "Australia"   1.713367 4.733856e+11
9 "Australia"  1.6033676 4.733856e+11
9 "Australia"   2.414652 4.733856e+11
9 "Australia"  -.4349431 4.733856e+11
9 "Australia" -1.6566436 4.733856e+11
9 "Australia"   2.748014 4.733856e+11
9 "Australia"   2.661808 4.733856e+11
9 "Australia"   1.753073 4.733856e+11
9 "Australia"    1.23942 4.733856e+11
9 "Australia"   3.158964 4.733856e+11
9 "Australia"  2.3036907 4.733856e+11
9 "Australia"   1.929237 4.733856e+11
9 "Australia"   2.133717 4.733856e+11
9 "Australia"   2.860601 4.733856e+11
9 "Australia"  -.2832702 4.733856e+11
9 "Australia"   2.627661 4.733856e+11
9 "Australia"   .7561895 4.733856e+11
9 "Australia"  -.4891445 4.733856e+11
9 "Australia"  1.6510674 4.733856e+11
9 "Australia" -1.5471634 4.733856e+11
9 "Australia"   2.398921 4.733856e+11
9 "Australia"  -.9469776 4.733856e+11
9 "Australia"  -.4469681 4.733856e+11
9 "Australia"  2.3114488 4.733856e+11
9 "Australia" -1.2622178 4.733856e+11
9 "Australia"  -.5057986 4.733856e+11
9 "Australia"  2.3764389 4.733856e+11
9 "Australia"   3.059789 4.733856e+11
9 "Australia"  -.8078451 4.733856e+11
9 "Australia"   3.368574 4.733856e+11
9 "Australia"   .1191175 4.733856e+11
9 "Australia"  3.2319875 4.733856e+11
9 "Australia"  -1.699207 4.733856e+11
9 "Australia"  -.3062389 4.733856e+11
9 "Australia"   3.294898 4.733856e+11
9 "Australia"   2.876773 4.733856e+11
9 "Australia"   2.523922 4.733856e+11
9 "Australia"   3.287688 4.733856e+11
9 "Australia"   2.736962 4.733856e+11
9 "Australia"  -.9775672 4.733856e+11
9 "Australia"   3.374736 4.733856e+11
9 "Australia"  -1.302614 4.733856e+11
9 "Australia"   2.965404 4.733856e+11
9 "Australia"  .09760648 4.733856e+11
9 "Australia"  1.0634354 4.733856e+11
9 "Australia"    2.78752 4.733856e+11
9 "Australia"  1.4266902 4.733856e+11
9 "Australia"  -.5309889 4.733856e+11
9 "Australia"   .3293644 4.733856e+11
9 "Australia"   2.575687 4.733856e+11
9 "Australia"   2.644309 4.733856e+11
9 "Australia"  3.2080455 4.733856e+11
end
format %tc date

Furthermore, I make the following changes to the data to format the date variable, keep the needed sample period, and fill missing variables with 0.

Code:

*Format Date Variable Change from hours to ms 
local hours_to_ms = 60 * 60 * 1000
gen double date = clock("1jan1800", "DMY") + time * `hours_to_ms'
format %tc date

*Keep sample Period 1975-2014
drop if time < 1534008

//Drop or replace missing values with 0?
replace pdsi = 0 if missing(pdsi)

save "International_PDSI.dta", replace

I then use the following code to extract countries from coordinates and merge files to add country names according to matched ID's:

Code:

keep lat lon
duplicates drop
egen countryid = group(lat lon)
order countryid lat lon

geoinpoly lat lon using "geo2xy_world_coor.dta"

keep if !mi(_ID) //drop observations with no matching ID

merge m:1 _ID using "geo2xy_world_data.dta", keep(master match) keepusing(geounit) nogen 
    
keep geounit _ID lat lon

merge m:m lat lon using "International_PDSI.dta", nogen

After running the code, if you look at the matched countries Netherlands is not on the list. However it appears to be highlighted on the map generated by panoply.exe and it should be in the sample as well. I am not sure what the problem is or if there is a better way to code this? I hope my explanations are clear as well.

Any help on this would be greatly appreciated!

Tags: None

Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#2

14 Oct 2021, 05:15

Okay.... I literally have done geoinpoly about 8 times over for my current project, so I know a thing or two about it.

I get most of what you're doing in the second code block. But, what to me seems problematic (potentially) is your use of the m:m merge. If the second block of code is meant to do what you want, I have to start by asking why a m:m merge is necessary. The Stata manual explicitly cautions against it, such that they refuse to even give an example of one, saying that "if you think you need a m:m merge, you're likely wrong." So to start, could you please explain your thinking for that line of code?
Comment

Bob Honda

Join Date: Jan 2021
Posts: 16

14 Oct 2021, 08:30

Thank you for the tip, but I tried using merging 1:m and it doesn't change the results. What is strange is that when using a different shapefile I get different matches. For example, when using this method the number of matches are different, and Netherlands is still missing...

Code:

***********************************************
//METHOD 2/2 with shapefile from: http://thematicmapping.org/downloads/world_borders.php
***********************************************
* the shapefile for the world was downloaded from
* http://thematicmapping.org/downloads/world_borders.php
* and clicking on "TM_WORLD_BORDERS-0.3.zip"
shp2dta using "TM_WORLD_BORDERS-0.3\TM_WORLD_BORDERS-0.3.shp", data("world_data.dta") coor("world_coor.dta") replace

use "International_PDSI.dta", clear

keep lat lon
duplicates drop
egen countryid = group(lat lon)
order countryid lat lon

geoinpoly lat lon using "world_coor.dta", unique

keep if !mi(_ID) //drop observations with no matching ID

merge m:1 _ID using "world_data.dta", keep(master match) nogen 

rename NAME geounit
keep geounit _ID lat lon

merge 1:m lat lon using "International_PDSI.dta", nogen

/*Drop missing _ID variables that were not matched to a country ID*/
drop if missing(_ID)
duplicates drop

keep pdsi geounit date _ID
sort _ID date

I posted a picture of the map output by Panoply, which clearly shows the Netherlands being in the colored regions. If you could recommend a different way to convert .nc files to .csv or .txt file, this could help as well.

Announcement

Netherlands coordinates not matching when reverse geomatching

Comment

Comment