Hello Stata list,
I am facing a situation I can't explain: the merge command fails to match a single observation whose country name looks exactly the same in both datasets, even after removing all potential blanks. I am using Stata 15.
The first dataset is the publicly available COVID data from ECDC, available here: https://www.ecdc.europa.eu/en/public...-rate-covid-19
I use the Excel version of it, saved on my laptop, and keep only the last available week and only cases, so that one country = one observation and there are no duplicates.
Code:
keep if year_week=="2021-16"
keep if indicator=="cases"
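Then I change some of the upper case letters to lower case in the country names, and I remove any additional blanks (there were actually none):
Code:
replace country=subinstr(country, " And ", " and ", 1)
replace country=subinstr(country, " Of ", " of ", 1)
replace country=strtrim(stritrim(country))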
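I merge it with a file I created that contains the translation of each country name. I also removed all potential additional blanks from that file beforehand, using the same command as above (and again, there were none).
Code:
merge 1:1 country using Translate.dta
br country pays if _merge!=3
The browse command gives the following output: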
country               pays                 _merge
Antigua and Barbuda                        master only (1)
Antigua and Barbuda   Antigua-et-Barbuda   using only (2)
I don't see any difference between the two country names.
Yet when I copy and paste each name directly from the browse table, first one and then the other (they look completely identical to me), into the following commands,
Code:
br country pays if country=="Antigua and Barbuda"
and
Code:
br country pays if country=="Antigua and Barbuda"
each command returns a single observation: the one whose name I copied.
Following the same logic, when I try to overwrite the name in my master dataset with the name copied from the other row, Stata does make a replacement:
replace country="Antigua and Barbuda" if country=="Antigua and Barbuda"
(1 real change made)
Are there any hidden characters we should be wary of?
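One way to test for this would be to compare the byte length and the Unicode character count of the two values and to print their code points. The following is only a rough sketch: it assumes the merged data are still in memory, so that both rows are present, and the variable names len_bytes, len_chars and hex are made up for illustration. As far as I know, the ustr*() functions are available from Stata 14 onward.

```stata
* Sketch only: inspect the two look-alike values after the merge.
* len_bytes, len_chars and hex are illustrative names, not part of the original data.

* Length in bytes
gen len_bytes = strlen(country)

* Length in Unicode characters
gen len_chars = ustrlen(country)

* Escaped hex code points of each character
gen hex = ustrtohex(country)

* Show both rows side by side
list country len_bytes len_chars hex if strpos(country, "Antigua") > 0
```

If the two rows show different lengths, or a code point other than \u0020 where a blank is displayed (for example \u00a0, a non-breaking space), that would point to a hidden character.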
It's the first time I have encountered such a situation in Stata.
Thank you in advance for enlightening me.