Hi everyone,
I need to do a "fuzzy match" with two datasets. I have two sets of data.
I will provide a smal dataex for both datasets.
Master dataset:
Using dataset:
For example, for zip code "08190" in my using data set, I would like to add the number of observations for this zip code (264) to "08191" or "08192" in my master dataset, to have a final dataset with my master dataset, with the added observations. So, if I match "08190" with "08191", I end up with 529+264=793 observations for "08191".
If needed, here is the chunk of code that I tried:
But I ended up without any observations!
I hope my explanations are sufficiently clear.
Thank you in advance.
Michael
Edit: This will be then used to plot that on a map, by using -geoplot- command.
I need to do a "fuzzy match" with two datasets. I have two sets of data.
- In the first, I have the number of observations and zip codes for a large number of zip codes in Spain.
- In the second, I have the number of observations per zip code for those that don't exist in the first, due for example to the inclusion/exclusion of some of the zip codes over the years.
I will provide a smal dataex for both datasets.
Master dataset:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str5 sp_zipcode_str long tag_zip_house_mun byte _merge "08189" 9 3 "08191" 529 3 "08192" 149 3 "08193" 18 3 "08194" 18 3 "08195" 135 3 "08196" 9 3 "08197" 72 3 "08198" 47 3 "08199" . 1 "08201" 164 3 "08202" 202 3 "08203" 204 3 "08204" 203 3 "08205" 162 3 "08206" 329 3 "08207" 183 3 "08208" 187 3 "08210" 246 3 "08211" 167 3 "08212" 24 3 end label values _merge _merge label def _merge 1 "Master only (1)", modify label def _merge 3 "Matched (3)", modify
Using dataset:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str5 sp_zipcode_str long tag_zip_house_mun byte _merge "01872" 1 2 "03180" 14 2 "03787" 2 2 "03800" 2 2 "04602" 1 2 "06083" 1 2 "08092" 1 2 "08190" 264 2 "08200" 2 2 "08220" 2 2 "08240" 15 2 "08400" 51 2 "10683" 1 2 "13197" 1 2 "15192" 6 2 "15407" 1 2 "15708" 1 2 "17092" 1 2 "17254" 1 2 "18849" 1 2 "18857" 1 2 "19019" 1 2 "21102" 1 2 "21458" 1 2 "21646" 1 2 end label values _merge _merge label def _merge 2 "Using only (2)", modify
If needed, here is the chunk of code that I tried:
Code:
merge 1:1 sp_zipcode_str using `house_info_munic' Result Number of obs ----------------------------------------- Not matched 3,033 from master 2,956 (_merge==1) from using 77 (_merge==2) Matched 7,918 (_merge==3) ----------------------------------------- save Spain_municip, replace file Spain_municip.dta saved use "Spain_municip.dta", clear keep if _merge == 2 sort sp_zipcode_str egen i = group(sp_zipcode_str) tempfile not_matched_using save `not_matched_using', replace use "Spain_municip.dta", clear keep if _merge!= 2 sort sp_zipcode_str egen j = group(sp_zipcode_str) save Spain_municip, replace matchit j sp_zipcode_str using `not_matched_using', idusing(i) txtusing(sp_zipcode_str)
I hope my explanations are sufficiently clear.
Thank you in advance.
Michael
Edit: This will be then used to plot that on a map, by using -geoplot- command.
Comment