Dear Statalisters,
I hope I can explain this clearly, because I believe my problem comes from a small coding mistake that I can't find. I have a list of countries, each of which has two to five related datasets (the number varies). I would like to:
1) merge all the datasets related to a country
2) append all the merged country-specific datasets
I am stuck at the first step. Because I have many countries, I wrote a loop that merges each country's datasets together. The merge works for most countries. However, for three of them (Chad, Italy and North Macedonia), the unique identifier I use to merge the country-specific files ends up doubled because of a faulty merge.
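For reference, here is a minimal sketch of the two steps (merge within a country, then append across countries) on toy data. The country names, the variable x_base and all values are hypothetical. Only idstd, countryx, round_r1 and merge_round1 echo names from the post.

```stata
* Sketch of the merge-then-append workflow on toy data
* (CountryA/CountryB, x_base and all values are hypothetical)
clear
tempfile appended
local first 1
foreach country in CountryA CountryB {
    * Hypothetical round-1 file for this country
    clear
    set obs 2
    generate long idstd = _n
    generate byte round_r1 = 1
    tempfile round1
    save `round1'

    * Hypothetical master (baseline) file with the same identifiers
    clear
    set obs 2
    generate long idstd = _n
    generate byte x_base = 10 * _n
    generate str12 countryx = "`country'"

    * Step 1: merge the country-specific files on idstd
    merge 1:1 idstd using `round1', generate(merge_round1)

    * Step 2: add this country's merged data to the running appended file
    if `first' {
        save `appended'
        local first 0
    }
    else {
        append using `appended'
        save `appended', replace
    }
}

use `appended', clear
* Stops with an error if a firm appears more than once within a country
isid countryx idstd
```

The running appended file is seeded with the first country's merged data. As a result, the final dataset contains exactly one merged block per country and nothing else that was left in memory.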
I can answer questions if the code is unclear. In short, at the end of the appending process, Chad, Italy and North Macedonia have their idstd values doubled, which causes problems in the rest of my do-file. Have a look at my code and data below.
In the data extract, each doubled idstd appears once as matched (merge_round1 == 3) and once as using only (merge_round1 == 2). The using-only copy has missing values for the other merged variables (i1, k3a). I could of course run a duplicates drop command, but that seems like an unsatisfying solution. I would like to understand why my code didn't work in the first place, and why it failed only for these three countries (Italy, North Macedonia and Chad).
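Before resorting to duplicates drop, the doubled identifiers can be located and broken down by merge result. The following is a minimal diagnostic sketch that uses the first five rows of the posted dataex extract as toy data. The flag variable dup is a hypothetical name.

```stata
* Toy data: first five rows of the posted dataex extract
clear
input str28 countryx long idstd byte(merge_round1 i1 k3a)
"Chad" 624601 3 2 -9
"Chad" 624601 2 . .
"Chad" 624602 2 . .
"Chad" 624602 3 2 90
"Chad" 624603 3 2 100
end

* Summarize how often each country/idstd pair occurs
duplicates report countryx idstd

* dup = number of surplus copies of each country/idstd pair
duplicates tag countryx idstd, generate(dup)

* Which countries and merge results do the doubled rows come from?
tab countryx merge_round1 if dup > 0
```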
Code:
global countries1 Albania Armenia Azerbaijan Belarus Bosnia_and_Herzegovina Bulgaria Chad Croatia Cyprus Czech_Republic El_Salvador Estonia Georgia Greece Guatemala Guinea Honduras Hungary Italy Jordan Kazakhstan Latvia Lebanon Lithuania Malta Moldova Mongolia Montenegro Morocco Mozambique Nicaragua Niger North_Macedonia Poland Portugal Romania Russian_Federation Serbia Slovak_Republic Slovenia Somalia South_Africa Togo Zambia Zimbabwe
global countries2 Armenia Bulgaria Croatia Cyprus Czech_Republic El_Salvador Estonia Georgia Greece Guatemala Honduras Hungary Italy Jordan Latvia Lebanon Lithuania Malta Moldova Mongolia Morocco Mozambique Nicaragua North_Macedonia Poland Portugal Romania Russian_Federation Slovak_Republic Slovenia Somalia Zambia Zimbabwe
global countries3 Armenia Bulgaria Croatia Cyprus Czech_Republic Estonia Georgia Greece Hungary Italy Jordan Latvia Lebanon Lithuania Malta Moldova Morocco North_Macedonia Poland Portugal Romania Slovak_Republic Slovenia Somalia Zambia
global countries4 Jordan

forvalues x = 1/4 {
    foreach country of global countries`x' {
        local round_file: dir "$inp_bm_cf/Round_`x'/`country'" files "*.dta"
        foreach rf of local round_file {
            tempfile `country'_R`x'_ed /* Ex: Albania_R1_ed */
            use "$inp_bm_cf/Round_`x'/`country'/`rf'", clear
            gen round_r`x' = `x'
            * Save data in a tempfile
            save ``country'_R`x'_ed', replace
        }
    }
}

foreach country of global countries1 {
    local master_file: dir "$inp_bm_bs/`country'/" files "*.dta"
    display `master_file'
    foreach mf of local master_file {
        use "$inp_bm_bs/`country'/`mf'", clear
    }
    forvalues x = 1/4 {
        capture noisily: merge 1:1 idstd using ``country'_R`x'_ed', generate(merge_round`x')
        tempfile `country'_AR /* Ex: Albania_AR (AR = Allrounds) */
        save ``country'_AR'
    }
}

foreach country of global countries1 {
    append using ``country'_AR', force
}
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str28 countryx long idstd byte(merge_round1 i1 k3a)
"Chad" 624601 3 2  -9
"Chad" 624601 2 .   .
"Chad" 624602 2 .   .
"Chad" 624602 3 2  90
"Chad" 624603 3 2 100
"Chad" 624603 2 .   .
"Chad" 624604 2 .   .
"Chad" 624604 3 2 100
"Chad" 624605 2 .   .
"Chad" 624605 3 2 100
"Chad" 624606 3 2 100
"Chad" 624606 2 .   .
"Chad" 624607 3 1 100
"Chad" 624607 2 .   .
"Chad" 624608 3 1 100
"Chad" 624608 2 .   .
"Chad" 624609 2 .   .
"Chad" 624609 3 1  50
"Chad" 624610 2 .   .
"Chad" 624610 3 1  30
"Chad" 624611 2 .   .
"Chad" 624611 3 1  60
"Chad" 624612 2 .   .
"Chad" 624612 3 2  50
"Chad" 624613 3 2  20
end
label values idstd IDSTD
label values merge_round1 _merge
label def _merge 2 "Using only (2)", modify
label def _merge 3 "Matched (3)", modify
label values i1 I1
label def I1 1 "Yes", modify
label def I1 2 "No", modify
label values k3a K3A
label def K3A -9 "Don't know (spontaneous)", modify
Thank you for your help!
EDIT: Before anyone asks, I used isid to check that idstd uniquely identifies the observations in both Chad files. So I believe merge 1:1 is appropriate.
EDIT2: It is definitely a problem with the code and not with the data. When I ran the merge for Chad alone in a separate do-file, it worked.
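The uniqueness check and the merge can also be combined, so that an unexpected non-match stops the merge instead of adding using-only rows. The following is a minimal sketch on toy stand-ins for the two Chad files. The tempfile name chad_r1 and all values are hypothetical. The assert() option of merge is what enforces the match.

```stata
* Toy stand-in for the Chad round-1 file (identifiers and values are hypothetical)
clear
set obs 3
generate long idstd = 624600 + _n
generate byte round_r1 = 1
isid idstd
tempfile chad_r1
save `chad_r1'

* Toy stand-in for the Chad baseline (master) file
clear
set obs 3
generate long idstd = 624600 + _n
generate byte i1 = 2
isid idstd

* assert(match) makes merge exit with an error if any observation
* is master-only or using-only, instead of silently keeping it
merge 1:1 idstd using `chad_r1', generate(merge_round1) assert(match)
```

If such a merge is wrapped in capture noisily, as in the posted loop, the error is displayed but the do-file keeps running. The return code _rc would then need to be checked explicitly.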

Comment