Dear all,
I know that there are some threads about data-merging issues, and I have read several of them. However, my datasets seem quite different from the ones discussed there, and when I tried to apply what I learned from those threads to my data, it did not work. I have therefore started a new thread; my data and code are described below.
I have three sub-datasets from a household survey: file1 contains demographic information, file2 contains educational information, and file3 contains healthcare information. The three files share six variables: province, district, commune, enumeration area, household identification, and household member ID. My strategy is to merge the three sub-datasets on these six variables, and here is what I did:
First, I sorted the data.
Then I merged the data, with file1 as the master dataset and the other two as using datasets, with an m:1 merge.
The code worked for merging file1 and file2 but failed when I tried to merge the last file, with the error "variables pro dist comm enum_area hh_ind hh_mem_id do not uniquely identify observations in the using data". I do not know why this happens. Does anyone have ideas or suggestions for solving the problem?
Code:
use "C:\Users\file2.dta", clear
count
sort pro dist comm enum_area hh_ind hh_mem_id
save "C:\Users\file2_sorted.dta", replace

use "C:\Users\file3.dta", clear
count
sort pro dist comm enum_area hh_ind hh_mem_id
save "C:\Users\file3_sorted.dta", replace
Code:
use "C:\Users\file1.dta", clear
count
sort pro dist comm enum_area hh_ind hh_mem_id

merge m:1 pro dist comm enum_area hh_ind hh_mem_id using "C:\Users\file2_sorted.dta"
tab _merge
keep if _merge==3
drop _merge

merge m:1 pro dist comm enum_area hh_ind hh_mem_id using "C:\Users\file3_sorted.dta"
tab _merge
keep if _merge==3
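As a possible first check on the error message (only a sketch that has not been run on these files; the variable name dup is a placeholder), the six key variables in file3 could be inspected for repeated combinations before merging:

Code:
use "C:\Users\file3.dta", clear
* report how many observations share the same combination of the six key variables
duplicates report pro dist comm enum_area hh_ind hh_mem_id
* store, for each observation, the number of other observations with the same key
duplicates tag pro dist comm enum_area hh_ind hh_mem_id, generate(dup)
* list the observations whose key is not unique
list pro dist comm enum_area hh_ind hh_mem_id if dup > 0

If this lists any observations, the six variables do not uniquely identify observations in file3, which is the condition that m:1 requires of the using data.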
I do not know how to create a data example with dataex, so I have attached my datasets to this post.
I hope to hear from you.
Thank you.