Hello everyone,
I have just started using Stata and have a problem merging three datasets (with Stata 12).
I used the merge m:1 command and got the error message r(459) "variable does not uniquely identify observations in the using data".
merge m:1 company_ID using "Set 2.dta"
merge m:1 company_ID using "Set 3.dta"
The Datasets look like this:
Set 1 (master):
company_ID job_request_date graduate_id
123 12.11.2014 57878
123 12.10.2014 78878
123 16.11.2014 99899
121 14.11.2014 55744
345 12.10.2014 55879
876 12.09.2014 55879
876 19.09.2014 14787
1000 19.09.2014 14787 (--> not available in Set 2)
. (missing) . (missing) 68994 (--> no job offer --> no company_ID)
... .....
Set 1 contains multiple observations per company.
Set 2:
company_ID number_employees
123 100
121 50
345 600
876 800
... .....
Set 2 contains one observation for each company.
Set 3:
company_ID export_number
123 1
121 1
345 5
876 6
1000 1
... .....
Set 3 contains one observation for each company. Not every company_ID in Set 3 is included in Set 2.
I want to add the information from Set 2 and Set 3 to each observation in Set 1:
Set merged:
company_ID job_request_date graduate_id number_employees export_number
123 12.11.2014 57878 100 1
123 12.10.2014 78878 100 1
123 16.11.2014 99899 100 1
121 14.11.2014 55744 50 1
345 12.10.2014 55879 600 5
876 12.09.2014 55879 800 6
876 19.09.2014 14787 800 6
1000 19.09.2014 14787 . (missing) 1
... .....
It is possible for a company_ID and a graduate_id to have the same value, but since I specify company_ID as the key variable, that should not be a problem, should it?
I think the problem might be that the different sets contain company_IDs that are not present in all of the other datasets. So far I only know how to merge data with Excel's VLOOKUP: add the information whenever a matching pair is found, e.g. company_ID = 123 in both files. Does the merge command work the same way?
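To make the question concrete, here is a minimal sketch of what I understand the check and the lookup-style merge would look like. It assumes the files are saved as "Set 1.dta", "Set 2.dta" and "Set 3.dta" (placeholder names from my example) and that company_ID has the same storage type in all three. The keep(master match) and nogenerate options are my guess at how to reproduce VLOOKUP behaviour, not something I have verified on my data. One possible cause of r(459) that this would reveal is repeated or multiple missing company_ID values in a using file.

```stata
* Sketch only: file and variable names follow the examples in this post.
* Step 1: check whether company_ID is unique in each using file.
use "Set 2.dta", clear
duplicates report company_ID        // shows how many IDs occur more than once
count if missing(company_ID)        // more than one missing ID also breaks uniqueness

use "Set 3.dta", clear
duplicates report company_ID
count if missing(company_ID)

* Step 2: once company_ID is unique in both using files, merge onto Set 1.
use "Set 1.dta", clear
* keep(master match) keeps every row of Set 1, matched or not (like VLOOKUP);
* nogenerate suppresses _merge so the second merge does not collide with it.
merge m:1 company_ID using "Set 2.dta", keep(master match) nogenerate
merge m:1 company_ID using "Set 3.dta", keep(master match) nogenerate
```

With this, company 1000 would keep its rows and get a missing number_employees, which is the result shown in the merged table above.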
I hope you can help me with this problem.
Thank you!