Hey!
We're a group of students who are new to Stata, so our skills are fairly limited. Our problem is with merging a large number of files, both horizontally and vertically. First, here is an example of the data we have and how it is structured:
Session 1
001_uniquefilename1.dta
001_uniquefilename2.dta
001_uniquefilename3.dta
etc...
Session 2
002_uniquefilename1.dta
002_uniquefilename2.dta
002_uniquefilename3.dta
etc...
etc...
There are ~300 sessions in total. We aim to merge the data files within each session horizontally, and then append the sessions vertically. The unique ID variable is named differently in some of the files within a session; if this is not something Stata can accommodate, we will rename the ID variable so that it is the same in every file.
Furthermore, some files have multiple entries per unique ID. Let's say the unique ID is year, e.g.:
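To make the overall goal concrete, this is roughly the shape of loop we imagine, written as a sketch rather than something we know to be right. It assumes all the .dta files sit in the current working directory, that every file name starts with a three-digit session number, and that the key variable is called id in every file (a hypothetical name). The merge 1:1 line is only a placeholder, since the right merge type differs between our files.

```stata
* Sketch only: id and session are hypothetical variable names, and
* merge 1:1 is a placeholder for whatever merge type each file needs.
clear
tempfile stacked
save `stacked', emptyok              // empty dataset to append sessions onto

forvalues s = 1/300 {
    local prefix : display %03.0f `s'                 // 1 -> 001
    local files : dir "." files "`prefix'_*.dta"      // all files of this session
    local first = 1
    foreach f of local files {
        if `first' {
            use "`f'", clear                          // first file starts the session
            local first = 0
        }
        else {
            merge 1:1 id using "`f'", nogenerate      // add columns (horizontal)
        }
    }
    if !`first' {
        generate session = `s'
        append using `stacked'                        // add rows (vertical)
        save `stacked', replace
    }
}
use `stacked', clear
```

If this is on the right track, the file names would not need to follow a data001, data002 pattern, because the list of files is built from whatever matches the session prefix.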
001_uniquefilename1.dta:
1 entry per year, 8 dimensions
001_uniquefilename2.dta:
6 entries per year, 6 dimensions
001_uniquefilename3.dta:
24 entries per year, 8 dimensions
001_uniquefilename4.dta:
24 entries per year, 8 dimensions (same IDs as uniquefilename3 for the 24 entries)
We are trying to merge these so that rows are duplicated where needed: each entry in uniquefilename1 is duplicated 6 times to match the 6 entries in uniquefilename2, and each of the 6 entries in the file resulting from merging 1 and 2 is duplicated 24 times for uniquefilename3. On the third merge, with uniquefilename4, the rows should not be duplicated another 24 times, because the IDs for the 24 entries match the IDs in uniquefilename3.
As we have very limited experience with Stata, our googling has not been very productive, and we are getting the impression that files to be merged must be named identically apart from a sequential number, such as data001, data002, data003, etc. We were wondering if anyone more experienced would be able to point us in the right direction.
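To show the result we are after, here is a small sketch on made-up toy data. It has 2 years instead of our real data, and the variable names sub6, sub24, a, b, c and d are hypothetical. From reading the help files, our guess is that merge 1:m, joinby and merge m:1 correspond to the three steps, but we are not sure this is the recommended approach.

```stata
* Toy data only: year is the ID from our example; sub6, sub24, a, b, c, d
* are hypothetical names standing in for our real variables.
clear
tempfile f1 f2 f3 f4

* uniquefilename1: 1 entry per year
set obs 2
generate year = 2000 + _n
generate a = 10 * _n
save `f1'

* uniquefilename2: 6 entries per year
keep year
expand 6
bysort year: generate sub6 = _n
generate b = 100 * year + sub6
save `f2'

* uniquefilename3: 24 entries per year
use `f1', clear
keep year
expand 24
bysort year: generate sub24 = _n
generate c = sub24 / 24
save `f3'

* uniquefilename4: the same 24 IDs per year as uniquefilename3
keep year sub24
generate d = -sub24
save `f4'

* Step 1: each year row of file 1 is repeated for the 6 rows of file 2
use `f1', clear
merge 1:m year using `f2', nogenerate

* Step 2: all pairwise combinations within year (6 x 24 rows per year)
joinby year using `f3'

* Step 3: file 4 matches on year and sub24, so no further duplication
merge m:1 year sub24 using `f4', nogenerate

count    // 2 years x 6 x 24 = 288 rows expected
```

In this toy version the final file has 144 rows per year, which is the duplication pattern described above.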