  • How to merge 10 datasets and loop through them?

    Hi,
    I have ten datasets in a folder named "Bond Data" and I am trying to run through the same code for each of them (the code I have written creates new variables and drops many observations as the datasets are very large) and then merge them all. I know I have to do some kind of foreach loop, but am stuck beyond that.
    Thanks!

  • #2
    The general outline of the approach is:
    Code:
    cd "Bond Data"
    local filenames: dir "." files "*.dta"
    
    local first_file: word 1 of `filenames'
    local filenames: list filenames - first_file
    
    use "`first_file'", clear
    // CODE TO TRANSFORM AND CLEAN THE FIRST FILE GOES HERE
    tempfile building
    save `building'
    
    local keyvars // SPECIFY A SET OF VARIABLES THAT UNIQUELY IDENTIFIES OBSERVATIONS
    
    foreach f of local filenames {
        use "`f'", clear
    // CODE FOR TRANSFORMING AND CLEANING FILES GOES HERE
        merge 1:1 `keyvars' using `building', nogenerate
        save `"`building'"', replace
    }
    
    // THE MERGED FILE IS NOW IN MEMORY; READY TO BE SAVED OR FURTHER MANAGED
    Notes:

    1. The first file has to be handled separately before you can loop over the others to -merge- them in.
    2. It is critical that each file contains the same set of variables that uniquely identifies observations in the data set. These are the merge key variables, and you must specify them in the -local keyvars- command above (where I left them out because I have no idea what they might be in your data). Typically they are ID variables, or a combination of ID variables and dates, or something like that. But whatever they are, they must jointly determine unique observations in all the data sets so that you can sensibly pair each observation in the previous data sets with the correct corresponding one in the new data set.
    3. You need to decide what you want to do with observations that go unmatched. Keep them? Discard them all? Discard some--if so, which ones? The options available in -merge- (see -help merge-) will facilitate implementing those decisions, but you need to be clear about how you want this handled.
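
    As a sketch of checking note 2 before you attempt the merge, assuming hypothetical key variables cusip and trade_date (substitute whatever identifies observations in your data), you could run this in each of the ten files:
    Code:
    // Hypothetical key variables -- replace with the ones in your data
    local keyvars cusip trade_date

    // -isid- exits with an error if these variables do not jointly
    // identify unique observations, so you find out before -merge- fails
    isid `keyvars'

    // Alternatively, count and inspect the offending observations
    duplicates report `keyvars'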

    Finally, I wouldn't do this in the first place. Unless you are very lucky, there are likely to be some incompatibilities among the data sets you are trying to merge. The same variable (other than one of the key variables) might occur in more than one, but with inconsistent values. Which of the inconsistent values is the right one to keep? Or should you create a new variable so you can keep both versions? One of the key variables might be a string in one data set but a value-labeled integer in another: -merge- will choke on that.
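
    To illustrate the string-vs-labeled-integer problem, here is one way to harmonize a hypothetical key variable bond_id in the data set where it is a value-labeled integer, so that its storage type matches the files where it is a string. (This assumes every value of bond_id actually has a label; -decode- produces missing values for unlabeled codes.)
    Code:
    // -decode- creates a string variable from the value labels
    decode bond_id, generate(bond_id_str)
    drop bond_id
    rename bond_id_str bond_id

    // bond_id is now a string in this file too, and -merge- will accept it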

    What I would do is write a separate do-file for each of the ten data sets, cleaning the data, pruning out unwanted observations, doing such transformations as are needed to assure that the key variables always have consistent data types across the 10 data sets, etc. While working on that, I would be alert to any variables other than the merge key variables that appear in more than one of the data sets, and investigate whether there are any inconsistencies between them. Only after I had perfected the 10 data sets would I then merge them together.
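
    That workflow might look something like the following sketch, where the file names are hypothetical and each cleaning do-file is assumed to save a perfected data set clean_bond_1.dta through clean_bond_10.dta:
    Code:
    // master.do -- run the ten separate cleaning do-files, then merge
    cd "Bond Data"

    forvalues i = 1/10 {
        do clean_bond_`i'.do
    }

    local keyvars // the verified merge key variables go here

    use clean_bond_1, clear
    forvalues i = 2/10 {
        merge 1:1 `keyvars' using clean_bond_`i', nogenerate
    }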

    Yes, in principle you can do it all as shown above. But the code for transforming and cleaning the files can get completely unwieldy as you have to write all sorts of code whose execution is conditional on particular files being in process, or conditional on some particular configuration of the variables and observations in each file. That code, in addition to being difficult to write and debug, may prove difficult to even understand when you come back to it a few days from now. So my advice is to perfect each data set separately, and then run a simple loop to merge them (which looks like the above without any code for transforming and cleaning the files as you go).



    • #3
      Thanks so much!
