Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to append all excel files from a directory (plus problem with local macros)?

    I have a folder (with subfolders) containing many excel files. Each file has two columns: Name and Code. I'm trying to figure out a way to import and combine all of these files using Stata (ultimately to create one large excel spreadsheet). I've found several related answers online, but I haven't been successful at modifying them to fit my problem. I would like to avoid changing the file names or converting them to csv files if possible (there are over 200 files).

    On another forum, it looked like I could use local macros to input the files (source: http://www.stata.com/statalist/archi...sg00242.html):

    local allfiles : dir . files *.xls

    But when I try to even do a simple local macro, I get the error, "invalid name". For example:

    local example "hello"
    list 'example'

    I know I should be looping through the files and appending them, but I'm not sure how to begin. Does anyone have ideas?

    Any help would be much appreciated. I'm a newbie to Stata and to this site, so please forgive me if I've made mistakes in the formatting of my question. Thanks so much!

    Krista


  • #2
    For local macros you need to be careful that you're using the right punctuation. Try
    Code:
    local example "hello"
    list `example'
    Notice that the first apostrophe there is the one that slants down to the right, not the straight up and down one.

    There are a variety of different ways to do the loop to import and then put all the pieces together. I'd recommend doing something like this

    Code:
    local allfiles : dir . files "*.xls"
    display `allfiles'
    
    foreach file in `allfiles' {
        import excel using "`file'", clear
        
        **to save as a stata file create a local with the .xls extension removed
        local noextension=subinstr("`file'",".xls","",.)
        
        save "`noextension'", replace
    }
    
    local statfiles : dir . files "*.dta"
    clear
    **create a blankvariable so you can save a blank dataset to start appending to
    gen blankvar=.
    save fullfile, replace
    foreach data in `statfiles' {
        use fullfile, clear
        append using `data'
        save fullfile, replace
    }
    
    drop blankvar
    
    save fullfile, replace
    
    export excel using fullfile.xls
    This is the bare bones strategy. You'll likely want to add various options to the import excel and export excel steps.

    There are other strategies. You can do this all in one loop but I like this sort of setup because you have copies of the imported data that you can look at if you run into problems with the append while debugging. Often when importing multiple spreadsheets I find that variables get imported with different types depending on what exactly is contained in the original spreadsheets. Most often I run into this when a column contains strings in most of the sheets but is empty in some of them. This typically results in a string variable for those sheets with data and a numeric variable with all missing values for those sheets that do not contain data for that column. This has to be fixed before you can append.

    Comment


    • #3
      But when I try to even do a simple local macro, I get the error, "invalid name". For example:
      local example "hello"
      list 'example'
      The problem is not with your use of macros, its with your use of list. -list- requires variable names, not other matter. So in your example, when Stata sees -list `example'- it replaces `example' by hello and attempts to execute the command: list hello. If there is no variable hello in your data set, then Stata will complain. If there is a variable hello, then you will get a listing of all its values observation-by-observation. I'm guessing that what you want Stata to do here is print out the word hello. That calls for the -display- command, not the list command.

      Code:
      display "`example'"
      will do that. Note that the quotation marks around `example' are important. When display encounters something that looks like it could be a variable name, it interprets it as such and will show the value of that variable in the first observation. So to make sure that -display- understands that you want the word hello, not a value of the variable hello, you need explicit quotation marks (even if there is no variable hello for -display- to confuse it with.) Also if your macro itself contains quotation marks, then you need to use compound double quotes `" and "'.

      Returning to your basic task of appending a bunch of files, you don't tell us what code you tried and what output you got from that. But since I think I understand the task, let me suggest you try:

      Code:
      local allfiles : dir "." files "*.xls"  // I'm pretty sure the quotation marks are necessary here
      tempfile building
      clear    // START WITH NO DATA IN MEMORY
      save `building', emptyok
      foreach f of local allfiles {
          import excel using `f', clear // You may need to apply certain other options here such
              // as -firstrow- or -cellrange()- depending on how these spreadsheets are laid out
          gen source = "`f'"
          append using `building'
          save `building', replace
      }
      At the end of this code, the contents of `building' will be in memory and contain all the appended contents of the spreadsheets. From there you can do whatever you need. By the way the line in the code -gen source = "`f'"- just adds a variable, source, that lets you know which file each observation originally came from. That way if you have questions about the data later on, you can trace it back to its origin.

      Hope this helps.

      Comment


      • #4
        Thank you!

        I'm having a problem early on in the code. It looks like Stata's not recognizing the files (noting is displayed). I tried to convert a few to .dta files manually, and when I modified the code, it did recognize the .dta files. Do you know how I could fix this problem?

        Comment


        • #5
          I think for further help you need to show us exactly the code you tried (copied and pasted from the original) and exactly what Stata did in response (also copied and pasted from the Results screen or your log file). In this case, it might also help to provide a listing of the directory you are trying to find these .xls files in, so we can see exactly what that looks like.

          Comment


          • #6
            You're going to need to provide more detail than "It looks like Stata's not recognizing the files (noting is displayed)."

            What code did you run? What did Stata display?

            Comment


            • #7
              Thank you for the tip about providing a listing of the directory. My files were saved as .xlsx files (sorry for all my silly mistakes!). Now the code works perfectly!!

              Comment


              • #8
                One other quick question: Is there a way to allow Stata to also append all excel files within sub-directories? If not, it's not a big deal.

                Comment


                • #9
                  If you only have one level of subdirectories, it's not too hard.

                  Code:
                  clear
                  tempfile building
                  save `building', emptyok
                  local directories: dir "." dirs. "*" // Macro with name of all subdirectories
                  local directories "." `directories'  // Stick the current directory in front
                  
                  foreach d of local directories {
                      local files: dir "`d'" files "*.xlsx"
                      foreach f of local files {
                          import excel using "`d'/`f'", clear // Note use of /, not \ as separator.
                          gen source = "`d'/`f'"
                          append using `building'
                          save `building', replace
                      }
                  }
                  should do the trick. Again, you may need to specify some other options in the -import excel- statement. And use /, not \ as your filename separator, even if you are running in Windows. The backslash sometimes does funny things to the character that follows it, and Stata understands / as a path separator in all OSs.

                  If you have multiple levels of nesting, it gets more complicated. You could write a program that calls itself recursively to do this--but I suspect from your request that that would be more trouble than it's worth to you. Also, I think there is a user-written program out there somewhere that will build a list of files meeting a certain description in a directory and all embedded subdirectories--but I can't remember what it's called. Maybe somebody else on the forum remembers.

                  Hope this helps.

                  Comment


                  • #10
                    Just for promotion and because it somehow fits, you may find xls2dta from SSC helpful. It basically implements strategies similar to the ones outlined here and leaves filenames in r() that can be used in loops to append the datasets. Note that it does handle subdirectories (yet - if I find the time I might implement some of the things discussed here).

                    Clyde might be thinking of filelist (SSC)?

                    Best
                    Daniel
                    Last edited by daniel klein; 13 Jun 2014, 14:57.

                    Comment


                    • #11
                      I'm a bit late to the game but here's the sample code from the filelist help file adapted to import Excel files. This will recursively build a list of all Excel datasets in the current directory (type pwd to identify it) and its subdirectories. A loop is then used to import each dataset. The final step is to loop again to append all the Stata datasets.


                      Code:
                      * make a dataset of Excel datasets and save it
                      filelist, p("*.xlsx") list
                      gen fileid = _n
                      save "myxlsx", replace
                      
                      * loop over each file and save a copy of the imported data;
                      use "myxlsx", clear
                      local obs = _N
                      forvalues i=1/`obs' {
                        use "myxlsx" in `i', clear
                        local f = dirname + "/" + filename
                        import excel using "`f'", clear
                        gen source = "`f'"
                        tempfile save`i'
                        save "`save`i''"
                      }
                      
                      * loop over the saved datasets and append them
                      clear
                      use "`save1'"
                      forvalues i=2/`obs' {
                        append using "`save`i''"
                      }

                      Comment


                      • #12
                        I realize that you specifically asked for solutions that require not to convert the Excel files to CSV. But for the record here is an easy-looking and automated solution that can be executed from the command line and hence from within Stata with the shell escape.

                        1. Get this macro from Stackoverflow that allows batch-conversion of Excel files to CSV. You can run this in the command line.

                        2. Again in the command line, merge the cvs files into a single file: copy *.csv all.csv

                        You could add another line to delete all csv files in your directory.

                        I am curious if how the various approaches mentioned here compare in terms of processing time. Or maybe it doesn't matter much with "just" 200 files?

                        Cheers, Bert

                        Comment


                        • #13
                          Originally posted by Clyde Schechter View Post
                          But when I try to even do a simple local macro, I get the error, "invalid name". For example:


                          The problem is not with your use of macros, its with your use of list. -list- requires variable names, not other matter. So in your example, when Stata sees -list `example'- it replaces `example' by hello and attempts to execute the command: list hello. If there is no variable hello in your data set, then Stata will complain. If there is a variable hello, then you will get a listing of all its values observation-by-observation. I'm guessing that what you want Stata to do here is print out the word hello. That calls for the -display- command, not the list command.

                          Code:
                          display "`example'"
                          will do that. Note that the quotation marks around `example' are important. When display encounters something that looks like it could be a variable name, it interprets it as such and will show the value of that variable in the first observation. So to make sure that -display- understands that you want the word hello, not a value of the variable hello, you need explicit quotation marks (even if there is no variable hello for -display- to confuse it with.) Also if your macro itself contains quotation marks, then you need to use compound double quotes `" and "'.

                          Returning to your basic task of appending a bunch of files, you don't tell us what code you tried and what output you got from that. But since I think I understand the task, let me suggest you try:

                          Code:
                          local allfiles : dir "." files "*.xls" // I'm pretty sure the quotation marks are necessary here
                          tempfile building
                          clear // START WITH NO DATA IN MEMORY
                          save `building', emptyok
                          foreach f of local allfiles {
                          import excel using `f', clear // You may need to apply certain other options here such
                          // as -firstrow- or -cellrange()- depending on how these spreadsheets are laid out
                          gen source = "`f'"
                          append using `building'
                          save `building', replace
                          }
                          At the end of this code, the contents of `building' will be in memory and contain all the appended contents of the spreadsheets. From there you can do whatever you need. By the way the line in the code -gen source = "`f'"- just adds a variable, source, that lets you know which file each observation originally came from. That way if you have questions about the data later on, you can trace it back to its origin.

                          Hope this helps.
                          Hi, I'm a beginner of Stata. Your code here seems very useful to solve my current problem.
                          However, I just got a blank spreadsheet when running the code below:
                          cd D:\Stata\Combine2
                          log using append.log
                          local allfiles : dir "." files "*.xlsx"
                          tempfile building
                          clear
                          save building, emptyok
                          foreach TRD of local allfiles {
                          import excel using TRD, clear
                          gen source = "Return"
                          append using building
                          save building, replace
                          }
                          browse
                          log close

                          and the log info is:
                          name: <unnamed>
                          log: D:\Stata\Combine2\append.log
                          log type: text
                          opened on: 26 Feb 2015, 17:45:28

                          . local allfiles : dir "." files "*.xlsx"

                          . tempfile building

                          . clear

                          . save building, emptyok
                          (note: dataset contains 0 observations)
                          file building.dta saved

                          . foreach TRD of local allfiles {
                          2. import excel using TRD, clear
                          3. gen source = "Return"
                          4. append using building
                          5. save building, replace
                          6. }

                          . browse

                          . log close
                          name: <unnamed>
                          log: D:\Stata\Combine2\append.log
                          log type: text
                          closed on: 26 Feb 2015, 17:45:28

                          There would be much appreciation if you could tell me what's wrong with it. Thanks!

                          Comment


                          • #14
                            At least one bug:

                            Code:
                             
                            import excel using `TRD', clear

                            Comment


                            • #15
                              Also

                              Code:
                              gen source = "Return"
                              seems not to be useful, as source will be a constant "Return" for all cases in all files. You probably wanted something dynamic here, but I have no idea what it should be.

                              Best
                              Daniel

                              Comment

                              Working...
                              X