Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Importing several excel files and appending into 1 large excel file

    Hi everyone,

    I'm new to using Stata and was wondering how you go about importing several excel files and then appending them all together into 1 large excel file.

    Thank you

  • #2
    So first you need to create the list of names of the files that you want to append together. How you do that depends on details of the filenames themselves and the directory structure in which they've contained. Since you haven't told us anything about that, I'll skip that part and just assume that you have already done that in a local macro called filenames.

    From there the basic framework is a loop:

    Code:
    clear
    tempfile building
    save `building', emptyok
    
    foreach f of local filenames {
        import excel using `"`f'"'
        gen source = `"`f'"'
        display `"Appending `f'"'
        append using `building'
        save `"`building'"', replace
    }
    
    export excel using 1_large_excel_file.xlsx, replace
    Note: If your Excel files are nicely cleaned and compatible with each other, this will be all you need. But Excel imposes very little structure on data. You may find that Stata balks along the way. For example, if a column in one of the files has string information but in another file the corresponding column is numeric, Stata will tell you of the incompatibility and will halt. If you run into those problems, then you have to add more code inside the loop to clean up each file as you go and impose order on the chaos.

    Comment


    • #3
      Clyde Schechter Thank you for your response!! Greatly appreciated.

      Apologies for the brevity of my post. Basically, the excel files include data for various years. An example file name would be "Data List - 1970.xlsx" and I have 37 years of such data (so 37 excel files). Would I have to take care of this prior to implementing the code you posted above?

      Thank you once again.

      Comment


      • #4
        I had a similar problem and I used -xls2dta- to handle it. Here is my code

        Code:
        ssc install xls2dta
        xls2dta, save("C:\Users\pxc093020\Dropbox\data\acq") : import excel "C:\Users\pxc093020\Dropbox\data\acq"
        xls2dta, save("C:\Users\pxc093020\Dropbox\data\acq1.dta") : append
        export excel using filename.xlsx, replace
        In my case, I have 19 acquisition datasets (1996-2014) where each dataset contains acquisition announcements each year. I create a folder (named -acq-) containing all excel files starting their name -acq- so that I have 19 excel files (i.e., acq1996.xls, acq1997.xls,..., acq2014.xls).

        The first line of command is for importing all excel files in the folder and save each of them in .dta in the same folder. So now I get 19 stata datasets (i.e., acq1996.dta, acq1997.dta,...,acq2014.dta).
        The second line of command is for appending all dta datasets into a single dta dataset. I name it as acq1.dta.
        The last line is for exporting the dta dataset to excel.

        Hope this helps. Good luck.
        Last edited by Pawinee Changphao; 29 Mar 2016, 22:07. Reason: I am sorry for keep editing. I was using the wrong [QUOTE] so I changed it to [CODE]

        Comment


        • #5
          Let's assume that Carter Dreymond has all 37 excel files in a single folder (and there are no other excel files contained in that folder), as suggested by Pawinee Changphao in #3. Then we can add a line to Clyde's code in #2 that will store all files ending with .xls (or .xlsx) in the local macro filenames.

          Code:
          clear
          tempfile building
          save `building', emptyok
          
          local filenames: dir "c:\file\path\sub\folder" files "*.xls*"
          
          foreach f of local filenames {
          import excel using `"`f'"'
          gen source = `"`f'"'
          display `"Appending `f'"'
          append using `building'
          save `"`building'"', replace
          }
          
          export excel using 1_large_excel_file.xlsx, replace
          Stata/MP 14.1 (64-bit x86-64)
          Revision 19 May 2016
          Win 8.1

          Comment


          • #6
            Thank you for everyone's help!!! It is truly greatly appreciated.

            ​I tried out Pawinee Changphao 's code and it worked fine until the 3rd line of code:
            xls2dta, save("C:\Users\pxc093020\Dropbox\data\acq1.dta") : append
            I end up with this message in red:

            variable AB is byte in master but str30 in using data
            You could specify append's force option to ignore this numeric/string
            mismatch. The using variable would then be treated as if it contained
            numeric missing value.

            What does it mean and how should I go about resolving this issue?

            Comment


            • #7
              You must find out yourself. In one file data for AB are held as a str30 variable, capable of holding text such as "foo abcdefghijklmnopqrstuvwxyz" and in the other data for the variable with the same name are held as byte, namely small integers.

              This could be anything from stray text that someone has written in as a note in just one cell, to a serious mismatch in what is called what in different places.

              So, the only way to be sure is to go back to the original spreadsheets.

              I've wasted many happy hours with researchers who realise that they need to use Stata for some serious analysis, so need to import data from spreadsheet files. But then they are bitten by their habits of adding text in what Stata sees as data fields or by minor or major inconsistencies in what goes in which column or their column headers.

              Comment


              • #8
                If you have identified the problem, note that xls2dta allows the execution of commands on the imported datasets, which might help solve the problem if it is systematic.

                Best
                Daniel

                Comment


                • #9
                  After looking through the spreadsheets, the AB variable seems to be a mix of numbers (bytes) in some excel files and words (str30) in other files. Would there be a way of just ignoring this issue and leaving them be (as in leaving numbers as numbers and the words as words)?

                  Comment


                  • #10
                    Originally posted by carter dreymond View Post
                    After looking through the spreadsheets, the AB variable seems to be a mix of numbers (bytes) in some excel files and words (str30) in other files. Would there be a way of just ignoring this issue and leaving them be (as in leaving numbers as numbers and the words as words)?
                    You could use -tostring- for that variable after importing the data and before you save and append the file (take care with the precision of numeric variables that you -tostring-). If the 'AB' var is already a string in some of the files, it will skip the -tostring- command for that step in the loop, and keep going.
                    You can decide at the end of the process whether to then -destring- or otherwise recode/reformat the string version of AB.

                    So, piggybacking off Carole's helpful code, you'd add something like:

                    Code:
                    clear
                    tempfile building
                    save `building', emptyok
                    
                    local filenames: dir "c:\file\path\sub\folder" files "*.xls*"
                    
                    foreach f of local filenames {
                    import excel using `"`f'"'
                    gen source = `"`f'"'
                    display `"Appending `f'"'
                      tostring AB, replace //or generate a new var
                    append using `building'
                    save `"`building'"', replace
                    }
                    
                    export excel using 1_large_excel_file.xlsx, replace
                    Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX

                    Comment


                    • #11
                      eric_a_booth Thank for your post Eric! I'm actually just using pawinee's code for now. I've imported all the excel files, saved them into .dta files, appended them all into one .dta file. The problem now is that when I try to export it back into one excel file, I get "too few variables specified". Any thoughts on how to fix this?

                      Comment


                      • #12
                        First, be really careful with that force command in append. You may find that you're losing data that you care about that way.
                        Second, we'll be way more likely to be able to help you figure out the error you're getting if you show us the code you used.
                        Please check the FAQ (link at the top of the screen) for information on how to present code and some tips on how to structure questions clearly to improve the chances that someone here will be able to help you.

                        Comment


                        • #13
                          So here's the code that I used -

                          Code:
                          Code:
                          clear
                          ssc install xls2dta
                          xls2dta, save("/Users/carterdre/Desktop/Part 1") : import excel "/Users/carterdre/Desktop/Part 1"
                          xls2dta, save("/Users/carterdre/Desktop/Part 1/data1.dta") : append, force
                          export excel using filename.xlsx, replace
                          With regards to the force command, do you have any other suggestions for leaving the strings as strings and the bytes as the way they are?

                          Many thanks!

                          Comment


                          • #14
                            Since the numeric values in AB are clearly byte (otherwise Stata would not store them as such) tostring will not result in loss of information. Three is, of course, no way a variable can be byte and str# at the same time. The resulting variable will be of the latter type, meaning that the numeric values cannot be used in computations. Anyway, xls2dta (which is probably from SSC, as should have been mentioned) is essentially a wrapper for the code suggested by Clyde and Carole. It is just taking this step by step. Here is the translation of Eric's suggestion

                            Code:
                            xls2dta, save("/Users/carterdre/Desktop/Part 1") : import excel "/Users/carterdre/Desktop/Part 1"
                            xls2dta : xeq tostring AB , replace
                            xls2dta, save("/Users/carterdre/Desktop/Part 1/data1.dta") : append
                            Concerning the error message, first note that the code above does not alter the data in memory and does not load the combined file. If you want to load it , specify the clear option in the third line instead of saving(). Alternatively, you just use data1.dta. This is the second thing to note. You try to export filename.xlsx, but this name is never mentioned in your code before. My guess is, it does either not exist or is empty. With the exact code above, try

                            Code:
                            use "/Users/carterdre/Desktop/Part 1/data.dta"
                            export excel data1.xlsx
                            Best
                            Daniel

                            Comment


                            • #15
                              daniel klein Thank you very much Daniel!! It worked for most of the tasks.

                              I tried the code again without the following line and it seemed to work.
                              Code:
                               xls2dta : xeq tostring AF , replace
                              However, once I got all the "sub .dta" files and tried to append them together I get the message:

                              variable AF is byte in master but str45 in using data
                              You could specify append's force option to ignore this numeric/string
                              mismatch. The using variable would then be treated as if it contained
                              numeric missing value.

                              This is where I used the line of code above. However, this message popped up:

                              variable AF not found
                              an error occured while processing /Users/carterdre/Desktop/p/data2.dta

                              Not really sure why that's the case since I thought that line of code was what was needed.
                              Last edited by carter dreymond; 31 Mar 2016, 17:07.

                              Comment

                              Working...
                              X