Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Converting several separate excel files to several separate dta files?

    Is there a way to do this all at one? I have 70 excel files looking to convert to dta files so I can just easily append it to a master dta file. Stata novice here.

  • #2
    Daniel Klein's -xls2dta- program, available from SSC, is probably the simplest way to do this.

    Note that even if your data come from a well-respect and trusted source, it is likely that the 70 data sets will not all be entirely compatible for appending. There are likely to be some situations where the names for the same thing are different, or one file has string entries where the other has numeric. -append- will choke on those differences, or produce unusable results if it does not. Do not just assume that you can just smoothly append all these data sets to each other. It is more likely that some cleaning, perhaps extensive cleaning, will be required first, unless you are very, very lucky.

    Comment


    • #3
      Hi Clyde thank you for your response. I tried to utilize the program and have stumbled upon the following error:

      *This was the code I used. Note that the file is just the folder with the 70 files
      xls2dta : import excel /Users/dinardorodriguez/Desktop/SNAPP Raw Data

      *This is what I was outputted
      invalid Raw
      r(198);

      Comment


      • #4
        You need to put quotes around that pathname because it contains blank spaces.
        Code:
        xls2dta : import excel "/Users/dinardorodriguez/Desktop/SNAPP Raw Data"

        Comment


        • #5
          Okay thank you so much! Now the next step would be to append all of the dta files onto one dta dile. I checked and confiirm that all of the files are cleaned but was wondering if there was a way to have it do it for all the dta files since the alternative is to use the "append using ..." code but that is a bit time consuming

          Comment


          • #6
            If the files are compatible and do not need further cleaning, you can follow your initial xls2dta with

            Code:
            xls2dta , clear : append
            to append the previously converted datasets. This code will clear whatever data was loaded before the call to xls2dta.

            The full code is then

            Code:
            xls2dta : import excel "/Users/dinardorodriguez/Desktop/SNAPP Raw Data"
            xls2dta , clear : append
            If you do not want to convert the single Excel files to Stata datasets but only want one combined file, you could also

            Code:
            xls2dta , clear : append using "/Users/dinardorodriguez/Desktop/SNAPP Raw Data"
            Best
            Daniel

            Comment


            • #7
              Thank you Daniel! Is there a way to incorporate codes with xls2dta to add a variable that takes the name of the original file name the observation came from? I would want iti to serve as reference in case it is needed

              Comment


              • #8
                Originally posted by Rafael Rodriguez View Post
                Thank you Daniel! Is there a way to incorporate codes with xls2dta to add a variable that takes the name of the original file name the observation came from? I would want iti to serve as reference in case it is needed
                Sorry I just reread this and found it to be confusing so I wanted to clarify. What I am tryinig to do is append (Daniel's code worked) and include a variable for the master file with all the observation from all the sheets that names the file name in which the observation came from.

                Comment


                • #9
                  xls2dta has a generate() option for that; look it up in the help file. You can specify up to two names and the respective variables will hold the filename and sheetname each observation came from.

                  Best
                  Daniel

                  Comment


                  • #10
                    Originally posted by daniel klein View Post
                    xls2dta has a generate() option for that; look it up in the help file. You can specify up to two names and the respective variables will hold the filename and sheetname each observation came from.

                    Best
                    Daniel
                    Okay so I tried referencing it but I am getting an error.

                    **Code being used
                    xls2dta [, generate(newvar1 [newvar2])] : import excel "/Users/dinardorodriguez/Desktop/OLD DATA"

                    **Error message when outputted
                    weights not allowed
                    r(101);

                    Comment


                    • #11
                      Originally posted by Rafael Rodriguez View Post

                      Okay so I tried referencing it but I am getting an error.

                      **Code being used
                      xls2dta [, generate(newvar1 [newvar2])] : import excel "/Users/dinardorodriguez/Desktop/OLD DATA"

                      **Error message when outputted
                      weights not allowed
                      r(101);
                      **I also tried the following code:

                      xls2dta, [clear, generate (newvar1 [newvar2])] : append using "/Users/dinardorodriguez/Desktop/OLD DATA"

                      **and got the following error:

                      invalid 'generate'

                      Comment


                      • #12
                        I recommend you read the short entry on

                        Code:
                        help language
                        You are not supposed to type the (square) brackets. In Stata's help files, brackets are used to denote optional arguments. The brackets are usually not to be typed; the exceptions being weights and explicit subscripts. Thus, the syntax diagram

                        Code:
                        xls2dta [ , generate(newvar1) ]
                        indicate that you must type at least

                        Code:
                        xls2dta
                        and you may type

                        Code:
                        xls2dta , generate(newvar1)
                        where you may/should replace elements in italics with contents that is specific to your situation. Here, you may want to type

                        Code:
                        xls2dta , generate(filename)
                        Best
                        Daniel

                        Comment


                        • #13
                          Code:
                          cd "your directory"
                          
                          local excels: dir . files "*.xlsx"
                          foreach excel in `excels' {
                              import excel using `excel',clear
                              save `excel'.dta,replace
                          }
                          Something to note:
                          1. first, put your directory path after "cd" command
                          2. check your Excel extension, sometimes it is .xls or others, depending on Excel version
                          3. if the first line of your Excel file is the variable name you want to keep, add "firstrow" to the "import" command
                          4. the final dta file will probably have the name like "xxx.xlsx.dta" with the Excel extension in there, there are some ways that you can move the Excel extension from the dta file name, but I will just stop here.

                          Just want to add that the above user-written command can do what you want as well.
                          Last edited by Neo Zhou; 04 Dec 2019, 18:55.

                          Comment

                          Working...
                          X