Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Appending several files with xlsx format and different names located in one folder

    I am trying to append several excel files (about 80 files) with different names located in the same folder using Stata 17. I want to save all of them in one dataset with dta format. I have looked at previous posts about appending files and tried to test them for my issue. However, I can’t reach the desired result. Could you please help debug the issue?
    The following is my code:

    // step 1:
    cd "C:\Users\pc1\Desktop\trd"
    local files : dir "C:\Users\pc1\Desktop\trd" files "*.xlsx"
    foreach file in `files' {
    clear
    import excel `file'
    ***newfolder is the folder where I want the dta files save here.
    save newfolder\\`file'.dta, replace
    }

    // step 2:
    local files : dir "C:\Users\pc1\Desktop\trd" files "*.dta"
    foreach file in `files' {
    append using newfolder\\`file'
    }

  • #2
    At the end of Step 1, you have the last .dta file you worked with is in memory. If in Step 2 you then append all the files in the new folder, that dataset will be included again.

    I don't know what other issue or issues you're dealing with, as you show code, but do not mention any specific problems.

    Comment


    • #3
      The backslashes are going to be a problem, too; see Cox (2008) for an explanation.

      For the generic task at hand, also see xls2dta from SSC.


      Cox, N. J. 2008. Stata tip 65: Beware the Backstabbing Backslash. The Stata Journal, 8(3), 446--447.

      Comment


      • #4
        Doubling up the backslash is intended to deal with the usual problem of following a backslash with a local macro reference.

        Comment


        • #5
          Nick is obviously right. I did not pay attention to the double backslash, i.e., escaped backslashes.
          Last edited by daniel klein; 24 Jun 2023, 02:52.

          Comment


          • #6
            Originally posted by Nick Cox View Post
            At the end of Step 1, you have the last .dta file you worked with is in memory. If in Step 2 you then append all the files in the new folder, that dataset will be included again.

            I don't know what other issue or issues you're dealing with, as you show code, but do not mention any specific problems.
            No problem with double backslash.
            step1yields this error:
            using required
            r(100);
            Last edited by Aysan Namdar; 24 Jun 2023, 03:27.

            Comment


            • #7
              That's a surprise because help import excel tells me that using is at choice. That was true in Stata 17 as well as now in 18.

              Here's a guess. There are spaces in your filenames. So, import excel sees what it thinks is an attempt to specify a name list specifying which columns are to be imported, and for that using is required.

              You can check this by looking at the list of filenames. You can try to fix this if it's true with

              Code:
              import excel "`file'"
              and similarly later.

              Comment


              • #8
                Originally posted by daniel klein View Post
                The backslashes are going to be a problem, too; see Cox (2008) for an explanation.

                For the generic task at hand, also see xls2dta from SSC.


                Cox, N. J. 2008. Stata tip 65: Beware the Backstabbing Backslash. The Stata Journal, 8(3), 446--447.
                Thank you Daniel. I had used xls2dta before, but now for some reasons I have to append files using loops.

                Comment


                • #9
                  Originally posted by Nick Cox View Post
                  That's a surprise because help import excel tells me that using is at choice. That was true in Stata 17 as well as now in 18.

                  Here's a guess. There are spaces in your filenames. So, import excel sees what it thinks is an attempt to specify a name list specifying which columns are to be imported, and for that using is required.

                  You can check this by looking at the list of filenames. You can try to fix this if it's true with

                  Code:
                  import excel "`file'"
                  and similarly later.
                  Yes, there're spaces in my filenames. I tried "`file'" instead of `file', but still I receive error:
                  file filename.xlsx not found
                  r(601);

                  my 1st step code:
                  local files : dir "C:\Users\pc1\Desktop\trd" files "*.xlsx"
                  foreach file in `files' {
                  clear
                  import excel "`file'"
                  append using "`file'"
                  save `file'.dta, replace
                  }

                  Comment


                  • #10
                    Assuming that your working directory is not C:\Users\pc1\Desktop\trd, you need to specify that path in the import excel command:

                    Code:
                    import excel "C:\Users\pc1\Desktop\trd\\`file'"

                    Comment


                    • #11
                      #1 included the line

                      Code:
                      cd "C:\Users\pc1\Desktop\trd"
                      implying that Aysan is working in the right directory or folder.

                      But the appearance of the literal string filename in the error message is still puzzling and makes me wonder whether the syntax has been changed. That is, it perhaps is no surprise that there is no filename.xlsx; but it is a surprise that the code instructs Stata to look for it.

                      Comment


                      • #12
                        Many thanks Nick. Yeah, it's puzzling and my mind's been for some days in a destructive loop thinking over the problem to fix !..
                        But, I found another way to get rid of the troublesome filenames, I renamed them using the codes provided in the forum, say numbers, 1 to n, i.e. the maximum number of the files in the folder.
                        I pay particular attention to how local macros work, so I ran the whole chunk (in step 1) as one, (Many thanks for Clyde Schechter and Broline Saginin).

                        Step 1:
                        local files : dir . files "*.xlsx"
                        di `files'
                        local files: list sort files
                        local i = 1
                        foreach file of local files {
                        local newname "`i'.xlsx"
                        shell rename "`file'" "`newname'"
                        di "rename `file' to `newname'"
                        local i = `i' + 1
                        }

                        Step 2:
                        forvalues i=1/100 {
                        clear
                        import excel `i'.xlsx
                        save trd_daily\\`i'.dta, replace
                        }

                        Step 3:
                        forvalues i = 1/100 {
                        append using trd_daily\\`i'.dta
                        * my files have blank rows
                        dropmiss, obs force
                        }

                        Fortunately, I got the desired result but looking for the solution for the above error.
                        Regards,
                        Aysan
                        Last edited by Aysan Namdar; 25 Jun 2023, 22:33.

                        Comment


                        • #13
                          Originally posted by Nick Cox View Post
                          #1 included the line

                          Code:
                          cd "C:\Users\pc1\Desktop\trd"
                          implying that Aysan is working in the right directory or folder.

                          But the appearance of the literal string filename in the error message is still puzzling and makes me wonder whether the syntax has been changed. That is, it perhaps is no surprise that there is no filename.xlsx; but it is a surprise that the code instructs Stata to look for it.
                          I would point out that the filenames are composed of spaces, dashes, parentheses, numbers, and Persian and English words. In a word, whatever you want !

                          Comment


                          • #14
                            dropmiss is from the Stata Journal but superseded by missings from the same place. (FAQ Advice #12: please spell out where community-contributed commands you refer to come from.)

                            Of more concern: what is the "above error"?

                            Comment


                            • #15
                              Many thanks Nick. Yeah, it's puzzling and my mind's been for some days in a destructive loop thinking over the problem to fix !..
                              But, I found another way to get rid of the troublesome filenames, I renamed them using the codes provided in the forum, say numbers, 1 to n, i.e. the maximum number of the files in the folder.
                              I pay particular attention to how local macros work, so I ran the whole chunk (in step 1) as one, (Many thanks for Clyde Schechter and Broline Saginin).

                              Step 1:
                              local files : dir . files "*.xlsx"
                              di `files'
                              local files: list sort files
                              local i = 1
                              foreach file of local files {
                              local newname "`i'.xlsx"
                              shell rename "`file'" "`newname'"
                              di "rename `file' to `newname'"
                              local i = `i' + 1
                              }

                              Step 2:
                              forvalues i=1/100 {
                              clear
                              import excel `i'.xlsx
                              save trd_daily\\`i'.dta, replace
                              }

                              Step 3:
                              forvalues i = 1/100 {
                              append using trd_daily\\`i'.dta
                              * my files have blank rows
                              dropmiss, obs force
                              }

                              Fortunately, I got the desired result but looking for the solution for the above error.
                              Regards,
                              Aysan
                              Last edited by Aysan Namdar; 25 Jun 2023, 23:52.

                              Comment

                              Working...
                              X