Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • loops to import excel file and merge

    Hi everyone,

    I have a folder that contains 26 individual .xlsx files, each named with the name of a country. In a separate folder I have 26 .dta files named with corresponding country names. I would like to use a loop to import the Excel files and merge in several variables from the corresponding dta file and then save as a new merged file, for each country. I currently have the following code and am getting stuck on the merge file name which I assume has to do with how I'm using the loop indicator in the file path.

    When I try with `i', I get the error ".../Merge format `i'.xlsx not found"
    with "`i'", I get the error "using required"
    and with `"i"', I get the error "too many ')' or ']'"

    Any help would be very much appreciated. Thanks.

    foreach i in 1/26 {
    import excel "C:\...\Merge format\`i'.xlsx", sheet("Labor") firstrow clear
    merge 1:1 WP5889 using "C:\...\Merge datasets"`i'".dta", keepusing(P8A P8B P8C)
    drop if _merge==2
    save `i'_verbatims, replace
    }
    Last edited by Elise Gordon; 19 Jul 2021, 02:19.

  • #2
    I have not looked at any details, but you typically do not want backslashes (\) when working with macros. Replace those with slashes (/) and see whether that fixes the problem.

    For an explanation, see Cox (2008).


    Cox, N. J. 2008. Stata tip 65: Beware the backstabbing backslash. The Stata Journal, 8(3), 446--447.

    Comment


    • #3
      Thanks Daniel, appreciate you taking the time to provide a suggested solution. That didn't fix the issue unfortunately but I will avoid backslashes in future.

      Comment


      • #4
        Originally posted by Elise Gordon View Post
        That didn't fix the issue
        I might need to look into the details, then. You need to provide more details, too.

        I do not see how you could possibly still be getting the error

        Code:
        ".../Merge format `i'.xlsx not found"
        with all backslashes replaced with slashes. So what exactly is "the issue" when you solve the problems with backlashes?


        Edit:

        I did miss another obvious problem in your code; sorry. You do not want

        Code:
        foreach i in 1/26
        Probably you want

        Code:
        forvalues i = 1/26
        instead.
        Last edited by daniel klein; 19 Jul 2021, 07:18.

        Comment


        • #5
          Thanks again, Daniel. I have made that change as well but I am still getting the same errors as in the original query.

          I know that having files open that you're trying to import can impact the ability of import code to run, could it be something similarly simple if there aren't any obvious errors in the code?

          It still appears that the issue lies with how I'm specifying the file path.

          I have copied my code below using code delimiters. Apologies, first time forum contributor.

          Code:
          foreach i in 1/26 {
          import excel "C:\...\Merge format\`i'.xlsx", sheet("Labor") firstrow clear
          merge 1:1 WP5889 using "C:\...\Merge datasets"`i'".dta", keepusing(P8A P8B P8C)
          drop if _merge==2
          save `i'_verbatims, replace
          }
          Last edited by Elise Gordon; 19 Jul 2021, 08:41.

          Comment


          • #6
            Do you get the error message

            Code:
            ../Merge format `i'.xlsx not found
            exactly? That is, does the error message contain the literal ` and ', i.e., left and right single quote character? If not, please post (i.e., copy and paste) the exact code that you are using along with the exact error message that Stata is giving you.

            Also, what do you get when you type

            Code:
            dir  C:/.../Merge format/"
            (where you obviously replace the ... with the specific subdirectories that you need)?

            Comment


            • #7
              Thanks for your help Daniel.

              When I run the following code:

              Code:
              forvalues i = 1/26 {
                  import excel "C:/.../Merge format/`i'.xlsx", sheet("Labor") firstrow clear
                  merge 1:1 WP5889 using "C:/.../Merge datasets/`i'.dta", keepusing(P8A P8B P8C P8D)            
                  drop if _merge==2
                  save `i'_FL_verbatims, replace
              }
              I get the error:
              file C:/.../Merge format/1.xlsx not found

              Is it looking for a number rather than string?

              Both code and error copied exactly except for omitting full file path.

              when I type dir "C:/.../Merge format/" I get the following:
              <dir> 7/19/21 13:58 .
              <dir> 7/19/21 13:58 ..
              32.8k 5/26/21 16:32 Albania.xlsx
              38.9k 5/26/21 16:33 BosniaandHerzegovina.xlsx
              39.8k 5/26/21 16:33 Bulgaria.xlsx
              31.3k 5/26/21 16:33 CongoBrazzaville.xlsx
              34.5k 5/26/21 16:33 CzechRepublic.xlsx
              43.0k 5/26/21 16:34 Ecuador.xlsx
              28.9k 5/26/21 16:34 Estonia.xlsx
              47.5k 5/26/21 16:35 Greece.xlsx
              33.6k 7/19/21 13:58 India.xlsx
              27.9k 5/26/21 16:37 Iraq.xlsx
              26.6k 5/26/21 16:39 Israel.xlsx
              33.4k 5/26/21 16:39 Italy.xlsx
              23.4k 5/26/21 16:40 IvoryCoast.xlsx
              26.3k 5/26/21 16:41 Laos.xlsx
              24.6k 5/26/21 16:41 Lithuania.xlsx
              30.4k 5/26/21 16:41 Malaysia.xlsx
              32.2k 5/26/21 16:41 Moldova.xlsx
              35.5k 5/26/21 16:41 Namibia.xlsx
              23.9k 5/26/21 16:42 Pakistan.xlsx
              145.9k 5/26/21 16:42 Peru.xlsx
              22.8k 5/26/21 16:42 Senegal.xlsx
              25.5k 5/26/21 16:42 SouthKorea.xlsx
              26.8k 5/26/21 16:42 Turkey.xlsx
              33.3k 5/26/21 16:43 UnitedKindom.xlsx
              37.1k 5/26/21 16:43 Venezuela.xlsx
              35.2k 5/26/21 16:43 Zimbabwe.xlsx
              Last edited by Elise Gordon; 19 Jul 2021, 09:04.

              Comment


              • #8
                The local macro `i' is substituted by the ith iteration of the forvalues loop. So, on the first iteration, `i'.xlsx is 1.xlsx.

                Since your files are not numbered, a preferable solution is to produce a list of files in the folder, and use that list in your loop:

                Code:
                local files: dir "C:/.../Merge format/" files "*.xlsx"
                foreach i of local files{
                    import excel "C:/.../Merge format/`i'", sheet("Labor") firstrow clear
                    local newi = subinstr("`i'",".xlsx","",.)
                    merge 1:1 WP5889 using "C:/.../Merge datasets/`newi'.dta", keepusing(P8A P8B P8C P8D)            
                    drop if _merge==2
                    save `newi'_FL_verbatims, replace
                }

                Comment


                • #9
                  Thanks for providing the details.


                  Originally posted by Elise Gordon View Post
                  I get the error:
                  file C:/.../Merge format/1.xlsx not found

                  Is it looking for a number rather than string?
                  Yes. Here is why:

                  With the loop

                  Code:
                  fovalues i = 1/26
                  you define a local macro, i, that takes on the values 1, 2, ..., 26. When you dereference `i', you are getting those numbers. Stata has no way of knowing that you want file names instead.

                  Here is one way to do what you want. First, you need to obtain a list of filenames. You obtain that list with Stata's extended functions.

                  Code:
                  local xlsx : dir "C:/.../Merge format" files "*.xlsx"
                  Now, you can loop over the filenames in local macro xlsx

                  Code:
                  foreach fn of local xlsx
                  Inside the loop, we need the filenames without extension
                  .xlsx; we need .xlsx (for import excel), .dta or no extension (for merge), and no extension (for save). The local macro xlsx, that we have created with the dir function, contains the filenames with extension .xlsx. One way of removing this extension is

                  Code:
                  local dta : subinstr local fn ".xlsx" ""
                  Putting this all together, we get

                  Code:
                  local xlsx : dir "C:/.../Merge format" files "*.xlsx"
                  
                  foreach fn of local xlsx {
                      local fn : subinstr local fn ".xlsx" ""
                      import excel "C:/.../Merge format/`fn'.xlsx" , sheet("Labor") firstrow clear
                      merge 1:1 WP5889 using "C:/Merge datasets/`fn'" ///
                          , keepusing(P8A P8B P8C P8D)
                      drop if _merge == 2
                      save `fn'_FL_verbatim , replace
                  }
                  Because the code relies on local macros, we need to run the entire code at once.



                  Edit:

                  My answer is essentially equivalent to #8; I added an explanation.
                  Last edited by daniel klein; 19 Jul 2021, 09:46.

                  Comment


                  • #10
                    Thank you so much Daniel and Ali, that worked perfectly! I definitely learned a few new things about loops.

                    Comment

                    Working...
                    X