Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Import multiple Excel files and merge as one dta file by using tempfile in loop

    I am trying to import multiple Excel files (from a few different stations for the year 2013) and merge them into one temp .dta file (which then contains all stations in the year 2013) without creating and saving all separate .dta files first. So I would like to use tempfile to save all the separate Excel files as dta file, and subsequently merge these tempfiles into one temporaty dta file for the year 2013. After this I would like to repeat this process for the other folders containing data from 2014 and 2015 and in the end append all temporary year files into one saved dta file. I have researched as found code for a loop to do this however I end up with a .dta file containing only the last of the imported 2013 Excel files.

    I am using Stata/IC version 12.1 on Windows 7

    This is the code I used for the 2013 folder which contains "HLW_13.xls", "JVG_13.xls", "SHK_13.xls" and "VDS_13.xls":
    Code:
    clear
    local workdir "...\MyWorkDir2013"
    cd `workdir'
    local files: dir "`workdir'" files "*.xls"
    
    foreach x in `files' {
    import excel using `x', firstrow clear
    format datum %td
    rename *, lower
    local noextension=subinstr("`x'",".xls","",.)
    tempfile `noextension'
    save "``noextension''", replace
    }
    
    fs *_13.dta
    local getfile "use"
    foreach file in `r(files)' {
        `getfile' `file'
        local getfile "merge 1:1 datum uur using "
        drop _merge
    }
    tempfile TrafficAms2013
    save "`TrafficAms2013'", replace
    Could you tell me what is wrong in my code?

  • #2
    You are saving temporary files in the first loop. These files have names like ST_00000001.tmp.Your call to fs (presumably from SSC) therefore will not find any files that end in _13.dta.

    Instead of showing how to do this along the lines you propose, I will show (not tested code) how to do something similar with xls2dta (SSC).

    Code:
    forvalues year = 2013 2014 2015 {
        
        cd ".../MyWorkDir`year'"
        
        tempfile tmp`year'
        
        xls2dta : import excel .
        xls2dta : xeq format datum %td ; rename * , lower
        xls2dta , clear : merge 1:1 datum uur
        xls2dta , eraseok : erase
        
        if (`year' > 2013) {
            use finalfile , clear
            append using "`year_`year''"
        }
        
        save final_file.dta , replace
    }
    The code above will create permanent dta-files on the fly put it will also erase them, so you will end up with only one permanent dta-file.

    Best
    Daniel

    Comment


    • #3
      Thanks Daniel, I understand now that the fs command cannot find the temporary files. But if I don't create tempfiles I'll get an error in the second loop. Before I try your suggestion using xls2dta I would like to know what's wrong in the second loop. Could you help me with that?

      If I create permanent files in the first loop like this:
      Code:
      clear
      local workdir "...MyWorkDir_2013"
      cd `workdir'
      local files: dir "`workdir'" files "*.xls"
      
      foreach x in `files' {
      import excel using `x', firstrow clear
      format datum %td
      rename *, lower
      local noextension=subinstr("`x'",".xls","",.)
      save "`noextension'", replace
      }
      
      fs *_13.dta
      local getfile "use"
      foreach file in `r(files)' {
          `getfile' `file', clear
          local getfile "merge 1:1 datum uur using"
          drop _merge
      }
      tempfile TrafficAms2013
      save "`TrafficAms2013'", replace
      Stata returns:

      . cd `workdir'
      M:\MilieuBasisdata\Projecten\Lucht\2016\EL_16_089_ Onderzoek_Emissiefactoren_EC\analyse\Amsterdam\Ams _Int_13

      . local files: dir "`workdir'" files "*.xls"

      .
      . foreach x in `files' {
      2. import excel using `x', firstrow clear
      3. format datum %td
      4. rename *, lower
      5. local noextension=subinstr("`x'",".xls","",.)
      6. *tempfile `noextension'
      . save "`noextension'", replace
      7. }
      file hlw_13.dta saved
      file jvg_13.dta saved
      file shk_13.dta saved
      file vds_13.dta saved

      .
      . fs *_13.dta
      hlw_13.dta jvg_13.dta shk_13.dta vds_13.dta

      . local getfile "use"

      . foreach file in `r(files)' {
      2. `getfile' `file', clear
      3. local getfile "merge 1:1 datum uur using"
      4. drop _merge
      5. }
      variable _merge not found
      r(111);

      Comment


      • #4
        That is obvious. The first time through the loop you are not merging any files, hence variable _merge which is created by the merge command does not yet exist in the dataset.

        Best
        Daniel

        Comment


        • #5
          But if I remove drop _merge, then after the second time through the loop Stata says:

          _merge already defined
          r(110);
          So where to place _drop _merge then?

          Comment


          • #6
            Perhaps not at all. That depends on whether you want to keep track of the merges (recommended) or not. So either do

            Code:
            local count 0
            foreach file in `r(files)' {
                ...
                if (!`count') {
                    use ...
                }
                else {
                    merge 1:1 datum uur using ...
                    rename _merge _merge`count'
                }
                local ++count
                ...
            }
            Keeping all _merge variables, numbered sequentially (this is what xls2dta would do by default). or,

            Code:
            foreach file in `r(files)' {
                ...
                if (`count' <= 1) {
                    use ...
                }
                else {
                    merge 1:1 datum uur using ... , nogenerate
                }
                ...
            }
            not creating the variable in the first place.

            The first approach is the safer one.

            Best
            Daniel
            Last edited by daniel klein; 29 Nov 2016, 10:15.

            Comment

            Working...
            X