Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Append multiple csv files from different folders as dta with IDs

    DESCRIBTION OF FOLDERS AND FILES
    Level 1 = Main folder named HH_Plots, directory "C:\Atinga Files\IPADS DOCS\IPADS PhD Program\Theses\Weather Data\HH_Plots"
    Level 2 = Subfolders (900) named 1 to 900, (each folder represent information from a household)
    Level 3 = Files in each subfolder (8 csv files) named A, B, C, D, E, F, G, H (type of information from each household)

    I want to append each type of file (example file A) in all the 900 folders as a single file, and also generate an ID variable (1 - 900) showing which folder data is appended from

    Approach;

    I first imported all csv files and saved all files as dta in respective folders named eg. A.dta to H.dta

    Code:
    cd "C:\Atinga Files\IPADS DOCS\IPADS PhD Program\Theses\Weather Data\HH_Plots"
    filelist
    
    drop if filename == ".DS_Store" 
    gen path = dirname + "\" + filename 
    levelsof path, local(tofile)
    
    
    foreach x in `tofile' {
         import delimited using "`x'", clear
        
     drop in 1/8
     keep v1 v3
     rename ( v1 v3 )(Time Prcp)
     split Time , parse("-" " ") generate(Time)
     split Prcp , parse(",") generate(Prcp)
    
       local tosave = subinstr("`x'", ".csv", "", .)
       save "`tosave'.dta", replace 
    }
    I am however not able to figure out how to generate an ID variable and append each type of file.

  • #2
    Check out the generate(newvar) option to the -append- command. It puts a file serial number into a new variable for each record. Should have been the filename, but a number is also useful. See https://www.stata.com/manuals/dappend.pdf


    Code:
    append *,generate(file)

    Comment


    • #3
      Please how do I integrate the code
      Code:
      append *, generate (file)
      to append each type of file? I am struggling to understand how.

      Comment


      • #4
        My bad. -append- doesn't actually take wildcards. You would have to use the extended macro facility to make a list of filenames. -Append- will take a list of files.
        Code:
        local files : dir "." files "*.dta"
        append using `files',generate(file)
        will combine all the .dta files in the current directory into the current workspace. The first file appended will have file==1, the second file==2, etc. You could use the -pos- extended macro function to relate these numbers to the file names. Sorry my first answer was misleading.

        See https://www.stata.com/manuals13/pmacro.pdf and https://www.stata.com/manuals/dappend.pdf

        Daniel Feenberg

        Comment


        • #5
          Hello Daniel Feenberg, Thanks so much for the support. I was able to follow after struggling for about a week but it saved me months of work.
          Saved time working on 7000 + csv files.
          The below worked based on each type of file.


          Code:
          clear
          
          * Setup;
          **********
          cd "C:\Atinga Files\IPADS DOCS\IPADS PhD Program\Theses\Weather Data\HH_Plots"
          filelist, dir("C:\Atinga Files\IPADS DOCS\IPADS PhD Program\Theses\Weather Data\HH_Plots") pat("A_*.csv") save("All_rain.dta")
          global save_dir "C:\Atinga Files\IPADS DOCS\IPADS PhD Program\Theses\Weather Data\HH_Plots\Rain_output_files"
          
          * Generating unique ID (Plot_nb) for all type A csv files based on each file's directory path
          *********************************************************************************************
          use "All_rain.dta", clear
             split dirname , parse("/" ) generate(Plot_nb) 
             destring Plot_nb2, force replace
             sort Plot_nb2
             save All_rain, replace
             
          * Importing all type A csv files and saving as .dta
          ***************************************************
                   local obs = _N
                   forvalues i=1/`obs' {
                     use "All_rain.dta" in `i', clear
                     local f = dirname + "/" + filename
                     insheet using "`f'", clear
                     gen id2 = `i'       
                     save "${save_dir}/rain_`i'", replace
                   }
          
                   
          * Appending all type A files into one dta file
          **************************************************
          cd "${save_dir}"
          use rain_1.dta
          
          forval i=2/900{
              append using rain_`i'.dta
              save Rain_all_Append.dta, replace
          }

          Comment

          Working...
          X