Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • looping on multiple files

    Hi statalist community!

    I am stuck with this problem for more than two weeks and have studied the web meanwhile. I have multiple stata files (all in R2 folder) I need to make changes on and merge with a master file. At fist step my code to make changes, does not work:

    Code:
    local dir "C:\Users\Paper 2\Data\cdrdist\R2"
    local data: dir "`dir'" files "*.dta"
    local data `data'
    
    foreach file of local data {
    drop _all
    
    use "`dir'/`data'", clear       
    duplicates drop FQmetainstanceID,force
    seva,replace
    }
    I'd appreciate if you could take a look at my code. I think I need to read more about local directory macros.

    regards,

  • #2
    Please read the FAQ for excellent advice about how to most effectively ask questions in this Forum. In particular, saying that something "does not work" doesn't convey any information. There are many, many ways in which a particular code may produce results that are not what you want, both as a result of the code itself and your expectations, either or both of which may be wrong.

    With what you have shown, those who want to help you are left to guess what is problematic for you. It is best to show (by copying from the Results window or your log file) what you are getting, and, if it isn't obvious, how it differs from what you expect.

    That said, there are a couple of things:

    1. The command -local data `data'- does nothing. It isn't harmful, but it just overwrites local macro data with what is already in it.

    2. There is no command -seva-. I think that's a typo for -save-.

    Another point worth asking about: are you running this code all at one time? If you are "chunking" the code and running a few lines at a time, then it is guaranteed to fail because any local macros defined in one "chunk" will go out of scope at the end of that chunk and will be non-existent when you get to subsequent chunks.

    Also, the handling of quotes and macros is sometimes tricky. It is safer to use compound double quotes `" "' than regular double quotes " ". I don't know for a fact that changing this will fix your code, but it might.

    If these possibilities don't help you solve your problem, I recommend you run -set tracedepth 1- and -set trace on- and then run the code again. The output will let you see how Stata is understanding your local macros, and that may shed light on why your code is not doing what you expect. If it doesn't, post that output here, and maybe somebody will be able to help you figure it out.

    Comment


    • #3
      Clyde, thank you so much for your valuable input. Sorry for making the common mistake again.

      I am using Stata 13.0 on Windows and Mac (interchangeably, Windows at work and mac at home) and trying to open a bunch of data files in a folder one by one and make changes. Before setting the trace on, I didn't get any error. I just saw my codes in result window again. I tried to change the code based on Stata logic (as much as I know) and try again.
      I am not sure in which line I have to use compound double quotes, but I tried couple of changes and none worked.

      Here is the result I get with trace one:

      Code:
      .
      . local dir "/Users/salehbabazadeh/Box Sync/Dissertation documents/Paper 2/Data/cdrdist/R2"
      
      . local data: dir "`dir'" files "*.dta"
      
      .
      . foreach file of local data {
        2. drop _all
        3.
      . use "`dir'/`data'", clear      
        4. duplicates drop FQmetainstanceID,force
        5. save,replace
        6. }
      - foreach file of local data {
      - drop _all
      - use "`dir'/`data'", clear
      = use "/Users/Paper 2/Data/cdrdist/R2/"cdr2_sdp_fivemeth.dta" "cdr2_sdp_staff1.dta" "cdr2_sdp_larc.dta" "cdr2_sdp_staff.dta"
      >  "cdr2_sdp_threemeth.dta" "cdr2_sdp_nostockout.dta" "cdr2_sdp_stockout.dta" "cdr2_sdp_proj.dta" "cdr2_sdp_offer.dta" "cdr2_sdp_phar_hc.dta" "cdr3_sdp_fees.dta"", clear
      invalid 'cdr2_sdp_fivemeth.dta'
        duplicates drop FQmetainstanceID,force
        save,replace
        }
      r(198);
      
      end of do-file
      
      r(198);
      As you can see in the output, I see the name of files that I need to open, make changes and save.
      Last edited by Saleh Babazadeh; 03 Jan 2018, 22:13.

      Comment


      • #4
        Yes, that is because you have said -use "`dir'/`data'", clear-, and `data' is, as you have defined it, a list of files. But -use- can only handle one file at a time. So you need to say -use `"`dir'/`file''', clear- (file coming from -foreach file of local data-).

        Comment


        • #5
          Clyde, you are the best. Thanks for taking the time to help me.

          Comment


          • #6
            Dear Statalisters,

            I am facing a problem while looping over files in different folders.

            Specifically, I want to import .csv files contained in different folders. I have my data organized into "year" folders and, within each of these, I have "country" .csv files. However - and this is the reason for my problem - country files may change across "year" folders which leads Stata to return error when not finding a (non-existent) file. I tried to address this with a if condition to deal with the error and keep looping over the remaining countries, but it did not work.

            The code I have written is:

            Code:
            local year "1999 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014"
            local countries "at be bg ch cy cz de dk ee es fi fr gr hr hu ie is it lt lu lv mt nl pl pt rc se si sk uk"
            
            foreach y of local year{
                di `y'
                foreach c of local countries{
                    di "`c'"
                    clear
                    import delimited "`originaldata'\Volume I\AdhocModules\\`y'\\`c'`y'_y.csv"
                    if _rc {
                        di "`c' does not exist in `y'"
                    }    
                    keep qhhnum coeff quarter hhnum rec hhseqnum refyear refweek intweek ///
                    country countryb countryw existpr hatlev1d leavclas mainstat national /// 
                    refweek refyear seekreas seekwork sex wstator year
                    gen source="v1_ahm"
                    save "`datapath'\1_LFS\V1_ahm\\`c'`y'.dta", replace 
                }
            }
            *
            And the error i get is of the type: "file C:........1999\at1999_y.csv not found" r(601).

            Possibly, a second-best option would involve a loop over .csv files and over directories.

            I would greatly appreciate your help or directions to fix this issue.

            Thank you,
            Andre

            Comment


            • #7
              In order to manage the error, you need to use capture on the command that may generate an error. Otherwise, Stata displays the error message and stops. You should also switch to forward slashes in file paths, these are safe to use and will work on all platforms.

              I prefer to handle this type of problem by first making a dataset of files that need to be processed using filelist (from SSC). It's hard to debug macro/file paths issues from within a loop so I think its better to finalize all the details of what you want to work on before you start processing the content of these files.

              It appears that all your CSV files are in a directory called "AdhocModules". Change Stata's directory to the one that contains "AdhocModules" and run the following:
              Code:
              filelist, dir(AdhocModules) pattern("*.csv")
              gen country = substr(filename,1,2)
              gen year = substr(filename,3,4)
              The above will replace the data in memory with a list of all files that end with ".csv" in "AdhocModules" and in any of its subdirectories. It should look something like:
              Code:
              . list in 1/5
              
                   +-----------------------------------------------------------+
                   | dirname             filename       fsize   country   year |
                   |-----------------------------------------------------------|
                1. | AdhocModules/1999   at1999_y.csv     151        at   1999 |
                2. | AdhocModules/1999   be1999_y.csv     151        be   1999 |
                3. | AdhocModules/1999   bg1999_y.csv     151        bg   1999 |
                4. | AdhocModules/1999   ch1999_y.csv     151        ch   1999 |
                5. | AdhocModules/1999   cy1999_y.csv     151        cy   1999 |
                   +-----------------------------------------------------------+
              
              .
              Make sure that all the files you want are there and drop those that do not belong. When you are satisfied with the list, you can now process each file separately. I recommend using runby (from SSC). With runby, you define a program that contains all the code you want to execute on by-groups of observations. You can process the list of files in memory with something like:

              Code:
              program drop _all
              program doit
                  local year = year
                  local country = country
                  local path = dirname + "/" + filename
                  import delim "`path'", clear
                  gen source = "v1_ahm"
                  gen year = `year'
                  gen country = "`country'"
                  save "v1_ahm/`country'`year'.dta", replace
                  clear
              end
              runby doit, by(dirname filename) verbose
              By using by-groups of dirname filename, there's only one observation in memory when the doit program is called. The first part of the program copies the values from variables to local macros. The data in memory is then replaced with the content of the file once imported. The content of local macros is then copied back into variables (if needed). Finally, the data from the CSV file being processed is saved as a Stata dataset in the "v1_ahm" subdirectory.

              With runby, anything that left in memory at the time the doit program terminates is considered results and accumulates. If your intent is to append the content of all these files, remove the final clear and when runby finishes processing all by-groups, the data in memory will contain the content from all the files processed.

              Comment


              • #8
                Dear Robert,

                Thank you for your thorough explanation! That's indeed an efficient way of importing files while keeping things organized.
                Just one caveat (which makes entirely sense): I needed to specify (local) path macros inside the -doit-, since Stata does not carry them over the program.

                An alternative to your suggestion that works fine too would be:

                Code:
                local year 1999 2002 2003 
                foreach y of local year {
                    cd "`originaldata'/Volume 1/AdhocModules/`y'"
                    clear
                    local list : dir . files "*.csv"
                    foreach f of local list {
                        di "`f'"
                        import delimited "`f'"
                        local newf = subinstr("`f'","_y.csv","",.)
                        save "`datapath'/1_LFS/V1_ahm/`newf'.dta", replace
                        clear
                    }
                }
                Best, Andre

                Comment


                • #9
                  Dear Statalist,

                  Let me keep using this thread to ask a related question on looping over multiple files.

                  I have country-year .dta files within a single folder and I was wondering what would be the most efficient way to run some code on each of them.

                  My .dta files are of the type: "AT1980, AT1981, ...., AT2015, BE1985, ...,BE2015" and I wanted to loop over countries and over years.
                  So, schematically I want to

                  - start in AT
                  - loop over 1980-2015
                  - proceed to BE
                  - loop over 1985-2015
                  - proceed to the next country
                  - etc

                  If looping over countries is not a problem, looping over years in this case is a bit more tricky, since the datafiles I have don't start for every country in the same year.
                  So, ideally I would like to have a way for the loop to start by considering a minimum year, say 1980, and if no such file exists to iterate to the next year and start the loop there. This would in principle address missing files in particular years in the middle of the sample too.

                  Eg:

                  Code:
                  foreach c of local countries {
                      foreach year in 1980(1)2015 {
                          (if there is no AT1980, skip to AT1981 and so on)    
                          code....
                      }
                  }
                  Of course, I see other ways where I can achieve the same final result but where I have to fully specify the years I want to loop over for each country or to create year folders and run the code using the code nicely suggested by Robert Picard above. However, since this is a difficulty that I often face it would be extremely helpful any help on this.

                  Thank you,
                  Andre F Coelho



                  Comment


                  • #10
                    Hi,
                    I would like to loop over multiple dta files in same (working) directory. But the following code runs and nothing happens:


                    Code:
                    . cd "D:\Working directory"
                    
                    . local all_sets: dir . files "F*.dta L*.dta S*.dta"
                    
                    . foreach x of local all_sets{
                    2. use `x', clear
                    3. so FacilityID
                    4. keep FacilityID
                    5. save chk_"`x'", replace
                    6. }
                    I have been trying so hard to get this right. Some help would be amazing !

                    Comment


                    • #11
                      The problem is with "F*.dta L*.dta S*.dta" You are trying to put three patterns into one -local...:dir- command. You can only do one pattern. There are two direct ways to work around this. One is to use -local all_set: dir . files "*.dta"- and then inside your loop put an -if- command that runs the code only -if inlist(substr(`"`x'"', 1, 1), "F", "L", "S")-. The other is to build up the all_sets local in stages:

                      Code:
                      local all_sets: dir . files "F*.dta"
                      local lfiles: dir . files "L*.dta"
                      local all_sets: list all_sets | lfiles
                      local sfiles: dir . files "S*.dta"
                      local all_sets: list all_sets | sfiles
                      Added: One more thing that could still trip you up. If you are working on a Mac, filenames are case sensitive, so "F*.dta" will not catch filenames beginning with f, etc.

                      Comment


                      • #12
                        Dear Clyde,
                        Thanks so much ! solved my problem. Also thanks for the Mac note. I use windows though.
                        Mahmood

                        Comment


                        • #13
                          Hey, stata forum. I am trying to merge multiple files from same directory but its not working out. Would appreciate if anyone could help. Thanks in advance.

                          local dir "\SMI2012"
                          local data: dir "`dir'" files "*.dta"
                          foreach file of local data {
                          use `"`dir'/`file'", clear
                          sort dpc
                          save, replace
                          }
                          use "Fixed assets", clear
                          foreach file of local data {
                          merge 1:1 dpc using `file'
                          }

                          Comment


                          • #14
                            econ stat Please don't run two threads simultaneously. I will reply to this one at https://www.statalist.org/forums/for...iles-in-a-loop


                            https://www.statalist.org/forums/help#adviceextras #1 explains about bumping, new threads, and so forth. But the point is just common sense: posting on the same matter in different threads is at best likely to confuse and waste people's time if you get multiple answers and at worst looks thoughtless if not rude.
                            Last edited by Nick Cox; 12 Apr 2020, 04:28.

                            Comment


                            • #15
                              Hi Statlisters,
                              I also became stuck with making a loop command.
                              I have 17 industries and here bringing only two industry in example. I have industry_22 and industry_24. I cleaned the data for industry 22 and now making a loop command to do the same for second industry and then will extend it to 17 industries. Actudlly I will further extended to several industries too. All the files are in one folder.
                              Each file have importers exports for 2001, 2002, 2003, 2004 and sic_code. I selecting only the top five from them and droping others and then appending
                              The commands show the I executed several commands for cleaning the data as well.

                              global myfolder "C:\Users\shafi\Desktop\Data_Cleaning_ERV\20221122 "

                              cd "C:\Users\shafi\Desktop\Data_Cleaning_ERV\20221122 \Industry_EXIM\Bahrain\br_cleandata"

                              clear all
                              set more off

                              #delimit;
                              clear all;


                              foreach x in indust_temp_22.dta indust_temp_24.dta {;

                              cd "C:\Users\shafi\Desktop\Data_Cleaning_ERV\20221122 \Industry_EXIM\Bahrain\br_cleandata"
                              local filelist: dir "br_cleandata" indust_temp_22 indust_temp_24 "*.dta"


                              foreach x in Importers SIC Exportedvaluein2001 {;
                              use `filelist'.dta, clear ;
                              keep Importers SIC Exportedvaluein2001`x';
                              rename Exportedvaluein2001 volume `x';
                              gen export=1 `x';
                              gen year=2001 `x';
                              save temp1.dta.dta, replace;
                              clear all;
                              };

                              foreach x in Importers SIC Exportedvaluein2002{;
                              use `filelist', clear ;
                              keep Importers SIC Exportedvaluein2002 `x';
                              rename Exportedvaluein2002 volume `x';
                              gen export=1 `x';
                              gen year=2002 `x';
                              save temp2.dta, replace;
                              clear all;
                              };

                              foreach x in Importers SIC Exportedvaluein2003 {;
                              use `filelist', clear ;
                              keep Importers SIC Exportedvaluein2003 `x';
                              rename Exportedvaluein2003 volume `x';
                              gen export=1 `x';
                              gen year=2003 `x';
                              save temp3.dta, replace;
                              clear all;
                              };

                              foreach x in Importers SIC Exportedvaluein2004{;
                              use `filelist', clear ;
                              keep Importers SIC Exportedvaluein2004 `x';
                              rename Exportedvaluein2004 volume `x';
                              gen export=1 `x';
                              gen year=2004 `x';
                              save temp4.dta, replace;
                              clear all;
                              };

                              local temp temp1.dta temp2.dta temp3.dta temp4.dta

                              foreach in local temp.dta {;
                              append using`temp' ;
                              rename Importer target;
                              gen source="Bahrain" `x';

                              egen mean_volume=mean(volume), by (target) `x';
                              drop volume `x';
                              keep if year==2001 `x';
                              gsort -mean_volume `x';
                              keep if _n<=5 `x';
                              egen total_volume=total(mean_volume) `x';
                              gen weight=mean_volume/total_volume `x';
                              gen ranking=_n `x';
                              gsort -weight `x';
                              drop year `x';
                              rename target country `x';
                              merge 1:1 country using currency_code.dta;
                              keep if _merge==3 `x';
                              drop _merge;

                              save trade_list.dta, replace;
                              clear all;
                              };


                              foreach x in currency_code {;

                              use exchange_long_format.dta;
                              keep if currency_code=="BHD" `x';
                              save temp2_BHD.dta, replace ;
                              clear all;

                              use trade_list.dta;
                              merge 1:m currency_code using exchange_long_format.dta;
                              keep if _merge==3 `x';
                              drop _merge `x';
                              save temp1_weight_exchange.dta, replace;

                              merge m:1 currency_code year month using temp2_BHD.dta;
                              drop _merge `x';
                              sort currency_code year month, stable `x' ;
                              egen wanted_exchangerate = mean(exchange_rate/(currency_code=="BHD")), by(year month) `x';
                              drop if currency_code=="BHD" `x';
                              gen exchange_inBHD=exchange_rate/wanted_exchangerate `x';
                              gen weight_times_ex=exchange_inBHD*weight `x';
                              save temp_raw_temp1.dta, replace;

                              collapse (mean) weighted_exchange_rate=weight_times_ex, by (year month );
                              gen one=1 `x';
                              save exchange_rate_weighted_data.dta, replace;
                              clear all;
                              };

                              foreach x in SIC export source one{;
                              use temp_raw_temp1.dta;
                              keep SIC export source `x';
                              gen one=1 `x';
                              keep if _n==1 `x';
                              merge 1:m one using exchange_rate_weighted_data.dta;
                              drop `x'
                              save real_exchange_br_22.dta;

                              use real_exchange_br_22;
                              forvalue i=1/2 {;
                              append using exchange_temp`i'.dta, force;
                              };
                              };


                              #delimit cr



                              I am new with Stata and need your help for correcting any error, especially setting local macros and setting cd for the datafiles locations.

                              Thanks in advance!

                              Comment

                              Working...
                              X