  • Loop over specific files

    I have many files in a folder and would like Stata to read in a subset of them.

    All files are .txt files. They have the following naming convention:

    date_formtype_identifier_someothervariable.txt

    I only want to read in a file if its formtype is type1, type2 or type3. I do not want to read in any other type.

    For example: I would like to read in these files

    20160101_type1_75_89.txt
    20151231_type3_89_00.txt

    But not these:

    20140715_type7_65_52.txt
    20130223_type5_32_07.txt


    I am not sure how to tell the loop to only consider type1, type2, type3?

    Code:
    cd "E:\dir"
    set more off
    
    local myfilelist : dir . files "*.txt"
    quietly foreach `x' in local type1 type2 typ3 myfilelist {
        import delimited using *_`x'_*_.txt
    Thanks in advance.

  • #2
    I think the import delimited line has a mistake.

    It should be:

    Code:
    import delimited using "*_`x'_*_.txt"



    • #3
      Several small mistakes here.

      import delimited needs a specific filename (once locals and for that matter globals have been evaluated).

      Various ways to do this. Here's one. Necessarily I can't test it.


      Code:
      cd "E:\dir"
      
      forval j = 1/3 {
         local myfilelist : dir . files "*type`j'*.txt"
         quietly foreach x of local myfilelist {
             import delimited using "`x'"
         }
      }



      So your outer loop is just over 1 2 3 and then you get a file list for each acceptable type.

      Once you have a filelist, the items in it will be entire filenames.

      In essence, foreach allows only a few standard forms, and whatever is not documented is forbidden.
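
To make the two most commonly used standard forms concrete, here is a minimal sketch. The macro names seen_in, seen_of and types are made up for illustration, and the loops only collect the list items rather than reading any files.

```stata
* Form 1: "in" followed by a literal list typed on the command line
local seen_in
foreach j in type1 type2 type3 {
    local seen_in `seen_in' `j'
}
display "`seen_in'"

* Form 2: "of local" followed by the NAME of a local macro (no quotes around it)
local types type1 type2 type3
local seen_of
foreach j of local types {
    local seen_of `seen_of' `j'
}
display "`seen_of'"
```

Both loops visit the same three items; the only difference is whether the list is typed out after "in" or looked up by macro name after "of local".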



      • #4
        Have you tried

        Code:
        local myfilelist1 : dir . files "*type1*.txt"
        local myfilelist2 : dir . files "*type2*.txt"
        local myfilelist3 : dir . files "*type3*.txt"
        local myfilelist : "`myfilelist1' `myfilelist2' `myfilelist3'" 
        quietly foreach x of local myfilelist {
            import delimited using "`x'"
        }



        • #5
          Nick,

          Thanks for the reply.

          I should have been more specific; I didn't think it mattered. My apologies.

          The form types are not 1 2 3, they are "10k", "10ka", etc. I did not think that mattered at the time of my original post, but now I think it does.

          Would the loop given this be:

          Code:
           
           cd "E:\dir"  foreach j of 10k 10ka {    local myfilelist : dir . files "*`j'*.txt"    quietly foreach x of local myfilelist {        import delimited using "`x'"    } }
          Thanks again. And my apologies for not being more specific in the first post.



          • #6
            I don't know why the format of the code came out incorrect.

            Here is the corrected code.

            Code:
            cd "E:\dir"
            
            foreach j of 10k 10ka {
               local myfilelist : dir . files "*`j'*.txt"
               quietly foreach x of local myfilelist {
                   import delimited using "`x'"
               }
            }



            • #7
              Jesse,

              Thanks for the input. Sadly, your method does not work. I get an error after this line.

              Code:
              local myfilelist : "`myfilelist1' `myfilelist2' `myfilelist3'"
              The error is: " not allowed r(101). (The quote at the beginning is not mine, that is part of the error.)



              • #8
                As you'll appreciate, it's hard for us to hack into your computer to find the real filenames behind your question. That's good news on other grounds. In turn that's meant to seem slightly amusing, not sarcastic.

                But I repeat one hint from earlier.

                foreach really is not very flexible.

                Code:
                foreach j of 10k 10ka  {
                will not work at all. You can have "of" followed by an accepted keyword (such as local or numlist), or "in" followed by a literal list, but the two forms cannot be mixed.


                Code:
                foreach j in 10k 10ka {
                is the simple fix.
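
For anyone who wants to try the whole loop end to end, here is a hedged sketch rather than a tested solution for the real files. The directory demo_forms, the three demo file names, the tempfile combined and the variable source are all made up for illustration. Two assumptions go beyond what was posted above: the pattern includes the underscores so that 10k does not also pick up the 10ka files, and each import uses clear and is appended to a running dataset, because import delimited will not overwrite data already in memory.

```stata
clear all
capture mkdir demo_forms

* Hypothetical demo files named date_formtype_identifier_other.txt
local i = 0
foreach stub in 20160101_10k_75_89 20151231_10ka_89_00 20140715_8k_65_52 {
    clear
    set obs 1
    local ++i
    gen id = `i'
    gen str30 origin = "`stub'"
    export delimited using "demo_forms/`stub'.txt", replace
}

* Empty dataset that accumulates the imported files
clear
tempfile combined
save "`combined'", emptyok

foreach j in 10k 10ka {
    * the underscores stop 10k from also matching the 10ka files
    local myfilelist : dir "demo_forms" files "*_`j'_*.txt"
    foreach f of local myfilelist {
        import delimited using "demo_forms/`f'", varnames(1) clear
        gen source = "`f'"
        append using "`combined'"
        save "`combined'", replace
    }
}

use "`combined'", clear
list
```

With the real data, the demo-file section would be dropped and "demo_forms" replaced by the actual folder. If the files do not all share the same variable types, append may need more care than shown here.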

                Meta-comment: Copying and pasting code from one part of the forum software to another often seems to lose end-of-line information. That bites me all the time. When it does, I copy and paste into my favourite text editor, and get it as I want, and that seems to fix the problem.



                • #9
                  #7 The colon : from #4 doesn't seem a good idea. Cut it out.



                  • #10
                    Nick,

                    The new way seems to work. Thanks a lot for the help.



                    • #11
                      Nick is right, the colon should either be dropped or replaced by an equal sign. That's what you get when you first want to use macro list functionality but then decide against it...
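
A small sketch of both routes, under stated assumptions: the three lists are typed in by hand here (the second type1 name and the type2 and type3 names are invented) to mimic what dir . files returns, namely each filename in its own double quotes. The names unionlist, n1 and n2 are likewise just for illustration.

```stata
* Hypothetical lists in the shape that "dir . files" returns
local myfilelist1 `""20160101_type1_75_89.txt" "20150101_type1_10_11.txt""'
local myfilelist2 `""20151231_type2_89_00.txt""'
local myfilelist3 `""20141231_type3_12_34.txt""'

* Route 1: plain copy with no colon; compound quotes protect the inner double quotes
local myfilelist `"`myfilelist1' `myfilelist2' `myfilelist3'"'

* Route 2: macro list functionality, using the union operator two lists at a time
local unionlist : list myfilelist1 | myfilelist2
local unionlist : list unionlist | myfilelist3

local n1 : word count `myfilelist'
local n2 : list sizeof unionlist
display "`n1' files via plain copy, `n2' via list union"
```

Either combined list can then be handed to foreach x of local in the usual way.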



                      • #12
                        There's a new program on SSC called runby that can help those who want to loop over a list of files. When paired with filelist, also from SSC, you get a simple and powerful technique to process a bunch of files at once. It's simpler because instead of having to work with a long macro of file names, you first create a dataset of files you want to process and then use runby to apply commands for each file. When you create the dataset of files, you can leverage the full complement of Stata data management tools to finesse the list to what you want.

                        To illustrate the process and relate it to this thread, I created a sub-directory called "demo_dir" within Stata's current directory and filled it with 74 text files, each containing a single observation from the standard auto dataset using the following code:
                        Code:
                        clear all
                        cap mkdir demo_dir
                        sysuse auto
                        forvalues i = 1/`=_N' {
                            local type = word(make[`i'],1)
                            local fname = "20140101_`type'_`i'.txt"
                            dis "`fname'"
                            export delimited in `i' using "demo_dir/`fname'", replace
                        }
                        The file name includes a "type", made up from the first word of the car's make. Here are the first 5 files in the directory:
                        Code:
                        . filelist , dir(demo_dir)
                        Number of files found = 74
                        
                        . list filename in 1/5
                        
                             +----------------------+
                             | filename             |
                             |----------------------|
                          1. | 20140101_AMC_1.txt   |
                          2. | 20140101_AMC_2.txt   |
                          3. | 20140101_AMC_3.txt   |
                          4. | 20140101_Audi_53.txt |
                          5. | 20140101_Audi_54.txt |
                             +----------------------+
                        The following code makes a dataset of all the files in the "demo_dir" subdirectory, extracts the car type, and prunes the list to 3 types:
                        Code:
                        * fill the data in memory with all files in the "demo_dir" subdirectory
                        filelist , dir(demo_dir)
                        
                        * extract the car type
                        gen car_type = regexs(1) if regexm(filename,"_([^_]+)_")
                        
                        * prune the list to 3 types
                        keep if inlist(car_type, "AMC", "Buick", "Olds")
                        sort car_type filename
                        list, sepby(car_type)
                        and the results:
                        Code:
                        . list, sepby(car_type)
                        
                             +-----------------------------------------------------+
                             | dirname    filename                fsize   car_type |
                             |-----------------------------------------------------|
                          1. | demo_dir   20140101_AMC_1.txt        151        AMC |
                          2. | demo_dir   20140101_AMC_2.txt        142        AMC |
                          3. | demo_dir   20140101_AMC_3.txt        147        AMC |
                             |-----------------------------------------------------|
                          4. | demo_dir   20140101_Buick_10.txt     153      Buick |
                          5. | demo_dir   20140101_Buick_4.txt      153      Buick |
                          6. | demo_dir   20140101_Buick_5.txt      151      Buick |
                          7. | demo_dir   20140101_Buick_6.txt      146      Buick |
                          8. | demo_dir   20140101_Buick_7.txt      147      Buick |
                          9. | demo_dir   20140101_Buick_8.txt      149      Buick |
                         10. | demo_dir   20140101_Buick_9.txt      154      Buick |
                             |-----------------------------------------------------|
                         11. | demo_dir   20140101_Olds_35.txt      145       Olds |
                         12. | demo_dir   20140101_Olds_36.txt      152       Olds |
                         13. | demo_dir   20140101_Olds_37.txt      152       Olds |
                         14. | demo_dir   20140101_Olds_38.txt      146       Olds |
                         15. | demo_dir   20140101_Olds_39.txt      150       Olds |
                         16. | demo_dir   20140101_Olds_40.txt      146       Olds |
                         17. | demo_dir   20140101_Olds_41.txt      154       Olds |
                             +-----------------------------------------------------+
                        Once you are satisfied that the final list is what you want, you can use runby to run a small Stata program for each filename. The program contains all the commands you would like to use to process each file. In the example below, I import each file, create a source and a car_type variable, and leave the data in memory. With runby, what's left in memory when the user's program terminates is considered results and accumulates. When runby has processed all by-groups (in this case all filenames), the accumulated results replace the data in memory.
                        Code:
                        clear all
                        
                        * fill the data in memory with all files in the "demo_dir" subdirectory
                        filelist , dir(demo_dir) norecur
                        
                        * extract the car type
                        gen car_type = regexs(1) if regexm(filename,"_([^_]+)_")
                        
                        * prune the list to 3 types
                        keep if inlist(car_type, "AMC", "Buick", "Olds")
                        sort car_type filename
                        list, sepby(car_type)
                        
                        * import and append
                        program my_import_routine
                          local f = filename[1]
                          local t = car_type[1]
                          import delimited using "demo_dir/`f'", clear
                          gen source = "`f'"
                          gen car_type = "`t'"
                        end
                        runby my_import_routine, by(filename)
                        list make-trunk source car_type, sepby(car_type)
                        and the results:
                        Code:
                        . runby my_import_routine, by(filename)
                        
                        --------------------------------------
                        Number of by-groups    =            17
                        by-groups with errors  =             0
                        by-groups with no data =             0
                        Observations processed =            17
                        Observations saved     =            17
                        --------------------------------------
                        
                        . list make-trunk source car_type, sepby(car_type)
                        
                             +--------------------------------------------------------------------------------------------+
                             |           make   price   mpg   rep78   headroom   trunk                  source   car_type |
                             |--------------------------------------------------------------------------------------------|
                          1. |    AMC Concord    4099    22       3        2.5      11      20140101_AMC_1.txt        AMC |
                          2. |      AMC Pacer    4749    17       3          3      11      20140101_AMC_2.txt        AMC |
                          3. |     AMC Spirit    3799    22       .          3      12      20140101_AMC_3.txt        AMC |
                             |--------------------------------------------------------------------------------------------|
                          4. |  Buick Skylark    4082    19       3        3.5      13   20140101_Buick_10.txt      Buick |
                          5. |  Buick Century    4816    20       3        4.5      16    20140101_Buick_4.txt      Buick |
                          6. |  Buick Electra    7827    15       4          4      20    20140101_Buick_5.txt      Buick |
                          7. |  Buick LeSabre    5788    18       3          4      21    20140101_Buick_6.txt      Buick |
                          8. |     Buick Opel    4453    26       .          3      10    20140101_Buick_7.txt      Buick |
                          9. |    Buick Regal    5189    20       3          2      16    20140101_Buick_8.txt      Buick |
                         10. |  Buick Riviera   10372    16       3        3.5      17    20140101_Buick_9.txt      Buick |
                             |--------------------------------------------------------------------------------------------|
                         11. |        Olds 98    8814    21       4          4      20    20140101_Olds_35.txt       Olds |
                         12. | Olds Cutl Supr    5172    19       3          2      16    20140101_Olds_36.txt       Olds |
                         13. |   Olds Cutlass    4733    19       3        4.5      16    20140101_Olds_37.txt       Olds |
                         14. |  Olds Delta 88    4890    18       4          4      20    20140101_Olds_38.txt       Olds |
                         15. |     Olds Omega    4181    19       3        4.5      14    20140101_Olds_39.txt       Olds |
                         16. |  Olds Starfire    4195    24       1          2      10    20140101_Olds_40.txt       Olds |
                         17. |  Olds Toronado   10371    16       3        3.5      17    20140101_Olds_41.txt       Olds |
                             +--------------------------------------------------------------------------------------------+
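
To relate the pruning step back to the naming convention in the original question, here is a rough sketch that uses only built-in commands. The four filenames are the ones quoted in the first post and are typed in with input purely for illustration; in practice the filename variable would come from filelist. The variable name formtype and the stricter pattern anchored on the leading date are my own choices, not part of the code above.

```stata
clear
input str40 filename
"20160101_type1_75_89.txt"
"20151231_type3_89_00.txt"
"20140715_type7_65_52.txt"
"20130223_type5_32_07.txt"
end

* the second underscore-delimited field is the form type
gen formtype = regexs(1) if regexm(filename, "^[0-9]+_([^_]+)_")

* keep only the wanted form types
keep if inlist(formtype, "type1", "type2", "type3")
list
```

Because the form type ends up in an ordinary string variable, the same keep if inlist() line would handle values such as "10k" and "10ka" without any risk of one matching the other.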
