  • Recursive loops in Stata

    Hi Statalist,

    I'm working on a program, filetree, that loops through all files and folders under a path given in the syntax and writes the entire file tree to a file that can be opened in Excel. What I'm having trouble with is getting Stata to recurse: one loop that descends into each subfolder and then knows to back up to the right folder level (so not identically recursive). Right now I have a separate nested loop for each folder level, so the depth of my file tree is limited by how many nested loops I have written.

    The syntax is:

    filetree "[File path, which must be in quotes]" using [Desired name of file tree]

    example:

    Code:
    filetree "C:\Users\Username\Dropbox\My_Folder" using filetree
    Here is a modified version of the program (this will print a file tree three folders deep):

    Code:
    * Author: Lucia Goin
    * Contact: 
    * Purpose: Create file tree in excel
    *******************************************************************************
    
        program filetree
    
        * quotes must be around file path
        * using is optional so that the default name below can take effect
        syntax anything [using/] [, DIRSOnly]
        
            if "`using'" == "" {
                loc using filetree
            }
        
            
        * things to automate: commas, and names
        
        * start writing report here, define directory
            loc maindir `anything'
            file open report using "`maindir'\\`using'.csv", write replace
    
        * write the directory
            file write report "`maindir'" _n
            
        * this will create local of all FILES
            loc files: dir "`maindir'" files *
        
        * will rewrite over files each time
            loc pn: word count `files'
    
        * write a directory with files
            if `pn' > 0 {
            
                foreach file of loc files {
                    file write report ",`file'" _n
                }
            }
            
    
        loc i 0
        loc sd: dir "`maindir'" dirs "*"
        loc dn: word count `sd'
        
        if `dn' > 0 {
        
        file write report _n
    
        foreach subdir of loc sd {
        
            loc subdirectory "`maindir'\\`subdir'"
            
            file write report ",`subdir'" _n
                
            * find all subfiles
                loc files: dir "`subdirectory'" files *
                
            * will rewrite over paperfiles each time
                loc n: word count `files'
            
            * write subfiles found above
                if `n' > 0 {
                
                foreach file of loc files {
                    file write report ",,`file'" _n
                    }
                }
    
        
        loc sd`i': dir "`subdirectory'" dirs "*"
        loc dn: word count `sd`i''
        
        if `dn' > 0 {
            
            file write report _n
            
            foreach subdir_1 of loc sd`i' {
            
                * actual directory path
                    loc subdirectory "`maindir'\\`subdir'\\`subdir_1'"
                    file write report ",,`subdir_1'" _n
                    
                * find all subfiles
                    loc files: dir "`subdirectory'" files *
                    
                * will rewrite over paperfiles each time
                    loc pn: word count `files'
                
                * write subfiles found above
                    if `pn' > 0 {
                    foreach file of loc files {
                        file write report ",,,`file'" _n
                        }
                    }                                        
        
        loc sd`i'`i': dir "`subdirectory'" dirs "*"
        loc dn: word count `sd`i'`i''
    
        if `dn' > 0 {
            
            file write report _n
            
            foreach subdir_2 of loc sd`i'`i' {
            
                * actual directory path
                    loc subdirectory "`maindir'\\`subdir'\\`subdir_1'\\`subdir_2'"
                    file write report ",,,`subdir_2'" _n
                    
                * find all subfiles
                    loc files: dir "`subdirectory'" files *
                    
                * will rewrite over paperfiles each time
                    loc pn: word count `files'
                
                * write subfiles found above
                    if `pn' > 0 {
                    
                    foreach file of loc files {
                        file write report ",,,,`file'" _n
                        }
    }        
    }
    }
    }
    }
    }
    }
    file close report
    end

  • #2
    Lucia,

    I don't know of a way to do this in Stata, but I'm pretty sure it can be done in Mata, which has its own structured programming language and which, like C, Java, Perl, etc., can do recursive function calls. Mata has a steep learning curve, especially for something like this, but perhaps someone here or in the Mata forum has some already-cooked-up code that they can provide.

    Regards,
    Joe
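
For readers who want to see what a recursive Mata function for this task might look like, here is a minimal sketch. It is not taken from any existing package: the function name walk and its indent argument are made up for illustration, and the example path is the one from the first post. It relies only on Mata's built-in dir() and pathjoin().

```stata
mata:
// Hypothetical helper: print every file and folder under path,
// indenting one step per level, and return the number of files found.
real scalar walk(string scalar path, string scalar indent)
{
    string colvector files, dirs
    real scalar      i, n

    // Files directly inside this folder
    files = dir(path, "files", "*")
    n = rows(files)
    for (i = 1; i <= n; i++) {
        printf("%s%s\n", indent, files[i])
    }

    // Each subfolder: print its name, then call walk() on it
    dirs = dir(path, "dirs", "*")
    for (i = 1; i <= rows(dirs); i++) {
        printf("%s[%s]\n", indent, dirs[i])
        n = n + walk(pathjoin(path, dirs[i]), indent + "  ")
    }
    return(n)
}

// Example path from the first post; replace with a folder that exists
nfiles = walk("C:/Users/Username/Dropbox/My_Folder", "")
end
```

Because each call has its own files, dirs, and i, the function "backs up" to the parent folder automatically when a call returns.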



    • #3
      It can and has been done in Stata. See filelist (SSC, Picard).

      Best
      Daniel



      • #4
        filelist uses Mata functions (some of which are recursive). You can look at the code to get some inspiration on how recursion works.

        I am not sure that one can write a recursive ado-file, though. This program, for example, fails once the recursion gets deep enough: it works fine with 10 but not with 100. Any idea why?

        Code:
        cap prog drop hello
        program define hello
        
                args i
                
                di "Hello `i'"
                local --i
                if (`i'>0) hello `i'
                else exit
                
               
        end
        
        
        hello 100



        • #5
          Hi Christophe Kolodziejczyk, I ran the program with trace on, and this is the output before it breaks:

          Code:
           - end hello ---
                                                                                                          --- end hello ---
                                                                                                        ----- end hello ---
                                                                                                      ------- end hello ---
                                                                                                    --------- end hello ---
                                                                                                  ----------- end hello ---
                                                                                                ------------- end hello ---
                                                                                              --------------- end hello ---
                                                                                            ----------------- end hello ---
                                                                                          ------------------- end hello ---
                                                                                        --------------------- end hello ---
                                                                                      ----------------------- end hello ---
                                                                                    ------------------------- end hello ---
                                                                                  --------------------------- end hello ---
                                                                                ----------------------------- end hello ---
                                                                              ------------------------------- end hello ---
                                                                            --------------------------------- end hello ---
                                                                          ----------------------------------- end hello ---
                                                                        ------------------------------------- end hello ---
                                                                      --------------------------------------- end hello ---
                                                                    ----------------------------------------- end hello ---
                                                                  ------------------------------------------- end hello ---
                                                                --------------------------------------------- end hello ---
                                                              ----------------------------------------------- end hello ---
                                                            ------------------------------------------------- end hello ---
                                                          --------------------------------------------------- end hello ---
                                                        ----------------------------------------------------- end hello ---
                                                      ------------------------------------------------------- end hello ---
                                                    --------------------------------------------------------- end hello ---
                                                  ----------------------------------------------------------- end hello ---
                                                ------------------------------------------------------------- end hello ---
                                              --------------------------------------------------------------- end hello ---
                                            ----------------------------------------------------------------- end hello ---
                                          ------------------------------------------------------------------- end hello ---
                                        --------------------------------------------------------------------- end hello ---
                                      ----------------------------------------------------------------------- end hello ---
                                    ------------------------------------------------------------------------- end hello ---
                                  --------------------------------------------------------------------------- end hello ---
                                ----------------------------------------------------------------------------- end hello ---
                              ------------------------------------------------------------------------------- end hello ---
                            --------------------------------------------------------------------------------- end hello ---
                          ----------------------------------------------------------------------------------- end hello ---
                        ------------------------------------------------------------------------------------- end hello ---
                      --------------------------------------------------------------------------------------- end hello ---
                    ----------------------------------------------------------------------------------------- end hello ---
                  ------------------------------------------------------------------------------------------- end hello ---
                --------------------------------------------------------------------------------------------- end hello ---
              ----------------------------------------------------------------------------------------------- end hello ---
            ------------------------------------------------------------------------------------------------- end hello ---
          --------------------------------------------------------------------------------------------------- end hello ---
          Notice that it breaks right at "Hello 36", and the error says "system limit exceeded - see manual". The manual (help limits) gives the maximum number of nested do-files as 64. I believe this is why it breaks at 36: Stata is treating each recursive call as a separate nested do-file. I'm not sure how to get around this, though.



          • #6
            Note that going from 100 down to 37 (the last displayed line) is 64 levels of recursion, consistent with the limit of 64 nested do-files.
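
When the recursion is a simple countdown like the hello example, one way to sidestep the nesting limit is to replace the recursive call with a loop, so that only one program level is ever active. The sketch below assumes a positive integer argument; the name hello_iter is made up for illustration.

```stata
capture program drop hello_iter
program define hello_iter
    // n: starting value, assumed to be a positive integer
    args n

    // Count down inside a single program call instead of calling
    // the program again for each step
    local i = `n'
    while `i' > 0 {
        display "Hello `i'"
        local --i
    }
end

hello_iter 100
```

This prints "Hello 100" down to "Hello 1" without nesting any program calls. It does not help with a directory tree, where each level genuinely needs its own set of locals.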

            The original version of filelist (Nov 2013) used the dir extended macro function and a recursive ado subprogram. You would need a directory structure with subdirectories 65 levels deep to hit the limit.

            I switched to Mata after a user found out that filelist would skip a directory with a large number of files (because not all of the file names would fit in a single macro). The current version also reports the file size.



            • #7
              Since you rarely see examples of recursive ado file programming on Statalist, here's the code for an older version of filelist:

              Code:
              *! 1.0.3 19feb2014 
              *! Robert Picard   [email protected]
              program define filelist
              
                  version 9.2
                  
                  syntax , ///
                      [ ///
                      Directory(string) ///
                      List ///
                      Pattern(string) ///
                      noRECursive ///
                      replace ///
                      Save(string) ///
                      ]
              
                  // default to all files pattern if not specified
                  if "`pattern'" == "" local pattern "*"
                  
                  // default to current directory if not specified
                  if "`directory'" == "" local directory .
                  
                  preserve
                  
                  clear
                  gen dirname = ""
                  gen filename = ""
                  
                  filelist_recursive ,  dir("`directory'") pat("`pattern'") `recursive'
                  
                  dis as txt "Number of files found = " as res `=_N'
                  
                  sort dirname filename
                  
                  qui leftalign
                  
                  if "`list'" != "" {
                      gen filepath = dirname + "/" + filename
                      qui leftalign filepath
                      list filepath, noobs sepby(dirname)
                      drop filepath
                  }
                  
                  if "`save'" == "" restore, not
                  else {
                      save "`save'", `replace'
                      restore
                  }
                  
              
              end
              
              
              program define filelist_recursive
              
                  version 9.2
                  
                  syntax , ///
                      [ ///
                      Directory(string) ///
                      Pattern(string) ///
                      noRECursive ///
                      ]
                  
                  cap local flist: dir "`directory'" files "`pattern'"
                  if _rc == 134 {
                      dis as err "too many filenames in directory `directory'"
                      
                  }
                  
                  local i = _N
                  qui foreach f of local flist {
                      set obs `++i'
                      replace dirname = `"`directory'"' in `i'
                      replace filename = `"`f'"' in `i'
                  }
                      
                  if "`recursive'" == "" {
                      cap local dlist: dir "`directory'" dirs "*"
                      foreach d of local dlist {   
                          filelist_recursive ,  dir("`directory'/`d'") pat("`pattern'")
                      }
                  }
                  
              end
              
              
              program define leftalign
              
                  version 9.2
                  
                  syntax [varlist]
                  
                  qui ds `varlist', has(type string) 
              
                  foreach v in `r(varlist)' { 
                      local t : type `v' 
                      local n : subinstr local t "str" "" 
                      format `v' %-`n's
                  }
                  
                  des `r(varlist)'
                      
              end



              • #8
                Lucia Goin and Robert Picard: Thanks for your comments and code. I could also have read the manual entry on limits to find out what was wrong.

                I actually wrote a program similar to filelist but with a different approach. Unfortunately it only works on Windows. I run a dir command in a shell and store the output in a text file, which is then parsed in Stata to retrieve the relevant information. Besides the names and sizes of the files, you get other information such as each file's creation date. In another version of the program I used the recursive approach in Mata that Robert has implemented, and I am happy to see that I came up with a very similar solution, in particular the use of fseek() to compute the size of the file. Searching the net, I found that this approach is not recommended in C for binary files, but I don't know whether it is also an issue in Mata.

                My conclusion from this thread is that recursive Stata programs (ado-files) have a limitation: whether it matters depends on how likely you are to hit the limit of 64 nested do-files.
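
To tie this back to the original question, here is a rough sketch of how the nested loops in filetree could be collapsed into one recursive subprogram with an explicit depth guard. The name filetree_walk, its three arguments, and the cutoff of 50 levels are all assumptions made for illustration (the cutoff is simply a margin below the 64-level limit discussed above). It also assumes that the file handle opened by the caller can be used inside the subprogram by name, and it shares the macro-length weakness Robert mentioned for folders with very many files.

```stata
capture program drop filetree_walk
program define filetree_walk
    // Hypothetical helper.
    // dir   = folder to list
    // depth = nesting level (0 for the top folder)
    // fh    = name of a file handle already opened by the caller
    args dir depth fh

    // Stop with an error well before the nesting limit is reached
    if `depth' > 50 {
        display as error "more than 50 levels deep at `dir'"
        exit 198
    }

    // One leading comma per level, so each level lands in its own column
    local pad
    forvalues k = 1/`depth' {
        local pad "`pad',"
    }

    // Write the folder, then its files one column to the right
    file write `fh' `"`pad'`dir'"' _n
    local files : dir "`dir'" files "*"
    foreach f of local files {
        file write `fh' `"`pad',`f'"' _n
    }

    // Recurse into each subfolder; every call gets its own locals
    local subdirs : dir "`dir'" dirs "*"
    foreach d of local subdirs {
        filetree_walk "`dir'/`d'" `=`depth' + 1' `fh'
    }
end

// Usage: open the report, walk the tree, close the report
file open report using "filetree.csv", write replace
filetree_walk "C:/Users/Username/Dropbox/My_Folder" 0 report
file close report
```

Forward slashes are used in the paths because Stata accepts them on Windows and they avoid the problem of a backslash immediately before a macro reference.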
