  • Loop over all folders in a folder

    I learned how to loop over files in a folder: https://www.stata.com/statalist/arch.../msg00620.html

    I learned how to loop over variables in a file: https://www.stata.com/statalist/arch.../msg01140.html

    But now I want to "cd" into every subfolder of my top-level folder and run some operations in each one.

    Suppose the top-level folder contains "ultimate" and "main", where "ultimate" contains "rabbit" and "dog", and "main" contains "tiger", "lion", and "cat". Done manually, what I want looks like this:

    Code:
    cd ultimate
    cd rabbit
    (some operations)
    cd ..
    cd dog
    (some operations)
    cd ..
    cd ..
    
    cd main
    cd tiger
    (some operations)
    cd ..
    cd lion
    (some operations)
    cd ..
    cd cat
    (some operations)
    cd ..
    cd ..
    But I am wondering whether this can be done more cleverly, with something like this:

    Code:
    Code to loop over "ultimate" and "main" {
    
         Code to loop over "rabbit" and "dog" when it's "ultimate"
    
         Code to loop over "tiger" and "lion" and "cat" when it's "main"
    
    }

  • #2
    Something like this:

    Code:
    local level0_folder "<put in top level directory path here>"
    
    local level1_folders: dir `"`level0_folder'"' dirs *
    foreach fldr1 of local level1_folders {
        cd `"`level0_folder'/`fldr1'"'
        dis `"Working in sub-folder `fldr1' ..."'
        * various commands for the level 1 sub-folders go here
        local level2_folders: dir `"`level0_folder'/`fldr1'"' dirs *
        foreach fldr2 of local level2_folders {
            cd `"`level0_folder'/`fldr1'/`fldr2'"'
            dis `"Working in sub-folder `fldr2' ..."'
            * various commands for the level 2 sub-folders go here
        }
    }

    • #3
      Hi James, it sounds like you want to recursively search through a folder and its subfolders. A recursive command call will allow you to walk through all levels of a directory tree, regardless of how many levels there are, or how many subdirectories are on each level. I provide some pseudocode below. I don't use Stata's program syntax because I think this is conceptually cleaner in this case, but you will need to write your own Stata program if you want this to work.

      Code:
      function recursive_directory_search(current_directory)
          foreach sub_directory in current_directory:
              recursive_directory_search(sub_directory)
          // do the work for the current directory here, after its sub-directories
      end recursive_directory_search
      Recursion is a pretty advanced programming technique. It's simple enough in theory (the function calls itself), but can be difficult to put into practice. There can also be hidden efficiency costs associated with recursion that can be especially confusing for the uninitiated. I would expect a second-year undergraduate computer science or engineering student to be able to implement this algorithm correctly on their own, but these are fields where loops and recursion are presented early and covered in a number of different contexts. Your mileage may vary.
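
      For what it's worth, the pseudocode above could be sketched in Stata roughly as follows. This is only a sketch, not code from the thread: the program name walk_dirs is made up for illustration, the starting point is assumed to be the current working directory, and the "work" is just a display line standing in for real operations. It uses the same "dir ... dirs" extended macro function as the code in #2.

      ```stata
      * Hypothetical program name, chosen for this sketch only
      capture program drop walk_dirs
      program define walk_dirs
          args current_dir
          * List the immediate sub-directories of the directory passed in
          local subdirs : dir `"`current_dir'"' dirs "*"
          foreach sub of local subdirs {
              * Recurse first, so deeper folders are handled before their parents
              walk_dirs `"`current_dir'/`sub'"'
          }
          * Work for the current directory goes here (placeholder below)
          quietly cd `"`current_dir'"'
          display as text `"Working in `current_dir' ..."'
      end

      * Assumption: start the walk from the current working directory
      local start `"`c(pwd)'"'
      walk_dirs `"`start'"'
      * Return to where we started
      quietly cd `"`start'"'
      ```

      Each call gets its own copy of the local macros, so the list of sub-directories at one level is not overwritten by the call for the level below. Note that the starting folder itself is also visited, last of all.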

      • #4
        To add to Daniel's point: the code in #2 is most useful if, as described, you have two levels of sub-folders and you need to carry out essentially the same set of operations in every sub-folder at a given level. If you want much more general code that allows for any possible depth of folder tree, that will probably need recursion. I would probably also code it somewhat differently if you wanted entirely different operations to be carried out in every sub-folder.

        • #5
          Yes, I have to agree. There are a number of ways to do this kind of thing depending on the underlying problem. For example, you might want to be able to pass a function into your function. This new function could determine the actual business logic that needs to be taken care of in each directory. That way you could write any function you like and have it run inside each subdirectory. That would be cool (I think), and useful in a framework, but even more difficult to write and to debug. Changing the logic "on the fly" so that each sub-directory has completely different business logic (as in Hemanshu's example) would be harder still to generalize.

          I was taught that one ought to write general code that makes it easier to solve a class of problems. This is how software engineers, the authors of a framework, Stata engineers, or even those on this forum who write user-contributed commands tend to think. But the reality is that if you have a very specific "one off" problem that you don't expect to see again, it is often smarter to write "dumb", simpler code that is nonetheless easier to think through, easier to read, and easier to debug. If tomorrow you find yourself writing nearly the same dumb code, then maybe it's time to start thinking about generalizing. Good programmers should always remember the KISS principle: keep it simple, stupid.
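
          As a rough illustration of the "pass a function into your function" idea, one possible Stata sketch is to hand the walker the name of another program and let it call that program on each directory. Everything named here (walk_dirs_do, show_dir, the action argument) is hypothetical and only for illustration, and the starting folder is assumed to be the current working directory.

          ```stata
          * Hypothetical walker: second argument is the name of a program to run
          capture program drop walk_dirs_do
          program define walk_dirs_do
              args current_dir action
              local subdirs : dir `"`current_dir'"' dirs "*"
              foreach sub of local subdirs {
                  * Pass the same action down to every sub-directory
                  walk_dirs_do `"`current_dir'/`sub'"' `action'
              }
              * Call the supplied program with the current directory as its argument
              `action' `"`current_dir'"'
          end

          * Hypothetical action: stands in for the real per-directory logic
          capture program drop show_dir
          program define show_dir
              args dir
              display as text `"Visiting `dir'"'
          end

          * Assumption: walk the tree below the current working directory
          walk_dirs_do `"`c(pwd)'"' show_dir
          ```

          Swapping in a different program name on the last line changes what happens in each folder without touching the walker, which is the appeal, but it is also one more layer to debug.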

          Why not just list out the path to every directory you care about (regardless of the level) and loop over that?
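
          A minimal sketch of that simpler approach, using the folder names from the original question. It assumes the top-level folder is the current working directory and that none of the folder names contain spaces; the display line is a placeholder for the real operations.

          ```stata
          * Assumption: the top-level folder is the current working directory
          local top `"`c(pwd)'"'
          * Every folder of interest, written out by hand (names from the question)
          local dirs_to_visit ultimate/rabbit ultimate/dog main/tiger main/lion main/cat
          foreach d of local dirs_to_visit {
              quietly cd `"`top'/`d'"'
              display as text `"Working in `d' ..."'
              * operations for this folder go here
          }
          * Return to the top-level folder when done
          quietly cd `"`top'"'
          ```

          There is nothing to generalize here, but it is easy to read and easy to change when a folder is added or dropped.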

          • #6
            Incidentally, the Unix rm -rf command works by recursing through a directory and its subdirectories, deleting the directory tree starting from the bottom and moving all the way up to the directory you actually want to delete.
