  • Maximum file size among files in a folder

    Hi,

    I would like to automate finding the maximum file size among the files in a folder. If the size of a file were stored somewhere after running the "ls" command, say in r(size), code like this would work:

    clear
    local test = "C:/Users/Stata/Test"
    cd `test'
    local max = 0
    local filelist: dir "." files "*.dta"
    foreach file in `filelist' {
    ls `file'
    local size = r(size)
    if `size'>`max'{
    local max = `size'
    }
    }
    di `max'

    Looking forward to your hints and suggestions.

    Thank you!

    Max

  • #2
    -checksum- returns the length of the file in bytes in r(filelen).



    • #3
      Thank you Aljar!



      • #4
        This is an easy but inefficient way of determining the file size: checksum also reads the whole file, which takes time. For a bunch of multi-GB files, that can take a long time. Sergiy
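
        One way to avoid reading the file contents is to open each file in Mata, seek to its end, and ask for the position. This is only a minimal sketch: maxfilesize() is a hypothetical helper name, the "*.dta" pattern is just an example, and I have not checked how it behaves on very large files.

        ```stata
        mata:
        // Hypothetical helper: largest file size (in bytes) among files in the
        // current directory that match pattern, without reading their contents.
        real scalar maxfilesize(string scalar pattern)
        {
            string colvector files
            real scalar      i, fh, size, maxsize

            files   = dir(".", "files", pattern)
            maxsize = 0
            for (i = 1; i <= rows(files); i++) {
                fh = fopen(files[i], "r")
                fseek(fh, 0, 1)        // whence = 1: offset is relative to end of file
                size = ftell(fh)       // position at end of file = length in bytes
                fclose(fh)
                if (size > maxsize) maxsize = size
            }
            return(maxsize)
        }
        end

        mata: st_numscalar("maxsize", maxfilesize("*.dta"))
        display %15.0fc scalar(maxsize)
        ```

        If no file matches the pattern, the loop is skipped and the result is 0.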



        • #5
          Running the code below in a directory with 10 files of about 1 GB each took 55 seconds. The same code took 13 seconds for a directory with about 3000 files with a combined size of 1.2 GB.
          Code:
          local max = 0
          local filelist: dir . files "*.*"
          foreach file in `filelist' {
            quietly checksum "`file'"
            local size = r(filelen)
            if `size' > `max' {
              local max = `size'
            }
          }
          di `max'



          • #6
            The new version of filelist (from SSC) can determine the file size with no overhead. See this recent announcement.



            • #7
              I compared filelist to the code in post #5. I looked again at 10 files of about 1 GB each. With the code from post #5 it took 58 seconds to identify the size of the largest file.
              Code:
              timer on 1
              local max = 0
              local filelist: dir . files "*.*"
              foreach file in `filelist' {
                quietly checksum "`file'"
                local size = r(filelen)
                if `size' > `max' {
                  local max = `size'
                }
              }
              di `max'
              timer off 1
              timer list 1
              Result of above code:
              Code:
              . di `max'
              1.014e+09
              
              . timer list 1
                 1:     58.01 /        1 =      58.0120
              With Robert's filelist the size of the largest file was returned almost immediately, in less than 1 second. Unfortunately, I couldn't time the task because filelist seems to clear all timers.
              Code:
              timer on 1
              filelist
              sum fsize
              di r(max)
              timer off 1
              timer list 1
              Result of above code (note that timer list returns no result):
              Code:
              . di r(max)
              1.014e+09
              
              . timer list 1
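
              The filelist approach can be extended slightly to also report which file is the largest. This is a sketch under a few assumptions: filelist is installed from SSC, at least one file matches, and the pattern() and norecursive options and the dirname and filename variables work as I recall them from the help file (see help filelist to confirm). Note that filelist replaces the data in memory.

              ```stata
              * One observation per matching file in the current directory
              * (norecursive is assumed to switch off the search of subdirectories)
              filelist, pattern("*.dta") norecursive

              * Largest file size in bytes
              summarize fsize, meanonly
              display %15.0fc r(max)

              * Sort by descending size and show the largest file
              gsort -fsize
              list dirname filename fsize in 1
              ```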



              • #8
                Thanks again Friedrich. I had to scratch my head on this one for a few seconds. Turns out that the clear statement in the program is interpreted using version 9 syntax and that apparently included clearing the timers. I switched to drop _all and that should fix the problem. I'll continue holding on to the fixed version for a few days just in case more comes up.

                In the meantime, you can use

                Code:
                set rmsg on
                to display the execution time of each command.
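
                For instance, the filelist steps from post #7 could be timed along these lines (a sketch only; with rmsg on, Stata prints a return message with the elapsed time after each command):

                ```stata
                set rmsg on
                filelist
                summarize fsize, meanonly
                display r(max)
                set rmsg off
                ```

                The elapsed times are reported per command, so they have to be added up by hand to get the total for the task.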



