  • Remove interim dta files

    In merging and cleaning data, it is not uncommon to create many interim dta files before arriving at the final dta file.

    There is often no reason to keep the interim dta files, and the clutter can be confusing. Is there a tidy command to delete all the dta files except the final one in a given folder?

  • #2
    Here's one solution. Wait until your final data is in memory but hasn't been saved to disk yet, then run this:
    Code:
    shell del *.dta /* Remove interim datasets */
    save final_data.dta, replace /* Save the final dataset. */
    Limitations:
    • The del command is Windows (DOS) specific. If you're using macOS or Unix, replace del with rm.
    • The command assumes that the only dta files in the working directory are ones that you won't need after your final dta file is ready. If you have dta files from some other sources, which you might like to keep, my solution will delete them and you'll need to do something else.
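    A Stata-only variant of the same idea avoids the shell, and with it the del/rm difference. It loops over the directory listing and calls -erase- on every .dta file except the one to keep. This is a minimal sketch that assumes the final file is named final_data.dta (a hypothetical name, adjust as needed) and that every other .dta file in the working directory really is disposable. The warning about permanent deletion applies here too.

```stata
* Sketch: delete every .dta file in the working directory except the final one.
* Assumes final_data.dta is the file to keep (hypothetical name).
save final_data.dta, replace

local keep "final_data.dta"
local files : dir "." files "*.dta"

foreach f of local files {
    * On Windows, -dir- returns lowercase names, so compare in lowercase
    if lower("`f'") != lower("`keep'") {
        display as text "erasing `f'"
        erase "`f'"
    }
}
```

Printing each name before erasing it leaves a record in the log of what was removed.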

    • #3
      Well, why not create the interim files as -tempfile-s in the first place? Then Stata will automatically remove them when the do-file or program you are running finishes, with no explicit code required.
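      A sketch of that workflow, using the auto dataset that ships with Stata: the interim pieces live only in tempfiles, and only the final merged dataset is written to the working directory. The tempfile names (prices, mileage) and the variables kept are illustrative choices, not part of the original post. Wrapping the tempfile references in double quotes is an optional precaution in case the temporary directory's path contains spaces.

```stata
* Sketch: build the final dataset from interim pieces held in tempfiles.
tempfile prices mileage

sysuse auto, clear
keep make price
save "`prices'"

sysuse auto, clear
keep make mpg
save "`mileage'"

use "`prices'", clear
merge 1:1 make using "`mileage'"
drop _merge

* Only the final dataset is written to the working directory
save final_data.dta, replace
* The two tempfiles are erased automatically when this do-file ends
```

Run the block in one go (for example, as a single do-file). The tempfile names are stored in local macros, so executing the lines piecemeal can leave the later references empty.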

      • #4
        Originally posted by paulvonhippel
        Limitations:
        • If you have dta files from some other sources, which you might like to keep, my solution will delete them
        It might be worth noting that the files are deleted permanently; no recovery is possible. I guess this might be one of the reasons Stata's built-in erase (synonym rm) does not allow wildcards. The approach might be convenient but it is also dangerous. Use it with caution! As pointed out by Clyde, temporary files are generally the safer approach.

        • #5
          Thanks! I did not know about tempfiles. To use them, as I understand it, I would just precede every save with a tempfile using the same file name -- e.g.,
          Code:
          tempfile data_so_far
          save data_so_far, replace
          Alternatively, I could have one monster tempfile statement naming all the tempfiles in the do file header --
          Code:
          tempfile a b c d e
          ...
          save a, replace
          ...
          save b, replace
          ...
          save c, replace
          save data_so_far, replace
          That's a little more concise, but it also seems harder to maintain: every time you write a new save statement or change a data file name, you'd have to go back to the header and update the tempfile statement.

          • #6
            Yes, your conceptual understanding is correct. For the reason you noted, my own practice is to do a separate -tempfile- statement for each file, rather than one master at the beginning.

            But you don't have the syntax quite right. It should be
            Code:
            tempfile data_so_far
            save `data_so_far'


            Similarly, any subsequent use of the file (whether -use-, or -merge- or whatever) must include the local macro quotes around the name.
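            To illustrate the quoting rule, here is a short sketch using the auto dataset. The name data_so_far follows the posts above. The surrounding double quotes are an optional precaution against spaces in the temporary directory's path; the local macro quotes are the part that is required.

```stata
* Sketch: every later reference to a tempfile needs the local macro quotes.
tempfile data_so_far
sysuse auto, clear
save "`data_so_far'"

* The macro holds a full path in Stata's temporary directory
display "`data_so_far'"

use "`data_so_far'", clear
append using "`data_so_far'"
count    // 148 observations: the 74 in auto, twice

* Wrong: -use data_so_far- (no macro quotes) would look for
* data_so_far.dta in the working directory, not for the tempfile
```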

            • #7
              Then, just for completeness: these files are stored in the directory that the environment variable STATATMP points to (or, if it is not set, the operating system's default temporary directory). Occasionally, for example when Stata crashes, those temporary files can linger. With Stata closed, you can safely go to that directory and clean out those files without risking the datasets you actually want to keep.
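              One way to find that directory from inside Stata is sketched below. It reads STATATMP with the -environment- extended macro function (empty if the variable is not set). It also creates a throwaway tempfile, here called probe (a hypothetical name), whose full path shows the directory Stata is actually using.

```stata
* Sketch: locate the directory Stata uses for temporary files.
local statatmp : environment STATATMP
display as text "STATATMP = `statatmp'"

* The path of any tempfile reveals the directory actually in use
tempfile probe
display as text "`probe'"
```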
