  • Question about -project-: "using" an entire directory

    (Note: I do not actually need help with this problem anymore. Thanks, Robert! However, Robert asked that I post to Statalist so that he could share his answer more broadly.)

    I have a do file that operates on an entire directory of files. Currently, it contains a command like the following:
    Code:
    local files : dir "`latex_files'" files "*.tex"
    foreach file in `files' {
      project, uses("`latex_files'/`file'")
    }
    When a new file is added to the directory, the do file is not triggered, since -project- does not understand that the do file uses this newly-created file.

    Is there an elegant way to make this work? I’d rather not have to manually specify every single file I add to the directory.

    As an extension of this problem, what if my code had -project, original(…)- in place of -project, uses(…)-? In other words, what if these files were added exogenously, instead of created endogenously by upstream do files?
    Last edited by Nils Enevoldsen; 16 Jan 2015, 10:12. Reason: Hm. My tags didn't get picked up the first time.

  • #2
    Indeed, Nils contacted me directly and since this is such an interesting question, I asked him to post it here so that all could benefit from the discussion. Here's my initial reply (slightly edited):

    Since your code snippet includes a project, uses(), this means that the files are created somewhere else upstream in the project. So I'll assume that you want to post-process, in a single do-file, a bunch of LaTeX files created upstream in the project.

    Since you want to dynamically discover LaTeX files in your post-processing do-file without having to manually declare dependencies, you have to somehow make the do-file dependent on what you would find.

    One approach would be to update a list of all LaTeX files in the project each time you create, modify, or remove a LaTeX file. But that could entail multiple do-files editing and overwriting the same list, which would violate the rules for tracking dependencies, since more than one do-file could alter the list.

    One solution is to bank on the fact that the project's master do-file will always run (except of course if there is absolutely no change in the project). I would recommend using filelist (mine, from SSC) because it creates a dataset of files and also because it can scan directories recursively. This means that you do not need to place all your LaTeX files in the same directory (i.e. each table can be created in a separate directory and all output can be saved locally with the do-file that creates them). It would look something like:

    Code:
    *----------- master do-file -----------
    * Run a bunch of nested do-files that create LaTeX files
    project, do("create_a_bunch_of_tex.do")
    
    * create a list of all LaTeX files in a specific directory
    filelist, dir("`c(pwd)'/tables") pat("*.tex")
    export delimited "LaTeX_files.csv", replace
    project, creates("LaTeX_files.csv")
    
    project, do("post-process_LaTex.do")
    *------------------------------------------
    Code:
    *--------- post-process_LaTex.do ------------
    project, uses("LaTeX_files.csv")
    import delimited "LaTeX_files.csv", varnames(1)
    forvalues i = 1/`c(N)' {
        local f = dirname[`i'] + "/" + filename[`i']
        project, uses("`f'") preserve
    }
    *------------------------------------------------------
    If "LaTeX_files.csv" changes in any way, "post-process_LaTex.do" will run again. If the set of LaTeX files does not change, -project- will use the dependencies declared in "post-process_LaTex.do" to determine whether it needs to run again.
    Last edited by Robert Picard; 16 Jan 2015, 11:09.
    • #3
      Nils' extension of the original problem raises an interesting workflow issue: what if the set of files to be dynamically discovered is not known to project beforehand?

      One can indeed generate project, original() dependencies for files found in a directory, and if any of these files change or are removed, project will notice and run the appropriate do-files. But if a file is simply added to the directory (while nothing else in the project changes), project has no way to observe the addition unless something else changes such that the do-file that builds the file list is run again.
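
      For concreteness, here is a sketch of how the post-processing do-file from #2 might be adapted to declare original() dependencies for exogenous files instead. This is an untested adaptation, not code from the thread; in particular, carrying the preserve option over from the uses() version is an assumption:

      Code:
      *------ declare exogenous files found in a directory ------
      project, uses("LaTeX_files.csv")
      import delimited "LaTeX_files.csv", varnames(1)
      forvalues i = 1/`c(N)' {
          local f = dirname[`i'] + "/" + filename[`i']
          * declare each discovered file as an exogenous input
          project, original("`f'") preserve
      }
      *----------------------------------------------------------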

      A replication build would note the new file(s), but the issue here is the special case where the only thing that changes is one or more new files in an input directory. This comes down to how to force project to run the master do-file and rescan the input directory for new content. Here's an approach:

      Code:
      *----------- master do-file -----------
      * note the state of the world in a directory
      filelist, dir("`c(pwd)'/world")
      export delimited "world_state.csv", replace
      project, creates("world_state.csv")
      
      * do stuff
      
      * clean up the temporary state file
      erase "world_state.csv"
      *------------------------------------------
      Last edited by Robert Picard; 16 Jan 2015, 11:11.
