
  • -project- by Robert Picard: Running individual do-files

    Hey,

    I started using the program -project- by Robert Picard, to manage my current research project and share it with colleagues.
    It works just fine when building the complete project, but I cannot run individual do-files anymore (same applies to his Stata 12 example, so I guess it's not a coding mistake on my end).
    Stata prints the error:
    ". version 12
    . project, doinfo
    no project being built
    r(198);"

    Can anyone tell me how to run individual do-files, while using -project-? Because I think the program makes working in groups with Stata a lot easier.

    Thanks in advance!
    Philipp

  • #2
    project (from SSC) assumes that an unchanged do-file will always produce the same results if its inputs have not changed. When you use project, you build on this assumption and create a complex web with tens, hundreds, or even thousands of do-files. You hand over to project the task of tracking what has changed and the next time you build the project, only do-files affected by changes will run. In a typical workflow, the do-file you are currently editing will be the only do-file that runs when you build the project. But if you go back and edit a do-file that is further upstream, many do-files may run on the next build if these are affected in any way by the change.

    In order to use project, you have to embed in your do-files directives to indicate files used and created and directives to run nested do-files. Unfortunately, the downside is that you cannot run a do-file separately. But why would you want to do that? Just build the project again. If the only thing that has changed is this single do-file, then that's all that will run when you build the project again.
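    To make the directives concrete, here is a minimal sketch of a do-file that belongs to a project. The file names are hypothetical, and the option names original() and creates() are assumed to match those documented in the project help file, so check help project before relying on them.

    ```stata
    * clean_auto.do: hypothetical do-file that runs as part of a project build
    version 12
    project, doinfo                      // errors with r(198) when no build is running

    project, original("auto_raw.dta")    // input that no do-file in the project creates
    use "auto_raw.dta", clear
    drop if missing(rep78)
    save "auto_clean.dta", replace
    project, creates("auto_clean.dta")   // output that project should track

    * In the master do-file, this nested do-file would be run with (assumed syntax):
    *     project, do("clean_auto.do")
    ```

    Because every directive needs an active build, running this file on its own stops at the first project line, which is the r(198) error shown in the first post.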

    • #3
      Thanks Robert for the quick reply.
      I get your reasoning and it makes sense.
      In my case I just wanted to run a do-file which edits a dataset and check the results e.g. in the data-browser immediately.
      But I guess I can just load this dataset, after each project run.
      Thanks again for your reply and the awesome program!
      Best
      Philipp

      • #4
        Note that if you want to inspect results as you develop a do-file, just put
        Code:
        project, break
        where you want the do-file to stop, then build the project. Execution will stop at that point and the data in memory will be preserved. There's nothing special about this directive; it just generates an error that stops execution of the do-file. Personally, I use:
        Code:
        exit 99
        but any invalid command or syntax that throws an error will do the trick. What follows is copied straight from the Results window.
        Code:
        .         
        . * I'm in the middle of a do-file I'm working on...
        . 
        .         sysuse auto, clear
        (1978 Automobile Data)
        
        .         poop
        command poop is unrecognized
        r(199);
        
        end of do-file
              name:  plog_1
               log:  /Users/robert/Documents/temp/examples_v12/ex12.smcl
          log type:  smcl
         closed on:  26 Oct 2018, 11:38:15
        ------------------------------------------------------------------------------------------------------------------------------------
        r(199);
        
        . list make price in 1
        
             +---------------------+
             | make          price |
             |---------------------|
          1. | AMC Concord   4,099 |
             +---------------------+
        
        .

        • #5
          Ok, that was too simple for me, I guess.

          Thank you Robert! Again, great work, it was a lot of work to bring my project into the -project- environment, but it was totally worth it, especially for the future.


          PS: Have my like for the "poop break".

          • #6
            Robert Picard Thank you for this awesome utility!

            I would like to add my 2 cents.
            I have several projects that use very large datasets (>1GB).
            While I am running analysis on the final dataset, I am constantly changing and re-running my analysis.do file. The problem with constantly re-building the project is that verifying the large files takes a significant amount of time.
            If I just run analysis.do, I get an error from the project, uses(), etc. commands within the do-file.

            My "dirty" solution was to comment out the exit code on line 1762 of project.ado so that the error message is printed, but the code runs on.
            Lines 1761--1762 before:
            Code:
            dis as err "no project being built"
            exit 198
            After:
            Code:
            dis as err "no project being built"
            exit //198

            • #7
              Second the thanks for this excellent program Robert Picard


              Agree with Yehuda Davis that verifying large files (of which there may be many) takes quite a bit of time. (I gather this is done using -checksum-)

              I highlight a solution offered on Michael Stepner's outstanding GitHub repository, where extensive details are provided. Briefly, the solution is to redefine -project- so that it displays a message when a do-file is run interactively and lets a project build progress as usual. Implementing this requires a specific do-file header together with an .ado program that displays the message when running interactively.
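              The interactive-versus-build distinction can also be sketched directly in a do-file header. This is not the code from the repository mentioned above, only an illustration under stated assumptions: it relies on project, doinfo failing with r(198) outside a build (as in the first post), and the file name and the local macro building are hypothetical.

              ```stata
              * Hypothetical header for analysis.do; not the repository's implementation
              version 12
              capture project, doinfo              // capture suppresses the r(198) error
              local building = (_rc == 0)          // 1 during a build, 0 when run interactively

              if `building' {
                  project, uses("final.dta")       // register the dependency only during a build
              }
              use "final.dta", clear
              ```

              When the file is run interactively, the working directory is whatever it happens to be in the current session, so relative paths such as "final.dta" may need adjusting.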


              Hope this helps.

              • #8
                For those seeking relief from the slowdowns due to dependency checks on very large files, you can download this beta version of project that includes an option to relax dependency checks for files over a specified threshold. Type
                Code:
                which project
                to find out where your current version is installed and copy the three files in the downloaded zip archive there.

                Once you have replaced the files, you should type discard to tell Stata to forget about the old version and type which project again to confirm that Stata accesses the new version. Here's what it looks like on my computer:
                Code:
                . discard
                
                . which project
                /Users/robert/Library/Application Support/Stata/ado/personal/project.ado
                *! version 2.0.0b4  09jul2018  Robert Picard, [email protected]
                
                .
                You will then need to redefine the projects you are working on using project, setup. This brings up the dialog that lets you select the master do-file. You will see the new option "Relax dependency checks for files over". Check the box and pick a size. The default is 100M, but I find that 20 works well. Files over the threshold are checked on file size only (which is very quick to check). You can then build your project again and hopefully you will see a significant improvement in performance.
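                To see why checking only the size is cheaper, it helps to look at the two quantities that Stata's built-in checksum command reports for a file. The sketch below is an illustration only; it does not show how project performs its checks internally, and the temporary file is just a stand-in for a large dataset.

                ```stata
                * Illustration only: file length versus checksum for a saved dataset
                sysuse auto, clear
                tempfile f
                save "`f'"
                checksum "`f'"
                display "file length: " r(filelen)   // size in bytes
                display "checksum:    " r(checksum)  // computed from the file's contents
                ```

                The checksum depends on every byte of the file, so computing it means reading the whole file, whereas the size is available without scanning the contents. The trade-off is that a change that leaves the file size unchanged would go unnoticed for files above the threshold.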

                I had the opportunity to work with very large datasets recently (>5GB) and the new version works like a charm. I need to update the help file to document this option and a few new features as well as revamp the example projects. I'll try to get to it soon, it's long overdue.
                Last edited by Robert Picard; 29 Dec 2018, 08:20.

                • #9
                  Originally posted by Robert Picard
                  For those seeking relief from the slowdowns due to dependency checks on very large files, you can download this beta version of project that includes an option to relax dependency checks for files over a specified threshold.
                  Robert Picard I am remiss for never having thanked you for this update!

                  Edit: I wrote this because I thought the link wasn't working, but apparently it is working—my bad.

                  • #10
                    Robert Picar

                    • #11
                      Robert Picard, thank you for the helpful package. The link you provide above to download the beta files is broken. Is there a new location from which to download the files? I found a fork of your project here, https://github.com/michaelstepner/project_stata, which implements some of your suggestions above, but which is not running for me. Many thanks if you are able to provide a new link.

                      • #12
                        I believe that version is available at this commit: https://github.com/michaelstepner/pr...67f58632d3405e

                        • #13
                          Sorry about that. I retired my web site a few months ago, hence the link to nowhere. For the moment, you can access the most recent beta version of project at:
                          https://www.dropbox.com/s/vsq594hpl7...30528.zip?dl=0
                          This is a zip archive that contains three files. Copy the three files to your PERSONAL Stata directory (type sysdir to locate it on your system).

                          If this Dropbox link becomes unavailable in the future, please email me at [email protected] and I'll post the new location or send the files by email.

                          This version includes the functionality described in #8 as well as other features that remain undocumented.

                          With respect to the version(s) of project mentioned in #11 and #12, it is my position that I retain the copyright on all my programs; I did not consent to a public fork of project and I have requested that Michael remove it from GitHub.


                          • #14
                            Thank you for the updated files, Robert. They work exactly as expected. And thanks again for such a useful contribution.
