-project- by Robert Picard: Running individual do-files

Philipp Giesa

Join Date: Oct 2018

Posts: 3
#1

25 Oct 2018, 06:41

Hey,

I started using the program -project- by Robert Picard, to manage my current research project and share it with colleagues.
It works just fine when building the complete project, but I cannot run individual do-files anymore (same applies to his Stata 12 example, so I guess it's not a coding mistake on my end).
Stata prints the error:
". version 12
. project, doinfo
no project being built
r(198);"

Can anyone tell me how to run individual do-files while using -project-? I ask because I think the program makes working in groups with Stata a lot easier.

Thanks in advance!
Philipp
Robert Picard

Join Date: Mar 2014

Posts: 1536
#2

25 Oct 2018, 13:22

project (from SSC) assumes that an unchanged do-file will always produce the same results if its inputs have not changed. When you use project, you build on this assumption and create a complex web with tens, hundreds, or even thousands of do-files. You hand over to project the task of tracking what has changed and the next time you build the project, only do-files affected by changes will run. In a typical workflow, the do-file you are currently editing will be the only do-file that runs when you build the project. But if you go back and edit a do-file that is further upstream, many do-files may run on the next build if these are affected in any way by the change.

In order to use project, you have to embed in your do-files directives to indicate files used and created and directives to run nested do-files. Unfortunately, the downside is that you cannot run a do-file separately. But why would you want to do that? Just build the project again. If the only thing that has changed is this single do-file, then that's all that will run when you build the project again.
Philipp Giesa

Join Date: Oct 2018

Posts: 3
#3

26 Oct 2018, 02:28

Thanks Robert for the quick reply.
I get your reasoning and it makes sense.
In my case, I just wanted to run a do-file that edits a dataset and check the results immediately, e.g., in the Data Browser.
But I guess I can just load this dataset after each project run.
Thanks again for your reply and the awesome program!
Best
Philipp

Robert Picard

Join Date: Mar 2014
Posts: 1536

26 Oct 2018, 09:44

Note that if you want to inspect results as you develop a do-file, just put

Code:

project, break

where you want the do-file to stop, and then build the project. The execution will stop at that point and the data in memory will be preserved. There's nothing special about this directive; it just generates an error which stops execution of the do-file. Personally, I use:

Code:

exit 99

but any invalid command or syntax that throws an error will do the trick. What follows is copied straight from the Results window.

Code:

.         
. * I'm in the middle of a do-file I'm working on...
. 
.         sysuse auto, clear
(1978 Automobile Data)

.         poop
command poop is unrecognized
r(199);

end of do-file
      name:  plog_1
       log:  /Users/robert/Documents/temp/examples_v12/ex12.smcl
  log type:  smcl
 closed on:  26 Oct 2018, 11:38:15
------------------------------------------------------------------------------------------------------------------------------------
r(199);

. list make price in 1

     +---------------------+
     | make          price |
     |---------------------|
  1. | AMC Concord   4,099 |
     +---------------------+

.


Philipp Giesa

Join Date: Oct 2018

Posts: 3
#5

29 Oct 2018, 03:11

Ok, that was too simple for me, I guess.

Thank you, Robert! Again, great work. It was a lot of work to bring my project into the -project- environment, but it was totally worth it, especially for the future.

PS: Have my like for the "poop break".
Yehuda Davis

Join Date: Dec 2016

Posts: 24
#6

27 Dec 2018, 12:26

Robert Picard, thank you for this awesome utility!

I would like to add my 2 cents.
I have several projects that use very large datasets (>1GB).
While I am running analysis on the final dataset, I am constantly changing and re-running my analysis.do file. The problem with constantly re-building the project is that verifying the large files takes a significant amount of time.
If I just run analysis.do on its own, I get an error from the -project, uses()- and similar commands within the do-file.

My "dirty" solution was to comment out the exit code on line 1762 of project.ado so that the error message is printed, but the code runs on.
Lines 1761--1762 before:

Code:

dis as err "no project being built"
exit 198

After:

Code:

dis as err "no project being built"
exit //198
Raza Ali

Join Date: Aug 2015

Posts: 5
#7

29 Dec 2018, 05:06

I second the thanks for this excellent program, Robert Picard.

I agree with Yehuda Davis that verifying large files (of which there may be many) takes quite a bit of time. (I gather this is done using -checksum-.)

I'd like to highlight a solution offered on Michael Stepner's outstanding GitHub repository, where extensive details are provided. Briefly, the solution is to redefine -project- so that it displays a message when a do-file is run interactively and proceeds as usual during a project build. To implement this, a specific do-file header is required, together with an .ado program that displays the message when running interactively.

Hope this helps.
Robert Picard

Join Date: Mar 2014

Posts: 1536
#8

29 Dec 2018, 08:17

For those seeking relief from the slowdowns due to dependency checks on very large files, you can download this beta version of project that includes an option to relax dependency checks for files over a specified threshold. Type

Code:

which project

to find out where your current version is installed and copy the three files in the downloaded zip archive there.

Once you have replaced the files, you should type discard to tell Stata to forget about the old version and type which project again to confirm that Stata accesses the new version. Here's what it looks like on my computer:

Code:

. discard

. which project
/Users/robert/Library/Application Support/Stata/ado/personal/project.ado
*! version 2.0.0b4 09jul2018 Robert Picard, [email protected]

.

You will need to redefine the projects you are working on using project, setup. This will bring up the dialog that lets you select the master do-file. You will see the new option "Relax dependency checks for files over". Check the box and pick a size. The default is 100M, but I find that 20 works well. Files over the threshold are only checked on file size (which is very quick to check). You can then build your project again and hopefully you will see a significant improvement in performance.

I had the opportunity to work with very large datasets recently (>5GB) and the new version works like a charm. I need to update the help file to document this option and a few new features as well as revamp the example projects. I'll try to get to it soon, it's long overdue.

Last edited by Robert Picard; 29 Dec 2018, 08:20.
Yehuda Davis

Join Date: Dec 2016

Posts: 24
#9

17 Jan 2022, 09:53

Originally posted by Robert Picard

For those seeking relief from the slowdowns due to dependency checks on very large files, you can download this beta version of project that includes an option to relax dependency checks for files over a specified threshold.

Robert Picard, I am remiss for never having thanked you for this update!

Edit: I wrote this because I thought the link wasn't working, but apparently it is working—my bad.
Gary Lind

Join Date: Jul 2023

Posts: 3
#10

03 Jul 2023, 07:13

Robert Picar
Gary Lind

Join Date: Jul 2023

Posts: 3
#11

03 Jul 2023, 07:17

Robert Picard, thank you for the helpful package. The link you provide above to download the beta files is broken. Is there a new location from which to download the files? I found a fork of your project here, https://github.com/michaelstepner/project_stata, which implements some of your suggestions above, but which is not running for me. Many thanks if you are able to provide a new link.
Yehuda Davis

Join Date: Dec 2016

Posts: 24
#12

03 Jul 2023, 11:32

I believe that version is available at this commit: https://github.com/michaelstepner/pr...67f58632d3405e
Robert Picard

Join Date: Mar 2014

Posts: 1536
#13

03 Jul 2023, 16:24

Sorry about that. I retired my web site a few months ago, hence the link to nowhere. For the moment, you can access the most recent beta version of project at:
https://www.dropbox.com/s/vsq594hpl7...30528.zip?dl=0
This is a zip archive that contains three files. Copy the three files to your PERSONAL Stata directory (type sysdir to locate it on your system).

If this Dropbox link becomes unavailable in the future, please email me at [email protected] and I'll post the new location or send the files by email.

This version includes the functionality described in #8 as well as other features that remain undocumented.

With respect to the version(s) of project mentioned in #11 and #12, it is my position that I retain the copyright on all my programs; I did not consent to a public fork of project and I have requested that Michael remove it from GitHub.
Gary Lind

Join Date: Jul 2023

Posts: 3
#14

05 Jul 2023, 11:33

Thank you for the updated files, Robert. They work exactly as expected. And thanks again for such a useful contribution.