project management in Stata

Hendri Adriaens

#1

16 Apr 2021, 05:45

I'm looking for a good way to do proper project management in Stata. Some info:

1) I have lots of data files and lots of do files, organized in directories.
2) I have a few master do files that call the other do files in subdirs, which call other do files in subdirs.
3) Every do file mostly works on data in the same (sub)dir, hence the working dir should be set to a subdir when executing a do file in a subdir.
4) Do files should also be able to run independently from the master file.

I "solved" this for now by passing the subdir as an argument to do-files in subdirs. At the start of each such do-file, I check whether the argument is empty. If it is not (the file was called from the master), the argument is used to change the working dir. If it is empty (the file was run independently), the working dir is not changed.

This is of course very cumbersome and not safe when renaming subdirs.

Stata does offer a project file format, but that is just a list of links to files; it doesn't do project management. For instance, the working dir for a project is set to the directory of the project file, and all paths should be relative to it. This means that files can no longer be run independently, and the code will contain many more hardcoded paths, because files in subdirs also have to assume that the working dir is still the project dir. So, with projects, changing a dir name becomes even less workable (you need to check all your do-files for possible references to the renamed dir).

Is there any way to do a more proper project management in Stata?

I haven't found one, so the next question: is there any decent way, without passing hardcoded paths as arguments, to change the working dir to the directory of the do-file that is running?

I have had a look at the "project" package but that does not offer me what I need.

Thanks for any advice.
Leonardo Guizzetti

#2

16 Apr 2021, 07:19

Yes and no. You will eventually need to code some directories or paths manually, but you can be smart and minimize how much of that you do.

I think you've got a good idea about organizing code into different folders (presumably by task), and having a few files that "run" the bulk of the rest. What you could change is to create an "includes.do" file that is called just once by each "runner". The includes file essentially creates local macros that contain the metadata needed for the project directories. As it sounds like you -run- files within your "runner", simply pass them the path of the includes file, and change the code in each one to -include- it.

Let me know if that's clear enough.
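A minimal sketch of the includes-file idea, assuming the runner's working directory is the project root and is not changed before the sub do-file includes the file. The file name includes.do, the macro names, the "output" folder and mydata.dta are all hypothetical; the three parts below would live in three separate files.

```stata
* ---- includes.do (hypothetical): the one place where project paths are defined ----
* Assumes the working directory is the project root when this file is included.
local root    "`c(pwd)'"
local rawdata "`root'/01 data 200201"
local output  "`root'/output"

* ---- runner.do (hypothetical): pass the full path of includes.do to each do-file ----
run "01 data 200201/00 process.do" "`c(pwd)'/includes.do"

* ---- 01 data 200201/00 process.do: pull the shared macros into this file ----
args includes
include "`includes'"
* -include- runs in the current context, so `rawdata' and `output' now exist here
use "`rawdata'/mydata.dta", clear
```

Renaming a subfolder then means editing one line in includes.do, at the cost of every do-file depending on being called with that argument.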
Andrew Lover

#3

16 Apr 2021, 07:53

There's some nice code here using R as a project management wrapper to call complex Stata analyses:

https://www.lukemcguinness.com/post/...and-pushoverr/

If you think this might be your default workflow, it could be worth the investment of time and energy.

__________________________________________________
Assistant Professor, Department of Biostatistics and Epidemiology
School of Public Health and Health Sciences
University of Massachusetts Amherst
Hendri Adriaens

#4

19 Apr 2021, 01:29

Originally posted by Leonardo Guizzetti

Yes and no. You will eventually need to code some directories or paths manually, but you can be smart and minimize how much of that you do.

I think you've got a good idea about organizing code into different folders (presumably by task), and having a few files that "run" the bulk of the rest. What you could change is to create an "includes.do" file that is called just once by each "runner". The includes file essentially creates local macros that contain the metadata needed for the project directories. As it sounds like you -run- files within your "runner", simply pass them the path of the includes file, and change the code in each one to -include- it.

Let me know if that's clear enough.

Hm, this sounds like an interesting suggestion. If I understood correctly: create a do-file with all paths hardwired into macros and include that at the top of every single do-file in the project. But that can only work if you either specify absolute paths (a definite no-go) or specify all paths relative to the common root of the project, and that common root must always, in every case, be the working directory. In that sense, it could be combined with a Stata project file in the common root.

However, that no longer allows you to simply start a random do-file in a subdir and run it.

In more detail, the way I "solved" it now is this: when I start a runner (and Stata sets the working dir to the runner's path), I call:

do "01 data 200201\00 process" "01 data 200201"

As you can see, the subdir is also passed as an argument, which is ugly indeed.

At the top of such a do file I have:

if "`1'" != "" cd "`1'"

And at the bottom:

if "`1'" != "" cd ..

That way, I can run the do-files both from the runner and standalone. It is still not safe when renaming subdirs, but at least it hardwires the fewest dirs. If I took the common root as the starting point instead, I would need to hardwire many more dirs relative to it, meaning more maintenance and more ways to fail.
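One possible refinement of the header and footer above, sketched under the assumption that the runner passes the subdir as the first argument exactly as in the `do` call shown: record the caller's directory with `c(pwd)` and return to it at the end. Unlike `cd ..`, this also works when the argument is a nested path two or more levels deep. The local name `callerdir` is hypothetical, and like the original it does not restore the directory if the body stops with an error.

```stata
* Top of "01 data 200201/00 process.do"
* `1' is the optional subdir passed by the runner; it is empty when run standalone.
local callerdir "`c(pwd)'"
if `"`1'"' != "" {
    quietly cd `"`1'"'
}

* ... body: every path here is relative to this do-file's own folder ...

* Bottom: go back to the caller's directory (a harmless no-op when standalone).
quietly cd "`callerdir'"
```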
Hendri Adriaens

#5

19 Apr 2021, 01:30

Originally posted by Andrew Lover

There's some nice code here using R as a project management wrapper to call complex Stata analyses:

https://www.lukemcguinness.com/post/...and-pushoverr/

If you think this might be your default workflow, it could be worth the investment of time and energy.

Thanks for the suggestion, but because more people need to be able to work on the project, I want to minimize the requirements (like installing and learning R).
Hendri Adriaens

#6

29 Apr 2021, 01:47

I thought of another solution: the runner could change the working dir before calling the do-file and change it back afterwards. That way, the do-file always runs in its own "environment", together with the other do-files and data belonging to it. And of course, the do-file can also be run independently, because there are no paths hardwired into it.

"do" and "run" don't offer these options, so I created my first package (now on SSC), called docd, to do all this in a single command. It even changes the dir back when there is an error in the do-file. That does take some effort, because Stata doesn't have try...catch...finally. And I wasted about 10 minutes staring at and googling the error "unexpected end of file" before realizing that Stata needs an empty line after the final "end". Come on guys, it's 2021, not 1990...

The package also includes runcd. Have fun with the package!
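The same idea can be approximated without the package. The sketch below is not docd's code; the program name `run_in_dir` is hypothetical. It remembers `c(pwd)`, switches to the subdir, runs the do-file under `capture noisily` (output is shown, but an error is trapped instead of aborting), always switches back, and then passes the do-file's return code on, which stands in for the missing try...catch...finally.

```stata
* Hypothetical helper (not part of docd): run a do-file from inside a subdir,
* then restore the caller's working directory even if the do-file fails.
capture program drop run_in_dir
program define run_in_dir
    version 14
    args subdir dofile
    local olddir "`c(pwd)'"
    quietly cd "`subdir'"
    * show the do-file's output but trap any error instead of stopping here
    capture noisily do "`dofile'"
    local rc = _rc
    quietly cd "`olddir'"
    * re-raise the do-file's error, if any (exit 0 just returns normally)
    exit `rc'
end

* Usage from a runner: the do-file name is resolved relative to the subdir.
run_in_dir "01 data 200201" "00 process"
```

Because the directory is restored before `exit`, a failing do-file still stops the runner, but the runner is back in its own folder when it does.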
Dirk Enzmann

#7

29 Apr 2021, 05:47

Although this will not answer your specific question regarding the use of Stata's project manager (see the handbook entry in [P] Programming, -help project manager-), in this context I would like to mention "The Workflow of Data Analysis Using Stata" by Scott Long, especially chapter 2. Additionally, have a look at Robert Picard's .ado-program -project- at SSC.
Julian Reif

#8

29 Apr 2021, 06:16

Here is the folder structure I like to use when organizing my projects:
https://julianreif.com/guide/#folder-structure

I encourage people to use a single global throughout their project, set in their Stata profile. I have an example of a stand-alone, reproducible project here that may be of interest:
https://github.com/reifjulian/my-project

Each script in the project references the same global.

Associate Professor of Finance and Economics
University of Illinois
www.julianreif.com
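A minimal sketch of the single-global pattern, assuming each collaborator adds one line to their own profile.do, which Stata runs at startup. The global name `MyProject`, the path and the folder and file names are hypothetical; the linked repository shows the actual setup.

```stata
* In each user's profile.do: the only machine-specific line
global MyProject "C:/Users/me/Documents/my-project"

* In any script of the project: build every path from the global
use "${MyProject}/data/raw/survey.dta", clear
save "${MyProject}/processed/survey_clean.dta", replace
```

Moving the project to another machine then means changing only the profile line; renaming a subfolder still means touching every script that spells it out.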
Hendri Adriaens

#9

29 Apr 2021, 06:38

Originally posted by Dirk Enzmann

Although this will not answer your specific question regarding the use of Stata's project manager (see the handbook entry in [P] Programming, -help project manager-), in this context I would like to mention "The Workflow of Data Analysis Using Stata" by Scott Long, especially chapter 2. Additionally, have a look at Robert Picard's .ado-program -project- at SSC.

Yes, I did have a look at the project package, because that was the standard answer to many similar questions here. But it doesn't do what I want, and besides, it adds a lot of overhead that I don't want. But thanks for the suggestion!
Hendri Adriaens

#10

29 Apr 2021, 06:47

Originally posted by Julian Reif

Here is the folder structure I like to use when organizing my projects:
https://julianreif.com/guide/#folder-structure

I encourage people to use a single global throughout their project, set in their Stata profile. I have an example of a stand-alone, reproducible project here that may be of interest:
https://github.com/reifjulian/my-project

Each script in the project references the same global.

Thanks for the suggestion. The issue is that, for proper project management, you want as little hardwired stuff as possible. What if you rename one of the subdirs? You will end up checking all your do-files for references to that subdir. That's why I opted for the top down approach, with each do-file only referencing files and data in a subdir, making sure at least I know exactly which files to check on a structure change.

But coming from a tool like Visual Studio, all this is of course rather horrible. There, the actual filename or location within the project doesn't matter at all (most of the time), which makes project management and referencing other things a lot easier.
Leonardo Guizzetti

#11

29 Apr 2021, 06:57

At some point, when dealing with collections of text files and datasets, you will need to impose a structure on those files and folders. It's natural to define your own folder structure that works well enough, and from there, all paths can be made absolute (worse) or relative (better) based on the above ideas. But this only works well and stays manageable if you can define a clear and consistent folder structure for where specific files are meant to live. Julian's structure is one tested approach, but you can of course riff off of that to make your own if it doesn't meet your needs.
Hendri Adriaens

#12

29 Apr 2021, 07:01

Originally posted by Leonardo Guizzetti

At some point, when dealing with collections of text files and datasets, you will need to impose a structure on those files and folders. It's natural to define your own folder structure that works well enough, and from there, all paths can be made absolute (worse) or relative (better) based on the above ideas. But this only works well and stays manageable if you can define a clear and consistent folder structure for where specific files are meant to live. Julian's structure is one tested approach, but you can of course riff off of that to make your own if it doesn't meet your needs.

Of course, it is best to agree on a structure before the project starts and keep it that way during the project. That way, there is no problem hardwiring some paths into your code. However, my experience is that projects never evolve as planned, and things need to be added, changed, etc., which makes hardwired paths rather troublesome.
Leonardo Guizzetti

#13

29 Apr 2021, 08:53

I agree with you, Hendri. Sometimes directory structures change. When that happens to me, I only need to update the path in a single config file. Having a modular directory structure also means you can accomplish much of this using relative paths, so you may not need to update anything.
