Testing Replication Code + Best Practice

Stuart Morrison

#1

04 Feb 2020, 03:54

I am preparing my replication files for an article that has been accepted subject to replication checks. My code relies on a number of user-written commands, and the journal's replication policy requires that these be supplied with the replication package. The difficulty is that, with my very full ado folder, it is hard for me to test whether what I have included is complete and reproduces exactly the same results. Is there any way to restrict where Stata looks for ado-files? (I know how to add folders, but this seems different.)

Beyond specifying the random-number seed, I wondered if the community had any further advice on best practice for replication files. If it matters, I prefer an approach that is agnostic about the particular version of Stata.

Thanks,

Stu.
Nick Cox

#2

04 Feb 2020, 04:20

The last line is the easiest to answer. You can't really mean, or so I guess, that you want your code to work in Stata 1.0; that is unlikely as well as almost untestable. More generally, you can't easily test that your code will work in a version earlier than yours unless you have access to one, say on somebody else's machine.
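Declaring a version at the top of the master do-file is the usual complement to that testing: `version` asks a newer Stata to interpret the commands as the stated release would, but it cannot make an older Stata understand newer syntax, so the code still has to be run on the oldest release claimed. A minimal sketch of such a header, including the fixed random-number seed mentioned in the question; the release number and the seed value are placeholders, not recommendations:

```stata
* Hypothetical header for a replication master do-file.
* "version 14" is a placeholder: use the oldest release actually tested.
version 14
clear all
set more off
set varabbrev off     // stop variable names being silently abbreviated
set seed 20200204     // placeholder seed; any fixed integer will do
```

The `set varabbrev off` line is optional, but it makes the code fail loudly rather than quietly pick up a different variable on another machine.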

You can remove as well as add with adopath. So the protocol might be to place the user-written commands in the current directory and temporarily remove every other place Stata might look, apart from its own folders.
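That protocol might look like the sketch below, assuming the package's .ado and .sthlp files sit in the current working directory. The codewords are those of Stata's default ado-path; `capture` is used so that a codeword already absent does not stop the do-file:

```stata
* Sketch: let Stata find community-contributed commands only in "."
* (plus its own folders) for the rest of this session.
discard                      // drop automatically loaded programs from memory
adopath                      // record the search path before changing it
capture adopath - PERSONAL
capture adopath - PLUS
capture adopath - SITE
capture adopath - OLDPLACE
adopath                      // should now list Stata's own folders and "."
```

With the path restricted, `which packagename` (a placeholder name) should report the copy in the project folder, and running the master do-file in the same session then tests the package. The changes are not saved, so restarting Stata restores the usual path.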

I trust you are referencing the authors of those commands in your paper.
Sebastian Kripfganz

#3

04 Feb 2020, 06:34

I find it odd that the journal requires you to submit community-contributed ado-files that you did not write yourself. Aside from the fact that I would not want several old, buggy versions of my commands to float around on the web, there might also be legal concerns. I do not think that authors who submit their code to SSC thereby agree, by default, that others can distribute that code publicly elsewhere.

My favorite solution is usually to add a few lines like the following at the beginning of my replication do-file:

Code:

ssc install packagename

Obviously, this might lead to irreproducible results some time in the future due to updates (or even removals) of those packages on SSC (or the respective website from where it was installed), and therefore may not be the first-best solution.
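A variation on this, sketched below under stated assumptions, installs the packages into a folder inside the replication package instead of the replicator's own PLUS folder, so that their other installations are neither used nor overwritten. It does not solve the problem of packages changing on SSC. The folder names and `packagename` are placeholders, and the do-file is assumed to be run from the project root:

```stata
* Sketch: keep community-contributed packages inside the project folder.
capture mkdir "ado"
capture mkdir "ado/plus"
capture mkdir "ado/personal"
* Point PLUS (where ssc installs) and PERSONAL at the project folders
* for this session only.
sysdir set PLUS     "`c(pwd)'/ado/plus"
sysdir set PERSONAL "`c(pwd)'/ado/personal"
* Install only if the command is not found on the modified ado-path
* (assumes the package provides a command called packagename).
capture which packagename
if _rc {
    ssc install packagename
}
```

Because `sysdir` settings last only for the current session, these lines belong at the top of the master do-file, before any community-contributed command is called.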

https://www.kripfganz.de/stata/
Stuart Morrison

#4

04 Feb 2020, 10:30

I will try harder with adopath then!

I should have been more precise than the term ‘agnostic’. I meant the last two or three versions, on the suspicion that these account for a large share of the versions in use, and for many more users than v16 alone. (I think of versions since 12 as modern, but that is obviously entirely subjective.)

We have indeed cited the packages we use. I hadn’t thought about licensing, but it is a good point. I know that many SSC packages specify the GPL v3 license, but I should check that this is the case here.
Nick Cox

#5

04 Feb 2020, 11:55

As an author of various programs out there I have a complicated attitude.

First off, if I make stuff public, the implication is surely that others may find it useful and for the most part I can't and won't try to control whether and how they use it. I am happy that it be used.

Second off, and much more rarely, I don't write programs for others to plagiarise or even to fork without my knowledge. This is a grey area as I suspect most of the time most of us are just recycling standard Stata idioms. Readers with a long memory may remember X going ballistic about similar code in Y's program when Y was just applying a bit of secondary school mathematics (which in this case I have done myself without feeling the need to cite anybody).

The other way round, someone once took a program of mine, threw away the help, and then modified it under their own name on GitHub. Said person seemed surprised when I asked them not to do that (but did comply).

Personally, I think keeping authors' names and help files intact is good enough if you distribute stuff. A formal reference is also welcome.

Last edited by Nick Cox; 04 Feb 2020, 12:03.
Nick Cox

#6

05 Feb 2020, 06:42

There is an extra step that may make sense. You can copy community-contributed programs and make them subroutines of your own code. That way you are protected from (a) the code disappearing from an archive and (b) the code being changed in a way that breaks yours. But it is best to ask permission to do this and make full and fulsome acknowledgments.
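One way to do this is to make the project's own command an ado-file and paste the copied routine below the main program. Programs defined after the main program in an ado-file are local to that file, so the main program always calls the embedded copy. All names here (`myproject`, `_embedded_helper`) are hypothetical, and the helper's body is only a stand-in for the original code:

```stata
*! myproject.ado: hypothetical project command (sketch only)
program define myproject
    version 14
    syntax varlist(numeric)
    * Call the embedded copy rather than any separately installed command.
    _embedded_helper `varlist'
end

* ----------------------------------------------------------------
* Embedded copy of a community-contributed routine.
* Keep the original author, package name, version and source lines
* here verbatim, and ask the author's permission first.
* The body below is a stand-in; paste the original code instead.
* ----------------------------------------------------------------
program define _embedded_helper
    syntax varlist(numeric)
    summarize `varlist'
end
```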

I don't think it is an inconsistency for authors to want their work to be used and for it to be acknowledged as theirs. But sometimes people will give a full reference for a single sentence cited in a paper yet never cite the programs that were heavily used and made a project possible.
Stuart Morrison

#7

18 Feb 2020, 14:33

I'm sorry for the slow response. I am going with the citation and license-checking approach. In practice, this is closest to Nick's second suggestion: the packages are now essentially included in my code, but as the complete package files with full attribution. I have also written to the editor to ask whether they have thought these issues through. Seemingly, they haven't. It will be interesting to see how things pan out.