Stata - reproducible research (in 2021)

Christopher Bratt

Join Date: May 2019

Posts: 144
#1

26 Apr 2021, 04:58

Stata's update to version 17 is impressive (not least because of increased speed and improved integration with Python), so paying for an upgrade and moving back to Stata is tempting. But coming from R, with R Markdown and the "knitr" package, I am spoiled when it comes to easy coding for reproducible research.

Where would one go to find the latest updates on what's possible in reproducible research within Stata?
Anders Alexandersson

Join Date: Apr 2014

Posts: 203
#2

26 Apr 2021, 07:37

@Christopher Bratt: I agree that Stata 17 has some very nice new features.

To answer the question about what's new in Stata 17: both new and experienced users will likely go to the Stata home page and see "What's new in Stata 17". The first listed new feature is Tables. There, the link "See all features" takes you to "Reporting".

Beyond the updated table command, there seems to be little new in Stata 17 for "reproducible research". Here are two features that I think should be improved in this area:

First, since the original post refers to R: the user-written command markstat by Germán Rodríguez has some unique and useful features for Stata markdown that are not in official Stata, such as combining Stata and R code in one markdown document. Similarly, can we combine Stata and Python code blocks in Stata markdown? How, or why not?

Second, it would be helpful to have documentation of the exact environment in which some Stata code was run ("dependencies"). Do I need to install or configure something first, such as installing Python, configuring ODBC, or installing user-written commands? How do I quickly find out all details, such as which version of Python, which ODBC driver, and which version of pandoc? Should I create a separate do-file, use the Stata Project Manager, or is there some other Stata best practice for handling dependencies? The programming language Julia has Project.toml and Manifest.toml for listing direct dependencies and all (direct and recursive) dependencies, respectively. I miss something similar in Stata for easily handling dependencies.
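
To make the dependency question concrete, here is a minimal Python sketch of the kind of environment record I have in mind. It is only an illustration under stated assumptions: the function names are hypothetical, it covers only the Python interpreter, the operating system, and command-line tools found on the PATH (pandoc by default), and it does not capture Stata-side details such as installed user-written commands or ODBC drivers.

```python
import json
import platform
import shutil
import subprocess
import sys


def tool_version(executable, flag="--version"):
    """Return the first line printed by `<executable> --version`, or None if unavailable."""
    path = shutil.which(executable)
    if path is None:
        return None  # tool is not on the PATH
    try:
        result = subprocess.run(
            [path, flag], capture_output=True, text=True, timeout=10, check=False
        )
    except (OSError, subprocess.SubprocessError):
        return None  # tool could not be started or timed out
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


def environment_manifest(tools=("pandoc",)):
    """Collect interpreter, OS, and external-tool versions in one dictionary."""
    return {
        "python": platform.python_version(),
        "python_executable": sys.executable,
        "platform": platform.platform(),
        "tools": {name: tool_version(name) for name in tools},
    }


if __name__ == "__main__":
    # Print the record as JSON so it can be saved next to the do-files.
    print(json.dumps(environment_manifest(), indent=2))
```

Saving the printed JSON alongside the analysis code would give readers at least a partial answer to "which version of Python, which version of pandoc". A tool that is not installed simply shows up as null.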

Last edited by Anders Alexandersson; 26 Apr 2021, 07:53. Reason: added author of markstat command
1 like
Comment
Christopher Bratt

Join Date: May 2019

Posts: 144
#3

26 Apr 2021, 10:12

Thank you for the answer. I didn't really ask what's new in Stata 17. As I said, I think the changes made are impressive. The question was about recent developments (latest updates in add-ons) for reproducible research with Stata, specifically because version 17 seemed not to change anything in this respect. (I expect -table- to be great, but probably not specifically for reproducible research.)

Does anyone know about recent developments for Stata as far as reproducible research goes? A few packages were available last year. Are there any recent developments? I believe possible recent developments would be interesting for more people than me.

@Anders Alexandersson: You might want to write a separate post (start a new topic) for your question on documenting the environment in which Stata code was run ("dependencies"). Few will see it here.

Last edited by Christopher Bratt; 26 Apr 2021, 10:15.
Comment
Anders Alexandersson

Join Date: Apr 2014

Posts: 203
#4

26 Apr 2021, 14:15

Does Stata Journal 20-4 qualify for 2021? I received it on January 4, 2021. It has two commands that I would count in the area of reproducible research: iefieldkit and github.

If you type -search reproducible- in Stata, you find some Stata developments in 2021 for reproducible research, such as the updated markstat 2.6 for handling bibliographies.

Unpublished development work by others is hard to track and to rely on. For example, a "Stata conference" is an established venue for sharing recent development work, but not all accepted presentations materialize, are reproducible, or become shared commands. As another example, a user-developer may want to hold off on sharing recent code until a related paper has been accepted for publication. As a third example, Stata 17 may inspire the development of new related features in user-written commands, such as Stata markdown for Python and Java.
Comment
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#5

26 Apr 2021, 14:47

I find your question very vague. What is it exactly that you think Stata is missing that qualifies as "reproducible"? Or, with the available tools, why do you think these are lacking in some way? There are many commands, both official and user-built, that support reporting results in a reproducible fashion.
Comment
Christopher Bratt

Join Date: May 2019

Posts: 144
#6

26 Apr 2021, 15:15

The question was broad, inquiring about new packages for reproducible research, or updates to old ones, since the old ones for Stata did not yet provide what I needed.

(R Markdown is probably the gold standard these days, but others may catch up, for instance in Python. And Stata is increasingly integrated with Python... So this is an opportunity to learn basic Python!)

«Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them.»

Please disregard my question. I thought it might be an opportunity for an author to put forward their work to the community, but we ended up discussing other things.
Comment
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#7

26 Apr 2021, 17:41

Your question is fine to ask, and I am familiar with what people generally regard as reproducible research. But the tools presently available technically already meet that definition, especially since the code can create the final output to be reported.
Comment
Bjarte Aagnes

Join Date: Apr 2014

Posts: 785
#8

27 Apr 2021, 03:41

R's knitr has some Python integration, so one alternative is to test the Stata 17 pystata Python package and explore whether running Stata via Python from knitr matches your workflow.
Comment
Christopher Bratt

Join Date: May 2019

Posts: 144
#9

27 Apr 2021, 04:03

Originally posted by Bjarte Aagnes: "R's knitr has some Python integration, so one alternative is to test the Stata 17 pystata Python package and explore whether running Stata via Python from knitr matches your workflow."

Indeed! Since I currently use knitr, I am tempted to buy Stata 17 and try, hoping that pystata will help for knitr integration...

But you haven't tried it yourself?

I should add that the knitr documentation refers to Doug Hemken's Statamarkdown (https://github.com/Hemken/Statamarkdown) as a suitable solution for using Stata in knitr. I might try that first unless someone reports favourable experiences with Stata and knitr using pystata.
Comment
Bjarte Aagnes

Join Date: Apr 2014

Posts: 785
#10

27 Apr 2021, 05:30

I do not use any literate programming tools, but I enjoyed testing pystata with JupyterLab. pystata is a great addition to Stata.
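
For anyone who wants to try the same thing, a minimal sketch of calling Stata from Python might look like the following. It assumes Stata 17 is installed and the stata_setup helper module is importable. The function names run_stata and pystata_available are my own, and the installation directory is a machine-specific placeholder.

```python
import importlib.util


def pystata_available():
    """True if the stata_setup helper module can be found in this Python environment."""
    return importlib.util.find_spec("stata_setup") is not None


def run_stata(commands, stata_dir, edition="se"):
    """Run a list of Stata commands through pystata and return Stata's r() results.

    stata_dir is the local Stata installation directory (placeholder, machine-specific);
    edition is one of "be", "se", or "mp".
    """
    import stata_setup  # imported lazily so this file loads even without Stata

    stata_setup.config(stata_dir, edition)
    from pystata import stata  # only importable after config() has run

    stata.run("\n".join(commands))  # output is printed as in the Stata console
    return stata.get_return()       # dictionary of r() results from the last command


# Example call (hypothetical path; adjust to the local installation):
# results = run_stata(["sysuse auto, clear", "summarize mpg"], "/usr/local/stata17")
```

In a Jupyter notebook the same calls can sit in ordinary Python cells, which keeps the Stata commands and their results together in one document.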
1 like
Comment
Julian Reif

Join Date: Dec 2018

Posts: 49
#11

27 Apr 2021, 08:41

For people interested in producing a cross-platform analysis that will be reproducible for decades, Stata is an ideal language. The most important thing to do is to keep copies of any user add-ons employed in the analysis, since these may change over time. I provide examples of how to do this in my guide:
https://julianreif.com/guide/#libraries
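
As an illustration of the "keep copies" idea (this sketch is not taken from the guide), one could also record a checksum for every add-on file stored with the project, so that later changes are easy to detect. It assumes the add-ons have been copied into a project folder. The folder name and the function names below are hypothetical.

```python
import hashlib
from pathlib import Path


def addon_manifest(library_dir, suffixes=(".ado", ".sthlp", ".mata", ".mlib")):
    """Map each add-on file (path relative to library_dir) to the SHA-256 of its contents."""
    root = Path(library_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def changed_files(old, new):
    """Return sorted paths that were added, removed, or modified between two manifests."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))


# Example (hypothetical folder name):
# before = addon_manifest("analysis/libraries/stata")
# ... later, after reinstalling or updating add-ons ...
# print(changed_files(before, addon_manifest("analysis/libraries/stata")))
```

An empty list from changed_files means the stored add-ons are byte-for-byte the ones used in the original analysis.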

Associate Professor of Finance and Economics
University of Illinois
www.julianreif.com
2 likes
Comment
Christopher Bratt

Join Date: May 2019

Posts: 144
#12

02 May 2021, 03:29

Rounding off this discussion: After installing Stata 17 and experimenting (so far very little) with Python:

One solution for literate programming / reproducible research with Stata is to integrate Stata with Python. For instance, Jupyter Notebooks provide a solid basis for reproducible research.
1 like
Comment
Bjarte Aagnes

Join Date: Apr 2014

Posts: 785
#13

02 May 2021, 03:54

JupyterLab has a lot to offer, but you might need something extra on top of it to get "literate programming", such as nbdev (https://nbdev.fast.ai/; see also the article https://alpha2phi.medium.com/literat...k-4c2520d71597).

Last edited by Bjarte Aagnes; 02 May 2021, 04:00.
Comment
Christopher Bratt

Join Date: May 2019

Posts: 144
#14

03 May 2021, 02:04

An alternative is to use RStudio. I think RStudio is easier to work with and provides more options than Jupyter Notebook (I say so after having played with Jupyter Notebook during the weekend).

Thanks to Doug Hemken and his R package Statamarkdown, Stata can easily be integrated with RStudio. RStudio can of course be used for R, but also for other languages. Specifically, Python has strong support in RStudio (but I find R better for stats and graphics than Python). Stata is not well supported unless you install Statamarkdown.

Details on Statamarkdown:
https://www.ssc.wisc.edu/~hemken/Sta...-statamarkdown

If you try to use Stata from RStudio, avoid the option to put output into the console (as I used to do). Select what I believe is the default: Chunk Output Inline (similar to what Jupyter Notebook does).
See here for details on that issue: https://www.gitmemory.com/Hemken

Honestly, I don't think Doug Hemken's Statamarkdown receives the attention it deserves.
Comment
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#15

03 May 2021, 02:17

OP, what is "reproducible research"?
Comment
