  • Literate programming: Stata way behind R

    I used to prefer Stata (in addition to the more specialised program Mplus). I have now "converted" to R as a basis for analyses, mainly because of its very advanced features for literate programming (with the knitr package along with Rmarkdown, offering excellent, flexible options for generating a document in PDF, HTML, or as a Word file).

    For most ordinary functions, I would prefer Stata, and I find Stata's language much more pleasant than R's. But as with most other new, advanced features (SEM, Bayesian, and now literate programming), I cannot help feeling that Stata does things half-heartedly. The free knitr package in R, used within the free integrated development environment RStudio, is much better than what Stata offers (including the user-developed add-on packages for literate programming in Stata). I have repeatedly tried Stata's own options and the freely available add-on packages for literate programming in Stata, but R with Rmarkdown and knitr is so much better!

    Is there any chance that Stata will develop literate programming that will be comparable to what is available in R? (I now use R to call Stata functions for data management; that is all I use Stata for, and I will probably gradually move to doing even data management in R. If Stata had a decent literate programming environment, that would be different. I have updated Stata twice for a lot of money but with no benefit for me, and I might not do that a third time.)

    This post probably comes across as harsh. I am really sorry for that. I love the basics of Stata. But once a user tries newly developed advanced features, Stata seems to have less to offer than free software, and I think that's a pity.

  • #2
    A short answer is that I don't know. A longer answer is that the company follows its own style, which is that substantially new features appear in each release and are announced and explained only just before that release. Otherwise what happens is that senior people may say informally at users' meetings that StataCorp is or is not interested in something, which means no more than it says.

    This culture is often frustrating to people who don't understand it (and sometimes to people who do!). In essence it has grown out of experience. Suppose StataCorp were to announce now that Stata 16 will include special methods for people in zombie statistics. People in that field may start immediately to plan purchases or research projects. Then come Stata 16 and no zombie statistics, and the company says, Oh sorry, that project's not finished yet. 17 is now our best guess. All of a sudden some users are frustrated, angry, perhaps even litigiously minded. When this happened in the distant past, StataCorp were embarrassed and they learned not to make promises unless they can keep them.

    Also, Stata is in a competitive world. You probably don't care but SAS, SPSS, etc. are in some sense competitors and it's not a good idea to tell competitors of your plans.

    As a further twist, StataCorp are more academic in style than many people imagine. Senior developers have their own ideas about what's interesting and important. They pay attention to what users ask for, but additions to Stata are not determined by user votes. (Neither are those to R!)

    To the point: Well, you want this strongly; no doubt others do too. I see the point of it, but it's not important to me to get any extra tools in this area, which I find a little oversold. No one should care about that, except that this is the picture in aggregate. For almost every X that some people want a lot, there are many other people who don't care one bit (often they've never even heard of X!).


    • #3
      I highly recommend the new user-written program markstat which recently had a major update to version 2.0. See http://data.princeton.edu/stata/markdown and recent announcements here on Statalist. With markstat I would like to have more templates, not only Stata Journal articles, similar to R's rmarkdown and bookdown. Overall though, I think it's a very good Markdown implementation in Stata. However, LaTeX often beats Markdown if you need "very advanced features".

      I see at least three other much larger problems for Stata than literate programming:
      1. Overhaul of the table system (e.g., along the lines of Ian Watson's tabout),
      2. Overhaul of the graphic system (still not good 3D graphics), and
      3. Faster overall performance for computer-intensive tasks (e.g., user-written gtools, playing nice with Julia 1.x, a theoretically sound command for probabilistic record linkage, etc.).


      • #4
        I am happy to have corporate access to both Stata 15.1 and RStudio Server Pro 1.1.x. I have extensive experience with both, Stata since 2002 and R since 2008. This combination is helpful to me personally in my workplace. RStudio Server Pro is a fine environment, but I would hate to be stuck with just R. Stata is consistent and there is real value in a commercial suite done well, which StataCorp does. R is often a house of cards and real effort is needed to maintain code from a legacy perspective. You could have easily seen Stata's efforts at building literate programming with Word a mile off before 15 was released. Given the penetration of Word in corporate settings vs LaTeX, this is a fine choice. Things will only improve there. Your assertions about Bayes and SEM are lacking followup, and in my opinion SEM is a strong suit in Stata vs R, and Bayes is a strong version 2 vs R from my applications point of view. Is either the best or most popular in its field? No. Are they good and well documented and usable? Yes.

        #3 has some good points. SAS gets used in corporate legacy work due to its ODS system. The graphics system does lack some EDA 3-D versions, but I honestly don't use them all that much even when available. Everybody wants their code to be faster. Hadley Wickham's packages are popular in R because they make code more readable and are faster (simpler C based functions).


        • #5
          Dave Airey, thanks for also commenting on the Bayes and SEM assertions, which I overlooked. How useful do you find the interactive graphics in R? RStudio Server Pro 1.1.x is the ideal R environment for interactive graphics using "Shiny" web apps, is it not? It might be that interactive graphics are more important than 3D graphics when considering overall improvements to the Stata graphic system.


          • #6
            @Nick, thanks for the lengthy and detailed answer. (I would encourage any researcher using code to document crucial parts of the analysis in supplemental materials, for which literate programming is great.)

            @Anders, thanks for pointing out that markstat is now in its 2.0 version. I gave up on markstat, but will try 2.0. I guess it is easy to call R (e.g. ggplot2) from within markstat. The fact that Rodríguez has updated markstat suggests he might improve it further.

            I fully agree that Stata is much more pleasant to use and obviously more consistent than R. It's a relief for me each time I work with Stata after having been coding in R. But for me, Stata lacks features in SEM (and its SEM is very slow, but I use a third application for that). Indeed, Stata's implementation of Bayes has improved since the first version (which restricted analyses to 1 MCMC chain!). Anyway, I didn't mean to introduce a discussion about Stata versus R.

            NB: For anyone interested, Chuck Huber (an excellent teacher) will run a webinar on Bayes in Stata next week.
            https://www.stata.com/training/webin...sian-analysis/


            • #7
              Anders, I agree interactive graphics is a neat thing about R Shiny. Enabling Shiny tools in our corporate environment is a hot thing to do. So yeah, interactive graphics is something nice to have, and not in Stata.

              Guest, it can be tough to decide what tools are worth your money. I (fight to) get corporate access, but were I consulting individually, I would definitely buy Stata 15.1 and have R/RStudio as well.
              Last edited by sladmin; 11 Dec 2017, 08:31. Reason: anonymize poster


              • #8
                With respect to Markdown, you *could* write your Stata markdown and run it in RStudio - knitr works with a variety of programming languages. Within the Stata world, I’d agree that markstat is the closest to what you want ... but dyndoc does open up some possibilities. To some extent, the evolution of dyndoc will depend on how we use it.
                Doug Hemken
                SSCC, Univ. of Wisc.-Madison


                • #9
                  Thanks, Doug. I have not been able to make knitr/RStudio run Stata (that is, I haven't been able to get engine = "stata" to work with knitr, as described here for a Mac: https://www.ssc.wisc.edu/~hemken/Sta...tatalinux.html).
                  So I've settled with the RStata package in R instead.


                  • #10
                    Since RStata requires having R call Stata in pretty much the same way as knitr does, I'm not sure what the problem might have been ... Guest told me he didn't remember.

                    With respect to the web page he cites above (in #9), that should all still work, but in the meantime I've made it easier by turning all those functions into an R package, Statamarkdown. It is available at https://github.com/Hemken/Statamarkdown (instructions for installation are there as well). It sets a few default chunk options, so your source markdown document can be largely free of extraneous programming elements, and just look like text and Stata code.

                    This is largely intended for those of us who are using RStudio to write and process dynamic markdown documents in multiple languages (I use it for R, SAS, and Stata, so far).

                    Time permitting, I hope to release this on CRAN and update my web pages in the next few days.
                    Last edited by sladmin; 11 Dec 2017, 09:52. Reason: anonymize poster
                    Doug Hemken
                    SSCC, Univ. of Wisc.-Madison


                    • #11
                      Doug,

                      I've successfully managed to knit Stata code in RStudio using your older pages instructing how to set the Stata engine, etc. I've tried to install your new package, but I get the error below when I try to do so. Do you know if others have encountered this, and if so, what the solution is?

                      Thanks,

                      Eric

                      Code:
                      > devtools::install_github("Hemken/Statamarkdown")
                      Downloading GitHub repo Hemken/Statamarkdown@master
                      from URL https://api.github.com/repos/Hemken/Statamarkdown/zipball/master
                      Installing Statamarkdown
                      '/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save --no-restore  \
                        --quiet CMD INSTALL  \
                        '/private/var/folders/nh/7xjy5v_n7577b4prt70d9lw40000gn/T/RtmpGUhK1B/devtools709c57e094e7/Hemken-Statamarkdown-f8b051f'  \
                        --library='/Library/Frameworks/R.framework/Versions/3.4/Resources/library' --install-tests 
                      
                      * installing *source* package ‘Statamarkdown’ ...
                      ** R
                      ** inst
                      ** preparing package for lazy loading
                      ** help
                      *** installing help indices
                      ** building package indices
                      ** testing if installed package can be loaded
                      Error: package or namespace load failed for ‘Statamarkdown’:
                       .onAttach failed in attachNamespace() for 'Statamarkdown', details:
                        call: dir.exists(d)
                        error: object 'd' not found
                      Error: loading failed
                      Execution halted
                      ERROR: loading failed
                      * removing ‘/Library/Frameworks/R.framework/Versions/3.4/Resources/library/Statamarkdown’
                      Installation failed: Command failed (1)
                      >


                      • #12
                        I'm having to move some of that code for it to work with CRAN. Had not seen that error before. Unix, Mac?

                        All of those old instructions would still work, btw.
                        Last edited by Doug Hemken; 04 Dec 2017, 12:09.
                        Doug Hemken
                        SSCC, Univ. of Wisc.-Madison


                        • #13
                          Thanks Eric, inadequate testing on my part. Revised code is on github, and I'll post here when it is on CRAN.
                          Doug Hemken
                          SSCC, Univ. of Wisc.-Madison


                          • #14
                            Thanks, Doug. Revised code on github works for me.


                            • #15
                              Funny how this thread went, starting with a general assertion that Stata was way behind R in this area, and ending with a discussion of how to run things in R. Let me try to nudge us back a bit to the original topic, because there is a lot that can be done in Stata.

                              Let me start by thanking Anders Alexandersson in #3 for recommending markstat and noting its recent update to 2.0, and Doug Hemken in #8, who notes that it comes closest to what he wants. Hopefully things will continue to improve.

                              For those who may not be familiar with the wonderful tools available in R, here is a minimal R Markdown script, where I use the built-in cars dataset to run a simple regression of stopping distance on speed, superimpose the regression line and confidence interval on a scatterplot, and quote the slope.

                              Code:
                              ---
                              title: A Simple Regression
                              author: A Random R User
                              ---
                              
                              Let's run a simple linear regression
                              
                              ```{r}
                              b = coef(lm(dist ~ speed, data=cars))
                              library(ggplot2)
                              ggplot(cars, aes(speed, dist)) + geom_point() + geom_smooth(method="lm")
                              ```
                              
                              The slope is `r round(b["speed"], 3)` feet per mile per hour.

                              Stunning in its simplicity, isn't it? And it produces beautiful documents in HTML, PDF via LaTeX, or docx. You can see the HTML output here.

                              If only we could do something like that in Stata! Well, consider the following markstat script

                              Code:
                              ---
                              title: A Simple Regression
                              author: A Random Stata User
                              ---
                              
                              Let's run a simple linear regression
                              
                              ```{s}
                              rdatasets get datasets cars, clear
                              quietly regress dist speed
                              twoway lfitci dist speed || scatter dist speed , legend(off)
                              graph export stopping.png, width(600)
                              ```
                              
                              ![Stopping Distance](stopping.png)
                              
                              The slope is `s %5.3f _b[speed]` feet per mile per hour.

                              Now that doesn't really look that different, does it? OK, in R knitr detects that you generated a graph and magically includes it in the document. With markstat you do it yourself, using standard Stata and Markdown code. Is that a big gap? You can also generate HTML, PDF via LaTeX and docx. The HTML output is here, using the plottig scheme from SJ.

                              Perhaps the Markdown implementations are different? Nope. R Markdown and markstat both use Pandoc. In fact, today Stata has an edge because markstat uses Pandoc 2.0 and R Markdown uses 1.19, unless you install the development version from GitHub. So if you want slides with side-by-side code and figures like this, stick with Stata while the R folks catch up.

                              Where there is a big difference is in the supporting infrastructure. RStudio comes with Pandoc, and the IDE provides excellent support. To run markstat you need to install it from SSC, and add whereis from SSC and Pandoc from pandoc.org. You can edit a markstat script in Stata's editor, but you won't see a "knit" button to press. You get a choice of document and slide formats, but R can do more, not to mention the fact that Yihui Xie has built bookdown and blogdown on top of rmarkdown and knitr. So we definitely have a lot more work to do.
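
                              As a rough sketch of the setup steps just described, the commands below install markstat and whereis from SSC, record where Pandoc lives, and run a script. The Pandoc path and the file name regression.stmd are placeholders (assumptions, not defaults), and Pandoc itself still has to be installed separately from pandoc.org.

                              ```stata
                              * One-time setup: fetch markstat and its helper from SSC
                              ssc install markstat, replace
                              ssc install whereis, replace

                              * Tell whereis where the Pandoc executable is
                              * (assumed location; adjust to your own installation)
                              whereis pandoc "/usr/local/bin/pandoc"

                              * Process a script saved as regression.stmd (hypothetical file name);
                              * HTML is the default output
                              markstat using regression

                              * Same script, Word output instead
                              markstat using regression, docx
                              ```

                              This takes the place of the "knit" button: the script is edited in any text editor and processed from the Stata command line.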

                              On the other hand, don't underestimate the power of official solutions that are baked into Stata and thus work straight out of the box, specifically the new dyndoc command in Stata 15, which can produce HTML from a script that includes dynamic tags and, as Doug notes in #8, opens up possibilities.
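
                              For a sense of what such a script looks like, here is a minimal sketch of a dyndoc source file, assuming Stata 15 and the bundled auto dataset rather than the cars data used earlier. The file name regression.txt and the graph file name mpg.svg are hypothetical. Saved as regression.txt, it would be converted to HTML with dyndoc regression.txt, replace.

                              ```stata
                              <<dd_version: 1>>
                              A Simple Regression
                              ===================

                              Let's run a simple linear regression

                              ~~~~
                              <<dd_do>>
                              sysuse auto, clear
                              regress mpg weight
                              <</dd_do>>
                              ~~~~

                              The slope is <<dd_display: %7.4f _b[weight]>> miles per gallon per pound.

                              <<dd_do: quietly>>
                              twoway lfitci mpg weight || scatter mpg weight, legend(off)
                              <</dd_do>>

                              <<dd_graph: saving("mpg.svg") replace>>
                              ```

                              The dd_do tags play the role of the code chunks, dd_display the role of the inline expression, and dd_graph exports and embeds the current graph.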

                              And then there is always R. Doug Hemken pioneered the use of Stata and Markdown in R, and no doubt this solution will appeal to many. It is only fair to note, however, that knitr runs each Stata code chunk in a separate Stata session. So basically you need to save your data at the end of each chunk, and read them back at the beginning of the next chunk. Needless to say, markstat runs all your Stata code in the same session. Of course knitr accommodates multiple engines, and you need it for R and SAS, as Doug noted in #10.
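
                              To make the separate-session point concrete, here is a sketch of what two consecutive Stata chunks would need to contain under knitr. It uses Stata's bundled auto dataset and a hypothetical file name, auto_gpm; the comment lines stand in for the chunk boundaries.

                              ```stata
                              * --- chunk 1: runs in its own Stata session under knitr ---
                              sysuse auto, clear
                              generate gpm = 1/mpg            // gallons per mile
                              save auto_gpm, replace          // persist the data for later chunks

                              * --- chunk 2: a fresh session, so nothing from chunk 1 is in memory ---
                              use auto_gpm, clear             // read the data back before using it
                              regress gpm weight
                              ```

                              Under markstat the save and use lines would be unnecessary, since both pieces run in the same session.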

                              In the end, I think you should use whatever software you feel more comfortable with. The aim of markstat was to increase the choices available to Stata users and help bridge the gap with other tools. Give it a try!
