  • Dynamic Documents with Stata and R Code

    Thanks to Kit Baum, a new update of -markstat- is available from SSC. New this time is support for R code blocks and inline R code. Here is a simple example:

    Code:
    % Quantiles in Stata and R
    
    Stata and R compute percentiles differently. Let us load the `auto`
    dataset and compute the 75th percentile of `price` using Stata's `centile`
    
    ```s
        sysuse auto, clear
        centile price, centile(75)
        save auto, replace
    ```
    
    We find that the 75-th percentile is `s r(c_1)`.
    
    Now let us do the same with R. We'll use the `haven` library to read a
    Stata file
    
    ```r
        library(haven)
        auto <- read_dta("auto.dta")
        q <- quantile(auto$price, 0.75); q
    ```
    
    According to R, the 75-th percentile is `r round(q, 1)`.
    
    Turns out R has 9 types of quantiles, the default is 7.  To get the same result
    as `centile` specify type 6, which gives `r quantile(auto$price, 0.75, type=6)`.
    
    The Stata commands `summarize, detail`, `xtile`, `pctile` and `_pctile` use yet
    another method, equivalent to R's type 2. These give the third quartile as
    `r quantile(auto$price, 0.75, type=2)`. The last three commands have an
    `altdef` option that gives the same answer as `centile`.
    
    For a discussion of these methods see
    Hyndman, R. J. and Fan, Y. (1996) Sample quantiles in statistical packages,
    *American Statistician* 50:361-365.
    As you can see, we handle R code the same way as Stata and Mata, using code fences but with an r instead of an s or m. You can copy and paste this script, or download it to your working directory using the command

    Code:
    copy http://data.princeton.edu/stata/markdown/quantiles.stmd quantiles.stmd
    To run this script in Stata you use the command:

    Code:
    markstat using quantiles
    All five output formats are supported. You can see the HTML output here. For this to work you need to have R installed, and you need to use -whereis- from SSC to register the location of R on your computer. Instructions and examples here.

    Following a suggestion of Doug Hemken, -markstat- now removes intermediate files, but has an option to specify which ones to keep. It also detects code fences and switches to strict mode automatically.

    As usual, much more at the website http://data.princeton.edu/stata/markdown, including a more extensive example using Bootstrap tabs to switch between Stata and R versions of a Cox regression analysis.
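For readers who want to see the three quantile definitions side by side without running the full script, here is a minimal R sketch. It uses a made-up vector rather than the `auto` data, so the numbers are purely illustrative; only the `type` argument of `quantile()` comes from the script above.

```r
# Toy data, made up for illustration (this is NOT the auto dataset)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# R's default definition (type 7)
q7 <- unname(quantile(x, 0.75))

# The type the script above matches to Stata's centile
q6 <- unname(quantile(x, 0.75, type = 6))

# The type the script above matches to summarize, detail and friends
q2 <- unname(quantile(x, 0.75, type = 2))

print(c(type7 = q7, type6 = q6, type2 = q2))
```

On this toy vector the three types give 7.75, 8.25 and 8, which shows how the same request for a 75th percentile can produce three different answers.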

  • #2
    Beautiful!

    I like this:
    It also detects code fences and switches to strict mode automatically.
    I'm trying to remember, does the code fence
    Code:
    ```{stata}
    work as well? If so, we are getting close to a standard writing format that can be rendered in multiple environments, which should be one of our goals.
    Doug Hemken
    SSCC, Univ. of Wisc.-Madison



    • #3
      Thanks Doug!

      The fence ```{stata} will not work, in the sense that it will not be interpreted as code to be run through Stata, but ```{s} will. You are right, though, that we are getting close to scripts that can be rendered in multiple environments, perhaps with small tweaks.
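As one possible "small tweak", a script written with knitr-style `{stata}` fences could be preprocessed before handing it to -markstat-. The sketch below is only an assumption about how one might do that: `fix_fences` is a hypothetical helper, not part of -markstat-, and it relies solely on the statement above that `{s}` fences are run through Stata.

```r
# Hypothetical helper: rewrite fence openers of the form {stata} as {s}.
# Leading indentation is kept; all other lines are returned unchanged.
fix_fences <- function(lines) {
  sub("^([[:space:]]*)```\\{stata\\}[[:space:]]*$", "\\1```{s}", lines)
}

# A tiny made-up script held as a character vector, one element per line
script <- c("Some text", "```{stata}", "    sysuse auto, clear", "```")
fixed <- fix_fences(script)
print(fixed)
```

To apply this to a file, the lines could be read with `readLines()` and written back with `writeLines()`.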



      • #4
        German, this is a very interesting and useful advance. I will continue to explore the possibilities of the package, which are expanding quickly.

        This isn't a serious problem, but I followed the code above and it did not work, due to what I take to be an encoding issue when using -copy- on a Mac running OS X (I'm running OS 10.12.6). I've seen stuff like this before, so I'm posting this here in case others encounter the issue. The code below is what results from the copy command. It chokes on the translations in the R code chunk, such as &lt;- for <-.

        But to be clear, the code as posted on the markstat documentation page works. The problem crept in for me at the copy stage and produced the code below.

        Code:
        % Dynamic Documents with Stata and R Code
        
        Stata and R compute percentiles differently. Let us load the `auto`
        dataset and compute the 75th percentile of `price` using Stata&#39;s `centile`
        
        ```s
            sysuse auto, clear
            centile price, centile(75)
            save auto, replace
        ```
        
        We find that the 75-th percentile is `s r(c_1)`.
        
        Now let us do the same with R. We&#39;ll use the `haven` library to read a 
        Stata file
        
        ```r
            library(haven)
            auto &lt;- read_dta(&quot;auto.dta&quot;)
            q &lt;- quantile(auto$price, 0.75); q
        ```
        
        According to R, the 75-th percentile is `r round(q, 1)`. 
        
        Turns out R has 9 types of quantiles, the default is 7.  To get the same result 
        as `centile` specify type 6, which gives `r quantile(auto$price, 0.75, type=6)`.
        
        The Stata commands `summarize, detail`, `xtile`, `pctile` and `_pctile` use yet 
        another method, equivalent to R&#39;s type 2. These give the third quartile as
        `r quantile(auto$price, 0.75, type=2)`. The last three commands have an 
        `altdef` option that gives the same answer as `centile`.
        
        For a discussion of these methods see
        Hyndman, R. J. and Fan, Y. (1996) Sample quantiles in statistical packages, 
        *American Statistician* 50:361-365.
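If you end up with a copy like the one above, one workaround is to decode the entities yourself. The following is a minimal sketch, assuming the damage is limited to the entities visible above (`&lt;`, `&quot;`, `&#39;`) plus `&gt;` and `&amp;`, which are included as a guess. `decode_entities` is a hypothetical helper written for this illustration.

```r
# Hypothetical helper: undo a handful of HTML entities in a character vector
decode_entities <- function(lines) {
  lines <- gsub("&lt;", "<", lines, fixed = TRUE)
  lines <- gsub("&gt;", ">", lines, fixed = TRUE)
  lines <- gsub("&quot;", "\"", lines, fixed = TRUE)
  lines <- gsub("&#39;", "'", lines, fixed = TRUE)
  # Decode &amp; last so that "&amp;lt;" becomes "&lt;" rather than "<"
  lines <- gsub("&amp;", "&", lines, fixed = TRUE)
  lines
}

# Two lines taken from the damaged copy shown above
broken <- c("auto &lt;- read_dta(&quot;auto.dta&quot;)", "R&#39;s type 2")
fixed <- decode_entities(broken)
print(fixed)
```

A whole file could be repaired along the same lines with `writeLines(decode_entities(readLines("quantiles.stmd")), "quantiles.stmd")`, after making a backup copy.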



        • #5
          Try adding a -text- option to the -copy- command.
          Doug Hemken
          SSCC, Univ. of Wisc.-Madison



          • #6
            Doug,

            Investigating further, it looks like the file following the link below has some encoding issues, which explains the problem.

            http://data.princeton.edu/stata/markdown/quantiles.stmd



            • #7
              It always pays to go look at the actual file!
              Doug Hemken
              SSCC, Univ. of Wisc.-Madison



              • #8
                Thanks Eric and Doug for pointing out and investigating this problem. It turns out the sample script was HTML-encoded when transferred to the server, when of course it shouldn't have been. It took a while to fix this as I was travelling, but everything should be all right now. My apologies for the snafu.
