  • Rcall : synchronizing R and Stata (for those who work with R and Stata)

    I have released a Stata package on GitHub that lets you run R interactively from within Stata. More importantly, it passes data, matrices, scalars, and macros from Stata to R, and automatically returns R objects of class matrix, character, numeric, and list to Stata, updating them in real time.

    For example, let's define two matrices in Stata and pass them to R using the st.matrix() function, which belongs to the Rcall package. Then I sum the two matrices in R.
    Code:
            . matrix A = (1,2\3,4)
            . matrix B = (96,96\96,96)              
            . R: C <- st.matrix(A) + st.matrix(B)
            . R: C
                 [,1] [,2]
            [1,]   97   98
            [2,]   99  100
    I can now also access the new matrix C, which was created in R, from Stata as an r-class result:

    Code:
            . mat list r(C)
            r(C)[2,2]
                 c1   c2
            r1   97   98
            r2   99  100

    The package is not on SSC yet, but you can install it from GitHub.

    Code:
    net install Rcall, force from("https://raw.githubusercontent.com/haghish/Rcall/master/")

    What it needs now is testing. So if you work with both R and Stata, you may want to run your own torture test and provide some feedback.

    I have also written a basic torture test which can serve as a basic introduction to the package.
    Last edited by haghish; 10 Jul 2016, 17:35.
    ——————————————
    E. F. Haghish, IMBI, University of Freiburg
    [email protected]
    http://www.haghish.com/

  • #2
    A quick package demonstration
    http://www.haghish.com/packages/Rcall.php
    ——————————————
    E. F. Haghish, IMBI, University of Freiburg
    [email protected]
    http://www.haghish.com/



    • #3
      As someone working in both R (via RStudio) and Stata, I will find it VERY useful to write R scripts directly in Stata do-files and carry over variables, data, and matrices. I do, however, have trouble with the installation, and I ask here on the Stata forum because I expect others will run into similar issues. The problem is that whenever I use the Rcall (or "R") command, I get errors.

      I have a MacBook Pro with El Capitan and Stata 14 IC, and have installed R in the default directory (the full file name, as given by the terminal, is /Applications/R.app). However, experimenting with different setpath commands has been unsuccessful:

      . Rcall setpath "/Applications/R.app"
      . Rcall: print("greetings from Stata")
      /bin/bash: /Applications/R.app: is a directory

      . Rcall setpath "/Applications"
      . Rcall: print("greetings from Stata")
      /bin/bash: /Applications: is a directory

      . Rcall setpath "/Applications/R"
      file /Applications/R not found

      Could you shed some light on this issue?

      Jan Fredrik Hovden
      Professor
      Department of Information Science and Media Studies, University of Bergen
      Web page: http://janfredrikhovden.wordpress.com/about/



      • #4
        I cannot answer for Mac computers but this works for me on Windows:

        R: setpath "C:\Program Files\R\R-3.3.1\bin\X64\R.exe"

        R: setwd("C:\\...\\...")

        R: load("C:/.../xyz.rdata")

        John Moran



        • #5
          Jan Hovden, the error is clearly due to specifying a wrong path to R. The R application is actually a directory; you can right-click on it and choose "show content" to look inside the app. You need the path to the R executable on your machine. I'm confident that "usr/bin/r" should work just fine for you. You can often find the path by running "which r" in the terminal, which returns the path to R.
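
The "is a directory" errors in #3 arise when setpath points at an application bundle or a folder rather than at the R binary itself. Below is a minimal shell sketch for checking a candidate path before passing it to Rcall setpath. The helper name `is_r_executable` is hypothetical, and the listed paths are only common locations (assumptions), not guaranteed ones.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the path is a regular file that the
# current user may execute. A directory such as /Applications/R.app fails.
is_r_executable() {
    [ -f "$1" ] && [ -x "$1" ]
}

# Try whatever is on the PATH first, then a few common locations
# (assumptions; adjust for your own system).
for candidate in "$(command -v R 2>/dev/null)" /usr/local/bin/R /usr/bin/R; do
    if is_r_executable "$candidate"; then
        echo "usable R executable: $candidate"
        break
    fi
done
```

If the loop prints a path, that path is a reasonable candidate for the setpath subcommand.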

          I should also note that it seems you are using an old version of Rcall. Try reinstalling the package from GitHub, the version on SSC is way behind...
          Last edited by haghish; 03 Aug 2016, 01:02.
          ——————————————
          E. F. Haghish, IMBI, University of Freiburg
          [email protected]
          http://www.haghish.com/



          • #6
            Installing Rcall from Github (not SSC) and using the command ...

            Rcall setpath "/usr/local/bin/r"

            ... did the trick.

            Thank you so much to both of you for the help. I look forward to trying this out in the coming weeks.

            Jan Fredrik Hovden, Professor
            Department of Information Science and Media Studies, University of Bergen
            Web page: http://janfredrikhovden.wordpress.com/about/



            • #7
              Really good work, Mr. Haghish.
              I work with Stata 13, 14 and with a server linux version of Stata 13.

              ------------------------------------------------------
              Some problems

              st.load (Stata under v14)
              - the Rcall.ado uses the version() option with saveold. It returns an error msg because, I think, version() is not an option for versions under 14.
              Example with Stata 13:
              Rcall: st.data()
              option version() not allowed



              Linux version (server) install
              Perhaps these problems occurred because of our server configuration, but it was really hell to install.

              (1) package install
              As you said before, the SSC version is not up to date. When I try to install it from GitHub, it fails.
              net install Rcall, replace from("https://raw.githubusercontent.com/haghish/Rcall/master/")
              https://raw.githubusercontent.com/haghish/Rcall/master/ either
              1) is not a valid URL, or
              2) could not be contacted, or
              3) is not a Stata download site (has no stata.toc file).
              --Break--

              Finally I made the install manually (cd install) with the zip archive in github.

              (2) run Rcall
              When I tried to run Rcall after the install, the Rcall command was unrecognized. The solution was to run Rcall.ado in each session (I updated my profile.do to automate this step).

              (3) setpath
              On our server, R is not located in the default path /usr/local/bin/, and Rcall: setpath refused to change it (and still refuses).
              Editing Rcall.ado (line 386) fixed this problem.

              (4) Works only with Rcall:
              The command Rcall: is working but not R:
              It's strange
              Example:
              Rcall: print("Thx mister Haghish")
              [1] "Thx mister Haghish"


              R: print("Thx mister Haghish")
              unrecognized command: R


              ------------------------------------------------------

              Finally, it's really easy to include R code in an ado-file.

              For example, Stata doesn't compute Gray's test for competing risks. With R, it can be done easily with the cmprsk package (Gray).
              I wrote a simple command to do it (along with other results using stcompet and stpepemori):
              competout time status, event(1) by(drug) test(sr)

              [outputs for IC estimates & graph omitted]

              Gray test

              use Rcall (Haguish, 2016) & cmprsk (Gray)

              Line 1 - Test for main event failure: status == 1
              Line 2 - Test for competing event failure: status == 2
              Tests:
                    stat      pv      df
              1 4.908811 0.026720033  1
              2 8.615262 0.003333579  1


              [output for other tests omitted]

              r; t=8.43 15:36:38



              MT



              • #8
                Marc Thevenin
                1. I wish you had noted the version of Rcall you are using (it is given in the help file). The latest update is 1.2.0.
                2. The st.load() function returns an error in Stata 13. FIXED, thanks.
                3. Installing from GitHub fails? All I can say is that I have no problem installing on my Linux Ubuntu, Windows 7, and MacOS systems. Nevertheless, this is not a new problem, and many users have reported the same about MarkDoc and other packages. I hope you will kindly take the time to email Stata tech support and let them know. I cannot reproduce the problem, so I cannot really help. http://www.stata.com/support/tech-support/contact/.
                4. Rcall was unrecognized! That means the version you are using is buggy. Install the latest version. There should be no bug like that!
                5. setpath. You are doing it wrong, and it will fail no matter how many times you try. The colon is a separator between Stata and R, so when you type the following command, Stata tells R to execute whatever comes after the colon. setpath is a subcommand, and you should not use a colon with it; this is clear from the syntax of the latest version. So instead of:

                  Code:
                  Rcall: setpath path/to/R/on/your/system
                  use

                  Code:
                  Rcall setpath path/to/R/on/your/system
                  Nevertheless, it would be informative if you also mentioned what was wrong in line 386, in case the new version doesn't solve it.
                6. Rcall works but R does not. You are clearly using a very old, buggy release. It's well worth downloading the package and installing it locally until you figure out what's wrong with installing from GitHub (and please let us know what's going on with that).
                Rcall provides a program named Rcall_check for embedding R in Stata programs defensively. For example, you may want to make sure that the user has R installed correctly, with the required R version and the required versions of particular R packages, and to return proper errors otherwise. Rcall handles several more technical aspects of embedding R defensively (for example, what if R crashes or returns an error inside a Stata program, and how should such errors be handled?). All of these are handled by the Rcall_check program and by procedures explained in the Rcall article. I will upload the article to my website soon, hopefully!

                Nevertheless, I'd be very interested to see your ado program!
                Last edited by haghish; 12 Aug 2016, 13:57.
                ——————————————
                E. F. Haghish, IMBI, University of Freiburg
                [email protected]
                http://www.haghish.com/



                • #9
                  I'm getting an error from -Rcall- (SSC) in vanilla mode in a do-file but the error message disappears before I can read it. I am using Stata 14.1/MP for Windows, R Version 3.3.0, and Rcall version 1.2.2.

                  Some workarounds for ado-files are documented here: http://www.haghish.com/resources/pdf/Haghish_Rcall.pdf . It would be nice to have some examples on how to use Rcall non-interactively in Stata do-files, and how to handle errors.

                  Example Stata code in do-file:
                  Code:
                  R vanilla: source("script.r")
                  Example R code in script.r:
                  Code:
                  this_is_some_bad_R_syntax
                  Anders Alexandersson
                  [email protected]
                  Last edited by Anders Alexandersson; 26 Aug 2016, 13:15.



                  • #10
                    Please upload your script files so we can figure out what is going wrong. If there is an error in your code, it must appear in interactive mode as well. The only difference between the interactive and non-interactive modes is that, in interactive mode, the objects in the workspace and the packages and datasets you attach are remembered.

                    Besides, Rcall returns the errors to Stata:

                    Code:
                    . Rcall this_is_some_bad_R_syntax
                    
                    Error: object 'this_is_some_bad_R_syntax' not found
                    Execution halted
                    Using Rcall in do-files is really simple. It is also simple to use in ado-files. However, the trick is to think about potential errors and program defensively (and communicate errors from R to Stata). I have discussed a few approaches in the paper...

                    But anyway, if you get an error from Rcall, you also should expect an error if you run your code directly in R. If you don't get an error, then you are doing something wrong... So I should see your script files.
                    ——————————————
                    E. F. Haghish, IMBI, University of Freiburg
                    [email protected]
                    http://www.haghish.com/



                    • #11
                      Haghish, thanks for the reply. The error message is only very briefly displayed in the command window with the heading "C:\Windows\system32\cmd.exe" and the message disappears like a flash before I can read it carefully, and the error is not returned to my copy of Stata. This happens whether I run Rcall interactively or in vanilla mode. If I instead run the one-line R code directly in RStudio (version 0.99.902), then the first line of the error message is the same but "Execution halted" is not displayed.

                      This is my best guess/reading of the elusive error message in the Windows command window. It is clear as mud to me:
                      "Error in eval(expr, envir, enclose) :
                      Object 'this_is_some_bad_R_syntax' not found
                      Calls: source -> with Visible -> eval
                      Execution halted"
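
One way to read a message that flashes past is to run the script outside Stata and send both output streams to a file. Below is a minimal sketch for a Unix-like shell, assuming Rscript is on the PATH; `run_logged` and `rcall_debug.log` are hypothetical names. The same `> file 2>&1` redirection is also accepted by cmd.exe on Windows.

```shell
#!/bin/sh
# Hypothetical helper: run a command, keep stdout and stderr in a log file,
# and report the exit status so the failure is not lost.
run_logged() {
    log="$1"
    shift
    status=0
    "$@" > "$log" 2>&1 || status=$?
    echo "exit status: $status (details in $log)"
    return "$status"
}

# Example (assumes Rscript is installed and script.r is in the working directory):
# run_logged rcall_debug.log Rscript script.r
```

The log file then holds the full R error text, which can be compared with what RStudio reports for the same script.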

                      My one-line .do file and one-line .r file were as described. Both files are used in the same working directory. The .do file is attached. I get the message "Invalid File script.r" when I try to upload the .r file. The Forum Software FAQ states:

                      File attachments may be data files such as .zip, .txt, .doc, or any other file types the Administrator has allowed. You may also attach images (.jpg, .png. and .gif.)
                      Are .r files not allowed by the Administrator? To work around the problem, I uploaded the R file with file name extension .txt.

                      Anders Alexandersson
                      [email protected]



                      • #12
                        I am still unable to reproduce Haghish's error message in #10. The best debugging approach I found for Rcall vanilla is to compare the results with the results from running the R code directly in RStudio.

                        Substantively, my problem with Rcall vanilla is how to handle the dollar ("$") character in R code. I first reported the problem here (mistakenly, under MarkDoc): https://github.com/haghish/MarkDoc/issues/10. In Rcall interactive mode, the solution is to put a backslash before the "$" character. But in vanilla mode that solution fails: without the backslash, the line of code is ignored, and with the backslash, I get a difficult-to-read R error similar to those in #9 and #11. For now, my workaround is to use Roger Newson's -rsource()- command (SSC) instead whenever I need R code with the $ character, as I do for probabilistic record linkage in R.

                        Can Rcall vanilla handle R code with the "$" character? The backslash character fix for the interactive mode fails in the vanilla mode as described above.

                        --Anders
                        Last edited by Anders Alexandersson; 29 Aug 2016, 15:02.



                        • #13
                          Originally posted by Anders Alexandersson
                          Can Rcall vanilla handle R code with the "$" character? The backslash character fix for the interactive mode fails in the vanilla mode as described above.
                          \$ works with lm.ado example (https://github.com/haghish/Rcall/blo...ramming/lm.ado)
                          But lm is a native R function. Is it possible that the problem occurs only with some user-written functions?

                          I have this problem with the cmprsk package.
                          Problem: the cuminc function displays estimates and variances for the subdistribution hazard, plus a test to compare different strata. I only want to display the test (object: Tests).

                          I think I found a solution under vanilla mode.

                          So I can show Mr. Haghish the part of the ado-file involving R (see my post of 12 Aug):

                          Code:
                          * database & cmprsk package
                          use http://www.stata-press.com/data/cggm3/bc_compete, clear
                          R: install.packages("cmprsk", repos="http://cran.uk.r-project.org")
                          R: library("cmprsk")
                          The command is
                          Code:
                          cuminc time_var event_var group_var
                          The easiest way => R: mode only
                          Code:
                          capture program drop cuminc
                          program  cuminc
                          syntax varlist
                          tokenize `varlist'                                   
                          R: t = st.var(`1'); e = st.var(`2'); g = st.var(`3'); out=cuminc(t,e,g);  
                          R: out\$Tests;
                           
                          end
                          
                          cuminc time status drug

                          Other solution => R: mode only
                          Code:
                          capture program drop cuminc
                          program  cuminc
                          syntax varlist
                          tokenize `varlist'
                                                             
                          R: t = st.var(`1'); e = st.var(`2'); g = st.var(`3'); out=cuminc(t,e,g);           
                          R: print(out[names(lapply(out, typeof))=="Tests"]);                                      
                          end
                          
                          cuminc time status drug

                          Solution working under vanilla
                          Code:
                          capture program drop cuminc
                          program  cuminc
                          syntax varlist
                          
                          tokenize `varlist'
                          
                          * Stata =>  position of "Tests" object: (number of events - 1)*(number of strata) + 1
                          tempname l1 l2 last
                          qui tab `2'
                          scalar `l1'=r(r)-1
                          qui tab `3'
                          scalar `l2'=r(r)
                          scalar `last'=`l1'*`l2'+1
                          
                          Rcall vanilla:                                                                   ///
                          library("cmprsk");                                                            ///
                          last=st.scalar(`last');                                                       ///
                          t = st.var(`1'); e = st.var(`2'); g = st.var(`3');                        ///
                          out=cuminc(t,e,g);                                                          ///
                          invisible(typeof(out[last])); out[[last]];                                ///
                          
                          end
                          
                          cuminc time status race
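
The position formula in the comment of the vanilla solution above can be checked with a little arithmetic. This is a minimal sketch: `tests_index` is a hypothetical helper name, and the example counts (three status codes including censoring, two groups) are assumptions consistent with the two test rows shown in the earlier output.

```shell
#!/bin/sh
# Position of the "Tests" element in the list returned by cuminc(),
# following the formula in the ado comment:
#   (distinct status values - 1) * (number of strata) + 1
# The "- 1" assumes the status variable includes a censoring code.
tests_index() {
    status_levels="$1"   # distinct values of the status variable
    strata="$2"          # distinct values of the grouping variable
    echo $(( (status_levels - 1) * strata + 1 ))
}

# Assumed example: status coded 0/1/2 and a two-level group such as drug.
tests_index 3 2   # prints 5
```

With two event types and two groups there are four estimate elements, so Tests sits in the fifth position.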



                          • #14
                            Originally posted by Marc Thevenin
                            Is it possible that the problem occurs only with some user-written functions?
                            The problem I reported in #12 was for a user-written function, compare.linkage() in the package RecordLinkage.



                            • #15
                              This morning I tried other user-written functions and they work well.
                              But in my case, if I use st.data, the problem disappears. It's really strange.
                              With st.var, as I did before (see #13):
                              Code:
                              capture program drop cuminct
                              program define cuminct
                              syntax
                              Rcall vanilla:                                                                                 ///
                              library("cmprsk") ;                                                                        ///       
                              t = (st.var(time)) ; e = st.var(status); g = st.var(drug);                  ///                 
                              out = cuminc(t,e,g)  ;                                                                /// 
                              out\$Tests ;                                                                              ///    
                                    
                              end
                              
                              cuminct
                              
                              * output displayed
                              NOTHING 
                              With st.data
                              Code:
                              capture program drop cuminct
                              program define cuminct
                              syntax
                              
                              Rcall vanilla:                                                                       ///
                              library("cmprsk") ;                                                              ///       
                              attach(st.data()) ;                                                               ///                 
                              out = cuminc(time,status,drug)  ;                                      /// 
                              out\$Tests      ;                                                                   ///          
                              
                              end
                              
                              cuminct
                              
                              * output displayed
                                    stat      pv      df
                              1 4.908811 0.026720033  1
                              2 8.615262 0.003333579  1
                              Anders, did you use st.var?

