  • Rcall integration when using R's TableOne package

    Hi there, I've been trying to get R's tableone package to work with Stata via Rcall. See details here to install R and Rcall. In addition to the steps in the link, you need to install tableone (the package name is lowercase). So in R, type:
    Code:
    install.packages("tableone")
    I'm trying to output a CSV file following the directions here (under "Real Export Way")

    Stata code:
    Code:
    sysuse auto, clear // here's our dataset
    rcall vanilla:                        ///
    library("tableone") ;         ///
    data<- st.data() ;             ///
    rows <- c("price", "mpg", "rep78", "headroom", "trunk", "weight", "length") ; ///
    columns <-c("foreign") ;     ///
    categorical <- c("rep78") ;     ///
    medianiqr <- c("weight") ;    ///
    table1 <-CreateTableOne(data=data, vars=rows, strata=columns, factorVars=categorical, addOverall = TRUE) ; ///
    output<-print(table1, nonnormal = medianiqr, missing=TRUE, quote=FALSE, noSpaces=TRUE, formatOptions = list(scientific = FALSE)) ;
    The last line returns the Stata error:
    string variables not allowed in varlist;
    Overall is a string variable
    r(109);
    I would need to add a triple slash and then one more line to get a CSV output of above, if I didn't get that error. Final line:
    Code:
     write.csv(output, file = "table1.csv") ;
    ...but the Stata r(109) error seemingly keeps this from working. I'm wondering if folks have any suggestions about why this Stata error occurs and how I might address it. I suspect it has something to do with Rcall sending data back to Stata in the final line.

    Thanks! Tim

  • #2
    Hi Tim, I'm having trouble testing your code since it is unclear where the st.data() function comes from. Another dependency apart from tableone? There could be a few things going on here. The first thing that jumps to my mind is that print has a side effect apart from assigning to the output variable (i.e. printing to the standard output) and that might not play nicely with rcall.

    I'm curious as to your motivations for stitching these two platforms together like this, particularly when the code you have provided is almost entirely written in R. You're even writing output to the file system, which makes me think you already plan to use the R output in a different application later. Why not just use the R editor for this? Or better yet, why not use R Studio? You can (at the very least) test all of your R code in an environment like R Studio and get meaningful R-native error messages. If the line works in R but not Stata, then the problem is in rcall, otherwise the R console should give you a meaningful error message.

    In general, the best way to tie two applications together like this is with an interchange format (like a .csv) on the file system. Then you pass data between applications using the file system and your interchange format. You must have a good reason for using rcall, right?
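The interchange-file approach described above can be sketched in base R. This is only an illustration: the data frame and the file name are made up for the example and are not from the thread. R writes a plain CSV, and the other application reads it from disk (in Stata, for instance, with import delimited).

```r
# Hypothetical example data; any data frame would do.
df <- data.frame(id = 1:3,
                 group = c("a", "b", "a"),
                 stringsAsFactors = FALSE)

# Write the interchange file to a temporary location (path is an assumption).
path <- file.path(tempdir(), "interchange.csv")
write.csv(df, path, row.names = FALSE)

# Read it back to confirm that the file round-trips cleanly.
back <- read.csv(path, stringsAsFactors = FALSE)
```

The point of the design is that neither application needs to know about the other; each only needs to read and write the agreed file format.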



    • #3
      I guess if the problem is that rcall is intercepting the standard output from print and then trying to interpret it as native ado, you could always try redirecting the standard output away from Stata before printing. Of course, without testable code I'm basically just groping around in the dark...

      Code:
      sink(nullfile())
      output<-print(table1, nonnormal = medianiqr, missing=TRUE, quote=FALSE, noSpaces=TRUE, formatOptions = list(scientific = FALSE))
      sink()



      • #4
        Thanks, Dan. st.data() is part of rcall. The code above is testable; you just need to install R and rcall. Directions for that are in the original post.

        As for why I am trying to use rcall to do multistep R analyses: it's the same reason that StataCorp decided to support an interface with Python. There are some things that R and Python handle well that are not yet supported, or are otherwise annoying to do, in Stata: e.g., 3D figures, certain machine learning functions, etc. Table 1 generation is another of those things. The new table and collect features might at some point catch up, but are still onerous to use IMO. Having a solution via R to simplify the generation of Table 1s would be handy. Doing it from my do-file simplifies my workflow.

        I do appreciate your thoughts and suggestions! Have a great weekend. Tim



        • #5
          Following #3, see "Exporting" in "Introduction to tableone": it seems printing can be avoided with printToggle = FALSE.



          • #6
            In my experience integrating two applications can be a huge headache. These solutions can be difficult to debug because it is difficult to isolate the cause of a problem - is it in the host, the client, or the interface between them? Of course, if you find Rcall improves your quality of life, who am I to argue?

            I was just reading through the source code and it looks like Rcall will instruct R to write output to the file system in your current working directory from R's perspective. The file is called "stata.output". If you inspect the file and try to load it into Stata yourself manually you might get some clues as to what is happening. I'm not sure whether or not you need to manually call these functions to transfer output between platforms, but if so that might also illustrate something interesting.

            Otherwise your best bet might be to reach out to the creator of the package, haghish.

            Please be sure to post a working solution when you find it. This will benefit future readers of the thread who may have a similar problem.

            Have a great weekend!
            Last edited by Daniel Schaefer; 15 Jul 2022, 11:55. Reason: Removed some needlessly argumentative and unnecessary details.



            • #7
              Thank you for that suggestion, Bjarte Aagnes. It still stops with the same error, but adding "printToggle = FALSE" now generates a Stata dataset called "_load.matrix.output.dta" in R's working directory. Opening that up, it's the Table 1. Presumably, Rcall is trying to send the R result to Stata using that .dta file somehow, and opening it up in Stata causes an error. In the code below, I put a "capture" command so Stata doesn't stop at that error, and then import that dataset and export it as an Excel spreadsheet. A bit clunky, but it seems to work.

              Code:
              rcall: a<-getwd() ; // here's the working directory
              local wd =r(a) // save r's working directory in stata
              di "`wd'"
              sysuse auto, clear // here's our dataset
              capture rcall vanilla:                        ///
              library("tableone") ;         ///
              data<- st.data() ;             ///
              rows <- c("price", "mpg", "rep78", "headroom", "trunk", "weight", "length") ; ///
              columns <-c("foreign") ;     ///
              categorical <- c("rep78") ;     ///
              medianiqr <- c("weight") ;    ///
              table1 <-CreateTableOne(data=data, vars=rows, strata=columns, factorVars=categorical, addOverall = TRUE) ; ///
              output<-print(table1, nonnormal = medianiqr, missing=TRUE, quote=FALSE, noSpaces=TRUE, printToggle = FALSE, formatOptions = list(scientific = FALSE)) ;
              
              use "`wd'/_load.matrix.output.dta", clear
              export excel using "`wd'/text.xls", replace // export above dataset in R's working directory



              • #8
                Tim, thanks for taking a moment to provide the solution. You might be able to slightly improve this by writing the output to an Excel file directly with R's xlsx package. I'm not sure it matters too much; a couple of lines of R replace the last two lines of Stata. It might be a bit faster this way because you don't have to load the data into Stata's memory before writing the xls file; you work directly from R's memory, where the data is already loaded. Disk I/O is a significant bottleneck, so skipping the extra read could save time with a sufficiently large dataset.

                Capturing the error and letting R continue to do its thing is clever. I imagine that technique could be useful for any future readers who are interested in taking advantage of Rcall, whatever the problem.



                • #9
                  Thanks for the suggestion, Daniel Schaefer -- I had looked into using the writexl package since it, unlike other R packages, doesn't require Java to function. The problem is that I can't get Rcall to make a data frame in R; it dependably crashes when using the as.data.frame() command. Writexl only writes an Excel file from data frames. This is a bit of a janky workaround, but it works!



                  • #10
                    Ah, yes, xlsx does have a Java dependency. If it were up to me, all R packages would be written either in native R, in C, or with Rcpp, but alas. There are some good JVM languages (I'm thinking Kotlin in particular) but the JVM always comes with a lot of overhead. I can see why you might want to avoid it if possible.

                    Are you sure as.data.frame() is the right function for your problem? I usually avoid as.data.frame() in favor of the data.frame() function (without the as). Data frame coercion with as.data.frame() is not as straightforward as you might expect. I find it is usually easier to construct a new data frame with data.frame().

                    Just my two cents.
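To make the data.frame() suggestion concrete, here is a minimal base R sketch. It assumes the table is a character matrix with row and column names, similar in shape to what print(..., printToggle = FALSE) is reported to return in this thread. The matrix contents, the column name "variable", and the output file name are placeholders invented for the example, and this was not tested under rcall.

```r
# Placeholder character matrix standing in for the printed table
# (values are made up purely for illustration).
tab <- matrix(c("10", "12",
                "1.5 (0.2)", "1.7 (0.3)"),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("n", "x (mean (SD))"),
                              c("Domestic", "Foreign")))

# Build a new data frame: row names become an ordinary column,
# and the matrix columns are kept as character vectors.
df <- data.frame(variable = rownames(tab), tab,
                 row.names = NULL,
                 stringsAsFactors = FALSE,
                 check.names = FALSE)

# A data frame like this can be written with base R; the path is an assumption.
out_path <- file.path(tempdir(), "table1_df.csv")
write.csv(df, out_path, row.names = FALSE)
```

Moving the row names into a named column keeps the variable labels when the file is opened in another application.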



                    • #11
                      Hi everyone. I am the developer of rcall, and I'll try to give a quick explanation of how to avoid errors like this in the future. In your original code, if you avoid creating the object 'output', the code runs without any error:

                      Code:
                      sysuse auto, clear // here's our dataset
                      rcall vanilla:                        ///
                      library("tableone") ;         ///
                      data<- st.data() ;             ///
                      rows <- c("price", "mpg", "rep78", "headroom", "trunk", "weight", "length") ; ///
                      columns <-c("foreign") ;     ///
                      categorical <- c("rep78") ;     ///
                      medianiqr <- c("weight") ;    ///
                      table1 <-CreateTableOne(data=data, vars=rows, strata=columns, factorVars=categorical, addOverall = TRUE) ; ///
                      print(table1, nonnormal = medianiqr, missing=TRUE, quote=FALSE, noSpaces=TRUE, formatOptions = list(scientific = FALSE)) ;
                      However, when you define a new object while using rcall, it tries its best to return it to Stata, so that you can continue working with the object and not just the printed values. The aim is to improve reproducibility, making R a workhorse for Stata. If you need an Excel table, you can simply export it from R, or create a data frame and save it, or load a data frame into Stata with st.load().

                      I checked the class of the object output and it is "tableone", so there is probably a bug somewhere in rcall that tells it this is a "matrix". rcall returns matrices to Stata, making them available for further analyses, and the tableone object contains strings, which crash the process of creating matrices in Stata. Matrices can be of various classes in R, and at the time of writing the package I tried to make it as general and inclusive as possible.
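The mismatch between an object's class and its being treated as a matrix can be reproduced in base R. The sketch below is an illustration only: it assumes the detection resembles is.matrix(), which has not been confirmed against the rcall source, and both the class name "tableone_like" and the helper is_numeric_matrix() are hypothetical.

```r
# A character matrix with a custom class attribute (hypothetical class name).
tab <- matrix(c("10", "12"), nrow = 1,
              dimnames = list("n", c("Domestic", "Foreign")))
class(tab) <- "tableone_like"

# The class attribute no longer says "matrix", but the object still has
# a two-dimensional dim attribute, which is what is.matrix() looks at.
by_class <- inherits(tab, "matrix")
by_dim   <- is.matrix(tab)

# Hypothetical guard: only treat an object as a returnable numeric matrix
# when it is both matrix-shaped and numeric.
is_numeric_matrix <- function(x) is.matrix(x) && is.numeric(x)

string_ok  <- is_numeric_matrix(tab)
numeric_ok <- is_numeric_matrix(matrix(1:4, nrow = 2))
```

A guard of this kind would skip string-valued tables instead of trying to turn them into a Stata matrix, which can only hold numbers.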

                      Code:
                      sysuse auto, clear // here's our dataset
                      rcall vanilla:                        ///
                      library("tableone") ;         ///
                      data<- st.data() ;             ///
                      rows <- c("price", "mpg", "rep78", "headroom", "trunk", "weight", "length") ; ///
                      columns <-c("foreign") ;     ///
                      categorical <- c("rep78") ;     ///
                      medianiqr <- c("weight") ;    ///
                      table1 <-CreateTableOne(data=data, vars=rows, strata=columns, factorVars=categorical, addOverall = TRUE) ; ///
                      output <- print(table1, nonnormal = medianiqr, missing=TRUE, quote=FALSE, noSpaces=TRUE, formatOptions = list(scientific = FALSE), printToggle = FALSE); ///
                      write.csv(output, file = "tableone.csv") ; /// OR EXPORT AN EXCEL FILE, IF YOU WILL
                      /// AND THEN REMOVE THE OBJECT FROM R
                      rm(output);
                      Anyway, I think the error is due to a sloppy practice in the stata.output.r file in the package. I created a bug report on GitHub, in case anyone can offer a solution. It will be a while before I find the time to reread rcall and fix this bug myself...
                      https://github.com/haghish/rcall/issues/33

                      Last edited by haghish; 13 Nov 2023, 00:25.
                      ——————————————
                      E. F. Haghish, IMBI, University of Freiburg
                      http://www.haghish.com/
