An interesting conversation came up during a tea break at a London meeting this week: how do you run R scripts from within Clojure? Two approaches came up. One was simple; the other (mine) was far more complicated (see the “More Complicated Ways” section below).
So here’s me busking my way through the simple way.
Run it from the command line
The Clojure Code
Using the clojure.java.shell namespace gives you access to the Java system command process tools. I’m only interested in running a script, so all I need is the sh function.
(ns rinclojure.example1
  (:require [clojure.java.shell :refer [sh]]))
The sh function returns a map with three keys: an exit code (:exit), the standard output (:out) and the standard error (:err). I can inspect that map: if the exit code is anything other than zero I return the error text, and if all is well I return the output.
(defn run-command
  "Runs an R script with Rscript; returns stdout on success, stderr otherwise."
  [r-filepath]
  (let [command-output (sh "Rscript" r-filepath)]
    (if (= 0 (:exit command-output))
      (:out command-output)
      (:err command-output))))
I’ve kept this function simple: I’m only interested in running Rscript and checking the exit code. If all is well then we return the output, otherwise we return the error.
The R Code
The preferred way to run R scripts from the command line is now the Rscript command, which is bundled with R when you download it. If I have R scripts saved, it’s a case of running them through Rscript and evaluating the output.
Here’s my R script.
myvec <- c(1,2,3,2,3,4,5,4,3,4,3,2,1)
mean(myvec)
Not complicated I know, just a list of numbers and a function to get the average.
Running in the REPL
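Remember that the exit code and :err describe the Rscript process as a whole. An R error that halts the script normally makes Rscript exit with a non-zero status and write its message to standard error, so it ends up in :err, while anything the script printed before that point is still in :out.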
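If you would rather have a failure stop the program than come back looking like normal output, here is a minimal sketch. The names check-result and run-command-strict are mine, made up for illustration; the only library call is the same sh function used above.

```clojure
(require '[clojure.java.shell :refer [sh]])

;; Hypothetical helper: takes the map that sh returns and either hands
;; back :out or throws, keeping the exit code and both streams for later.
(defn check-result
  [{:keys [exit out err]}]
  (if (zero? exit)
    out
    (throw (ex-info (str "Rscript failed with exit code " exit)
                    {:exit exit :out out :err err}))))

;; Hypothetical wrapper: same idea as run-command, but a non-zero exit
;; code raises an exception instead of returning the error text.
(defn run-command-strict
  [r-filepath]
  (check-result (sh "Rscript" r-filepath)))
```

Because check-result only looks at the map, it can be tried out in the REPL with a hand-written map before Rscript is involved at all.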
A quick test in the REPL gives us:
rinclojure.example1> (def f "/Users/jasonbell/work/projects/rinclojure/resources/r/meantest.R")
#'rinclojure.example1/f
rinclojure.example1> (run-command f)
"[1] 2.846154\n"
That’s easy enough to parse: remove the trailing \n and the [1] index prefix that R has generated. We’re not interacting with R, only collecting its output, so whatever comes back needs some string manipulation before it’s usable.
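As a rough sketch of that clean-up, assuming the script prints exactly one unnamed number: parse-r-scalar is a name I’ve made up for illustration, and it will throw if the string isn’t in that shape.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: strips the surrounding whitespace and the
;; leading [1] index, then parses what is left as a double.
(defn parse-r-scalar
  [output]
  (-> output
      str/trim
      (str/replace #"^\[1\]\s*" "")
      Double/parseDouble))
```

Calling (parse-r-scalar (run-command f)) on the output above would give back a plain number rather than a string.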
Expanding to Multiline Output From R
Let’s modify the meantest.R file to give us something multiline.
myvec <- c(1,2,3,2,3,4,5,4,3,4,3,2,1)
mean(myvec)
summary(myvec)
Nothing spectacular I know but it has implications. Let’s run it through our Clojure command function.
rinclojure.example1> (def f "/Users/jasonbell/work/projects/rinclojure/resources/r/meantest.R")
#'rinclojure.example1/f
rinclojure.example1> (run-command f)
"[1] 2.846154\n Min. 1st Qu. Median Mean 3rd Qu. Max. \n 1.000 2.000 3.000 2.846 4.000 5.000 \n"
Assuming x holds the output string from Rscript, clojure.string/split breaks it into a vector with one element per line.
rinclojure.example1> (clojure.string/split x #"\n")
["[1] 2.846154" " Min. 1st Qu. Median Mean 3rd Qu. Max. " " 1.000 2.000 3.000 2.846 4.000 5.000 "]
There’s still some tidying up to do though. With x holding the output from Rscript, first split on the \n characters and keep the result in a var.
rinclojure.example1> (def foo (clojure.string/split x #"\n"))
#'rinclojure.example1/foo
rinclojure.example1> foo
["[1] 2.846154" " Min. 1st Qu. Median Mean 3rd Qu. Max. " " 1.000 2.000 3.000 2.846 4.000 5.000 "]
If, for example, I wanted the summary values then I have to do some string manipulation to get them.
rinclojure.example1> (nth foo 2)
" 1.000 2.000 3.000 2.846 4.000 5.000 "
Split again, this time on runs of spaces.
rinclojure.example1> (clojure.string/split (nth foo 2) #" +")
["" "1.000" "2.000" "3.000" "2.846" "4.000" "5.000"]
The final step is then to convert the values to numbers, forgetting the first as it’s blank. So I would end up with something like:
rinclojure.example1> (map (fn [v] (Double/valueOf v)) (rest (clojure.string/split (nth foo 2) #" +")))
(1.0 2.0 3.0 2.846 4.0 5.0)
Nothing here tells us what each number means, whether it’s the min, max, mean and so on. Recovering that takes more string manipulation: you could convert R’s own labels from the header line to keywords, or just supply your own.
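Here’s a small sketch of the “add your own” option. The keyword names and the parse-summary-line function are my own choices for illustration, and the sketch assumes the line holds exactly the six summary() values in the order shown above.

```clojure
(require '[clojure.string :as str])

;; Labels picked by hand (an assumption, not something R returns),
;; in the same order as the columns summary() printed above.
(def summary-keys [:min :q1 :median :mean :q3 :max])

;; Hypothetical helper: trims the line so there is no blank first
;; element, splits on whitespace, parses each value and pairs it
;; with its label.
(defn parse-summary-line
  [line]
  (->> (str/split (str/trim line) #"\s+")
       (map #(Double/parseDouble %))
       (zipmap summary-keys)))
```

With the vector from earlier, (parse-summary-line (nth foo 2)) would give a map, so the mean can be pulled out with (:mean ...) instead of by position.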
More Complicated Ways
Among the R libraries is the rJava package, which lets you run Java from R and R from Java. I wrote a chapter on R in my book back in 2014.
It’s not the easiest thing to set up but it’s worth the investment. There is a Clojure project on GitHub, clj-jri, that acts as a wrapper between R and Clojure. Once it’s set up you run R as an REngine and evaluate the output that way. There’s far more control but it comes at the cost of complexity.
Keeping Things Simple
Personally I think it’s easier to keep things as simple as possible. Use Rscript to run the R code but it’s worth considering the following points.
- Keep your R scripts as simple as possible, output to one line where possible.
- Ensure that all your R packages are installed and working. It’s not ideal to install them during the Clojure runtime, as the output will become hard to parse. Also make sure that all the libraries are available on the same instance as your Clojure code.
- In the long run, have a set of solid string manipulation functions to hand for dealing with the R output. Remember, it’s one big string.
2 responses to “How to run R scripts from Clojure – #clojure #r #datascience #data #java”
That is pretty cool! What if this data structure ‘(1,2,3,2,3,4,5,4,3,4,3,2,1) came from clojure? would it be doable to use R fn on it?
I don’t see any reason why not.
The main thing to keep in mind is that you need to think through the chain of events from Clojure to R.
There are some things you’ll need to prepare within the Clojure code, and there will potentially be transforms to do on the R script side too.
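One way that chain could look, as a rough sketch only: build the R expression as a string on the Clojure side and hand it to Rscript with its -e option. The r-vector and r-mean names are made up for illustration, and this assumes a collection of plain numbers (strings would need quoting, and large data would probably be better written to a file first).

```clojure
(require '[clojure.java.shell :refer [sh]]
         '[clojure.string :as str])

;; Hypothetical helper: renders a Clojure collection of numbers
;; as an R c(...) literal.
(defn r-vector
  [nums]
  (str "c(" (str/join "," nums) ")"))

;; Hypothetical helper: asks R for the mean of nums and returns the
;; usual sh map. cat() prints the value without the [1] prefix.
(defn r-mean
  [nums]
  (sh "Rscript" "-e" (str "cat(mean(" (r-vector nums) "))")))
```

The :out value from (r-mean [1 2 3 2 3 4 5 4 3 4 3 2 1]) is still a string, so the same exit-code check and parsing steps from the post apply.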