Channel: Baeldung

Java-R Integration


1. Overview

R is a popular programming language for statistics. Because it offers a wide variety of functions and packages, embedding R code in other languages is a fairly common requirement.

In this article, we'll take a look at some of the most common ways of integrating R code into Java.

2. R Script

For our project, we'll start by implementing a very simple R function that takes a vector as input and returns the mean of its values. We'll define it in a dedicated file, script.R, placed on our classpath:

customMean <- function(vector) {
    mean(vector)
}
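Because customMean simply delegates to R's built-in mean, a plain-Java equivalent is handy as a reference when checking the values returned by each integration below. This is a minimal sketch; the class name MeanOracle is hypothetical, and it mirrors R by returning NaN for an empty vector (R's mean(integer(0)) also gives NaN).

```java
import java.util.stream.IntStream;

public class MeanOracle {

    // Plain-Java counterpart of customMean: the arithmetic mean as a double.
    // Returns NaN for an empty array, matching R's mean(integer(0)).
    static double mean(int[] values) {
        return IntStream.of(values).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        System.out.println(mean(new int[] {1, 2, 3, 4})); // prints 2.5
    }
}
```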

Throughout this tutorial, we'll use a Java helper method to read this file and return its content as a String:

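static String getMeanScriptContent() throws IOException, URISyntaxException {
    URI rScriptUri = RUtils.class.getClassLoader().getResource("script.R").toURI();
    Path inputScript = Paths.get(rScriptUri);
    // keep the line breaks so multi-line R code still parses, and close the file handle when done
    try (Stream<String> lines = Files.lines(inputScript)) {
        return lines.collect(Collectors.joining("\n"));
    }
}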
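Paths.get on a resource URI only works while script.R is a plain file on disk. Once the resource is packaged inside a JAR, its URI uses the jar: scheme and Paths.get typically fails with a FileSystemNotFoundException. Here is a sketch of an alternative that reads through getResourceAsStream instead; ScriptLoader and its method names are hypothetical, and readAllBytes requires Java 9 or later:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ScriptLoader {

    // Loads a classpath resource such as "script.R", whether it sits on disk or inside a JAR
    static String loadScript(String resourceName) throws IOException {
        InputStream in = ScriptLoader.class.getClassLoader().getResourceAsStream(resourceName);
        if (in == null) {
            throw new IOException("Resource not found on classpath: " + resourceName);
        }
        return readAll(in);
    }

    // Reads the whole stream as UTF-8, keeping the original line breaks, then closes it
    static String readAll(InputStream in) throws IOException {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String script = "customMean <- function(vector) {\n    mean(vector)\n}\n";
        InputStream in = new ByteArrayInputStream(script.getBytes(StandardCharsets.UTF_8));
        System.out.print(readAll(in));
    }
}
```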

Now, let's take a look at the different options we have to invoke this function from Java.

3. RCaller

The first library we're going to consider is RCaller, which executes code by spawning a dedicated R process on the local machine.

Since RCaller is available from Maven Central, we can just include it in our pom.xml:

<dependency>
    <groupId>com.github.jbytecode</groupId>
    <artifactId>RCaller</artifactId>
    <version>3.0</version>
</dependency>

Next, let's write a custom method which returns the mean of our values by using our original R script:

public double mean(int[] values) throws IOException, URISyntaxException {
    String fileContent = RUtils.getMeanScriptContent();
    RCode code = RCode.create();
    code.addRCode(fileContent);
    code.addIntArray("input", values);
    code.addRCode("result <- customMean(input)");
    RCaller caller = RCaller.create(code, RCallerOptions.create());
    caller.runAndReturnResult("result");
    return caller.getParser().getAsDoubleArray("result")[0];
}

In this method we're mainly using two objects:

  • RCode, which represents our code context, including our function, its input, and an invocation statement
  • RCaller, which lets us run our code and get the result back

It's important to note that RCaller isn't suitable for small, frequent computations, because starting the R process takes time on every run. This is a noticeable drawback.

Also, RCaller requires R to be installed on the local machine.
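To make the process-per-call model concrete, here is a rough, library-free sketch of the same idea: build an R program as text, run it with Rscript in a child process, and parse what it prints. This is not how RCaller is implemented internally; RscriptSketch, toRVector and meanViaRscript are hypothetical names, and meanViaRscript assumes Rscript is on the PATH.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Collectors;

public class RscriptSketch {

    // Renders a Java int[] as an R integer vector assignment, e.g. input <- c(1L, 2L, 3L)
    static String toRVector(String name, int[] values) {
        String elements = Arrays.stream(values)
            .mapToObj(v -> v + "L")
            .collect(Collectors.joining(", "));
        return name + " <- c(" + elements + ")";
    }

    // Runs customMean in a brand-new R process (assumes Rscript is on the PATH).
    // Paying the R start-up cost on every call is the drawback discussed above.
    static double meanViaRscript(String scriptContent, int[] values)
            throws IOException, InterruptedException {
        String program = String.join("\n",
            scriptContent,
            toRVector("input", values),
            // print 17 significant digits so the double survives the text round trip
            "cat(sprintf('%.17g', customMean(input)))");
        Path scriptFile = Files.createTempFile("mean-", ".R");
        try {
            Files.writeString(scriptFile, program);
            Process process = new ProcessBuilder("Rscript", scriptFile.toString())
                .redirectError(ProcessBuilder.Redirect.INHERIT) // keep R warnings out of stdout
                .start();
            String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8).trim();
            if (process.waitFor() != 0) {
                throw new IOException("Rscript exited with an error");
            }
            return Double.parseDouble(output);
        } finally {
            Files.deleteIfExists(scriptFile);
        }
    }

    public static void main(String[] args) {
        System.out.println(toRVector("input", new int[] {1, 2, 3})); // prints input <- c(1L, 2L, 3L)
    }
}
```

Writing the program to a temporary file rather than passing it with Rscript -e sidesteps shell quoting and argument-length quirks, at the cost of a little extra I/O per call.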

4. Renjin

Renjin is another popular option in the R integration landscape. It's more widely adopted than RCaller, and it also offers enterprise support.

Adding Renjin to our project is a bit less trivial since we have to add the bedatadriven repository along with the Maven dependency:

<repositories>
    <repository>
        <id>bedatadriven</id>
        <name>bedatadriven public repo</name>
        <url>https://nexus.bedatadriven.com/content/groups/public/</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>org.renjin</groupId>
        <artifactId>renjin-script-engine</artifactId>
        <version>RELEASE</version>
    </dependency>
</dependencies>

Once again, let's build a Java wrapper around our R function:

public double mean(int[] values) throws IOException, URISyntaxException, ScriptException {
    RenjinScriptEngine engine = new RenjinScriptEngine();
    String meanScriptContent = RUtils.getMeanScriptContent();
    engine.put("input", values);
    engine.eval(meanScriptContent);
    DoubleArrayVector result = (DoubleArrayVector) engine.eval("customMean(input)");
    return result.asReal();
}

As we can see, the approach is very similar to RCaller's, but less verbose, since we can invoke functions directly by name using the eval method.
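RenjinScriptEngine implements the standard javax.script.ScriptEngine interface, so we can also obtain it through ScriptEngineManager and keep Renjin-specific classes out of our own code. A minimal sketch, assuming the engine is registered under the name "Renjin" once renjin-script-engine is on the classpath; EngineLookup and its methods are hypothetical:

```java
import java.util.Optional;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class EngineLookup {

    // Finds a JSR-223 engine by name; empty if no such engine is on the classpath
    static Optional<ScriptEngine> findEngine(String name) {
        return Optional.ofNullable(new ScriptEngineManager().getEngineByName(name));
    }

    // Same steps as the Renjin wrapper, written against javax.script only.
    // The returned object is engine-specific (a Renjin SEXP when the engine is Renjin).
    static Object evalMean(String engineName, String scriptContent, int[] values) throws ScriptException {
        ScriptEngine engine = findEngine(engineName)
            .orElseThrow(() -> new IllegalStateException("No script engine named " + engineName));
        engine.put("input", values);
        engine.eval(scriptContent);
        return engine.eval("customMean(input)");
    }

    public static void main(String[] args) {
        System.out.println("Renjin available: " + findEngine("Renjin").isPresent());
    }
}
```

The trade-off is that the result comes back as a plain Object, so converting it to a double still requires knowing which engine produced it.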

The main advantage of Renjin is that it doesn't require an R installation, as it uses a JVM-based interpreter. However, Renjin is currently not 100% compatible with GNU R.

5. Rserve

The libraries we have reviewed so far are good choices for running code locally. But what if we want to have multiple clients invoking our R script? That's where Rserve comes into play, letting us run R code on a remote machine through a TCP server.

Setting up Rserve involves installing the Rserve package and starting the server with our script loaded, all from the R console:

> install.packages("Rserve")
...
> library("Rserve")
> Rserve(args = "--RS-source ~/script.R")
Starting Rserve...

Next, as usual, we include Rserve in our project by adding its Maven dependency:

<dependency>
    <groupId>org.rosuda.REngine</groupId>
    <artifactId>Rserve</artifactId>
    <version>1.8.1</version>
</dependency>

Finally, let's wrap our R script into a Java method. Here we'll use an RConnection object, which takes our server's address and defaults to 127.0.0.1:6311 when none is given:

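public double mean(int[] values) throws REngineException, REXPMismatchException {
    RConnection c = new RConnection();
    try {
        c.assign("input", values);
        return c.eval("customMean(input)").asDouble();
    } finally {
        // release the socket so the server session ends even if evaluation fails
        c.close();
    }
}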
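Because every Rserve call goes over TCP, a missing or unreachable server only shows up as an exception from the RConnection constructor. If we'd rather check up front, opening a plain socket to the server's port is enough. This is a hypothetical helper, not part of the Rserve client API, and 6311 is simply the default port mentioned above:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class RserveProbe {

    static final String DEFAULT_HOST = "127.0.0.1";
    static final int DEFAULT_PORT = 6311; // Rserve's default port

    // Returns true if something accepts TCP connections at host:port within timeoutMillis.
    // This only proves that a socket is open, not that the listener is actually Rserve.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        boolean up = isReachable(DEFAULT_HOST, DEFAULT_PORT, 500);
        System.out.println("Rserve reachable at " + DEFAULT_HOST + ":" + DEFAULT_PORT + ": " + up);
    }
}
```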

6. FastR

The last library we're going to talk about is FastR, a high-performance R implementation built on GraalVM. At the time of this writing, FastR is only available on Linux and Darwin x64 systems.

In order to use it, we first need to install GraalVM from the official website. After that, we need to install FastR itself using the GraalVM Component Updater (gu) and then run the configuration script that comes with it:

$ bin/gu install R
...
$ languages/R/bin/configure_fastr

This time our code will depend on Polyglot, the GraalVM API for embedding guest languages in Java. Since Polyglot is a general-purpose API, we have to specify the language of the code we want to run. Also, we'll use R's c function to convert our input to a vector:

public double mean(int[] values) throws IOException, URISyntaxException {
    // the Context is AutoCloseable; try-with-resources frees the R runtime afterwards
    try (Context polyglot = Context.newBuilder().allowAllAccess(true).build()) {
        String meanScriptContent = RUtils.getMeanScriptContent();
        polyglot.eval("R", meanScriptContent);
        Value rBindings = polyglot.getBindings("R");
        Value rInput = rBindings.getMember("c").execute(values);
        return rBindings.getMember("customMean").execute(rInput).asDouble();
    }
}

When following this approach, keep in mind that it tightly couples our code to GraalVM. To learn more about GraalVM, check out our article on the Graal Java JIT Compiler.

7. Conclusion

In this article, we went through some of the most popular technologies for integrating R in Java. To sum up:

  • RCaller is easier to integrate since it's available on Maven Central
  • Renjin offers enterprise support and doesn't require R to be installed on the local machine, but it's not 100% compatible with GNU R
  • Rserve can be used to execute R code on a remote server
  • FastR allows seamless integration with Java, but it makes our code dependent on GraalVM and isn't available for every OS
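Since each option comes with different trade-offs, it can help to keep the rest of an application unaware of which one is in use. One possible sketch: a hypothetical MeanCalculator interface that each wrapper method from this article could implement, plus a helper that falls back to a second backend (for example, Renjin when the Rserve server is down) if the first one throws.

```java
import java.util.stream.IntStream;

public class MeanBackends {

    // Hypothetical abstraction over the RCaller, Renjin, Rserve and FastR wrappers
    @FunctionalInterface
    interface MeanCalculator {
        double mean(int[] values) throws Exception;
    }

    // Tries the primary backend first and delegates to the fallback if it throws
    static MeanCalculator withFallback(MeanCalculator primary, MeanCalculator fallback) {
        return values -> {
            try {
                return primary.mean(values);
            } catch (Exception e) {
                // real code would probably log e before falling back
                return fallback.mean(values);
            }
        };
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for real backends: one that always fails and a plain-Java one
        MeanCalculator unavailable = values -> {
            throw new IllegalStateException("R backend not reachable");
        };
        MeanCalculator plainJava = values -> IntStream.of(values).average().orElse(Double.NaN);

        MeanCalculator calculator = withFallback(unavailable, plainJava);
        System.out.println(calculator.mean(new int[] {2, 4, 9})); // prints 5.0
    }
}
```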

As always, all the code used in this tutorial is available over on GitHub.

