Handling multiple types in Rcpp

By Oliver Keyes

One of the most common problems I run into is how to deal nicely with multiple types. Integers and numeric values (or: integers and doubles, in C++) look very similar in R, but play very differently in compiled code.

Luckily it’s not actually tremendously hard to write Rcpp code that deals nicely with multiple types - unluckily, the advice on how to do so is scattered across multiple guides, most of them written for more specific or more general problems. So this is a brief writeup of tips and tricks for dealing with multiple types in Rcpp code, dedicated to just that, and at least partly just for my benefit.

And, well: for the first time in all of history, R’s C-types and API are actually your friend here.

The most common pattern in Rcpp guides is to see an example blob of code that looks something like:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector timesTwo(NumericVector x) {
  return x * 2;
}

(This example, as you may have noticed, I’ve stolen from the RStudio default .cpp template.)

Simple enough. Takes a numeric vector, multiplies every value by 2 using Rcpp’s excellent syntactic sugar, returns a numeric vector.

But what if we want to handle not only numeric vectors, but also integer vectors? We can pass integer vectors into that, of course, but they come out numeric, which may not always be what we want:

str(timesTwo(as.numeric(12)))
# num 24

str(timesTwo(as.integer(12)))
# num 24

One way of doing this would be to write multiple compiled versions of the same function, plus a non-compiled wrapper that detects the incoming type and sends the arguments to the ‘right’ one. But that’s gnarly and somewhat non-intuitive, because type detection in R isn’t a simple process (that nice integer vector you have is indeed an integer vector, but if you ask is.numeric() you’ll find it’s also numeric, so I hope you structured your if(…) check in just the right way!).

The answer comes in, of all things, R’s C types, which it’s normally best to avoid like the plague. At the C level, R doesn’t have to recognise the difference between different types - they’re all just SEXPs, or S-expressions. And “SEXP” is a totally valid type to pass into or out of an Rcpp function.

Combine that with Kevin’s writeup on dynamic wrapping, and you get something like:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
SEXP timesTwo(SEXP x) {

  // For each supported type, turn it into the 'real' type and
  // perform the operation. We can use TYPEOF to check the type.
  switch(TYPEOF(x)){

    case REALSXP: { // REALSXP == numeric (double) values
      NumericVector holding = (as<NumericVector>(x) * 2);
      return wrap(holding);
    }

    case INTSXP: { // INTSXP == integer values
      IntegerVector holding = (as<IntegerVector>(x) * 2);
      return wrap(holding);
    }

    default: {
      stop("Only integer and numeric vectors are supported");
    }
  }
}

So, what we’re doing is:

  1. Passing in a generic object;
  2. If it’s a supported type, converting it into an “actual” type, performing the operation, and then passing the result through wrap() to turn it back into a generic object;
  3. If it’s not a supported type, throwing an error and explaining what’s up.
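
Those three steps don’t rely on anything specific to Rcpp, so here is a rough sketch of the same shape in plain C++ that compiles without R headers. Everything in it is invented for illustration: Generic and Tag are hypothetical stand-ins for SEXP and TYPEOF, wrap_ints and wrap_reals stand in for wrap(), and times_two_impl is a made-up helper. It is not how R actually stores objects, just a model of the check-the-tag, convert, operate, re-wrap pattern.

```cpp
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a SEXP: a type tag plus one payload per type.
enum class Tag { Integer, Real, Text };

struct Generic {
  Tag tag;
  std::vector<int> ints;      // meaningful only when tag == Tag::Integer
  std::vector<double> reals;  // meaningful only when tag == Tag::Real
};

// Hypothetical equivalents of wrap(): typed vector -> generic object.
Generic wrap_ints(std::vector<int> v) {
  Generic g;
  g.tag = Tag::Integer;
  g.ints = std::move(v);
  return g;
}

Generic wrap_reals(std::vector<double> v) {
  Generic g;
  g.tag = Tag::Real;
  g.reals = std::move(v);
  return g;
}

// The operation itself, written once for any element type.
template <typename T>
std::vector<T> times_two_impl(std::vector<T> v) {
  for (T& value : v) value *= 2;
  return v;
}

// Step 1: take a generic object. Step 2: for supported tags, work on the
// typed payload and re-wrap it. Step 3: otherwise, raise an error.
Generic times_two(const Generic& x) {
  switch (x.tag) {
    case Tag::Integer:
      return wrap_ints(times_two_impl(x.ints));
    case Tag::Real:
      return wrap_reals(times_two_impl(x.reals));
    default:
      throw std::invalid_argument(
          "Only integer and numeric vectors are supported");
  }
}
```

Writing the operation once as a template and keeping the switch as a thin dispatcher is one way to avoid repeating the body in each case; whether that is worth it for a one-line operation like this is a matter of taste.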

This isn’t tremendously elegant, in some ways - the return statements have to live inside the switch cases, because each holding object is declared inside its own case block and no longer exists once that block ends. C++ blows its lid if you try to return it afterwards, even when the cases are all-encompassing, and so there’s actually no “final” return call in the function.
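
The scoping point can be seen without any R machinery. The following is a small standalone sketch in plain C++; doubled and first_doubled are hypothetical names made up for this example. Each branch builds a differently typed result inside its own block, so the only place that result can be returned from is inside that block.

```cpp
#include <vector>

// Hypothetical helper: doubles every element, whatever the element type.
template <typename T>
std::vector<T> doubled(std::vector<T> v) {
  for (T& e : v) e *= 2;
  return v;
}

double first_doubled(bool use_ints) {
  if (use_ints) {
    // This `holding` is a vector of int and lives only in this block.
    std::vector<int> holding = doubled(std::vector<int>{12});
    return holding[0];  // the int is converted to the double return type
  } else {
    // A different `holding`, with a different type, in a different block.
    std::vector<double> holding = doubled(std::vector<double>{12.5});
    return holding[0];
  }
  // Neither `holding` is visible here, so a single `return holding;`
  // placed at this point would not compile.
}
```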

But it means you get fine-grained control over the output type, and don’t have to write two totally different functions and handle them R-side. In exchange for substantially less code, I will absolutely take “an extra return statement and a very minor compiler warning”:

str(timesTwo(as.numeric(12)))
# num 24
str(timesTwo(as.integer(12)))
# int 24