piton released on CRAN

By Os Keyes

I’m pleased to announce the release of piton, a software library for writing Parsing Expression Grammars that can be embedded in and accessed from R.

Parsing whats?

Parsing Expression Grammars, or PEGs, are a formal language for recognising and parsing strings. They produce unambiguous matches, are more powerful than regular expressions (they can handle nested and recursive structures that regular expressions cannot), and (in the case of piton) are implemented entirely in compiled code.
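To make "unambiguous" a little more concrete: a PEG's choice operator is ordered, so the first alternative that matches wins and the parser never comes back to try the others. The C++ sketch below hand-rolls that behaviour for two toy rules. Every name in it (lit, first_short_choice, first_long_choice) is hypothetical and written for this post; none of it is piton or PEGTL code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration only; not part of piton or PEGTL.

// Match the literal `s` at `pos`; advance `pos` only on success.
inline bool lit(const std::string& in, std::size_t& pos, const std::string& s) {
  if (in.compare(pos, s.size(), s) != 0) return false;
  pos += s.size();
  return true;
}

// rule <- ("a" / "ab") "c" !.
// Ordered choice: "a" is tried first, and once it succeeds the parser is
// committed to it. It never revisits the choice to try "ab".
inline bool first_short_choice(const std::string& in) {
  std::size_t pos = 0;
  if (!lit(in, pos, "a") && !lit(in, pos, "ab")) return false;
  return lit(in, pos, "c") && pos == in.size();
}

// rule <- ("ab" / "a") "c" !.
// Same alternatives, longest first: "ab" is tried before falling back to "a".
inline bool first_long_choice(const std::string& in) {
  std::size_t pos = 0;
  if (!lit(in, pos, "ab") && !lit(in, pos, "a")) return false;
  return lit(in, pos, "c") && pos == in.size();
}

// Usage: the order of the alternatives decides the outcome.
inline void check_ordered_choice() {
  assert(first_short_choice("ac"));
  assert(!first_short_choice("abc"));  // committed to "a", then "c" meets 'b'
  assert(first_long_choice("abc"));
  assert(first_long_choice("ac"));
}
```

Because the order of alternatives is part of the grammar, a given input can only ever be parsed one way, which is where the lack of ambiguity comes from.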

R already uses PEGs - Hadley Wickham’s readr package uses a Parsing Expression Grammar to recognise file formats and element formats, which is one of the reasons it’s so dang fast (that and Hadley’s a bloody genius). But the few implementations that exist have each been written by hand.

Piton!

piton (so named because it’s the term for a type of climbing peg - I know, I know) doesn’t actually implement a parsing expression grammar. Rather, it wraps the Parsing Expression Grammar Template Library (PEGTL) by Colin Hirsch and Daniel Frey, and so provides a framework for you to implement arbitrary PEGs (and define actions to be taken when a match occurs).

As an exceedingly simple example of what PEGs can be used for, piton contains a PEG that recognises comma-separated lists of numbers. Given such a list stored as a string, it reliably extracts all the numbers and sums them:

library(piton)

peg_sum("1,2,  5, 91, 34")
[1] 133
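For a feel of what a grammar like this does under the hood, here is a minimal hand-written C++ sketch of the same idea. It is not piton's implementation and does not use PEGTL: the function names (skip_ws, match_number, sum_number_list) are hypothetical, and it assumes non-negative integers only, which may be narrower than what peg_sum actually accepts.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical hand-rolled recogniser; not piton's or PEGTL's code.
// Grammar, in PEG notation (integers only is an assumption of this sketch):
//   list   <- ws number (ws ',' ws number)* ws !.
//   number <- [0-9]+
//   ws     <- [ \t]*

// ws <- [ \t]* : greedy and always succeeds.
inline void skip_ws(const std::string& in, std::size_t& pos) {
  while (pos < in.size() && (in[pos] == ' ' || in[pos] == '\t')) ++pos;
}

// number <- [0-9]+ : on success, consumes the digits and runs the "action"
// (adding the value to `total`). On failure nothing is consumed or added.
inline bool match_number(const std::string& in, std::size_t& pos, long& total) {
  const std::size_t start = pos;
  long value = 0;
  while (pos < in.size() && std::isdigit(static_cast<unsigned char>(in[pos]))) {
    value = value * 10 + (in[pos] - '0');
    ++pos;
  }
  if (pos == start) return false;
  total += value;
  return true;
}

// list rule: returns true and writes the sum to `out` only if the whole
// input matches the grammar.
inline bool sum_number_list(const std::string& in, long& out) {
  std::size_t pos = 0;
  long total = 0;
  skip_ws(in, pos);
  if (!match_number(in, pos, total)) return false;
  for (;;) {
    const std::size_t save = pos;  // remember where this repetition began
    skip_ws(in, pos);
    if (pos < in.size() && in[pos] == ',') {
      ++pos;
      skip_ws(in, pos);
      if (match_number(in, pos, total)) continue;
    }
    pos = save;  // repetition body failed: rewind and stop repeating
    break;
  }
  skip_ws(in, pos);
  if (pos != in.size()) return false;  // !. : no trailing input allowed
  out = total;
  return true;
}

// Usage: the same input as the R example, plus a malformed list.
inline void check_sum_number_list() {
  long out = 0;
  assert(sum_number_list("1,2,  5, 91, 34", out) && out == 133);
  assert(!sum_number_list("1,,2", out));
}
```

Writing this by hand for every format is exactly the chore a PEG library removes: with PEGTL you declare the rules and attach actions to them, and the matching and rewinding logic comes for free.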

Because the package’s internal library is header-only, it can be imported and used by other Rcpp packages. As a result, R programmers who work with C++ can now (should they choose) define arbitrary grammars and actions using piton, and incorporate those into their own packages or codebases.

Use

As mentioned above, piton is designed for package and codebase development where you need to reliably parse some arbitrary format, not for casual interactive use. If you’re interested in implementing a PEG in your package, you need to:

  1. Link piton into your package as a dependency, using // [[Rcpp::depends(piton)]] (see this post for an example of how it works);
  2. Add #include <pegtl.hpp> to your C++ source;
  3. Write and expose your grammar;
  4. Done!

The PEGTL docs explain in some detail how the underlying library works, and various examples of PEGs are included within the package itself. Integration with Rcpp is pretty clean, although developers may want to avoid a blanket "using namespace Rcpp;", since a couple of names collide with the PEGTL namespace.

piton is already in use by Scott Chamberlain’s experimental pegax library for parsing taxonomy data, and is now stable and on CRAN. It can be installed with:

install.packages("piton")

To see the source code, raise issues or suggest improvements, simply visit the GitHub repository and let me know what’s up!