I love the R community deeply. That’s the first thing I should say. Well, really, the first thing I should say is that people I cite, quote and praise in this essay do not necessarily endorse it - and in many cases don’t know that it’s being written. It’s slightly sad that I need to open with a disclaimer to prevent possible blowback on to lovely people, but there we have it.
The second thing, though, is that I love the R community deeply. R is not a beautiful language: it is a terrible language. But it is a terrible language with a vast array of possible applications, a vast array of possible packages, and a vast array of wonderful, supportive people who do fantastic work to open statistical computing up to a wider audience, and support a wider set of use cases every day. We started off as a FORTRAN wrapper and ended up with most of a server environment.
We have the rOpenSci and rOpenGov groups, working to increase the things we can do in this language, the services we can connect to, the possibilities available to us. We have people like Hadley, spending every day thinking about where people need the environment to go. We have Dirk making things faster, Jeroen making things more refined, we have Scott and Hilary and Bob and Karthik and everyone else who works to build a community on both hackathons and levity (shout out to the #RCatLadies!).
Every time I work on an R library, or just look at the pertinent Twitter hashtag, I get overjoyed that this is a community I get to be a part of. That, day in, day out, I get to hang out with so many welcoming, thoughtful, productive people over the interwebs, always available for feedback or pointers or venting. I owe Adam tremendously for getting me involved in what is, ultimately, a wonderful movement of wonderful people, separated by diverse use cases but joined by a desire to make those use cases as easy to fulfil as possible.
Unfortunately this is only one part of the R community - a majority, I’d like to think, but still only one part.
The reason I pull out Twitter as an illustration is that there’s a well-known meme within the R community: when someone tells you they want to learn R, or are learning R and have a problem, every piece of advice opens with “don’t go to the mailing lists”. Because while one chunk of the community is welcoming and open, the other is… well, not. We have people whose primary desire is to help people Get Stuff Done: we also have people whose primary desire seems to be to minimise the work they have to put in.
A great example of the latter is the CRAN maintainers. To avoid any ambiguity here (I’ve never had anything but good experiences with Uwe) I’m primarily talking about Brian Ripley, who, from what I can see as a package submitter, is the person responsible for ultimately giving the thumbs up or down to incoming submissions.
For free, and in his spare time, Ripley handles package submissions and the R-help mailing list. This is much appreciated: it is not an easy pile of work to handle. Simply looking at the number of packages reviewed and found acceptable, via CRANberries, shows that a tremendous amount of work goes into reviewing anything. This is work that is tremendously appreciated, and to find the time to make contributions on top of that - to R-core, to the mailing lists - indicates tremendous dedication. I certainly don’t have the time to do the work that Ripley does, nor do I have the expertise for much of it.
The problem is that while doing this work, Ripley does not come off as welcoming, or thoughtful. Ripley comes off as arrogant and scornful. There are countless stories of him ignoring or chewing people out - newbies and experienced contributors alike - for asking the wrong questions, or asking the right questions in the wrong way, or sometimes, it seems, simply because he felt like it.
I’ve run into this problem myself, on several occasions: the first substantial clash was over my work on meanit/averageimage, and the most recent over a new submission of urltools. I submitted a new version and was informed there’s an unterminated string somewhere in the C++. “Great!”, I said. “My build and check process doesn’t show any such error: could you let me know how you’re building so that I can amend my process accordingly, and detect this sort of error in the future?”
“Do your own homework!”
That’s, first, not a helpful answer (although he did inform me that the thing I needed was somewhere in the 75,000-word “Writing R Extensions” manual, which is at least theoretically helpful): it’s also not an encouraging one. It does not make me want to patch the bug, it makes me want to avoid CRAN entirely. It makes me want to limit my R contributions to GitHub-stored packages and Twitter and StackOverflow. And this is not the first, fifth or tenth time that I’ve seen this sort of issue turn up: Ripley’s name is a watchword, in the R community, for the sort of person who prizes their own time far more than they do that of other people, and actively refuses to put in the minimum amount of work to treat people with a basic level of respect. That’s not acceptable in an R Core Team member, and it’s absolutely not acceptable in someone who, as the primary POC for CRAN, is the main interface between the recognised repository of R code and the world.
The problem is that, as said, he’s the primary POC for CRAN. This means that, first, there is absolutely no incentive to be nice, or cost to failing to be (who do you replace Ripley with? Our bus factor, here, is 0), and second, that I know of a lot of people who simply refuse to argue back, or address the problem of tone, or do anything in dealing with him, because they’re afraid of blowback in the form of “good luck getting anything approved ever again”. Whether this is a likely outcome or not is unknown, but it’s certainly a reasonable fear given the approach he takes even to people who are polite to him.
So this is me, probably being tremendously stupid if my goals include ever getting code approved again, standing up and saying that something is tremendously broken in the package submission process, and it seems to centre on who handles that process. The R community is at its best when it is welcoming, and thoughtful: when it is the sort of people who hang out on Twitter and StackOverflow and GitHub. But it should say something that that’s where such people choose to hang out when, theoretically, we have CRAN and R-Forge and the mailing lists. It should say something about the attitude of people who run those services, and how welcoming (or not) they make them.
I don’t know, actually, where we go from here, or what I’d like as an outcome, because I honestly don’t have any reasonable expectation that there will be a change. I’ve pretty much given up on anything substantive happening to the social dynamics of the Core Team and the interplay between them and the people who write everything else, because it’s a dynamic entirely determined by Ripley et al. But I can see the community going one of three ways, in the next few years, if the status quo continues.
The first - the worst-case scenario - is that we simply wilt. Without a wide contributor base, without enough fresh blood to bring in new ideas and perspectives, and without any kind of welcoming attitude being displayed by the people who currently run things, core stagnates and CRAN follows it as people are driven more towards environments such as NumPy and Julia, which are more inchoate and more open to new participants.
The second, which I consider the most likely, is that we do a Gilmore and route around the damage. We start treating GitHub, with the support that libraries like devtools provide, as the hosting-place and meeting-place for R libraries and their maintainers. We use Twitter and StackOverflow and hackathons for communication. I can already see this happening, in some ways: I know that my daily cycle of community involvement has very much migrated to spending as little time dealing with the “actual” R project as possible, and that pretty much every library I see of any value is already on GitHub.
The third is that something actually changes - that we begin a process of opening up and modernising how we approach incoming people and incoming code, and that eventually we hit the point where every introduction to R is opened with “go to the mailing lists”. That’s the future I’d like to see: it’s also the one I consider most implausible.
Whatever we do, the status quo is not working for me. It’s not working for a lot of people I know. And in the absence of any change to it, that second outcome is going to happen: a community theoretically based around CRAN, but practically based around GitHub repositories with Travis integration: a distributed, Ripley-proof network of many different people doing many different things, with a far lower barrier to joining up. People using Twitter and StackOverflow for rapid responses to questions and feedback that has an honest-to-god social incentive for being nice about things. In a lot of ways, that is A Good Thing (tm): we want our processes to be robust and individual-proof. But it would be sad to see something that people have put so much effort into building, allowed to slowly wilt on the vine.