When life gives you lemons, make science

Ever since my writeup on leaving R, my blog has been getting a lot more traffic than usual. Usually this would be fine, except it's also resulting in many more comments, and the topic means that a lot of those comments are blathering about whiny SJW babies. Or death threats. 28 of those at the last count.

But, sure, it's the social justice people who are oversensitive and fly off the handle.

My immediate response to the storm of attention was to just apply the kitten setting - and after I got bored of that, the puppy, quokka, rail and guinea pig settings.

Then I remembered that my day job is as an HCI researcher who knows kind of a lot about analysing web data, and that every commenter had given me the website they came from, the IP address they used, and a comment I could easily hand-code for shittiness.

Accordingly I present my newest feature:

In which Oliver Keyes Sciences the Shit Out of the Arseholes on his Blog

Defining "arseholes"

This isn't a formal study so my definition of arsehole can be basically whatever I want it to be. I settled for any comment which exhibited one of the following traits:

  1. Accused me of lying about everything that had happened to get some benefit that apparently comes alongside threats, harassment and weird emails. Nobody has explained to me what this benefit is but I eagerly await my cheque in the mail from the nefarious SJW cabal apparently causing me to make things up;
  2. Contained threats, goading towards suicide, or generally obscene and targeted harassment;
  3. Used terms like "SJW" or "pissbaby" or "whinging" or really anything else that indicated the author had, at best, a tenuous grasp on how the world works;
  4. Was premised on the idea that I was "oversensitive" or "overreacting", which is pretty rich coming from people whose idea of acceptability includes insulting people they've never met on somebody else's website.

So I took this definition and hand-coded the comments and grabbed the data. We ended up with 107 users, of whom a mere 40 weren't arseholes, producing 183 comments in total. Then I worked out their referring site and geolocated their IP address, et voila.

Looking at referers

Every time you make a web request (with some exceptions we won't get into here) your browser tells the new page or server where you're coming from. If you click from here to this Wikipedia link, the Wikipedia request logs will show you came from my website.

Similarly, if you come from another site to my website, most of the time I can work out where that other site is. So I took the referers for people leaving comments. Then I turned them into human-readable text, stripped out those referers with fewer than 5 distinct users, and the results look a little something like:

A chart of the probability of commenters being unpleasant, depending on which site they came from

Unsurprisingly, Vox Day's readers are arseholes. Not just some of them, but all of them: every one of them who managed to painfully peck at their keyboard and hit save was a pillock of the highest calibre, contributing absolutely nothing of value to the conversation.

I was surprised by how low the proportion of arseholes was for Twitter and Facebook. Knowing, as I do, a lot of the sharing that went on, I suspect it's because it was largely done by people who sort of get the whole "not being totally ignorant of anyone who doesn't look like you" thing and, correspondingly, read by networks of people who like those people.

Reddit isn't entirely awful, but honestly when the nicest thing you can say about a site is "the users are not as bad as people who hang on to a racist misogynistic creep's every word"...you're not doing that well.

Wikipediocracy, amusingly, has been discussing this, because sure I don't edit much and this is nothing to do with my job or Wikipedia at all but you can't spell "obsessive creeps" without "obsessive". Their users actually made it into the dataset, but there were too few of them to include in the graph.

Wikipedians will be unsurprised to learn that 100% of Wikipediocracy-sourced commenters were utter shits of only the highest quality. While the population was too small to make it into the graph, I suspect that this is one tiny sample that is actually generalisable.
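For anyone who wants to try this on their own comment section, the per-user referer numbers above can be sketched roughly like this. It's a minimal Python sketch and not the code I actually ran: the `SITE_NAMES` lookup, the function names and the tuple layout are all made up for illustration, and it assumes a user counts as an arsehole if any one of their comments was hand-coded that way.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical lookup from referer hostnames to human-readable site names.
# These entries are illustrative assumptions, not the original mapping.
SITE_NAMES = {
    "t.co": "Twitter",
    "twitter.com": "Twitter",
    "facebook.com": "Facebook",
    "reddit.com": "Reddit",
}


def site_from_referer(referer):
    """Turn a raw referer URL into a human-readable site label."""
    host = urlparse(referer).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return SITE_NAMES.get(host, host or "(none)")


def arsehole_rate_by_site(comments, min_users=5):
    """Proportion of unpleasant users per referring site.

    comments: iterable of (user_id, referer, is_arsehole) tuples,
    one tuple per hand-coded comment.
    Sites with fewer than min_users distinct users are dropped.
    """
    users_by_site = defaultdict(dict)
    for user_id, referer, is_arsehole in comments:
        site = site_from_referer(referer)
        # Assumption: one bad comment is enough to flag the whole user.
        previous = users_by_site[site].get(user_id, False)
        users_by_site[site][user_id] = previous or is_arsehole
    return {
        site: sum(users.values()) / len(users)
        for site, users in users_by_site.items()
        if len(users) >= min_users
    }
```

The `min_users=5` default mirrors the "fewer than 5 distinct users" cutoff used for the chart. Everything else is a judgment call you may want to make differently.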

If we look at comments instead of the people making them:

A chart of the probability of comments being unpleasant, depending on which site the commenters came from

Twitter comments were shockingly likely to be useful, for which I think we can blame Gavin Simpson (bless you Gavin). Facebook jumped up a bit because apparently scumbags are just more enthusiastic than non-scumbags and so post more. Everyone else looks pretty much the same.
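Why per-user and per-comment numbers can disagree is easier to see with a toy example. The sketch below uses invented data (not my actual comments) and a hypothetical helper name, and again assumes a user is flagged if any of their comments was coded as unpleasant.

```python
def rate_per_user_and_per_comment(comments):
    """Return (per-user rate, per-comment rate) of unpleasantness.

    comments: list of (user_id, is_arsehole) pairs, one per comment.
    """
    flagged = {}
    for user_id, bad in comments:
        # Assumption: a single bad comment flags the user.
        flagged[user_id] = flagged.get(user_id, False) or bad
    per_user = sum(flagged.values()) / len(flagged)
    per_comment = sum(1 for _, bad in comments if bad) / len(comments)
    return per_user, per_comment


# Invented toy data: four polite users with one comment each, plus one
# prolific user whose six comments were all coded as unpleasant.
toy = [(f"nice{i}", False) for i in range(4)] + [("prolific", True)] * 6
per_user, per_comment = rate_per_user_and_per_comment(toy)
print(per_user, per_comment)
```

In this made-up case one user in five is unpleasant, but six comments in ten are, which is the same mechanism as enthusiastic scumbags pushing a site's per-comment figure up.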

Looking at geography

As well as referers we have IP addresses - and by extension, unless people are using proxies, we have where they live. To avoid being too creepy (and too specific for useful analysis) I'm just looking at countries:

A chart of the probability of commenters being unpleasant, depending on which country they came from

Most commenters were either super-distributed or super-clustered, so only three countries qualified for the chart - all of them, shock horror, primarily English-speaking.

In a continuation of the "water remains wet" theme we started with "wow, Vox Day's readers really are horrible", it turns out Canadians are actually nicer than Americans. It's alright, America, you're nicer than Brits - albeit not by much.

Things look much the same if we look at comments:

A chart of the probability of comments being unpleasant, depending on which country the commenter came from

You know the phrase "one bad apple spoils the whole bunch"? Sorry, Sweden. You have A Shithead, and he's a really prolific one.

Once again, Canadians are super-nice and everyone else sucks, although there's a much bigger difference here. Again, I blame Gavin.

Wrapping up

In summary, we've shown:

  • Vox Day readers are uniformly shitheads;
  • Reddit readers are only mostly shitheads;
  • There are, in fact, some mean Canadians;
  • If you're going to harass people for science, bear in mind that they may science your harassment.

Happy browsing to all. And remember, kids: nobody likes total strangers offering their very important opinion about how you are totally wrong. So, please: don't be that stranger.