Is Google tanking Wikipedia's traffic?

Short answer: probably not.
Longer answer: prooooobably not.
Longest answer:

So, some company called SimilarWeb published numbers that got picked up by Business Insider and boiled down to (quoting SimilarWeb here):

Wikipedia lost an insane amount of traffic in the past 3 months. And by insane I mean that the free encyclopedia site lost more than 250 million desktop visits in just 3 months!

Their theory? Google. Ignoring, for a second, the statistical worth of 'insane', let's dig into that number and see if we can validate it against Wikimedia sources. Is Google responsible for some kind of traffic loss?

Tracking traffic

Unlike SimilarWeb, Wikimedia's sites, very deliberately, don't measure 'visits' or 'sessions' because both require some form of consistent unique ID, and we care about our users' privacy. But if we were losing 'visits' (lord knows how those are even quantified) we'd also expect to be losing pageviews. That means we can use pageviews with a Google referer as a proxy, of sorts, for incoming Google-sourced traffic.

Again, for privacy reasons, we don't actually keep unsampled access logs more than 3 months old, which means historical comparisons are potentially a pain here. But we do keep sampled logs, at a 1:1000 rate, and if they're sampled well, presumably sampled*1000 == unsampled, or close enough. So let's test that. I grabbed the sampled logs from 1 January to 1 August 2015 and filtered them down to 'pageviews', then calculated pageviews from the unsampled logs for May to August. If we look at May-August, sampled versus unsampled, and see pretty much the same behaviour, the sampled logs can be trusted. If not, they can't.
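As a rough illustration of that check, here is a minimal Python sketch. It assumes the logs have already been reduced to daily pageview counts; the function names and the numbers are made up for the example and are not real Wikimedia figures or tooling.

```python
SAMPLING_RATE = 1000  # the 1:1000 rate mentioned above


def scale_sampled(sampled_daily, rate=SAMPLING_RATE):
    """Estimate full daily pageviews from sampled daily counts."""
    return {day: count * rate for day, count in sampled_daily.items()}


def max_relative_error(estimated, actual):
    """Largest relative gap between estimate and actual over the shared days."""
    shared = estimated.keys() & actual.keys()
    if not shared:
        raise ValueError("no overlapping days to compare")
    return max(abs(estimated[d] - actual[d]) / actual[d] for d in shared)


# Hypothetical daily counts, for illustration only.
sampled = {"2015-05-01": 510, "2015-05-02": 495}
unsampled = {"2015-05-01": 512_300, "2015-05-02": 493_800}

estimate = scale_sampled(sampled)
error = max_relative_error(estimate, unsampled)
print(f"worst daily relative error: {error:.2%}")
```

If the worst-case error over the overlapping months stays small, the scaled-up sampled counts are a reasonable stand-in for the months where unsampled logs no longer exist.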

[Figure: sampled versus unsampled pageviews]

It's actually visually difficult to see there are two different lines here, so I'm going to call that "can be trusted". You'll note there's a drop in June-August; that's pretty standard (people go on holiday) although it's odd to see it this severe. One probable source is the HTTPS switchover, which messed with pretty much everything.

Google referers

So let's grab the sampled logs from 2015 and count the proportion of pageviews, each day, with Google referers. If it's going down, that's a very clear indication Google is doing something weird - if not, it gets more complicated.
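The daily proportion could be computed along these lines. This is only a sketch: it assumes each pageview has been reduced to a (day, referer) pair, and the hostname test is a crude heuristic of my own rather than whatever the actual analysis used.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def is_google_referer(referer):
    """Rough heuristic: some label before the final one is exactly 'google'.

    Matches www.google.com and google.co.uk, but would also match
    google.example.com, so treat it as an approximation.
    """
    host = urlsplit(referer).hostname or ""
    labels = host.lower().split(".")
    return "google" in labels[:-1]


def google_share_by_day(records):
    """records: iterable of (day, referer) pairs, one per pageview."""
    totals = defaultdict(int)
    google = defaultdict(int)
    for day, referer in records:
        totals[day] += 1
        if is_google_referer(referer):
            google[day] += 1
    return {day: google[day] / totals[day] for day in totals}


# Hypothetical records, for illustration only.
records = [
    ("2015-06-01", "https://www.google.com/"),
    ("2015-06-01", "https://www.google.co.uk/search?q=x"),
    ("2015-06-01", "https://en.wikipedia.org/wiki/Main_Page"),
    ("2015-06-01", "-"),
]
shares = google_share_by_day(records)
print(shares)
```

Working with a proportion rather than a raw count means the result isn't thrown off by the overall seasonal dip in traffic.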

[Figure: proportion of pageviews with Google referers]

Okay, so Wikipedia's 'insane' loss of traffic from Google comes to... the proportion of traffic from Google going up. Huh. On the caveats front, the actual amount of traffic from Google is going down...because all traffic is going down, because it's the summer, and we see a dip due to seasonal patterns. Google's traffic, as shown here, decreases less than traffic from other sources (it's at 95% of the norm instead of 89%).
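To see why a shrinking count and a growing proportion aren't contradictory, here's a small worked example. The 95% and 89% figures are the ones above; the baseline split between Google and everything else is invented purely for the arithmetic.

```python
# Hypothetical baseline split (not real figures): 300 Google-referred
# pageviews and 700 from everywhere else, per 1000 pageviews.
google_before = 300
other_before = 700

# Relative levels quoted above: Google at 95% of the norm, other sources at 89%.
google_after = google_before * 0.95
other_after = other_before * 0.89

share_before = google_before / (google_before + other_before)
share_after = google_after / (google_after + other_after)

print(f"Google share before: {share_before:.1%}")
print(f"Google share after:  {share_after:.1%}")
```

Both counts fall, but because Google-referred traffic falls by less, its share of the total rises.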

Null referers

There is actually one caveat: HTTPS. One of the weird things about HTTPS is that, to minimise the information users reveal, moving from an HTTPS site to a non-HTTPS site involves stripping the referer. In other words, if some proportion of Google traffic is appearing on our sites with a null referer due to HTTPS constraints, that population wouldn't appear in the graph above - and so could be exhibiting vastly different behaviour. But we can identify requests with null referers.
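Picking out those requests might look something like the sketch below. It assumes a missing referer is recorded as an empty string or a lone "-", which is a common access-log convention but an assumption about these particular logs; the names are hypothetical.

```python
from collections import Counter

# Assumption: a missing referer is logged as "" or "-".
NULL_MARKERS = {"", "-"}


def is_null_referer(referer):
    """True when the request carried no usable referer."""
    return referer is None or referer.strip() in NULL_MARKERS


def null_referer_counts(records):
    """Count null-referer pageviews per day from (day, referer) pairs."""
    counts = Counter()
    for day, referer in records:
        if is_null_referer(referer):
            counts[day] += 1
    return counts


# Hypothetical records, for illustration only.
records = [
    ("2015-06-01", "-"),
    ("2015-06-01", ""),
    ("2015-06-01", "https://www.google.com/"),
    ("2015-06-02", None),
]
null_counts = null_referer_counts(records)
print(null_counts)
```

Note that this only says a referer is absent, not why, so it can't separate stripped referers from bookmarks or freshly opened tabs.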

[Figure: pageviews with null referers]

And that traffic is going down and could (plausibly) come from Google - but given that Wikimedia sites are on HTTPS it really shouldn't be. And, of course, a null referer is a null referer: it could come from a Google-to-us HTTPS transition, or an anyone-else-to-us HTTPS transition, or just genuinely be people opening Wikipedia in new tabs or from bookmarks - things that genuinely don't have a referer. It's impossible to tell.

Nothing to see here

So there really doesn't seem to be any conclusive or even strongly suggestive evidence that those claims hold water. Google-sourced traffic is going up as a proportion of the total, not down, and null-referer traffic shouldn't be from Google since our HTTPS switchover. One theory I've heard bandied around for why this claim got made (and is believed) is that our strip-referers-when-going-to-HTTP-sites policy means we're a source of 'dark traffic' to third parties, and thus hard to quantify through existing analytics means.

Traffic has definitely been going down, but it always goes down at this time of year (I really need to do a post on seasonality in reader behaviour some time). And if it is going down more than usual, it doesn't look like Google is the source.