Data Leaks: a Case Study

By Os Keyes

Way back in March, I was idly browsing for tallitot when a little popup hit my screen that looked something like this:

[Image: Yopify popup]

My first, second and thirty-fifth reactions were “huh”, but after recovering I dug into where it came from and what it is. Turns out it’s an eCommerce extension called Yopify, marketed as “Yo”, which plugs into Shopify, BigCommerce and lots of other common platforms. The idea is that by showing incoming customers evidence of previous, recent purchases, it replicates the bustling urgency of an in-person store.

After digging a bit I found a flaw in how the data streaming is handled, resulting in a leak of potentially identifiable information about customers to anyone who visits the site. The bug has been reported (and is now fixed). While we haven’t seen any evidence it was exploited prior to the fix, that doesn’t mean it wasn’t: identifying the misuse of personal information, and tracing that information back to a particular breach, is a hard problem.

For the actual bug and what it looks like, see the company blog. What I’d like to talk about here is the user implications, and the way otherwise innocent-seeming data can be identifying (and potentially aggregated for malicious purposes).

Data misuse: a case study

Yopify displays a customer’s first name, last initial, and general location; before the fix, it transmitted their first name, entire last name, and city-level location. At first glance this might look like it falls into the category of “weird but probably okay”. Who cares if John Smith in LA is buying chocolate?
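To make that mismatch concrete, here’s a minimal TypeScript sketch. The field names and payload shape are my own invention for illustration, not Yopify’s actual API; the point is simply that the browser should only ever receive the redacted form of the record, because anything more than that is visible to anyone who opens the network tab.

```typescript
// Hypothetical shape of a purchase-notification record (illustrative only;
// not the real Yopify payload).
interface PurchaseRecord {
  firstName: string;
  lastName: string;
  city: string;
  region: string;
  product: string;
}

// What the popup actually needs: no more than what it displays.
interface PopupNotification {
  firstName: string;
  lastInitial: string;
  region: string;
  product: string;
}

// Redact on the server, before the data ever reaches the browser.
// Sending the full record and truncating it client-side still leaks
// the customer's full name and city to every visitor.
function toPopupNotification(record: PurchaseRecord): PopupNotification {
  return {
    firstName: record.firstName,
    lastInitial: record.lastName.slice(0, 1),
    region: record.region, // coarse location only, not the city
    product: record.product,
  };
}

// Example: only the redacted form is serialised for the widget.
const record: PurchaseRecord = {
  firstName: "John",
  lastName: "Smith",
  city: "Los Angeles",
  region: "California",
  product: "Chocolate gift box",
};

console.log(JSON.stringify(toPopupNotification(record)));
// {"firstName":"John","lastInitial":"S","region":"California","product":"Chocolate gift box"}
```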

Answer: none of those things (John Smith, LA, chocolate) are the worst-case scenario. Really all of them are the best-case. Chocolate is not identifying; LA and John Smith, with the population sizes of each, aren’t either. But if you shift all of those frames just a bit - otherwise-identifying purchases, small population cities and rare names - you get sensitive information about individual people that could easily be misused.

An obvious, generic example: pregnancy or baby supplies. If someone is shopping for those, you can reasonably say they are (or were) pregnant. If that someone hasn’t told anyone about their pregnancy, either because it’s their own damn business or because they’re in an unsafe environment, they’re now at risk.

Similarly, shop choices or individual items can be strongly tied to marginalised identities - binders aren’t often bought by cis people, and with the exception of situations like this, tallitot tend to be purchased by Jewish people. Suddenly we’ve gone from “John Smith, living in Los Angeles, is buying chocolate” to “John Smith, living in rural Texas, is a closeted trans person”.

These aren’t hypothetical examples; as I said, I discovered the bug precisely because I was browsing a site largely used by Jewish people. Thankfully, the bug is now known and mitigated, but we don’t know whether it was exploited before I found it.

Designing for evil

The larger problem is the same as that with the Tinder breach - people should already be thinking about these things. When you design a system that’s used by or has an impact on humans, you need to factor in the near-certainty that someone might twist it around to hurt people. ‘Designing for Evil’ is not just a great essay, it’s also a thing that engineers, designers and researchers should be doing every time a product manager comes to them with an idea.

There are some shifts in the wind, at least in an academic context - Santa Clara’s wonderful Software Engineering Ethics Casebook shows that people are beginning to take this seriously. But in the meantime there’s a lot of code, written by a lot of people who don’t think about these problems, out there in the wild with your personal data woven into it. This bug had a small impact and has been fixed, but it’s a nice demonstration of the implications even small API design choices can have for customer safety.