Facial recognition and Hollywood values

In reading, writing and reviewing, I have to consume a lot of the literature on facial recognition. In particular, I end up reading a lot of the literature on bias mitigation around race, thanks to an upcoming project with Renata (and my perennial willingness to say yes to review requests).

The broad politics of and problems with debiasing are for another time, and it’s not like I haven’t shot my mouth off about them quite a bit. One particular thing I’ve been thinking about lately is what we might, formally and computationally, call “within-group bias”: the question of how a dataset, racially diverse or not, can still smuggle broader politics in through who makes up that diversity.

A canonical example of this is CelebA, a facial recognition dataset widely used in debiasing due to the presence of race/ethnicity labels on the data. It’s not only used as a standalone test or training dataset but as a corrective to other datasets - as a way of “patching” the underrepresentation of particular groups within larger datasets. Doing this depends on a lot of presumptions, including that the classification process within each dataset is comparable. It also depends on the assumption that dataset entries are mappable to society more broadly - and there’s really no reason to think that would be the case. In fact, for reasons of both injustice and convenience, we’d really expect the datasets to be entirely unrepresentative.

As the name suggests, CelebA (originally “celebfaces”) is made up entirely of celebrities. The dataset contains “87,628 face images of 5,436 celebrities from the web”, and was generated basically by taking a load of celebrity names and googling them. It’s rather difficult to find much more detail than that (who thought of the names? Who did the searching? Where?) but the important thing is the “celebrity”. Film, music, politics - people who are famous.

Now, fame is hardly neutral. It’s accessible to some people more than others. It prioritises some values (particularly: aesthetic values) more than others. And by this I’m not just referring to racism in a categorical sense, but also the way that beauty and power are so often aesthetically white-coded (and wealth-coded), and have been for a while. Those marginalised people who achieve fame are often those who, on other axes of power and identity, embody normalcy (or some sort of ideal). The result is, amongst other things, rampant colorism; those who are lighter-skinned, closer to white, are afforded an easier (although not easy) passage to success.

Correspondingly, the use of a dataset of celebrities - however evenly distributed categorically - simply moves the location of the disconnect. Even in an ethnically balanced dataset (not that balance itself is an unambiguous concept) we would expect to see disproportionately more light-skinned people of colour; far fewer dark-skinned people; far fewer people with (say) bad teeth, or visible disabilities. Despite this, the dataset is widely used, and the methodology behind it is commonplace: shout-out to the researchers who tried to build non-binary people into gender classification algorithms with the assumption we all look like Halsey.

Obviously I don’t think debiasing is an inherently good thing (I’d quite like to drop every facial recognition system into the sun). But the absence of any effort to address these problems, and the reliance on this kind of methodology, says two interesting things about efforts to confront bias. First: that many researchers in this space still have only a shallow understanding of the problem, and persist in treating it as a technical issue addressable through purely technical means. Second: that despite claims from big organisations about how seriously they take bias, resourcing remains very low. Nobody relies on Google image searches to build a computer vision dataset unless they lack the resources and structural support (or: the inclination) to build things more robustly. Celebrity datasets are a problem, but they’re a sensitising problem: they draw attention to the more serious gap between rhetoric and resources.