Gender classification and bias mitigation: a post-publication review

There’s a new paper out, “Gender Classification and Bias Mitigation in Facial Images”, which tries to make non-binary-inclusive gender recognition software. To me, that’s not what the work does; what it does is instead legitimise fundamentally unjust systems, through a highly dangerous (and inconsistent) model of gender. So I figured I’d do a sort of post-publication review to summarise my concerns.

The researchers behind this paper have clearly put a lot of work in, and have the best of intentions, I’m sure. But the project is fundamentally flawed in both theory and execution, in ways that make me doubt both the degree to which they’ve thought their work through and engaged with the communities they see fit to computationally classify, and whether the end outcome of this is a net positive at all.

The morality of facial recognition

Over the last year or so we’ve seen a lot of concerns about facial recognition - some of which have been about systems’ implementation and some of which have been about whether we want to build it at all. As well as concerns about bias, there are concerns about the implications of designing systems that are fundamentally about surveillance and securitisation - processes that are deeply, structurally racist and transphobic, even if the system’s abstract design is “neutral”.

Now: strictly speaking, gender recognition is not the same as facial recognition; it’s trying to map a photo of someone to a gender category, not to a specific identity or another photo. But some of the same concerns apply; it’s about surveillance, it’s about control, and queer people’s experiences of either are (to be understated in the most British of ways) not good. As far as I can tell, the authors don’t engage with abolitionist critiques at all, which is a pity, because if they did they would (at the very least) have written a very different paper. They don’t address the fact that, as Zoé eloquently puts it, “the tech industry is enthusiastically supporting the state’s mandate to surveil. Any facilitation in improving its accuracy and efficiency in that support is, at best, a pyrrhic victory”. And enthusiastic support is precisely what they’re providing - they note that law enforcement is a primary user and customer in the first sentence of the paper.

And, well, they don’t address the fact that - even setting aside the direct surveillance aspect - you can’t really have trans-inclusive gender recognition. Gender recognition is about rigidly determining how to treat and classify someone, based on what they look like. In such a system, there’s no space for queerness. And as I wrote two years ago, “a trans-inclusive system for non-consensually defining someone’s gender is a contradiction in terms”. It’s an argument they should’ve addressed, and definitely knew about, given that they cited my paper.

In the absence of that - what does this kind of project do? Answer: just as Zoé says, it facilitates injustice. It whitewashes (or in this case, queerwashes?) surveillance infrastructure, and by taking a position of “the problem is inaccuracy, not the premise” legitimises more widespread data collection and tracking as the solution. Which is actually how the authors end - noting that their model is still inaccurate for non-binary people, they suggest “the importance of assembling more inclusive databases for facial recognition”. Speaking on behalf of myself, but also, I’m guessing, on behalf of a hell of a lot of other people captured by “inclusive”: I Would Rather You Did Not.

Biased representations in debiasing representations

Even if the consequences of this kind of work weren’t a massive red flag, though, the actual execution has its own limitations. The authors gathered two datasets; one of queer people overall (more on that later), and one specifically of non-binary people. The construction of this database of non-binary faces started with grabbing photos of non-binary people from Wikipedia.

Of course: not every non-binary person is on Wikipedia. Most of the non-binary people on Wikipedia (indeed, most of the people full stop) are celebrities. And celebrity isn’t an equal opportunity status - for the most part, no disabled people, people of colour and/or poor people need apply. Which means that even the non-binary people you do get are going to have a pretty narrow range of appearances and presentations. I’d love to look like Cara Delevingne, but that doesn’t change the fact that I don’t. Representing non-binary people through “people with Wikipedia articles” is about as robust as representing women through “women with braids”; strictly speaking it’s more accurate than nothing, but not by much.

(As an aside; while searching for that example I found Wikimedia Commons has the category “nude or partially nude women in libraries” which: way to validate the stereotypes)

Gender itself

So, okay, it might be a bad idea and it probably doesn’t work - but could it? The answer, ultimately, is no; you can’t model people’s gender consistently and exclusively from how they present, and the authors sometimes sort of acknowledge this. But much more frequently, there’s a weird (and sometimes biological) essentialism to how non-binary people in particular and gender and sexuality in general are treated.

The authors acknowledge that non-binary people frequently use different, more specific labels and identities - but assume that this inherently reflects differences in presentation or treatment, which isn’t always the case. Whether someone is “genderqueer” or “non-binary” could indicate different senses of gender or desired perceptions or treatment, or could literally just indicate that they articulated their gender in different contexts, with different language available. Inversely, the assumption is that non-binary people overall have something in common in presentation or trajectory - but this, too, isn’t the case. Some non-binary people are androgynous; some are not; what “androgynous” means varies in different contexts, for different people; what options are available to people differ from person to person and trajectory to trajectory, and all of this (yet again) problematises the presentation-to-identity link that this software tries to make. And as Amy Davis beautifully writes, talking about legal recognition, the reduction to and clumping of all these different paths and possibilities to a single X marker presupposes that non-binary is a way of articulating not just a difference, but the same difference, whoever is deploying it as a term. That isn’t the case.

And speaking of difference and presentation - the reason for the assessment of queer faces overall is Wang & Kosinski’s infamous “gay face” study, which the authors very politely describe as “controversial” and rely on as the exclusive evidence of the claim that gay and lesbian people just look different from the heterosexuals. Now, this isn’t actually the claim that Wang & Kosinski made explicitly, but the claims they made didn’t come from a neutral place. Instead, they were building on some hyper creepy evolutionary biology and sexology spitballing over the last few decades which does things like measure people’s ring finger length to test the hypothesis that queerness comes from deviant testosterone exposure in the womb; literally, that gay men were mutated to be deviantly feminine, and lesbians deviantly masculine. It’s been taken down time after time (I recommend “Reinventing the Sexes” and “An American Obsession” as really good books on the science, politics and history there), and it’s rather concerning to see it glibly resurrected in a work that aims to reduce discrimination.

Alternate futures

To summarise, then; this can’t work in theory. It definitely doesn’t work in practice. And most worryingly, attempts to make it work enable and legitimise a whole host of violences - violences at a time when we’re seeing a massive backlash against trans existences. In the face of that, the authors - however well intentioned - help a lot less than they mean to.

I don’t know the authors’ backgrounds, and I don’t want to assume. There are a lot of hints in the paper that the authors do not come at this heavily rooted in the internal nuances and complexities of trans and queer community. Misgendering is defined as “structural violence”; queer and gender non-conforming are seen as subcategories of “non-binary”; at one point the authors confuse cisgender people for transgender. And I’m sympathetic to people growing and learning through writing; I learned a lot through writing my paper, which is partially why I wrote it. I’ve learned a lot since that has added nuance and depth and uncertainty to how I think about these questions, in a lot of different ways. We are always becoming ourselves.

But ultimately, we have a responsibility - as designers and anthropologists learn and relearn and never quite learn enough - to ensure that our attempts to help, might. That doesn’t just mean trying to make things inclusive: it means taking the time to meaningfully engage with and center the people who are Being Helped. As Tagonist put it both more eloquently and obscenely:

“trans people have already been studied. We’ve been interviewed, sampled, tested, cross-referenced, experimented upon, medicated, shocked, examined, and dissected post-mortem…You’ve listened to our ears. You’ve listened to our fucking ears! But you’ve never listened to our voices and you need to do that now”.

And I suspect that if the authors had done that more substantively, they would’ve learned not just different language to use or ways to think about gender, but that we neither need nor want inclusive facial recognition. What we want is better healthcare, better education, better jobs. What we want is an end to policing, an end to rigid conceptions of gender, and this study is premised on the neutrality, or beneficence, of both.