A few weeks ago, in collaboration with Nikki Stevens and Jacqueline Wernimont, I published an exposé in Slate on NIST’s use of non-consensually gathered, highly sensitive facial recognition datasets. For the purposes of testing and benchmarking facial recognition systems, NIST - an agency of the U.S. government - is using immigrant visa photos, photos of child abuse, and myriad other sickening datasets, without the consent of those within them. Many of the datasets are tremendously racially biased, much like the facial recognition software they are used to produce. When the media cycle and hubbub was over, after a week of press calls and tweets and emails, when I had wrestled my inbox back under control, I quietly went home. And I cried.
Not all of their horrifying datasets are created equal. One particular set of files stands out: the innocuously named Multiple Encounter Dataset, or MEDS. The dataset contains photographs - mugshots, specifically - of people arrested multiple times by the police; people who are now dead. As with the immigrant datasets, and as one expects from the U.S. carceral “justice” system, the dataset is incredibly biased. Black people make up just shy of 13% of the U.S. population: they make up just shy of 50% of the photographs. Uniquely, however, the dataset is publicly available. It is uploaded onto NIST’s website, for retrieval by all and sundry - all the better to train machine learning systems with. And that meant I could download it, and look at the photographs they are using.
So: I did. I downloaded it, and I looked through it, with every photo hitting me like a fist. People screaming, people crying; people with bloodstained faces, people with gazes that communicated nothing, because the process was that familiar, or because they had nothing left. Roughly bandaged wounds from their arrests. Tears streaked through stubble (which could be mine) around lipglossed mouths (which could be mine). Some old and collecting social security, some so young they couldn’t even vote.
The one photo that hit me the hardest, though, wasn’t found in the dataset but in the documentation. It was a photo of a Black man in his 50s, arching away from the camera and screaming out of fear or fury or a mix of both. And it was there in the documentation as a specimen - an example of what the researchers who put the dataset together called:
“[An] Example of Stasm Output Requiring Manual Editing”
That’s it. That was the only note. Where I saw a middle-aged, screaming Black man, Andrew P. Founds, Nick Orlans, Genevieve Whiddon (of the MITRE Corporation) and Craig Watson (of NIST) saw: an example of the sort of image requiring manual editing. Which they subsequently did, overlaying the screaming person’s face with linked red dots to tell the algorithm how to adjust the face to fit their expectations. And I can’t get it out of my head, and I can’t stop asking how exactly it is that these people got to the point where they saw a screaming person as an example output. That’s what I’ve been thinking about ever since. What is it that let researchers get to this mindset?
Science and the Gardener’s Vision
Writing about drone warfare, Neil C. Renic surfaces the “gardener’s vision” of warfare, one in which the stronger party in a war loses “the basic recognition that those we fight are fellow moral agents…when this bond of shared humanity, victimhood and moral equality is sundered, an erosion of behavioural restraint often follows”. One in which “what remains is a gardener’s vision of war; a war of pest control”.
In other words, to generalise, when those subject to an action are separated from the actor - in discourse, in relation, in framing - it becomes easy to take a “gardener’s vision”; to treat them as less human, or not human at all, and to see the process of interacting with them as one of grooming, of control, of organisation.
This is not a new problem: large parts of the state are oriented around maintaining that separation. As a state action, the preparation of this dataset could be explained by that orientation towards separation - but this is a small team of named individuals, not an abstract monolith, and such an explanation does not account for the comfortable reuse of this data by scientists in public and private institutions.
Science, too, has always had a problem in this area - a failure, stemming from the positivist treatment of people as a means to an end (that end being “data”), to see research subjects as truly human. Think about what it says that the term is frequently “subjects”. On countless occasions over the last few centuries, and into the present, positivist views of science have assisted scientists in dehumanising the “subjects” of their research, aided by larger-scale societal biases that already treat certain populations as less human. The case of Henrietta Lacks or of the victims at Tuskegee shows how structural and individual racism play a role in dehumanisation - but the case of Dan Markingson shows scientists perfectly capable of justifying inhumanity in the absence of race as a factor.
This is ultimately a component of the scientific method; of the dominant view of science as a thing done by a set of people (scientists) upon data to give rise to theory. That this data stems from human bodies, lives and traces is an unfortunate corollary: a distraction from the purity of science. Those bodies and people are treated as disposable, consumable, used as a source of data and then often discarded, particularly under a capitalist regime, as feminist epistemologists have long noted. But there’s something more going on here; something more than just dismissing humanity on the way to data, abstraction by way of discarding the body. Something about facial recognition in particular, but data science and “big data” in general.
Machinic Neoplatonism and its Ethical Consequences
People talk a lot about the implications data science has for how science is done and technology is developed, both skeptically (rejecting the idea it fundamentally changes science) and forebodingly (pointing to, amongst other things, the changes it makes to disciplines such as sociology as well as the traditional sciences). One particularly evocative and resonant piece is by Dan McQuillan, who argues that algorithmic, AI-based approaches to science embody what he calls “machinic neoplatonism”.
Neoplatonism was a philosophical approach to science premised on a belief in “a hidden mathematical order that is ontologically superior to the one available to our day-to-day senses”. In other words, there was one grand unified theory of reality, which could be approached without considering the “instruments” that collected our data (be they telescopes or bodies). McQuillan, when he sees machine learning, sees this same philosophy made manifest: the idea that algorithms, fed the right data, just spit out the one secret hidden truth of how the universe functions in a particular domain, generating not theory (because you can’t necessarily see what they’re doing, or why!) but the guidance we need to structure society and reality. A machine learning system which folds proteins tells you which proteins fit, which should be further explored; it does not tell you why, but you don’t need it to. A machine learning system which assesses facial matches tells you which faces match; it does not tell you why, but…you get the (hah) picture.
In other words: there’s no theory. Really, there’s no need for a human scientist at all. You just get decisions, outcomes, things you can use. There is yet another degree of abstraction away from the bodies and minds that data comes from, and those bodies and minds are even more disposable and interchangeable - particularly when “big data” is often premised on getting as many datapoints (or photos) as possible, fading any individual into just…numbers. A gardener’s view, again.
That, to me, is the central ethical difference of machine learning-based approaches - and make no mistake, facial recognition is one such approach. Not only do we have the issues of traditional science to contend with (single, objective Truths, abstraction from the people who participate, voluntarily or not, in science); those issues are magnified. The truth does not have to be known by the scientist, or even knowable; that the algorithm “works” is taken as proof that the truth is there, and as justification for the algorithm. The abstraction is not just through a one-way mirror or epidemiological records, but separated by time, distance, anonymity and scale, so that it is even harder for human scientists (if they are involved at all) to see people rather than data sources. In pursuit of the one big truth, the truth that only an algorithm can determine, anything is permissible. That truth is simply superior; qualitative, subjective concerns do not even register as legitimate.
Abstraction and AI Ethics
McQuillan has some great suggestions for how science might avoid these traps: adapting feminist and postcolonial critiques and methods, explicitly building “antifascist AI”, which, you know, I empathise with. It’s good work and work people should build on, but this issue of machine learning creating particular space for heartlessness and inhumanity is really not the primary reason I cried. I do not expect beneficence out of scientists under traditional science, let alone machinic neoplatonism - but I do expect it out of ethicists.
When Nikki, Jacqueline and I published our piece, it made waves. A lot of people took it up and pushed it on - the same was true with the trans-led campaign against Google’s transphobic ethics advisor. But just as noticeable in both cases were the people who did not. The people who hesitated because, well, don’t bigots have power we should draw on? The people who called themselves allies but would never deign to boost either issue in a way that might diminish their media standing. The people like Luciano Floridi and Joanna Bryson who took the time to specify that they, in fact, knew best about how to secure trans rights.
In all of these cases what we see is a centering of the self; it is the Official Ethicist who determines the One True Way to make change, a way that usually benefits them. It is the Official Ethicist who determines which voices count, who demands we work with the system rather than against it. We see a delegitimisation of marginalised populations and our own knowledge - the same delegitimisation that aids in scientific dehumanisation and exploitation, that postcolonial and feminist critiques of science take aim at. We see, in other words, precisely the same problems that science has always faced, this time in the form of its self-appointed monitors.
I have to believe that these people had simply not read James’s transphobic statements, or understood them. I have to believe that these people had simply not downloaded the datasets, looked through the chest-hammering images and amoral descriptions produced by the people they suggest we work with. I have to assume there is naivety at play here. Because the alternative is that for all their airy critiques of existing practice, many AI ethicists have little problem with it: their issue is that it does not involve them in a position that befits their self-importance.
AI ethics cannot just inherit the problems of science and its patriarchal, conservative model: we cannot leave them unquestioned while striving to regulate technology, because we must, above all things, be prefigurative in our politics. This can’t be an abstract theory game, or a reputational game; it is not about your stature, it is about our lives. And for you to genuinely change things and make things better, you must first center our lives, center our sources of knowledge - do precisely what Dan suggests of science, because all of the evidence I have suggests we are subject to just the same risks and causing just the same harm. We cannot have a gardener’s view of ethics.