Whose bodies matter?

By Os Keyes

Also published in Real Life Magazine

Over the past few years, my work as a PhD student has increasingly focused on how classification and data-oriented approaches to the world can cause harm, particularly to queer and/or trans lives. From biases caused by underrepresentation to the fundamental difficulty of representing queerness in machine learning — which depends on fixed categories and is rather allergic to fluidity — there are many problems and no clear solutions.

Pointing out the harm in these approaches often means recognizing their potential for direct and material harms. When facial recognition systems expose trans women of color to police violence, for example, it’s pretty easy to explain why they’re a problem. But beyond these kinds of examples of obvious material consequences are, as Anna Lauren Hoffmann notes, discursive and epistemic harms: when and how data-driven decision making (in theory or in practice) affects how we come to know things, reduces who has the ability to “know” in a meaningful way, and narrows what can be known at all.

Such concepts are as woolly as they are weighty, so it’s often difficult to find illustrative case studies. But an almost perfect example of how the design and framing of algorithmic systems can reshape knowledge and knowers came across my Twitter feed in the form of “We Used AI to Identify the Sex of 340,000 People Harmed by Medical Devices.” The piece is an account of how the International Consortium of Investigative Journalists (ICIJ) teamed with Christopher Ré, a computer science professor at Stanford, to examine the records of an FDA database of device-failure reports and classify patients as male, female, or “unknown,” the last indicating that “the narrative in the incident report didn’t have enough markers that would allow us to predict the sex of the patient.” The FDA, citing patient anonymity, is unwilling to release information about the sex of patients in the database. That information is important for shedding light on the rates at which different sorts of people experience medical device or implant failure, given the longstanding gender biases in where medical-research funding is directed and how medical problems are treated.

So rather than go through literally millions of medical reports by hand to gather this information, the ICIJ worked with Ré to develop a machine-learning system that would infer sex from the failure records algorithmically. After humans went through a small batch of records and classified them as male, female, or unknown based on specific cues (pronouns, references to sex organs, and so on), the machine-learning system used these to extract the probable sex of any record’s associated patient with 96% accuracy.

At first glance, this intervention seems like a beneficial workaround of bureaucratic obstacles. The National Center for Health Research, the ICIJ itself, and a whole host of prominent public health figures have hailed it as a promising interdisciplinary approach toward better understanding of health biases. Rather than being stymied by the database or the FDA, now anyone with a Python interpreter can investigate this particular demographic bias in device-failure rates for themselves, replicating the analysis and keeping it up to date as new data is released. The project is touted as pretty emancipatory: We can identify harms that the FDA doesn’t, and we can keep them honest!

I don’t dispute that this is an important vein of research: Gendered biases in medicine are as old as medicine. But this project is not just a material intervention in health-care knowledge, one that clarifies important data that would be obvious were it not obscured by privacy laws. It is also making a discursive intervention: that is, it produces not only particular knowledge but also a particular frame of what can count as knowledge and who can provide it. In this example, that plays out as framing sex and gender as interchangeable and binary concepts, and expertise in questions of medical harm as a matter of journalists coding medical records rather than a matter of the lived experience of patients.

By making the project’s codebase public, the team behind it is suggesting that its methodology should be replicated and built on. But that methodology, and the narrative around it, contains some worrying exclusions. To begin, let’s look at the research framing: The attempt to classify people into one of three categories based on textual cues and associated patterns may seem clean, simple, scientific, and easily replicable, but the actual methods used are nowhere near as clear or objective as the rhetoric around “artificial intelligence” usually portrays them. In this case, the process depends on coding not only embodied attributes (a “vagina” is coded female and a “prostate” is coded male) but also gendered attributes: the use of “he” in the medical records indicates “male”; reference to a patient having a husband indicates “female,” and so on.

The interweaving of such terms — made more confusing by the same inconsistent use of both sex and gender in the writeup — suggests that rather than looking for sex or gender, this project constructs them as one and the same. These methods and classification choices assume — and construct — a very particular view of the world and of medical patients. A proper medical patient is heterosexual (no men with husbands) and cisgender, since vaginas signify “female.” If breast is “female,” then gynecomastia doesn’t exist, and if “she” refers to a person’s sex then trans women don’t exist.
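To make the mechanics concrete, here is a minimal sketch of the kind of cue-based labeling described above. It is not the ICIJ’s code, and it leaves out the machine-learning step entirely. The cue lists and the function name `label_record` are hypothetical, assembled from the examples mentioned in this essay plus a few extra pronouns I have assumed for illustration. The point is only to show how quickly such rules turn assumptions into labels.

```python
import re

# Hypothetical cue lists, based on the examples named in this essay
# ("she", "vagina", "breast", "husband", "he", "prostate") plus a few
# assumed pronouns. These are NOT the project's actual rules.
FEMALE_CUES = {"she", "her", "vagina", "breast", "husband"}
MALE_CUES = {"he", "his", "him", "prostate"}


def label_record(narrative: str) -> str:
    """Return 'female', 'male', or 'unknown' for one report narrative."""
    # Lowercase and split into whole words so "the" does not match "he".
    tokens = set(re.findall(r"[a-z]+", narrative.lower()))
    has_female = bool(tokens & FEMALE_CUES)
    has_male = bool(tokens & MALE_CUES)
    if has_female and not has_male:
        return "female"
    if has_male and not has_female:
        return "male"
    # Either no cues at all, or cues that contradict each other.
    return "unknown"


examples = [
    "Patient reported pain; her implant was removed.",
    "The patient and husband attended the follow-up.",
    "Device failed after prostate surgery; she recovered.",
    "The device malfunctioned during use.",
]
for text in examples:
    print(label_record(text), "<-", text)
```

Under these assumed rules, a man who attends a follow-up with his husband is labeled “female,” and a woman with a prostate becomes “unknown,” in the same bucket as a record with no cues at all. Nothing in the output marks either case as a failure of the rules rather than a fact about the patient.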

Beyond the problematic assumptions about what particular terms and patterns tell us about a person’s sex and gender, as well as the categorical confusion of sex and gender themselves, there is the problem of the study’s limited taxonomy. “Sex” is nowhere near as simple as “male,” “female,” or “unknown.” What about intersex people, many of whom don’t even learn they’re intersex for decades because of how assumptions about the simplicity and binary nature of sex permeate society and medical practice?

Adopting the category of “biological sex” doesn’t avoid the issue of social construction and complexity. Sex, no less than gender, is itself constructed and maintained socially, as Thomas Laqueur, Anne Fausto-Sterling and many other academics have argued – and as many intersex people have lived. The evaluation of “vagina” as inherently female exemplifies the problem; the research team proceeds as if intersex people not only don’t exist but don’t frequently have them. As doctors insistently “normalize” intersex people with a scalpel, so too do researchers “normalize” them with a model.

This doesn’t just exclude queer bodies and lives from this analysis of failure rates; crucially, it makes it more difficult for other researchers to do any different. That is, by framing “male” and “female” as the only options when it comes to sex and framing gender and sexuality as interchangeable, this work not only ignores people who fall outside that frame but builds a conception of the world in which such people cannot exist. One cannot take this code and investigate lives outside the narrow conceptions of its designers – certainly not without a drastic rewrite – and when it’s framed as “solving” for sex detection, one is less likely to think to. If sex and gender are identical and binary, if heterosexuality is the only game in town, then asking an algorithm to identify whether defect rates disproportionately impact queer bodies and lives makes as much sense as asking it what the smell of blue sounds like.

Of course, discursive framings (and research) go further than just “what can be known.” Who can know it? How can things be known? In the case of the failed-device research, we see framings that similarly undermine the emancipatory rhetoric of the writeup. Who can know things? In this case, academic researchers, obviously, but researchers tempered by journalists: “Journalists were crucial ‘domain experts’ who fact-checked the algorithm’s results in real-time so computer scientists could tweak it to improve accuracy.”

This choice about what constitutes expertise is interesting and telling. The research team might have considered the domain expertise of medical sociologists, who could have told the team that doctors’ use of pronouns in medical notes is about as reliable as a marzipan blowtorch. More vitally, given the rhetoric of emancipation, there’s no role for the patient here. While the effort is portrayed as better than a purely technical approach, more grounded and closer to reality, it is still treated as something best undertaken by a certain type of expert and professional. The patient is absent, except, of course, through their data. As data, the patient is vital; the patient serves as a resource, an object to be classified and assessed and refined on the heteronormative and essentialist premise the research team is deploying. That patients may not have explicitly consented to be in the database is secondary; that the research team may be wrong, may be denying the patient’s experiences and identities, is irrelevant to the individuals doing the analysis, to whom the patient is a resource for deployment in the name of “patients” overall.

If access to data is a powerful tool for illuminating injustices, what happens when we continue to portray professionals as the only viable “domain experts” for participating in its interpretation and analysis? As a researcher within the academy, I am constantly and painfully aware that, while holding me up as an authority on queer lives is marginally better than holding up some cis professor, I can ultimately speak only for myself. If we seek to understand medical biases solely through an expert view, when those not afforded expertise are already marginalized within medicine, it’s hard to see how we’ve substantially changed the power relations that cause these injustices. In fact, by appearing (but only appearing) to tackle them, we may make things worse.

Yes, the failed-device research illuminates a matter on which we lacked data. But the frame of knowing it reinforces has the effect of silencing marginalized voices while failing to be accountable to, or effectively transfer power to, the patients who suffer the after-effects of medical bias and malpractice. That’s what we mean by discursive harms: articulating reality in a way that moves it forward but at the cost of silencing people who don’t fit the frame.

Data can be a powerful tool to challenge normative and dominant ways of thinking and doing things, but for that to be the case, it has to actually challenge. In particular it must challenge how problems are framed and who has the power to tell their own story. When researchers undertake data-driven work, even “for good,” they must do more than identify questions we don’t have answers to. We also have to attend to the questions we can’t ask, the speakers who can’t be heard, and how the frame we set in our research undermines (or reinforces) those boundaries or barriers.