Finding the Blank Spots in Big Data

Mimi Onuoha is an artist who works mostly with algorithms, data sets, and digital systems, but her best-known work may be a file cabinet. White, metal, and unassuming, it’s the kind that used to line the carpeted halls of office buildings before the advent of Google Drive and iCloud. Sliding open Onuoha’s cabinet reveals a column of familiar brownish-green folders, hooked at the sides and marked on top by plastic tabs. The handwritten labels include: “Publicly available gun trace data,” “Trans people killed or injured in instances of hate crime,” “Muslim mosques/communities surveilled by the FBI/CIA.” But when you open any one of the folders, there’s nothing inside.

This is Onuoha’s Library of Missing Datasets, a physical catalog of digital absence. She created the piece in 2016 (and a second version in 2018), after realizing that even with all of the esoteric, eccentric datasets you can find online — every word in the Broadway musical Hamilton, a yearly estimate of hotdogs eaten by Americans on the 4th of July — there’s a lot of urgent, necessary data that’s suspiciously missing. “In spaces that are oversaturated with data, there are blank spots where there’s nothing collected at all,” she says in a video for Data & Society. “When you look into them, you start to realize that they almost universally intersect with the interests of the most vulnerable.”

How often do we think of data as missing? Data is everywhere — it’s used to decide what products to stock in stores, to determine which diseases we’re most at risk for, to train AI models to think more like humans. It’s collected by our governments and used to make civic decisions. It’s mined by major tech companies to tailor our online experiences and sell to advertisers. As our data becomes an increasingly valuable commodity — usually profiting others, sometimes at our own expense — to not be “seen” or counted might seem like a good thing. But when data is used at such an enormous scale, gaps in the data take on an outsized importance, leading to erasure, reinforcing bias, and, ultimately, creating a distorted view of humanity. As Tea Uglow, director of Google’s Creative Lab, has said in reference to the exclusion of queer and transgender communities, “If the data does not exist, you do not exist.”

This is something that artists and designers working in the digital realm understand better than most, and a growing number of them are working on projects that bring in the nuance, ethical outlook, and humanist approach necessary to take on the problem of data bias. This group includes artists like Onuoha, who have the vision to seek out and highlight these absences (and offer a blueprint for others), as well as those like artist and software engineer Omayeli Arenyeka, who are working on projects that collect necessary data. It also includes artist and researcher Caroline Sinders and the collective Feminist Internet, who are building AI models, chatbots, and systems that take data bias and exclusion into account at every step of their processes. Others are academics like Catherine D’Ignazio and Lauren Klein, whose book Data Feminism considers how a feminist approach to data science would curb widespread bias. Still others are activists, like María Salguero, who saw there was a lack of comprehensive data on gender-based killings in Mexico and decided to collect it herself.


written by
Bruce Robertson
Jhonson and Jhonson