Reframing “Bias” in AI Research
As a PhD student in the School of Informatics, I’ve been researching gender bias in language and language technologies. Time and again, I’m surprised by how much people try to simplify the problem of biased language. As Abeba Birhane stresses in her interview on the podcast The Good Robot, not everything can be conceptualized as a straightforward problem with a straightforward solution. Our cultures are dynamic and complex. Our languages evolve slowly over decades, but they also change rapidly depending on our relationships with the people we are speaking to or writing for. Moreover, language does not exist in a vacuum.
In the branch of linguistics called Critical Discourse Analysis, language is studied in its context of use, with attention to how it legitimizes and maintains power and how it can incite social change. [1] AI research, by contrast, tends to approach bias as a problem to be fixed, as if bias were an error that could be removed from a dataset, or a mistake that a model could be taught to avoid. In reality, however, bias is an ongoing challenge.
Bias changes with time, place, and culture. Bias will always be with us, because there is no universal, neutral, or objective perspective. We are all shaped by our own viewpoints and by our own experiences of the world.
We need to reframe research questions about bias in data and technology. Rather than focusing on removing bias, we need to better understand it. We need to study how bias comes through in language and other types of data. We need to consider the risks bias poses and the harms it may cause. Researchers such as Abeba Birhane and Kate Crawford are among a small but growing group of people in the computational research community trying to do this. There is a wealth of research in the Humanities and Social Sciences that the computational research community can look to; people have been studying and theorizing about language and bias for much longer than AI has existed as a field. The School of Informatics has been an exciting place for me to research bias in language technologies because I’ve had the opportunity to talk about new ways to approach bias and ethical AI research with fellow PhD students like Nina Markl and Bhargavi Ganesh.
To reframe questions about bias in data and technology, we need a culture shift in the AI field. Currently, efficiency, convenience, and quantity drive dataset curation and model creation. Being the first to publish something is highly valued, so data are gathered and models are built quickly, at the expense of critical approaches to dataset and model development. To gather data quickly, researchers take language and images from the Internet without consent from the people who own them or are represented in them. Datasets are evaluated based on how large they are rather than how representative they are.
Instead, we need to prioritize accuracy over efficiency, balance and representativeness over convenience, and quality over quantity. Then we will realize that bias comes not from the model or the data, but from us: from people and society. Then we can focus on changing the power structures that cause harmful biases.
References
[1] For more on Critical Discourse Analysis, see Analysing Discourse: Textual Analysis for Social Research (Fairclough, 2003) and Uses of Heritage (Smith, 2006).