What is text to speech?
The purpose of text to speech (TTS) is to take written text and reproduce it as a voice that is both intelligible and natural, that is, human-like. However, there is a perceived lack of diversity in the voices used in TTS.
Why does diversity of voices matter?
A narrow range of voices, in TTS and beyond, can lead to the dominance of particular languages, dialects, accents and tones. At the very least this can bias the way words are pronounced, and it could also promote mispronunciation. More importantly, however, the voices used demonstrate ‘who’ gets to speak. An audience that does not hear its own voice represented may fail to see itself reflected in the industry in which it works, in the technology it is given to use, or even in technology itself.
But even as the range of voices available to TTS increases, it is important to pay attention to how they are deployed. Since audiences subconsciously link qualities of a voice, such as accent or perceived gender, with human qualities, such as trustworthiness, voices may be chosen specifically as a means of persuasion. However, the choice of voices may also reveal underlying bias. Voices that are used repeatedly, or only in certain roles, may reinforce ideas about the people those voices represent and their place in society.
Increasing the diversity of voices in TTS could help to promote diversity and inclusion, but it needs to be part of a wider examination of how those voices are used.
What is behind the lack of diversity of voices in TTS?
Voices for TTS can be created either by voice cloning, which involves training a machine learning model on a large dataset of voices, or by using voice actors. The lack of diversity in voices is in part a reflection of the lack of diversity in the available datasets, but the continued failure to address this can be seen as a lack of understanding that representation matters on many levels.
How can voices in TTS be made more diverse?
Firstly, the lack of diversity, and the bias, in the voices used need to be recognised as worthy of both attention and investment. Once this need is recognised, addressing it would require building more diverse datasets and/or employing a more diverse set of voice actors, which could take considerable resource.
However, if this development is left to those who do not themselves embrace diversity and inclusion, any work done may still retain their cultural biases. If it is left to the commercial sector, it might depend on a clear link being made to increased profits rather than to equality, diversity and inclusion (EDI). The technology sector therefore needs to pay wider and more critical attention to the decisions made about how more diverse voices are developed and used in TTS.