Can you hear ME? Diversity of voices in text to speech

What is text to speech? 

The purpose of text to speech (TTS) is to convert written text into speech that is both intelligible and natural, that is, human-like. However, there is a perceived lack of diversity in the voices used in TTS.

Why does diversity of voices matter? 

A narrow range of voices, in TTS and beyond, can lead to the dominance of particular languages, dialects, accents, tones and so on. At the very least, this can bias the way words are pronounced, and it could also promote mispronunciation. More importantly, however, the voices used demonstrate ‘who’ gets to speak. People who fail to hear their own voices represented may also fail to see themselves reflected in the industry in which they work, the technology they are given to use, or even in technology itself.

Even as the range of voices available to TTS increases, it is important to pay attention to how they are deployed. Because listeners subconsciously link qualities of a voice, such as accent or perceived gender, with human qualities, such as trustworthiness, voices may be chosen specifically as a means of persuasion. However, the choice of voices may also reveal underlying bias. Certain voices used repeatedly, or only in certain roles, may reinforce ideas about the people those voices represent and their place in society.

Increasing the diversity of voices in TTS could help to promote diversity and inclusion, but it needs to be part of a wider examination of how those voices are used. 

What is behind the lack of diversity of voices in TTS? 

Voices for TTS can be created either by voice cloning, which involves training a machine learning model on a large dataset of voices, or by using voice actors. The lack of diversity in voices partly reflects the lack of diversity in the available datasets, but the continued failure to address this can be seen as a lack of understanding that representation matters on many levels.

How can voices in TTS be made more diverse? 

Firstly, the lack of diversity and the bias in the voices used need to be recognised as worthy of attention as well as investment. Once this need is recognised, addressing it would require building more diverse datasets and/or employing a more diverse set of voice actors, which could take considerable resources.

However, if this development is left to those who do not themselves embrace diversity and inclusion, any work done may still retain their cultural biases. In addition, if it is left to the commercial sector, this development might depend on a clear link being made to increased profits rather than to equality, diversity and inclusion (EDI). The technology sector therefore needs to pay wider and more critical attention to the decisions made about how more diverse voices are developed and used in TTS.

