Voice Creator: Giving Customized Voice to the Voiceless for Online Communication
Hyeon Jeong Byeon · 2021 · Proceedings of the 23rd International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS) · doi:10.1145/3441852.3476476
Summary
This extended abstract presents Voice Creator, a web-based prototype that lets people with speech or hearing impairments create customized synthetic voices for online communication. The work is motivated by research showing that voice-based communication increases intimacy and bonding in online settings compared with text-only interaction, yet people with speech or hearing impairments are effectively excluded from these benefits. The author describes current speech synthesizers as insufficient for intimate communication because they lack emotional expression, adapt poorly to conversational context, and sound less authentic than natural speech. To build a labeled audio dataset, the author first ran a crowdsourced survey on Amazon Mechanical Turk using 2,176 recordings from the Speech Accent Archive. Three crowd workers rated each recording on eight voice attributes: breathiness, hoarseness, pitch, smoothness, speed, variation, volume, and overall preference. This data was then used to build the Voice Creator website, where users customize a voice by selecting a gender and an age group and adjusting four quality parameters (breathiness, smoothness, hoarseness, and variation) on sliders scaled from 1 to 3.
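The customization step can be pictured as a lookup over the labeled recordings. The sketch below is a minimal illustration, not the author's implementation: it assumes each recording carries mean crowd ratings already rescaled to the 1 to 3 slider range, filters by gender and age group, and returns the recording closest to the slider settings by Euclidean distance. The `Recording` class, the `closest_voice` function, the field names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from math import dist

# Slider order used throughout this sketch (the four adjustable qualities).
QUALITIES = ("breathiness", "smoothness", "hoarseness", "variation")


@dataclass(frozen=True)
class Recording:
    """Hypothetical record: one labeled sample with ratings on a 1-3 scale."""
    name: str
    gender: str
    age_group: str
    ratings: tuple  # one value per entry in QUALITIES


def closest_voice(recordings, gender, age_group, sliders):
    """Return the recording matching gender and age group that is nearest
    to the slider settings, or None if no recording passes the filter."""
    candidates = [r for r in recordings
                  if r.gender == gender and r.age_group == age_group]
    if not candidates:
        return None
    # Assumption: Euclidean distance; the actual matching rule may differ.
    return min(candidates, key=lambda r: dist(r.ratings, sliders))


# Made-up sample data for illustration only.
SAMPLES = [
    Recording("sample_a", "female", "20s", (1.0, 3.0, 1.0, 2.0)),
    Recording("sample_b", "female", "20s", (3.0, 1.0, 2.0, 1.0)),
    Recording("sample_c", "male", "30s", (1.0, 3.0, 1.0, 2.0)),
]

match = closest_voice(SAMPLES, "female", "20s", (1, 3, 1, 3))
print(match.name if match else "no match")
```

A nearest-match lookup like this can only return voices that already exist in the dataset, so coverage depends on how many recordings fall into each gender and age group.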
Key findings
From 6,528 crowdsourced responses rating voice samples, the top 27 most-preferred recordings (rated above 4.5 out of 5) showed clear demographic patterns: 19 were female voices and 8 were male, suggesting a preference for female voices. Speakers in their 20s and 30s were most preferred (12 and 7, respectively). Whether the speaker was a native or non-native English speaker did not significantly affect preference. Among the voice quality attributes, precision and volume showed the highest standard deviations across raters (1.02 and 0.97, respectively), indicating that these are the most subjectively perceived qualities. The remaining attributes (breathiness, hoarseness, pitch, speed, variation) had standard deviations between 0.92 and 0.95, which suggests reasonable agreement among annotators and indicates that crowdsourced labeling of voice attributes is feasible. The Voice Creator prototype was deployed but had not yet been tested with the target user population of people with speech or hearing impairments at the time of publication.
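The figures above come down to simple per-recording and per-attribute statistics. The sketch below shows one plausible way to compute them, under stated assumptions: the exact procedure for the standard deviation is not specified here, so `rater_spread` (a hypothetical helper) averages the population standard deviation among the three raters over all recordings. The `RATINGS` data is made up for illustration and does not reproduce the reported values.

```python
from statistics import mean, pstdev

# Made-up ratings: recording -> attribute -> scores from three raters.
RATINGS = {
    "rec_1": {"preference": [5, 5, 4], "volume": [2, 4, 5]},
    "rec_2": {"preference": [3, 4, 2], "volume": [3, 3, 3]},
    "rec_3": {"preference": [5, 4, 5], "volume": [1, 3, 5]},
}

# Response count reported in the text: 2,176 recordings, three raters each.
TOTAL_RESPONSES = 2176 * 3  # 6528


def top_preferred(ratings, threshold=4.5):
    """Names of recordings whose mean preference rating exceeds the threshold."""
    return sorted(name for name, attrs in ratings.items()
                  if mean(attrs["preference"]) > threshold)


def rater_spread(ratings, attribute):
    """Average, over recordings, of the standard deviation among raters.

    Assumption: this is only one reading of "standard deviation across raters".
    """
    return mean(pstdev(attrs[attribute]) for attrs in ratings.values())


print(TOTAL_RESPONSES)
print(top_preferred(RATINGS))
print(round(rater_spread(RATINGS, "volume"), 2))
```

A larger spread for an attribute means raters disagreed more about it, which is the sense in which some qualities are described as more subjectively perceived.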
Relevance
This work addresses an underexplored gap in accessibility: the exclusion of people with speech and hearing impairments from voice-based online communication, which is increasingly important in remote work and social contexts. The concept of giving users control over the qualities of their synthetic voice, rather than limiting them to a handful of generic text-to-speech voices, connects to broader themes of identity and self-expression in augmentative and alternative communication (AAC). However, as an extended abstract, this is early-stage work: the prototype had not been evaluated with its intended users, and the voice customization relies on matching to pre-recorded samples rather than generating novel voices. Planned future work includes user studies with people who have speech or hearing disorders and the addition of intelligibility ratings to the dataset. The approach could inform more personalized AAC and text-to-speech tools.
Tags: speech synthesis · voice customization · speech impairment · hearing impairment · computer-mediated communication · text-to-speech · AAC