Voice Cloning

Also known as: Voice Synthesis Cloning, Personalized Text-to-Speech

The use of machine-learning models to synthesise a target speaker's voice from a short reference recording, enabling text-to-speech output that sounds like that specific person. For accessibility, voice cloning has transformative potential: people whose voices are at risk of loss (for example through ALS, laryngectomy, or progressive neurological conditions) can 'bank' their voices for later use in AAC devices; AAC users in general can replace generic synthetic voices with ones that feel personally appropriate; and gender-affirming voice training can use cloned voices as targets. Voice cloning also raises serious ethical concerns around consent, identity, and fraud, and its benefits are unevenly distributed: models trained on fluent, typical speech reproduce speech-diversity biases in the synthesised output.

Category: Artificial Intelligence · Speech Technology · AAC · Assistive Technology · Generative AI

Related: Text-to-speech · Synthetic Voice · Speech Language Model · Augmentative and alternative communication

Sources

https://arxiv.org/abs/2505.02707