EasyVoice: Integrating Voice Synthesis with Skype

Paulo A. Condado, Fernando G. Lobo · 2007 · Proceedings of the 9th International ACM SIGACCESS Conference on Computers and Accessibility (Assets '07) · doi:10.1145/1296843.1296889

Summary

This paper presents EasyVoice, a system developed at the University of Algarve that integrates text-to-speech (TTS) synthesis with Skype so that people with voice disabilities can hold real-time phone conversations. The core insight is that although TTS systems and VoIP applications had both existed for some time, they had not been combined into a communication tool for people who cannot speak. EasyVoice injects synthesised speech directly into the Skype audio stream via the Skype API rather than playing it through the computer's speakers and picking it up again with a microphone. This avoids the echo that would otherwise occur, because the microphone would also capture the remote caller's voice coming from the speakers and send it back to them. The system runs on Microsoft Windows and works with any SAPI 5-compliant speech synthesiser.

Recognising that many people with voice disabilities also have motor coordination difficulties (particularly those with cerebral palsy), EasyVoice includes several features to speed up text input: an archive of recent messages for quick reuse during conversations, word completion using a dictionary seeded with the 8 most frequent words from the British National Corpus, a user-definable abbreviation system that expands shorthand into full phrases, and a virtual keyboard with scanning input for users who can only operate a single switch.
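The word completion and abbreviation expansion features can be illustrated with a minimal Python sketch. This is not EasyVoice's implementation, which the summary does not describe; the class name `TextAccelerator`, its methods, the sample word list, and the sample abbreviation are all hypothetical and chosen only to show the general idea of prefix lookup in a dictionary and token-by-token shorthand expansion.

```python
from bisect import bisect_left


class TextAccelerator:
    """Toy word completion plus abbreviation expansion (illustrative only)."""

    def __init__(self, words, abbreviations=None):
        # A sorted, de-duplicated, lowercased word list allows prefix lookup
        # by binary search.
        self.words = sorted(set(w.lower() for w in words))
        # Hypothetical user-defined shorthand, e.g. {"brb": "I will be right back"}.
        self.abbreviations = dict(abbreviations or {})

    def complete(self, prefix, limit=5):
        """Return up to `limit` dictionary words that start with `prefix`."""
        prefix = prefix.lower()
        if not prefix:
            return []
        start = bisect_left(self.words, prefix)
        matches = []
        for word in self.words[start:]:
            if not word.startswith(prefix):
                break  # sorted order: no later word can match either
            matches.append(word)
            if len(matches) == limit:
                break
        return matches

    def expand(self, text):
        """Replace every whitespace-separated token that is a known abbreviation."""
        return " ".join(self.abbreviations.get(tok, tok) for tok in text.split())


acc = TextAccelerator(
    ["the", "there", "they", "hello"],
    {"brb": "I will be right back"},
)
print(acc.complete("the"))   # ['the', 'there', 'they']
print(acc.expand("ok brb"))  # ok I will be right back
```

A real system would likely rank completions by word frequency rather than alphabetically; the alphabetical order here is a simplification of the sketch.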

Key findings

EasyVoice demonstrated the feasibility of combining existing accessible input technologies with VoIP to create a novel communication channel for people with voice disabilities. The system addressed the key technical challenge of echo prevention by injecting synthesised audio directly into the Skype audio stream rather than passing it through speakers and a microphone. Preliminary usability testing was conducted with three people with cerebral palsy who had voice disabilities, which gave initial support for the basic approach. The input acceleration features (message archive, word completion, abbreviation expansion, and scanning virtual keyboard) addressed the fact that conversation requires reasonable typing speed, which is a significant barrier for users with motor impairments. The authors planned to expand testing to a larger population, incorporate voice synthesisers for other languages, design alternative interfaces for people with very severe motor disabilities, and port to macOS and Linux.
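To make the typing-speed barrier concrete, the following Python sketch estimates how long a single-switch user might need to enter text on a scanning keyboard. It assumes row-column scanning, one common scheme in which rows are highlighted in turn, a switch press picks a row, and the keys of that row are then highlighted in turn. The summary does not say which scanning scheme EasyVoice uses, and the key layout `LAYOUT`, the function names, and the dwell time are all hypothetical.

```python
# Hypothetical 5x6 key layout; not taken from EasyVoice.
LAYOUT = ["abcdef", "ghijkl", "mnopqr", "stuvwx", "yz .,?"]


def scan_steps(char, layout=LAYOUT):
    """Count highlight steps needed to reach `char` under row-column scanning.

    The scan visits rows from the top until the target row is highlighted,
    then visits keys in that row from the left until the target is highlighted.
    """
    for row_index, row in enumerate(layout):
        col_index = row.find(char)
        if col_index != -1:
            return (row_index + 1) + (col_index + 1)
    raise ValueError(f"{char!r} is not on the keyboard")


def message_cost(message, dwell_seconds=1.0, layout=LAYOUT):
    """Estimate seconds to enter `message`, assuming a fixed dwell per highlight."""
    return sum(scan_steps(c, layout) for c in message) * dwell_seconds


print(scan_steps("a"))       # 2
print(scan_steps("h"))       # 4
print(message_cost("hi"))    # 9.0
```

Under these assumptions even a two-letter word takes several seconds, which suggests why features that save keystrokes, such as word completion, abbreviations, and message reuse, matter so much for conversational use.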

Relevance

This paper addresses an often-overlooked accessibility gap: real-time voice conversation for people who cannot speak. While much assistive technology work focuses on face-to-face AAC or text-based communication, EasyVoice recognised that phone and VoIP conversations are a distinct and important social context. Its approach of combining existing technologies (TTS, VoIP APIs, scanning keyboards, word prediction) into a purpose-built tool, rather than building from scratch, is a pragmatic model for assistive technology development. For practitioners, the paper highlights that voice disabilities often do not occur in isolation: motor impairments frequently co-occur, so a communication tool must also address input speed and alternative input methods. The work also demonstrates how platform APIs (Skype's developer API, Microsoft SAPI) can be used to create accessibility solutions without modifying the underlying platforms themselves.

Tags: voice disabilities · text-to-speech · VoIP · augmentative communication · virtual keyboard · cerebral palsy · scanning input · word completion

Standards referenced: SAPI 5