(Voice) website creation and access using phones

Arun Kumar, Sheetal K. Agarwal, Priyanka Manwani, Ketki Dhanesha · 2010 · Proceedings of the 2010 International Cross Disciplinary Conference on Web Accessibility (W4A) · doi:10.1145/1805986.1806026

Summary

This challenge paper from IBM Research India presents Spoken Web, a platform that enables people to create and access "voice websites" (VoiceSites) entirely through phone calls using speech and DTMF (touch-tone) input. The system takes a fundamentally different approach to web accessibility: rather than adapting existing visual web content for excluded populations (what the authors call the "adapter approach"), it enables non-literate, visually impaired, and economically disadvantaged people to create their own content natively in voice, in their local language. The Spoken Web Application Framework provides three interface levels: an XML editor for site template designers, a voice-based phone interface for end users creating their own sites, and a caller interface for people browsing existing VoiceSites. Site creation is reduced to a single phone call where the user customizes a template by recording voice content, after which the system generates and deploys the site with its own phone number serving as its URL. Two types of VoiceSites are supported: individual sites (analogous to personal homepages) and community portal sites where content is contributed by an entire community.

Key findings

The paper argues that most accessibility solutions are fundamentally limited because they are afterthoughts: adapting visual web content to non-visual modalities produces unnatural interaction patterns. Screen readers, while effective, force users to adapt to content that was designed for visual consumption. Spoken Web inverts this by letting excluded users create content from the start in the form most natural to them, voice. The system supports multiple local languages through template design that minimizes speech recognition requirements, with most content and prompts recorded in the user's language of choice. The creation process has four steps (template creation, user customization, site generation, and deployment with phone number provisioning) and abstracts technical complexity so that end users take part only in the customization step. The community portal model, demonstrated with a working system accessible by phone in India, enables collective content creation, which matters in developing regions where shared community resources are more practical than individual services.
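The four-step creation process can be sketched roughly as follows. This is a minimal illustration, not the Spoken Web implementation: the names `Template`, `VoiceSite`, `customize`, `generate`, and `deploy`, the slot-based template structure, and the in-memory number pool and directory are all assumptions made for the example. The generated markup uses only the standard VoiceXML `vxml`, `form`, `block`, and `audio` elements, and is far simpler than a real VoiceSite would be.

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import quoteattr


@dataclass
class Template:
    """Step 1: a site template written by a designer (hypothetical structure)."""
    name: str
    slots: list  # names of the sections the site owner has to record


@dataclass
class VoiceSite:
    template: Template
    recordings: dict = field(default_factory=dict)  # slot name -> audio file
    phone_number: str = ""  # plays the role of the site's URL


def customize(template, recordings):
    """Step 2: the only step the end user takes part in (one phone call)."""
    missing = [slot for slot in template.slots if slot not in recordings]
    if missing:
        raise ValueError(f"no recording for: {missing}")
    return VoiceSite(template, dict(recordings))


def generate(site):
    """Step 3: emit simplified VoiceXML that plays each recording in order."""
    lines = [
        '<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">',
        f"  <form id={quoteattr(site.template.name)}>",
    ]
    for slot in site.template.slots:
        src = quoteattr(site.recordings[slot])
        lines.append(f"    <block><audio src={src}/></block>")
    lines += ["  </form>", "</vxml>"]
    return "\n".join(lines)


def deploy(site, free_numbers, directory):
    """Step 4: provision a phone number and publish the generated site under it."""
    if not free_numbers:
        raise RuntimeError("no phone numbers left to provision")
    site.phone_number = free_numbers.pop(0)
    directory[site.phone_number] = generate(site)
    return site.phone_number


if __name__ == "__main__":
    # Placeholder template, recordings, and phone number, for illustration only.
    template = Template("shop", ["welcome", "services"])
    site = customize(template, {"welcome": "welcome.wav", "services": "services.wav"})
    directory = {}
    number = deploy(site, ["555-0100"], directory)
    print(number)
    print(directory[number])
```

In this sketch the site owner only ever supplies the recordings passed to `customize`; template authoring, generation, and number provisioning happen without their involvement, which mirrors the division of labour the paper describes.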

Relevance

This paper offers a thought-provoking challenge to the mainstream accessibility paradigm of adapting existing web content. The distinction between "adapter" accessibility (making visual content work for non-visual users) and "native" accessibility (creating content in the user's natural modality from the start) remains a valuable conceptual framework. While Spoken Web as a platform did not achieve mass adoption, the underlying ideas anticipated the voice-first computing paradigm that later emerged with smart speakers and voice assistants. For accessibility practitioners in developing regions, the paper highlights that true digital inclusion may require entirely new content creation paradigms rather than retrofitting existing web standards. The emphasis on enabling content creation, not just consumption, addresses a persistent gap in accessibility work: most efforts focus on making existing content perceivable, while far less attention goes to empowering people with disabilities or low literacy to become content creators themselves.

Tags: voice interface · developing countries · digital divide · low literacy · user-generated content · mobile accessibility · multilingual access

Standards referenced: VoiceXML