MySpeechWeb: software to facilitate the construction and deployment of speech applications on the web

Richard A. Frost, Ali Karaki, David A. Dufour, Josh Greig, Rahmatullah Hafiz, Yue Shi, Shawn Daichendt, Shahriar Chandon, Justin Barolak, Randy J. Fortier · 2008 · Proceedings of the 10th International ACM SIGACCESS Conference on Computers and Accessibility (Assets '08) · doi:10.1145/1414471.1414522

Summary

This short paper presents MySpeechWeb, an open-source platform for creating and deploying voice-in/voice-out speech applications on the web. The authors identify a significant gap: despite the potential of speech interfaces to improve web access for people with visual, cognitive, and motor disabilities, very few voice-based applications exist online. The paper attributes this to three problems with existing approaches: screen readers that provide voice access to web pages are expensive and do not support conversational question/answer interactions; telephone call-centre architectures are not suitable for public creation of speech applications; and VXML-based hyperlinked speech pages require specialized markup knowledge. MySpeechWeb addresses these limitations through the Local Speech Recognition and Remote Processing (LRRP) architecture, in which each speech application is stored on a web server as three components: an executable program, a grammar file defining the input language, and a speech markup script. End users can access applications through any browser with speech capabilities; the authors used the freely available Opera browser with an IBM speech plug-in. The platform lets non-expert developers build and deploy question/answer speech applications in minutes through web-accessible forms, and it also supports expert developers in creating natural-language database query processors as executable specifications.
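
The LRRP division of labour described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name, the function names, and the use of a plain set of phrases in place of a real grammar file are all assumptions made for illustration, and real speech recognition is replaced by a simple text match.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class SpeechApplication:
    """Hypothetical bundle of the three server-side components."""
    program: Callable[[str], str]  # stands in for the executable program
    grammar: Set[str]              # stands in for the grammar file
    markup_script: str             # stands in for the speech markup script


def recognize_locally(utterance: str, grammar: Set[str]) -> Optional[str]:
    """Stand-in for browser-side recognition: accept only in-grammar phrases."""
    normalized = " ".join(utterance.lower().split())
    return normalized if normalized in grammar else None


def handle_utterance(app: SpeechApplication, utterance: str) -> str:
    """Recognize on the client side, then pass the text to the remote program."""
    recognized = recognize_locally(utterance, app.grammar)
    if recognized is None:
        return "Sorry, I did not understand that."
    # In a real deployment this call would be a request to the web server.
    return app.program(recognized)


def geography_program(question: str) -> str:
    """Toy remote program with a single hard-coded answer."""
    answers = {"what is the capital of canada": "Ottawa"}
    return answers[question]


geography_app = SpeechApplication(
    program=geography_program,
    grammar={"what is the capital of canada"},
    markup_script="<!-- placeholder for the speech markup script -->",
)
```

In this sketch only recognized text crosses the boundary between client and server, which is the point of keeping recognition local and processing remote.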

Key findings

The platform includes a diverse suite of exemplar applications demonstrating its versatility: a question/answer system for geography, a natural-language interface to a solar system database, a natural-language calculator and units converter, a "read a book" application, an English/French word translator, a time and weather service for Canadian cities, an arithmetic skills tester, and multiple-choice test applications supporting spoken dialogue through user-dependent sessions. Two multimodal applications were also developed: one that integrates voice and visual access to conventional web pages, with varying combinations of visual and non-visual features, and a game that uses images and voice. Non-programmers can create simple speech applications through an online form: they enter questions and answers, click submit, and immediately have a deployed voice application. More complex natural-language applications can be built by developers with only introductory-level programming skills; the authors note that an arithmetic calculator was built in three days by a first-year computer science student.
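
The form-based workflow described above (enter questions and answers, submit, get a deployed application) might look roughly like the following Python sketch. The summary does not say what MySpeechWeb generates from the form, so everything here is an assumption: the function names are hypothetical, and the JSGF-style grammar text is just one plausible format for the grammar file.

```python
import re
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    """Lower-case the text and strip punctuation so phrases compare cleanly."""
    return " ".join(re.sub(r"[^a-z0-9' ]+", " ", text.lower()).split())


def build_application(pairs: List[Tuple[str, str]]) -> Tuple[str, Dict[str, str]]:
    """Turn submitted question/answer pairs into a grammar and an answer table.

    The grammar uses JSGF-style syntax as an assumed format; the answer table
    plays the role of the application's executable program.
    """
    answers = {normalize(question): answer for question, answer in pairs}
    alternatives = " | ".join(answers)
    grammar = (
        "#JSGF V1.0;\n"
        "grammar qa;\n"
        f"public <question> = {alternatives};\n"
    )
    return grammar, answers


def answer(answers: Dict[str, str], recognized: str) -> str:
    """Look up the spoken answer for a recognized question."""
    return answers.get(normalize(recognized), "I do not know.")


grammar, answers = build_application([
    ("What is the capital of France?", "Paris"),
    ("How many provinces does Canada have?", "Ten"),
])
```

Deriving the grammar from the questions themselves means the form author never has to write grammar rules by hand, which is consistent with the low barrier to entry the paper emphasizes.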

Relevance

This work tackles an important and still largely unresolved challenge in web accessibility: making speech-based interaction a viable alternative to visual browsing, going beyond the limitations of screen readers that simply read existing visual content aloud. The open-source, low-barrier approach is particularly noteworthy: by enabling non-experts to create speech applications through simple web forms, the platform could democratize the development of accessible voice interfaces. The LRRP architecture's ability to work through standard browsers without proprietary software (beyond a free plug-in) was forward-thinking for 2008. While the specific technologies mentioned (Opera browser, IBM speech plug-in, X+V markup) have since been superseded by modern Web Speech APIs and voice assistants, the core vision of a public-domain "SpeechWeb" where anyone can create and link voice applications remains relevant. The paper also highlights an important distinction between voice access to visual content (what screen readers do) and genuine conversational voice applications designed for speech-first interaction, a distinction that continues to matter in accessible web design.

Tags: speech interfaces · voice interaction · web accessibility · open source · assistive technology · screen readers · natural language processing

Standards referenced: VXML · X+V