EPG: Speech Access to Program Guides for People with Disabilities

Michael Johnston, Amanda J. Stent · 2010 · Proceedings of the 12th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2010) · doi:10.1145/1878803.1878859

Summary

This demo paper from AT&T Labs Research presents an Electronic Program Guide (EPG) prototype that uses speech input and text-to-speech output to make television listing navigation accessible to people with visual disabilities or limited hand mobility. The authors observe that although digital cable, IPTV, and satellite television have greatly expanded entertainment options, the interfaces for browsing listings remain visually oriented, grid-based guides that many disabled users find difficult or impossible to navigate. Prior research on advanced listing interfaces, including some with speech input, had not focused on accessibility. EPG runs on Windows Media Center on a Microsoft set-top box or any Windows computer, with input via a remote control with a "Speak" button or an iPhone application. The system supports natural language search across listing metadata, including titles, actors, directors, genres, and schedules. For example, users can say "movies with Angelina Jolie," "sports" (genre search), or "new episodes of reality TV on Thursday" (multi-field search). Speech output from EPG stays synchronized with the visual GUI in Media Center, so sighted and visually impaired family members can share the same interface.
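To make the multi-field search idea concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes the spoken query has already been recognized and turned into field constraints, a step this summary does not describe. The `Listing` fields, the `search` function, and the sample guide entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Listing:
    # Hypothetical metadata fields, modeled on those named in the summary.
    title: str
    genre: str
    actors: list[str] = field(default_factory=list)
    day: str = ""
    is_new: bool = False


def search(listings, *, title=None, genre=None, actor=None, day=None, new_only=False):
    """Return the listings that satisfy every constraint that was supplied."""
    results = []
    for item in listings:
        if title and title.lower() not in item.title.lower():
            continue
        if genre and genre.lower() != item.genre.lower():
            continue
        if actor and actor.lower() not in [a.lower() for a in item.actors]:
            continue
        if day and day.lower() != item.day.lower():
            continue
        if new_only and not item.is_new:
            continue
        results.append(item)
    return results


# Invented sample data, for illustration only.
GUIDE = [
    Listing("Sample Movie", "movie", ["Angelina Jolie"], "Friday"),
    Listing("Sample Reality Show", "reality", [], "Thursday", is_new=True),
    Listing("Sample Reality Rerun", "reality", [], "Thursday", is_new=False),
    Listing("Sample Game", "sports", [], "Monday", is_new=True),
]

# "movies with Angelina Jolie"
movies = search(GUIDE, genre="movie", actor="Angelina Jolie")
# "new episodes of reality TV on Thursday"
new_reality = search(GUIDE, genre="reality", day="Thursday", new_only=True)
```

Each constraint is optional, so a single-field query such as "sports" and a multi-field query go through the same function; only the number of supplied arguments differs.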

Key findings

The prototype demonstrates two interaction stages, search and browsing, both accessible through speech. For search, users press a push-to-talk button and speak a natural language query, which is processed by automatic speech recognition. Results are displayed in Media Center with a spoken summary (e.g., "I found twelve results matching the query Golden Girls"), after which the listings are read aloud one by one (title and show time) at an automatic pace that users can override with browsing commands. For browsing, users can say "next," "back," "details," "save," "play," "start," or "stop" to navigate and control listings, so the whole interaction needs only one button plus voice commands. The system augments rather than replaces the existing visual interface, so it requires no separate assistive hardware and runs on commodity software and hardware. This design choice lowers barriers to entry and, critically, lets the interface be shared with sighted family members without a separate accessible setup.
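The browsing stage can be pictured as a small state machine over the result list. The Python sketch below is an illustration under stated assumptions, not the paper's design: the `ResultBrowser` class, its method names, and the boundary messages are hypothetical, and only "next," "back," "details," and "save" are modeled, because the summary does not specify what "play," "start," and "stop" do. The summary string follows the example wording above but uses a numeral and leaves pronunciation to the text-to-speech engine.

```python
class ResultBrowser:
    """Tracks the current position in a result list and returns text to speak."""

    def __init__(self, query, results):
        self.query = query
        # Each result is a (title, show_time, description) tuple (assumed shape).
        self.results = list(results)
        self.index = 0
        self.saved = []

    def summary(self):
        n = len(self.results)
        noun = "result" if n == 1 else "results"
        return f"I found {n} {noun} matching the query {self.query}"

    def current(self):
        # Title and show time, as in the spoken listing described above.
        title, show_time, _ = self.results[self.index]
        return f"{title}, {show_time}"

    def handle(self, command):
        command = command.strip().lower()
        if not self.results:
            return "There are no results to browse"
        if command == "next":
            if self.index < len(self.results) - 1:
                self.index += 1
                return self.current()
            return "End of results"
        if command == "back":
            if self.index > 0:
                self.index -= 1
                return self.current()
            return "Start of results"
        if command == "details":
            return self.results[self.index][2]
        if command == "save":
            self.saved.append(self.results[self.index])
            return f"Saved {self.results[self.index][0]}"
        # "play", "start", and "stop" are intentionally not modeled here.
        return f"Command not handled in this sketch: {command}"


browser = ResultBrowser(
    "Golden Girls",
    [
        ("Golden Girls", "8:00 PM", "Episode A"),
        ("Golden Girls", "8:30 PM", "Episode B"),
    ],
)
```

In a real system, the strings returned by `summary()` and `handle()` would be passed to a text-to-speech engine while the same index drives the on-screen highlight, which is one way to keep the spoken and visual views synchronized.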

Relevance

This paper addresses an accessibility gap in everyday entertainment technology that is still only partly closed. Modern smart TVs and streaming services have improved voice search, but speech-based browsing of listings and results remains inconsistent across platforms. Four design principles demonstrated by the prototype remain relevant: augmenting existing interfaces rather than creating separate accessible versions; supporting natural language queries rather than rigid command syntax; keeping visual and audio output synchronized so the interface works for mixed-ability households; and running on commodity hardware to reduce cost barriers. For accessibility practitioners, the shared-interface approach is particularly noteworthy: accessibility features that benefit disabled users without changing the experience for others are more likely to be adopted and maintained than parallel systems that serve only a subset of users.

Tags: speech recognition · voice interface · television accessibility · visual impairment · motor impairment · assistive technology · text-to-speech