AAC with Automated Vocabulary from Photographs: Insights from School and Speech-Language Therapy Settings

Mauricio Fontana de Vargas, Jiamin Dai, Karyn Moffatt · 2022 · Proceedings of the 24th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '22) · doi:10.1145/3517428.3544805

Summary

This paper presents Click AAC, a prototype mobile application that automatically generates situation-specific communication boards from photographs using computer vision and natural language processing. Traditional symbol-based AAC devices organize vocabulary hierarchically by linguistic categories (e.g., "food" then "dessert"), imposing substantial meta-linguistic and memory demands on users with complex communication needs. Click AAC instead uses a Visual Scene Display layout centred on a photograph and combines three vocabulary generation methods: descriptive (scene labels and captions), related/expanded (semantically associated words drawn from the Small World of Words (SWOW) mental lexicon), and narrative (storytelling phrases matched from the Visual Storytelling (VIST) dataset). The app generates up to 20 verbs, 20 nouns, 15 adjectives, and 6 phrases per photograph, colour-coded by part of speech using the Modified Fitzgerald Key system, with symbols sourced from ARASAAC. The researchers conducted a user study with 20 AAC professionals: 14 speech-language pathologists who used Click AAC with their clients in routine therapy sessions and school activities over at least four weeks, and 6 consultants/evaluators who assessed the app independently. Participants worked across diverse settings (special education schools, private therapy, public schools) with learners ranging from non-verbal children with autism and cerebral palsy to young adults with intellectual disabilities.
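
The board-assembly step described above (per-category caps plus colour coding by part of speech) can be pictured with a minimal Python sketch. It is not the authors' implementation: the `Candidate` type, the `score` field, the ranking by score, and the `build_board` function are hypothetical names introduced only for illustration, and the colour mapping follows a commonly cited Modified Fitzgerald Key convention rather than anything confirmed for Click AAC. Only the caps (20 verbs, 20 nouns, 15 adjectives, 6 phrases) come from the summary.

```python
from dataclasses import dataclass

# Per-category caps as reported in the summary above.
CAPS = {"verb": 20, "noun": 20, "adjective": 15, "phrase": 6}

# ASSUMPTION: a commonly cited Modified Fitzgerald Key colour convention;
# the exact palette used by Click AAC is not stated in this summary.
# Phrases are left without a colour because the key covers parts of speech.
COLOURS = {"verb": "green", "noun": "orange", "adjective": "blue"}


@dataclass(frozen=True)
class Candidate:
    """Hypothetical vocabulary candidate from any of the three methods."""
    text: str
    pos: str      # "verb", "noun", "adjective", or "phrase"
    score: float  # hypothetical relevance score; higher is better


def build_board(candidates):
    """Keep the highest-scoring candidates per category, up to each cap."""
    board = {pos: [] for pos in CAPS}
    seen = set()
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        key = (cand.text.lower(), cand.pos)
        # Skip categories the board does not display and case-insensitive repeats.
        if cand.pos not in CAPS or key in seen:
            continue
        if len(board[cand.pos]) < CAPS[cand.pos]:
            board[cand.pos].append(
                {"text": cand.text, "colour": COLOURS.get(cand.pos)}
            )
            seen.add(key)
    return board
```

Calling `build_board` on the merged output of the descriptive, related, and narrative generators would give one capped, colour-tagged list per category, which a layout layer could then render around the photograph.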

Key findings

The study identified three major themes. First, Click AAC served as a flexible, complementary tool across a wide range of learner profiles, from emergent communicators using body language and gestures to context-dependent communicators building sentence construction skills. Professionals used it alongside existing robust AAC devices rather than as a replacement. Second, the immediacy of automatically generated vocabulary delivered four key benefits. It reduced professionals' workload in selecting and programming situation-specific vocabulary, which in turn increased opportunities for aided language stimulation. It facilitated symbolic understanding and sentence construction by anchoring symbols to visible real-world referents. It supported communication about personal interests that were difficult to reach through traditional AAC navigation. Finally, it improved learner motivation and confidence, with one SLP reporting that a minimally verbal student became verbal during a session using the app. Third, although the AI-generated vocabulary contained biases and errors (misidentified 2D images, gender stereotypes inherited from training data, age-inappropriate language), professionals found that filtering and correcting it was far less work than programming vocabulary from scratch. The generated vocabulary also often served as a "prime", sparking ideas for words professionals would not have thought to include themselves.

Relevance

This research demonstrates a practical application of AI and computer vision to a longstanding accessibility challenge: making AAC vocabulary contextually relevant without burdening conversation partners with vocabulary programming. The finding that automatically generated vocabulary reduced workload enough to increase the frequency of aided language stimulation (recommended in at least 70% of interactions but rarely achieved) has significant implications for AAC practice. For accessibility practitioners, the study highlights how AI can meaningfully augment (rather than replace) human expertise in communication support, with professionals cooperating with the system to achieve outcomes neither could reach alone. The work also points to important design considerations: core vocabulary that persists across all boards, consistent spatial arrangement for motor planning, support for multiple access methods (switches, scanning), and culturally appropriate vocabulary generation. The identified limitations in scene recognition for 2D images, cartoon characters, and culturally specific objects can inform future training dataset development for AAC-specific AI tools.

Tags: augmentative and alternative communication · AAC · autism · computer vision · just-in-time vocabulary · visual scene display · aided language stimulation · complex communication needs · artificial intelligence · symbol-based communication