
AAC Corpus

Also known as: AAC Text Corpus, Augmentative Communication Corpus

A collection of text produced by, or representative of, users of Augmentative and Alternative Communication (AAC) devices, used to train and evaluate language models and word prediction systems. AAC corpora are notoriously difficult to assemble for several reasons: AAC users produce text slowly (often 2-10 words per minute, compared with 150 or more for speech), the user population is relatively small, and privacy concerns limit data collection. AAC text may also have distinctive characteristics, such as formulaic phrases, abbreviated vocabulary, and topic patterns, that set it apart from general-purpose text. Because AAC-specific corpora are scarce, researchers often approximate them with general-purpose spoken and written text sources. Studies have nonetheless found that even small amounts of genuine AAC data, combined with larger out-of-domain corpora, can significantly improve word prediction performance.
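One common way to combine a small in-domain corpus with a larger out-of-domain one is linear interpolation of their language models. The sketch below is illustrative only and is not taken from any particular study: it uses unigram models for brevity, the toy sentences are invented, and the function names (`unigram_probs`, `interpolate`, `predict`) and the weight `lam` are hypothetical choices made for this example.

```python
from collections import Counter


def unigram_probs(sentences):
    """Estimate maximum-likelihood unigram probabilities from raw sentences."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(p_in, p_out, lam):
    """Mix two distributions: lam * in-domain + (1 - lam) * out-of-domain."""
    vocab = set(p_in) | set(p_out)
    return {
        w: lam * p_in.get(w, 0.0) + (1.0 - lam) * p_out.get(w, 0.0)
        for w in vocab
    }


def predict(probs, prefix, k=3):
    """Return up to k words starting with prefix, most probable first.

    Ties are broken alphabetically so the output is deterministic.
    """
    candidates = [(w, p) for w, p in probs.items() if w.startswith(prefix)]
    candidates.sort(key=lambda wp: (-wp[1], wp[0]))
    return [w for w, _ in candidates[:k]]


# Invented toy data standing in for a small AAC corpus and a larger
# general-purpose corpus (assumption: real corpora would be far bigger).
aac_sentences = [
    "i need help",
    "i need water please",
    "please help me",
]
general_sentences = [
    "the weather is nice today",
    "we need to talk about the plan",
    "the new phone is here",
    "he went home",
]

# lam = 0.5 is an arbitrary illustrative weight; in practice it would be
# tuned on held-out AAC text.
mixed = interpolate(
    unigram_probs(aac_sentences),
    unigram_probs(general_sentences),
    lam=0.5,
)

print(predict(unigram_probs(general_sentences), "he"))
print(predict(mixed, "he"))
```

In this toy setup the general-purpose model alone can only suggest words it has seen, while the mixed model also ranks vocabulary that appears only in the small AAC sample. A realistic system would use higher-order n-grams or a neural model with smoothing.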

Category: AAC · Natural Language Processing · Data · Research Methods

Related: Augmentative and Alternative Communication · Word Prediction · Language Model · Keystroke Savings
