A Predictive Blissymbolic to English Translation System

Annalu Waller, Kris Jack · 2002 · Proceedings of the Fifth International ACM Conference on Assistive Technologies (Assets '02) · doi:10.1145/638249.638283

Summary

This paper addresses a fundamental limitation of Blissymbolics in electronic augmentative and alternative communication (AAC): while Blissymbol users can compose messages by selecting symbols, converting those symbol sequences into grammatically correct English is non-trivial. Blissymbolics is a semantic graphic language used primarily by people with physical disabilities who cannot speak; users point to symbols on a communication board and a speaking partner interprets the message. On electronic AAC devices, each Blissymbol has an associated English gloss (e.g., "boy", "to go", "home"), but simply concatenating glosses produces ungrammatical output like "boy to go home" instead of "The boy is going home." Previous approaches used grammar and parse trees with hand-built word dictionaries for each Bliss-word, but these were labour-intensive to create, language-dependent, and required that Bliss sentences be written using Bliss-words for every English word needed in the translation. The authors propose a statistical approach using word tri-gram probabilities derived from English source texts. The system builds a word association dictionary that stores tri-gram frequency information, then uses a Markov model to find the most likely English word sequences matching a Blissymbol sentence. The algorithm handles verb inflections, synonym substitution, and the insertion of function words such as articles and prepositions that have no Blissymbol equivalents.
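The tri-gram idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_model`, `sequence_probability`, `candidates`, `translate`), the `FUNCTION_WORDS` list, the sentence-boundary markers, and the toy corpus are all assumptions made for illustration. Each symbol is given a hypothetical list of surface forms, a function word may optionally be inserted before each symbol, and every candidate sentence is scored as a product of unsmoothed tri-gram probabilities.

```python
from collections import Counter
from itertools import product

# Hypothetical function words with no Blissymbol equivalent; "" means "insert nothing".
FUNCTION_WORDS = ["", "the", "is", "to"]

def build_model(text):
    """Count word tri-grams and their two-word contexts, sentence by sentence."""
    trigrams, contexts = Counter(), Counter()
    for sentence in text.lower().split("."):
        words = sentence.split()
        if not words:
            continue
        padded = ["<s>", "<s>"] + words + ["</s>"]
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            trigrams[(a, b, c)] += 1
            contexts[(a, b)] += 1
    return trigrams, contexts

def sequence_probability(words, trigrams, contexts):
    """Product of P(c | a, b) over the sentence; 0.0 if any tri-gram is unseen."""
    padded = ["<s>", "<s>"] + list(words) + ["</s>"]
    prob = 1.0
    for a, b, c in zip(padded, padded[1:], padded[2:]):
        if trigrams[(a, b, c)] == 0:
            return 0.0
        prob *= trigrams[(a, b, c)] / contexts[(a, b)]
    return prob

def candidates(symbol_forms):
    """Yield word sequences: one surface form per symbol, optional function word before each."""
    slots = []
    for forms in symbol_forms:
        slots.append(FUNCTION_WORDS)
        slots.append(forms)
    for choice in product(*slots):
        yield tuple(word for word in choice if word)

def translate(symbol_forms, trigrams, contexts):
    """Return the highest-scoring candidate sentence and its probability."""
    best = max(candidates(symbol_forms),
               key=lambda c: sequence_probability(c, trigrams, contexts))
    return " ".join(best), sequence_probability(best, trigrams, contexts)

# Toy corpus and gloss variants, invented for this example.
corpus = "The boy is going home. The girl is going home. The boy walks home."
trigrams, contexts = build_model(corpus)
sentence, prob = translate([["boy"], ["go", "goes", "going"], ["home"]],
                           trigrams, contexts)
print(sentence, prob)
```

On this toy corpus the sketch selects "the boy is going home" for the symbol sequence boy / to go / home, because it is the only candidate whose tri-grams all occur in the source text. Exhaustive enumeration is used here only for brevity; it grows quickly with sentence length.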

Key findings

The system was evaluated on 20 Blissymbol sentences of one to five symbols using word association dictionaries built from source texts of 50,000, 1,000,000, and 10,000,000 words. Translation accuracy improved substantially with larger source texts: with 50,000 words, 17 of 20 sentences had zero-confidence translations (meaning required word sequences were not found in the source); with 1,000,000 words this dropped to 13; and with 10,000,000 words only 8 sentences had zero confidence. The remaining 12 sentences were therefore translated with non-zero confidence using the 10,000,000-word source, and all but one of them produced grammatically correct output. The dictionary-building process was fast (under 30 seconds regardless of source size), and individual sentence translation took 1 to 8 seconds. The algorithm could successfully transform glosses, changing verb forms ("walks" instead of "to walk") and inserting function words, but struggled when target word sequences did not appear in the source corpus. The system could not generalise from similar examples: for instance, it could not infer that "orange" could follow "colour" even if "red" and "blue" did. Translation accuracy depended entirely on whether specific word combinations existed in the training text.
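The generalisation limit described above follows directly from looking up raw tri-gram counts. The short sketch below, using an invented three-sentence corpus and a hypothetical helper `seen`, shows how an unseen combination gets no support even when closely analogous combinations are present. It illustrates the principle only and is not taken from the paper.

```python
from collections import Counter

# Invented corpus: "colour red" and "colour blue" occur, "colour orange" never does.
corpus = "the colour red is warm. the colour blue is cool. the orange is sweet."

trigrams = Counter()
for sentence in corpus.split("."):
    words = sentence.split()
    for a, b, c in zip(words, words[1:], words[2:]):
        trigrams[(a, b, c)] += 1

def seen(a, b, c):
    """True only if this exact three-word sequence occurred in the corpus."""
    return trigrams[(a, b, c)] > 0

print(seen("the", "colour", "red"))
print(seen("the", "colour", "orange"))
```

Because the lookup is exact, the word "orange" appearing elsewhere in the corpus does not help: without some form of smoothing or word-class generalisation, any sentence requiring "the colour orange" receives zero confidence.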

Relevance

This paper tackles a problem that remains relevant in AAC: bridging the gap between symbol-based communication systems and grammatically correct spoken or written language output. The statistical approach pioneered here — using corpus-based language models rather than hand-crafted grammar rules — anticipated the direction that natural language processing would take over the following two decades. For contemporary AAC practitioners, the core challenge identified in this paper persists: symbol-based communicators deserve output that sounds natural and grammatically correct, not stilted gloss concatenations that mark the user as different. Modern large language models have dramatically improved the feasibility of this kind of translation, but the fundamental design question — how to convert a compact symbolic representation into fluent natural language — remains central to AAC system design. The paper also highlights the tension between system complexity and practical deployment in AAC, where users may have limited ability to correct errors and communication speed is already severely constrained.

Tags: AAC · Blissymbolics · natural language processing · symbol communication · predictive text · communication rate · augmentative and alternative communication · translation · language generation